Context - The goal of this blog is to understand the nitty-gritty of the original “Boltzmann Machine” [1]; I will call it “BM” throughout. The style of this blog is best suited for those who want to start doing research, but it does touch on advanced aspects of the Boltzmann machine, so it will be helpful for some experienced readers too. This is my first ever blog. I chose BM as the topic because, first of all, it is a “generative” model; if you don’t know how important generative models are for AI, check this small commentary by Max Welling. Secondly, it is one of the “early” models, and hence easier to understand, since it is built on basic concepts, compared to current generative models like VAEs or diffusion models. It is still non-trivial for beginners, but it is easy to get started with, so understanding the nitty-gritty of BM will boost their confidence.

Introduction

BM was designed with a model in mind that can solve “general” problems which typically require (human) intelligence. We refer to such a model as an AI model. Let’s get some feeling for how BM is expected to work (given a large enough network). BM tries to “understand” the environment it lives in (here I am humanizing it, as that makes it easier to get a feel for). For example, imagine BM as a person standing in a place where it sees lots of dogs around. Humans do not learn about dogs (or anything else) by memorizing each individual dog; if they did, a human who saw a new breed of dog would not recognize it as a dog at all. Instead, humans capture the regularities among all the dogs: they learn the set of features which are common to all dogs and which defines what a dog is. These features could be body shape, size, color, face shape, ear shape, etc. Similarly, BM will learn the defining set of features of dogs.

Here is what it can do once it has “understood” the dog environment. It can generate images of dogs (think of this as our humanized model seeing mental visuals, i.e., imagining) with the same probabilities as they occurred in the environment. If the Labrador breed is more common in the environment, then it generates images of Labradors with correspondingly higher probability, and the probability of imagining other animals is zero. This kind of imagination happens when the model is running “free”. But when the model is given a particular image of a dog, it will try to explain it by computing the feature values: it will return the body shape, size, color, face shape, etc. of the dog in the image. When it is given a partial image of a dog, it will first interpret the given partial image, and based on this interpretation it will generate the rest of the image. If, along with “seeing” the dogs, the model is also provided with the breed name, then when it is given a partial input containing only the dog image, it will complete the input by generating the breed name, just like it completed the partial image.
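To make these two modes concrete, here is a minimal sketch of the stochastic dynamics of a tiny binary Boltzmann machine with randomly initialized weights (so its “imagination” is meaningless noise until trained; learning comes later in the blog). The first loop runs the machine free to draw a sample; the second clamps a few units to a partial input and lets the rest settle around it. All the names here (`W`, `b`, `gibbs_step`) are my own choices for illustration, not from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy fully connected Boltzmann machine: n binary units in {0, 1},
# symmetric weights W (zero diagonal, no self-connections) and biases b.
n = 6
W = rng.normal(scale=0.5, size=(n, n))
W = (W + W.T) / 2          # connections must be symmetric
np.fill_diagonal(W, 0.0)
b = rng.normal(scale=0.1, size=n)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gibbs_step(s, clamped=None):
    """One sweep of stochastic updates. Units listed in `clamped`
    keep their current values (they act as the given input)."""
    clamped = set() if clamped is None else set(clamped)
    for i in range(n):
        if i in clamped:
            continue
        # Probability that unit i turns on, given all the other units.
        p_on = sigmoid(W[i] @ s + b[i])
        s[i] = float(rng.random() < p_on)
    return s

# "Free" imagination: start from noise and let every unit resample.
s = rng.integers(0, 2, size=n).astype(float)
for _ in range(1000):
    s = gibbs_step(s)
print("free sample:", s)

# Pattern completion: clamp the first three units to a partial
# observation and let the machine fill in the remaining units.
s = rng.integers(0, 2, size=n).astype(float)
s[:3] = [1.0, 0.0, 1.0]    # the given partial input
for _ in range(1000):
    s = gibbs_step(s, clamped=[0, 1, 2])
print("completed sample:", s)
```

One simplification to note: the original paper puts a temperature T inside the sigmoid and anneals it from high to low so the network can escape poor local configurations; the sketch above fixes T = 1 to keep the code short.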

Hinton, in one of his interviews [Talking Nets, chap. 16], said this after he figured out the learning rule for BM:

That must be how the brain works.

But after he actually tried it, it didn’t work as expected. Theoretically, BM can do everything mentioned in the paragraphs above, but the learning algorithm is very, very slow.

But we are here to understand the elegant theory behind it.

[bmtr] goes even further and discusses the plausibility of one-shot learning in BM, in its 7th section.

[LeCun in here]

This proposal was important to the whole ML community, as it was the first introduction of hidden units, i.e., neurons whose values are never directly observed in the data.

History and Inspiration

Let me start by explaining the terminology used in the paper.