Mean Field Approximation

Let the set of parameters of a learning machine be a d-dimensional Euclidean space. In learning theory, the Bayes a posteriori distribution p(w1,w2,...,wd) is approximated by a product of independent distributions r1(w1)r2(w2)...rd(wd).

p(w1,w2,...,wd) \approx r1(w1)r2(w2)...rd(wd)

The probability distributions r1, r2, ..., rd are optimized so that they minimize the Kullback information (relative entropy)

D(r1 r2 ... rd||p).
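
As a small illustration of this minimization, the sketch below takes p to be a two-dimensional zero-mean Gaussian with unit variances and correlation rho, a toy target chosen here only because everything is available in closed form; it is not one of the singular models discussed on this page. For such a target, the product distribution minimizing D(r1 r2||p) is known to be Gaussian with variances equal to the inverse diagonal entries of the precision matrix, here 1 - rho^2. The function name and the value rho = 0.8 are assumptions made for the example.

```python
import math

def kl_diag_vs_correlated(var1, var2, rho):
    """Closed-form D(r || p) for r = N(0, diag(var1, var2)) and
    p = N(0, [[1, rho], [rho, 1]]) (toy target, assumed for illustration)."""
    det_sigma = 1.0 - rho * rho             # determinant of the covariance of p
    lam_diag = 1.0 / det_sigma              # both diagonal entries of the precision matrix of p
    trace_term = lam_diag * (var1 + var2)   # tr(Sigma^{-1} S) for diagonal S
    log_det_ratio = math.log(det_sigma) - math.log(var1 * var2)
    return 0.5 * (trace_term - 2.0 + log_det_ratio)

rho = 0.8

# Mean field solution: each factor has variance 1 / Lambda_ii = 1 - rho^2.
mean_field_var = 1.0 - rho * rho
kl_mean_field = kl_diag_vs_correlated(mean_field_var, mean_field_var, rho)

# For comparison: the product of the true marginals (variance 1 in each coordinate).
kl_marginals = kl_diag_vs_correlated(1.0, 1.0, rho)

print("D at mean field solution:", kl_mean_field)
print("D at product of marginals:", kl_marginals)
```

In this toy case the minimum value of D is -(1/2) log(1 - rho^2), so the approximation degrades as the correlation grows. The product of the true marginals gives a larger D, which shows that the minimizer of D(r||p) is in general not the product of the marginals of p.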

Mathematically speaking, this procedure is a variational method, and it is equivalent to the mean field approximation in mathematical physics. In learning theory it is called variational Bayes. It is easy to see that the Kullback information D is equal to the difference between the approximated free energy and the free energy. The free energy is called the marginal likelihood or the stochastic complexity in learning theory. We have found the difference between the two free energies in singular learning machines, for example, normal mixtures, linear neural networks, and hidden Markov models. These results show how precise the mean field approximation is in learning theory.
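
The relation between D and the two free energies can be written out in a few lines. The notation below is assumed for this sketch and does not appear in the text above: \varphi(w) is the prior, L_n(w) the likelihood of n samples, Z_n the marginal likelihood, F = -\log Z_n the free energy, and \bar{F}[r] the approximated (variational) free energy.

```latex
% Assumed notation: \varphi = prior, L_n = likelihood, Z_n = \int L_n(w)\varphi(w)\,dw,
% F = -\log Z_n, and \bar{F}[r] = \int r(w)\log\frac{r(w)}{L_n(w)\varphi(w)}\,dw.
\begin{align*}
p(w) &= \frac{1}{Z_n}\, L_n(w)\,\varphi(w), \qquad r(w) = \prod_{i=1}^{d} r_i(w_i), \\
D(r \,\|\, p) &= \int r(w) \log \frac{r(w)}{p(w)}\, dw \\
  &= \int r(w) \log \frac{r(w)}{L_n(w)\,\varphi(w)}\, dw + \log Z_n \\
  &= \bar{F}[r] - F .
\end{align*}
```

Since D is nonnegative, \bar{F}[r] is an upper bound on F for every product distribution r, and minimizing D over r1, ..., rd is the same as minimizing \bar{F}[r].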

As is well known, in true Bayes estimation the increase of the free energy with the number of training samples is equal to the generalization error. However, in variational Bayes, the two are quite different. In order to derive the learning curve of the mean field approximation, we are now searching for a new method. You can find our results in the following papers.
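
For reference, the standard argument behind the statement for true Bayes estimation can be sketched as follows. The notation is assumed here: q(x) is the true distribution, X_1, ..., X_n are independent samples from q, F_n is the free energy for n samples, p(x | X^n) is the Bayes predictive distribution, S is the entropy of q, and G_n is the expected Kullback information from q to the predictive distribution.

```latex
% Assumed notation: F_n = -\log \int \prod_{i=1}^{n} p(X_i \mid w)\,\varphi(w)\,dw,
% S = -\int q(x)\log q(x)\,dx,
% G_n = E\left[ \int q(x) \log \frac{q(x)}{p(x \mid X^n)}\,dx \right].
\begin{align*}
p(X_{n+1} \mid X^n)
  &= \frac{\int \prod_{i=1}^{n+1} p(X_i \mid w)\,\varphi(w)\,dw}
          {\int \prod_{i=1}^{n} p(X_i \mid w)\,\varphi(w)\,dw}
   = \exp\bigl(-(F_{n+1} - F_n)\bigr), \\
E[F_{n+1}] - E[F_n]
  &= E\bigl[-\log p(X_{n+1} \mid X^n)\bigr]
   = S + G_n .
\end{align*}
```

The first line uses the exact posterior. The variational Bayes predictive distribution is not such a ratio of marginal likelihoods, so the same identity is not available for the approximated free energy.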

Variational Bayes

If you are looking for a rigorous mathematical foundation of statistical physics, the following book is appropriate. This book teaches us that the thermodynamic formalism can be extended to infinite-dimensional probability distributions.

David Ruelle, "Thermodynamic Formalism," Addison-Wesley, Massachusetts, 1978.

We are now studying how precise the mean field approximation is in singular learning machines. The discrepancy between the true posterior and its approximation, measured by the Kullback information, is equal to the difference between the two free energies. For some singular learning machines, we have obtained mathematical results. However, the difference between the Bayes predictive distribution and the variational Bayes predictive distribution is still an open problem, except for the case treated in Nakajima's paper.

K. Watanabe, S. Watanabe, "Stochastic Complexities of Gaussian Mixtures in Variational Bayesian Approximation," Journal of Machine Learning Research, Vol. 7, (Apr), pp. 625-644, 2006.

S. Nakajima, S. Watanabe, "Variational Bayes Solution of Linear Neural Networks and its Generalization Performance," Neural Computation, Vol. 19, No. 4, pp. 1112-1153, 2007.

K. Watanabe, S. Watanabe, "Stochastic complexities of general mixture models in variational Bayesian learning," Neural Networks, Vol. 20, No. 2, March, pp. 210-219, 2007.

K. Watanabe, S. Watanabe, "Stochastic Complexity for Mixture of Exponential Families in Generalized Variational Bayes," Theoretical Computer Science, to appear, (Invited Paper).

T. Hosino, K. Watanabe, S. Watanabe, "Stochastic Complexity of Variational Bayesian Hidden Markov Models," Proc. of IJCNN, CD-ROM, (Montreal, Canada), USA, 2005.