How to calculate evidence

Experiment and Theory

To calculate the evidence or the stochastic complexity, we can apply the exchange Monte Carlo method. Dr. Nagata showed that it works very well for singular learning machines.

Kenji Nagata, Sumio Watanabe, "The Exchange Monte Carlo Method for Bayesian Learning in Singular Learning Machines," Proc. of WCCI2006, 2006. Paper (PDF in Dr. Nagata's page).
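As a rough illustration of how replicas at several inverse temperatures yield the evidence, the sketch below uses the standard identity Z(\beta_{k+1})/Z(\beta_k) = E_{\beta_k}[exp(-(\beta_{k+1}-\beta_k)H(w))] and sums the logarithms of these ratios from \beta=0 to \beta=1. It assumes a normalized prior (so that Z(0)=1) and that energy samples H(w) at each temperature are already available. The function names `log_mean_exp` and `log_evidence` are hypothetical, and this is not necessarily the exact estimator used in the paper above.

```python
import math


def log_mean_exp(values):
    """Numerically stable log of the arithmetic mean of exp(values)."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values) / len(values))


def log_evidence(betas, energy_samples):
    """Estimate log Z(1) from energies sampled at each inverse temperature.

    betas          : increasing list with betas[0] == 0 and betas[-1] == 1
                     (assumption: the prior is normalized, so Z(0) = 1).
    energy_samples : energy_samples[k] is a list of H(w) values for w drawn
                     from the distribution proportional to
                     exp(-betas[k] * H(w)) * prior(w), for k = 0 .. len(betas) - 2.
    """
    total = 0.0
    for k in range(len(betas) - 1):
        delta = betas[k + 1] - betas[k]
        # log of Z(beta_{k+1}) / Z(beta_k), estimated from replica k
        total += log_mean_exp([-delta * h for h in energy_samples[k]])
    return total


# Sanity check: if H(w) = 3 for every w, then Z(1) = exp(-3) exactly.
betas = [0.0, 0.1, 0.5, 1.0]
samples = [[3.0] * 5 for _ in range(len(betas) - 1)]
print(log_evidence(betas, samples))
```

The stochastic complexity (free energy) is then the negative of the returned value. In practice the energy samples would come from the replicas of an exchange Monte Carlo run rather than from fixed lists.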

Recently he proved that the log canonical threshold \lambda (where -\lambda is the largest pole of the zeta function) determines the exchange ratio in the exchange Monte Carlo method. Slides (PPT) presented by Dr. Nagata. The following paper received the Best Student Paper Award at IEEE FOCI 2007. He also proved that the optimal sequence of inverse temperatures \beta is given by a geometric progression.

Kenji Nagata, Sumio Watanabe, "Analysis of Exchange Ratio for Exchange Monte Carlo Method," Proc. of FOCI2007, 2007. Paper (PDF file in Dr. Nagata's page)
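A minimal sketch of an exchange Monte Carlo run with a geometric sequence of \beta may make the exchange ratio concrete. Everything here is an illustrative assumption rather than the setup of the paper: a one-dimensional parameter w with a uniform prior on [0,1], a deterministic toy Hamiltonian H(w) = 100 w^4 with a degenerate minimum at w=0, and the hypothetical names `geometric_betas`, `swap_probability`, and `exchange_monte_carlo`.

```python
import math
import random


def geometric_betas(num_replicas, beta_min=1e-3):
    """Inverse temperatures from beta_min up to 1 with a constant ratio."""
    ratio = (1.0 / beta_min) ** (1.0 / (num_replicas - 1))
    return [beta_min * ratio ** k for k in range(num_replicas)]


def swap_probability(beta_lo, beta_hi, h_lo, h_hi):
    """Metropolis probability of exchanging two replicas.

    h_lo and h_hi are the energies of the states currently held at
    beta_lo and beta_hi, respectively.
    """
    log_ratio = (beta_hi - beta_lo) * (h_hi - h_lo)
    return 1.0 if log_ratio >= 0 else math.exp(log_ratio)


def exchange_monte_carlo(hamiltonian, betas, num_sweeps, step=0.5, seed=0):
    """Run replicas on [0, 1] and return the observed exchange ratio per pair."""
    rng = random.Random(seed)
    states = [rng.random() for _ in betas]  # initial draws from the uniform prior
    attempts = [0] * (len(betas) - 1)
    accepts = [0] * (len(betas) - 1)
    for sweep in range(num_sweeps):
        # Local Metropolis update inside each replica.
        for k, beta in enumerate(betas):
            proposal = states[k] + rng.uniform(-step, step)
            if 0.0 <= proposal <= 1.0:  # proposals outside the prior support are rejected
                log_acc = -beta * (hamiltonian(proposal) - hamiltonian(states[k]))
                if log_acc >= 0 or rng.random() < math.exp(log_acc):
                    states[k] = proposal
        # Exchange step: alternate between even and odd neighbouring pairs.
        for k in range(sweep % 2, len(betas) - 1, 2):
            attempts[k] += 1
            p = swap_probability(betas[k], betas[k + 1],
                                 hamiltonian(states[k]), hamiltonian(states[k + 1]))
            if rng.random() < p:
                states[k], states[k + 1] = states[k + 1], states[k]
                accepts[k] += 1
    return [a / max(t, 1) for a, t in zip(accepts, attempts)]


betas = geometric_betas(8)
ratios = exchange_monte_carlo(lambda w: 100.0 * w ** 4, betas, num_sweeps=2000)
print([round(r, 2) for r in ratios])
```

With a geometric ladder the ratio \beta_{k+1}/\beta_k is the same for every neighbouring pair, so the printed exchange ratios can be compared directly across pairs. A \beta=0 replica is omitted here because a geometric progression cannot include zero; it can be prepended when the evidence itself is needed.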

By comparing the experimental results and the theoretical results, we can estimate the true probability distribution from which the training samples are taken. This process is called statistical inference or statistical learning.

(Remark to Statistical Physicists) --------------
We are well aware that the exchange Monte Carlo method has been analyzed thoroughly in statistical physics. However, its important properties in machine learning have not been sufficiently studied. In machine learning, especially in singular learning machines, the ground state of the a posteriori distribution is the set of zeros of a random Hamiltonian, and the zero set of its expectation is an analytic set with singularities, or an algebraic variety, in a finite-dimensional affine space. Such a probability distribution is quite different from the equilibrium state of any physical phenomenon. In statistical physics the thermodynamic limit is important, whereas in machine learning the zero-temperature limit is important. Note that, even in the zero-temperature limit, the a posteriori distribution does not converge to a single point. It converges to an analytic set with singularities or an algebraic variety. Therefore the partition function cannot be obtained either by the Gaussian approximation (Laplace approximation) or by the mean field approximation. The state density function has a form quite different from that in statistical physics. The asymptotic behavior of the free energy was first clarified by algebraic analysis and algebraic geometry, namely resolution of singularities and the Bernstein-Sato b-function.

We know that statistical physics has many applications in machine learning. However, if you are a statistical physicist, you should be aware that they are not simple applications of statistical physics. In machine learning, both the purpose and the Hamiltonian are different. The exchange Monte Carlo method in machine learning has different properties from those in statistical physics.

(Remark to Theoretical Physicists) --------------
If you are a theoretical physicist, you can find the mathematical relation between renormalization of the Hamiltonian and the Bernstein-Sato b-function. For a given analytic function H(w) of a finite-dimensional variable w, there exist a differential operator P(w,z) and a polynomial b(z) such that

P(w,z) H(w)^{z+1}=b(z)H(w)^{z}

for arbitrary w and z. (Here z is a single complex variable, and P(w,z) is a differential operator in w that is polynomial in z.) This formula enables us to carry out the analytic continuation of the zeta function

\zeta(z)=\int H(w)^{z}dw

where dw is a probability distribution with a C^{\infty} density. If you are a theoretical physicist, you can immediately derive that the state density function

v(t)=\int \delta(t-H(w))dw

has an asymptotic expansion when t goes to zero by using the inverse Mellin transform.
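
As a sketch of how these pieces fit together (stated without proof, in the form usually quoted in singular learning theory): the zeta function is the Mellin transform of the state density,

\zeta(z)=\int_0^{\infty} t^{z} v(t) dt

so if the largest pole of \zeta(z) is z=-\lambda and its order is m, the leading term of the expansion is

v(t) \simeq c t^{\lambda-1} (-\log t)^{m-1}   (t \to +0)

for some constant c>0. A one-dimensional toy case that can be checked by hand, assuming H(w)=w^2 and dw uniform on [0,1] (an illustrative choice, not taken from the papers above):

(1/4) (d/dw)^2 H(w)^{z+1} = (z+1)(z+1/2) H(w)^{z}

\zeta(z)=\int_0^1 w^{2z} dw = 1/(2z+1)

v(t)=\int_0^1 \delta(t-w^2) dw = (1/2) t^{-1/2}   (0<t<1)

Here P(w,z)=(1/4)(d/dw)^2 and b(z)=(z+1)(z+1/2). The only pole of \zeta(z) is z=-1/2, which is a root of b(z), so \lambda=1/2 and m=1, and v(t) agrees with c t^{\lambda-1} for c=1/2.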