Singularities and Statistics


Written by Sumio Watanabe, Tokyo Institute of Technology.

(A) Singularities in Layered Models

The following reports show that the a posteriori distribution does not converge to the normal distribution. The algebraic geometrical method was not yet used in them. The concrete calculations show that the problem is quite complicated if algebraic methods are not applied.

(A1-First Report) S.Watanabe, "A generalized Bayesian framework for neural networks with singular Fisher information matrices," Proc. of International Symposium on Nonlinear Theory and Its Applications, (Las Vegas), pp.207-210, 1995.

(A2-Article) S.Watanabe, "On the generalization error by a layered statistical model with Bayesian estimation," IEICE Trans., Vol.J81-A, No.10, pp.1442-1452, 1998.
This article was translated into English: Electronics and Communications in Japan, (2000) pp.95-104.

(A3-Generalized results for general three-layer neural networks) S.Watanabe, "Learning efficiency of redundant neural networks in Bayesian estimation," IEEE Transactions on Neural Networks, Vol.12, No.6, pp.1475-1486, 2001.
Errata: IEEE Transactions on Neural Networks, Vol.13, No.1, p.254, 2002.

(B) Zeta function of Kullback information

The general formula for the asymptotic behavior of the Bayes marginal likelihood was obtained in a mathematically rigorous way. The zeta function of the Kullback information plays the central role. This was the first application of the algebraic geometrical method to statistics.
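
As a rough sketch of the kind of formula meant here (the symbols K, \varphi, \lambda, m and F(n) are notation assumed for this note; the precise conditions and statements are in the papers listed below): let K(w) be the Kullback information from the true distribution to the model with parameter w, and let \varphi(w) be the prior. The zeta function is defined for Re(z) > 0 and continued analytically to a meromorphic function. If its largest pole is z = -\lambda with order m, the stochastic complexity F(n) (the negative logarithm of the marginal likelihood, with the entropy term of the true distribution removed) behaves on average as follows for n samples.

```latex
% Notation assumed for illustration; see (B2) for the rigorous statement.
\zeta(z) = \int K(w)^{z}\, \varphi(w)\, dw , \qquad \mathrm{Re}(z) > 0 ,
\\[4pt]
F(n) = \lambda \log n - (m - 1) \log\log n + O(1) , \qquad n \to \infty .
```

For a regular model with d parameters this reduces to the familiar case \lambda = d/2 and m = 1.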

(B1-First Report) S.Watanabe, "Algebraic analysis for singular statistical estimation," Lecture Notes in Computer Science, Vol.1720, pp.39-50, 1999.

(B2-Article, Communicated by Professor D.Mumford) S.Watanabe, "Algebraic Analysis for Non-identifiable Learning Machines," Neural Computation, Vol.13, No.4, pp.899-933, 2001.

(C) Singularities and Jeffreys' Prior

The asymptotic behavior of the Bayes marginal likelihood is clarified for the case in which Jeffreys' prior is applied.

(C1-First Report) S.Watanabe, "Algebraic information geometry for learning machines with singularities," Advances in Neural Information Processing Systems, (Denver, USA), pp.329-336, 2001.

(C2-Article) S. Watanabe, "Algebraic geometry of learning machines with singularities and their prior distributions," Journal of Japanese Society of Artificial Intelligence, Vol.16, No.2, pp.308-315, March, 2001 (Invited Paper).

(C3-Experimental results comparing Jeffreys' prior with the uniform prior) K.Nishiue, S.Watanabe, "Effects of priors in model selection of learning machines with singularities," to appear in IEICE Trans., D-II.

(D) The Case When the True Distribution Is Not Contained

Even if the true distribution is not contained in a finite learning machine, the singularities strongly affect learning, because under Bayes estimation the variance at the singularities is far smaller than at ordinary points.

(D-1, Article) S.Watanabe, "Algebraic geometrical methods for hierarchical learning machines," Neural Networks, Vol.14, No.8, pp.1049-1060, 2001.

(E) Volume-Dimension of Singularities

The pole of the zeta function of the Kullback information is equal to the volume-dimension of the singularities.
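
A small worked example may make this concrete. It is only an illustration under assumed notation: the function K(a,b) = a^2 b^2 on the unit square with a uniform prior is chosen here for convenience and is not taken from the papers below, and V(t) denotes the volume of the set where K is less than t. The zeta function has its only pole at z = -1/2, of order 2, and the volume of the small-K region scales as t^{1/2} up to a logarithmic factor, so the exponent 1/2 appears both as (minus) the pole and as the volume-dimension.

```latex
% Illustrative example; K, V and the unit-square domain are assumptions of this note.
\zeta(z) = \int_{0}^{1}\!\!\int_{0}^{1} (a^{2} b^{2})^{z}\, da\, db
         = \frac{1}{(2z + 1)^{2}} , \qquad \mathrm{Re}(z) > -\tfrac{1}{2} ,
\\[4pt]
V(t) = \mathrm{Vol}\{ (a,b) \in [0,1]^{2} : a^{2} b^{2} < t \}
     = \sqrt{t}\, \bigl( 1 - \tfrac{1}{2} \log t \bigr) , \qquad 0 < t \le 1 ,
\\[4pt]
\lim_{t \to +0} \frac{\log V(t)}{\log t} = \frac{1}{2} .
```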

(E-1, First Report) Keisuke Yamazaki and Sumio Watanabe, "Resolution of singularities in mixture models and the upper bounds of the stochastic complexity," Proc. of International Conference on Neural Information Processing, CD-ROM, 2002.

(E-2, Article) K.Yamazaki, S.Watanabe, "A probabilistic algorithm to calculate the learning curves of hierarchical learning machines with singularities," Trans. on IEICE, Vol.J85-D-II, No.3, pp.363-372, Mar. 2002.
This article will be translated into English.

(F) The Effect of Singularities

To clarify the effect of singularities, we proposed a new scaling method in which the distance between the singularities and the true distribution is equal to C/n, where C is a controlling parameter and n is the number of empirical samples. Very interesting behaviors of the training and generalization errors were found.

(F-1, First Report) S.Watanabe, S.-I.Amari, "The effect of singularities when the true parameter do not lie on such singularities," NIPS*2002, Vancouver, Canada, 2002.

(F-2, Article) S.Watanabe, S.-I.Amari, "Learning coefficients of layered models when the true distribution mismatches the singularities," to appear in Neural Computation.

(G) Mixture Models

The asymptotic behavior of the marginal likelihood of mixture models of an arbitrary probability distribution was clarified, based on the algebraic geometrical method.

(G-1, First Report) Keisuke Yamazaki and Sumio Watanabe, "Resolution of singularities in mixture models and the upper bounds of the stochastic complexity," Proc. of International Conference on Neural Information Processing, CD-ROM, 2002.

(G-2, Article) K.Yamazaki, S.Watanabe, "Singularities in mixture models and upper bounds of stochastic complexity," to appear in International Journal of Neural Networks.

(H) Reduced Rank Regression

The asymptotic behavior of the marginal likelihood of the reduced rank regression was clarified, based on the algebraic geometrical method. The paper (H-2) gives the complete desingularization of reduced rank regression.

(H-1, Article) K.Watanabe, S.Watanabe, "Upper bounds of Bayes generalization errors in reduced rank regression," to appear in IEICE Trans., A.

(H-2, Article) M.Aoyagi, S.Watanabe, "Stochastic complexities of reduced rank regression in Bayesian estimation," to appear in Neural Networks.

(I) Hidden Markov Models

The asymptotic behavior of the marginal likelihood of the hidden Markov model was clarified, based on the algebraic geometrical method.

Please refer to the work of Dr. Keisuke Yamazaki.

(J) Variational Bayes

In physics, the variational Bayes approximation of the a posteriori distribution is called the mean field approximation. It has been applied to information science, resulting in an EM-like learning algorithm. We clarified the approximation bounds of variational Bayes in singular statistical models.

(J-1) K.Watanabe, S.Watanabe, "Lower bounds of stochastic complexities in variational Bayes learning of Gaussian mixture models," Proceedings of IEEE International Conference on Cybernetics and Intelligence Systems, pp.99-104, 2004.

(J-2) K.Watanabe, S.Watanabe, "Stochastic complexity for mixture of exponential families in variational Bayes," to appear in Algorithmic Learning Theory.

(J-3) Kazuho Watanabe, Sumio Watanabe, "Stochastic complexities of gaussian mixtures in variational bayesian approximation," Journal of Machine Learning Research, Vol.7, pp.625-644, 2006.

(J-4) Shinichi Nakajima, Sumio Watanabe, "Generalization Performance of Subspace Bayes Approach in Linear Neural Networks," IEICE Transactions, Vol.E89-D, no.3, pp.1128-1138, 2006.

(K) Recent Results

For recent results, see Singular Learning Theory.