Singular Learning Theory

Written by Sumio Watanabe

In singular learning theory, almost all statistical concepts must be re-examined.

(a) The Cramér-Rao inequality has no meaning.
(b) The Fisher information matrix is not positive definite.
(c) The log likelihood function cannot be approximated by any quadratic form.
(d) Asymptotic normality of the maximum likelihood estimator (MLE) does not hold.
(e) The MLE is not efficient, even asymptotically.
(f) The Bayes a posteriori distribution does not converge to the normal distribution.
(g) AIC is not equal to the average generalization error.
(h) BIC is not equal to the log marginal likelihood or stochastic complexity.
(i) MDL is not equal to the minimum description length.
(j) The log likelihood ratio does not converge to the \chi^{2}-distribution.
(k) A statistical model is not equivalent to a smooth manifold.
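
Item (b) can be checked numerically on a toy model. The sketch below is an illustration only, not taken from the book: it assumes a hypothetical regression model y = a*tanh(b*x) + noise with unit-variance Gaussian noise and inputs spread evenly over [-2, 2], and the function name `fisher_information` is made up for this example. Under those assumptions the Fisher information matrix is the average outer product of the gradient of the regression function, and its smallest eigenvalue drops to zero when a = 0.

```python
import numpy as np

def fisher_information(a, b, xs):
    """Fisher information of y = a*tanh(b*x) + N(0, 1) noise, averaged over xs.

    Illustrative assumption: for unit-variance Gaussian noise the Fisher
    information is E_x[grad f(x) grad f(x)^T], with f(x) = a*tanh(b*x).
    """
    grad_a = np.tanh(b * xs)                    # df/da
    grad_b = a * xs / np.cosh(b * xs) ** 2      # df/db
    grads = np.stack([grad_a, grad_b], axis=1)  # shape (len(xs), 2)
    return grads.T @ grads / len(xs)

# Inputs on a fixed grid (an assumption standing in for the input distribution).
xs = np.linspace(-2.0, 2.0, 401)

regular = fisher_information(1.0, 1.0, xs)   # a generic parameter
singular = fisher_information(0.0, 1.0, xs)  # a = 0: the parameter b has no effect

# eigvalsh returns eigenvalues in ascending order, so [0] is the smallest.
min_eig_regular = np.linalg.eigvalsh(regular)[0]
min_eig_singular = np.linalg.eigvalsh(singular)[0]

print("smallest eigenvalue at (a, b) = (1, 1):", min_eig_regular)
print("smallest eigenvalue at (a, b) = (0, 1):", min_eig_singular)
```

At a = 0 every value of b gives the same function, so the gradient with respect to b vanishes and the matrix loses rank. This is the kind of degeneracy that makes the quadratic approximation in (c) unavailable.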

For singular statistical models, almost all of statistics needs to be improved from the viewpoint of functional probability. Algebraic geometry gives us the standard foundation for such studies. We have given standard answers to the above problems (a),(b),...,(h). If you are a researcher in mathematics, statistics, or machine learning, you will understand the importance of these fundamental results.

[Examples of singular statistical models]
Normal mixtures, binomial mixtures, multinomial mixtures, Bayes networks, neural networks, radial basis functions, hidden Markov models, stochastic context-free grammars, reduced rank regressions, Boltzmann machines, ...
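
To see why a model from this list is singular, a small worked example may help. The two-component normal mixture below is chosen for illustration; the symbols a, b, q(x) and W_0 are notation introduced for this sketch.

```latex
% A two-component normal mixture with mixing ratio a and shifted mean b:
\[
  p(x \mid a, b) = (1-a)\,\mathcal{N}(x \mid 0, 1) + a\,\mathcal{N}(x \mid b, 1),
  \qquad 0 \le a \le 1 .
\]
% If the true distribution is q(x) = N(x | 0, 1), the set of true parameters is
\[
  W_0 = \{(a, b) : p(x \mid a, b) = q(x)\} = \{(a, b) : a\,b = 0\},
\]
% the union of the lines a = 0 and b = 0, which cross at the origin.
% On W_0 at least one partial derivative of p vanishes identically in x:
\[
  \left.\frac{\partial p}{\partial b}\right|_{a=0} = 0,
  \qquad
  \left.\frac{\partial p}{\partial a}\right|_{b=0}
    = \mathcal{N}(x \mid 0, 1) - \mathcal{N}(x \mid 0, 1) = 0 ,
\]
% so the Fisher information matrix is degenerate at every true parameter.
```

The true parameter is therefore not a single point but a set with a crossing at the origin, which is why items (b) and (k) above fail for such models.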

Algebraic Geometry and Statistical Learning Theory

Sumio Watanabe, "Algebraic Geometry and Statistical Learning Theory", Cambridge University Press, August 2009.

0. Invitation to Learning Theory
Section 0 is an introduction for high school students. If you are a university student, please start from the following section.

1. Introduction to Singular Learning Theory

1.1 Singular Learning Machines
1.2 Singularities in Learning Machines
1.3 Singularities and Statistics

2. How to calculate Evidence

3. Mean Field Approximation

4. Maximum likelihood and Maximum A Posteriori

5. Learning Curves

6. Model Selection in Singular Learning Machines

7. Equations of States in Singular Statistical Estimation

8. Singular regression problem

9. Fast Explanation of the equation of state in Bayes Learning Theory. For details, see the paper showing that the equations of states hold even if the true distribution is not contained in the parametric model. This paper was published in IEICE Transactions.

10. Equations of states are asymptotically equivalent to Bayes cross-validation.
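
The claim in item 10 can be illustrated numerically. The sketch below rests on several assumptions not stated on this page: it uses WAIC, the criterion that Watanabe's work derives from the equations of states, as the quantity compared with cross-validation; it uses a simple regular model (normal data with unknown mean and a conjugate normal prior) so that exact posterior draws are available; and the leave-one-out loss is computed by the importance-sampling identity p(x_i | x_{-i}) = 1 / E_w[1 / p(x_i | w)]. All variable names are made up for this example.

```python
import numpy as np

rng = np.random.default_rng(0)
n, num_draws = 50, 4000

# Synthetic data from N(0, 1); the model is N(mu, 1) with unknown mean mu.
x = rng.normal(0.0, 1.0, size=n)

# Conjugate prior mu ~ N(0, prior_var) gives an exact normal posterior.
prior_var = 100.0
post_var = 1.0 / (n + 1.0 / prior_var)
post_mean = post_var * x.sum()
mu = rng.normal(post_mean, np.sqrt(post_var), size=num_draws)

# log p(x_i | mu_s) for every posterior draw s and data point i: shape (draws, n).
log_lik = -0.5 * np.log(2.0 * np.pi) - 0.5 * (x[None, :] - mu[:, None]) ** 2

# Bayes training loss: minus the mean log posterior predictive density.
train_loss = -np.mean(np.log(np.mean(np.exp(log_lik), axis=0)))

# Functional variance: sum over data points of the posterior variance of log p.
functional_var = np.sum(np.var(log_lik, axis=0))

# WAIC on the per-sample loss scale (inverse temperature 1).
waic = train_loss + functional_var / n

# Leave-one-out cross-validation loss via the importance-sampling identity.
loo_cv = np.mean(np.log(np.mean(np.exp(-log_lik), axis=0)))

print("training loss:", train_loss)
print("WAIC         :", waic)
print("LOO-CV loss  :", loo_cv)
```

In this setting the two estimates should agree far more closely with each other than either does with the training loss. A regular model is used here only because its posterior can be sampled exactly; the statement in item 10 is about singular models as well.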