Model Selection in Singular Learning Machines

AIC is not appropriate for model selection in singular learning machines, because in such machines it differs greatly from the expected value of the asymptotic generalization error of the maximum likelihood estimator, which is the quantity AIC is designed to estimate.
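A minimal Monte Carlo sketch of this gap, on a toy setup assumed here for illustration (it is not taken from the text): each observation is a p x q matrix Y = W + E with standard normal noise, and the true W is zero. We compare a regular model (W unrestricted, d = pq) with a singular one (W = u v^T of rank one, d = p + q - 1, whose parameterization degenerates at W = 0). AIC presumes that, for the maximum likelihood estimator, n times the expected difference between the generalization error G (the Kullback-Leibler divergence from the true distribution to the estimated one) and the training error T (its empirical counterpart on the training sample) equals d. In this setup that quantity is exactly E||Z||_F^2 for the full model and E[sigma_1(Z)^2] for the rank-one model, where Z is a p x q standard Gaussian matrix, so no sample size needs to be chosen. The function names are hypothetical.

```python
import random


def top_eigenvalue(a, rng, iters=100):
    """Largest eigenvalue of a symmetric positive definite matrix (power iteration)."""
    size = len(a)
    v = [rng.gauss(0.0, 1.0) for _ in range(size)]
    for _ in range(iters):
        w = [sum(a[i][j] * v[j] for j in range(size)) for i in range(size)]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    av = [sum(a[i][j] * v[j] for j in range(size)) for i in range(size)]
    return sum(x * y for x, y in zip(v, av))  # Rayleigh quotient with |v| = 1


def mle_gap(p, q, trials=300, seed=0):
    """Monte Carlo estimate of n * E[G - T] for the MLE when the true W is 0.

    Returns (full_model, rank_one_model). Z plays the role of
    sqrt(n) * (sample mean of Y), which is exactly standard Gaussian here.
    """
    rng = random.Random(seed)
    full = rank_one = 0.0
    for _ in range(trials):
        z = [[rng.gauss(0.0, 1.0) for _ in range(q)] for _ in range(p)]
        # Regular model: the MLE is the sample mean, so the gap is ||Z||_F^2.
        full += sum(x * x for row in z for x in row)
        # Singular model: the MLE is the best rank-one approximation of the
        # sample mean, so the gap is sigma_1(Z)^2, the top eigenvalue of Z^T Z.
        ztz = [[sum(z[k][i] * z[k][j] for k in range(p)) for j in range(q)]
               for i in range(q)]
        rank_one += top_eigenvalue(ztz, rng)
    return full / trials, rank_one / trials


if __name__ == "__main__":
    p, q = 8, 8
    full, rank_one = mle_gap(p, q)
    print(f"full model:     d = {p * q:3d}, simulated gap = {full:.1f}")
    print(f"rank-one model: d = {p + q - 1:3d}, simulated gap = {rank_one:.1f}")
```

With p = q = 8 the full model's gap should come out near d = 64, as AIC assumes, while the rank-one model's gap should come out well above d = 15, so in this singular case the AIC penalty 2d understates how much the maximum likelihood estimator overfits.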

In singular learning machines, BIC is not equal to the asymptotic form of the Bayes marginal likelihood.
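A small numerical check of this statement, on a toy model assumed here for illustration (it is not from the text): y ~ N(ab, 1) with true mean 0 and a uniform prior on [0, 1]^2. The Kullback-Leibler divergence is K(a, b) = (ab)^2 / 2, and the sketch evaluates the deterministic part of the normalized marginal likelihood, I(n) = integral of exp(-n K(a, b)) over [0, 1]^2. BIC corresponds to -log I(n) growing like (d/2) log n = log n. For this K the learning coefficient is lambda = 1/2 with multiplicity m = 2, so singular learning theory gives -log I(n) = lambda log n - (m - 1) log log n + O(1) = (1/2) log n - log log n + O(1). The inner integral over a is done in closed form with erf; the function names are hypothetical.

```python
import math


def simpson(f, lo, hi, steps=2000):
    """Composite Simpson rule; steps must be even."""
    h = (hi - lo) / steps
    total = f(lo) + f(hi)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * f(lo + i * h)
    return total * h / 3


def erf_over_t(t):
    # erf(t) / t, with its limit 2 / sqrt(pi) at t = 0
    return 2 / math.sqrt(math.pi) if t == 0 else math.erf(t) / t


def free_energy(n):
    """-log of the integral of exp(-n (ab)^2 / 2) over [0, 1]^2 (requires n >= 2).

    After integrating over a with erf and substituting t = b * sqrt(n / 2),
    what remains is the integral of erf(t) / t over [0, sqrt(n / 2)]. It is
    split at t = 1, and the tail is integrated in u = log t to handle the
    slow 1 / t decay.
    """
    top = math.sqrt(n / 2)
    head = simpson(erf_over_t, 0.0, 1.0)
    tail = simpson(lambda u: math.erf(math.exp(u)), 0.0, math.log(top))
    integral = math.sqrt(2 / n) * math.sqrt(math.pi) / 2 * (head + tail)
    return -math.log(integral)


if __name__ == "__main__":
    sizes = [10**2, 10**4, 10**6, 10**8]
    for small, large in zip(sizes, sizes[1:]):
        slope = (free_energy(large) - free_energy(small)) / math.log(large / small)
        print(f"n from {small:.0e} to {large:.0e}: slope of -log I(n) vs log n = {slope:.3f}")
```

Between n = 10^6 and 10^8 the slope should come out a little below 1/2 (the log log n term pulls it down), far from the slope 1 that BIC's (d/2) log n would give for d = 2.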

In singular learning machines, the MDL criterion is not equal to the minimum description length.

That is to say, for neural networks, hidden Markov models, Bayesian networks, reduced rank regression, normal mixtures, and binomial mixtures, we need to establish new model evaluation methods.

If you are a statistician, you must already know this fact very well. If you are a machine learning researcher, you should understand it, because almost all learning machines in information science are singular. If you teach probability and statistics at a university, please do not teach your students that AIC, BIC, and MDL can be used for model selection in machine learning.

If the true distribution is exactly contained in a finite model of the given model family, then the generalization errors are given as follows.


If the true distribution is not contained in any finite model, then the generalization errors are given as follows. This is the case in practical applications.


Only when Bayes estimation is employed with Jeffreys' prior is the asymptotic marginal likelihood equal to BIC and the asymptotic generalization error equal to AIC. However, Jeffreys' prior is not appropriate in practical applications, because it makes the generalization error very large. This fact is shown in the following paper.

S. Watanabe, "Algebraic information geometry for learning machines with singularities," Advances in Neural Information Processing Systems (Denver, USA), pp. 329-336, 2001.
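The claim about Jeffreys' prior can be illustrated on a toy model assumed here for illustration (it is not taken from the paper): let y in R^2 follow N((a, ab), I) with true mean (0, 0), so the true parameters form the line a = 0 and K(a, b) = a^2 (1 + b^2) / 2. The Fisher information matrix J^T J, with J the Jacobian of (a, ab), has determinant a^2, so Jeffreys' prior is proportional to |a| and vanishes on that singular line. On [0, 1]^2 the sketch compares a uniform prior with the normalized Jeffreys' prior 2a and estimates the coefficient of log n in -log of the integral of exp(-n K) times the prior. It should approach 1/2 for the uniform prior and d/2 = 1, the BIC value, for Jeffreys' prior. The function names are hypothetical.

```python
import math


def simpson(f, lo, hi, steps=2000):
    """Composite Simpson rule; steps must be even."""
    h = (hi - lo) / steps
    total = f(lo) + f(hi)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * f(lo + i * h)
    return total * h / 3


def free_energy(n, prior):
    """-log of the integral of exp(-n a^2 (1 + b^2) / 2) * prior over [0, 1]^2.

    The integral over a is done in closed form; b is integrated numerically.
    """
    def inner(b):
        c = n * (1 + b * b) / 2  # the exponent is -c * a^2
        if prior == "uniform":   # prior density 1 on [0, 1]^2
            return math.sqrt(math.pi / c) / 2 * math.erf(math.sqrt(c))
        if prior == "jeffreys":  # prior density 2a, proportional to sqrt(det Fisher)
            return (1 - math.exp(-c)) / c
        raise ValueError(prior)
    return -math.log(simpson(inner, 0.0, 1.0))


def log_n_coefficient(prior, n1=10**4, n2=10**6):
    """Slope of -log I(n) against log n between two sample sizes."""
    return (free_energy(n2, prior) - free_energy(n1, prior)) / math.log(n2 / n1)


if __name__ == "__main__":
    for prior in ("uniform", "jeffreys"):
        print(f"{prior:>8}: coefficient of log n ~ {log_n_coefficient(prior):.3f}")
```

In this toy the coefficient is 1/2 under the uniform prior and d/2 = 1 under Jeffreys' prior, matching BIC. Since the Bayes generalization error in the realizable case behaves roughly as lambda / n, Jeffreys' prior doubles its leading term here.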