Sumio Watanabe




(Postal Mail)


Sumio Watanabe, Ph.D.

Professor,
Department of Mathematical and Computing Science,
Tokyo Institute of Technology,
Mail-Box W8-42, 2-12-1, Oookayama, Meguro-ku, Tokyo,
152-8552, Japan


(E-mail)

swatanab (AT) c . titech . ac . jp
Japanese Homepage
DBLP: Computer Science Bibliography
Paper Information


Algebraic Geometry and Learning Theory

In 1998, we found a bridge between algebraic geometry and learning theory.



WAIC is also used in applied work; for example, please search for WAIC in studies of COVID-19.

WAIC in Practical Problems





Sumio Watanabe (2021) Information criteria and cross validation for Bayesian inference in regular and singular cases.
Japanese Journal of Statistics and Data Science.
https://doi.org/10.1007/s42081-021-00121-3


In this paper, in order to establish a mathematical foundation for developing a measure of a statistical model and a prior, we show the relations among the generalization loss, the information criteria, and the cross-validation loss, and then compare them from three different points of view. First, their performances are compared in singular problems, where the posterior distribution is far from any normal distribution. Second, they are studied in the case when a leverage sample point is contained in the data. Last, their stochastic properties are clarified when they are used for the prior optimization problem. The mathematical and experimental comparison shows the equivalences and differences among them, which we expect to be useful in practical applications.
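For reference, the quantities compared in the paper can be summarized as follows (our own notation, following the standard definitions; E_w denotes the expectation over the posterior distribution):

    \mathrm{WAIC} = T_n + \frac{V_n}{n}, \qquad
    T_n = -\frac{1}{n}\sum_{i=1}^{n} \log E_w[\, p(X_i \mid w) \,], \qquad
    V_n = \sum_{i=1}^{n} \Big( E_w[(\log p(X_i \mid w))^2] - E_w[\log p(X_i \mid w)]^2 \Big),

and the theory shows that E[G_n] = E[\mathrm{WAIC}] + o(1/n) holds for the generalization loss G_n, in both regular and singular cases.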




Sumio Watanabe (2021) WAIC and WBIC for mixture models. Behaviormetrika.
https://doi.org/10.1007/s41237-021-00133-z


In this paper, we introduce the mathematical foundation and computing methods of WAIC and WBIC in a normal mixture, which is a typical singular statistical model, and discuss their properties in statistical inference. We also study the case in which samples are not independently and identically distributed, for example, conditionally independent or exchangeable. Furthermore, a simple calculation method of WBIC in mixture models is proposed.
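For reference, WBIC is defined with the tempered posterior at inverse temperature \beta = 1/\log n (the standard definition, in our notation):

    \mathrm{WBIC} = E_w^{\beta}\Big[ -\sum_{i=1}^{n} \log p(X_i \mid w) \Big], \qquad \beta = \frac{1}{\log n},

where E_w^{\beta} is the expectation over the posterior in which each likelihood factor is raised to the power \beta. WBIC approximates the Bayes free energy -\log Z_n.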

Note: It has been difficult to calculate WBIC with a Gibbs sampler. A simple calculation method of WBIC is given in the above paper. The following is a Matlab file.

Matlab file : WAIC and WBIC for a normal mixture
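For readers who do not use Matlab, here is a minimal Python sketch of the same two quantities, computed from a matrix of pointwise log-likelihood values over posterior draws. The function names and array layout are our own illustration, not the contents of the Matlab file above.

    import numpy as np
    from scipy.special import logsumexp

    def waic(log_lik):
        # log_lik[s, i] = log p(x_i | w_s) for S posterior draws w_1, ..., w_S.
        S, n = log_lik.shape
        # T_n: training loss of the Bayesian predictive distribution.
        T_n = -np.mean(logsumexp(log_lik, axis=0) - np.log(S))
        # V_n: functional variance (sum of pointwise posterior variances).
        V_n = np.sum(np.var(log_lik, axis=0))
        return T_n + V_n / n

    def wbic(log_lik_beta):
        # log_lik_beta[s, i] = log p(x_i | w_s), with w_s drawn from the
        # tempered posterior at inverse temperature beta = 1 / log(n).
        return -np.mean(np.sum(log_lik_beta, axis=1))

These values are on the per-sample loss scale used in the papers above; some software reports WAIC on the deviance scale, which is 2n times this value.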



Mathematical Theory of Bayesian Statistics

In statistics and machine learning, a data-generating distribution is estimated by a predictive distribution defined from a statistical model and a prior. In the older Bayesian framework, it was explained that the Bayesian predictive distribution should be the best, on the assumption that the statistical model is believed to be correct and the prior is given by a subjective belief in a small world. However, such a restricted formalism of Bayesian inference cannot be applied to highly complicated statistical models and learning machines in a large world.

In this book, in order to establish the mathematical foundation of Bayesian statistics in the large world, it is shown that the mathematical theorems hold universally for an arbitrary triple (a data-generating distribution, a statistical model, a prior).

This book may be useful for readers who are interested in the following points.

(1) All models are wrong. If a model is wrong, coherent inference has no meaning.

(2) If we think that there is no data-generating process (DGP), then we are automatically convinced that our own model and prior are the DGP.

(3) Thus, we had better check or test our model and prior, even in Bayesian data analysis.

(4) There are new mathematical theorems in Bayesian statistics. We had better know them.

Statistical Learning Theory




Why singular learning theory is necessary in deep learning:

A small neural network was trained so that it classifies O and X.
[Figure: the trained neural network is singular]

Eigenvalues of the Fisher information matrix of the trained network are almost equal to zero.
[Figure: eigenvalues near zero show that the NN is singular]


Singular learning theory can be applied to neural networks because it holds even if the Fisher information matrix is highly singular. The generalization loss and the log marginal likelihood are determined by the real log canonical threshold (RLCT), which is the volume dimension of the singularities. WAIC and WBIC can be used in neural networks, whereas AIC is far larger than the generalization loss.
WAIC, LOO, AIC, and Generalization Loss in Neural Network (mp4).
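In standard notation of singular learning theory (our summary; \lambda is the RLCT, d the number of parameters, L(w_0) the smallest expected loss), these statements correspond to the asymptotics

    F_n = -\log Z_n = n L_n(w_0) + \lambda \log n + O_p(\log\log n), \qquad
    E[G_n] = L(w_0) + \frac{\lambda}{n} + o\!\left(\frac{1}{n}\right).

In a regular model \lambda = d/2, but in singular models \lambda can be far smaller; AIC corrects the training loss by d/n, so it greatly overestimates the Bayes generalization loss when \lambda \ll d/2.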

LOO requires independence of the pairs (X_i, Y_i), whereas information criteria require only conditional independence of Y_i given X_i. Thus information criteria can be used more widely than LOO; for example, AIC and WAIC can be applied to AR models in time series analysis, as the sketch below illustrates.
(See: S. Watanabe, Mathematical Theory of Bayesian Statistics, CRC Press, 2018.)
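As a concrete, purely hypothetical illustration (our own example, not taken from the book): for a Bayesian AR(1) model, the conditional log-likelihoods of Y_t given Y_{t-1} can be fed directly into the waic() sketch above, even though the pairs (Y_{t-1}, Y_t) are not independent.

    import numpy as np

    # Hypothetical AR(1) model: y_t = a * y_{t-1} + e_t, with e_t ~ N(0, s^2).
    # WAIC needs only the conditional log-likelihood log p(y_t | y_{t-1}, a, s).
    def ar1_pointwise_loglik(y, a, s):
        resid = y[1:] - a * y[:-1]
        return -0.5 * np.log(2.0 * np.pi * s**2) - resid**2 / (2.0 * s**2)

    # Stack one row per posterior draw (a_s, s_s), then reuse waic() from above:
    # log_lik = np.stack([ar1_pointwise_loglik(y, a, s) for (a, s) in draws])
    # waic(log_lik)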







Lecture Notes 2021 on Statistical Learning Theory


Singular learning theory is explained in this lecture. Mathematical learning theory was established based on algebraic geometry, and it is truly useful in the real world.




Sumio Watanabe, homepage continued