(Recent Paper) If the posterior can be approximated by a normal distribution, then CV and WAIC are asymptotically equivalent up to the second order:

Sumio Watanabe, Higher order equivalence of Bayes cross validation and WAIC, Springer Proceedings in Mathematics and Statistics, Information Geometry and Its Applications (4), pp.47-73, 2018.

Thus minimization of CV and minimization of WAIC are asymptotically equivalent in prior optimization. In experiments, the variance of WAIC is smaller than that of CV. Prior optimization by CV and WAIC (mp4)

Hyperparameter Optimization

In hyperparameter optimization, WAIC is preferable to ISCV (importance sampling cross validation), because the variance of WAIC over MCMC runs is smaller than that of ISCV. The above figure shows that, in a simple regression problem Y = aX + N(0, 1/s), the MCMC fluctuation is not small even when the posterior sample size of MCMC is 10000 (the data sample size is n = 30). Sample data, sample program (MATLAB). The following figure shows the same problem with an MCMC sample size of 100000.

Remark: In simple artificial simulations, WAIC and ISCV take almost the same values; in practical applications, however, the fluctuation of WAIC over MCMC runs is often smaller than that of ISCV. This phenomenon is often observed in hyperparameter optimization problems. In practical cases, I recommend that you calculate both WAIC and ISCV and compare their fluctuations in MCMC. The computational costs of WAIC and ISCV are equal.

Summary of Singular Learning Theory

Bayes Theory Essential. You can learn Bayes theory in 5 minutes.
Neural Networks and Singular Learning Theory.
Singular Learning Theory and Information Criterion.
(NEW BOOK) Mathematical theory of deep learning was already discovered.
A neural network in Bayes (mp4).
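Both criteria above can be computed from the same MCMC output. The following is a minimal sketch (not the MATLAB program distributed on this page): it assumes the pointwise log-likelihoods are stored as a (posterior draws x data points) array, and uses a toy normal-mean model whose names and settings are my own illustration. WAIC is the Bayes training loss plus the (per-datum) functional variance; ISCV averages the log of the posterior mean of the inverse likelihood.

```python
import numpy as np

def _log_mean_exp(a, axis=0):
    """Numerically stable log of the mean of exp(a) along an axis."""
    m = a.max(axis=axis)
    return m + np.log(np.exp(a - m).mean(axis=axis))

def waic(loglik):
    """WAIC from a (K, n) array of pointwise log-likelihoods
    (K posterior draws, n data points):
    training loss T_n plus mean functional variance V_n/n."""
    T = -np.mean(_log_mean_exp(loglik, axis=0))   # -1/n sum_i log E_w[p(x_i|w)]
    V = np.mean(np.var(loglik, axis=0))           # 1/n sum_i Var_w[log p(x_i|w)]
    return T + V

def iscv(loglik):
    """Importance-sampling leave-one-out CV from the same array:
    1/n sum_i log E_w[1 / p(x_i|w)]."""
    return np.mean(_log_mean_exp(-loglik, axis=0))

# Toy check (my example, not the page's experiment): normal-mean model
# x_i ~ N(mu, 1) with prior mu ~ N(0, 100), whose posterior is known exactly.
rng = np.random.default_rng(1)
n, K = 30, 5000
x = rng.normal(0.0, 1.0, n)
post_var = 1.0 / (n + 0.01)                       # conjugate posterior variance
post_mean = post_var * np.sum(x)
mu = rng.normal(post_mean, np.sqrt(post_var), K)  # exact posterior draws
loglik = -0.5 * np.log(2 * np.pi) - 0.5 * (x[None, :] - mu[:, None]) ** 2
print(waic(loglik), iscv(loglik))                 # nearly equal values
```

In this regular toy model the two values agree closely, as the remark above says; the practical difference is their MCMC fluctuation, which you can see by repeating the run with independent chains.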
Sumio Watanabe, Mathematical Theory of Bayesian Statistics, CRC Press, 2018.
Sumio Watanabe, Algebraic Geometry and Statistical Learning Theory, Cambridge University Press, 2009.
S. Watanabe, Algebraic geometrical methods for hierarchical learning machines, Neural Networks, Vol.14, No.8, pp.1049-1060, 2001. DOI: 10.1016/S0893-6080(01)00069-7
S. Watanabe, Algebraic analysis for nonidentifiable learning machines, Neural Computation, Vol.13, No.4, pp.899-933, 2001. DOI: 10.1162/089976601300014402

Applications to WAIC and WBIC
Beyond Laplace and Fisher
WAIC and WBIC

WAIC (2010) is the generalized version of AIC. WBIC (2013) is the generalized version of BIC.
WAIC and WBIC can be used even if the posterior distribution is far from any normal distribution.
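As a rough numerical illustration of WBIC: it is the expectation of the negative log likelihood under the posterior at inverse temperature beta = 1/log n. The sketch below is my own toy example, not material from this page: a one-dimensional normal-mean model where grid integration stands in for tempered MCMC, with a hypothetical helper name `wbic_grid`.

```python
import numpy as np

def wbic_grid(x, grid, prior_sd=10.0):
    """WBIC for the toy model x_i ~ N(mu, 1), prior mu ~ N(0, prior_sd^2),
    computed by grid integration in place of MCMC at beta = 1/log(n)."""
    n = len(x)
    beta = 1.0 / np.log(n)                           # WBIC inverse temperature
    # negative log likelihood n*L_n(mu) at every grid point
    nll = 0.5 * np.sum((x[None, :] - grid[:, None]) ** 2, axis=1) \
          + 0.5 * n * np.log(2 * np.pi)
    logw = -beta * nll - 0.5 * (grid / prior_sd) ** 2  # tempered log posterior
    logw -= logw.max()                                 # stabilize the exp
    w = np.exp(logw)
    w /= w.sum()
    return float(np.sum(w * nll))                      # E_w^beta[ n*L_n(w) ]

rng = np.random.default_rng(0)
x = rng.normal(0.3, 1.0, 50)
grid = np.linspace(-5.0, 5.0, 4001)
print(wbic_grid(x, grid))
```

Because this toy model is regular (one identifiable parameter), the value is close to BIC, i.e. the minimum negative log likelihood plus (1/2) log n; for singular models the two differ, and that is exactly where WBIC is needed.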
Sumio Watanabe, Algebraic Geometry and Statistical Learning Theory, Cambridge University Press, 2009.

A new statistical theory is established that holds even for non-regular models such as normal mixtures, neural networks, and hidden Markov models. The resolution theorem of algebraic geometry transforms the likelihood function into a new standard form in statistics, and the asymptotic behavior of the log likelihood ratio function is given by the limit empirical process on an algebraic variety. This theory contains regular statistical theory as a very special case. It allows generalized versions of AIC and BIC to be constructed even when the true distribution is unrealizable by, or singular for, a statistical model; in fact, WAIC and WBIC are derived in this way, and they are very easy to apply in practice. Both WAIC and WBIC are based on this completely new statistical theory: neither positive definiteness of the Fisher information matrix, asymptotic normality of the MLE, nor the Laplace approximation is necessary. Thus the theory holds for a wide range of statistical models.
Let's compare WAIC with DIC.
Let's compare CV with WAIC.
Let's compare PSISCV with WAIC.
Let's try WBIC.