The current interpretation of the MDL principle is to represent a class
of models, either given by probability distributions or induced by
prediction error criteria, by a single universal model, which enables us
to encode the observed data with the shortest possible code length.
This minimum code length is called the stochastic complexity.
The universal model allows us to decompose the data into the learnable,
information-bearing part defined by the optimal model and the rest,
which is just noise carrying no useful information that can be described
by the models in the suggested model class. We call the result a
universal sufficient statistics decomposition in analogy with the
Kolmogorov sufficient statistics decomposition in the algorithmic theory
of information.
There are several ways to construct universal models, of which we discuss only one in detail: the so-called Normalized Maximum Likelihood (NML) model. For the linear least squares regression problem, the NML model and its universal sufficient statistics decomposition can be calculated exactly with a three-fold normalization process. In the important special case of denoising, this identifies the noise to be removed as the part of the data that cannot be compressed with the suggested models, while the information-bearing, and hence learnable, signal is the compressible part. As a numerical example of denoising, we process speech data with wavelets.
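
For orientation, the NML model can be written in its standard form. The notation here is introduced only for illustration and is not taken from the text above: $x^n$ denotes the observed data, $\hat{\theta}(x^n)$ the maximum likelihood estimate, and $C_n$ the normalizing constant. For a model class $\{f(x^n;\theta)\}$,

\[
\hat{f}(x^n) = \frac{f\bigl(x^n;\hat{\theta}(x^n)\bigr)}{C_n},
\qquad
C_n = \int f\bigl(y^n;\hat{\theta}(y^n)\bigr)\,dy^n ,
\]

and the stochastic complexity of the data relative to the class is the corresponding code length,

\[
-\log \hat{f}(x^n) = -\log f\bigl(x^n;\hat{\theta}(x^n)\bigr) + \log C_n .
\]

For discrete data the integral becomes a sum. In the Gaussian regression setting the integral generally diverges unless the range of integration is restricted, which is one reason a repeated normalization of the kind mentioned above becomes necessary.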