Abstract:
The Bayesian information criterion (BIC) and Krichevsky- Trofimov (KT) version of minimum description length (MDL) principle are popular in the study of model selection. For order estimation of Markov chains, both are known to be strongly consistent when there is an upper-bound on the order. In the unbounded case, the BIC is also known to be consistent, but the KT estimator is consistent only with a bound o(log n) on the order. For context trees, a flexible generalization of Markov models widely used in data processing, the problem is more complicated both in theory and practice, given the substantially higher number of possible candidate models. Imre Csiszar and Zsolt Talata proved the consistency of BIC and KT when the hypothetical tree depths are allowed to grow as o(log n). This correspondence proves that such a restriction is not necessary for finite context sources: the BIC context tree estimator is strongly consistent even if there is no constraint at all on the size of the chosen tree. Moreover, an algorithm computing the tree minimizing the BIC criterion among all context trees in linear time is provided.