Monday 19 December 2011, 10:40 AM
INRIA: Council room (Salle du conseil)
Supervisor: Philippe Preux, Professor, Université Lille 3
Reviewers: Peter Auer, Professor, University of Leoben (Austria)
Leon Bottou, Microsoft AdCenter
Laszlo Gyorfi, Professor, University of Budapest (Hungary)
Members: Max Dauchet, Professor Emeritus, Université Lille 1
Jan Beirlant, Professor, Université Catholique de Louvain (Belgium)
Remi Munos, Research Director, INRIA
The problem considered is as follows. Given a growing sequence of observations x_1,...,x_n,..., one is required, at each time step n, to make some inference about the stochastic mechanism generating the sequence. Several problems with numerous applications in different branches of mathematics and computer science can be formulated in this way. For example, one may want to forecast the probabilities of the next outcome x_{n+1} (sequence prediction); to decide whether the mechanism generating the sequence belongs to a certain family H_0 or to a different family H_1 (hypothesis testing); or to take an action in order to maximize some utility function.
In each of these problems, as well as in many others, inference is only possible under some assumptions on the probabilistic mechanism generating the data. Typical assumptions are that the x_i are independent and identically distributed, or that the distribution generating the sequence belongs to a certain parametric family. The central question addressed in this work is: under which assumptions is inference possible? This question is considered for several problems of inference, including sequence prediction, hypothesis testing, classification and reinforcement learning.
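As an illustration of sequence prediction under one such parametric assumption, the following minimal sketch forecasts the probability of the next outcome with a Bayesian mixture over a finite family of Bernoulli i.i.d. models. The function name `mixture_predictor`, the binary alphabet, the finite family `thetas` and the uniform prior are all assumptions made for this example; they are far more restrictive than the settings studied in the work itself.

```python
import math

def mixture_predictor(xs, thetas, prior=None):
    """Return P(x_{n+1} = 1 | x_1..x_n) under a Bayesian mixture of
    Bernoulli(theta) models. Each theta must lie strictly between 0 and 1."""
    if prior is None:
        # Assumption for this sketch: uniform prior over the candidate models.
        prior = [1.0 / len(thetas)] * len(thetas)
    ones = sum(xs)
    zeros = len(xs) - ones
    # Log posterior weight of each model: log prior + log likelihood of the data.
    log_w = [math.log(p) + ones * math.log(t) + zeros * math.log(1.0 - t)
             for p, t in zip(prior, thetas)]
    # Subtract the maximum before exponentiating to avoid underflow.
    m = max(log_w)
    w = [math.exp(lw - m) for lw in log_w]
    total = sum(w)
    # The forecast is the posterior-weighted average of the models' forecasts.
    return sum(wi / total * t for wi, t in zip(w, thetas))
```

With no observations the forecast is just the prior average of the candidate parameters; as observations accumulate, the posterior weight concentrates on the models that explain the data best.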
The most important results presented here are as follows. For the problem of hypothesis testing, a topological characterization (necessary and sufficient conditions) is obtained of those (composite) hypotheses H_0, contained in the set E of all stationary ergodic discrete-valued process measures, that can be consistently tested against the complement E \ H_0. The developed approach, which is based on empirical estimates of the distributional distance, is also used to obtain consistent procedures for change point estimation, process classification and clustering, under the only assumption that the data (real-valued, in this case) are generated by stationary ergodic distributions: a setting that is much more general than those in which consistent procedures were known before. It is also shown that a consistent test for homogeneity does not exist in the general case of stationary ergodic (discrete-valued) sequences. For the problem of sequence prediction, it is shown that if there is a consistent predictor for a set of process distributions C, then there is a Bayesian predictor consistent for this set. This is a no-assumption result: the distributions in C can be arbitrary (non-i.i.d., non-stationary, etc.) and the set itself does not even have to be measurable. Several descriptions (sufficient conditions) of those sets C of process distributions for which consistent predictors exist are also given. For the problem of selecting an optimal strategy in a reactive environment (perhaps the most general inference problem considered), some sufficient conditions on the environments under which it is possible to find a universal asymptotically optimal strategy are identified.
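The abstract does not spell out how the distributional distance is estimated. The sketch below shows one plausible form of such an empirical estimate for discrete-valued sequences: a weighted sum, over word lengths k, of the absolute differences between the empirical frequencies of length-k words in the two samples. The names `word_frequencies` and `empirical_distance`, the weights 2^(-k) and the truncation at `max_len` are assumptions made for illustration, not the definitions used in the work.

```python
from collections import Counter

def word_frequencies(seq, k):
    """Empirical frequency of every length-k word occurring in seq."""
    n = len(seq) - k + 1
    if n <= 0:
        return {}
    counts = Counter(tuple(seq[i:i + k]) for i in range(n))
    return {word: c / n for word, c in counts.items()}

def empirical_distance(x, y, max_len=3):
    """Weighted sum of frequency differences over words of length 1..max_len.
    The weights 2**(-k) and the cutoff max_len are illustrative assumptions."""
    total = 0.0
    for k in range(1, max_len + 1):
        fx = word_frequencies(x, k)
        fy = word_frequencies(y, k)
        # Compare frequencies on every word seen in either sample.
        diff = sum(abs(fx.get(w, 0.0) - fy.get(w, 0.0))
                   for w in set(fx) | set(fy))
        total += 2.0 ** (-k) * diff
    return total
```

A quantity of this kind is zero when both samples have identical word statistics up to the chosen length and grows as they diverge, which is what makes it usable as a building block for tasks such as classification, clustering or change point estimation.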