spadro.eu http://spadro.eu fr Closing conference http://spadro.eu/?q=node/47 <div class="field field-name-body field-type-text-with-summary field-label-hidden"><div class="field-items"><div class="field-item even" property="content:encoded"><p>The <strong>closing conference of the SPADRO project</strong> will take place</p> <p>from 30 October to 3 November 2017</p> <p>at the Centre Lazaret, in Sète.</p> <p>Participants:</p> <ul><li>Mastane Achab (IMT)</li> <li>Cristina Butucea (ENSAE)</li> <li>Antoine Chambaz (MAP5)</li> <li>Olivier Collier (Modal'X)</li> <li>Rémy Degenne (LPMA)</li> <li>Aurélien Garivier (IMT)</li> <li>Emilie Kaufmann (SequeL)</li> <li>Jonas Khan (IMT)</li> <li>Wouter M. Koolen (CWI Amsterdam)</li> <li>Tor Lattimore (DeepMind)</li> <li>Pierre Ménard (IMT)</li> <li>Vianney Perchet (ENS Cachan)</li> <li>Léonard Torrossian (IMT)</li> <li>Claire Vernade (IMT)</li> </ul><p>As of today, the programme is as follows:</p> <ul><li><strong>Monday</strong> <ul><li>arrival, welcome, meal</li> </ul></li><li><strong>Tuesday</strong> <ul><li>9:00-10:00, three 10-minute talks: Pierre M., Jonas, Claire</li> <li>10:30-12:30, working sessions on “<em>Delayed Bandits</em>” and “<em>Inference in network/graph models</em>”</li> <li>14:00-15:00, three 10-minute talks: Wouter, Rémy, Aurélien</li> <li>15:30-18:00, working sessions on “<em>Inference in network/graph models</em>” and “<em>Rank-1 bandits</em>”</li> </ul></li><li><strong>Wednesday</strong> <ul><li>9:00-10:00, 10-minute talks: Vianney, Mastane, Tor, Olivier</li> <li>10:30-12:30, working sessions on “<em>Estimation of parameters and functionals in game theory</em>”, “<em>Practical asymptotically optimal linear bandit algorithms: can we get a practical algorithm for finite-armed linear bandits with optimal regret asymptotically?</em>” and “<em>Unrealizable linear bandits: what happens when the linear assumption is wrong, how much do we have to pay and how do we make the algorithms robust?</em>”</li> <li>14:00-15:00, three 10-minute talks: Emilie, Antoine, Cristina</li> <li>15:30-18:00, working sessions in small groups</li> </ul></li><li><strong>Thursday</strong> <ul><li>9:00-10:00, working sessions on “<em>Sparse problems in game theory</em>”,<br /> “<em>Non-asymptotic versions of Track-and-Stop</em>” and “<em>Dose-finding in clinical trials</em>”</li> <li>Afternoon: free activities</li> </ul></li><li><strong>Friday</strong> <ul><li>9:00-10:00, working sessions in small groups</li> <li>10:30-12:30, working sessions in small groups</li> </ul></li></ul></div></div></div> Fri, 13 Oct 2017 15:13:39 +0000 zenno Faster rates for policy learning http://spadro.eu/?q=node/45 <div class="field field-name-body field-type-text-with-summary field-label-hidden"><div class="field-items"><div class="field-item even" property="content:encoded"><p>This article improves the existing proven rates of regret decay in optimal policy estimation. We give a margin-free result showing that the regret decay for estimating a within-class optimal policy is second-order for empirical risk minimizers over Donsker classes, with regret decaying at a faster rate than the standard error of an efficient estimator of the value of an optimal policy. We also give a result from the classification literature showing that faster regret decay is possible via plug-in estimation, provided a margin condition holds. Four examples are considered.
In these examples, the regret is expressed in terms of either the mean value or the median value; the number of possible actions is either two or finitely many; and the sampling scheme is either independent and identically distributed or sequential, where the latter represents a contextual bandit sampling scheme.</p> </div></div></div><div class="field field-name-field-ref field-type-text field-label-above"><div class="field-label">Bibliographic reference:&nbsp;</div><div class="field-items"><div class="field-item even">https://hal.archives-ouvertes.fr/hal-01511409</div></div></div><div class="field field-name-field-auteurs field-type-text field-label-above"><div class="field-label">Authors:&nbsp;</div><div class="field-items"><div class="field-item even">Alexander R. Luedtke and Antoine Chambaz</div></div></div> Fri, 21 Apr 2017 05:45:17 +0000 zenno On the estimation of the mean of a random vector http://spadro.eu/?q=node/44 <div class="field field-name-body field-type-text-with-summary field-label-hidden"><div class="field-items"><div class="field-item even" property="content:encoded"><p>We study the problem of estimating the mean of a multivariate distribution from independent samples. The main result proves the existence of an estimator with non-asymptotic sub-Gaussian performance for all distributions satisfying some mild moment assumptions.</p> </div></div></div><div class="field field-name-field-ref field-type-text field-label-above"><div class="field-label">Bibliographic reference:&nbsp;</div><div class="field-items"><div class="field-item even">Emilien Joly, Gábor Lugosi and Roberto Imbuzeiro Oliveira, On the estimation of the mean of a random vector, Electron. J.
Statist., 11(1):440-451, 2017</div></div></div><div class="field field-name-field-auteurs field-type-text field-label-above"><div class="field-label">Authors:&nbsp;</div><div class="field-items"><div class="field-item even">Emilien Joly, Gábor Lugosi, Roberto Imbuzeiro Oliveira</div></div></div> Mon, 13 Mar 2017 14:40:12 +0000 zenno A Minkowski Theorem for Quasicrystals http://spadro.eu/?q=node/43 <div class="field field-name-body field-type-text-with-summary field-label-hidden"><div class="field-items"><div class="field-item even" property="content:encoded"><p>The aim of this paper is to generalize Minkowski’s theorem. This theorem is usually stated for a centrally symmetric convex body and a lattice, both contained in R<sup>n</sup>. In some situations, the lattice can be replaced by a more general set for which a notion of density exists. In this paper, we prove a Minkowski theorem for quasicrystals, which bounds from below the frequency of differences that appear in the quasicrystal and belong to a centrally symmetric convex body.
The last part of the paper is devoted to fairly natural applications of this theorem to Diophantine approximation and to the discretization of linear maps.</p> </div></div></div><div class="field field-name-field-ref field-type-text field-label-above"><div class="field-label">Bibliographic reference:&nbsp;</div><div class="field-items"><div class="field-item even">Pierre-Antoine Guihéneuf and Emilien Joly, A Minkowski Theorem for Quasicrystals, Discrete Comput. Geom. (2017)</div></div></div><div class="field field-name-field-auteurs field-type-text field-label-above"><div class="field-label">Authors:&nbsp;</div><div class="field-items"><div class="field-item even">Pierre-Antoine Guihéneuf, Emilien Joly</div></div></div> Mon, 13 Mar 2017 14:38:06 +0000 zenno Targeted sequential design for targeted learning inference of the optimal treatment rule and its mean reward http://spadro.eu/?q=node/42 <div class="field field-name-body field-type-text-with-summary field-label-hidden"><div class="field-items"><div class="field-item even" property="content:encoded"><p>This article studies the targeted sequential inference of an optimal treatment rule (TR) and its mean reward in the non-exceptional case, i.e., assuming that there is no stratum of the baseline covariates where treatment is neither beneficial nor harmful, and under a companion margin assumption. Our pivotal estimator, whose definition hinges on the targeted minimum loss estimation (TMLE) principle, actually infers the mean reward under the current estimate of the optimal TR. This data-adaptive statistical parameter is of interest in its own right. Our main result is a central limit theorem that enables the construction of confidence intervals for both the mean reward under the current estimate of the optimal TR and the mean reward under the optimal TR itself.
The asymptotic variance of the estimator takes the form of the variance of an efficient influence curve at a limiting distribution, which makes it possible to discuss the efficiency of inference. As a by-product, we also derive confidence intervals on two cumulated pseudo-regrets, a key notion in the study of bandit problems. A simulation study illustrates the procedure. One of the cornerstones of the theoretical study is a new maximal inequality for martingales with respect to the uniform entropy integral.</p> </div></div></div><div class="field field-name-field-ref field-type-text field-label-above"><div class="field-label">Bibliographic reference:&nbsp;</div><div class="field-items"><div class="field-item even">To appear in Ann. Statist.</div></div></div><div class="field field-name-field-auteurs field-type-text field-label-above"><div class="field-label">Authors:&nbsp;</div><div class="field-items"><div class="field-item even">Antoine Chambaz, Wenjing Zheng and M. J. van der Laan</div></div></div> Tue, 13 Dec 2016 01:13:49 +0000 zenno SPADRO extended by six months http://spadro.eu/?q=node/41 <div class="field field-name-body field-type-text-with-summary field-label-hidden"><div class="field-items"><div class="field-item even" property="content:encoded"><p>Dear colleagues,</p> <p>We have excellent news to share with you as 2016 draws to a close. The Agence Nationale de la Recherche (French National Research Agency) has approved a six-month extension of SPADRO. The end of the project is therefore postponed to 31 December 2017, a little over a year from now.
This is a wonderful opportunity to push our explorations further!</p> <p>Wishing you happy holidays, admittedly a little early. Warm regards,</p> <p> Antoine and Aurélien</p> </div></div></div> Mon, 12 Dec 2016 23:41:33 +0000 zenno Refined Lower Bounds for Adversarial Bandits http://spadro.eu/?q=node/40 <div class="field field-name-body field-type-text-with-summary field-label-hidden"><div class="field-items"><div class="field-item even" property="content:encoded"><p>We provide new lower bounds on the regret that must be suffered by adversarial bandit algorithms. The new results show that recent upper bounds that either (a) hold with high probability, (b) depend on the total loss of the best arm, or (c) depend on the quadratic variation of the losses are close to tight. Besides this, we prove two impossibility results. First, the existence of a single arm that is optimal in every round cannot improve the regret in the worst case. Second, the regret cannot scale with the effective range of the losses. In contrast, both results are possible in the full-information setting.</p> </div></div></div><div class="field field-name-field-ref field-type-text field-label-above"><div class="field-label">Bibliographic reference:&nbsp;</div><div class="field-items"><div class="field-item even"> Neural Information Processing Systems No. 30, Dec.
2016, ArXiv:1605.07416</div></div></div><div class="field field-name-field-auteurs field-type-text field-label-above"><div class="field-label">Authors:&nbsp;</div><div class="field-items"><div class="field-item even">Sébastien Gerchinovitz, Tor Lattimore</div></div></div> Sat, 27 Aug 2016 08:05:58 +0000 zenno Conditional quantile sequential estimation for stochastic codes http://spadro.eu/?q=node/39 <div class="field field-name-body field-type-text-with-summary field-label-hidden"><div class="field-items"><div class="field-item even" property="content:encoded"><p>This paper is devoted to the estimation of a conditional quantile, more precisely the quantile of the output of a real stochastic code whose inputs lie in R<sup>d</sup>. To this end, we introduce a stochastic algorithm based on the Robbins-Monro algorithm and on k-nearest-neighbor theory. We propose conditions on the code under which the algorithm converges, and we study the non-asymptotic rate of convergence of the mean square error.
Finally, we give the parameters of the algorithm that yield the best rate of convergence.</p> </div></div></div><div class="field field-name-field-ref field-type-text field-label-above"><div class="field-label">Bibliographic reference:&nbsp;</div><div class="field-items"><div class="field-item even"> ArXiv:1508.06505 hal-01187329</div></div></div><div class="field field-name-field-auteurs field-type-text field-label-above"><div class="field-label">Authors:&nbsp;</div><div class="field-items"><div class="field-item even"> Tatiana Labopin-Richard, Aurélien Garivier, Fabrice Gamboa</div></div></div> Mon, 22 Aug 2016 09:41:53 +0000 zenno On Explore-Then-Commit Strategies http://spadro.eu/?q=node/38 <div class="field field-name-body field-type-text-with-summary field-label-hidden"><div class="field-items"><div class="field-item even" property="content:encoded"><p>We study the problem of minimising regret in two-armed bandit problems with Gaussian rewards. Our objective is to use this simple setting to illustrate that strategies based on an exploration phase (up to a stopping time) followed by exploitation are necessarily suboptimal. The results hold regardless of whether or not the difference in means between the two arms is known. Besides the main message, we also refine existing deviation inequalities, which allow us to design fully sequential strategies with finite-time regret guarantees that are (a) asymptotically optimal as the horizon grows and (b) order-optimal in the minimax sense. Furthermore, we provide empirical evidence that the theory also holds in practice and discuss extensions to the non-Gaussian and multiple-armed cases.</p> </div></div></div><div class="field field-name-field-ref field-type-text field-label-above"><div class="field-label">Bibliographic reference:&nbsp;</div><div class="field-items"><div class="field-item even"> Neural Information Processing Systems No. 30, Dec.
2016, ArXiv:1605.08988 hal-01322906</div></div></div><div class="field field-name-field-auteurs field-type-text field-label-above"><div class="field-label">Authors:&nbsp;</div><div class="field-items"><div class="field-item even"> Aurélien Garivier, Emilie Kaufmann, Tor Lattimore</div></div></div> Mon, 22 Aug 2016 09:36:21 +0000 zenno Explore First, Exploit Next: The True Shape of Regret in Bandit Problems http://spadro.eu/?q=node/37 <div class="field field-name-body field-type-text-with-summary field-label-hidden"><div class="field-items"><div class="field-item even" property="content:encoded"><p>We revisit lower bounds on the regret in multi-armed bandit problems. We obtain non-asymptotic, distribution-dependent bounds and provide straightforward proofs based only on well-known properties of Kullback-Leibler divergences. In particular, these bounds show that the regret grows almost linearly in an initial phase, and that the well-known logarithmic growth of the regret only holds in a final phase. The proof techniques get to the essence of the information-theoretic arguments involved and are free of all unnecessary complications.</p> </div></div></div><div class="field field-name-field-ref field-type-text field-label-above"><div class="field-label">Bibliographic reference:&nbsp;</div><div class="field-items"><div class="field-item even"> ArXiv:1602.07182 hal-01276324</div></div></div><div class="field field-name-field-auteurs field-type-text field-label-above"><div class="field-label">Authors:&nbsp;</div><div class="field-items"><div class="field-item even"> Aurélien Garivier, Gilles Stoltz, Pierre Ménard</div></div></div> Mon, 22 Aug 2016 09:35:16 +0000 zenno
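The “Explore First, Exploit Next” abstract contrasts an early, almost linear regret phase with the “well-known logarithmic growth of the regret”. As a rough illustration only, the Python sketch below computes the classical asymptotic lower bound of Lai and Robbins for Bernoulli arms, which is the standard form of that logarithmic growth. It is not the non-asymptotic bound derived in the paper. The Bernoulli model, the arm means and the function names (`bernoulli_kl`, `lai_robbins_constant`, `asymptotic_lower_bound`) are assumptions chosen for the example.

```python
import math


def bernoulli_kl(p: float, q: float) -> float:
    """Kullback-Leibler divergence kl(p, q) between Bernoulli(p) and Bernoulli(q).

    Both arguments are clipped away from 0 and 1 so the logarithms stay finite.
    """
    eps = 1e-12
    p = min(max(p, eps), 1.0 - eps)
    q = min(max(q, eps), 1.0 - eps)
    return p * math.log(p / q) + (1.0 - p) * math.log((1.0 - p) / (1.0 - q))


def lai_robbins_constant(means: list[float]) -> float:
    """Sum over suboptimal arms of gap / kl(mu_a, mu_star).

    For Bernoulli arms, any uniformly efficient policy satisfies
    liminf_{T -> inf} regret(T) / log(T) >= this constant (Lai and Robbins, 1985).
    """
    best = max(means)
    return sum((best - mu) / bernoulli_kl(mu, best) for mu in means if mu < best)


def asymptotic_lower_bound(means: list[float], horizon: int) -> float:
    """Asymptotic bound C * log(T); it only becomes meaningful as T grows."""
    return lai_robbins_constant(means) * math.log(horizon)


if __name__ == "__main__":
    means = [0.5, 0.45]  # hypothetical two-armed Bernoulli instance
    gap = max(means) - min(means)  # gap between the two arms
    for horizon in (10, 100, 1_000, 100_000):
        bound = asymptotic_lower_bound(means, horizon)
        worst_case = horizon * gap  # largest pseudo-regret any policy can incur
        print(f"T={horizon:>7}: C*log T = {bound:8.2f}, T*gap = {worst_case:8.2f}")
```

With arm means 0.5 and 0.45 the constant is close to 10, so at horizons T = 10, 100 and 1,000 the value C log T exceeds T × gap, the largest pseudo-regret any policy can incur over T rounds. The asymptotic formula is therefore uninformative at small horizons, which is the regime that non-asymptotic bounds such as those announced in the abstract are meant to describe.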