
Analysis of Bayesian and frequentist strategies for sequential resource allocation

Abstract: In this thesis, we study strategies for sequential resource allocation under the so-called stochastic multi-armed bandit model. In this model, when an agent draws an arm, he receives as a reward a realization from a probability distribution associated with the arm. We consider two different bandit problems. In the reward maximization objective, the agent aims at maximizing the sum of rewards obtained during his interaction with the bandit, whereas in the best arm identification objective, his goal is to find the set of m best arms (i.e. the arms with highest mean reward), without suffering a loss when drawing 'bad' arms. For these two objectives, we propose strategies, also called bandit algorithms, that are optimal (or close to optimal) in a sense made precise below.

Maximizing the sum of rewards is equivalent to minimizing a quantity called regret. Thanks to an asymptotic lower bound on the regret of any uniformly efficient algorithm, given by Lai and Robbins, one can define asymptotically optimal algorithms as algorithms whose regret reaches this lower bound. In this thesis, we propose a finite-time analysis of two Bayesian algorithms, Bayes-UCB and Thompson Sampling, that is, a non-asymptotic upper bound on their regret, in the particular case of bandits with binary rewards. This upper bound allows us to establish the asymptotic optimality of both algorithms.

In the best arm identification framework, a possible goal is to determine the number of samples of the arms needed to identify, with high probability, the set of m best arms. We define a notion of complexity for best arm identification in two different settings considered in the literature: the fixed-budget and fixed-confidence settings. We provide new lower bounds on these complexity terms, and we analyse new algorithms, some of which reach the lower bound in particular cases of two-armed bandit models and are therefore optimal.
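To illustrate one of the algorithms the abstract analyses, here is a minimal sketch of Thompson Sampling for bandits with binary (Bernoulli) rewards: each arm keeps a Beta posterior over its mean, and at each round the agent samples one value from each posterior and pulls the arm with the largest sample. The function name, signature, and uniform Beta(1, 1) prior are illustrative choices, not taken from the thesis.

```python
import random

def thompson_sampling(true_means, horizon, seed=0):
    """Thompson Sampling for Bernoulli bandits.

    Maintains a Beta(1 + successes, 1 + failures) posterior per arm,
    samples one value from each posterior per round, and pulls the
    arm whose sample is largest.
    """
    rng = random.Random(seed)
    k = len(true_means)
    successes = [0] * k  # observed rewards equal to 1, per arm
    failures = [0] * k   # observed rewards equal to 0, per arm
    total_reward = 0
    for _ in range(horizon):
        # Posterior sampling step: one draw per arm, then play the argmax.
        samples = [rng.betavariate(1 + successes[i], 1 + failures[i])
                   for i in range(k)]
        arm = max(range(k), key=lambda i: samples[i])
        # Simulate a Bernoulli reward from the chosen arm.
        reward = 1 if rng.random() < true_means[arm] else 0
        successes[arm] += reward
        failures[arm] += 1 - reward
        total_reward += reward
    return total_reward, successes, failures
```

On a two-armed instance with a large gap (e.g. means 0.2 and 0.8), the posterior of the better arm concentrates quickly, so the suboptimal arm is drawn only rarely; this logarithmic number of bad draws is exactly the regret quantity whose lower bound (Lai and Robbins) and matching upper bound the thesis studies.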

Cited literature: 110 references
Contributor: ABES STAR
Submitted on: Friday, December 9, 2016 - 2:36:06 PM
Last modification on: Tuesday, December 8, 2020 - 10:06:05 AM
Long-term archiving on: Thursday, March 23, 2017 - 10:16:25 AM


Version validated by the jury (STAR)


  • HAL Id: tel-01413183, version 1



Emilie Kaufmann. Analysis of Bayesian and frequentist strategies for sequential resource allocation. Machine Learning [cs.LG]. Télécom ParisTech, 2014. English. ⟨NNT : 2014ENST0056⟩. ⟨tel-01413183⟩


