Analysis of Bayesian and frequentist strategies for sequential resource allocation

Abstract: In this thesis, we study strategies for sequential resource allocation under the so-called stochastic multi-armed bandit model. In this model, when an agent draws an arm, he receives as a reward a realization from a probability distribution associated with that arm. In this document, we consider two different bandit problems. In the reward maximization objective, the agent aims at maximizing the sum of rewards obtained during his interaction with the bandit, whereas in the best arm identification objective, his goal is to find the set of m best arms (i.e. the arms with the highest mean reward), without suffering a loss when drawing 'bad' arms. For these two objectives, we propose strategies, also called bandit algorithms, that are optimal (or close to optimal), in a sense made precise below.

Maximizing the sum of rewards is equivalent to minimizing a quantity called regret. Thanks to an asymptotic lower bound on the regret of any uniformly efficient algorithm given by Lai and Robbins, one can define asymptotically optimal algorithms as algorithms whose regret matches this lower bound. In this thesis, we propose, for two Bayesian algorithms, Bayes-UCB and Thompson Sampling, a finite-time analysis, that is, a non-asymptotic upper bound on their regret, in the particular case of bandits with binary rewards. This upper bound allows us to establish the asymptotic optimality of both algorithms.

In the best arm identification framework, a possible goal is to determine the number of samples of the arms needed to identify, with high probability, the set of m best arms. We define a notion of complexity for best arm identification in two different settings considered in the literature: the fixed-budget and fixed-confidence settings. We provide new lower bounds on these complexity terms and we analyse new algorithms, some of which reach the lower bound in particular cases of two-armed bandit models and are therefore optimal.
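For concreteness, the regret mentioned in the abstract and the Lai and Robbins lower bound can be stated as follows for Bernoulli (binary-reward) arms; this is the standard formulation of these quantities, not text taken from the thesis itself. Here mu_a is the mean of arm a, mu^* the largest mean, and kl the binary relative entropy:

    R_T = T\mu^* - \mathbb{E}\Big[\sum_{t=1}^{T} X_t\Big],
    \qquad
    \liminf_{T\to\infty} \frac{R_T}{\log T} \;\ge\; \sum_{a:\,\mu_a<\mu^*} \frac{\mu^*-\mu_a}{\mathrm{kl}(\mu_a,\mu^*)}.

An asymptotically optimal algorithm, such as Bayes-UCB or Thompson Sampling in the binary-reward case studied in the thesis, has a regret whose leading term matches the right-hand side. As a rough illustration of the second of these algorithms, below is a minimal sketch of Thompson Sampling with a uniform Beta(1,1) prior on Bernoulli arms; the function and variable names are illustrative and not taken from the thesis.

    import random

    def thompson_sampling_bernoulli(arm_means, horizon):
        """Play `horizon` rounds of Thompson Sampling on Bernoulli arms.
        `arm_means` lists the true success probabilities, unknown to the
        algorithm and used here only to simulate rewards."""
        K = len(arm_means)
        successes = [0] * K  # observed rewards equal to 1, per arm
        failures = [0] * K   # observed rewards equal to 0, per arm
        total_reward = 0
        for _ in range(horizon):
            # Sample a mean for each arm from its Beta(1 + s, 1 + f) posterior
            samples = [random.betavariate(1 + successes[a], 1 + failures[a])
                       for a in range(K)]
            # Draw the arm whose posterior sample is largest
            a = max(range(K), key=lambda i: samples[i])
            reward = 1 if random.random() < arm_means[a] else 0
            successes[a] += reward
            failures[a] += 1 - reward
            total_reward += reward
        return total_reward

    # Example: three Bernoulli arms with means 0.3, 0.5 and 0.6
    # print(thompson_sampling_bernoulli([0.3, 0.5, 0.6], horizon=10000))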

Cited literature: 110 references

https://pastel.archives-ouvertes.fr/tel-01413183
Contributor: Abes Star
Submitted on: Friday, December 9, 2016 - 2:36:06 PM
Last modification on: Thursday, October 17, 2019 - 12:36:09 PM
Long-term archiving on: Thursday, March 23, 2017 - 10:16:25 AM

File

TheseKaufmann2.pdf
Version validated by the jury (STAR)

Identifiers

  • HAL Id: tel-01413183, version 1

Citation

Emilie Kaufmann. Analysis of bayesian and frequentist strategies for sequential resource allocation. Machine Learning [cs.LG]. Télécom ParisTech, 2014. English. ⟨NNT : 2014ENST0056⟩. ⟨tel-01413183⟩

Metrics

Record views: 567
File downloads: 489