Improved Algorithms for Linear Stochastic Bandits, Advances in Neural Information Processing Systems, 2011.
Sample mean based index policies with O(log n) regret for the multi-armed bandit problem, Advances in Applied Probability, vol.27, issue.4, pp.1054-1078, 1995.
Asymptotically efficient adaptive allocation schemes for controlled i.i.d. processes: finite parameter space, IEEE Transactions on Automatic Control, vol.34, issue.3, pp.258-267, 1989.
DOI: 10.1109/9.16415
Analysis of Thompson Sampling for the multi-armed bandit problem, Proceedings of the 25th Conference on Learning Theory, 2012.
Further Optimal Regret Bounds for Thompson Sampling, Proceedings of the 16th Conference on Artificial Intelligence and Statistics, 2013.
Thompson Sampling for Contextual Bandits with Linear Payoffs, International Conference on Machine Learning (ICML), 2013.
A Bayesian sampling approach to exploration in reinforcement learning, Uncertainty in Artificial Intelligence (UAI), 2009.
Regret Bounds and Minimax Policies under Partial Monitoring, Journal of Machine Learning Research, 2010.
URL: https://hal.archives-ouvertes.fr/hal-00654356
Best Arm Identification in Multi-armed Bandits, Proceedings of the 23rd Conference on Learning Theory, 2010.
URL: https://hal.archives-ouvertes.fr/hal-00654404
Exploration-exploitation trade-off using variance estimates in multi-armed bandits, Theoretical Computer Science, vol.410, 2009.
DOI: 10.1016/j.tcs.2009.01.016
Finite-time analysis of the multiarmed bandit problem, Machine Learning, vol.47, issue.2/3, pp.235-256, 2002.
DOI: 10.1023/A:1013689704352
The Nonstochastic Multiarmed Bandit Problem, SIAM Journal on Computing, vol.32, issue.1, pp.48-77, 2002.
DOI: 10.1137/S0097539701398375
URL: http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.130.158
Sequential identification and ranking procedures, 1968.
The theory of dynamic programming, Bulletin of the American Mathematical Society, vol.60, issue.6, pp.503-515, 1954.
DOI: 10.1090/S0002-9904-1954-09848-8
A problem in the sequential design of experiments, The Indian Journal of Statistics, pp.221-229, 1956.
Bandit Problems: Sequential Allocation of Experiments, 1985.
Mathematical Statistics: Basic Ideas and Selected Topics, 2001.
Pattern Recognition and Machine Learning, 2006.
Concentration Inequalities: A Nonasymptotic Theory of Independence, 2013.
URL: https://hal.archives-ouvertes.fr/hal-00751496
On Sequential Designs for Maximizing the Sum of $n$ Observations, The Annals of Mathematical Statistics, vol.27, issue.4, pp.1060-1074, 1956.
DOI: 10.1214/aoms/1177728073
Jeux de bandits et fondation du clustering [Bandit games and foundations of clustering], 2010.
Regret Analysis of Stochastic and Nonstochastic Multi-armed Bandit Problems, Foundations and Trends in Machine Learning, vol.5, issue.1, pp.1-122, 2012.
DOI: 10.1561/2200000024
Towards Minimax Policies for Online Linear Optimization with Bandit Feedback, Proceedings of the 25th Conference on Learning Theory, 2012.
Prior-free and prior-dependent regret bounds for Thompson Sampling, 48th Annual Conference on Information Sciences and Systems (CISS), 2014.
DOI: 10.1109/CISS.2014.6814158
Pure exploration in finitely-armed and continuous-armed bandits, Theoretical Computer Science, vol.412, issue.19, pp.1832-1852, 2011.
DOI: 10.1016/j.tcs.2010.12.059
URL: https://hal.archives-ouvertes.fr/hal-00609550
Bounded regret in stochastic multi-armed bandits, Proceedings of the 26th Conference on Learning Theory, 2013.
Multiple Identifications in multi-armed bandits, International Conference on Machine Learning (ICML), 2013.
Optimal Adaptive Policies for Sequential Allocation Problems, Advances in Applied Mathematics, vol.17, issue.2, pp.122-142, 1996.
DOI: 10.1006/aama.1996.0007
URL: http://doi.org/10.1006/aama.1996.0007
Asymptotic Bayes analysis for the finite-horizon one-armed-bandit problem, Probability in the Engineering and Informational Sciences, pp.53-82, 2003.
DOI: 10.1017/S0269964803171045
Prediction, Learning, and Games, 2006.
DOI: 10.1017/CBO9780511546921
Combinatorial bandits, Journal of Computer and System Sciences, vol.78, issue.5, pp.1404-1422, 2012.
DOI: 10.1016/j.jcss.2012.01.001
Finding a most biased coin with fewest flips, Proceedings of the 27th Conference on Learning Theory, 2014.
Optimal stopping and dynamic allocation, Advances in Applied Probability, vol.19, pp.829-853, 1987.
DOI: 10.2307/1427104
An empirical evaluation of Thompson Sampling, Advances in Neural Information Processing Systems, 2011.
Simple and Scalable Response Prediction for Display Advertising, Transactions on Intelligent Systems and Technology, 2014.
DOI: 10.1145/2532128
URL: http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.389.7316
Parallel Gaussian Process Optimization with Upper Confidence Bound and Pure Exploration, Proceedings of the European Conference on Machine Learning, 2013.
DOI: 10.1007/978-3-642-40988-2_15
URL: http://arxiv.org/abs/1304.5350
The Price of Bandit Information in Online Optimization, Advances in Neural Information Processing Systems, 2007.
Stochastic Linear Optimization under Bandit Feedback, Proceedings of the 21st Conference on Learning Theory, pp.355-366, 2008.
Self-Normalized Processes: Exponential inequalities, moment bounds and iterated logarithm laws, The Annals of Probability, vol.32, pp.1902-1933, 2004.
Self-Normalized Processes: Limit Theory and Statistical Applications, 2009.
Probability: Theory and Examples, 2010.
DOI: 10.1017/CBO9780511779398
Action Elimination and Stopping Conditions for the Multi-Armed Bandit and Reinforcement Learning Problems, Journal of Machine Learning Research, vol.7, pp.1079-1105, 2006.
Contributions to the "two-armed bandit" problem, The Annals of Mathematical Statistics, vol.33, issue.3, pp.947-956, 1962.
DOI: 10.1214/aoms/1177704454
Optimism in reinforcement learning and Kullback-Leibler divergence, 48th Annual Allerton Conference on Communication, Control, and Computing (Allerton), 2010.
DOI: 10.1109/ALLERTON.2010.5706896
URL: https://hal.archives-ouvertes.fr/hal-00476116
Parametric Bandits: The Generalized Linear Case, Advances in Neural Information Processing Systems, 2010.
An optimistic posterior sampling strategy for Bayesian reinforcement learning, Workshop on Bayesian Optimization, NIPS, 2013.
Four proofs of Gittins' multiarmed bandit theorem, Annals of Operations Research, vol.25, issue.12, 1999.
DOI: 10.1007/s10479-013-1523-0
URL: http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.295.444
Best Arm Identification: A Unified Approach to Fixed Budget and Fixed Confidence, Advances in Neural Information Processing Systems, 2012.
URL: https://hal.archives-ouvertes.fr/hal-00747005
Informational confidence bounds for self-normalized averages and applications, IEEE Information Theory Workshop (ITW), 2013.
DOI: 10.1109/ITW.2013.6691311
URL: https://hal.archives-ouvertes.fr/hal-00862062
The KL-UCB algorithm for bounded stochastic bandits and beyond, Proceedings of the 24th Conference on Learning Theory, 2011.
On Upper-Confidence Bound Policies for Switching Bandit Problems, Proceedings of the 22nd Conference on Algorithmic Learning Theory, 2011.
DOI: 10.1007/978-3-642-24412-4_16
Small-sample frequentist properties of Bernoulli two-armed bandit Bayesian strategies, 1994.
Small-sample performance of Bernoulli two-armed bandit Bayesian strategies, Journal of Statistical Planning and Inference, vol.79, issue.1, pp.107-122, 1999.
DOI: 10.1016/S0378-3758(98)00230-4
Bandit processes and dynamic allocation indices, Journal of the Royal Statistical Society, Series B, vol.41, issue.2, pp.148-177, 1979.
DOI: 10.1002/9780470980033
A dynamic allocation index for the sequential design of experiments, Progress in Statistics (proceedings of the 1972 European Meeting of Statisticians), 1974.
Thompson Sampling for Complex Online Problems, International Conference on Machine Learning (ICML), 2014.
Solving two-armed Bernoulli bandit problems using a Bayesian learning automaton, International Journal of Intelligent Computing and Cybernetics, vol.3, issue.2, pp.207-234, 2010.
DOI: 10.1108/17563781011049179
Asymptotically Efficient Adaptive Choice of Control Laws in Controlled Markov Chains, SIAM Journal on Control and Optimization, vol.35, issue.3, pp.715-743, 1997.
DOI: 10.1137/S0363012994275440
Stochastic Regret Minimization via Thompson Sampling, Proceedings of the 27th Conference on Learning Theory, 2014.
Hoeffding and Bernstein races for selecting policies in evolutionary direct policy search, Proceedings of the 26th Annual International Conference on Machine Learning (ICML), 2009.
DOI: 10.1145/1553374.1553426
Probability inequalities for sums of bounded random variables, Journal of the American Statistical Association, vol.58, pp.13-30, 1963.
On correlation and budget constraints in model-based bandit optimization with application to automatic machine learning, Proceedings of the 17th International Conference on Artificial Intelligence and Statistics, 2014.
An Asymptotically Optimal Bandit Algorithm for Bounded Support Models, Proceedings of the 23rd Conference on Learning Theory, 2010.
Optimality of Thompson Sampling for Gaussian Bandits depends on priors, Proceedings of the 17th Conference on Artificial Intelligence and Statistics, 2014.
Near-Optimal Regret Bounds for Reinforcement Learning, Journal of Machine Learning Research, vol.11, pp.1563-1600, 2010.
lil'UCB: An Optimal Exploration Algorithm for Multi-Armed Bandits, Proceedings of the 27th Conference on Learning Theory, 2014.
An Invariant Form for the Prior Probability in Estimation Problems, Proceedings of the Royal Society A: Mathematical, Physical and Engineering Sciences, vol.186, issue.1007, pp.453-461, 1946.
DOI: 10.1098/rspa.1946.0056
Asymptotically optimal procedures for sequential adaptive selection of the best of several normal means, Statistical Decision Theory and Related Topics III, pp.55-86, 1982.
Efficient Selection in Multiple Bandit Arms: Theory and Practice, International Conference on Machine Learning (ICML), 2010.
PAC subset selection in stochastic multi-armed bandits, International Conference on Machine Learning (ICML), 2012.
Almost Optimal Exploration in multi-armed bandits, International Conference on Machine Learning (ICML), 2013.
Sequential choice from several populations, Proceedings of the National Academy of Sciences, vol.92, pp.8584-8585, 1995.
DOI: 10.1073/pnas.92.19.8584
URL: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC41010
On Bayesian Upper-Confidence Bounds for Bandit Problems, Proceedings of the 15th Conference on Artificial Intelligence and Statistics, 2012.
On the Complexity of A/B Testing, Proceedings of the 27th Conference on Learning Theory, 2014.
URL: https://hal.archives-ouvertes.fr/hal-00990254
On the Complexity of Best Arm Identification in Multi-Armed Bandit Models, 2014.
URL: https://hal.archives-ouvertes.fr/hal-01024894
Information complexity in bandit subset selection, Proceedings of the 26th Conference on Learning Theory, 2013.
Thompson Sampling: An Asymptotically Optimal Finite-Time Analysis, Proceedings of the 23rd Conference on Algorithmic Learning Theory, 2012.
DOI: 10.1007/978-3-642-34106-9_18
URL: https://hal.archives-ouvertes.fr/hal-00830033
Thompson Sampling for 1-dimensional Exponential Family Bandits, Advances in Neural Information Processing Systems, 2013.
Contextual Gaussian Process Bandit Optimization, Advances in Neural Information Processing Systems, 2011.
Adaptive Treatment Allocation and the Multi-Armed Bandit Problem, The Annals of Statistics, vol.15, issue.3, pp.1091-1114, 1987.
DOI: 10.1214/aos/1176350495
Asymptotically efficient adaptive allocation rules, Advances in Applied Mathematics, vol.6, issue.1, pp.4-22, 1985.
DOI: 10.1016/0196-8858(85)90002-8
Adaptive estimation of a quadratic functional of a density by model selection, ESAIM: Probability and Statistics, vol.9, issue.5, pp.1302-1338, 2000.
DOI: 10.1051/ps:2005001
Spectrum bandit optimization, IEEE Information Theory Workshop (ITW), 2013.
DOI: 10.1109/ITW.2013.6691221
URL: https://hal.archives-ouvertes.fr/hal-00917427
On a Conjecture of Bechhofer, Kiefer, and Sobel for the Levin-Robbins-Leu Binomial Subset Selection Procedures, Sequential Analysis, vol.27, pp.106-125, 2008.
A Finite-Time Analysis of Multi-armed Bandits Problems with Kullback-Leibler Divergences, Proceedings of the 24th Conference on Learning Theory, 2011.
URL: https://hal.archives-ouvertes.fr/inria-00574987
The Sample Complexity of Exploration in the Multi-Armed Bandit Problem, Journal of Machine Learning Research, pp.623-648, 2004.
The Racing Algorithm: Model Selection for Lazy Learners, Artificial Intelligence Review, vol.11, issue.1-5, pp.113-131, 1997.
DOI: 10.1007/978-94-017-2053-3_8
Thompson Sampling in Switching Environments with Bayesian Online Change Point Detection, Proceedings of the 16th Conference on Artificial Intelligence and Statistics, 2013.
Empirical Bernstein stopping, Proceedings of the 25th International Conference on Machine Learning (ICML), 2008.
DOI: 10.1145/1390156.1390241
URL: https://hal.archives-ouvertes.fr/hal-00834983
Active sequential hypothesis testing, The Annals of Statistics, vol.41, issue.6, pp.2703-2738, 2013.
DOI: 10.1214/13-AOS1144SUPP
URL: http://arxiv.org/abs/1203.4626
Martingales à temps discret [Discrete-time martingales], 1972.
Computing a Classic Index for Finite-Horizon Bandits, INFORMS Journal on Computing, vol.23, issue.2, pp.254-267, 2011.
DOI: 10.1287/ijoc.1100.0398
(More) Efficient Reinforcement Learning Via Posterior Sampling, Advances in Neural Information Processing Systems, 2013.
A Sequential Procedure for Selecting the Population with the Largest Mean from $k$ Normal Populations, The Annals of Mathematical Statistics, vol.35, issue.1, pp.174-180, 1964.
DOI: 10.1214/aoms/1177703739
Simulation studies of multi-armed bandits with covariates, Proceedings of the 10th International Conference on Computer Modeling, 2008.
Markov Decision Processes: Discrete Stochastic Dynamic Programming, 1994.
Some aspects of the sequential design of experiments, Bulletin of the American Mathematical Society, vol.58, issue.5, pp.527-535, 1952.
DOI: 10.1090/S0002-9904-1952-09620-8
Statistical Methods Related to the Law of the Iterated Logarithm, The Annals of Mathematical Statistics, vol.41, issue.5, pp.1397-1409, 1970.
DOI: 10.1214/aoms/1177696786
Linearly Parameterized Bandits, Mathematics of Operations Research, vol.35, issue.2, pp.395-411, 2010.
DOI: 10.1287/moor.1100.0446
URL: http://arxiv.org/abs/0812.3465
Learning to Optimize via Posterior Sampling, Mathematics of Operations Research, vol.39, issue.4, 2014.
DOI: 10.1287/moor.2014.0650
Deviations of Stochastic Bandit Regret, Proceedings of the 22nd Conference on Algorithmic Learning Theory, 2011.
DOI: 10.1007/978-3-642-24412-4_15
URL: https://hal.archives-ouvertes.fr/hal-00624461
A Shrinkage-Thresholding Metropolis Adjusted Langevin Algorithm for Bayesian Variable Selection, IEEE Journal of Selected Topics in Signal Processing, vol.10, issue.2, 2013.
DOI: 10.1109/JSTSP.2015.2496546
URL: https://hal.archives-ouvertes.fr/hal-00921130
A modern Bayesian look at the multi-armed bandit, Applied Stochastic Models in Business and Industry, vol.26, issue.6, pp.639-658, 2010.
DOI: 10.1002/asmb.874
Sequential Analysis, 1985.
DOI: 10.1007/978-1-4757-1862-1
On the likelihood that one unknown probability exceeds another in view of the evidence of two samples, Biometrika, vol.25, issue.3-4, pp.285-294, 1933.
DOI: 10.1093/biomet/25.3-4.285
On the Theory of Apportionment, American Journal of Mathematics, vol.57, issue.2, pp.450-456, 1935.
DOI: 10.2307/2371219
Finite-time analysis of kernelized contextual bandits, 29th Conference on Uncertainty in Artificial Intelligence (UAI), 2013.
Sequential Tests of Statistical Hypotheses, The Annals of Mathematical Statistics, vol.16, issue.2, pp.117-186, 1945.
DOI: 10.1214/aoms/1177731118
All of Statistics: A Concise Course in Statistical Inference, 2010.
DOI: 10.1007/978-0-387-21736-9
On the Gittins Index for Multiarmed Bandits, The Annals of Applied Probability, vol.2, issue.4, pp.1024-1033, 1992.
DOI: 10.1214/aoap/1177005588