SEMAFOUR.

R. Sutton and A. Barto, Reinforcement learning: An introduction, vol.1, 1998.

D. Kreutz, Software-defined networking: A comprehensive survey, Proceedings of the IEEE, vol.103, issue.1, pp.14-76, 2015.

Cisco, Cisco Visual Networking Index: Global mobile data traffic forecast update, 2016-2021, white paper, 2017.

P. Horn, Autonomic computing: IBM's perspective on the state of information technology, 2001.

UMTS; LTE; Telecommunication management; SON; NRM; IRP; Requirements (Release 8), 2013.

S. Hämäläinen, H. Sanneck, and C. Sartori, LTE self-organising networks (SON): Network management automation for operational efficiency, 2012.

O. G. Aliu, A. Imran, M. A. Imran, and B. Evans, A survey of self organisation in future cellular networks, IEEE Communications Surveys & Tutorials, vol.15, issue.1, pp.336-361, 2013.

Evolved Universal Terrestrial Radio Access Network (E-UTRAN); Self-configuring and self-optimizing network (SON) use cases and solutions, 2012.

Self-optimizing networks: The benefits of SON in LTE, Technical report, 4G Americas, 2011.

Self-optimizing networks in 3GPP Release 11: The benefits of SON in LTE, Technical report, 4G Americas, 2013.

Study on next generation self-optimizing network (SON) for UTRAN and E-UTRAN, 2015.

Telecommunication management; Study on enhancements of operations, administration and maintenance (OAM) aspects of distributed self-organizing networks (SON) functions, 2016.

Digital cellular telecommunications system (Phase 2+); UMTS; LTE; Telecommunication management; SON; NRM; IRP; Information system, 2013.

T. Bandh, Policy-based coordination and management of SON functions, IFIP/IEEE International Symposium on Integrated Network Management (IM), pp.827-840, 2011.

H. Sanneck, T. Bandh, and R. Romeikat, An experimental system for SON function coordination, Vehicular Technology Conference (VTC Spring), pp.1-2, 2011.

L. C. Schmelz, A coordination framework for self-organisation in LTE networks, IFIP/IEEE International Symposium on Integrated Network Management (IM), pp.193-200, 2011.

O. Iacoboaiea, Coordinating SON instances: A reinforcement learning framework, Vehicular Technology Conference (VTC Fall), pp.1-5, 2014.
URL : https://hal.archives-ouvertes.fr/hal-01179552

O. Iacoboaiea, SON conflict diagnosis in heterogeneous networks, 26th Annual International Symposium on Personal, Indoor, and Mobile Radio Communications (PIMRC), pp.1459-1463, 2015.
URL : https://hal.archives-ouvertes.fr/hal-01179556

O. Iacoboaiea, SON coordination in heterogeneous networks: A reinforcement learning framework, IEEE Transactions on Wireless Communications, vol.15, issue.9, pp.5835-5847, 2016.

S. Lohmüller, C. Frenzel, and L. C. Schmelz, Dynamic, context-specific SON management driven by operator objectives, Network Operations and Management Symposium (NOMS), pp.1-8, 2014.

S. Hahn and T. Kürner, Managing and altering mobile radio networks by using SON function performance models, 11th International Symposium on Wireless Communications Systems (ISWCS), pp.214-218, 2014.

S. Lohmüller, C. Frenzel, and L. C. Schmelz, SON management based on weighted objectives and combined SON function models, 11th International Symposium on Wireless Communications Systems (ISWCS), pp.149-153, 2014.

L. C. Schmelz, S. Lohmüller, and S. Hahn, Adaptive SON management using KPI measurements, IEEE/IFIP Network Operations and Management Symposium (NOMS), pp.625-631, 2016.

METIS II.

Study on new services and markets technology enablers, 2016.

View on 5G architecture, 5G PPP Architecture Working Group, 2016.

J. Mitola, Cognitive radio, 2000.
URL : https://hal.archives-ouvertes.fr/hal-00735049

P. M. Luengo, Fuzzy rule-based reinforcement learning for load balancing techniques in enterprise LTE femtocells, IEEE Transactions on Vehicular Technology, vol.62, issue.5, pp.1962-1973, 2013.

S. Mwanje and A. Mitschele-Thiel, A Q-learning strategy for LTE mobility load balancing, 24th International Symposium on Personal Indoor and Mobile Radio Communications (PIMRC), pp.2154-2158, 2013.

T. Kudo and T. Ohtsuki, Cell range expansion using distributed Q-learning in heterogeneous networks, EURASIP Journal on Wireless Communications and Networking, vol.2013, issue.1, p.61, 2013.

P. Coucheney, M. S. Ali, and M. Coupechoux, Load balancing in heterogeneous networks based on distributed learning in near-potential games, IEEE Transactions on Wireless Communications, vol.15, issue.7, pp.5046-5059, 2016.

M. Bennis, M. Simsek, and A. Czylwik, Dynamic inter-cell interference coordination in HetNets: A reinforcement learning approach, Global Communications Conference (GLOBECOM), pp.5446-5450, 2012.

N. Morozs, T. Clarke, and D. Grace, Distributed heuristically accelerated Q-learning for robust cognitive spectrum management in LTE cellular systems, IEEE Transactions on Mobile Computing, vol.15, issue.4, pp.817-825, 2016.

A. De Domenico and D. Kténas, Reinforcement learning for interference-aware cell DTX in heterogeneous networks, Wireless Communications and Networking Conference (WCNC), pp.1-6, 2018.

M. Wildemeersch, Cognitive small cell networks: Energy efficiency and trade-offs, IEEE Transactions on Communications, vol.61, issue.9, pp.4016-4029, 2013.

R. C. Qiu, Cognitive radio communication and networking: Principles and practice, 2012.

T. Hastie, J. Friedman, and R. Tibshirani, The elements of statistical learning, Springer Series in Statistics, vol.1, 2001.

C. J. C. H. Watkins and P. Dayan, Q-learning, Machine Learning, vol.8, issue.3-4, pp.279-292, 1992.

V. Mnih, Human-level control through deep reinforcement learning, Nature, vol.518, issue.7540, p.529, 2015.

L. Jouffe, Fuzzy inference system learning by reinforcement methods, IEEE Transactions on Systems, Man, and Cybernetics, Part C (Applications and Reviews), vol.28, issue.3, pp.338-355, 1998.

L. Matignon, G. J. Laurent, and N. Le Fort-Piat, Hysteretic Q-learning: An algorithm for decentralized reinforcement learning in cooperative multi-agent teams, IEEE/RSJ International Conference on Intelligent Robots and Systems, IROS'07, pp.64-69, 2007.
URL : https://hal.archives-ouvertes.fr/hal-00187279

P. Stone and M. Veloso, Multiagent systems: A survey from a machine learning perspective, Autonomous Robots, vol.8, issue.3, pp.345-383, 2000.

M. Dirani and Z. Altman, A cooperative reinforcement learning approach for inter-cell interference coordination in OFDMA cellular networks, Proceedings of the 8th International Symposium on Modeling and Optimization in Mobile, Ad Hoc and Wireless Networks (WiOpt), pp.170-176, 2010.
URL : https://hal.archives-ouvertes.fr/inria-00503866

A. Galindo-Serrano and L. Giupponi, Distributed Q-learning for aggregated interference control in cognitive radio networks, IEEE Transactions on Vehicular Technology, vol.59, issue.4, pp.1823-1834, 2010.

P. Auer, N. Cesa-Bianchi, and P. Fischer, Finite-time analysis of the multiarmed bandit problem, Machine Learning, vol.47, issue.2-3, pp.235-256, 2002.

S. Bubeck, Regret analysis of stochastic and nonstochastic multi-armed bandit problems, Foundations and Trends in Machine Learning, vol.5, pp.1-122, 2012.

P. Auer, Using confidence bounds for exploitation-exploration trade-offs, Journal of Machine Learning Research, vol.3, pp.397-422, 2002.

N. Korda, E. Kaufmann, and R. Munos, Thompson sampling: An asymptotically optimal finite-time analysis, International Conference on Algorithmic Learning Theory, pp.199-213, 2012.
URL : https://hal.archives-ouvertes.fr/hal-00830033

H. Robbins, Some aspects of the sequential design of experiments, Herbert Robbins Selected Papers, pp.169-177, 1985.

A. Garivier and O. Cappé, The KL-UCB algorithm for bounded stochastic bandits and beyond, Proceedings of the 24th Annual Conference on Learning Theory, pp.359-376, 2011.

S. Agrawal and N. Goyal, Analysis of Thompson sampling for the multi-armed bandit problem, Conference on Learning Theory, pp.39-40, 2012.

S. Mannor, E. Even-Dar, and Y. Mansour, PAC bounds for multi-armed bandit and Markov decision processes, International Conference on Computational Learning Theory, pp.255-270, 2002.

S. Mannor, E. Even-Dar, and Y. Mansour, Action elimination and stopping conditions for the multi-armed bandit and reinforcement learning problems, Journal of Machine Learning Research, vol.7, pp.1079-1105, 2006.

V. Dani, Stochastic linear optimization under bandit feedback, 2008.

Y. Abbasi-Yadkori, A. Antos, and C. Szepesvári, Forced-exploration based algorithms for playing in stochastic linear bandits, COLT Workshop on On-line Learning with Limited Feedback, 2009.

A. J. Mersereau, A structured multiarmed bandit problem and the greedy policy, IEEE Transactions on Automatic Control, vol.54, issue.12, pp.2787-2802, 2009.

P. Rusmevichientong and J. N. Tsitsiklis, Linearly parameterized bandits, Mathematics of Operations Research, vol.35, issue.2, pp.395-411, 2010.

W. Chu, Contextual bandits with linear payoff functions, Proceedings of the Fourteenth International Conference on Artificial Intelligence and Statistics, pp.208-214, 2011.

A. Slivkins, Contextual bandits with similarity information, The Journal of Machine Learning Research, vol.15, issue.1, pp.2533-2568, 2014.

D. Pál, T. Lu, and M. Pál, Contextual multi-armed bandits, Proceedings of the Thirteenth international conference on Artificial Intelligence and Statistics, pp.485-492, 2010.

T. Urvoy, R. Féraud, R. Allesiardo, and F. Clérot, Random forest for the contextual bandit problem, Artificial Intelligence and Statistics, pp.93-101, 2016.

A. Baddeley, Case studies in spatial point process modeling, vol.185, 2006.

Evolved Universal Terrestrial Radio Access: Radio resource control; protocol specifications, 2012.

A. Adeyemi and D. Ike, A review of load balancing techniques in 3GPP LTE system, Int. J. Comput. Sci. Eng, vol.2, issue.4, pp.112-116, 2013.

C. Yang, Concurrent mobility load balancing in LTE self-organized networks, IEEE 21st International Conference on Telecommunications (ICT), pp.288-292, 2014.

A. Lobinger and S. Stefanski, Coordinating handover parameter optimization and load balancing in LTE self-optimization networks, Vehicular Technology Conference, 2011.

R. Nasri and Z. Altman, Handover adaptation for dynamic load balancing in 3GPP Long Term Evolution systems, 2013.
URL : https://hal.archives-ouvertes.fr/hal-00840737

FP7 ICT-SOCRATES, 2012.

K. Sandrasegaran, A. Daeinabi, and P. Ghosal, A dynamic cell range expansion scheme based on fuzzy logic system in LTE-Advanced heterogeneous networks, Australasian Telecommunication Networks and Applications Conference (ATNAC), pp.6-11, 2014.

P. Tian, Deployment analysis and optimization of macro-pico heterogeneous networks in LTE-A system, 15th International Symposium on Wireless Personal Multimedia Communications (WPMC), pp.246-250, 2012.

K. Kikuchi and H. Otsuka, Parameter optimization for adaptive control CRE in HetNet, 24th International Symposium on Personal Indoor and Mobile Radio Communications (PIMRC), pp.3334-3338, 2013.

G. Su, B. A. Yasir, and N. Bachache, Range expansion for pico cell in heterogeneous LTE-A cellular networks, 2nd International Conference on Computer Science and Network Technology (ICCSNT), pp.1235-1240, 2012.

W. Liao, S. S. Sun, and W. T. Chen, Traffic offloading with rate-based cell range expansion offsets in heterogeneous networks, Wireless Communications and Networking Conference (WCNC), pp.2833-2838, 2014.

Z. Lei, T. Quek, and S. Sun, Adaptive interference coordination in multi-cell OFDMA systems, 20th International Symposium on Personal, Indoor and Mobile Radio Communications, pp.2380-2384, 2009.

W. Tang, Joint resource allocation for eICIC in heterogeneous networks, Global Communications Conference (GLOBECOM), pp.2011-2016, 2014.

N. A. Rácz, N. Reider, and G. Fodor, On the impact of inter-cell interference in LTE, Global Telecommunications Conference (GLOBECOM), pp.1-6, 2008.

B. J. Veancy, S. G. Ruben, and P. Yogesh, Scheduling for interference mitigation using enhanced intercell interference coordination, IJRET, 2014.

A. Weber and O. Stanze, Scheduling strategies for HetNets using eICIC, International Conference on Communications (ICC), pp.6787-6791, 2012.

G. Bartoli, Adaptive muting ratio in enhanced inter-cell interference coordination for LTE-A systems, International Wireless Communications and Mobile Computing Conference (IWCMC), pp.990-995, 2014.

M. Behjati and J. Cosmas, Self-organizing interference coordination for future LTE-Advanced network QoS improvements, International Symposium on Broadband Multimedia Systems and Broadcasting (BMSB), pp.1-6, 2014.

F. E. Salem, Advanced sleep modes and their impact on flow-level performance of 5G networks, IEEE 86th Vehicular Technology Conference (VTC-Fall), pp.1-7, 2017.
URL : https://hal.archives-ouvertes.fr/hal-01711991

S. H. Cheng, L. C. Wang, and A. H. Tsai, Bi-SON: Big-data self organizing network for energy efficient ultra-dense small cells, 84th Vehicular Technology Conference (VTC-Fall), pp.1-5, 2016.

K. Samdanis, Self organized network management functions for energy efficient cellular urban infrastructures, Mobile Networks and Applications, vol.17, pp.119-131, 2012.

E. Kisielius, Energy efficiency in self organizing networks, OPNETWORK, 2013.

S. E. Elayoubi, L. Saker, and T. Chahed, Minimizing energy consumption via sleep mode in green base station, Wireless Communications and Networking Conference (WCNC), pp.1-6, 2010.
URL : https://hal.archives-ouvertes.fr/hal-01306040

L. Saker, Optimal control of wake up mechanisms of femtocells in heterogeneous networks, IEEE Journal on Selected Areas in Communications, vol.30, issue.3, pp.664-672, 2012.
URL : https://hal.archives-ouvertes.fr/hal-01300128

F. Boccardi, I. Ashraf, and L. Ho, Power savings in small cell deployments via sleep mode techniques, 21st International Symposium on Personal, Indoor and Mobile Radio Communications Workshops (PIMRC Workshops), pp.307-311, 2010.

F. Boccardi, I. Ashraf, and L. Ho, Sleep mode techniques for small cell deployments, IEEE Communications Magazine, vol.49, issue.8, 2011.

O. Iacoboaiea, Coordination of Self-Organizing Network (SON) functions in next generation radio access networks, 2015.

Evolved Universal Terrestrial Radio Access: Further advancements for E-UTRA physical layer aspects, 2010.

Guidelines for evaluation of radio interface technologies for IMT-Advanced, vol.638, 2009.

3GPP TS 36.214, LTE; E-UTRA

S. Hahn, Classification of cells based on mobile network context information for the management of SON systems, 81st Vehicular Technology Conference (VTC Spring), pp.1-5, 2015.

T. Daher, S. Ben Jemaa, and L. Decreusefond, Cognitive management of self organized radio networks based on multi armed bandit, 28th Annual International Symposium on Personal, Indoor, and Mobile Radio Communications (PIMRC), pp.1-5, 2017.
URL : https://hal.archives-ouvertes.fr/hal-01574283

W. Chen, Y. Wang, and Y. Yuan, Combinatorial multi-armed bandit: General framework and applications, International Conference on Machine Learning, pp.151-159, 2013.

D. Chakrabarti, S. Pandey, and D. Agarwal, Multi-armed bandit problems with dependent arms, Proceedings of the 24th International Conference on Machine Learning, pp.721-728, 2007.

L. Li, A contextual-bandit approach to personalized news article recommendation, Proceedings of the 19th International Conference on World Wide Web, pp.661-670, 2010.

N. G. Pavlidis, Simulation studies of multi-armed bandits with covariates, Tenth International Conference on Computer Modeling and Simulation, pp.493-498, 2008.

W. Chu and S. Park, Personalized recommendation on dynamic content using predictive bilinear models, Proceedings of the 18th International Conference on World Wide Web, pp.691-700, 2009.

H. Wang, Understanding mobile traffic patterns of large scale cellular towers in urban environment, Proceedings of the Internet Measurement Conference, pp.225-238, 2015.

P. Bonnel, Passive mobile phone dataset to construct origin-destination matrix: potentials and limitations, Transportation Research Procedia, vol.11, pp.381-398, 2015.
URL : https://hal.archives-ouvertes.fr/halshs-01664219

X. Wu, Data mining with big data, IEEE Transactions on Knowledge and Data Engineering, vol.26, issue.1, pp.97-107, 2014.

P. Berkhin, A survey of clustering data mining techniques, Grouping multidimensional data, pp.25-71, 2006.

N. McKeown, OpenFlow: Enabling innovation in campus networks, ACM SIGCOMM Computer Communication Review, vol.38, issue.2, pp.69-74, 2008.

S. Jain, B4: Experience with a globally-deployed software defined WAN, ACM SIGCOMM Computer Communication Review, vol.43, pp.3-14, 2013.

M. Bansal, OpenRadio: A programmable wireless dataplane, Proceedings of the First Workshop on Hot Topics in Software Defined Networks, pp.109-114, 2012.

A. Gudipati, SoftRAN: Software defined radio access network, Proceedings of the Second ACM SIGCOMM Workshop on Hot Topics in Software Defined Networking, pp.25-30, 2013.

G. Poulios, Autonomics and SDN for self-organizing networks, 11th International Symposium on Wireless Communications Systems (ISWCS), pp.830-835, 2014.
URL : https://hal.archives-ouvertes.fr/hal-01067258

K. Tsagkaris, An open framework for programmable, self-managed radio access networks, IEEE Communications Magazine, vol.53, issue.7, pp.154-161, 2015.

K. Tsagkaris, Customizable autonomic network management: integrating autonomic network management and software-defined networking, IEEE Vehicular Technology Magazine, vol.10, issue.1, pp.61-68, 2015.

L. Panait and S. Luke, Cooperative multi-agent learning: The state of the art, Autonomous Agents and Multi-Agent Systems, vol.11, pp.387-434, 2005.

I. Da Silva, Impact of network slicing on 5G radio access networks, European Conference on Networks and Communications (EuCNC), pp.153-157, 2016.

Telecommunication management; Study on management and orchestration of network slicing for next generation network (Release 14), 2017.

L. Breiman, Classification and regression trees, Routledge, 2017.

P. Domingos and G. Hulten, Mining high-speed data streams, Proceedings of the Sixth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp.71-80, 2000.

L. Breiman, Random forests, Machine Learning, vol.45, pp.5-32, 2001.

List of Publications and Communications

Journals

T. Daher, S. Ben Jemaa, and L. Decreusefond, Contextual Bandit for Cognitive SON Management.

T. Daher, S. Ben Jemaa, and L. Decreusefond, Q-Learning for Policy Based SON Management in Wireless Access Networks, 15th IFIP/IEEE International Symposium on Integrated Network Management, pp.1091-1096, 2017.
URL : https://hal.archives-ouvertes.fr/hal-01579149

T. Daher, S. Ben Jemaa, and L. Decreusefond, Cognitive management of self-organized radio networks based on multi armed bandit, IEEE 28th Annual International Symposium on Personal, Indoor, and Mobile Radio Communications, pp.1-5, 2017.
URL : https://hal.archives-ouvertes.fr/hal-01574283

T. Daher, S. Ben Jemaa, and L. Decreusefond, Cognitive policy based SON management demonstrator, IEEE 21st Conference on Innovation in Clouds, Internet and Networks and Workshops, pp.1-3, 2018.
URL : https://hal.archives-ouvertes.fr/hal-01815821

T. Daher, S. Ben Jemaa, and L. Decreusefond, Softwarized and distributed learning for SON management systems, IEEE/IFIP Network Operations and Management Symposium, pp.1-7, 2018.
URL : https://hal.archives-ouvertes.fr/hal-01815829

T. Daher, S. Ben Jemaa, and L. Decreusefond, Linear UCB for Online SON Management, IEEE 87th Vehicular Technology Conference, pp.1-5, 2018.
URL : https://hal.archives-ouvertes.fr/hal-01815834

Demonstrators

Part of Cockpit 5G Demonstrator for Network Management Optimization Through Artificial Intelligence, 2017.