J. Friedman, T. Hastie, and R. Tibshirani, Sparse inverse covariance estimation with the graphical lasso, Biostatistics, vol.9, issue.3, 2007.
DOI : 10.1093/biostatistics/kxm045
URL : https://academic.oup.com/biostatistics/article-pdf/9/3/432/17742149/kxm045.pdf

C. Hennig, M. Meila, F. Murtagh, and R. Rocci, Handbook of Cluster Analysis. Chapman & Hall/CRC Handbooks of Modern Statistical Methods, 2015.

J. Macqueen, Some methods for classification and analysis of multivariate observations, Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability, pp.281-297, 1967.

S. Dasgupta, The Hardness of K-means Clustering, 2008.

D. Aloise, A. Deshpande, P. Hansen, and P. Popat, NP-hardness of Euclidean sum-of-squares clustering, Machine Learning, vol.75, issue.2, pp.245-248, 2009.
DOI : 10.1007/s10994-009-5103-0

M. Inaba, N. Katoh, and H. Imai, Applications of weighted Voronoi diagrams and randomization to variance-based k-clustering, Proceedings of the tenth annual symposium on Computational geometry, SCG '94, pp.332-339, 1994.
DOI : 10.1145/177424.178042

S. Lloyd, Least squares quantization in PCM, IEEE Transactions on Information Theory, vol.28, issue.2, pp.129-137, 1982.
DOI : 10.1109/TIT.1982.1056489
URL : http://www.cs.toronto.edu/~roweis/csc2515-2006/readings/lloyd57.pdf

D. Arthur and S. Vassilvitskii, How slow is the k-means method?, Proceedings of the twenty-second annual symposium on Computational geometry, SCG '06, pp.144-153, 2006.
DOI : 10.1145/1137856.1137880

D. Arthur and S. Vassilvitskii, K-means++: The advantages of careful seeding, Proceedings of the Eighteenth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA '07, pp.1027-1035, 2007.

V. Makarenkov and P. Legendre, Optimal variable weighting for ultrametric and additive trees and k-means partitioning: Methods and software, Journal of Classification, vol.18, issue.2, pp.245-271, 2001.

J. Huang, J. Xu, M. Ng, and Y. Ye, Weighting Method for Feature Selection in K-Means, 2007.

R. Cordeiro de Amorim and B. Mirkin, Minkowski metric, feature weighting and anomalous cluster initializing in k-means clustering, Pattern Recognition, vol.45, issue.3, pp.1061-1075, 2012.

L. Kaufman and P. J. Rousseeuw, Finding Groups in Data: An Introduction to Cluster Analysis, 1990.
DOI : 10.1002/9780470316801

D. Sculley, Web-scale k-means clustering, Proceedings of the 19th international conference on World wide web, WWW '10, pp.1177-1178, 2010.
DOI : 10.1145/1772690.1772862

R. Ostrovsky, Y. Rabani, L. J. Schulman, and C. Swamy, The effectiveness of Lloyd-type methods for the k-means problem, 47th Annual IEEE Symposium on Foundations of Computer Science (FOCS'06), pp.165-176, 2006.

A. Guénoche, P. Hansen, and B. Jaumard, Efficient algorithms for divisive hierarchical clustering with the diameter criterion, Journal of Classification, vol.8, issue.1, pp.5-30, 1991.

R. L. Graham, On the History of the Minimum Spanning Tree Problem, IEEE Annals of the History of Computing, vol.7, issue.1, pp.43-57, 1985.
DOI : 10.1109/MAHC.1985.10011

F. Murtagh and P. Contreras, Algorithms for hierarchical clustering: an overview, Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery, pp.86-97.
DOI : 10.1002/widm.1219

J. H. Ward Jr., Hierarchical grouping to optimize an objective function, Journal of the American Statistical Association, vol.58, issue.301, pp.236-244, 1963.

G. N. Lance and W. T. Williams, A General Theory of Classificatory Sorting Strategies: 1. Hierarchical Systems, The Computer Journal, vol.9, issue.4, pp.373-380, 1967.
DOI : 10.1093/comjnl/9.4.373

V. Batagelj, Generalized Ward and related clustering problems, Classification and Related Methods of Data Analysis, pp.67-74, 1988.

F. Murtagh, Multidimensional clustering algorithms, 1985.

M. Jambu, Exploration informatique et statistique des données. Collection technique et scientifique des télécommunications. Dunod, 1989.

W. E. Donath and A. J. Hoffman, Lower Bounds for the Partitioning of Graphs, IBM Journal of Research and Development, vol.17, issue.5, pp.420-425, 1973.
DOI : 10.1147/rd.175.0420

M. Fiedler, Algebraic connectivity of graphs, Czechoslovak Mathematical Journal, vol.23, issue.2, pp.298-305, 1973.

U. von Luxburg, A tutorial on spectral clustering, Statistics and Computing, vol.17, issue.4, pp.395-416, 2007.

D. A. Spielman and S.-H. Teng, Spectral partitioning works: Planar graphs and finite element meshes, Linear Algebra and its Applications, vol.421, issue.2-3, pp.284-305, 2007.

J. Shi and J. Malik, Normalized cuts and image segmentation, IEEE Trans. Pattern Anal. Mach. Intell, vol.22, issue.8, pp.888-905, 2000.

A. Y. Ng, M. I. Jordan, and Y. Weiss, On spectral clustering: Analysis and an algorithm, Advances in Neural Information Processing Systems, pp.849-856, 2001.

L. Hagen and A. B. Kahng, New spectral methods for ratio cut partitioning and clustering, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol.11, issue.9, pp.1074-1085, 1992.
DOI : 10.1109/43.159993

D. Wagner and F. Wagner, Between Min Cut and Graph Bisection, pp.744-750, 1993.
DOI : 10.1007/3-540-57182-5_65
URL : ftp://ftp.math.tu-berlin.de/pub/Preprints/combi/Report-307-1991.ps.Z

S. Guattery and G. L. Miller, On the Quality of Spectral Separators, SIAM Journal on Matrix Analysis and Applications, vol.19, issue.3, pp.701-719, 1998.
DOI : 10.1137/S0895479896312262

B. Nadler and M. Galun, Fundamental limitations of spectral clustering, Advances in Neural Information Processing Systems 19, pp.1017-1024, 2007.

A. Hardy, On the number of clusters, Computational Statistics & Data Analysis, vol.23, issue.1, pp.83-96, 1996.
DOI : 10.1016/S0167-9473(96)00022-9

G. W. Milligan and M. C. Cooper, An examination of procedures for determining the number of clusters in a data set, Psychometrika, vol.50, issue.2, pp.159-179, 1985.

P. Rousseeuw, Silhouettes: A graphical aid to the interpretation and validation of cluster analysis, Journal of Computational and Applied Mathematics, vol.20, issue.1, pp.53-65, 1987.
DOI : 10.1016/0377-0427(87)90125-7

R. Tibshirani, G. Walther, and T. Hastie, Estimating the number of clusters in a data set via the gap statistic, Journal of the Royal Statistical Society: Series B (Statistical Methodology), vol.63, issue.2, pp.411-423, 2001.
DOI : 10.1111/1467-9868.00293

G. Schwarz, Estimating the Dimension of a Model, The Annals of Statistics, vol.6, issue.2, pp.461-464, 1978.
DOI : 10.1214/aos/1176344136

H. Akaike, Information theory and an extension of the maximum likelihood principle, Second International Symposium on Information Theory, pp.267-281, 1973.

A. P. Dempster, N. M. Laird, and D. B. Rubin, Maximum likelihood from incomplete data via the EM algorithm, Journal of the Royal Statistical Society, Series B (Methodological), vol.39, issue.1, pp.1-38, 1977.

R. Bellman, Dynamic Programming, 1957.

A. Zimek, E. Schubert, and H. Kriegel, A survey on unsupervised outlier detection in high-dimensional numerical data, Statistical Analysis and Data Mining, vol.5, issue.5, pp.363-387, 2012.

C. Bouveyron and C. Brunet, Model-based clustering of high-dimensional data: A review, Computational Statistics & Data Analysis, vol.71, pp.52-78, 2013.
DOI : 10.1016/j.csda.2012.12.008
URL : https://hal.archives-ouvertes.fr/hal-00750909

L. Parsons, E. Haque, and H. Liu, Subspace clustering for high dimensional data, ACM SIGKDD Explorations Newsletter, vol.6, issue.1, pp.90-105, 2004.
DOI : 10.1145/1007730.1007731

C. Giraud, Introduction to High-Dimensional Statistics. Chapman & Hall/CRC Monographs on Statistics & Applied Probability, 2014.

P. Bühlmann and S. van de Geer, Statistics for High-Dimensional Data: Methods, Theory and Applications, 2011.
DOI : 10.1007/978-3-642-20192-9

O. Banerjee, L. El Ghaoui, and A. d'Aspremont, Model selection through sparse maximum likelihood estimation for multivariate Gaussian or binary data, Journal of Machine Learning Research, 2008.

M. Yuan and Y. Lin, Model selection and estimation in the Gaussian graphical model, Biometrika, vol.94, issue.1, 2007.
DOI : 10.1093/biomet/asm018

N. Meinshausen and P. Bühlmann, High-dimensional graphs and variable selection with the Lasso, The Annals of Statistics, vol.34, issue.3, pp.1436-1462, 2006.
DOI : 10.1214/009053606000000281
URL : http://doi.org/10.1214/009053606000000281

D. Edwards, Introduction to Graphical Modelling. Springer Texts in Statistics, 2000.

A. P. Dempster, Covariance Selection, Biometrics, vol.28, issue.1, pp.157-175, 1972.
DOI : 10.2307/2528966

L. Breiman, Heuristics of instability and stabilization in model selection, The Annals of Statistics, vol.24, issue.6, pp.2350-2383, 1996.
DOI : 10.1214/aos/1032181158

R. Mazumder, Topics in sparse multivariate statistics (thesis), 2012.

N. Parikh and S. Boyd, Proximal Algorithms, Foundations and Trends® in Optimization, vol.1, issue.3, pp.127-239, 2014.
DOI : 10.1561/2400000003
URL : http://www.nowpublishers.com/article/DownloadSummary/OPT-003

A. Beck and M. Teboulle, A Fast Iterative Shrinkage-Thresholding Algorithm for Linear Inverse Problems, SIAM Journal on Imaging Sciences, vol.2, issue.1, pp.183-202, 2009.
DOI : 10.1137/080716542

A. B. Tsybakov, Aggregation and minimax optimality in high-dimensional estimation, Proceedings of the International Congress of Mathematicians, pp.225-246, 2014.

O. Catoni, The mixture approach to universal model selection, 1997.

Y. Yang, Mixing strategies for density estimation, The Annals of Statistics, vol.28, issue.1, pp.75-87, 2000.
DOI : 10.1214/aos/1016120365
URL : http://doi.org/10.1214/aos/1016120365

A. Juditsky, P. Rigollet, and A. B. Tsybakov, Learning by mirror averaging, The Annals of Statistics, vol.36, issue.5, pp.2183-2206, 2008.
DOI : 10.1214/07-AOS546
URL : https://hal.archives-ouvertes.fr/hal-00341026

A. B. Yuditskiĭ, A. V. Nazin, A. B. Tsybakov, and N. Vayatis, Recursive aggregation of estimators by the mirror descent method with averaging, Problemy Peredachi Informatsii, vol.41, issue.4, pp.78-96, 2005.

A. S. Dalalyan and A. B. Tsybakov, Mirror averaging with sparsity priors, Bernoulli, vol.18, issue.3, pp.914-944, 2012.

P. C. Bellec, Optimal exponential bounds for aggregation of density estimators, Bernoulli, vol.23, issue.1, 2014.
DOI : 10.3150/15-BEJ742

C. Butucea, J. Delmas, A. Dutfoy, and R. Fischer, Optimal exponential bounds for aggregation of estimators for the Kullback-Leibler loss, Electronic Journal of Statistics, vol.11, issue.1, 2016.
DOI : 10.1214/17-EJS1269

D. Dai, P. Rigollet, and T. Zhang, Deviation optimal learning using greedy $Q$-aggregation, The Annals of Statistics, vol.40, issue.3, pp.1878-1905, 2012.
DOI : 10.1214/12-AOS1025
URL : http://doi.org/10.1214/12-aos1025

P. Rigollet, Kullback-Leibler aggregation and misspecified generalized linear models, The Annals of Statistics, vol.40, issue.2, pp.639-665, 2012.
DOI : 10.1214/11-AOS961SUPP
URL : http://doi.org/10.1214/11-aos961

Ph. Rigollet and A. B. Tsybakov, Linear and convex aggregation of density estimators, Math. Methods Statist, vol.16, issue.3, pp.260-280, 2007.

K. Lounici, Generalized mirror averaging and D-convex aggregation, Mathematical Methods of Statistics, vol.16, issue.3, pp.246-259, 2007.
DOI : 10.3103/S1066530707030040
URL : https://hal.archives-ouvertes.fr/hal-00204674

F. Bunea, A. B. Tsybakov, and M. H. Wegkamp, Aggregation for Gaussian regression, The Annals of Statistics, vol.35, issue.4, pp.1674-1697, 2007.
DOI : 10.1214/009053606000001587
URL : http://doi.org/10.1214/009053606000001587

F. Bunea, A. B. Tsybakov, M. H. Wegkamp, and A. Barbu, SPADES and mixture models, The Annals of Statistics, vol.38, issue.4, pp.2525-2558, 2010.
DOI : 10.1214/09-AOS790
URL : https://hal.archives-ouvertes.fr/hal-00514124

K. Bertin, E. Le Pennec, and V. Rivoirard, Adaptive Dantzig density estimation, Annales de l'Institut Henri Poincaré, Probabilités et Statistiques, vol.47, issue.1, pp.43-74, 2011.
DOI : 10.1214/09-AIHP351
URL : https://hal.archives-ouvertes.fr/hal-00381984

J. Q. Li and A. R. Barron, Mixture density estimation, Advances in Neural Information Processing Systems 12, pp.279-285, 1999.

J. Q. Li, Estimation of Mixture Models, 1999.

A. Rakhlin, D. Panchenko, and S. Mukherjee, Risk bounds for mixture density estimation, ESAIM: Probability and Statistics, vol.23, pp.220-229, 2005.
DOI : 10.1214/aos/1176324524
URL : http://www.stat.duke.edu/~sayan/webPub/density_estimation_esaim.pdf

A. Juditsky and A. Nemirovski, Functional aggregation for nonparametric regression, The Annals of Statistics, pp.681-712, 2000.

A. B. Tsybakov, Optimal Rates of Aggregation, Computational Learning Theory and Kernel Machines, COLT/Kernel, Proceedings, pp.303-313, 2003.
DOI : 10.1007/978-3-540-45167-9_23
URL : https://hal.archives-ouvertes.fr/hal-00104867

G. Lecué, Lower bounds and aggregation in density estimation, J. Mach. Learn. Res, vol.7, pp.971-981, 2006.

D. Xia and V. Koltchinskii, Estimation of low rank density matrices: Bounds in Schatten norms and other distances, Electronic Journal of Statistics, vol.10, issue.2, pp.2717-2745, 2016.
DOI : 10.1214/16-EJS1192

S. van de Geer and P. Bühlmann, On the conditions used to prove oracle results for the Lasso, Electronic Journal of Statistics, vol.3, issue.0, pp.1360-1392, 2009.
DOI : 10.1214/09-EJS506

G. Lecué and S. Mendelson, On the optimality of the empirical risk minimization procedure for the Convex aggregation problem, Annales de l'Institut Henri Poincaré, Probabilités et Statistiques, vol.49, issue.1, pp.288-306, 2013.
DOI : 10.1214/11-AIHP458

G. Lecué, Empirical risk minimization is optimal for the convex aggregation problem, Bernoulli, vol.19, issue.5B, pp.2153-2166, 2013.
DOI : 10.3150/12-BEJ447

P. Rigollet and A. Tsybakov, Exponential Screening and optimal rates of sparse estimation, The Annals of Statistics, vol.39, issue.2, pp.731-771, 2011.
DOI : 10.1214/10-AOS854
URL : https://hal.archives-ouvertes.fr/hal-00606059

P. Rigollet, Oracle inequalities, aggregation and adaptation, 2006.
URL : https://hal.archives-ouvertes.fr/tel-00115494

G. Raskutti, M. J. Wainwright, and B. Yu, Minimax Rates of Estimation for High-Dimensional Linear Regression Over $\ell_q$-Balls, IEEE Transactions on Information Theory, vol.57, issue.10, pp.6976-6994, 2011.
DOI : 10.1109/TIT.2011.2165799

Z. Wang, S. Paterlini, F. Gao, and Y. Yang, Adaptive minimax regression estimation over sparse q -hulls, J. Mach. Learn. Res, vol.15, pp.1675-1711, 2014.

P. C. Bellec, A. S. Dalalyan, E. Grappin, and Q. Paris, On the prediction loss of the lasso in the partially labeled setting, 2016.

S. Boucheron, G. Lugosi, and P. Massart, Concentration Inequalities: A Nonasymptotic Theory of Independence, OUP Oxford, 2013.
DOI : 10.1093/acprof:oso/9780199535255.001.0001
URL : https://hal.archives-ouvertes.fr/hal-00794821

V. Koltchinskii, Oracle inequalities in empirical risk minimization and sparse recovery problems, Lectures from the 38th Probability Summer School held in Saint-Flour, 2008.
DOI : 10.1007/978-3-642-22147-7

M. Ledoux and M. Talagrand, Probability in Banach Spaces: isoperimetry and processes, 1991.
DOI : 10.1007/978-3-642-20212-4

A. B. Tsybakov, Introduction to Nonparametric Estimation, 2009.
DOI : 10.1007/b13794

Yu. Nesterov, Gradient methods for minimizing composite objective function, CORE Discussion Papers, Center for Operations Research and Econometrics (CORE), 2007.

Y. Nesterov, A method of solving a convex programming problem with convergence rate $O(1/k^2)$, Soviet Mathematics Doklady, vol.27, pp.372-376, 1983.

J. Duchi, S. Shalev-shwartz, Y. Singer, and T. Chandra, Efficient projections onto the l1-ball for learning in high dimensions, Proceedings of the 25th International Conference on Machine Learning, ICML '08, pp.272-279, 2008.

W. Wang and M. Á. Carreira-Perpiñán, Projection onto the probability simplex: An efficient algorithm with a simple proof, and an application, 2013.

E. Candes and T. Tao, The Dantzig selector: Statistical estimation when p is much larger than n, The Annals of Statistics, vol.35, issue.6, pp.2313-2351, 2007.
DOI : 10.1214/009053606000001523
URL : http://doi.org/10.1214/009053606000001523

P. J. Bickel, Y. Ritov, and A. B. Tsybakov, Simultaneous analysis of Lasso and Dantzig selector, Ann. Statist, vol.37, issue.4, pp.1705-1732, 2009.

E. Jones, T. Oliphant, and P. Peterson, SciPy: Open source scientific tools for Python, 2001.

S. J. Sheather and M. C. Jones, A reliable data-based bandwidth selection method for kernel density estimation, Journal of the Royal Statistical Society, Series B: Methodological, vol.53, pp.683-690, 1991.

D. W. Scott, Multivariate density estimation: theory, practice, and visualization, 2015.
DOI : 10.1002/9781118575574

P. Hall and J. S. Marron, Estimation of integrated squared density derivatives, Statistics & Probability Letters, vol.6, issue.2, pp.109-115, 1987.
DOI : 10.1016/0167-7152(87)90083-6

M. C. Jones and S. J. Sheather, Using non-stochastic terms to advantage in kernel-based estimation of integrated squared density derivatives, Statistics & Probability Letters, vol.11, issue.6, pp.511-514, 1991.
DOI : 10.1016/0167-7152(91)90116-9

C. J. Stone, Optimal Rates of Convergence for Nonparametric Estimators, The Annals of Statistics, vol.8, issue.6, pp.1348-1360, 1980.
DOI : 10.1214/aos/1176345206

S. K. Lam, A. Pitrou, and S. Seibert, Numba: A LLVM-based Python JIT compiler, Proceedings of the Second Workshop on the LLVM Compiler Infrastructure in HPC, LLVM '15, pp.1-7, 2015.

F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion et al., Scikit-learn: Machine learning in Python, Journal of Machine Learning Research, vol.12, pp.2825-2830, 2011.
URL : https://hal.archives-ouvertes.fr/hal-00650905

M. Gavish and D. L. Donoho, The optimal hard threshold for singular values is $4/\sqrt{3}$, IEEE Transactions on Information Theory, vol.60, issue.8, pp.5040-5053, 2014.

J. H. Friedman, On multivariate goodness of fit and two sample testing, p.30908, 2003.
DOI : 10.2172/826696