, Combining inequalities (B.14) to (B.19) we get that, for all i ?

A. Agarwal, S. Negahban, and M. J. Wainwright, Noisy matrix decomposition via convex relaxation: Optimal rates in high dimensions, The Annals of Statistics, vol.40, issue.2, pp.1171-1197, 2012.

D. Amelunxen, M. Lotz, M. B. McCoy, and J. A. Tropp, Living on the edge: Phase transitions in convex programs with random data, Information and Inference: A Journal of the IMA, vol.3, issue.3, pp.224-294, 2014.

A. Argyriou, T. Evgeniou, and M. Pontil, Convex multi-task feature learning, Machine Learning, vol.73, pp.243-272, 2008.

A. Argyriou, R. Foygel, and N. Srebro, Sparse prediction with the k-support norm, Advances in Neural Information Processing Systems, vol.25, pp.1466-1474, 2012.
URL : https://hal.archives-ouvertes.fr/hal-00858954

F. Bach, Learning with submodular functions: A convex optimization perspective, Foundations and Trends in Machine Learning, vol.6, pp.145-373, 2013.
URL : https://hal.archives-ouvertes.fr/hal-00645271

F. Bach, Duality between subgradient and conditional gradient methods, SIAM Journal on Optimization, vol.25, issue.1, pp.115-129, 2015.
URL : https://hal.archives-ouvertes.fr/hal-00757696

F. Bach, R. Jenatton, J. Mairal, and G. Obozinski, Optimization with sparsity-inducing penalties, Foundations and Trends® in Machine Learning, vol.4, pp.1-106, 2012.
URL : https://hal.archives-ouvertes.fr/hal-00613125

F. Bach, S. Lacoste-Julien, and G. Obozinski, On the equivalence between herding and conditional gradient algorithms, 2012.
URL : https://hal.archives-ouvertes.fr/hal-00681128

F. Bach, J. Mairal, and J. Ponce, Convex sparse matrix factorizations, 2008.
URL : https://hal.archives-ouvertes.fr/hal-00345747

O. Banerjee, L. El Ghaoui, and A. d'Aspremont, Model selection through sparse maximum likelihood estimation for multivariate Gaussian or binary data, Journal of Machine Learning Research, vol.9, pp.485-516, 2008.

D. P. Bertsekas, Nonlinear programming, 1999.

D. P. Bertsekas, Convex optimization algorithms, 2015.

J. Bien, J. Taylor, and R. Tibshirani, A lasso for hierarchical interactions, The Annals of Statistics, vol.41, issue.3, pp.1111-1141, 2013.

J. M. Borwein and A. S. Lewis, Convex analysis and nonlinear optimization, 2006.

S. P. Boyd and L. Vandenberghe, Convex Optimization, 2004.

K. Bredies, D. A. Lorenz, and P. Maass, A generalized conditional gradient method and its connection to an iterative shrinkage method, Computational Optimization and Applications, vol.42, issue.2, pp.173-193, 2009.

E. Candès and B. Recht, Simple bounds for recovering low-complexity models, Mathematical Programming, pp.577-589, 2013.

E. J. Candès, X. Li, Y. Ma, and J. Wright, Robust principal component analysis?, Journal of the ACM (JACM), vol.58, issue.3, p.11, 2011.

E. J. Candès and B. Recht, Exact matrix completion via convex optimization, Foundations of Computational Mathematics, vol.9, issue.6, p.717, 2009.

M. Champion, V. Picheny, and M. Vignes, Inferring large graphs using ℓ1-penalized likelihood, Statistics and Computing, vol.28, issue.4, pp.905-921, 2018.
URL : https://hal.archives-ouvertes.fr/hal-01602560

V. Chandrasekaran, P. A. Parrilo, and A. S. Willsky, Latent variable graphical model selection via convex optimization, Communication, Control, and Computing (Allerton), 2010 48th Annual Allerton Conference on, pp.1610-1613, 2010.
DOI : 10.1109/allerton.2010.5707106
URL : https://authors.library.caltech.edu/34693/1/cpw_lgm_preprint10.pdf

V. Chandrasekaran, B. Recht, P. A. Parrilo, and A. S. Willsky, The convex geometry of linear inverse problems, Foundations of Computational Mathematics, vol.12, issue.6, pp.805-849, 2012.

V. Chandrasekaran, S. Sanghavi, P. A. Parrilo, and A. S. Willsky, Rank-sparsity incoherence for matrix decomposition, SIAM Journal on Optimization, vol.21, issue.2, pp.572-596, 2011.
DOI : 10.1137/090761793
URL : https://authors.library.caltech.edu/34747/1/cspw_slr_siopt11.pdf

C. Chow and C. Liu, Approximating discrete probability distributions with dependence trees, IEEE Transactions on Information Theory, vol.14, issue.3, pp.462-467, 1968.
DOI : 10.1109/tit.1968.1054142
URL : http://www.cs.iastate.edu/~honavar/chou-liu.pdf

A. Cohen, W. Dahmen, and R. DeVore, Compressed sensing and best k-term approximation, Journal of the American Mathematical Society, vol.22, issue.1, pp.211-231, 2009.
DOI : 10.1090/s0894-0347-08-00610-3
URL : http://www.igpm.rwth-aachen.de/Download/reports/pdf/IGPM260.pdf

P. L. Combettes and J. Pesquet, Proximal splitting methods in signal processing, Fixed-point algorithms for inverse problems in science and engineering, pp.185-212, 2011.
DOI : 10.1007/978-1-4419-9569-8_10
URL : https://hal.archives-ouvertes.fr/hal-00643807

S. A. Cook, A taxonomy of problems with fast parallel algorithms, Information and control, vol.64, issue.1-3, pp.2-22, 1985.

C. Dahinden, G. Parmigiani, M. C. Emerick, and P. Bühlmann, Penalized likelihood for sparse contingency tables with an application to full-length cDNA libraries, BMC Bioinformatics, vol.8, issue.1, p.476, 2007.
DOI : 10.1186/1471-2105-8-476
URL : https://bmcbioinformatics.biomedcentral.com/track/pdf/10.1186/1471-2105-8-476

A. d'Aspremont, F. Bach, and L. El Ghaoui, Optimal solutions for sparse principal component analysis, Journal of Machine Learning Research, vol.9, pp.1269-1294, 2008.

A. d'Aspremont, O. Banerjee, and L. El Ghaoui, First-order methods for sparse covariance selection, SIAM Journal on Matrix Analysis and Applications, vol.30, issue.1, pp.56-66, 2008.

A. d'Aspremont, L. El Ghaoui, M. I. Jordan, and G. R. Lanckriet, A direct formulation for sparse PCA using semidefinite programming, Advances in Neural Information Processing Systems, pp.41-48, 2005.

A. Defazio and T. S. Caetano, A convex formulation for learning scale-free networks via submodular relaxation, Advances in Neural Information Processing Systems, pp.1250-1258, 2012.

C. Ding, T. Li, and M. I. Jordan, Convex and semi-nonnegative matrix factorizations, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.32, issue.1, pp.45-55, 2010.
DOI : 10.1109/tpami.2008.277
URL : http://ranger.uta.edu/~chqding/papers/Ding-Li-Jordan.pdf

D. L. Donoho and M. Elad, Optimally sparse representation in general (nonorthogonal) dictionaries via ℓ1 minimization, Proceedings of the National Academy of Sciences, vol.100, issue.5, pp.2197-2202, 2003.
DOI : 10.1073/pnas.0437847100
URL : https://www.pnas.org/content/pnas/100/5/2197.full.pdf

M. Drton and M. H. Maathuis, Structure learning in graphical modeling, Annual Review of Statistics and Its Application, vol.4, pp.365-393, 2017.
DOI : 10.1146/annurev-statistics-060116-053803
URL : http://arxiv.org/pdf/1606.02359

M. Elad and M. Aharon, Image denoising via sparse and redundant representations over learned dictionaries, IEEE Transactions on Image Processing, vol.15, issue.12, pp.3736-3745, 2006.
DOI : 10.1109/tip.2006.881969

M. Elad, J. Starck, P. Querre, and D. L. Donoho, Simultaneous cartoon and texture image inpainting using morphological component analysis (MCA), Applied and Computational Harmonic Analysis, vol.19, issue.3, pp.340-358, 2005.
DOI : 10.1016/j.acha.2005.03.005
URL : https://doi.org/10.1016/j.acha.2005.03.005

E. Elhamifar, G. Sapiro, and R. Vidal, See all by looking at a few: Sparse modeling for finding representative objects, 2012 IEEE Conference on Computer Vision and Pattern Recognition, pp.1600-1607, 2012.
DOI : 10.1109/cvpr.2012.6247852
URL : http://www.cis.jhu.edu/~ehsan/Downloads/SMRS-CVPR12-Ehsan.pdf

E. Elhamifar and R. Vidal, Sparse subspace clustering: Algorithm, theory, and applications, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.35, pp.2765-2781, 2013.
DOI : 10.1109/tpami.2013.57
URL : http://arxiv.org/pdf/1203.1005

J. M. Fadili, G. Peyré, S. Vaiter, C. Deledalle, and J. Salmon, Stable recovery with analysis decomposable priors, Proc. SampTA'13, pp.113-116, 2013.
URL : https://hal.archives-ouvertes.fr/hal-00926727

A. Forsgren, P. E. Gill, and E. Wong, Primal and dual active-set methods for convex quadratic programming, Mathematical Programming, vol.159, issue.1, pp.469-508, 2016.
DOI : 10.1007/s10107-015-0966-2
URL : http://arxiv.org/pdf/1503.08349

R. Foygel and L. Mackey, Corrupted sensing: Novel guarantees for separating structured signals, IEEE Transactions on Information Theory, vol.60, issue.2, pp.1223-1247, 2014.
DOI : 10.1109/tit.2013.2293654
URL : http://arxiv.org/pdf/1305.2524.pdf

R. Foygel, N. Srebro, and R. R. Salakhutdinov, Matrix reconstruction with the local max norm, Advances in Neural Information Processing Systems, pp.935-943, 2012.

V. Franc, S. Sonnenburg, and T. Werner, Cutting plane methods in machine learning, Optimization for Machine Learning, 2011.

M. Frank and P. Wolfe, An algorithm for quadratic programming, Naval Research Logistics (NRL), vol.3, issue.1-2, pp.95-110, 1956.
DOI : 10.1002/nav.3800030109

M. P. Friedlander, I. Macedo, and T. K. Pong, Gauge optimization and duality, SIAM Journal on Optimization, vol.24, issue.4, pp.1999-2022, 2014.
DOI : 10.1137/130940785
URL : http://arxiv.org/pdf/1310.2639

J. Friedman, T. Hastie, and R. Tibshirani, Sparse inverse covariance estimation with the graphical lasso, Biostatistics, vol.9, issue.3, pp.432-441, 2008.
DOI : 10.1093/biostatistics/kxm045
URL : https://academic.oup.com/biostatistics/article-pdf/9/3/432/17742149/kxm045.pdf

J. Friedman, T. Hastie, and R. Tibshirani, Regularization paths for generalized linear models via coordinate descent, Journal of Statistical Software, vol.33, issue.1, p.1, 2010.
DOI : 10.18637/jss.v033.i01
URL : https://www.jstatsoft.org/index.php/jss/article/view/v033i01/v33i01.pdf

N. Friedman, Inferring cellular networks using probabilistic graphical models, Science, vol.303, issue.5659, pp.799-805, 2004.
DOI : 10.1126/science.1094068

D. Geiger, T. Verma, and J. Pearl, Identifying independence in Bayesian networks, Networks, vol.20, issue.5, pp.507-534, 1990.

Q. Gu and A. Banerjee, High dimensional structured superposition models, Advances In Neural Information Processing Systems, pp.3691-3699, 2016.

Q. Gu, Z. Wang, and H. Liu, Low-rank and sparse structure pursuit via alternating minimization, Proceedings of the 19th International Conference on Artificial Intelligence and Statistics, pp.600-609, 2016.

B. D. Haeffele and R. Vidal, Global optimality in tensor factorization, deep learning, and beyond, 2015.

Z. Harchaoui, A. Juditsky, and A. Nemirovski, Conditional gradient algorithms for norm-regularized smooth convex optimization, Mathematical Programming, vol.152, issue.1-2, pp.75-112, 2015.
DOI : 10.1007/s10107-014-0778-9
URL : https://hal.archives-ouvertes.fr/hal-00978368

D. Heckerman, D. Geiger, and D. M. Chickering, Learning Bayesian networks: The combination of knowledge and statistical data, Machine Learning, vol.20, issue.3, pp.197-243, 1995.

M. Hong, X. Wang, M. Razaviyayn, and Z. Luo, Iteration complexity analysis of block coordinate descent methods, 2013.
DOI : 10.1007/s10107-016-1057-8
URL : http://arxiv.org/pdf/1310.6957

M. J. Hosseini and S. Lee, Learning sparse Gaussian graphical models with overlapping blocks, Advances in Neural Information Processing Systems, pp.3808-3816, 2016.

J. Z. Huang, N. Liu, M. Pourahmadi, and L. Liu, Covariance matrix selection and estimation via penalised normal likelihood, Biometrika, vol.93, issue.1, pp.85-98, 2006.

A. Hyvärinen, Estimation of non-normalized statistical models by score matching, Journal of Machine Learning Research, vol.6, pp.695-709, 2005.

L. Jacob, G. Obozinski, and J. Vert, Group lasso with overlap and graph lasso, ICML, 2009.

M. Jaggi, Revisiting Frank-Wolfe: Projection-free sparse convex optimization, ICML (1), pp.427-435, 2013.

A. Jalali, C. C. Johnson, and P. K. Ravikumar, On learning discrete graphical models using greedy methods, Advances in Neural Information Processing Systems, pp.1935-1943, 2011.

A. Jalali, S. Sanghavi, C. Ruan, and P. K. Ravikumar, A dirty model for multi-task learning, Advances in Neural Information Processing Systems, pp.964-972, 2010.

R. Jenatton, J. Audibert, and F. Bach, Structured variable selection with sparsity-inducing norms, JMLR, vol.12, pp.2777-2824, 2011.
URL : https://hal.archives-ouvertes.fr/inria-00377732

R. Jenatton, J. Mairal, G. Obozinski, and F. Bach, Proximal methods for hierarchical sparse coding, JMLR, vol.12, pp.2297-2334, 2011.
URL : https://hal.archives-ouvertes.fr/inria-00516723

A. B. Kahn, Topological sorting of large networks, Communications of the ACM, vol.5, issue.11, pp.558-562, 1962.

H. Karimi, J. Nutini, and M. Schmidt, Linear convergence of gradient and proximal-gradient methods under the Polyak-Łojasiewicz condition, Joint European Conference on Machine Learning and Knowledge Discovery in Databases, pp.795-811, 2016.

T. G. Kolda and B. W. Bader, Tensor decompositions and applications, SIAM Review, vol.51, issue.3, pp.455-500, 2009.

D. Koller and N. Friedman, Probabilistic graphical models: principles and techniques, 2009.

M. Kowalski, P. Weiss, A. Gramfort, and S. Anthoine, Accelerating ISTA with an active set strategy, OPT 2011: 4th International Workshop on Optimization for Machine Learning, p.7, 2011.
URL : https://hal.archives-ouvertes.fr/hal-00696992

V. Krishnamurthy, S. D. Ahipasaoglu, and A. d'Aspremont, A pathwise algorithm for covariance selection, Optimization for Machine Learning, pp.479-494, 2011.

S. Lacoste-Julien and M. Jaggi, On the global linear convergence of Frank-Wolfe optimization variants, Advances in Neural Information Processing Systems, vol.28, pp.496-504, 2015.
URL : https://hal.archives-ouvertes.fr/hal-01248675

S. Lacoste-Julien, M. Jaggi, M. Schmidt, and P. Pletscher, Block-coordinate Frank-Wolfe optimization for structural SVMs, 2012.
URL : https://hal.archives-ouvertes.fr/hal-00720158

W. Lam and F. Bacchus, Using causal information and local measures to learn Bayesian networks, Uncertainty in Artificial Intelligence, pp.243-250, 1993.

T. Larsson, A. Migdalas, and M. Patriksson, A generic column generation principle: derivation and convergence analysis, Operational Research, vol.15, issue.2, pp.163-198, 2015.

S. L. Lauritzen, Graphical models, vol.17, 1996.

D. D. Lee and H. S. Seung, Learning the parts of objects by non-negative matrix factorization, Nature, vol.401, issue.6755, p.788, 1999.

S. Lee, V. Ganapathi, and D. Koller, Efficient structure learning of Markov networks using L1-regularization, Advances in Neural Information Processing Systems, pp.817-824, 2007.

E. Levina, A. Rothman, and J. Zhu, Sparse estimation of large covariance matrices via a nested lasso penalty, The Annals of Applied Statistics, pp.245-263, 2008.

F. Li and Y. Yang, Using modified lasso regression to learn large undirected graphs in a probabilistic framework, Proceedings of the National Conference on Artificial Intelligence, vol.20, p.801, 2005.

L. Lin, M. Drton, and A. Shojaie, Estimation of high-dimensional graphical models using regularized score matching, Electronic Journal of Statistics, vol.10, issue.1, pp.806-854, 2016.
DOI : 10.1214/16-ejs1126
URL : https://doi.org/10.1214/16-ejs1126

G. Liu, Z. Lin, and Y. Yu, Robust subspace segmentation by low-rank representation, Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp.663-670, 2010.

J. Liu, P. Musialski, P. Wonka, and J. Ye, Tensor completion for estimating missing values in visual data, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.35, issue.1, pp.208-220, 2013.

F. Locatello, R. Khanna, M. Tschannen, and M. Jaggi, A Unified Optimization View on Generalized Matching Pursuit and Frank-Wolfe, Proceedings of the 20th International Conference on Artificial Intelligence and Statistics, vol.54, pp.860-868, 2017.

F. Locatello, M. Tschannen, G. Rätsch, and M. Jaggi, Greedy algorithms for cone constrained optimization with convergence guarantees, Advances in Neural Information Processing Systems, pp.773-784, 2017.

J. Mairal, F. Bach, and J. Ponce, Sparse modeling for image and vision processing, Foundations and Trends® in Computer Graphics and Vision, vol.8, issue.2-3, pp.85-283, 2014.
DOI : 10.1561/0600000058
URL : https://hal.archives-ouvertes.fr/hal-01081139

A. Maurer and M. Pontil, Structured sparsity and generalization, The Journal of Machine Learning Research, vol.13, issue.1, pp.671-690, 2012.

M. B. McCoy, V. Cevher, Q. T. Dinh, A. Asaei, and L. Baldassarre, Convexity in source separation: Models, geometry, and algorithms, IEEE Signal Processing Magazine, vol.31, issue.3, pp.87-95, 2014.

M. B. McCoy and J. A. Tropp, The achievable performance of convex demixing, 2013.

M. B. McCoy and J. A. Tropp, Sharp recovery bounds for convex demixing, with applications, Foundations of Computational Mathematics, vol.14, issue.3, pp.503-567, 2014.

N. Meinshausen and P. Bühlmann, High-dimensional graphs and variable selection with the lasso, The Annals of Statistics, pp.1436-1462, 2006.

Z. Meng, B. Eriksson, and A. Hero, Learning latent variable Gaussian graphical models, Proceedings of the 31st International Conference on Machine Learning (ICML-14), pp.1269-1277, 2014.

B. Moghaddam, Y. Weiss, and S. Avidan, Spectral bounds for sparse PCA: Exact and greedy algorithms, Advances in Neural Information Processing Systems, pp.915-922, 2006.

J. Moreau, Fonctions convexes duales et points proximaux dans un espace hilbertien, CR Acad. Sci. Paris Ser. A Math, vol.255, pp.2897-2899, 1962.
URL : https://hal.archives-ouvertes.fr/hal-01867195

J. Moreau, Proximité et dualité dans un espace hilbertien, Bulletin de la Société Mathématique de France, vol.93, pp.273-299, 1965.
DOI : 10.24033/bsmf.1625
URL : http://www.numdam.org/article/BSMF_1965__93__273_0.pdf

E. Ndiaye, O. Fercoq, A. Gramfort, and J. Salmon, Gap safe screening rules for sparsity enforcing penalties, J. Mach. Learn. Res, vol.18, issue.128, pp.1-33, 2017.

S. N. Negahban, P. Ravikumar, M. J. Wainwright, and B. Yu, A unified framework for high-dimensional analysis of m-estimators with decomposable regularizers, Statistical Science, vol.27, issue.4, pp.538-557, 2012.

Y. Nesterov, Efficiency of coordinate descent methods on huge-scale optimization problems, SIAM Journal on Optimization, vol.22, issue.2, pp.341-362, 2012.

Y. Nesterov, Complexity bounds for primal-dual methods minimizing the model of objective function, 2015.

J. Nocedal and S. Wright, Numerical optimization, 2006.

G. Obozinski and F. Bach, Convex relaxation for combinatorial penalties, 2012.
URL : https://hal.archives-ouvertes.fr/hal-00694765

G. Obozinski and F. Bach, A unified perspective on convex structured sparsity: Hierarchical, symmetric, submodular norms and beyond, 2016.
URL : https://hal.archives-ouvertes.fr/hal-01412385

G. Obozinski, L. Jacob, and J. Vert, Group Lasso with overlaps: the Latent Group Lasso approach, 2011.
URL : https://hal.archives-ouvertes.fr/inria-00628498

G. Obozinski, B. Taskar, and M. Jordan, Multi-task feature selection, Statistics Department, 2006.

G. Obozinski, B. Taskar, and M. I. Jordan, Joint covariate selection and joint subspace selection for multiple classification problems, Statistics and Computing, vol.20, issue.2, pp.231-252, 2010.

F. Ong and M. Lustig, Beyond low rank + sparse: Multiscale low rank matrix decomposition, IEEE Journal of Selected Topics in Signal Processing, vol.10, issue.4, pp.672-687, 2016.

R. K. Pace and R. Barry, Sparse spatial autoregressions, Statistics and Probability Letters, vol.33, issue.3, pp.291-297, 1997.

Z. Qin, K. Scheinberg, and D. Goldfarb, Efficient block-coordinate descent algorithms for the group lasso, Mathematical Programming Computation, vol.5, issue.2, pp.143-169, 2013.

N. Rao, P. Shah, and S. Wright, Forward-Backward Greedy Algorithms for Atomic Norm Regularization, IEEE Transactions on Signal Processing, vol.63, issue.21, pp.5798-5811, 2015.

P. Ravikumar, M. J. Wainwright, and J. Lafferty, High-dimensional graphical model selection using l1-regularized logistic regression, Annals of Statistics, 2009.

A. Ray, S. Sanghavi, and S. Shakkottai, Improved greedy algorithms for learning graphical models, IEEE Transactions on Information Theory, vol.61, issue.6, pp.3457-3468, 2015.

E. Richard, F. R. Bach, and J. Vert, Intersecting singularities for multi-structured estimation, ICML (3), pp.1157-1165, 2013.
URL : https://hal.archives-ouvertes.fr/hal-00918253

E. Richard, G. R. Obozinski, and J. Vert, Tight convex relaxations for sparse matrix factorization, Advances in Neural Information Processing Systems, pp.3284-3292, 2014.
URL : https://hal.archives-ouvertes.fr/hal-01101878

R. Rockafellar, Convex Analysis, 1970.

V. Roth and B. Fischer, The group-lasso for generalized linear models: uniqueness of solutions and efficient algorithms, Proceedings of the 25th international conference on Machine learning, pp.848-855, 2008.

A. J. Rothman, P. J. Bickel, E. Levina, and J. Zhu, Sparse permutation invariant covariance estimation, Electronic Journal of Statistics, vol.2, pp.494-515, 2008.
DOI : 10.1214/08-ejs176
URL : https://doi.org/10.1214/08-ejs176

M. Schmidt, A. Niculescu-Mizil, and K. Murphy, Learning graphical model structure using L1-regularization paths, AAAI, vol.7, pp.1278-1283, 2007.

S. Shalev-Shwartz and A. Tewari, Stochastic methods for l1-regularized loss minimization, Journal of Machine Learning Research, vol.12, pp.1865-1892, 2011.

Y. She and H. Jiang, Group regularized estimation under structural hierarchy, 2014.

P. Spirtes and C. Glymour, An algorithm for fast recovery of sparse causal graphs, Social Science Computer Review, vol.9, issue.1, pp.62-72, 1991.

K. M. Tan, P. London, K. Mohan, S. Lee, M. Fazel et al., Learning graphical models with hubs, Journal of Machine Learning Research, vol.15, issue.1, pp.3297-3331, 2014.

S. Tao, Y. Sun, and D. Boley, Inverse covariance estimation with structured groups, 26th International Joint Conference on Artificial Intelligence, 2017.

R. Tibshirani, Regression shrinkage and selection via the Lasso, J. Roy. Stat. Soc. B, vol.58, issue.1, 1996.

A. N. Tikhonov, On the solution of ill-posed problems and the method of regularization, Doklady Akademii Nauk, vol.151, pp.501-504, 1963.

R. Tomioka and T. Suzuki, Convex tensor decomposition via structured Schatten norm regularization, Advances in Neural information Processing Systems, pp.1331-1339, 2013.

J. A. Tropp, Just relax: Convex programming methods for subset selection and sparse approximation, p.404, 2004.

P. Tseng, Convergence of a block coordinate descent method for nondifferentiable minimization, Journal of Optimization Theory and Applications, vol.109, issue.3, pp.475-494, 2001.

S. Vaiter, M. Golbabaee, J. Fadili, and G. Peyré, Model selection with low complexity priors, Information and Inference: A Journal of the IMA, vol.4, issue.3, pp.230-287, 2015.
URL : https://hal.archives-ouvertes.fr/hal-00842603

S. Vaiter, G. Peyré, and J. Fadili, Low complexity regularization of linear inverse problems, Sampling Theory, a Renaissance, pp.103-153, 2015.
URL : https://hal.archives-ouvertes.fr/hal-01018927

R. S. Varga, On diagonal dominance arguments for bounding $\|A^{-1}\|_\infty$, Linear Algebra and its Applications, vol.14, pp.211-217, 1976.

R. Vidal, Subspace clustering, IEEE Signal Processing Magazine, vol.28, issue.2, pp.52-68, 2011.

M. Vinyes and G. Obozinski, Fast column generation for atomic norm regularization, Proceedings of the 20th International Conference on Artificial Intelligence and Statistics, vol.54, pp.547-556, 2017.
URL : https://hal.archives-ouvertes.fr/hal-01502575

M. Vinyes and G. Obozinski, Learning the effect of latent variables in gaussian graphical models with unobserved variables, 2018.

K. Wimalawarne, M. Sugiyama, and R. Tomioka, Multitask learning meets tensor factorization: task imputation via convex optimization, Advances in Neural Information Processing Systems, vol.27, pp.2825-2833, 2014.

K. Wimalawarne, R. Tomioka, and M. Sugiyama, Theoretical and experimental analyses of tensor-based regression and classification, Neural Computation, vol.28, issue.4, pp.686-715, 2016.

P. Wolfe, Convergence theory in nonlinear programming, Integer and Nonlinear Programming, pp.1-36, 1970.

P. Wolfe, Finding the nearest point in a polytope, Mathematical Programming, vol.11, issue.1, pp.128-149, 1976.

J. Wright, A. Ganesh, K. Min, and Y. Ma, Compressive principal component pursuit, Information and Inference: A Journal of the IMA, vol.2, issue.1, pp.32-68, 2013.

S. J. Wright, Coordinate descent algorithms, Mathematical Programming, vol.151, pp.3-34, 2015.

H. Xu, C. Caramanis, and S. Sanghavi, Robust PCA via outlier pursuit, Advances in Neural Information Processing Systems, pp.2496-2504, 2010.

P. Xu, J. Ma, and Q. Gu, Speeding up latent variable Gaussian graphical model estimation via nonconvex optimization, Advances in Neural Information Processing Systems, pp.1930-1941, 2017.

X. Yan and J. Bien, Hierarchical sparse modeling: A choice of two regularizers, 2015.

C. You, C. Li, D. P. Robinson, and R. Vidal, Oracle based active set algorithm for scalable elastic net subspace clustering, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp.3928-3937, 2016.

Y. Yu, X. Zhang, and D. Schuurmans, Generalized conditional gradient for sparse estimation, 2014.

M. Yuan and Y. Lin, Model selection and estimation in regression with grouped variables, Journal of The Royal Statistical Society Series B, vol.68, issue.1, pp.49-67, 2006.

M. Yuan and Y. Lin, Model selection and estimation in the Gaussian graphical model, Biometrika, pp.19-35, 2007.

X. Yuan and T. Zhang, Truncated power method for sparse eigenvalue problems, Journal of Machine Learning Research, vol.14, pp.899-925, 2013.

Y. Zhang, A. d'Aspremont, and L. El Ghaoui, Sparse PCA: Convex relaxations, algorithms and applications, Handbook on Semidefinite, Conic and Polynomial Optimization, pp.915-940, 2012.

Y. Zhou, Structure learning of probabilistic graphical models: a comprehensive survey, 2011.

H. Zou and T. Hastie, Regularization and variable selection via the elastic net, Journal of the Royal Statistical Society: Series B (Statistical Methodology), vol.67, issue.2, pp.301-320, 2005.

H. Zou, T. Hastie, and R. Tibshirani, Sparse principal component analysis, Journal of Computational and Graphical Statistics, vol.15, issue.2, pp.265-286, 2006.