Publications and Summary of Chapters

Publications. The contributions of this thesis have been published and presented at statistical learning conferences and journals:

Authors: E. Ndiaye, O. Fercoq, A. Gramfort, J. Salmon.

Gap Safe Screening Rules for Sparse Multi-task and Multi-class Models, Advances in Neural Information Processing Systems, pp. 811-819, 2015.

Gap Safe Screening Rules for Sparse-Group Lasso, Advances in Neural Information Processing Systems, pp. 388-396, 2016.

Gap Safe Screening Rules for Sparsity Enforcing Penalties, The Journal of Machine Learning Research, vol. 18, issue 1, pp. 4671-4703, 2017.

Authors: E. Ndiaye, O. Fercoq, A. Gramfort, V. Leclère, J. Salmon.

Efficient Smoothed Concomitant Lasso Estimation for High Dimensional Regression, Journal of Physics: Conference Series, vol. 904, issue 1, p. 012006, 2017.

Authors: E. Ndiaye, T. Le, O. Fercoq, J. Salmon, I. Takeuchi, 2018.

We present the results obtained in the different chapters of the thesis as follows.

Notation. The optimization variable is a vector $\beta = (\beta_1, \ldots, \beta_p)^\top$ admitting a group structure. A group of features is a subset $g \subseteq [p]$ and $|g|$ is its cardinality.

We denote by $\beta_g$ the vector in $\mathbb{R}^{|g|}$ that is the restriction of $\beta$ to the indices in $g$. We also use the notation $X_g \in \mathbb{R}^{n \times n_g}$ for the submatrix of $X$ assembled from the columns with indices $j \in g$, and simply $X_j$ when the groups contain a single feature. The set of groups is denoted $\mathcal{G}$, and we focus only on non-overlapping groups that form a partition of $[p]$.
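To make this group notation concrete, here is a minimal NumPy sketch; the dimensions, the partition, and the variable names are illustrative assumptions, not taken from the thesis:

    import numpy as np

    # Illustrative sizes: n observations, p features.
    n, p = 10, 6
    rng = np.random.default_rng(0)
    X = rng.standard_normal((n, p))   # design matrix X in R^{n x p}
    beta = rng.standard_normal(p)     # optimization variable beta in R^p

    # A partition of [p] into non-overlapping groups G (0-indexed here,
    # whereas the text indexes features from 1).
    G = [np.array([0, 1]), np.array([2, 3, 4]), np.array([5])]

    for g in G:
        beta_g = beta[g]              # restriction of beta to g, in R^{|g|}
        X_g = X[:, g]                 # submatrix X_g in R^{n x n_g}
        assert X_g.shape == (n, len(g))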

Bibliography

H. Akaike, A new look at the statistical model identification, IEEE Transactions on Automatic Control, 1974.

E. L. Allgower and K. Georg, Numerical continuation methods: an introduction, 2012.

A. Antoniadis, Comments on: ℓ1-penalization for mixture regression models, TEST, 2010.
URL : https://hal.archives-ouvertes.fr/hal-00853947

A. Antoniadis, I. Gijbels, S. Lambert-Lacroix, and J. Poggi, Joint estimation and variable selection for mean and dispersion in proper dispersion models, Electronic Journal of Statistics, 2016.

A. Argyriou, T. Evgeniou, and M. Pontil, Multi-task feature learning. NIPS, 2006.

A. Argyriou, T. Evgeniou, and M. Pontil, Convex multi-task feature learning, Machine Learning, 2008.

S. Arlot and A. Celisse, A survey of cross-validation procedures for model selection, Statistics surveys, 2010.
URL : https://hal.archives-ouvertes.fr/hal-00407906

D. Azé and J. Penot, Uniformly convex and uniformly smooth convex functions. Annales de la faculté des sciences de Toulouse, 1995.

F. Bach, R. Jenatton, J. Mairal, and G. Obozinski, Optimization with sparsity-inducing penalties. Foundations and Trends in Machine Learning, 2012.
URL : https://hal.archives-ouvertes.fr/hal-00613125

H. H. Bauschke and P. L. Combettes, Convex analysis and monotone operator theory in Hilbert spaces, 2011.
URL : https://hal.archives-ouvertes.fr/hal-01517477

A. Beck and M. Teboulle, A fast iterative shrinkage-thresholding algorithm for linear inverse problems, SIAM journal on imaging sciences, 2009.

A. Beck and M. Teboulle, Smoothing and first order methods: A unified framework, SIAM Journal on Optimization, 2012.

S. R. Becker, E. J. Candès, and M. C. Grant, Templates for convex cone problems with applications to sparse signal recovery, Mathematical Programming Computation, 2011.

S. Behnel, R. Bradshaw, C. Citro, L. Dalcin, D. S. Seljebotn et al., Cython: The best of both worlds, Computing in Science & Engineering, 2011.

V. Bellon, V. Stoven, and C. Azencott, Multitask feature selection with task descriptors, Biocomputing 2016: Proceedings of the Pacific Symposium, 2016.
URL : https://hal.archives-ouvertes.fr/hal-01246697

A. Belloni, V. Chernozhukov, and L. Wang, Square-root lasso: Pivotal recovery of sparse signals via conic programming, Biometrika, 2011.

P. J. Bickel, Y. Ritov, and A. B. Tsybakov, Simultaneous analysis of Lasso and Dantzig selector. The Annals of Statistics, 2009.
URL : https://hal.archives-ouvertes.fr/hal-00401585

A. Bonnefoy, V. Emiya, L. Ralaivola, and R. Gribonval, A dynamic screening principle for the lasso, 2014.
URL : https://hal.archives-ouvertes.fr/hal-00880787

A. Bonnefoy, V. Emiya, L. Ralaivola, and R. Gribonval, Dynamic screening: Accelerating first-order algorithms for the lasso and group-lasso, IEEE Trans. Signal Process, 2015.
URL : https://hal.archives-ouvertes.fr/hal-01084986

J. M. Borwein and H. Wolkowicz, Facial reduction for a cone-convex programming problem, Journal of the Australian Mathematical Society, 1981.

L. Bottou, F. E. Curtis, and J. Nocedal, Optimization methods for large-scale machine learning, 2016.

O. Bousquet and L. Bottou, The tradeoffs of large scale learning, NIPS, 2008.

A. L. Brearley, G. Mitra, and H. P. Williams, Analysis of mathematical programming problems prior to applying the simplex algorithm. Mathematical programming, 1975.

L. Breiman, Better subset regression using the nonnegative garrote, 1995.

L. D. Brown, Fundamentals of statistical exponential families: with applications in statistical decision theory, 1986.

P. Bühlmann and S. van de Geer, Statistics for high-dimensional data, Springer Series in Statistics, 2011.

O. Burdakov, A new vector norm for nonlinear curve fitting and some other optimization problems, 33. Int. Wiss. Kolloq. Fortragsreihe "Mathematische Optimierung - Theorie und Anwendungen", 1988.

O. Burdakov and B. Merkulov, On a new norm for data fitting and optimization problems, Tech. Rep. LiTH-MAT, 2001.

S. Chatterjee, K. Steinhaeuser, A. Banerjee, S. Chatterjee, and A. Ganguly, Sparse group lasso: Consistency and climate applications, SIAM International Conference on Data Mining, 2012.

S. S. Chen and D. L. Donoho, Atomic decomposition by basis pursuit, 1995.

S. Chrétien and S. Darses, Sparse recovery with unknown variance: a lasso-type approach, IEEE Trans. Inf. Theory, 2011.

K. L. Clarkson, Coresets, sparse greedy approximation, and the Frank-Wolfe algorithm, ACM Transactions on Algorithms, p. 63, 2010.

P. L. Combettes, Perspective functions: Properties, constructions, and examples. Set-Valued and Variational Analysis, 2016.

P. L. Combettes and C. L. Müller, Perspective maximum likelihood-type estimation via proximal decomposition, 2018.

P. L. Combettes and J. Pesquet, Proximal splitting methods in signal processing. In Fixed-point algorithms for inverse problems in science and engineering, 2011.
URL : https://hal.archives-ouvertes.fr/hal-00643807

L. Condat, Fast projection onto the simplex and the ℓ1 ball, Mathematical Programming, 2016.
URL : https://hal.archives-ouvertes.fr/hal-01056171

C. F. Dantas and R. Gribonval, Dynamic screening with approximate dictionaries, XXVIème colloque GRETSI, 2017.
URL : https://hal.archives-ouvertes.fr/hal-01598021

C. F. Dantas and R. Gribonval, Faster and still safe: combining screening techniques and structured dictionaries to accelerate the lasso, ICASSP 2018-IEEE International Conference on Acoustics, Speech and Signal Processing, 2018.
URL : https://hal.archives-ouvertes.fr/hal-01706392

S. Diamond and S. Boyd, CVXPY: A Python-embedded modeling language for convex optimization, J. Mach. Learn. Res, 2016.

L. Dicker, Variance estimation in high-dimensional linear models, Biometrika, 2014.

D. Drusvyatskiy and H. Wolkowicz, The many faces of degeneracy in conic optimization. Foundations and Trends in Optimization, 2017.

C. Dünner, S. Forte, M. Takáč, and M. Jaggi, Primal-dual rates and certificates, ICML, 2016.

A. Ebadian, I. Nikoufar, and M. E. Gordji, Perspectives of matrix convex functions, Proceedings of the National Academy of Sciences, 2011.

E. G. Effros, A matrix convexity approach to some celebrated quantum inequalities, Proceedings of the National Academy of Sciences, 2009.

B. Efron, T. Hastie, I. M. Johnstone, and R. Tibshirani, Least angle regression. The Annals of Statistics, 2004.

M. A. Efroymson, Multiple regression analysis. Mathematical methods for digital computers, pp.191-203, 1960.

L. E. Ghaoui, V. Viallon, and T. Rabbani, Safe feature elimination in sparse supervised learning, J. Pacific Optim, 2012.

J. Fan and J. Lv, Sure independence screening for ultrahigh dimensional feature space, 2008.

J. Fan, S. Guo, and N. Hao, Variance estimation using refitted cross-validation in ultrahigh dimensional regression, J. Roy. Statist. Soc. Ser. B, 2012.

R. Fan, K. Chang, C. Hsieh, X. Wang, and C. Lin, Liblinear: A library for large linear classification, J. Mach. Learn. Res, 2008.

O. Fercoq and P. Richtárik, Accelerated, parallel, and proximal coordinate descent, SIAM Journal on Optimization, 2015.

O. Fercoq, A. Gramfort, and J. Salmon, Mind the duality gap: safer rules for the lasso, ICML, pp. 333-342, 2015.

M. Frank and P. Wolfe, An algorithm for quadratic programming, Naval Research Logistics (NRL), 1956.

J. Friedman, T. Hastie, H. Höfling, and R. Tibshirani, Pathwise coordinate optimization. The Annals of Applied Statistics, 2007.

J. Friedman, T. Hastie, and R. Tibshirani, A note on the group lasso and a sparse group lasso, 2010.

J. Friedman, T. Hastie, and R. Tibshirani, Regularization paths for generalized linear models via coordinate descent, Journal of statistical software, 2010.

B. Gärtner, M. Jaggi, and C. Maria, An exponential lower bound on the complexity of regularization paths, Journal of Computational Geometry, 2012.

J. Giesen, M. Jaggi, and S. Laue, Approximating parameterized convex optimization problems, European Symposium on Algorithms, 2010.

J. Giesen, J. K. Müller, S. Laue, and S. Swiercy, Approximating concavely parameterized optimization problems, NIPS, 2012.

T. Goldstein and S. Osher, The split Bregman method for L1-regularized problems, SIAM J. Imaging Sci, 2009.

A. Gramfort, M. Kowalski, and M. Hämäläinen, Mixed-norm estimates for the M/EEG inverse problem using accelerated gradient methods, Phys. Med. Biol, 2012.
URL : https://hal.archives-ouvertes.fr/hal-00690774

A. Gramfort, D. Strohmeier, J. Haueisen, M. S. Hämäläinen, and M. Kowalski, Time-frequency mixed-norm estimates: Sparse M/EEG imaging with non-stationary source activations, NeuroImage, 2013.
URL : https://hal.archives-ouvertes.fr/hal-00773276

I. Guyon, J. Weston, S. Barnhill, and V. Vapnik, Gene selection for cancer classification using support vector machines. Machine learning, 2002.

T. Hastie, S. Rosset, R. Tibshirani, and J. Zhu, The entire regularization path for the support vector machine, J. Mach. Learn. Res, 2004.

T. Hastie, R. Tibshirani, and J. Friedman, The elements of statistical learning, 2009.

M. Hebiri and S. van de Geer, The smooth-lasso and other ℓ1 + ℓ2-penalized methods, Electronic Journal of Statistics, 2011.

J. Hiriart-Urruty, A note on the Legendre-Fenchel transform of convex composite functions, Nonsmooth Mechanics and Analysis, 2006.

J. Hiriart-Urruty and C. Lemaréchal, Convex analysis and minimization algorithms. II, 1993.

J. Hiriart-Urruty and C. Lemaréchal, Fundamentals of convex analysis, 2012.

A. E. Hoerl, Application of ridge analysis to regression problems, Chemical Engineering Progress, 1962.

A. E. Hoerl and R. W. Kennard, Ridge regression: Biased estimation for nonorthogonal problems, 1970.

P. J. Huber, Robust estimation of a location parameter. The annals of mathematical statistics, 1964.

P. J. Huber, Robust Statistics, 1981.

P. J. Huber and R. Dutter, Numerical solution of robust regression problems, Compstat 1974 (Proc. Sympos. Computational Statist., Univ. Vienna), pp. 165-172, 1974.

D. R. Hunter and K. Lange, A tutorial on MM algorithms, The American Statistician, 2004.

R. Jenatton, J. Mairal, G. Obozinski, and F. Bach, Proximal methods for hierarchical sparse coding, J. Mach. Learn. Res, 2011.
URL : https://hal.archives-ouvertes.fr/inria-00516723

T. B. Johnson and C. Guestrin, Blitz: A principled meta-algorithm for scaling sparse optimization, ICML, 2015.

T. B. Johnson and C. Guestrin, Unified methods for exploiting piecewise linear structure in convex optimization, NIPS, 2016.

B. Jorgensen, The theory of dispersion models, 1997.

A. Juditski and Y. Nesterov, Primal-dual subgradient methods for minimizing uniformly convex functions, 2014.

E. Kalnay, M. Kanamitsu, R. Kistler, W. Collins, D. Deaven et al., The NCEP/NCAR 40-year reanalysis project, 1996.

K. Koh, S. Kim, and S. Boyd, An interior-point method for large-scale ℓ1-regularized logistic regression, J. Mach. Learn. Res, 2007.

K. Lange, D. R. Hunter, and I. Yang, Optimization transfer using surrogate objective functions, Journal of computational and graphical statistics, 2000.

S. C. Larson, The shrinkage of the coefficient of multiple correlation, Journal of Educational Psychology, 1931.

J. D. Lee, Y. Sun, and M. A. Saunders, Proximal Newton-type methods for minimizing composite functions, SIAM Journal on Optimization, 2014.

S. Lee and E. P. Xing, Screening rules for overlapping group lasso, 2014.

S. Lee, J. Zhu, and E. P. Xing, Adaptive multi-task lasso: with application to eqtl detection, NIPS, 2010.

L. Li, K. Jamieson, G. Desalvo, A. Rostamizadeh, and A. Talwalkar, Hyperband: A novel bandit-based approach to hyperparameter optimization, 2016.

X. Li, J. Haupt, R. Arora, H. Liu, M. Hong et al., A first order free lunch for sqrt-lasso, 2016.

J. Liang, J. Fadili, and G. Peyré, Activity identification and local linear convergence of forward-backward-type methods, SIAM Journal on Optimization, 2017.

H. Liu, M. Palatucci, and J. Zhang, Blockwise coordinate descent procedures for the multi-task lasso, with applications to neural semantic basis discovery. ICML, 2009.

J. Liu, Z. Zhao, J. Wang, and J. Ye, Safe screening with variational inequalities and its application to lasso. ICML, 2014.

J. Mairal, Incremental majorization-minimization optimization with application to large-scale machine learning, SIAM Journal on Optimization, 2015.

J. Mairal and B. Yu, Complexity analysis of the lasso regularization path, ICML, 2012.

H. Markowitz, The optimization of a quadratic function subject to linear constraints, Naval Research Logistics (NRL), 1956.

H. Markowitz, Portfolio selection. The journal of finance, 1952.

B. Martinet, Régularisation d'inéquations variationnelles par approximations successives, Revue française d'informatique et de recherche opérationnelle, Série rouge, 1970.

M. Massias, A. Gramfort, and J. Salmon, From safe screening rules to working sets for faster lasso-type solvers, 2017.

M. Massias, O. Fercoq, A. Gramfort, and J. Salmon, Generalized concomitant multi-task lasso for sparse multimodal regression, 2018.
URL : https://hal.archives-ouvertes.fr/hal-01812011

M. Massias, A. Gramfort, and J. Salmon, Celer: a Fast Solver for the Lasso with Dual Extrapolation, ICML, 2018.
URL : https://hal.archives-ouvertes.fr/hal-01833398

P. McCullagh and J. A. Nelder, Generalized Linear Models, 1989.

C. Mészáros and U. H. Suhl, Advanced preprocessing techniques for linear and quadratic programming, OR Spectrum, 2003.

C. Michelot, A finite algorithm for finding the projection of a point onto the canonical simplex of R^n, Journal of Optimization Theory and Applications, 1986.

E. Ndiaye, O. Fercoq, A. Gramfort, and J. Salmon, GAP safe screening rules for sparse multi-task and multi-class models, NIPS, 2015.

E. Ndiaye, O. Fercoq, A. Gramfort, and J. Salmon, GAP safe screening rules for Sparse-Group Lasso, NIPS, 2016.

E. Ndiaye, O. Fercoq, A. Gramfort, V. Leclere, and J. Salmon, Efficient smoothed concomitant Lasso estimation for high dimensional regression, Journal of Physics: Conference Series, 2017.
URL : https://hal.archives-ouvertes.fr/hal-01404966

E. Ndiaye, O. Fercoq, A. Gramfort, and J. Salmon, Gap safe screening rules for sparsity enforcing penalties, J. Mach. Learn. Res, 2017.

Y. Nesterov, Introductory lectures on convex optimization, 2004.

Y. Nesterov, Smooth minimization of non-smooth functions. Mathematical programming, 2005.

Y. Nesterov, Gradient methods for minimizing composite objective function, CORE Discussion Papers, 2007.

Y. Nesterov, Efficiency of coordinate descent methods on huge-scale optimization problems, SIAM Journal on Optimization, 2012.

J. Nutini, I. Laradji, and M. Schmidt, Let's make block coordinate descent go fast: Faster greedy rules, message-passing, active-set complexity, 2017.

G. Obozinski and F. Bach, A unified perspective on convex structured sparsity: Hierarchical, symmetric, submodular norms and beyond, 2016.
URL : https://hal.archives-ouvertes.fr/hal-01412385

G. Obozinski, B. Taskar, and M. I. Jordan, Joint covariate selection and joint subspace selection for multiple classification problems, Statistics and Computing, 2010.

K. Ogawa, Y. Suzuki, and I. Takeuchi, Safe screening of non-support vectors in pathwise svm computation. ICML, 2013.

J. M. Ortega and W. C. Rheinboldt, Iterative solution of nonlinear equations in several variables, 1970.

M. R. Osborne, An effective method for computing regression quantiles, IMA Journal of Numerical Analysis, 1992.

M. R. Osborne, B. Presnell, and B. A. Turlach, On the lasso and its dual, Journal of Computational and Graphical statistics, 2000.

M. R. Osborne, B. Presnell, and B. A. Turlach, A new approach to variable selection in least squares problems, IMA Journal of Numerical Analysis, 2000.

A. B. Owen, A robust hybrid of lasso and ridge regression, Contemporary Mathematics, 2007.

N. Parikh and S. Boyd, Proximal algorithms. Foundations and Trends in Optimization, 2014.

M. Y. Park and T. Hastie, L1-regularization path algorithm for generalized linear models, Journal of the Royal Statistical Society: Series B (Statistical Methodology), 2007.

F. Pedregosa, Hyperparameter optimization with approximate gradient. ICML, 2016.
URL : https://hal.archives-ouvertes.fr/hal-01386410

F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion et al., Scikit-learn: Machine learning in Python, J. Mach. Learn. Res, 2011.
URL : https://hal.archives-ouvertes.fr/hal-00650905

J. Peng, J. Zhu, A. Bergamaschi, W. Han, D. Noh et al., Regularized multivariate regression for identifying master predictors with application to integrative genomics study of breast cancer, The Annals of Applied Statistics, 2010.

J. Peypouquet, Convex optimization in normed spaces: theory, methods and examples, 2015.

V. Pham and L. E. Ghaoui, Robust sketching for multiple square-root LASSO problems, AISTATS, 2015.

B. Playe, C. Azencott, and V. Stoven, Efficient multi-task chemogenomics for drug specificity prediction. bioRxiv, 2018.
URL : https://hal.archives-ouvertes.fr/hal-01984828

A. Raj, J. Olbrich, B. Gärtner, B. Schölkopf, and M. Jaggi, Screening rules for convex problems, 2016.

A. Rakotomamonjy, Variable selection using svm-based criteria, J. Mach. Learn. Res, 2003.

S. Reid, R. Tibshirani, and J. Friedman, A study of error variance estimation in lasso regression, 2013.

P. Richtárik and M. Takáč, Iteration complexity of randomized block-coordinate descent methods for minimizing a composite function, Mathematical Programming, 2014.

H. Robbins and S. Monro, A stochastic approximation method. The annals of mathematical statistics, 1951.

R. T. Rockafellar, Convex analysis, 1997.

S. Rosset and J. Zhu, Piecewise linear regularized solution paths. The Annals of Statistics, 2007.
DOI : 10.1214/009053606000001370

M. Sangnier, O. Fercoq, and F. d'Alché-Buc, Data sparse nonparametric regression with ε-insensitive losses, Asian Conference on Machine Learning, 2017.
URL : https://hal.archives-ouvertes.fr/hal-01593459

E. J. Schlossmacher, An iterative technique for absolute deviations curve fitting, Journal of the American Statistical Association, 1973.
DOI : 10.2307/2284512

G. Schwarz, Estimating the dimension of a model. The annals of statistics, 1978.

D. Scieur, A. d'Aspremont, and F. Bach, Regularized nonlinear acceleration, Advances in Neural Information Processing Systems, 2016.
DOI : 10.1007/s10107-018-1319-8

URL : https://hal.archives-ouvertes.fr/hal-01384682

S. Shalev-Shwartz and S. Ben-David, Understanding machine learning: From theory to algorithms, 2014.
DOI : 10.1017/cbo9781107298019

S. Shalev-Shwartz and T. Zhang, Accelerated proximal stochastic dual coordinate ascent for regularized loss minimization, ICML, 2014.
DOI : 10.1007/s10107-014-0839-0

URL : http://arxiv.org/pdf/1309.2375

A. Shibagaki, Y. Suzuki, M. Karasuyama, and I. Takeuchi, Regularization path of cross-validation error lower bounds, NIPS, 2015.

A. Shibagaki, M. Karasuyama, K. Hatano, and I. Takeuchi, Simultaneous safe screening of features and samples in doubly sparse modeling, 2016.

N. Simon, J. Friedman, T. Hastie, and R. Tibshirani, A sparse-group lasso, J. Comput. Graph. Statist, 2013.
DOI : 10.1080/10618600.2012.681250

P. Sprechmann, I. Ramirez, G. Sapiro, and Y. C. Eldar, Collaborative hierarchical sparse modeling, Information Sciences and Systems (CISS), 2010.
DOI : 10.1109/ciss.2010.5464845

P. Sprechmann, I. Ramirez, G. Sapiro, and Y. C. Eldar, C-hilasso: A collaborative hierarchical sparse modeling framework, IEEE Trans. Signal Process, 2011.
DOI : 10.1109/tsp.2011.2157912

URL : http://arxiv.org/pdf/1006.1346.pdf

N. Städler, P. Bühlmann, and S. van de Geer, ℓ1-penalization for mixture regression models, TEST, 2010.

C. M. Stein, Estimation of the mean of a multivariate normal distribution. The annals of Statistics, 1981.

S. M. Stigler, The epic story of maximum likelihood, Statistical Science, 2007.
DOI : 10.1214/07-sts249

T. Sun and Q. Tran-Dinh, Generalized self-concordant functions: A recipe for Newton-type methods, 2017.
DOI : 10.1007/s10107-018-1282-4

URL : http://arxiv.org/pdf/1703.04599

T. Sun and C. Zhang, Comments on: ℓ1-penalization for mixture regression models, TEST, 2010.

T. Sun and C. Zhang, Scaled sparse linear regression, Biometrika, 2012.

T. Sun and C. Zhang, Sparse matrix inversion with scaled lasso, J. Mach. Learn. Res, 2013.

R. Tibshirani, Regression shrinkage and selection via the lasso, Journal of the Royal Statistical Society. Series B (Methodological), 1996.

R. Tibshirani, J. Bien, J. Friedman, T. Hastie, N. Simon et al., Strong rules for discarding predictors in lasso-type problems, Journal of the Royal Statistical Society: Series B (Statistical Methodology), 2012.

R. J. Tibshirani, The lasso problem and uniqueness, Electronic Journal of Statistics, 2013.

A. N. Tikhonov, On the stability of inverse problems, Dokl. Akad. Nauk SSSR, vol.39, pp.176-179, 1943.

S. van de Geer and B. Stucky, χ2-confidence sets in high-dimensional regression, Statistical Analysis for High-Dimensional Data, 2016.

A. W. van der Vaart, Asymptotic statistics, 1998.

J. Wang and J. Ye, Two-layer feature reduction for sparse-group lasso via decomposition of convex sets, NIPS, 2014.

J. Wang, J. Zhou, J. Liu, P. Wonka, and J. Ye, A safe screening rule for sparse logistic regression, NIPS, 2014.

J. Wang, P. Wonka, and J. Ye, Lasso screening rules via dual polytope projection, J. Mach. Learn. Res, 2015.

J. Warga, Minimizing certain convex functions, Journal of the Society for Industrial and Applied Mathematics, 1963.

S. J. Wright, Coordinate descent algorithms, 2015.

D. Wrinch and H. Jeffreys, XLII. On certain fundamental principles of scientific inquiry, The London, Edinburgh, and Dublin Philosophical Magazine and Journal of Science, 1921.

Z. J. Xiang, H. Xu, and P. J. Ramadge, Learning sparse representations of high dimensional data on large scale dictionaries, NIPS, 2011.

H. Xu, C. Caramanis, and S. Mannor, Robust regression and lasso, IEEE Trans. Inf. Theory, 2010.

Q. Xu, S. J. Pan, H. Xue, and Q. Yang, Multitask learning for protein subcellular location prediction, IEEE/ACM Trans. Comput. Biol. Bioinformatics, 2011.

T. Yoshida, I. Takeuchi, and M. Karasuyama, Safe triplet screening for distance metric learning, 2018.

G. Yu and J. Bien, Estimating the error variance in a high-dimensional linear model, 2017.

M. Yuan and Y. Lin, Model selection and estimation in regression with grouped variables, Journal of the Royal Statistical Society. Series B. Statistical Methodology, 2006.

S. Yun, On the iteration complexity of cyclic coordinate gradient descent methods, SIAM J. Optim, 2014.

Y. Zeng and P. Breheny, The biglasso package: A memory- and computation-efficient solver for lasso model fitting with big data in R, 2017.

D. Zhang, D. Shen, and the Alzheimer's Disease Neuroimaging Initiative, Multi-modal multi-task learning for joint prediction of multiple regression and classification variables in Alzheimer's disease, NeuroImage, 2012.

H. Zou and T. Hastie, Regularization and variable selection via the elastic net, J. Roy. Statist. Soc. Ser. B, 2005.