H. Akaike, A new look at the statistical model identification, Selected Papers of Hirotugu Akaike, pp.215-222, 1974.

E. B. Andersen, Sufficiency and exponential families for discrete sample spaces, Journal of the American Statistical Association, vol.65, issue.331, pp.1248-1255, 1970.

R. K. Ando and T. Zhang, A framework for learning predictive structures from multiple tasks and unlabeled data, Journal of Machine Learning Research, vol.6, pp.1817-1853, 2005.

R. K. Ando and T. Zhang, Two-view feature generation model for semisupervised learning, Proceedings of the 24th international conference on Machine learning, pp.25-32, 2007.

A. Antoniadis, X. Brossat, J. Cugliari, and J. Poggi, Prévision d'un processus à valeurs fonctionnelles en présence de non stationnarités, 2012.

A. Antoniadis, X. Brossat, J. Cugliari, and J. Poggi, Une approche fonctionnelle pour la prévision non-paramétrique de la consommation d'électricité, Journal de la Société Française de Statistique, vol.155, issue.2, pp.202-219, 2014.

A. Antoniadis, E. Paparoditis, and T. Sapatinas, A functional wavelet-kernel approach for time series prediction, Journal of the Royal Statistical Society: Series B (Statistical Methodology), vol.68, issue.5, pp.837-857, 2006.

V. I. Arnold, On functions of three variables, Doklady Akademii Nauk, vol.114, pp.679-681, 1957.

H. Attouch and J. Bolte, On the convergence of the proximal algorithm for nonsmooth functions involving analytic features, Mathematical Programming, vol.116, issue.1-2, pp.5-16, 2009.
URL : https://hal.archives-ouvertes.fr/hal-00803898

H. Attouch, J. Bolte, P. Redont, and A. Soubeyran, Proximal alternating minimization and projection methods for nonconvex problems: An approach based on the Kurdyka-Łojasiewicz inequality, Mathematics of Operations Research, vol.35, pp.438-457, 2010.

H. Attouch, J. Bolte, and B. F. Svaiter, Convergence of descent methods for semi-algebraic and tame problems: proximal algorithms, forward-backward splitting, and regularized Gauss-Seidel methods, Mathematical Programming, vol.137, issue.1-2, pp.91-129, 2013.
URL : https://hal.archives-ouvertes.fr/inria-00636457

B. Auder, J. Cugliari, Y. Goude, and J. Poggi, Scalable clustering of individual electrical curves for profiling and bottom-up forecasting, Energies, vol.11, issue.7, p.1893, 2018.
URL : https://hal.archives-ouvertes.fr/hal-02063272

M. Avellaneda and D. Boyer-Olson, Reconstruction of volatility: pricing index options by the steepest descent approximation, 2002.

F. Bach, R. Jenatton, J. Mairal, and G. Obozinski, Optimization with sparsity-inducing penalties, Foundations and Trends® in Machine Learning, vol.4, pp.1-106, 2012.
URL : https://hal.archives-ouvertes.fr/hal-00613125

S. Bakin, Adaptive regression and model selection in data mining problems, 1999.

B. Bakker and T. Heskes, Task clustering and gating for Bayesian multitask learning, Journal of Machine Learning Research, vol.4, pp.83-99, 2003.

P. Baldi and K. Hornik, Neural networks and principal component analysis: Learning from examples without local minima, Neural Networks, vol.2, issue.1, pp.53-58, 1989.

T. Barbier, Modélisation de la consommation électrique à partir de grandes masses de données pour la simulation des alternatives énergétiques du futur, 2017.

J. Baxter, A model of inductive bias learning, Journal of artificial intelligence research, vol.12, pp.149-198, 2000.

S. Ben-David and R. Schuller, Exploiting task relatedness for multiple task learning, Learning Theory and Kernel Machines, pp.567-580, 2003.

S. Bhojanapalli, B. Neyshabur, and N. Srebro, Global optimality of local search for low rank matrix recovery, Advances in Neural Information Processing Systems, pp.3873-3881, 2016.

P. Binev, A. Cohen, W. Dahmen, and R. DeVore, Universal algorithms for learning theory. Part II: Piecewise polynomial functions, Constructive Approximation, vol.26, pp.127-152, 2007.

J. Bolte, A. Daniilidis, and A. Lewis, The Łojasiewicz inequality for nonsmooth subanalytic functions with applications to subgradient dynamical systems, SIAM Journal on Optimization, vol.17, issue.4, pp.1205-1223, 2007.

J. Bolte, S. Sabach, and M. Teboulle, Proximal alternating linearized minimization for nonconvex and nonsmooth problems, Mathematical Programming, vol.146, issue.1-2, pp.459-494, 2014.

J. F. Bonnans and A. Shapiro, Optimization problems with perturbations: A guided tour, SIAM Review, vol.40, issue.2, pp.228-264, 1998.
URL : https://hal.archives-ouvertes.fr/inria-00073819

N. Boumal, V. Voroninski, and A. Bandeira, The non-convex Burer-Monteiro approach works on smooth semidefinite programs, Advances in Neural Information Processing Systems, pp.2757-2765, 2016.

L. Breiman, Random forests, Machine Learning, vol.45, pp.5-32, 2001.

L. Breiman and J. H. Friedman, Estimating optimal transformations for multiple regression and correlation, Journal of the American statistical Association, vol.80, issue.391, pp.580-598, 1985.

N. E. Breslow and D. G. Clayton, Approximate inference in generalized linear mixed models, Journal of the American statistical Association, vol.88, issue.421, pp.9-25, 1993.

A. Bruhns, G. Deurveilher, and J. Roy, A non linear regression model for mid-term load forecasting and improvements in seasonality, Proceedings of the 15th Power Systems Computation Conference, pp.22-26, 2005.

F. Bunea, Y. She, and M. H. Wegkamp, Optimal selection of reduced rank estimators of high-dimensional matrices, The Annals of Statistics, pp.1282-1309, 2011.

F. Bunea, Y. She, and M. H. Wegkamp, Joint variable and rank selection for parsimonious estimation of high-dimensional matrices, The Annals of Statistics, vol.40, issue.5, pp.2359-2388, 2012.

R. Caruana, Multitask learning, Machine Learning, vol.28, pp.41-75, 1997.

B. Chen and M. Chang, Load forecasting using support vector machines: A study on EUNITE competition, IEEE Transactions on Power Systems, vol.19, issue.4, pp.1821-1830, 2001.

L. Chen and J. Z. Huang, Sparse reduced-rank regression for simultaneous dimension reduction and variable selection, Journal of the American Statistical Association, vol.107, issue.500, pp.1533-1545, 2012.

T. Chen and C. Guestrin, XGBoost: A scalable tree boosting system, Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pp.785-794, 2016.

T. Chen, T. He, M. Benesty, V. Khotilovich, and Y. Tang, Xgboost: extreme gradient boosting, pp.1-4, 2015.

H. Cho, Y. Goude, X. Brossat, and Q. Yao, Modeling and forecasting daily electricity load curves: a hybrid approach, Journal of the American Statistical Association, vol.108, issue.501, pp.7-21, 2013.

H. Cho, Y. Goude, X. Brossat, and Q. Yao, Modelling and forecasting daily electricity load via curve linear regression, Modeling and Stochastic Learning for Forecasting in High Dimensions, pp.35-54, 2015.

E. Chouzenoux, J. Pesquet, and A. Repetti, Variable metric forward-backward algorithm for minimizing the sum of a differentiable function and a convex function, Journal of Optimization Theory and Applications, vol.162, issue.1, pp.107-132, 2014.
URL : https://hal.archives-ouvertes.fr/hal-00789970

R. Cont and R. Deguest, Equity correlations implied by index options: estimation and model uncertainty analysis, Mathematical Finance: An International Journal of Mathematics, Statistics and Financial Economics, vol.23, issue.3, pp.496-530, 2013.
URL : https://hal.archives-ouvertes.fr/hal-00837997

P. Craven and G. Wahba, Smoothing noisy data with spline functions, Numerische mathematik, vol.31, issue.4, pp.377-403, 1978.

CRE, L'électricité, comment ça marche ?, 2019.

S. Cros and P. Pinson, Prévision météorologique pour les énergies renouvelables, La Météorologie, issue.100, 2018.

D. Csiba and P. Richtárik, Global convergence of arbitrary-block gradient methods for generalized Polyak-Łojasiewicz functions, 2017.

J. Cugliari, Y. Goude, and J. Poggi, Disaggregated electricity forecasting using wavelet-based clustering of individual consumers, IEEE International Energy Conference (ENERGYCON), pp.1-6, 2016.

J. M. Danskin, The theory of max-min and its application to weapons allocation problems, vol.5, 1967.

G. Darmois, Sur les lois de probabilités à estimation exhaustive, C. R. Acad. Sci. Paris, vol.200, pp.1265-1266, 1935.

A. P. Dawid, Present position and potential developments: Some personal views statistical theory the prequential approach, Journal of the Royal Statistical Society: Series A (General), vol.147, issue.2, pp.278-290, 1984.

C. de Boor, A practical guide to splines, vol.27, 1978.

M. Devaine, P. Gaillard, Y. Goude, and G. Stoltz, Forecasting electricity consumption by aggregating specialized experts, Machine Learning, vol.90, pp.231-260, 2013.
URL : https://hal.archives-ouvertes.fr/hal-00484940

V. Dordonnat, S. J. Koopman, M. Ooms, A. Dessertaine, and J. Collet, An hourly periodic state space model for modelling french national electricity load, International Journal of Forecasting, vol.24, issue.4, pp.566-587, 2008.

S. S. Du, C. Jin, J. D. Lee, M. I. Jordan, A. Singh et al., Gradient descent can take exponential time to escape saddle points, Advances in Neural Information Processing Systems, pp.1067-1077, 2017.

B. Dubois, J. Delmas, and G. Obozinski, Fast algorithms for sparse reduced-rank regression, The 22nd International Conference on Artificial Intelligence and Statistics, pp.2415-2424, 2019.
URL : https://hal.archives-ouvertes.fr/hal-02075623

Q. Duchemin, Modèles de clustering pour la prévision de la consommation électrique, 2018.

G. Dudek, Short-term load forecasting using random forests, Intelligent Systems' 2014, pp.821-828, 2015.

M. Dumont, R. Marée, L. Wehenkel, and P. Geurts, Fast multi-class image annotation with random subwindows and multiple output randomized trees, Proc. International Conference on Computer Vision Theory and Applications (VISAPP), vol.2, pp.196-203, 2009.

V. Durrleman and N. El Karoui, Coupling smiles, Quantitative Finance, vol.8, issue.6, pp.573-590, 2008.
URL : https://hal.archives-ouvertes.fr/hal-00708494

P. H. Eilers and B. D. Marx, Flexible smoothing with B-splines and penalties, Statistical Science, pp.89-102, 1996.

E. Elhamifar and R. Vidal, Sparse subspace clustering: Algorithm, theory, and applications, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.35, pp.2765-2781, 2013.

T. Evgeniou, C. A. Micchelli, and M. Pontil, Learning multiple tasks with kernel methods, Journal of Machine Learning Research, vol.6, pp.615-637, 2005.

T. Evgeniou and M. Pontil, Regularized multi-task learning, Proceedings of the tenth ACM SIGKDD international conference on Knowledge discovery and data mining, pp.109-117, 2004.

J. Fan and R. Li, Variable selection via nonconcave penalized likelihood and its oracle properties, Journal of the American statistical Association, vol.96, issue.456, pp.1348-1360, 2001.

S. Fan and R. J. Hyndman, Short-term load forecasting based on a semiparametric additive model, IEEE Transactions on Power Systems, vol.27, issue.1, pp.134-141, 2011.

R. A. Fisher, XV.-The correlation between relatives on the supposition of Mendelian inheritance, Earth and Environmental Science Transactions of the Royal Society of Edinburgh, vol.52, issue.2, pp.399-433, 1919.

B. Forster, Splines and multiresolution analysis, Handbook of Mathematical Methods in Imaging, pp.1231-1270, 2011.

P. Frankel, G. Garrigos, and J. Peypouquet, Splitting methods with variable metric for Kurdyka-Łojasiewicz functions and general convergence rates, Journal of Optimization Theory and Applications, vol.165, issue.3, pp.874-900, 2015.

J. Friedman, T. Hastie, and R. Tibshirani, The elements of statistical learning, Springer series in statistics, vol.1, 2001.

J. H. Friedman, Greedy function approximation: a gradient boosting machine, Annals of statistics, pp.1189-1232, 2001.

J. H. Friedman, Multivariate adaptive regression splines, The Annals of Statistics, vol.19, issue.1, pp.1-67, 1991.

J. H. Friedman and W. Stuetzle, Projection pursuit regression, Journal of the American statistical Association, vol.76, issue.376, pp.817-823, 1981.

J. H. Friedman and W. Stuetzle, Smoothing of scatterplots, 1982.

P. Gaillard and Y. Goude, Forecasting electricity consumption by aggregating experts; how to design a good set of experts, Modeling and Stochastic Learning for Forecasting in High Dimensions, pp.95-115, 2015.

P. Gaillard, Y. Goude, and R. Nedellec, Additive models and robust aggregation for GEFCom2014 probabilistic electric load and electricity price forecasting, International Journal of forecasting, vol.32, issue.3, pp.1038-1050, 2016.

J. Gama, I. Žliobaitė, A. Bifet, M. Pechenizkiy, and A. Bouchachia, A survey on concept drift adaptation, ACM Computing Surveys (CSUR), vol.46, issue.4, p.44, 2014.

R. Ge, C. Jin, and Y. Zheng, No spurious local minima in nonconvex low rank problems: a unified geometric analysis, 2017.

R. Ge, J. D. Lee, and T. Ma, Matrix completion has no spurious local minimum, Advances in Neural Information Processing Systems, pp.2973-2981, 2016.

A. E. Gelfand and S. R. Dalal, A note on overdispersed exponential families, Biometrika, vol.77, issue.1, pp.55-64, 1990.

A. Gelman, Analysis of variance-why it is more important than ever, The Annals of Statistics, vol.33, pp.1-53, 2005.

T. Gneiting, F. Balabdaoui, and A. E. Raftery, Probabilistic forecasts, calibration and sharpness, Journal of the Royal Statistical Society: Series B (Statistical Methodology), vol.69, issue.2, pp.243-268, 2007.
URL : https://hal.archives-ouvertes.fr/hal-01575138

I. J. Good, Some history of the hierarchical Bayesian methodology, Trabajos de estadística y de investigación operativa, vol.31, p.489, 1980.

Y. Goude, R. Nedellec, and N. Kong, Local short and middle term electricity load forecasting with semi-parametric additive models, IEEE transactions on smart grid, vol.5, issue.1, pp.440-446, 2013.

E. Grave, G. R. Obozinski, and F. R. Bach, Trace lasso: a trace norm regularization for correlated designs, Advances in Neural Information Processing Systems, pp.2187-2195, 2011.
URL : https://hal.archives-ouvertes.fr/hal-00620197

P. J. Green, Iteratively reweighted least squares for maximum likelihood estimation, and some robust and resistant alternatives, Journal of the Royal Statistical Society: Series B (Methodological), vol.46, issue.2, pp.149-170, 1984.

C. W. Gross and J. E. Sohl, Disaggregation methods to expedite product line forecasting, Journal of Forecasting, vol.9, issue.3, pp.233-254, 1990.

H. Hahn, S. Meyer-nieberg, and S. Pickl, Electric load forecasting methods: Tools for decision making, European journal of operational research, vol.199, issue.3, pp.902-907, 2009.

T. Hastie and R. Tibshirani, Generalized Additive Models, 1990.

S. Haykin, Neural networks: a comprehensive foundation, 1994.

T. Heskes, Empirical Bayes for learning to learn, 2000.

H. S. Hippert, C. E. Pedreira, and R. C. Souza, Neural networks for short-term load forecasting: A review and evaluation, IEEE Transactions on Power Systems, vol.16, issue.1, pp.44-55, 2001.

T. Hofmann and J. Puzicha, Latent class models for collaborative filtering, IJCAI, vol.99, 1999.

T. Hong and S. Fan, Probabilistic electric load forecasting: A tutorial review, International Journal of Forecasting, vol.32, issue.3, pp.914-938, 2016.

T. Hong, P. Pinson, and S. Fan, Global energy forecasting competition, 2012.

S. Huang and K. Shih, Short-term load forecasting via ARMA model identification including non-Gaussian process considerations, IEEE Transactions on power systems, vol.18, issue.2, pp.673-679, 2003.

R. J. Hyndman, R. A. Ahmed, G. Athanasopoulos, and H. L. Shang, Optimal combination forecasts for hierarchical time series, Computational Statistics & Data Analysis, vol.55, issue.9, pp.2579-2589, 2011.

IEA, Energy policies of IEA countries - France, 2016.

IEA, France energy balance Sankey diagram, 2019.

N. Intrator and S. Edelman, Making a low-dimensional representation suitable for diverse tasks, Learning to learn, pp.135-157, 1996.

L. Jacob, J. Vert, and F. R. Bach, Clustered multi-task learning: A convex formulation, Advances in Neural Information Processing Systems, pp.745-752, 2009.
URL : https://hal.archives-ouvertes.fr/hal-00320573

P. Jain, C. Jin, S. Kakade, and P. Netrapalli, Global convergence of non-convex gradient descent for computing matrix squareroot, Artificial Intelligence and Statistics, pp.479-488, 2017.

L. Jian, H. Tao, and Y. Meng, Real-time anomaly detection for very short-term load forecasting, Journal of Modern Power Systems and Clean Energy, vol.6, issue.2, pp.235-243, 2018.

C. Jin, R. Ge, P. Netrapalli, S. M. Kakade, and M. I. Jordan, How to escape saddle points efficiently, Proceedings of the 34th International Conference on Machine Learning, vol.70, pp.1724-1732, 2017.

E. Jones, T. Oliphant, and P. Peterson, Scipy: Open source scientific tools for Python, 2001.

B. Jørgensen, Exponential dispersion models, Journal of the Royal Statistical Society: Series B (Methodological), vol.49, issue.2, pp.127-145, 1987.

B. Jourdain and M. Sbai, Coupling index and stocks, Quantitative Finance, vol.12, issue.5, pp.805-818, 2012.
URL : https://hal.archives-ouvertes.fr/hal-00350652

H. Karimi, J. Nutini, and M. Schmidt, Linear convergence of gradient and proximal-gradient methods under the Polyak-Łojasiewicz condition, Joint European Conference on Machine Learning and Knowledge Discovery in Databases, pp.795-811, 2016.

R. E. Kass and D. Steffey, Approximate Bayesian inference in conditionally independent hierarchical models (parametric empirical Bayes models), Journal of the American Statistical Association, vol.84, issue.407, pp.717-726, 1989.

K. Kawaguchi, Deep learning without poor local minima, Advances in Neural Information Processing Systems, pp.586-594, 2016.

K. Khamaru and M. Wainwright, Convergence guarantees for a class of non-convex and non-smooth optimization problems, Proceedings of the 35th International Conference on Machine Learning, vol.80, pp.2601-2610, 2018.

A. Khotanzad, R. Afkhami-rohani, T. Lu, A. Abaye, M. Davis et al., ANNSTLF-a neural-network-based electric load forecasting system, IEEE Transactions on Neural networks, vol.8, issue.4, pp.835-846, 1997.

S. Kiartzis, A. Bakirtzis, and V. Petridis, Short-term load forecasting using neural networks, vol.33, pp.1-6, 1995.

S. Kim and G. B. Giannakis, Load forecasting via low rank plus sparse matrix factorization, Asilomar Conference on Signals, Systems and Computers, pp.1682-1686, 2013.

R. Koenker, Quantile Regression, 2005.

R. Koenker and G. Bassett, Regression quantiles, Econometrica: journal of the Econometric Society, pp.33-50, 1978.

A. N. Kolmogorov, On the representation of continuous functions of many variables by superposition of continuous functions of one variable and addition, Doklady Akademii Nauk, vol.114, pp.953-956, 1957.

J. Z. Kolter and J. Ferreira, A large-scale study on predicting and contextualizing building energy usage, Twenty-fifth AAAI conference on artificial intelligence, 2011.

B. O. Koopman, On distributions admitting a sufficient statistic, Transactions of the American Mathematical society, vol.39, issue.3, pp.399-409, 1936.

A. Kumar and H. Daumé III, Learning task grouping and overlap in multitask learning, 2012.

E. Kyriakides and M. Polycarpou, Short term electric load forecasting: A tutorial, Trends in Neural Computation, pp.391-418, 2007.

D. D. Lee and H. S. Seung, Learning the parts of objects by non-negative matrix factorization, Nature, vol.401, issue.6755, p.788, 1999.

D. D. Lee and H. S. Seung, Algorithms for non-negative matrix factorization, Advances in neural information processing systems, pp.556-562, 2001.

J. D. Lee, I. Panageas, G. Piliouras, M. Simchowitz, M. I. Jordan et al., First-order methods almost always avoid saddle points, 2017.

G. Li and T. K. Pong, Calculus of the exponent of Kurdyka-Łojasiewicz inequality and its applications to linear convergence of first-order methods, Foundations of Computational Mathematics, pp.1-34, 2017.

H. Li and Z. Lin, Accelerated proximal gradient methods for nonconvex programming, Advances in neural information processing systems, pp.379-387, 2015.

Q. Li, Z. Zhu, and G. Tang, Geometry of factored nuclear norm regularization, 2017.

Q. Li, Z. Zhu, and G. Tang, The non-convex geometry of low-rank matrix optimization, Information and Inference: A Journal of the IMA, 2018.

X. Li, Z. Wang, J. Lu, R. Arora, J. Haupt et al., Symmetry, saddle points, and global geometry of nonconvex matrix factorization, 2016.

X. Lin and D. Zhang, Inference in generalized additive mixed models by using smoothing splines, Journal of the Royal Statistical Society: Series B (Statistical Methodology), vol.61, pp.381-400, 1999.

D. V. Lindley and A. F. Smith, Bayes estimates for the linear model, Journal of the Royal Statistical Society: Series B (Methodological), vol.34, issue.1, pp.1-18, 1972.

D. C. Liu and J. Nocedal, On the limited memory BFGS method for large scale optimization, Mathematical programming, vol.45, issue.1-3, pp.503-528, 1989.

Z. Ma and T. Sun, Adaptive sparse reduced-rank regression, p.1403, 2014.

D. Maclaurin, D. Duvenaud, and R. P. Adams, Autograd: Effortless gradients in numpy, ICML 2015 AutoML Workshop, vol.238, 2015.

J. MacQueen, Some methods for classification and analysis of multivariate observations, Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability, vol.1, pp.281-297, 1967.

E. Mammen and S. van de Geer, Locally adaptive regression splines, The Annals of Statistics, vol.25, issue.1, pp.387-413, 1997.

A. Maurer, Transfer bounds for linear feature learning, Machine learning, vol.75, issue.3, pp.327-350, 2009.

P. McCullagh and J. Nelder, Generalized Linear Models, vol.2, 1983.

J. Mei, Y. De-castro, Y. Goude, J. Azaïs, and G. Hébrail, Nonnegative matrix factorization with side information for time series recovery and prediction, IEEE Transactions on Knowledge and Data Engineering, vol.31, issue.3, pp.493-506, 2018.
URL : https://hal.archives-ouvertes.fr/hal-01686429

J. Mei, Y. De-castro, Y. Goude, and G. Hébrail, Nonnegative matrix factorization for time series recovery from a few temporal aggregates, Proceedings of the 34th International Conference on Machine Learning, vol.70, pp.2382-2390, 2017.

J. Mei, Y. Goude, G. Hebrail, and N. Kong, Spatial estimation of electricity consumption using socio-demographic information, 2016 IEEE PES Asia-Pacific Power and Energy Engineering Conference (APPEEC), pp.753-757, 2016.

J. W. Messner and P. Pinson, Online adaptive lasso estimation in vector autoregressive models for high dimensional wind power forecasting, International Journal of Forecasting, 2018.

M. Mougeot, D. Picard, V. Lefieux, and L. Maillard-teyssier, Forecasting intra day load curves using sparse functional regression, Modeling and Stochastic Learning for Forecasting in High Dimensions, pp.161-181, 2015.
URL : https://hal.archives-ouvertes.fr/hal-01267509

A. Mukherjee, K. Chen, N. Wang, and J. Zhu, On the degrees of freedom of reduced-rank estimators in multivariate regression, Biometrika, vol.102, issue.2, pp.457-477, 2015.

A. Muñoz, E. F. Sánchez-Úbeda, A. Cruz, and J. Marín, Short-term forecasting in power systems: a guided tour, Handbook of power systems II, pp.129-160, 2010.

K. Nagbe, J. Cugliari, and J. Jacques, Short-term electricity demand forecasting using a functional state space model, Energies, vol.11, issue.5, p.1120, 2018.

K. Nagbe, J. Cugliari, A. Thebault, J. , and J. , Prévision de génération d'électricité à partir de sources renouvelables, 2017.

J. A. Nelder and R. W. Wedderburn, Generalized linear models, Journal of the Royal Statistical Society: Series A (General), vol.135, issue.3, pp.370-384, 1972.

M. Nikolova and P. Tan, Alternating proximal gradient descent for nonconvex regularised problems with multiconvex coupling terms. HAL-01492846, 2017.
URL : https://hal.archives-ouvertes.fr/hal-01492846

J. Nowicka-Zagrajek and R. Weron, Modeling electricity loads in California: ARMA models with hyperbolic noise, Signal Processing, vol.82, issue.12, pp.1903-1915, 2002.

J. Nowotarski and R. Weron, Recent advances in electricity price forecasting: A review of probabilistic forecasting, Renewable and Sustainable Energy Reviews, vol.81, pp.1548-1568, 2018.

G. Obozinski, B. Taskar, and M. I. Jordan, Joint covariate selection and joint subspace selection for multiple classification problems, Statistics and Computing, vol.20, issue.2, pp.231-252, 2010.

P. Ochs, Y. Chen, T. Brox, and T. Pock, iPiano: Inertial proximal algorithm for nonconvex optimization, SIAM Journal on Imaging Sciences, vol.7, issue.2, pp.1388-1419, 2014.

H. W. Oliver, The exact Peano derivative, Transactions of the American Mathematical Society, vol.76, issue.3, pp.444-456, 1954.

I. Panageas and G. Piliouras, Gradient descent only converges to minimizers: Non-isolated critical points and invariant regions, 2016.

D. Park, A. Kyrillidis, C. Caramanis, and S. Sanghavi, Finding low-rank solutions via non-convex matrix factorization, efficiently and provably, 2016.

D. C. Park, M. El-sharkawi, R. Marks, L. Atlas, and M. Damborg, Electric load forecasting using an artificial neural network, IEEE transactions on Power Systems, vol.6, issue.2, pp.442-449, 1991.

J. Pearl, Bayesian networks: A model of self-activated memory for evidential reasoning, Proceedings of the 7th Conference of the Cognitive Science Society, pp.329-334, 1985.

D. N. Perkins and G. Salomon, Transfer of learning, International Encyclopedia of Education, vol.2, pp.6452-6457, 1992.

M. H. Pesaran and A. Pick, Forecast combination across estimation windows, Journal of Business & Economic Statistics, vol.29, issue.2, pp.307-318, 2011.

C. G. Petra, V. Zavala, E. Nino-ruiz, and M. Anitescu, Economic impacts of wind covariance estimation on power grid operations, 2014.

A. Pierrot and Y. Goude, Short-term electricity load forecasting with generalized additive models, Proceedings of ISAP power, 2011.

P. Pinson, Very-short-term probabilistic forecasting of wind power with generalized logit-normal distributions, Journal of the Royal Statistical Society: Series C (Applied Statistics), vol.61, issue.4, pp.555-576, 2012.

E. J. Pitman, Sufficient statistics and intrinsic accuracy, Mathematical Proceedings of the Cambridge Philosophical Society, vol.32, pp.567-579, 1936.

B. T. Polyak, Gradient methods for minimizing functionals, Zhurnal Vychislitel'noi Matematiki i Matematicheskoi Fiziki, vol.3, issue.4, pp.643-653, 1963.

T. K. Pong, P. Tseng, S. Ji, and J. Ye, Trace norm regularization: Reformulations, algorithms, and multi-task learning, SIAM Journal on Optimization, vol.20, issue.6, pp.3465-3489, 2010.

P. Rai, A. Kumar, and H. Daume, Simultaneously leveraging output and task structures for multiple-output regression, Advances in Neural Information Processing Systems, pp.3185-3193, 2012.

S. W. Raudenbush and A. S. Bryk, Hierarchical linear models: Applications and data analysis methods, vol.1, 2002.

H. Robbins, An empirical Bayes approach to statistics, Herbert Robbins Selected Papers, pp.41-47, 1956.

R. T. Rockafellar and R. J.-B. Wets, Variational analysis, vol.317, 2009.

A. J. Rothman, E. Levina, and J. Zhu, Sparse multivariate regression with covariance estimation, Journal of Computational and Graphical Statistics, vol.19, issue.4, pp.947-962, 2010.

RTE, Référentiel de températures, ?id=9482&mode=detail, 2011.

RTE, Méthodologie des prévisions, 2014.

RTE, L'équilibre offre-demande d'électricité pour l'hiver, 2016.

RTE, Bilan Électrique, Schéma décennal de développement du réseau, 2015.

RTE, L'équilibre offre-demande d'électricité pour l'hiver, 2018.

RTE, L'équilibre offre-demande d'électricité pour l'été, 2019.

B. M. Sanandaji, A. Tascikaraoglu, K. Poolla, and P. Varaiya, Low-dimensional models in spatio-temporal wind speed forecasting, 2015 American Control Conference (ACC), pp.4485-4490, 2015.

M. Sangnier, O. Fercoq, and F. d'Alché-Buc, Joint quantile regression in vector-valued RKHSs, Advances in Neural Information Processing Systems, pp.3693-3701, 2016.
URL : https://hal.archives-ouvertes.fr/hal-01272327

L. Schumaker, Spline functions: basic theory, 2007.

Y. She, Selective factor extraction in high dimensions, Biometrika, vol.104, issue.1, pp.97-110, 2017.

S. Shenoy, D. Gorinevsky, and S. Boyd, Non-parametric regression modeling for stochastic optimization of power grid load forecast, 2015 American Control Conference (ACC), pp.1010-1015, 2015.

J. M. Sloughter, T. Gneiting, and A. E. Raftery, Probabilistic wind speed forecasting using ensembles and Bayesian model averaging, Journal of the american statistical association, vol.105, issue.489, pp.25-35, 2010.

G. Stewart, Smooth local bases for perturbed eigenspaces. Institute for Advanced Computer Studies TR, p.8, 2012.

C. J. Stone and C. Koo, Additive splines in statistics, Proceedings of the American Statistical Association, p.48, 1985.

M. Sugiyama and M. Kawanabe, Machine learning in non-stationary environments: Introduction to covariate shift adaptation, 2012.

M. Sugiyama, T. Suzuki, and T. Kanamori, Density-ratio matching under the Bregman divergence: a unified framework of density-ratio estimation, Annals of the Institute of Statistical Mathematics, vol.64, issue.5, pp.1009-1044, 2012.

J. Sun, Q. Qu, and J. Wright, When are nonconvex problems not scary?, 2015.

J. W. Taylor, Triple seasonal methods for short-term electricity demand forecasting, European Journal of Operational Research, vol.204, issue.1, pp.139-152, 2010.

J. W. Taylor, Short-term load forecasting with exponentially weighted methods, IEEE Transactions on Power Systems, vol.27, issue.1, pp.458-464, 2011.

V. Thouvenot, Estimation et sélection pour les modèles additifs et application à la prévision de la consommation électrique, 2015.

V. Thouvenot, A. Pichavant, Y. Goude, A. Antoniadis, and J. Poggi, Electricity forecasting using multi-stage estimators of nonlinear additive models, IEEE Transactions on Power Systems, vol.31, issue.5, pp.3665-3673, 2015.

R. Tibshirani and T. Hastie, Local likelihood estimation, Journal of the American Statistical Association, vol.82, issue.398, pp.559-567, 1987.

R. J. Tibshirani, Adaptive piecewise polynomial estimation via trend filtering, The Annals of Statistics, vol.42, issue.1, pp.285-323, 2014.

N. Ueda and R. Nakano, Deterministic annealing variant of the EM algorithm, Advances in neural information processing systems, pp.545-552, 1995.

R. Velu and G. C. Reinsel, Multivariate reduced-rank regression: theory and applications, vol.136, 2013.

G. Wahba, Spline bases, regularization, and generalized cross-validation for solving approximation problems with large quantities of noisy data, Approximation Theory III, vol.2, 1980.

G. Wahba, Spline models for observational data, vol.59, 1990.

L. Wang, X. Zhang, and Q. Gu, A unified computational and statistical framework for nonconvex low-rank matrix estimation, 2016.

Y. Wang, D. Wipf, Q. Ling, W. Chen, and I. J. Wassell, Multi-task learning for subspace segmentation, 2015.

J. H. Ward, Hierarchical grouping to optimize an objective function, Journal of the American statistical association, vol.58, issue.301, pp.236-244, 1963.

R. Weron, Modeling and forecasting electricity loads and prices: A statistical approach, vol.403, 2007.

T. K. Wijaya, Pervasive data analytics for sustainable energy systems, 2015.

T. K. Wijaya, S. F. Humeau, M. Vasirani, and K. Aberer, Individual, aggregate, and cluster-based aggregate forecasting of residential demand, 2014.

T. K. Wijaya, M. Sinn, and B. Chen, Forecasting uncertainty in electricity demand, Workshops at the Twenty-Ninth AAAI Conference on Artificial Intelligence, 2015.

Wikipedia, Energy demand management, 2019.

F. Wilcoxon, Individual comparisons by ranking methods, Breakthroughs in statistics, pp.196-202, 1992.

D. Wipf, Non-convex rank minimization via an Empirical Bayesian approach, 2014.

S. Wood and M. S. Wood, Package MGCV. R package version, vol.1, p.29, 2015.

S. N. Wood, Stable and efficient multiple smoothing parameter estimation for generalized additive models, Journal of the American Statistical Association, vol.99, issue.467, pp.673-686, 2004.

S. N. Wood, Fast stable restricted maximum likelihood and marginal likelihood estimation of semiparametric generalized linear models, Journal of the Royal Statistical Society: Series B (Statistical Methodology), vol.73, issue.1, pp.3-36, 2011.

S. N. Wood, Generalized additive models: an introduction with R, 2017.

S. N. Wood, Y. Goude, and S. Shaw, Generalized additive models for large data sets, Journal of the Royal Statistical Society: Series C (Applied Statistics), vol.64, issue.1, pp.139-155, 2015.

M. Wytock and J. Z. Kolter, Large-scale probabilistic forecasting in energy systems using sparse gaussian conditional random fields, 52nd IEEE Conference on Decision and Control, pp.1019-1024, 2013.

Y. Xu and W. Yin, A globally convergent algorithm for nonconvex optimization based on block coordinate update, Journal of Scientific Computing, vol.72, issue.2, pp.700-734, 2017.

M. Yuan and Y. Lin, Model selection and estimation in regression with grouped variables, Journal of the Royal Statistical Society: Series B (Statistical Methodology), vol.68, issue.1, pp.49-67, 2006.

C. Zhang and J. Huang, The sparsity and bias of the lasso selection in high-dimensional linear regression, The Annals of Statistics, vol.36, issue.4, pp.1567-1594, 2008.

Y. Zhang and Q. Yang, A survey on multi-task learning, 2017.

S. Zhou and X. Shen, Spatially adaptive regression splines and accurate knot selection schemes, Journal of the American Statistical Association, vol.96, issue.453, pp.247-259, 2001.

C. Zhu, R. H. Byrd, P. Lu, and J. Nocedal, Algorithm 778: L-BFGS-B: Fortran subroutines for large-scale bound-constrained optimization, ACM Transactions on Mathematical Software (TOMS), vol.23, issue.4, pp.550-560, 1997.

Z. Zhu, Q. Li, G. Tang, and M. B. Wakin, The global optimization geometry of low rank matrix optimization, 2017.

Z. Zhu, Q. Li, G. Tang, and M. B. Wakin, The global optimization geometry of nonsymmetric matrix factorization and sensing, 2017.

H. Zou, The adaptive lasso and its oracle properties, Journal of the American Statistical Association, vol.101, issue.476, pp.1418-1429, 2006.

H. Zou and T. Hastie, Regularization and variable selection via the elastic net, Journal of the Royal Statistical Society: series B (statistical methodology), vol.67, issue.2, pp.301-320, 2005.