
Model Averaging in Large Scale Learning

Abstract: This thesis explores properties of estimation procedures related to aggregation in the problem of high-dimensional regression in a sparse setting. The exponentially weighted aggregate (EWA) is well studied in the literature: it benefits from strong results in fixed and random designs obtained with a PAC-Bayesian approach. However, little is known about the properties of the EWA with a Laplace prior. Chapter 2 analyses the statistical behaviour of the prediction loss of the EWA with Laplace prior in the fixed design setting. Sharp oracle inequalities are established which generalize the properties of the Lasso to a larger family of estimators; these results also bridge the gap between the Lasso and the Bayesian Lasso. Chapter 3 introduces an adjusted Langevin Monte Carlo sampling method that approximates the EWA with Laplace prior in an explicit finite number of iterations for any targeted accuracy. Chapter 4 explores the statistical behaviour of adjusted versions of the Lasso for transductive and semi-supervised learning tasks in the random design setting.
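To illustrate the kind of sampler Chapter 3 studies, here is a minimal sketch of a Langevin Monte Carlo approximation of the EWA with Laplace prior. This is an illustrative simplification, not the thesis's adjusted algorithm: it uses the plain unadjusted Langevin update, and since the ℓ1 penalty is non-smooth, it replaces |β_j| with the smoothed surrogate √(β_j² + ε²) so that a gradient exists. The function name, parameters, and smoothing choice are all assumptions made for this sketch.

```python
import numpy as np

def ewa_laplace_lmc(X, y, lam=1.0, sigma2=1.0, step=1e-3,
                    n_iter=5000, burn_in=1000, eps=1e-3, seed=0):
    """Unadjusted Langevin Monte Carlo targeting (approximately) the
    EWA posterior  pi(beta) ∝ exp(-||y - X beta||^2 / (2 sigma2)
                                   - lam * ||beta||_1).
    The l1 term is smoothed as sum_j sqrt(beta_j^2 + eps^2) so its
    gradient is defined everywhere.  Illustrative sketch only.
    """
    rng = np.random.default_rng(seed)
    n, p = X.shape
    beta = np.zeros(p)
    samples = []
    for t in range(n_iter):
        # Gradient of the negative log-density (with smoothed l1 term).
        grad = X.T @ (X @ beta - y) / sigma2 \
               + lam * beta / np.sqrt(beta**2 + eps**2)
        # Langevin step: gradient descent plus Gaussian noise.
        beta = beta - step * grad + np.sqrt(2 * step) * rng.standard_normal(p)
        if t >= burn_in:
            samples.append(beta.copy())
    # Averaging the post-burn-in iterates approximates the posterior
    # mean, i.e. the exponentially weighted aggregate.
    return np.mean(samples, axis=0)
```

The returned average of the Markov chain iterates plays the role of the EWA estimator; the thesis's contribution is an adjusted version with an explicit, non-asymptotic bound on the number of iterations needed for a targeted accuracy, which this sketch does not provide.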
Submitted on : Thursday, March 15, 2018 - 5:19:07 PM


Version validated by the jury (STAR)


  • HAL Id : tel-01735320, version 1


Edwin Grappin. Model Averaging in Large Scale Learning. Statistics [math.ST]. Université Paris-Saclay, 2018. English. ⟨NNT : 2018SACLG001⟩. ⟨tel-01735320⟩


