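Imputation multiple par analyse factorielle : Une nouvelle méthodologie pour traiter les données manquantes [Multiple imputation using principal component methods: a new methodology for handling missing data]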

Abstract : This thesis proposes new multiple imputation methods based on principal component methods, which were initially used for exploratory analysis and visualisation of continuous, categorical and mixed multidimensional data. The study of principal component methods for imputation, never previously attempted, offers the possibility to deal with many types and sizes of data, because dimensionality reduction limits the number of estimated parameters.

First, we describe a single imputation method based on factor analysis of mixed data. We study its properties and focus on its ability to handle complex relationships between variables, as well as infrequent categories. Its high prediction quality is highlighted with respect to the state-of-the-art single imputation method based on random forests.

Next, a multiple imputation method for continuous data using principal component analysis (PCA) is presented. This is based on a Bayesian treatment of the PCA model. Unlike standard methods based on Gaussian models, it can still be used when the number of variables is larger than the number of individuals and when correlations between variables are strong.

Finally, a multiple imputation method for categorical data using multiple correspondence analysis (MCA) is proposed. The variability of prediction of missing values is introduced via a non-parametric bootstrap approach. This helps to tackle the combinatorial issues which arise from the large number of categories and variables. We show that multiple imputation using MCA outperforms the best current methods.
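
The general idea of imputing through a low-rank principal component fit can be illustrated with a minimal sketch for continuous data. This is a plain iterative PCA imputation written for illustration only, not the thesis's algorithms: it produces a single imputation and includes none of the Bayesian, bootstrap, mixed-data or categorical machinery described above. The function name `iterative_pca_impute` and its parameters `n_components`, `n_iter` and `tol` are hypothetical, and the sketch assumes every column has at least one observed value.

```python
import numpy as np

def iterative_pca_impute(X, n_components=2, n_iter=200, tol=1e-8):
    """Fill NaN entries of X with a rank-`n_components` PCA reconstruction.

    Illustrative sketch only (hypothetical helper, not the thesis's method).
    """
    X = np.asarray(X, dtype=float)
    missing = np.isnan(X)
    # Initialise missing cells with the mean of the observed values per column.
    col_means = np.nanmean(X, axis=0)
    filled = np.where(missing, col_means, X)
    for _ in range(n_iter):
        # Fit PCA on the currently completed table (centre, then truncated SVD).
        mean = filled.mean(axis=0)
        U, s, Vt = np.linalg.svd(filled - mean, full_matrices=False)
        low_rank = (U[:, :n_components] * s[:n_components]) @ Vt[:n_components] + mean
        # Overwrite only the missing cells; observed cells are never changed.
        updated = np.where(missing, low_rank, X)
        change = np.sum((updated - filled) ** 2)
        filled = updated
        if change < tol:
            break
    return filled

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    data = rng.normal(size=(30, 1)) @ rng.normal(size=(1, 5))  # rank-1 toy table
    data[0, 1] = np.nan
    data[7, 3] = np.nan
    print(iterative_pca_impute(data, n_components=1)[[0, 7]])
```

Because only `n_components` dimensions are estimated, the number of fitted parameters stays small even when the table has many columns, which is the property the abstract attributes to dimensionality reduction. Turning such a single imputation into a multiple imputation would additionally require reflecting the uncertainty of the fitted parameters, which the abstract handles with a Bayesian treatment (PCA) or a non-parametric bootstrap (MCA).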
Document type :
Theses

https://pastel.archives-ouvertes.fr/tel-01336206
Contributor : Abes Star
Submitted on : Wednesday, June 22, 2016 - 4:58:34 PM
Last modification on : Wednesday, March 21, 2018 - 4:08:05 PM

File

pdf2star-1466604887-These_audi...
Version validated by the jury (STAR)

Identifiers

  • HAL Id : tel-01336206, version 1

Citation

Vincent Audigier. Imputation multiple par analyse factorielle : Une nouvelle méthodologie pour traiter les données manquantes. Numerical Analysis [math.NA]. Agrocampus Ouest, 2015. French. ⟨NNT : 2015NSARG015⟩. ⟨tel-01336206⟩
