
Imputation multiple par analyse factorielle : Une nouvelle méthodologie pour traiter les données manquantes

Abstract : This thesis proposes new multiple imputation methods that are based on principal component methods, which were initially used for exploratory analysis and visualisation of continuous, categorical and mixed multidimensional data. The study of principal component methods for imputation, never previously attempted, offers the possibility to deal with many types and sizes of data. This is because the number of estimated parameters is limited due to dimensionality reduction.

First, we describe a single imputation method based on factor analysis of mixed data. We study its properties and focus on its ability to handle complex relationships between variables, as well as infrequent categories. Its high prediction quality is highlighted with respect to the state-of-the-art single imputation method based on random forests.

Next, a multiple imputation method for continuous data using principal component analysis (PCA) is presented. This is based on a Bayesian treatment of the PCA model. Unlike standard methods based on Gaussian models, it can still be used when the number of variables is larger than the number of individuals and when correlations between variables are strong.

Finally, a multiple imputation method for categorical data using multiple correspondence analysis (MCA) is proposed. The variability of prediction of missing values is introduced via a non-parametric bootstrap approach. This helps to tackle the combinatorial issues which arise from the large number of categories and variables. We show that multiple imputation using MCA outperforms the best current methods.
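To make the idea of imputing through dimensionality reduction concrete, the following is a minimal sketch of a generic iterative PCA imputation for continuous data. It is not the thesis's algorithm: it omits the regularisation, the Bayesian treatment and the bootstrap mentioned in the abstract, and the function name `iterative_pca_impute` and its parameters (`n_components`, `max_iter`, `tol`) are hypothetical names chosen for illustration. It assumes missing entries are coded as NaN and that every column has at least one observed value.

```python
import numpy as np

def iterative_pca_impute(X, n_components=2, max_iter=500, tol=1e-8):
    """Fill NaN entries of X with a rank-`n_components` PCA reconstruction.

    Illustrative sketch only: plain (unregularised) iterative PCA,
    not the method developed in the thesis.
    """
    X = np.asarray(X, dtype=float)
    missing = np.isnan(X)

    # Start from column-mean imputation.
    col_means = np.nanmean(X, axis=0)
    X_hat = np.where(missing, col_means, X)

    for _ in range(max_iter):
        # Fit PCA on the currently completed table.
        mu = X_hat.mean(axis=0)
        U, s, Vt = np.linalg.svd(X_hat - mu, full_matrices=False)

        # Low-rank reconstruction from the first components.
        k = n_components
        low_rank = (U[:, :k] * s[:k]) @ Vt[:k] + mu

        # Update only the missing cells; observed cells stay untouched.
        X_new = np.where(missing, low_rank, X)
        change = np.sum((X_new - X_hat) ** 2)
        X_hat = X_new
        if change < tol:
            break

    return X_hat
```

Because only `n_components` directions are estimated, the number of parameters stays small even when the table has many columns, which is the property the abstract attributes to principal component methods. A multiple imputation procedure would additionally need to reflect the uncertainty of these estimates, which this sketch does not attempt.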
Contributor : ABES STAR
Submitted on : Wednesday, June 22, 2016 - 4:58:34 PM
Last modification on : Wednesday, April 6, 2022 - 4:08:06 PM


Version validated by the jury (STAR)


  • HAL Id : tel-01336206, version 1


Vincent Audigier. Imputation multiple par analyse factorielle : Une nouvelle méthodologie pour traiter les données manquantes. Analyse numérique [math.NA]. Agrocampus Ouest, 2015. Français. ⟨NNT : 2015NSARG015⟩. ⟨tel-01336206⟩


