
Latent variable models for tiling array data: applications to ChIP-chip and transcriptome experiments

Abstract : Tiling arrays enable large-scale, high-resolution exploration of the genome. The biological questions usually addressed are gene expression and the detection of transcribed regions, which can be investigated through transcriptomic experiments, and the regulation of gene expression, which is studied through ChIP-chip experiments. To analyse ChIP-chip and transcriptomic data, we propose latent variable models, in particular hidden Markov models (HMMs), which belong to the family of unsupervised classification methods. Biological features of the tiling array signal, such as the spatial dependence between observations along the genome and the structural annotation, are integrated into the models. Moreover, each model is adapted to the biological question at hand, and a dedicated model is proposed for each type of experiment. We propose a mixture of regressions for comparing two samples when one of them can be considered a reference sample (ChIP-chip), and a two-dimensional Gaussian model with constraints on the variance parameter when the two samples play symmetrical roles (transcriptome). Finally, semi-parametric modelling is considered, allowing more flexible emission distributions. With classification as the objective, we propose a procedure for controlling false positives in a two-cluster classification with independent observations. We then focus on classifying a set of observations that together form a region of interest, such as a gene. The different models are illustrated on real ChIP-chip and transcriptomic datasets obtained with a NimbleGen tiling array covering the whole genome of Arabidopsis thaliana.
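For readers unfamiliar with the mixture-of-regressions idea for ChIP-chip, where the IP signal of each probe is regressed on a reference (Input) signal and each probe belongs to a latent "normal" or "enriched" class, the following is a minimal sketch, not the thesis implementation. It assumes two Gaussian components, independent probes (so no HMM spatial dependence), and a median-split initialisation. The function names `fit_mixture_of_regressions` and `select_enriched`, as well as the simulated data, are hypothetical. The false-positive step uses a standard posterior-based FDR rule: keep the largest set of probes whose mean posterior probability of being "normal" is at most `alpha`. This rule may differ from the control procedure proposed in the thesis.

```python
import numpy as np


def fit_mixture_of_regressions(x, y, n_iter=500, tol=1e-8):
    """EM for y_i = a_k + b_k * x_i + e_i, e_i ~ N(0, s_k^2), k in {0, 1}.

    Hypothetical sketch: probes are treated as independent. Returns mixing
    proportions, per-component (intercept, slope), per-component variances
    and the n x 2 matrix of posterior membership probabilities.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = x.size
    X = np.column_stack([np.ones(n), x])  # design: intercept + reference signal

    # Initialise by splitting on the residuals of one global regression.
    beta0, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta0
    upper = resid > np.median(resid)
    tau = np.column_stack([~upper, upper]).astype(float)

    prev_ll = -np.inf
    for _ in range(n_iter):
        # M-step: mixing proportions, then weighted least squares per component.
        pi = tau.mean(axis=0)
        betas = np.empty((2, 2))
        sig2 = np.empty(2)
        for k in range(2):
            w = tau[:, k]
            Xw = X * w[:, None]
            betas[k] = np.linalg.solve(X.T @ Xw, Xw.T @ y)
            r = y - X @ betas[k]
            sig2[k] = np.sum(w * r**2) / np.sum(w)

        # E-step: posterior probabilities, computed in log space for stability.
        mu = X @ betas.T  # n x 2 fitted values
        log_joint = (np.log(pi) - 0.5 * np.log(2 * np.pi * sig2)
                     - 0.5 * (y[:, None] - mu) ** 2 / sig2)
        m = log_joint.max(axis=1, keepdims=True)
        log_norm = m[:, 0] + np.log(np.exp(log_joint - m).sum(axis=1))
        tau = np.exp(log_joint - log_norm[:, None])

        ll = log_norm.sum()  # observed-data log-likelihood
        if ll - prev_ll < tol:
            break
        prev_ll = ll
    return pi, betas, sig2, tau


def select_enriched(post_enriched, alpha=0.05):
    """Largest set of probes whose mean posterior probability of being
    'normal' (an estimate of the FDR) stays <= alpha."""
    post_enriched = np.asarray(post_enriched, dtype=float)
    order = np.argsort(-post_enriched)          # most confident probes first
    err = 1.0 - post_enriched[order]            # per-probe local error
    fdr = np.cumsum(err) / np.arange(1, err.size + 1)
    n_sel = int(np.sum(fdr <= alpha))           # fdr is non-decreasing here
    selected = np.zeros(post_enriched.size, dtype=bool)
    selected[order[:n_sel]] = True
    return selected


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n = 2000
    x = rng.normal(0.0, 1.0, n)             # simulated, centred log Input signal
    truth = rng.random(n) < 0.2             # simulated enriched probes (about 20%)
    y = np.where(truth, 2.5 + x, x) + rng.normal(0.0, 0.3, n)  # simulated log IP
    pi, betas, sig2, tau = fit_mixture_of_regressions(x, y)
    k = int(np.argmax(betas[:, 0] + betas[:, 1] * x.mean()))  # enriched = higher fit
    selected = select_enriched(tau[:, k], alpha=0.05)
    print("enriched proportion:", pi[k], "intercept, slope:", betas[k])
    print("probes declared enriched:", int(selected.sum()))
```

In an HMM version, the posteriors in the E-step would come from the forward-backward algorithm, so that neighbouring probes along the genome share information. With well-separated components, the FDR rule above may admit some clearly normal probes until the estimated FDR reaches `alpha`. A plain posterior threshold is a stricter alternative.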
Contributor : ABES STAR
Submitted on : Tuesday, June 7, 2022 - 11:05:52 AM
Last modification on : Wednesday, September 28, 2022 - 3:07:24 PM
Long-term archiving on : Thursday, September 8, 2022 - 6:34:01 PM


Version validated by the jury (STAR)


  • HAL Id : tel-03689397, version 1


Caroline Berard. Modèles à variables latentes pour des données issues de tiling arrays : Applications aux expériences de ChIP-chip et de transcriptome. Plant biology. AgroParisTech, 2011. French. ⟨NNT : 2011AGPT0067⟩. ⟨tel-03689397⟩


