Learning from multiple genomic information in cancer for diagnosis and prognosis

Matahi Moarii

Résumé

Several initiatives have been launched recently to investigate the molecular characterisation of large cohorts of human cancers with various high-throughput technologies in order to understanding the major biological alterations related to tumorogenesis. The information measured include gene expression, mutations, copy-number variations, as well as epigenetic signals such as DNA methylation. Large consortiums such as “The Cancer Genome Atlas” (TCGA) have already gathered publicly thousands of cancerous and non-cancerous samples. We contribute in this thesis in the statistical analysis of the relationship between the different biological sources, the validation and/or large scale generalisation of biological phenomenon using an integrative analysis of genetic and epigenetic data.Firstly, we show the role of DNA methylation as a surrogate biomarker of clonality between cells which would allow for a powerful clinical tool for to elaborate appropriate treatments for specific patients with breast cancer relapses.In addition, we developed systematic statistical analyses to assess the significance of DNA methylation variations on gene expression regulation. We highlight the importance of adding prior knowledge to tackle the small number of samples in comparison with the number of variables. In return, we show the potential of bioinformatics to infer new interesting biological hypotheses.Finally, we tackle the existence of the universal biological phenomenon related to the hypermethylator phenotype. Here, we adapt regression techniques using the similarity between the different prediction tasks to obtain robust genetic predictive signatures common to all cancers and that allow for a better prediction accuracy.In conclusion, we highlight the importance of a biological and computational collaboration in order to establish appropriate methods to the current issues in bioinformatics that will in turn provide new biological insights.

De nombreuses initiatives ont été mises en places pour caractériser d'un point de vue moléculaire de grandes cohortes de cancers à partir de diverses sources biologiques dans l'espoir de comprendre les altérations majeures impliquées durant la tumorogénèse. Les données mesurées incluent l'expression des gènes, les mutations et variations de copy-number, ainsi que des signaux épigénétiques tel que la méthylation de l'ADN. De grands consortium tels que “The Cancer Genome Atlas” (TCGA) ont déjà permis de rassembler plusieurs milliers d'échantillons cancéreux mis à la disposition du public. Nous contribuons dans cette thèse à analyser d'un point de vue mathématique les relations existant entre les différentes sources biologiques, valider et/ou généraliser des phénomènes biologiques à grande échelle par une analyse intégrative de données épigénétiques et génétiques.En effet, nous avons montré dans un premier temps que la méthylation de l'ADN était un marqueur substitutif intéressant pour jauger du caractère clonal entre deux cellules et permettait ainsi de mettre en place un outil clinique des récurrences de cancer du sein plus précis et plus stable que les outils actuels, afin de permettre une meilleure prise en charge des patients.D'autre part, nous avons dans un second temps permis de quantifier d'un point de vue statistique l'impact de la méthylation sur la transcription. Nous montrons l'importance d'incorporer des hypothèses biologiques afin de pallier au faible nombre d'échantillons par rapport aux nombre de variables.Enfin, nous montrons l'existence d'un phénomène biologique lié à l'apparition d'un phénotype d'hyperméthylation dans plusieurs cancers. Pour cela, nous adaptons des méthodes de régression en utilisant la similarité entre les différentes tâches de prédictions afin d'obtenir des signatures génétiques communes prédictives du phénotypes plus précises.En conclusion, nous montrons l'importance d'une collaboration biologique et statistique afin d'établir des méthodes adaptées aux problématiques actuelles en bioinformatique.

Learning from multiple genomic information in cancer for diagnosis and prognosis

Apprentissage de données génomiques multiples pour le diagnostic et le pronostic du cancer

Résumé

Mots clés

Domaines

Dates et versions

Identifiants

Citer

Exporter

Collections

Partager