Deciphering splicing with sparse regression techniques in the era of high-throughput RNA sequencing.

Abstract : The numbers of protein-coding genes in a human, a nematode, and a fruit fly are roughly equal. The paradoxical lack of correlation between the number of genes in an organism's genome and its phenotypic complexity finds an explanation in the alternative nature of splicing in higher organisms. Alternative splicing greatly increases the functional diversity of proteins encoded by a limited number of genes. It is known to be involved in cell fate decision and embryonic development, but also appears to be dysregulated in inherited and acquired human genetic disorders, in particular in cancers. High-throughput RNA sequencing technologies allow us to measure and question splicing at an unprecedented resolution. However, while the cost of sequencing RNA decreases and throughput increases, many computational challenges arise from the discrete and local nature of the data. In particular, the task of inferring alternative transcripts requires a non-trivial deconvolution procedure. In this thesis, we contribute to deciphering alternative transcript expressions and alternative splicing events from high-throughput RNA sequencing data. We propose new methods to accurately and efficiently detect and quantify alternative transcripts. Our methodological contributions largely rely on sparse regression techniques and take advantage of network flow optimization techniques. In addition, we investigate means to query splicing abnormalities for clinical diagnosis purposes. We suggest an experimental protocol that can be easily implemented in routine clinical practice, and present new statistical models and algorithms to quantify splicing events and measure how abnormal these events might be in patient data compared to wild-type situations.
Document type :
Theses

Cited literature [200 references]

https://pastel.archives-ouvertes.fr/tel-01681314
Contributor : Abes Star
Submitted on : Friday, January 12, 2018 - 10:36:08 AM
Last modification on : Friday, April 5, 2019 - 10:58:29 AM
Document(s) archived on : Wednesday, May 23, 2018 - 7:42:29 PM

File

2016PSLEM063_archivage.pdf
Version validated by the jury (STAR)

Identifiers

  • HAL Id : tel-01681314, version 2

Citation

Elsa Bernard. Deciphering splicing with sparse regression techniques in the era of high-throughput RNA sequencing. Bioinformatics [q-bio.QM]. PSL Research University, 2016. English. ⟨NNT : 2016PSLEM063⟩. ⟨tel-01681314v2⟩
