Multi-dimensional probing for RNA secondary structure(s) prediction

Abstract: In structural bioinformatics, predicting the secondary structure(s) of ribonucleic acids (RNAs) is a major research direction for understanding cellular mechanisms. A classic approach to structure prediction postulates that, at thermodynamic equilibrium, an RNA adopts its various conformations according to a Boltzmann distribution based on their free energies. Modern approaches therefore focus on the dominant conformations. Such approaches are limited in accuracy by the imprecision of the energy model and by restrictions on the structure topology.
Experimental data can be used to circumvent the shortcomings of predictive computational methods. RNA probing encompasses a wide array of experimental protocols that reveal partial structural information by exposing the RNA to a chemical or enzymatic reagent whose effect depends on, and thus reveals, features of the structure(s) it adopts. Accordingly, single-reagent probing data are used to supplement free-energy models within computational methods, leading to significant gains in prediction accuracy. In practice, however, structural biologists model RNA structure(s) by integrating probing data produced under various experimental conditions, with different reagents or over a collection of mutated sequences. This integrative approach remains manual, time-consuming and arguably subjective in its modeling principles.
In this Ph.D., we developed in silico methods for the automated modeling of RNA structure(s) from multiple sources of probing data. We first established automated pipelines for acquiring reactivity profiles from primary data produced through a variety of protocols (SHAPE, DMS using capillary electrophoresis, SHAPE-Map/Ion Torrent). We then designed and implemented a new, versatile method that simultaneously integrates multiple probing profiles.
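The following minimal Python sketch illustrates the Boltzmann assumption and the use of single-reagent probing data as a pseudo-energy. It is an illustration only, not the thesis's method, and rests on three assumptions. First, the candidate structures, free energies and reactivities are made-up toy values. Second, probabilities are normalized over the listed structures only, rather than over the full partition function Z. Third, the probing term is a simplified Deigan-style SHAPE pseudo-energy, m·ln(r + 1) + b, using the commonly cited defaults m = 2.6 and b = -0.8 kcal/mol, applied here to every paired nucleotide instead of per stacked pair. The function names are hypothetical.

```python
import math

R_KCAL = 0.0019872  # gas constant, kcal/(mol*K)
T_KELVIN = 310.15   # 37 °C, an assumed folding temperature


def boltzmann_probabilities(energies):
    """P(S) = exp(-E(S)/RT) / Z for a list of free energies E(S) in kcal/mol.

    Assumption: Z is taken over the listed structures only; a real partition
    function sums over every structure compatible with the sequence.
    """
    rt = R_KCAL * T_KELVIN
    e_min = min(energies)  # shifting by the minimum avoids overflow; it cancels in the ratio
    weights = [math.exp(-(e - e_min) / rt) for e in energies]
    z = sum(weights)
    return [w / z for w in weights]


def shape_pseudo_energy(dot_bracket, reactivities, m=2.6, b=-0.8):
    """Simplified Deigan-style term: m*ln(r+1)+b summed over paired positions."""
    total = 0.0
    for symbol, r in zip(dot_bracket, reactivities):
        if symbol in "()":
            # Negative reactivities are clamped to 0 so the logarithm stays defined.
            total += m * math.log(max(r, 0.0) + 1.0) + b
    return total


# Hypothetical toy example: three candidate structures for a 10-nt RNA.
structures = ["(((....)))", "..((..))..", ".........."]
energies = [-3.0, -1.2, 0.0]
shape = [0.1, 0.1, 0.1, 1.2, 1.5, 1.4, 1.3, 0.1, 0.1, 0.1]  # high = likely unpaired

plain = boltzmann_probabilities(energies)
guided = boltzmann_probabilities(
    [e + shape_pseudo_energy(s, shape) for s, e in zip(structures, energies)]
)
```

Low reactivities on paired positions lower a structure's energy, and high reactivities on paired positions raise it. As a result, the probing-guided distribution concentrates even more on the hairpin whose loop matches the reactive nucleotides.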
Based on a combination of Boltzmann sampling and structural clustering, this method produces alternative stable conformations jointly supported by a set of probing experiments. Because it favors recurrent structures, it can exploit the complementarity of several probing assays. On single-probing assays, its predictions compared favorably with those of state-of-the-art computational methods.
Our method was used to identify models for structured regions in RNA viruses. In collaboration with experimental partners, we proposed a refined structure of the HIV-1 Gag IRES that shows good compatibility with chemical and enzymatic probing data. The predicted structure allowed us to formulate hypotheses on binding sites that are functionally relevant to translation. We also proposed conserved structures in the Ebola untranslated regions (UTRs) that are highly consistent with both SHAPE probing and evolutionary data. Our modeling detects a conserved and stable stem-loop at the 5' end of each UTR. Such stem-loops are typical of viral genomes, where they protect the RNA from degradation by nucleases.
We then extended our method to the analysis of sequence variants. We analyzed a collection of DMS-probed mutants produced by the Mutate-and-Map protocol. This analysis yielded better structural models for the GIR1 lariat-capping ribozyme than those obtained from the wild-type sequence alone. To avoid the systematic production of point mutants, and to exploit the recent SHAPE-Map protocol, we designed an experimental protocol based on undirected mutagenesis and sequencing, in which several mutated RNAs are produced and probed simultaneously. The resulting reads must then be assigned back to their mutants to establish the reactivity profiles later used for structure modeling. We modeled this assignment problem as the joint inference of mutational profiles and read assignments by likelihood maximization, and solved it using an instance of the Expectation-Maximization (EM) algorithm.
Preliminary results on a reduced, simulated sample of reads showed a marked decrease in read-assignment errors compared to a classic algorithm.
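The abstract does not detail the likelihood model behind the read assignment, so the following minimal Python sketch assumes a simplified one. Each read is reduced to a 0/1 vector marking the positions where it differs from the wild type, and each of k mutants has independent per-position mutation rates, which makes the model a Bernoulli mixture. EM then alternates between two steps. The E-step computes each read's posterior probability of coming from each mutant. The M-step re-estimates the mutational profiles and mixture weights from those posteriors. The function names and toy reads are hypothetical. The sketch also ignores the probing-induced mismatches that real SHAPE-Map reads carry, so it should not be read as the thesis's implementation.

```python
import math
import random


def _log_likelihood(read, profile):
    """log P(read | mutant) assuming independent per-position Bernoulli mutations."""
    return sum(math.log(p) if x else math.log(1.0 - p) for x, p in zip(read, profile))


def _e_step(reads, profiles, weights):
    """Posterior probability (responsibility) of each mutant for each read."""
    resp = []
    for read in reads:
        logs = [math.log(w) + _log_likelihood(read, prof)
                for w, prof in zip(weights, profiles)]
        top = max(logs)  # log-sum-exp trick for numerical stability
        unnorm = [math.exp(v - top) for v in logs]
        total = sum(unnorm)
        resp.append([u / total for u in unnorm])
    return resp


def em_assign_reads(reads, k, n_iter=100, seed=0, eps=1e-6):
    """Jointly estimate k mutational profiles and read-to-mutant assignments.

    reads: equal-length 0/1 lists (1 = base differs from the wild type).
    Returns (profiles, weights, assignments).
    """
    rng = random.Random(seed)
    length = len(reads[0])
    profiles = [[rng.uniform(0.25, 0.75) for _ in range(length)] for _ in range(k)]
    weights = [1.0 / k] * k
    for _ in range(n_iter):
        resp = _e_step(reads, profiles, weights)
        # M-step: re-estimate mixture weights and per-position mutation rates.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            weights[j] = max(nj / len(reads), eps)
            for i in range(length):
                hits = sum(r[j] * read[i] for r, read in zip(resp, reads))
                rate = hits / nj if nj > 0 else 0.5
                profiles[j][i] = min(max(rate, eps), 1.0 - eps)  # keep logs finite
        total_w = sum(weights)
        weights = [w / total_w for w in weights]
    # Final E-step: assign each read to its most probable mutant.
    resp = _e_step(reads, profiles, weights)
    assignments = [max(range(k), key=r.__getitem__) for r in resp]
    return profiles, weights, assignments


# Hypothetical toy data: mutant A carries a substitution at position 2, mutant B at position 7.
reads = [[0, 0, 1, 0, 0, 0, 0, 0, 0, 0]] * 20 + [[0, 0, 0, 0, 0, 0, 0, 1, 0, 0]] * 20
profiles, weights, labels = em_assign_reads(reads, k=2)
```

EM only reaches a local optimum, so the outcome depends on the random initialization (here controlled by `seed`). In practice, several initializations are often compared.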
Document type: Thesis

Cited literature: 88 references

https://pastel.archives-ouvertes.fr/tel-01968071
Contributor: Abes Star
Submitted on: Wednesday, January 2, 2019 - 1:31:45 AM
Last modification on: Thursday, April 11, 2019 - 8:01:55 AM
Long-term archiving on: Wednesday, April 3, 2019 - 2:38:42 PM

File

73469_SAAIDI_2018_archivage.pd...
Version validated by the jury (STAR)

Identifiers

  • HAL Id: tel-01968071, version 1

Citation

Afaf Saaidi. Multi-dimensional probing for RNA secondary structure(s) prediction. Bioinformatics [q-bio.QM]. Université Paris-Saclay, 2018. English. ⟨NNT : 2018SACLX067⟩. ⟨tel-01968071⟩


Metrics

Record views: 174
File downloads: 148