
Computational protein design for structure prediction

Abstract: Thanks to recent technological breakthroughs and the arrival of next-generation sequencers, the amount of genomic data is growing exponentially, while the gap with the number of solved structures keeps widening. Ideally, computational 3D structure prediction should be possible from sequence information alone, even without any detectable homology. Indeed, below 30% sequence identity, similarity measures are not sensitive enough to detect homology. New methods are therefore needed to get past this twilight zone. Usually, for a given structure (and hence a given biological function), only a few sequences are known, and they are barely similar to one another. It is thus difficult to build a profile for finding homologues without knowledge of the structure. How can we obtain a database of sequences for each structure? Computational Protein Design (CPD) tries to answer this question: if a fold is known, is it possible to predict every sequence that matches it? CPD consists of identifying, among all sequences compatible with the target fold, those that will confer the desired function on the protein. It involves two steps. The first computes an energy matrix holding the interaction energies between every pair of residues of the protein, by successively allowing every amino-acid type in every possible conformation. The second, or "optimization step", explores the sequence and conformation spaces simultaneously in order to find the best combination of amino acids for the starting fold. First, we analyzed the covariance between alignment positions of the theoretical sequences. We implemented a statistical method to locate the positions that mutate together for a given structure. However, a profile built from all of these theoretical sequences averages out the amino-acid information too strongly.
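The two CPD steps described above can be illustrated on a toy problem. This is a minimal sketch under stated assumptions, not the thesis's implementation. It assumes a pairwise-additive energy, E = sum_i E_i(r_i) + sum_{i<j} E_ij(r_i, r_j), where each state r is a hypothetical (amino acid, rotamer) pair; the per-position term E_i (residue against the fixed backbone) is an assumption, since the abstract only mentions pair energies, and all energies here are random numbers rather than force-field values. For the optimization step it substitutes Monte Carlo simulated annealing, a common choice that the abstract does not name, and compares it with exhaustive enumeration. The names `total_energy`, `exhaustive_minimum`, `anneal` and `random_problem` are hypothetical.

```python
import itertools
import math
import random


def total_energy(assignment, self_e, pair_e):
    """Pairwise-additive energy: sum_i E_i(r_i) + sum_{i<j} E_ij(r_i, r_j)."""
    # Assumption: a per-position self term E_i; the abstract only mentions pair energies.
    n = len(assignment)
    energy = sum(self_e[i][assignment[i]] for i in range(n))
    for i in range(n):
        for j in range(i + 1, n):
            energy += pair_e[(i, j)][(assignment[i], assignment[j])]
    return energy


def exhaustive_minimum(states, self_e, pair_e):
    """Enumerate every combination of states (only feasible for tiny problems)."""
    best = min(itertools.product(*states),
               key=lambda a: total_energy(a, self_e, pair_e))
    return best, total_energy(best, self_e, pair_e)


def anneal(states, self_e, pair_e, steps=5000, t_start=2.0, t_end=0.05, seed=0):
    """Monte Carlo simulated annealing over the joint sequence/conformation space."""
    rng = random.Random(seed)
    current = [rng.choice(s) for s in states]
    e_cur = total_energy(current, self_e, pair_e)
    best, e_best = list(current), e_cur
    for k in range(steps):
        # Geometric cooling schedule from t_start down to t_end.
        t = t_start * (t_end / t_start) ** (k / max(steps - 1, 1))
        i = rng.randrange(len(states))
        old = current[i]
        current[i] = rng.choice(states[i])  # change amino acid and/or rotamer at one position
        e_new = total_energy(current, self_e, pair_e)
        if e_new <= e_cur or rng.random() < math.exp((e_cur - e_new) / t):
            e_cur = e_new  # Metropolis acceptance
            if e_cur < e_best:
                best, e_best = list(current), e_cur
        else:
            current[i] = old  # rejected move: restore the previous state
    return tuple(best), e_best


def random_problem(n_positions=4, seed=1):
    """Hypothetical toy problem: 3 amino acids x 2 rotamers per position, random energies."""
    rng = random.Random(seed)
    states = [[(aa, rot) for aa in "AVL" for rot in range(2)] for _ in range(n_positions)]
    self_e = [{s: rng.uniform(-1.0, 1.0) for s in states[i]} for i in range(n_positions)]
    pair_e = {(i, j): {(a, b): rng.uniform(-0.5, 0.5) for a in states[i] for b in states[j]}
              for i in range(n_positions) for j in range(i + 1, n_positions)}
    return states, self_e, pair_e


if __name__ == "__main__":
    states, self_e, pair_e = random_problem()
    exact, e_exact = exhaustive_minimum(states, self_e, pair_e)
    found, e_found = anneal(states, self_e, pair_e)
    print("exhaustive:", exact, round(e_exact, 3))
    print("annealing: ", found, round(e_found, 3))
```

Exhaustive enumeration grows as (states per position)^(number of positions), which is why realistic design problems need pruning or stochastic search. Annealing carries no optimality guarantee: its result can only be equal to or above the exhaustive minimum.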
We therefore improved the search for homologues by using groups of sequences classified according to the residue patterns found at these covarying positions. To assess the quality of these theoretical sequence predictions, we had to implement a protocol for selecting the best mutant proteins so that they could be tested in vivo. But how can we determine that one sequence is better than another? What are the relevant criteria? A set of descriptors was therefore chosen to rank the theoretical sequences according to various criteria. In the end, we retained a dozen sequences. These mutant proteins were then subjected to molecular dynamics simulations to assess their theoretical stability. For the most promising mutants, experiments were carried out to obtain a biological validation of the CPD model: overexpression, purification, structure determination... These analysis and validation protocols appear to be a good means of allowing our team to test other mutant proteins in the future, so that the team can modify parameters during CPD generation and rely on experimental results to adjust them.
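The covariance analysis and pattern-based grouping described in this abstract can be sketched as follows. The abstract does not name its statistical method, so this sketch substitutes mutual information between alignment columns, a common covariation score; the toy alignment and the names `mutual_information`, `covarying_pairs`, `group_by_pattern` and `profile` are hypothetical. The example shows why a single profile over all sequences blurs a covarying pair, while per-group profiles preserve it.

```python
import math
from collections import Counter, defaultdict
from itertools import combinations

# Hypothetical toy alignment: position 1 (D/K) and position 3 (K/D) swap together,
# like a salt bridge; position 2 varies independently; position 0 is conserved.
ALIGNMENT = ["MDAK", "MDSK", "MKAD", "MKSD", "MDAK", "MKSD", "MDSK", "MKAD"]


def mutual_information(col_a, col_b):
    """Mutual information (nats) between two alignment columns of equal length."""
    n = len(col_a)
    count_a, count_b = Counter(col_a), Counter(col_b)
    count_ab = Counter(zip(col_a, col_b))
    # Sum over observed pairs of p(x,y) * log(p(x,y) / (p(x) p(y))).
    return sum((c / n) * math.log(c * n / (count_a[x] * count_b[y]))
               for (x, y), c in count_ab.items())


def covarying_pairs(alignment, top=1):
    """Return the `top` column pairs ranked by mutual information, as (mi, i, j)."""
    columns = ["".join(seq[k] for seq in alignment) for k in range(len(alignment[0]))]
    scored = [(mutual_information(columns[i], columns[j]), i, j)
              for i, j in combinations(range(len(columns)), 2)]
    scored.sort(reverse=True)
    return scored[:top]


def group_by_pattern(alignment, positions):
    """Group sequences by the residues they carry at the given positions."""
    groups = defaultdict(list)
    for seq in alignment:
        groups["".join(seq[p] for p in positions)].append(seq)
    return dict(groups)


def profile(sequences):
    """Unweighted per-column amino-acid frequencies."""
    n = len(sequences)
    return [{aa: c / n for aa, c in Counter(s[k] for s in sequences).items()}
            for k in range(len(sequences[0]))]


if __name__ == "__main__":
    mi, i, j = covarying_pairs(ALIGNMENT)[0]
    print(f"top pair: {i}-{j}, MI = {mi:.3f}")
    groups = group_by_pattern(ALIGNMENT, [i, j])
    print("global profile at", i, profile(ALIGNMENT)[i])
    for pattern, seqs in groups.items():
        print(pattern, "profile at", i, profile(seqs)[i])
```

On this toy alignment, positions 1 and 3 (0-based) form the top-scoring pair. The global profile gives D and K equal weight at position 1, whereas each pattern group is fully conserved there, which is the kind of information a single averaged profile loses.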
Contributor: Audrey Sedano-Pelzer
Submitted on: Monday, May 27, 2013 - 7:34:57 PM
Last modification on: Wednesday, July 29, 2020 - 4:10:05 PM
Long-term archiving on: Tuesday, April 4, 2017 - 11:44:48 AM


HAL Id: pastel-00826589, version 1



Audrey Sedano-Pelzer. Design computationnel de protéines pour la prédiction de structure [Computational protein design for structure prediction]. Bioinformatics [q-bio.QM]. Ecole Polytechnique X, 2013. In French. ⟨pastel-00826589⟩


