
Computational protein design: a tool for protein engineering and synthetic biology

Abstract : Computational Protein Design (CPD) is the search for amino acid sequences compatible with a target protein structure. The goal is to design a new function and/or add a new behavior. CPD has been developed in our laboratory for several years with the Proteus software, which has several successes to its credit. Our approach uses a physics-based energy model and relies on the energy difference between the folded and unfolded states of the protein. During this thesis, we enriched Proteus in several respects, including the addition of a Replica Exchange Monte Carlo (REMC) exploration method. We extensively compared three stochastic methods for exploring sequence space: REMC, plain Monte Carlo, and a heuristic designed for CPD, Multistart Steepest Descent (MSD). These comparisons covered nine proteins from three structural families: SH2, SH3 and PDZ. Using these exploration techniques, we were able to identify the Global Minimum Energy Conformation (GMEC) in nearly all the test cases where up to 10 positions of the polypeptide chain were free to mutate (the others retaining their native types). In the tests where 20 positions were free to mutate, the GMEC was identified in two thirds of the cases. Overall, REMC and MSD give very good sequences in terms of energy, often identical or very close to the GMEC. MSD performed best in the tests with 30 mutating positions. REMC with eight replicas and optimized parameters often gave the best result when all positions could mutate. Moreover, compared to an exact enumeration of the low-energy sequences, REMC provided a sample of sequences with high sequence diversity.
In the second part of this work, we tested our CPD model for PDZ domain design. For the folded state, we used two variants of a GB solvent model. The first used a mean, effective protein/solvent dielectric boundary; the second, more rigorous, used an exact boundary that fluctuated over the MC trajectory.
To characterize the unfolded state, we used a set of amino acid chemical potentials, or reference energies. These reference energies were determined by maximizing a likelihood function so as to reproduce the amino acid frequencies in natural PDZ domains. The sequences designed by Proteus were compared to the natural sequences. Our sequences are globally similar to the Pfam sequences, as measured by BLOSUM40 scores, with especially high scores for the residues in the core of the protein. The more rigorous GB variant always gives sequences that are similar to moderately distant natural homologues and are perfectly recognized by the Superfamily fold recognition tool. Our sequences were also compared to those produced by the Rosetta software. The quality, according to the same criteria as before, was very similar, but the Rosetta sequences exhibit fewer mutations than the Proteus sequences.
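As an illustration of the Replica Exchange Monte Carlo exploration described in the abstract, here is a minimal Python sketch. It is not the Proteus implementation: the energy function `toy_energy`, the temperature ladder, the move set (single-position mutations) and all parameter values are assumptions chosen only to make the example self-contained. It shows the two ingredients of the method: Metropolis moves within each replica and periodic swap attempts between replicas at adjacent temperatures.

```python
import math
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def toy_energy(seq):
    """Hypothetical stand-in for a folding energy: mismatches to an arbitrary target."""
    target = "ACDEFGHIKL"
    return float(sum(a != b for a, b in zip(seq, target)))


def metropolis_accept(delta_e, kt, rng):
    """Metropolis test for a move that changes the energy by delta_e at temperature kt."""
    return delta_e <= 0 or rng.random() < math.exp(-delta_e / kt)


def swap_accept(e_i, e_j, kt_i, kt_j, rng):
    """Replica exchange test: accept with probability min(1, exp((1/kt_i - 1/kt_j) * (e_i - e_j)))."""
    arg = (1.0 / kt_i - 1.0 / kt_j) * (e_i - e_j)
    return arg >= 0 or rng.random() < math.exp(arg)


def remc(n_positions=10, temperatures=(0.2, 0.4, 0.8, 1.6),
         n_steps=5000, swap_every=50, seed=0):
    """Run a toy REMC search; return the best sequence seen and its energy."""
    rng = random.Random(seed)
    # One random starting sequence per replica (one replica per temperature).
    seqs = [[rng.choice(AMINO_ACIDS) for _ in range(n_positions)]
            for _ in temperatures]
    energies = [toy_energy(s) for s in seqs]
    best = min(range(len(seqs)), key=lambda r: energies[r])
    best_seq, best_e = "".join(seqs[best]), energies[best]

    for step in range(1, n_steps + 1):
        # Metropolis move in each replica: mutate one randomly chosen position.
        for r, kt in enumerate(temperatures):
            pos = rng.randrange(n_positions)
            old = seqs[r][pos]
            seqs[r][pos] = rng.choice(AMINO_ACIDS)
            new_e = toy_energy(seqs[r])
            if metropolis_accept(new_e - energies[r], kt, rng):
                energies[r] = new_e
                if new_e < best_e:
                    best_seq, best_e = "".join(seqs[r]), new_e
            else:
                seqs[r][pos] = old  # rejected: restore the previous residue

        # Periodically attempt to swap sequences between adjacent temperatures.
        if step % swap_every == 0:
            for r in range(len(temperatures) - 1):
                if swap_accept(energies[r], energies[r + 1],
                               temperatures[r], temperatures[r + 1], rng):
                    seqs[r], seqs[r + 1] = seqs[r + 1], seqs[r]
                    energies[r], energies[r + 1] = energies[r + 1], energies[r]

    return best_seq, best_e


if __name__ == "__main__":
    sequence, energy = remc()
    print(sequence, energy)
```

The high-temperature replicas cross energy barriers easily, while the low-temperature replicas refine low-energy sequences; the swaps let good candidates migrate toward the cold end of the ladder.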
Contributor : Abes Star
Submitted on : Tuesday, March 20, 2018 - 3:46:08 PM
Last modification on : Wednesday, October 14, 2020 - 4:10:40 AM
Long-term archiving on : Tuesday, September 11, 2018 - 9:01:21 AM


Version validated by the jury (STAR)


  • HAL Id : tel-01738535, version 1


David Mignon. Computational protein design : un outil pour l'ingénierie des protéines et la biologie synthétique. Chemo-informatique. Université Paris Saclay (COmUE), 2017. Français. ⟨NNT : 2017SACLX089⟩. ⟨tel-01738535⟩


