
Structured priors for supervised learning in computational biology  

Abstract : Supervised learning methods are used to build functions that accurately predict the behavior of new objects from observed data. They are therefore extremely useful in several computational biology problems, where they can exploit the increasing amount of empirical data generated by high-throughput technologies or the accumulation of experimental knowledge in public databases. In several cases, however, the amount of training data is not sufficient to deal with the complexity of the learning problem. Fortunately, this type of ill-posed problem is not new in statistics and statistical machine learning. It is classically addressed using regularization approaches or, equivalently, using a prior on what the function should be like. In this thesis, we build on this principle and propose new regularization methods based on biological prior knowledge specific to each problem. In the context of in silico vaccine and drug design, we show how, using the knowledge that similar targets bind similar ligands, one can dramatically improve prediction accuracy for targets with few known ligands, and even make predictions for targets with no known ligand. We also design a convex regularization function that takes into account the fact that only some groups of targets, unknown beforehand, tend to have the same binding behavior. Finally, in the context of outcome prediction from molecular data, we propose a regularization function that leads to sparse vectors whose support is typically a union of potentially overlapping groups of genes defined a priori, e.g., pathways, or sets of genes that tend to be connected to each other when a graph reflecting biological information is given.
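The idea that "similar targets bind similar ligands" lets a model borrow strength across targets. The sketch below illustrates this under explicit assumptions that do not come from the abstract: a kernel ridge regression on (target, ligand) pairs, a tensor-product pair kernel, Gaussian kernels on made-up one-dimensional descriptors, and toy binding labels. All names (`rbf`, `dirac`, `pair_kernel`, `fit`) and values are hypothetical. Swapping the target kernel for a Dirac kernel, which treats every target as unrelated, recovers independent per-target learning, so it has nothing to say about a target with no known ligand.

```python
import math

def rbf(x, y, gamma=1.0):
    """Gaussian (RBF) kernel between two feature tuples (assumed similarity)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))

def dirac(x, y):
    """Target kernel that treats every target as unrelated to every other one."""
    return 1.0 if x == y else 0.0

def pair_kernel(p, q, k_target, k_ligand):
    """Tensor-product kernel on (target, ligand) pairs."""
    (t1, l1), (t2, l2) = p, q
    return k_target(t1, t2) * k_ligand(l1, l2)

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

def fit(pairs, y, k_target, k_ligand, lam=0.1):
    """Kernel ridge regression on pairs: alpha = (K + lam * I)^-1 y."""
    n = len(pairs)
    K = [[pair_kernel(pairs[i], pairs[j], k_target, k_ligand)
          + (lam if i == j else 0.0) for j in range(n)] for i in range(n)]
    alpha = solve(K, y)

    def predict(pair):
        # f(pair) = sum_i alpha_i * K(pair_i, pair)
        return sum(a * pair_kernel(p, pair, k_target, k_ligand)
                   for a, p in zip(alpha, pairs))
    return predict

# Toy data (hypothetical): 1-D descriptors for two training targets and two ligands.
target_a, target_c = (0.0,), (3.0,)
lig_1, lig_2 = (0.0,), (2.0,)
pairs = [(target_a, lig_1), (target_a, lig_2), (target_c, lig_1), (target_c, lig_2)]
labels = [1.0, -1.0, -1.0, 1.0]  # +1 = binds, -1 = does not bind

new_target = (0.1,)  # close to target_a, but with no known ligand
shared = fit(pairs, labels, rbf, rbf)
independent = fit(pairs, labels, dirac, rbf)
print(shared((new_target, lig_1)), shared((new_target, lig_2)))
print(independent((new_target, lig_1)))  # exactly zero: no information shared across targets
```

With the shared target kernel, the new target inherits the binding pattern of its nearest training target (positive score for `lig_1`, negative for `lig_2`), whereas the independent model can only return zero. The regularization parameter `lam` plays the role of the prior's strength; its value here is arbitrary.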
Contributor : Ecole Mines Paristech
Submitted on : Thursday, January 21, 2010 - 8:00:00 AM
Last modification on : Wednesday, October 14, 2020 - 3:52:35 AM
Long-term archiving on : Friday, September 10, 2010 - 3:00:55 PM


  • HAL Id : pastel-00005743, version 1


Laurent Jacob. Structured priors for supervised learning in computational biology. Life Sciences [q-bio]. École Nationale Supérieure des Mines de Paris, 2009. English. ⟨NNT : 2009ENMP1644⟩. ⟨pastel-00005743⟩


