Apprentissage automatique de relations d'équivalence sémantique à partir du Web

Abstract : This PhD thesis is situated in the context of a question answering system capable of automatically finding answers to factual questions on the Web. One way to improve the quality of these answers is to increase the system's recall by identifying answers under their multiple possible formulations (paraphrases). Since recording paraphrases manually is a long and expensive task, the goal of this thesis is to design and develop a mechanism that learns the possible paraphrases of an answer automatically and in a weakly supervised manner. Thanks to the redundancy and linguistic variety of the information it contains, the Web is considered a very interesting corpus. Viewed as a gigantic bipartite graph whose two sets of nodes are formulations on the one hand and argument pairs on the other, the Web lends itself to the application of Firth's hypothesis, according to which "you shall know a word (resp. a formulation, in our case) by the company (resp. arguments) it keeps". Consequently, the Web is sampled using an iterative mechanism: formulations (potential paraphrases) are extracted by anchoring arguments and, conversely, new arguments are extracted by anchoring the acquired formulations. To make the learning process converge, an intermediate stage is necessary, which partitions the sampled data using a statistical classification method. The results were evaluated empirically, showing in particular the value that the learnt paraphrases add to the question answering system.
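The alternating sampling described in the abstract can be illustrated with a minimal sketch. This is not the thesis implementation: the toy corpus, the function name `bootstrap`, and the parameters `min_support` and `max_iters` are all assumptions made for illustration, and a simple support-count filter stands in for the statistical classification step mentioned in the abstract.

```python
from collections import defaultdict

# Toy stand-in for the Web: (argument_1, formulation, argument_2) triples.
# All data below is invented for illustration only.
CORPUS = [
    ("Melville", "wrote", "Moby Dick"),
    ("Melville", "is the author of", "Moby Dick"),
    ("Hugo", "wrote", "Les Miserables"),
    ("Hugo", "is the author of", "Les Miserables"),
    ("Hugo", "was proud of", "Les Miserables"),
    ("Orwell", "is the author of", "1984"),
    ("Orwell", "disliked", "1984"),
]


def bootstrap(corpus, seed_formulation, min_support=2, max_iters=5):
    """Alternate between anchoring formulations and anchoring argument pairs.

    Hypothetical helper: `min_support` is a crude filter (number of distinct
    argument pairs a formulation must share) used here in place of the
    statistical classification stage described in the abstract.
    """
    formulations = {seed_formulation}
    pairs = set()
    for _ in range(max_iters):
        # Step 1: anchor the known formulations to collect argument pairs.
        new_pairs = {(a, b) for a, f, b in corpus if f in formulations}

        # Step 2: anchor those pairs to collect candidate formulations,
        # recording which distinct pairs support each candidate.
        support = defaultdict(set)
        for a, f, b in corpus:
            if (a, b) in new_pairs:
                support[f].add((a, b))

        # Step 3: keep only candidates seen with enough distinct pairs.
        new_formulations = {seed_formulation} | {
            f for f, ps in support.items() if len(ps) >= min_support
        }

        # Stop once neither side of the bipartite graph grows any further.
        if new_formulations == formulations and new_pairs == pairs:
            break
        formulations, pairs = new_formulations, new_pairs
    return formulations, pairs


if __name__ == "__main__":
    forms, args = bootstrap(CORPUS, "wrote")
    print(sorted(forms))
    print(sorted(args))
```

In this toy run, the seed formulation "wrote" brings in "is the author of" through the argument pairs they share, which in turn brings in a new argument pair; formulations supported by a single pair are filtered out.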
Document type :
Theses

https://pastel.archives-ouvertes.fr/pastel-00001119
Contributor : Ecole Télécom Paristech
Submitted on : Monday, November 22, 2010 - 5:09:54 PM
Last modification on : Wednesday, February 20, 2019 - 2:41:00 PM
Long-term archiving on : Friday, September 10, 2010 - 3:27:45 PM

Identifiers

  • HAL Id : pastel-00001119, version 1

Citation

Florence Duclaye. Apprentissage automatique de relations d'équivalence sémantique à partir du Web. Télécom ParisTech, 2003. French. ⟨pastel-00001119⟩

Metrics

Record views : 809
File downloads : 1075