Efficient finite-state algorithms for the application of local grammars

Abstract : This work focuses on the research and development of efficient algorithms for the application of local grammars, taking as reference those of the currently existing open-source systems: Unitex's top-down parser and Outilex's Earley-like parser. Local grammars are a finite-state-based formalism for the representation of natural language grammars. Moreover, local grammars are a model for the construction of full-scale, accurate descriptions of the syntax of natural languages by means of systematic observation and methodical accumulation of data. The adequacy of local grammars for this task has been demonstrated by multiple works. Due to the ambiguous nature of natural languages and the particular properties of local grammars, classic parsing algorithms such as LR, CYK and Tomita's cannot be used in the context of this work. Top-down and Earley parsers are possible alternatives, though they have an exponential worst-case cost for the case of local grammars. We first devised an algorithm for the application of local grammars with a polynomial worst-case cost. We then devised further optimizations that increase the efficiency of the algorithm in general cases, namely the efficient management of sets of elements and of sequences. We implemented our algorithm and those of the Unitex and Outilex systems with the same tools in order to test them under the same conditions. Moreover, we implemented different versions of each algorithm, using either our custom set data structures or those included in GNU's implementation of the C++ Standard Template Library (STL). We compared the performance of the different algorithms and algorithm versions in the context of an industrial natural language application provided by the company Telefónica I+D: extending the understanding capabilities of a chatterbot that provides mobile services, such as sending SMS messages to mobile phones, as well as games and other digital content. Conversation with the chatterbot is held in Spanish by means of Microsoft's Windows Live Messenger. In spite of the limited domain and the simplicity of the applied grammars, execution times of our parsing algorithm coupled with our custom implementation of sets were lower. Thanks to the improved asymptotic cost of our algorithm, execution times for complex, large-coverage grammars can be expected to be considerably lower than those of the Unitex and Outilex algorithms.

https://pastel.archives-ouvertes.fr/tel-00621249
Contributor : Abes Star
Submitted on : Wednesday, February 8, 2012 - 12:22:12 PM
Last modification on : Thursday, April 12, 2018 - 1:53:49 AM
Long-term archiving on : Wednesday, May 9, 2012 - 2:35:51 AM

File

TH2011PEST1047_complete.pdf
Version validated by the jury (STAR)

Identifiers

  • HAL Id : tel-00621249, version 2

Citation

Javier Miguel Sastre Martinez. Efficient finite-state algorithms for the application of local grammars. Other [cs.OH]. Université Paris-Est, 2011. English. ⟨NNT : 2011PEST1047⟩. ⟨tel-00621249v2⟩
