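Reconnaissance de mots manuscrits cursifs par modèles de Markov cachés en contexte : application au français, à l'anglais et à l'arabe (Recognition of cursive handwritten words with context-dependent hidden Markov models: application to French, English and Arabic)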

Abstract : This thesis develops a new handwritten word recognition system that can be trained on and applied to any handwriting style and any alphabet. The approach is analytic: words are divided into subparts (characters or graphemes), and each subpart is modelled. The division is implicit: sliding windows transform each word image into a sequence of observations. Hidden Markov models (HMMs), widely regarded as one of the most powerful tools for sequence modelling, are chosen to model the characters. Each character is represented by a Bakis-type HMM, which enables the model to absorb variations in handwriting. A word model is built by concatenating the models of its constituent characters. The thesis chooses to strengthen the HMM modelling by acting directly within the models. To this end, a new approach using context knowledge is proposed: each character model depends on its context (its preceding and following characters). This new character model is named a trigraph. Taking the characters' environment into account allows more precise and more effective models to be built. However, it multiplies the number of HMM parameters to be learned, often from a limited amount of observation data. To overcome this issue, the thesis proposes an original method for parameter grouping: a state-based clustering, performed at each state position and based on binary decision trees. This type of clustering is new in the handwriting recognition field. It has many advantages, including parameter reduction. Moreover, the use of decision trees lets the HMMs keep one of their most interesting attributes: independence between the training and testing lexicons.
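
The pipeline described in the abstract can be illustrated with a minimal sketch. It is not the thesis implementation: the function names, the uniform initial probabilities, the "#" word-boundary symbol and the "left-centre+right" naming (borrowed from triphone notation in speech recognition) are all assumptions made for illustration, and the exit transition out of the last state of each character model is omitted.

```python
from typing import List, Tuple

Image = List[List[int]]  # word image as rows of pixel values (assumed format)


def sliding_windows(image: Image, width: int, shift: int) -> List[Image]:
    """Cut a word image into overlapping vertical windows, left to right."""
    n_cols = len(image[0])
    windows = []
    for start in range(0, n_cols - width + 1, shift):
        windows.append([row[start:start + width] for row in image])
    return windows


def bakis_transitions(n_states: int) -> List[List[float]]:
    """Bakis (left-to-right) topology: self-loop, next state, skip one state."""
    matrix = []
    for i in range(n_states):
        targets = [j for j in (i, i + 1, i + 2) if j < n_states]
        row = [0.0] * n_states
        for j in targets:
            row[j] = 1.0 / len(targets)  # uniform start values (assumption)
        matrix.append(row)
    return matrix


def trigraph_names(word: str) -> List[str]:
    """Name each character after its left and right neighbours."""
    names = []
    for i, centre in enumerate(word):
        left = word[i - 1] if i > 0 else "#"  # "#" marks a word boundary
        right = word[i + 1] if i < len(word) - 1 else "#"
        names.append(f"{left}-{centre}+{right}")
    return names


def word_state_sequence(word: str, states_per_char: int) -> List[Tuple[str, int]]:
    """Concatenate the states of the trigraph models that make up a word."""
    return [(name, s) for name in trigraph_names(word)
            for s in range(states_per_char)]


if __name__ == "__main__":
    print(trigraph_names("cat"))
    print(bakis_transitions(3))
    print(len(word_state_sequence("cat", 3)))
```

The sketch also shows why context multiplies the parameters: each distinct (left, centre, right) triple gets its own model, so the number of models grows far beyond the alphabet size. This is the growth that the state clustering described above is meant to contain.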
Document type : Thesis

https://pastel.archives-ouvertes.fr/pastel-00656402
Contributor : Anne-Laure Bianne Bernard
Submitted on : Wednesday, January 4, 2012 - 11:09:04 AM
Last modification on : Friday, July 31, 2020 - 10:44:06 AM
Long-term archiving on : Monday, November 19, 2012 - 12:15:30 PM

Identifiers

  • HAL Id : pastel-00656402, version 1

Citation

Anne-Laure Bianne Bernard. Reconnaissance de mots manuscrits cursifs par modèles de Markov cachés en contexte : application au français, à l'anglais et à l'arabe. Image Processing [eess.IV]. Télécom ParisTech, 2011. French. ⟨pastel-00656402⟩

Metrics

Record views : 1272

File downloads : 3725