Theses

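Handwritten text recognition using hidden Markov models and recurrent neural networks: application to Latin and Arabic handwriting (original title: Reconnaissance de textes manuscrits par modèles de Markov cachés et réseaux de neurones récurrents : application à l'écriture latine et arabe)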

Abstract : Handwriting recognition is an essential component of document analysis. A current trend is to move from isolated-word recognition to word-sequence recognition. Our work proposes a text-line recognition system that requires no explicit word segmentation. To build an efficient model, we intervene at several levels of the recognition system. First, we introduce two new preprocessing techniques: a cleaning step and a local baseline correction for text lines. Next, a language model is built and optimized for handwritten mail. We then propose two state-of-the-art recognition systems, one based on contextual HMMs (Hidden Markov Models) and one based on BLSTM (Bi-directional Long Short-Term Memory) recurrent neural networks. We optimize both systems in order to compare the two approaches. The systems are evaluated on Arabic and Latin cursive handwriting and were submitted to two international handwriting recognition competitions. Finally, as a prospect for future work, we introduce a strategy for recognizing certain out-of-vocabulary character strings.

https://pastel.archives-ouvertes.fr/tel-03677609
Contributor : ABES STAR
Submitted on : Tuesday, May 24, 2022 - 5:32:31 PM
Last modification on : Monday, May 30, 2022 - 9:41:13 AM
Long-term archiving on : Tuesday, August 30, 2022 - 10:04:05 AM

File

TheseMorillot.pdf
Version validated by the jury (STAR)

Identifiers

  • HAL Id : tel-03677609, version 1

Citation

Olivier Morillot. Reconnaissance de textes manuscrits par modèles de Markov cachés et réseaux de neurones récurrents : application à l'écriture latine et arabe. Traitement du texte et du document. Télécom ParisTech, 2014. Français. ⟨NNT : 2014ENST0002⟩. ⟨tel-03677609⟩
