Image and video text recognition using convolutional neural networks - PASTEL - Thèses en ligne de ParisTech
Thesis Year: 2008

Image and video text recognition using convolutional neural networks

Reconnaissance de texte dans les images et les vidéos en utilisant les réseaux de neurones à convolutions

Abstract

Thanks to increasingly powerful storage media, multimedia resources have become essential in information and broadcasting (news agencies, INA), culture (museums), transport (monitoring), the environment (satellite images), and medical imaging (medical records in hospitals). The challenge is therefore to find relevant information quickly, and multimedia research increasingly focuses on indexing and retrieval techniques. Text embedded in images and videos can serve as a useful key for this task. Recognizing such text poses many challenges: low resolution, characters of varying sizes, compression artifacts and anti-aliasing effects, and complex, highly variable backgrounds. Text recognition involves four steps: (1) detecting the presence of text, (2) localizing the text, (3) extracting and enhancing the text area, and (4) recognizing the content of the text. This work focuses on the last step and assumes that the text box has been correctly detected, localized, and extracted. The recognition module can itself be divided into sub-modules, such as a binarization module, a text segmentation module, and a character recognition module. We focus on a particular machine learning model, the convolutional neural network (CNN), a neural network whose topology is similar to that of the mammalian visual cortex. CNNs were first used to recognize handwritten digits and have since been applied successfully to many pattern recognition problems. In this thesis we propose a new method for binarizing text images, a new method for segmenting text images, a study of a convolutional neural network for character recognition in images, a discussion of the relevance of the binarization step in machine-learning-based text recognition in images, and a new method for text recognition in images based on graph theory.
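The division of the recognition module into binarization, segmentation, and character recognition sub-modules can be illustrated with a minimal Python sketch. It is not the thesis's method: the thesis proposes CNN-based techniques for these steps, whereas this sketch swaps in a fixed global threshold and a column-projection split purely to show how the sub-modules chain together. All function names are hypothetical, dark text on a light background is assumed, and the character classifier is left as a caller-supplied function.

```python
from typing import Callable, List, Tuple

Image = List[List[int]]  # grayscale rows, 0 (black) to 255 (white)


def binarize(image: Image, threshold: int = 128) -> Image:
    """Stand-in binarization: 1 for dark (text) pixels, 0 for background."""
    return [[1 if pixel < threshold else 0 for pixel in row] for row in image]


def segment_columns(binary: Image) -> List[Tuple[int, int]]:
    """Stand-in segmentation: split wherever a column holds no text pixel.

    Returns half-open (start, end) column ranges, one per candidate character.
    """
    if not binary:
        return []
    width = len(binary[0])
    has_ink = [any(row[x] for row in binary) for x in range(width)]
    segments: List[Tuple[int, int]] = []
    start = None
    for x, ink in enumerate(has_ink):
        if ink and start is None:
            start = x  # a run of text columns begins
        elif not ink and start is not None:
            segments.append((start, x))  # the run ended at column x
            start = None
    if start is not None:
        segments.append((start, width))  # run reaches the right edge
    return segments


def recognize_text(image: Image, classify_character: Callable[[Image], str]) -> str:
    """Chain the sub-modules: binarize, segment, classify each character crop."""
    binary = binarize(image)
    crops = [[row[a:b] for row in binary] for a, b in segment_columns(binary)]
    return "".join(classify_character(glyph) for glyph in crops)
```

In a real system `classify_character` would be a trained classifier such as a CNN; here any function that maps a binary crop to a string can be plugged in to exercise the pipeline.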


Main file
phd_saidane_final.pdf (5.25 MB)

Dates and versions

pastel-00004685, version 1 (22-06-2009)

Identifiers

  • HAL Id: pastel-00004685, version 1

Cite

Zohra Saidane. Image and video text recognition using convolutional neural networks. Télécom ParisTech, 2008. English. ⟨pastel-00004685⟩
