Image and video text recognition using convolutional neural networks

Abstract: Thanks to increasingly powerful storage media, multimedia resources have become essential in many fields: information and broadcasting (news agencies, INA), culture (museums), transport (monitoring), the environment (satellite images), and medical imaging (medical records in hospitals). The challenge is therefore to find relevant information quickly, and research in multimedia increasingly focuses on indexing and retrieval techniques. The text embedded in images and videos can be a relevant key for this task. Recognizing such text raises many challenges: poor resolution, characters of different sizes, compression artifacts and anti-aliasing effects, and very complex, variable backgrounds. Text recognition involves four steps: (1) detecting the presence of text, (2) localizing the text, (3) extracting and enhancing the text area, and (4) recognizing the content of the text. This work focuses on the last step and assumes that the text box has been correctly detected, localized, and extracted. The recognition module can itself be divided into several sub-modules, such as binarization, text segmentation, and character recognition. We focus on a particular machine learning algorithm, convolutional neural networks (CNNs). These are neural networks whose topology is inspired by the mammalian visual cortex. CNNs were initially used to recognize handwritten digits and were then applied successfully to many other pattern recognition problems. In this thesis we propose a new binarization method for text images, a new segmentation method for text images, a study of a convolutional neural network for character recognition in images, a discussion of the relevance of the binarization step in text recognition based on machine learning methods, and a new text recognition method for images based on graph theory.
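To make the recognition sub-modules above concrete (binarization, then segmentation, then character recognition), here is a minimal sketch under stated assumptions. It assumes an 8-bit grayscale text box stored as a list of rows, with dark text on a light background, and it uses two classical baselines that are NOT the methods proposed in this thesis: Otsu's global threshold for binarization and gaps in the vertical projection profile for character segmentation. The names otsu_threshold, binarize, and segment_columns are hypothetical, and the character-recognition stage (handled with a CNN in this work) is left out.

```python
from typing import List, Tuple

# Hypothetical representation: an 8-bit grayscale text box as a list of rows,
# with dark text (low values) on a light background (high values).
Image = List[List[int]]


def otsu_threshold(img: Image) -> int:
    """Global Otsu threshold: the t maximizing between-class variance
    (computed up to a constant factor, which does not change the argmax)."""
    hist = [0] * 256
    for row in img:
        for p in row:
            hist[p] += 1
    total = sum(hist)
    sum_all = sum(level * count for level, count in enumerate(hist))
    best_t, best_var = 0, -1.0
    w_dark, sum_dark = 0, 0  # pixel count and intensity sum for levels <= t
    for t in range(256):
        w_dark += hist[t]
        sum_dark += t * hist[t]
        if w_dark == 0:
            continue
        w_light = total - w_dark
        if w_light == 0:
            break
        mean_dark = sum_dark / w_dark
        mean_light = (sum_all - sum_dark) / w_light
        var_between = w_dark * w_light * (mean_dark - mean_light) ** 2
        if var_between > best_var:
            best_t, best_var = t, var_between
    return best_t


def binarize(img: Image, t: int) -> List[List[int]]:
    """Mark ink pixels (intensity <= t) as 1 and background as 0."""
    return [[1 if p <= t else 0 for p in row] for row in img]


def segment_columns(binary: List[List[int]]) -> List[Tuple[int, int]]:
    """Split the box at empty columns of the vertical projection profile.
    Returns half-open column ranges (start, end), one per candidate character."""
    width = len(binary[0])
    profile = [sum(row[x] for row in binary) for x in range(width)]
    segments: List[Tuple[int, int]] = []
    start = None
    for x, count in enumerate(profile):
        if count > 0 and start is None:
            start = x  # entering a run of ink columns
        elif count == 0 and start is not None:
            segments.append((start, x))  # leaving the run
            start = None
    if start is not None:
        segments.append((start, width))
    return segments


if __name__ == "__main__":
    W, K = 255, 0  # white background, black ink
    box = [
        [W, K, W, W, K, K, W],
        [W, K, W, W, K, W, W],
        [W, K, W, W, K, K, W],
    ]
    binary = binarize(box, otsu_threshold(box))
    print(segment_columns(binary))  # [(1, 2), (4, 6)]
```

Each (start, end) range would then be cropped and passed to a character classifier. Projection-profile segmentation cannot separate touching characters, which is one reason low-resolution video text is hard to segment with such simple baselines.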
Document type:
Thesis. Télécom ParisTech, 2008. English
Contributor: Ecole Télécom ParisTech
Submitted on: Monday, June 22, 2009, 08:00:00
Last modified on: Tuesday, January 23, 2018, 16:59:10
Document(s) archived on: Sunday, November 27, 2016, 00:37:08


• HAL Id: pastel-00004685, version 1


Zohra Saidane. Image and video text recognition using convolutional neural networks. Thesis. Télécom ParisTech, 2008. English. 〈pastel-00004685〉


