Modélisation de contextes pour l'annotation sémantique de vidéos

Abstract: Recent years have witnessed an explosion in the amount of multimedia content available. In 2010, the video-sharing website YouTube announced that 35 hours of video were uploaded to its site every minute, whereas in 2008 users were "only" uploading 12 hours of video per minute. Given this growth in data volume, human analysis of each video is no longer feasible; automated video analysis systems are needed. This thesis proposes a solution for automatically annotating video content with a textual description. Its core novelty is the use of multiple sources of contextual information to perform the annotation. With the constant expansion of online visual collections, automatic video annotation has become a major problem in computer vision. It consists of detecting various objects (human, car, ...), dynamic actions (running, driving, ...) and scene characteristics (indoor, outdoor, ...) in unconstrained videos. Progress in this domain would impact a wide range of applications, including video search, intelligent video surveillance and human-computer interaction. Although some improvements have been shown in concept annotation, it remains an unsolved problem, notably because of the semantic gap. The semantic gap is defined as the lack of correspondence between video features and high-level human understanding. This gap is principally due to intra-concept variability caused by photometric changes, object deformation, object motion, camera motion or viewpoint changes. To tackle the semantic gap, we enrich the description of a video with multiple sources of contextual information. Context is defined as "the set of circumstances in which an event occurs". Video appearance, motion or space-time distribution can be considered contextual clues associated with a concept. We argue that a single context is not informative enough to discriminate a concept in a video. However, by considering several contexts at the same time, we can address the semantic gap.
Document type:
Other [cs.OH]. Ecole Nationale Supérieure des Mines de Paris, 2013. French. 〈NNT: 2013ENMP0051〉

Cited literature [211 references]
Contributor: Abes Star
Submitted on: Tuesday, March 11, 2014 - 17:02:16
Last modified on: Monday, November 12, 2018 - 10:56:46
Document(s) archived on: Wednesday, June 11, 2014 - 13:11:26


Version validated by the jury (STAR)


  • HAL Id: pastel-00958135, version 1


Nicolas Ballas. Modélisation de contextes pour l'annotation sémantique de vidéos. Other [cs.OH]. Ecole Nationale Supérieure des Mines de Paris, 2013. French. 〈NNT: 2013ENMP0051〉. 〈pastel-00958135〉


