
Modélisation de contextes pour l'annotation sémantique de vidéos (Context Modeling for Semantic Video Annotation)

Abstract : Recent years have witnessed an explosion in the amount of available multimedia content. In 2010, the video-sharing website YouTube announced that 35 hours of video were uploaded to its site every minute, whereas in 2008 users were "only" uploading 12 hours of video per minute. Given this growth in data volume, human analysis of each video is no longer feasible; automated video analysis systems are needed.

This thesis proposes a solution to automatically annotate video content with a textual description. Its core novelty is the consideration of multiple sources of contextual information to perform the annotation.

With the constant expansion of online visual collections, automatic video annotation has become a major problem in computer vision. It consists of detecting various objects (human, car, ...), dynamic actions (running, driving, ...) and scene characteristics (indoor, outdoor, ...) in unconstrained videos. Progress in this domain would impact a wide range of applications, including video search, intelligent video surveillance and human-computer interaction.

Although some improvements have been shown in concept annotation, it remains an unsolved problem, notably because of the semantic gap. The semantic gap is defined as the lack of correspondence between video features and high-level human understanding. This gap is principally due to the intra-variability of concepts caused by photometric change, object deformation, object motion, camera motion, viewpoint change, etc.

To tackle the semantic gap, we enrich the description of a video with multiple sources of contextual information. Context is defined as "the set of circumstances in which an event occurs". Video appearance, motion or space-time distribution can be considered as contextual clues associated with a concept. We argue that a single context is not informative enough to discriminate a concept in a video. However, by considering several contexts at the same time, we can address the semantic gap.

Cited literature [211 references]
Contributor : ABES STAR
Submitted on : Tuesday, March 11, 2014 - 5:02:16 PM
Last modification on : Wednesday, November 17, 2021 - 12:30:57 PM
Long-term archiving on : Wednesday, June 11, 2014 - 1:11:26 PM


Version validated by the jury (STAR)


  • HAL Id : pastel-00958135, version 1


Nicolas Ballas. Modélisation de contextes pour l'annotation sémantique de vidéos. Other [cs.OH]. Ecole Nationale Supérieure des Mines de Paris, 2013. French. ⟨NNT : 2013ENMP0051⟩. ⟨pastel-00958135⟩


