Cutting the visual world into bigger slices for improved video concept detection

Abstract : Visual material, comprising images and videos, is growing rapidly both on the internet and in personal collections. This growth calls for automatic understanding of visual content, and hence for intelligent methods to index, search and retrieve images and videos correctly. This thesis aims to improve the automatic detection of concepts in internet videos by exploring all the available information and putting the most useful of it to work. Our contributions address various levels of the concept detection framework and fall into three main parts. The first part improves the Bag of Words (BOW) video representation model by proposing a novel BOW construction mechanism that uses concept labels, and by refining the BOW signature based on the distribution of its elements. In the second part, we devise methods that incorporate knowledge from similar and dissimilar entities to build improved recognition models. Here we look at the information that concepts potentially share and build models for meta-concepts, from which concept-specific results are derived. This improves recognition for concepts that lack labeled examples. Lastly, we devise semi-supervised learning methods that make the most of the substantial amount of unlabeled data. We propose techniques to improve the semi-supervised co-training algorithm with optimal view selection.
Document type : Theses

Cited literature [220 references]

https://pastel.archives-ouvertes.fr/tel-01420419
Contributor : Abes Star
Submitted on : Tuesday, December 20, 2016 - 3:35:07 PM
Last modification on : Thursday, October 17, 2019 - 12:36:09 PM
Long-term archiving on : Monday, March 20, 2017 - 4:40:38 PM

File

TheseNiazV2.pdf
Version validated by the jury (STAR)

Identifiers

  • HAL Id : tel-01420419, version 1

Citation

Usman Niaz. Cutting the visual world into bigger slices for improved video concept detection. Image Processing [eess.IV]. Télécom ParisTech, 2014. English. ⟨NNT : 2014ENST0040⟩. ⟨tel-01420419⟩

Metrics

Record views : 382

File downloads : 195