Transcription des signaux percussifs. Application à l'analyse de scènes musicales audiovisuelles

Abstract : This thesis establishes links between the fields of audio indexing and video sequence analysis through the problem of drum signal analysis. The first part addresses the transcription of drum tracks from polyphonic music signals. After presenting several pre-processing steps for drum track enhancement and a large set of relevant features, we propose a statistical machine learning approach to drum track transcription. We also introduce novel supervised and unsupervised sequence modeling methods that improve the detection of drum strokes by taking into account the regularity of drum patterns. We conclude this part by evaluating various drum track separation algorithms and by underlining the duality between transcription and source separation. In the second part, we extend this transcription system to use the video information provided by cameras filming the drummer. Various approaches are introduced to segment the scene and map each region of interest to a drum instrument. Motion intensity features are then used to detect drum strokes. Our results show that a multimodal approach can resolve some of the ambiguities inherent to audio-only transcription. In the final part, we extend our work to a broader range of music videos, which may not show the musicians. In particular, we address the problem of understanding how a piece of music can be illustrated by images. After presenting existing segmentation techniques for audio and video streams and introducing new ones, we define synchrony measures on their structures. These measures can be used both for retrieval applications (music retrieval by video) and for content classification.
Document type :
Theses

Cited literature [297 references]

https://pastel.archives-ouvertes.fr/pastel-00002805
Contributor : Ecole Télécom Paristech
Submitted on : Friday, September 28, 2007 - 8:00:00 AM
Last modification on : Wednesday, February 20, 2019 - 2:40:40 PM
Long-term archiving on : Friday, October 19, 2012 - 12:05:17 PM

Identifiers

  • HAL Id : pastel-00002805, version 1

Citation

Olivier Gillet. Transcription des signaux percussifs. Application à l'analyse de scènes musicales audiovisuelles. Télécom ParisTech, 2007. English. ⟨pastel-00002805⟩

Metrics

Record views : 310
File downloads : 1487