Theses

On-line speaker diarization for smart objects

Abstract : On-line speaker diarization aims to detect “who is speaking now” in a given audio stream. Most proposed on-line speaker diarization systems have focused on less challenging domains, such as broadcast news and plenary speeches, which are characterised by long speaker turns and low spontaneity. The first contribution of this thesis is the development of a completely unsupervised, adaptive on-line diarization system for challenging and highly spontaneous meeting data. Because this system yields high diarization error rates, a semi-supervised approach to on-line diarization is then proposed, in which speaker models are seeded with a modest amount of manually labelled data and adapted by an efficient incremental maximum a posteriori (MAP) adaptation procedure. The resulting error rates may be low enough to support practical applications. The second part of the thesis instead addresses the problem of phone normalisation in short-duration speaker modelling. Phone Adaptive Training (PAT), a recently proposed technique, is first assessed and optimised at the speaker-modelling level in the context of automatic speaker verification (ASV). It is then developed further into a completely unsupervised system that uses automatically generated acoustic-class transcriptions, in which the number of acoustic classes is controlled by regression tree analysis. PAT delivers significant improvements in the performance of a state-of-the-art iVector ASV system even when accurate phonetic transcriptions are not available.
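The incremental MAP step mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not specify the thesis's exact procedure, so the code below assumes classic relevance-MAP adaptation of Gaussian mixture model (GMM) means from a prior model such as a universal background model (UBM), with diagonal covariances, posteriors computed under the fixed prior, and a relevance factor of 16 chosen only for illustration. Running sufficient statistics (soft counts and first-order sums) are accumulated segment by segment, so adapting on several short segments in turn gives the same means as adapting on all of them at once. The names `DiagGMM`, `IncrementalMAPSpeakerModel`, `update` and `adapted_means` are hypothetical and not taken from the thesis.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class DiagGMM:
    """Diagonal-covariance GMM used as the prior (e.g. a UBM)."""
    weights: List[float]
    means: List[List[float]]
    variances: List[List[float]]

    def _log_weighted_density(self, k: int, x: Sequence[float]) -> float:
        # log(w_k) + log N(x; mu_k, diag(var_k))
        ll = math.log(self.weights[k])
        for d, xd in enumerate(x):
            var = self.variances[k][d]
            ll -= 0.5 * (math.log(2.0 * math.pi * var) + (xd - self.means[k][d]) ** 2 / var)
        return ll

    def posteriors(self, x: Sequence[float]) -> List[float]:
        # Component occupation probabilities gamma_k(x), computed stably via log-sum-exp.
        logs = [self._log_weighted_density(k, x) for k in range(len(self.weights))]
        top = max(logs)
        exps = [math.exp(l - top) for l in logs]
        total = sum(exps)
        return [e / total for e in exps]


class IncrementalMAPSpeakerModel:
    """Hypothetical speaker model: relevance-MAP mean adaptation from running statistics."""

    def __init__(self, prior: DiagGMM, relevance: float = 16.0) -> None:
        self.prior = prior
        self.relevance = relevance  # assumed relevance factor r
        n_comp, dim = len(prior.weights), len(prior.means[0])
        self.counts = [0.0] * n_comp                             # n_k = sum_t gamma_k(x_t)
        self.first_order = [[0.0] * dim for _ in range(n_comp)]  # f_k = sum_t gamma_k(x_t) * x_t

    def update(self, frames: Sequence[Sequence[float]]) -> None:
        # Accumulate statistics for a new segment; posteriors use the fixed prior (assumption).
        for x in frames:
            for k, g in enumerate(self.prior.posteriors(x)):
                self.counts[k] += g
                for d, xd in enumerate(x):
                    self.first_order[k][d] += g * xd

    def adapted_means(self) -> List[List[float]]:
        # mu_k = (f_k + r * mu_k_prior) / (n_k + r),
        # equivalent to alpha_k * E_k[x] + (1 - alpha_k) * mu_k_prior with alpha_k = n_k / (n_k + r).
        r = self.relevance
        return [
            [(self.first_order[k][d] + r * mu_d) / (self.counts[k] + r) for d, mu_d in enumerate(mu)]
            for k, mu in enumerate(self.prior.means)
        ]


# Usage: seed with labelled frames, then keep adapting as new segments are attributed.
ubm = DiagGMM(weights=[0.5, 0.5], means=[[-3.0], [3.0]], variances=[[1.0], [1.0]])
spk = IncrementalMAPSpeakerModel(ubm, relevance=16.0)
spk.update([[2.5], [3.5], [3.0]])  # seeding with manually labelled frames
spk.update([[4.0], [3.8]])         # adaptation on a newly attributed segment
print(spk.adapted_means())
```

Components that receive little data stay close to their prior means, while well-observed components move toward the speaker's data; the relevance factor controls how quickly that happens, which is what makes this style of update attractive when only short segments are available.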

https://pastel.archives-ouvertes.fr/tel-03701649
Contributor : ABES STAR
Submitted on : Wednesday, June 22, 2022 - 12:40:15 PM
Last modification on : Friday, June 24, 2022 - 3:33:34 AM
Long-term archiving on : Friday, September 23, 2022 - 6:32:03 PM

File

ThesisSoldi.pdf
Version validated by the jury (STAR)

Identifiers

  • HAL Id : tel-03701649, version 1

Citation

Giovanni Soldi. On-line speaker diarization for smart objects. Signal and Image processing. Télécom ParisTech, 2016. English. ⟨NNT : 2016ENST0061⟩. ⟨tel-03701649⟩
