P. Perrot, J. Razik, M. Morel, H. Hemiri, and G. Chollet, Techniques de conversion de voix appliquées à l'imposture, Traitement et Analyse de l'Information, 2009.

L. Zouari, H. Hemiri, J. Razik, A. Amehraye, and G. Chollet, Reconnaissance de la parole en temps réel pour le dialogue oral, Traitement et Analyse de l'Information: Méthodes et Applications (TAIMA), 2009.

G. Chollet, A. Amehraye, J. Razik, L. Zouari, H. Hemiri et al., Spoken Dialogue in Virtual Worlds, in Development of Multimodal Interfaces: Active Listening and Synchrony, LNCS, vol.5967, pp.423-443, 2010.

H. K. Hemiri, G. Chollet, and D. Petrovska-Delacrétaz, Automatic Detection of Known Advertisements in Radio Broadcast with Data-driven ALISP Transcriptions, International Workshop on Content-Based Multimedia Indexing (CBMI), pp.223-228, 2011.

H. K. Hemiri, D. Petrovska-Delacrétaz, and G. Chollet, Une empreinte audio à base d'ALISP appliquée à l'identification audio dans un flux radiophonique, Colloque en COmpression et REprésentation des Signaux Audiovisuels (CORESA), 2012.

K. Hemiri, G. Chollet, D. Petrovska-Delacrétaz, R. Blouet, K. Hachicha et al., Software Radio FM Broadcast Receiver for Audio Indexing Applications, IEEE International Conference on Industrial Technology (ICIT), pp.585-590, 2012.

Viateur, Prototype of a radio-on-demand broadcast receiver with real time musical genre classification, Conference on Design and Architectures for Signal and Image Processing (DASIP), pp.1-2, 2012.

H. K. Hemiri, D. Petrovska-Delacrétaz, and G. Chollet, A Generic Audio Identification System for Radio Broadcast Monitoring Based on Data-driven Segmentation, IEEE International Symposium on Multimedia (ISM), pp.427-432, 2012.

H. K. Hemiri, G. Chollet, and D. Petrovska-Delacrétaz, Automatic Detection of Known Advertisements in Radio Broadcast with Data-driven ALISP Transcriptions, Multimedia Tools and Applications (MTAP), pp.35-49, 2013.

J. Ajmera, H. Bourlard, I. Lapidot, and I. McCowan, Unknown-Multiple Speaker Clustering Using HMM, International Conference on Spoken Language Processing, pp.573-576, 2002.

S. F. Altschul, W. Gish, and W. Miller, Basic local alignment search tool, Journal of Molecular Biology, vol.215, issue.3, pp.403-410, 1990.
DOI : 10.1016/S0022-2836(05)80360-2

X. Anguera, M. Aguilo, C. Wooters, C. Nadeu, and J. Hernando, Hybrid Speech/Non-speech Detector Applied to Speaker Diarization of Meetings, Speaker and Language Recognition Workshop (Odyssey), pp.1-6, 2006.

X. Anguera, S. Bozonnet, N. Evans, C. Fredouille, G. Friedland et al., Speaker Diarization: A Review of Recent Research, IEEE Transactions on Audio, Speech, and Language Processing, vol.20, issue.2, pp.356-370, 2012.
DOI : 10.1109/TASL.2011.2125954

URL : https://hal.archives-ouvertes.fr/hal-00733397

X. Anguera, C. Wooters, B. Peskin, and M. Aguiló, Robust speaker segmentation for meetings: The ICSI-SRI spring 2005 diarization system, International Conference on Machine Learning for Multimodal Interaction, pp.26-38, 2005.

A. Apostolico, M. Comin, and L. Parida, Bridging Lossy and Lossless Compression by Motif Pattern Discovery, General Theory of Information Transfer and Combinatorics, pp.793-813, 2006.

J. A. Bachorowski, M. J. Smoski, and M. J. Owren, The acoustic features of human laughter, The Journal of the Acoustical Society of America, vol.110, issue.3, pp.1581-1597, 2001.
DOI : 10.1121/1.1391244

J. Baker, L. Deng, J. Glass, S. Khudanpur, C.-H. Lee et al., Developments and directions in speech recognition and understanding, Part 1 [DSP Education], IEEE Signal Processing Magazine, vol.26, issue.3, pp.75-80, 2009.
DOI : 10.1109/MSP.2009.932166

S. Baluja and M. Covell, Waveprint: Efficient wavelet-based audio fingerprinting, Pattern Recognition, vol.41, issue.11, pp.3467-3480, 2008.
DOI : 10.1016/j.patcog.2008.05.006

C. Barras, X. Zhu, S. Meignier, and J. Gauvain, Multistage speaker diarization of broadcast news, IEEE Transactions on Audio, Speech, and Language Processing, pp.1505-1512, 2006.

M. A. Bartsch and G. H. Wakefield, To catch a chorus: using chroma-based representations for audio thumbnailing, Workshop on the Applications of Signal Processing to Audio and Acoustics, pp.15-18, 2001.

M. Betser, Décomposition harmonique des signaux audio appliquée à l'indexation audio, 2008.

P. Beyerlein, X. Aubert, R. Haeb-Umbach, M. Harris, D. Klakow et al., Large vocabulary continuous speech recognition of Broadcast News – The Philips/RWTH approach, Speech Communication, vol.37, issue.1-2, pp.109-131, 2002.
DOI : 10.1016/S0167-6393(01)00062-0

C. A. Bickley and S. Hunnicutt, Acoustic analysis of laughter, International Conference on Spoken Language Processing, pp.927-930, 1992.

F. Bimbot and B. , An evaluation of temporal decomposition, EUROSPEECH, 1991.

M. Bisani and H. Ney, Joint-sequence models for grapheme-to-phoneme conversion, Speech Communication, vol.50, issue.5, pp.434-451, 2008.
DOI : 10.1016/j.specom.2008.01.002

URL : https://hal.archives-ouvertes.fr/hal-00499203

S. Bozonnet, N. W. Evans, and C. Fredouille, The LIA-EURECOM RT'09 speaker diarization system: Enhancements in speaker modelling and cluster purification, IEEE International Conference on Acoustics Speech and Signal Processing, pp.4958-4961, 2010.

C. J. Burges, D. Plastina, J. C. Platt, E. Renshaw, and H. S. Malvar, Using audio fingerprinting for duplicate detection and thumbnail generation, IEEE International Conference on Acoustics, Speech, and Signal Processing, pp.9-12, 2005.

C. J. Burges, J. C. Platt, and S. Jana, Distortion discriminant analysis for audio fingerprinting, IEEE Transactions on Speech and Audio Processing, vol.11, issue.3, pp.165-174, 2003.
DOI : 10.1109/TSA.2003.811538

N. Campbell, R. , H. , and R. Ohara, No laughing matter, Interspeech, pp.465-468, 2005.

P. Cano, E. Batlle, H. Mayer, and H. Neuschmied, Robust Sound Modeling for Song Detection in Broadcast Audio, Audio Engineering Society, 2002.

P. Cano, E. Batlle, T. , and J. , A Review of Audio Fingerprinting, Journal of VLSI Signal Processing Systems for Signal, Image and Video Technology, vol.33, issue.3, pp.271-284, 2005.
DOI : 10.1007/s11265-005-4151-3

J. Černocký, Speech Processing Using Automatically Derived Segmental Units: Applications to Very Low Rate Coding and Speaker Verification, 1998.

D. Charlet, C. Barras, and J. S. Liénard, Impact of Overlapping Speech Detection on Speaker Diarization for Broadcast News and Debates, IEEE International Conference on Acoustics, Speech and Signal Processing, 2013.

G. Chollet, K. McTait, and D. Petrovska-Delacrétaz, Data driven approaches to speech and language processing, Lecture Notes in Computer Science, pp.164-198, 2005.

I. J. Cox, J. Kilian, F. T. Leighton, and T. Shamoon, A Secure, Robust Watermark for Multimedia, International Workshop on Information Hiding, pp.185-206, 1996.

M. Cremer, B. Froba, J. Hellmuth, O. Herre, and E. Allamanche, AudioID: Towards Content-Based Identification of Audio Material, Audio Engineering Society Convention 110, 2001.

R. Dannenberg, Listening to "Naima": An Automated Structural Analysis from Recorded Audio, International Computer Music Conference, pp.28-34, 2002.

R. B. Dannenberg and N. Hu, Pattern discovery techniques for music audio, International Conference on Music Information Retrieval, pp.63-70, 2002.

A. de Cheveigné, Computational Auditory Scene Analysis, pp.65-70, 2006.
DOI : 10.1002/9780470611180.ch5

P. Delacourt, D. , and C. Wellekens, Detection of speaker changes in an audio document, EUROSPEECH, 1999.

P. Delacourt and C. Wellekens, DISTBIC: A speaker-based segmentation for audio data indexing, Speech Communication, vol.32, issue.1-2, pp.111-126, 2000.
DOI : 10.1016/S0167-6393(00)00027-3

S. Deligne and F. Bimbot, Inference of variable-length linguistic and acoustic units by multigrams, Speech Communication, vol.23, issue.3, pp.223-241, 1997.
DOI : 10.1016/S0167-6393(97)00048-4

S. Deligne, S. Dharanipragada, R. A. Gopinath, B. Maison, P. A. Olsen et al., A robust high accuracy speech recognition system for mobile applications, IEEE Transactions on Speech and Audio Processing, vol.10, issue.8, pp.551-561, 2002.

A. Hannani, Text-Independent Speaker Verification Based On High-Level Information Extracted With Data-Driven Methods, 2007.

A. Hannani, D. Petrovska-Delacrétaz, B. Fauve, A. Mayoue, J. Mason et al., Text-independent Speaker Verification, Guide to Biometric Reference Systems and Performance Evaluation, 2009.
DOI : 10.1007/978-1-84800-292-0_7

E. El-Khoury, Unsupervised Video Indexing based on Audiovisual Characterization of Persons, 2010.

E. El-Khoury, C. Sénac, and J. Pinquier, Improved speaker diarization system for meetings, IEEE International Conference on Acoustics, Speech and Signal Processing, pp.4097-4100, 2009.

S. Fenet, M. Moussallam, Y. Grenier, G. Richard, and L. Daudet, A framework for fingerprint-based detection of repeating objects in multimedia streams, EUSIPCO, pp.1464-1468, 2012.

S. Fenet, G. Richard, and Y. Grenier, A Scalable Audio Fingerprint Method with Robustness to Pitch-Shifting, International Symposium on Music Information Retrieval, pp.121-126, 2011.

F. Fischler and R. Bolles, Readings in Computer Vision: Issues, Problems, Principles, and Paradigms, chapter Random sample consensus: a paradigm for model fitting with applications to image analysis and automated cartography, pp.726-740, 1987.

C. Fredouille, S. Bozonnet, and N. W. Evans, The LIA-EURECOM RT'09 Speaker Diarization System, NIST Rich Transcription Workshop, 2009.

G. Friedland, O. Vinyals, Y. Huang, and C. Muller, Prosodic and other Long-Term Features for Speaker Diarization, IEEE Transactions on Audio, Speech, and Language Processing, vol.17, issue.5, pp.985-993, 2009.
DOI : 10.1109/TASL.2009.2015089

S. Galliano, E. Geoffrois, D. Mostefa, K. Choukri, J. F. Bonastre et al., The ESTER Phase II Evaluation Campaign for the Rich Transcription of French Broadcast News, EUROSPEECH, 2005.

S. Galliano, G. Gravier, and L. Chaubard, The ESTER 2 Evaluation Campaign for the Rich Transcription of French Radio Broadcasts, Interspeech, pp.2583-2586, 2009.

J. Gauvain and C. H. Lee, Maximum a posteriori estimation for multivariate Gaussian mixture observations of Markov chains, Transactions on Speech and Audio Processing, pp.291-298, 1994.

H. Gish, M. Siu, A. Chan, and W. Belfield, Unsupervised Training of an HMM-based Speech Rec, 2009.

H. Gish, S. Man-hung, and R. Rohlicek, Segregation of speakers for speech recognition and speaker identification, IEEE International Conference on Acoustics, Speech, and Signal Processing, pp.873-876, 1991.

J. Glass, Towards unsupervised speech processing, 2012 11th International Conference on Information Science, Signal Processing and their Applications (ISSPA), pp.1-4, 2012.
DOI : 10.1109/ISSPA.2012.6310546

C. Gollan, S. Hahn, R. Schlut, and H. Ney, An Improved Method for Unsupervised Training of LVCSR Systems, Interspeech, pp.2101-2104, 2007.

G. Gravier, G. Adda, N. Paulson, M. Carré, A. Giraudel et al., The ETAPE corpus for the evaluation of speech-based TV content processing in the French

V. Gupta, G. Boulianne, P. Kenny, P. Ouellet, and P. Dumouchel, Speaker diarization of French broadcast news, IEEE International Conference on Acoustics, Speech and Signal Processing, pp.4365-4368, 2008.

J. Haitsma and T. , A Highly Robust Audio Fingerprinting System, International Society for Music Information Retrieval, pp.107-115, 2002.

J. Haitsma and T. , Speed-change resistant audio fingerprinting using autocorrelation, IEEE International Conference on Acoustics, Speech, and Signal Processing, pp.728-759, 2003.

K. J. Han and S. S. Narayanan, Agglomerative hierarchical speaker clustering using incremental Gaussian mixture cluster modeling, Interspeech, pp.20-23, 2008.

D. F. Harwath, H. J. , and J. R. Glass, Zero resource spoken audio corpus analysis, 2013 IEEE International Conference on Acoustics, Speech and Signal Processing, 2013.
DOI : 10.1109/ICASSP.2013.6639335

C. Herley, ARGOS: automatically extracting repeating objects from multimedia streams, IEEE Transactions on Multimedia, pp.115-129, 2006.

H. Hermansky, Perceptual linear predictive (PLP) analysis of speech, The Journal of the Acoustical Society of America, vol.87, issue.4, pp.1738-1752, 1990.
DOI : 10.1121/1.399423

M. Huijbregts, D. A. van Leeuwen, and C. Wooters, Speaker Diarization Error Analysis Using Oracle Components, IEEE Transactions on Audio, Speech, and Language Processing, pp.393-403, 2012.

A. Jansen and K. Church, Towards Unsupervised Training of Speaker Independent Acoustic Models, INTERSPEECH, pp.1693-1692, 2011.

Q. Jin, K. Laskowski, T. Schultz, and A. Waibel, Speaker segmentation and clustering in meetings, International Conference on Spoken Language Processing, 2004.

P. Joly, J. Benois-Pineau, E. , and G. Quénot, The Argos Campaign: Evaluation of Video Analysis Tools, International Workshop on Content-Based Multimedia Indexing, pp.130-137, 2007.

S. and H. Ney, Multilingual Acoustic Modeling Using Graphemes, pp.1145-1148, 2003.

Y. Ke, D. Hoiem, and R. Sukthankar, Computer vision for music identification, IEEE Conference on Computer Vision and Pattern Recognition, pp.597-604, 2005.

L. Kennedy and D. P. Ellis, Laughter Detection in Meetings, NIST Meeting Recognition Workshop, pp.118-121, 2004.

M. Killer, S. St, and T. Schultz, Grapheme Based Speech Recognition, EUROSPEECH, pp.3141-3144, 2003.

M. Knox and N. Mirghafori, Automatic laughter detection using neural networks, Interspeech, pp.2973-2976, 2007.

S. Kullback and R. A. Leibler, On Information and Sufficiency, The Annals of Mathematical Statistics, vol.22, issue.1, pp.79-86, 1951.
DOI : 10.1214/aoms/1177729694

K. Laskowski and T. Schultz, Detection of Laughter-in-Interaction in Multichannel Close-Talk Microphone Recordings of Meetings, Workshop on Machine Learning for Multimodal Interaction, pp.149-160, 2008.

V. B. Le, O. Mella, and D. Fohr, Speaker diarization using normalized cross likelihood ratio, Interspeech, pp.1869-1872, 2007.

C. J. Leggetter and P. C. Woodland, Maximum likelihood linear regression for speaker adaptation of continuous density hidden Markov models, Computer Speech & Language, vol.9, issue.2, pp.171-185, 1995.
DOI : 10.1006/csla.1995.0010

V. Levenshtein, Binary codes capable of correcting deletions, insertions, and reversals, Cybernetics and Control Theory, pp.707-710, 1966.

Y. Linde, A. Buzo, and R. M. Gray, An Algorithm for Vector Quantizer Design, IEEE Transactions on Communications, vol.28, issue.1, pp.84-95, 1980.
DOI : 10.1109/TCOM.1980.1094577

Y. Liu, K. Cho, H. S. Yun, J. W. Shin, and N. S. Kim, DCT based multiple hashing technique for robust audio fingerprinting, ICASSP, pp.61-64, 2009.

J. Lööf, C. Gollan, and H. Ney, Cross-language Bootstrapping for Unsupervised Acoustic Model Training: Rapid Development of a Polish Speech Recognition System, Interspeech, pp.88-91, 2009.

J. F. Lopez and D. P. Ellis, Using Acoustic Condition Clustering To Improve Acoustic Change Detection On Broadcast News, International Conference on Speech and Language Processing, pp.568-571, 2000.

J. M. Makhoul, S. Roucos, and H. Gish, Vector Quantization in Speech Coding, Proceedings of the IEEE, pp.1551-1588, 1985.

I. Malioutov, A. Park, R. Barzilay, and J. Glass, Making Sense of Sound: Unsupervised Topic Segmentation over Acoustic Input, Annual Meeting of the Association of Computational Linguistics, pp.504-511, 2007.

M. Marolt, SONIC: Transcription of Polyphonic Piano Music with Neural Networks, Workshop on Current Research Directions in Computer Music, pp.217-224, 2001.

A. F. Martin and J. S. Garofolo, NIST Speech Processing Evaluations: LVCSR, Speaker Recognition, Language Recognition, IEEE Workshop on Signal Processing Applications for Public Security and Forensics, pp.1-7, 2007.

C. McCool, S. Marcel, A. Hadid, M. Pietikäinen, P. Matějka et al., Bi-Modal Person Recognition on a Mobile Phone: Using Mobile Phone Data, IEEE International Conference on Multimedia and Expo Workshops, pp.635-640, 2012.

G. McKeown, M. Valstar, R. Cowie, M. Pantic, and M. Schröder, The SEMAINE Database: Annotated Multimodal Records of Emotionally Colored Conversations between a Person and a Limited Agent, IEEE Transactions on Affective Computing, vol.3, issue.1, pp.5-17, 2012.
DOI : 10.1109/T-AFFC.2011.20

S. Meignier, J. F. Bonastre, and S. Igounet, E-HMM approach for learning and adapting sound models, Speaker and Language Recognition Workshop, pp.175-180, 2001.

P. Mermelstein, Distance Measures for Speech Recognition: Psychological and Instrumental, Joint Workshop on Pattern Recognition and Artificial Intelligence, 1976.

D. Moraru, M. Ben, and G. Gravier, Experiments on speaker tracking and segmentation in radio broadcast news, 2005.

A. Muscariello, G. Gravier, and F. Bimbot, An efficient method for the unsupervised discovery of signalling motifs in large audio streams, Content-Based Multimedia Indexing (CBMI), 2011 9th International Workshop on, pp.145-150, 2011.

A. Muscariello, G. Gravier, and F. Bimbot, Zero-resource audio-only spoken term detection based on a combination of template matching techniques, 2011.

A. Muscariello, G. Gravier, and F. Bimbot, Unsupervised Motif Acquisition in Speech via Seeded Discovery and Template Matching Combination, IEEE Transactions on Audio, Speech, and Language Processing, vol.20, issue.7, pp.2031-2044, 2012.
DOI : 10.1109/TASL.2012.2194283

URL : https://hal.archives-ouvertes.fr/hal-00740978

A. V. Nefian, L. Luhong, X. Pi, L. Xiaoxiang, C. Mao et al., A coupled HMM for audio-visual speech recognition, IEEE International Conference on Acoustics, Speech, and Signal Processing, pp.2013-2016, 2002.

S. Novotney, R. Schwartz, and J. Ma, Unsupervised acoustic and language model training with small amounts of labelled data, IEEE International Conference on Acoustics, Speech and Signal Processing, pp.4297-4300, 2009.

J. P. Ogle and D. P. Ellis, Fingerprinting to Identify Repeated Sound Events in Long-Duration Personal Audio Recordings, IEEE International Conference on Acoustics, Speech, and Signal Processing, pp.233-236, 2007.

M. Padellini, F. Capman, and G. Baudoin, Very Low Bit Rate speech coding in Noisy Environments, Speech and Computer (SPECOM), 2005.

A. S. Park and J. R. Glass, Unsupervised Pattern Discovery in Speech, IEEE Transactions on Audio, Speech, and Language Processing, pp.186-197, 2008.

W. R. Pearson and D. J. Lipman, Improved tools for Biological Sequence Comparison, Proceedings of the National Academy of Sciences, pp.2444-2448, 1988.

P. Perrot, G. Aversano, R. Blouet, M. Charbit, and G. Chollet, Voice Forgery Using ALISP: Indexation in a Client Memory, IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '05), pp.17-20, 2005.
DOI : 10.1109/ICASSP.2005.1415039

S. Petridis, B. Mart, and M. , The MAHNOB Laughter database, Image and Vision Computing, vol.31, issue.2, pp.186-202, 2013.
DOI : 10.1016/j.imavis.2012.08.014

S. Petridis and M. Pantic, Fusion of audio and visual cues for laughter detection, International Conference on Image and Video Retrieval, pp.329-337, 2008.

D. Petrovska-Delacrétaz, J. Černocký, J. Hennebert, and G. Chollet, Segmental Approaches for Automatic Speaker Verification, Digital Signal Processing, vol.10, issue.1-3, pp.198-212, 2000.
DOI : 10.1006/dspr.2000.0370

D. Petrovska-Delacrétaz, G. Chollet, and B. Dorizzi, Guide to Biometric Reference Systems and Performance Evaluation, 2009.

J. Pinquier and R. André-Obrecht, Jingle detection and identification in audio documents, IEEE International Conference on Acoustics, Speech, and Signal Processing, pp.329-322, 2004.

J. Pinquier, J. L. Rouas, and R. André-Obrecht, A fusion study in speech/music classification, IEEE International Conference on Acoustics, Speech, and Signal Processing, pp.17-20, 2003.

J. C. Ramírez, J. Segura, C. Benítez, A. de la Torre, and A. Rubio, A new adaptive long-term spectral estimation voice activity detector, EUROSPEECH, pp.3041-3044, 2003.

M. Ramona, S. Fenet, R. Blouet, H. Bredin, T. Fillon et al., A Public Audio Identification Evaluation Framework for Broadcast Monitoring, Applied Artificial Intelligence, vol.26, issue.1-2, pp.119-136, 2011.

M. Ramona and G. Peeters, Audio identification based on spectral modeling of Bark-bands energy and synchronization through onset detection, IEEE International Conference on Acoustics, Speech and Signal Processing, pp.477-480, 2011.

B. Reuderink, M. Poel, K. Truong, R. Poppe, M. Pantic et al., Decision-Level Fusion for Audio-Visual Laughter Detection, International Workshop on Machine Learning for Multimodal Interaction, pp.137-148, 2008.

D. Reynolds, T. Quatieri, and R. Dunn, Speaker verification using Adapted Gaussian mixture models, Digital Signal Processing, pp.19-41, 2000.

J. Rissanen, Stochastic Complexity in Statistical Inquiry Theory, 1989.

M. Rouvier and S. Meignier, A Global Optimization Framework For Speaker Diarization, Speaker and Language Recognition Workshop, 2012.

D. Roy and A. Pentland, Learning words from sights and sounds: a computational model, Cognitive Science, vol.26, issue.1, pp.113-146, 2002.
DOI : 10.1207/s15516709cog2601_4

G. K. Sandve and F. Drablos, A survey of motif discovery methods in an integrated framework, Biology Direct, vol.1, issue.1, 2006.

B. Schuller and F. Weninger, Discrimination of speech and non-linguistic vocalizations by Non-Negative Matrix Factorization, International Conference on Acoustics Speech and Signal Processing, pp.5054-5057, 2010.

M. A. Siegler, U. Jain, B. Raj, and R. M. Stern, Automatic Segmentation, Classification and Clustering of Broadcast News Audio, DARPA Speech Recognition Workshop, pp.97-99, 1997.

M. Sinclair and S. King, Where Are the Challenges in Speaker Diarization?, IEEE International Conference on Acoustics, Speech and Signal Processing, 2013.

A. Sinitsyn, Duplicate Song Detection using Audio Fingerprinting for Consumer Electronics Devices, IEEE International Symposium on Consumer Electronics, pp.1-6, 2006.

M. Siu, H. Gish, S. Lowe, and A. Chan, Unsupervised Audio Pattern Discovery using HMM-based Self-Organized Units, Interspeech, 2011.
DOI : 10.1016/j.csl.2013.05.002

I. Stylianou, Modèles Harmoniques plus Bruit combinés avec des Méthodes Statistiques, pour la Modification de la Parole et du Locuteur, 1996.

S. E. Tranter and D. A. Reynolds, An overview of automatic speaker diarization systems, IEEE Transactions on Audio, Speech, and Language Processing, pp.1557-1565, 2006.

A. Tritschler and R. Gopinath, Improved speaker segmentation and segments clustering using the Bayesian information criterion, EUROSPEECH, pp.679-682, 1999.

J. Trouvain, Segmenting phonetic units in laughter, International Conference of the Phonetic Sciences, pp.2793-2796, 2003.

K. P. Truong and D. A. van Leeuwen, Automatic detection of laughter, Interspeech, pp.485-488, 2005.

K. P. Truong and D. A. van Leeuwen, Automatic discrimination between laughter and speech, Speech Communication, vol.49, issue.2, pp.144-158, 2007.
DOI : 10.1016/j.specom.2007.01.001

URL : https://hal.archives-ouvertes.fr/hal-00499165

K. P. Truong and D. A. van Leeuwen, Evaluating automatic laughter segmentation in meetings using acoustic and acoustic-phonetic features, Interdisciplinary Workshop on the Phonetics of Laughter, pp.49-53, 2007.

J. Urbain, E. Bevacqua, T. Dutoit, A. Moinet, R. Niewiadomski et al., The AVLaughterCycle Database, International Conference on Language Resources and Evaluation (LREC'10), pp.2996-3001, 2010.

J. Urbain and T. Dutoit, A phonetic analysis of natural laughter, for use in automatic laughter processing systems, International Conference on Affective Computing and Intelligent Interaction, pp.397-406, 2011.

P. Viola and M. Jones, Rapid object detection using a boosted cascade of simple features, IEEE Conference on Computer Vision and Pattern Recognition, pp.511-518, 2001.

A. Wang, The Shazam music recognition service, Communications of the ACM, vol.49, issue.8, pp.44-48, 2006.
DOI : 10.1145/1145287.1145312

D. Wang, R. Vipperla, and N. Evans, Online pattern learning for non-negative convolutive sparse coding, 2011.

F. Weninger, B. Schuller, M. Wollmer, and G. , Localization of non-linguistic events in spontaneous speech by Non-Negative Matrix Factorization and Long Short

Z. Yaodong and J. R. Glass, Towards multi-speaker unsupervised speech pattern discovery, IEEE International Conference on Acoustics Speech and Signal Processing, pp.4366-4369, 2010.

S. Young, N. H. Russell, and J. H. , Token Passing: a Conceptual Model for Connected Speech Recognition Systems, 1989.

B. Zhu, W. Li, Z. Wang, and X. Xue, A novel audio fingerprinting method robust to time scale modification and pitch shifting, Proceedings of the International Conference on Multimedia, pp.987-990, 2010.

X. Zhu, C. Barras, L. Lamel, and J. L. Gauvain, Speaker Diarization: From Broadcast News to Lectures, Machine Learning for Multimodal Interaction, pp.396-406, 2006.
DOI : 10.1007/11965152_35

X. Zhu, C. Barras, S. Meignier, and J. L. Gauvain, Combining Speaker Identification and BIC for Speaker Diarization, 2005.