V. Félicien, R. Gaël, E. Slim, and C. Jean, Detecting artist performances in a TV show, 2008.

H. Zaïd, V. Félicien, L. Alexandre, and C. Olivier, A regularized kernel-based approach to unsupervised audio segmentation, International Conference on Acoustics, Speech and Signal Processing, 2009.

V. Félicien, E. Slim, C. Jean, and R. Gaël, Descripteurs visuels robustes pour l'identification de locuteurs dans des émissions télévisées de talk-shows, in Proceedings of: Compression et Représentation des Signaux Audiovisuels, 2010.

B. Simon, V. Félicien, E. Nick, E. Slim, R. Gaël et al., A multimodal approach to initialisation for top-down speaker diarization of television shows, in Proceedings of: European Signal Processing Conference, 2010.

V. Félicien, E. Slim, C. Jean, and R. Gaël, Robust visual features for the multimodal identification of unregistered speakers, International Conference on Image Processing, 2010.

V. Félicien, E. Slim, C. Jean, and R. Gaël, High-Level TV talk show structuring centered on speakers' interventions. Chapter for TV content analysis : techniques and applications, Yiannis Kompatsiaris

A. Philippe, J. Philippe, and L. Véronique, Medium-knowledge-based macrosegmentation of video into sequences, Intelligent Multimedia Information Retrieval, vol.25, pp.74-84, 1997.

A. Jitendra and W. Chuck, A robust speaker clustering algorithm. In Proceedings of: Workshop on Automatic Speech Recognition and Understanding, p.81, 2003.

A. Xavier, Robust speaker diarization for meetings, PhD thesis, p.86, 2006.

A. Xavier, B. Simon, E. Nicholas, F. Corinne, F. Gerald et al., Speaker diarization : a review of recent research, IEEE Transactions On Acoustics Speech and Language Processing, p.76, 2011.

A. Xavier, W. Chuck, and J. M. Pardo, Robust speaker diarization for meetings : ICSI RT06s meetings evaluation system, Lecture Notes in Computer Science, vol.4299, issue.3, pp.346-358, 2006.

A. Hisashi, S. Shigeyoshi, and H. Osamu, A shot classification method of selecting effective key-frames for video browsing, in Proceedings of: ACM International Conference on Multimedia, p.59, 1996.

J. A. Arias, J. Pinquier, and R. André-Obrecht, Evaluation of classification techniques for audio indexing, in Proceedings of: European Signal Processing Conference, p.46, 2005.

J. Assfalg, A. Del Bimbo, W. Nunziati, and P. Pala, Soccer highlights detection and recognition using HMMs, International Conference on Multimedia and Expo, p.62, 2002.

A. Olivier and P. Yannick, Advene : an open-source framework for integrating and visualising audiovisual metadata, in Proceedings of: ACM International Conference on Multimedia, p.97, 2007.

B. Tom and M. Carlo, Properties of line spectrum pair polynomials -a review, Signal Processing, vol.86, issue.11, pp.3286-3298, 2006.

B. Siwar, G. Guillaume, D. Claire-hélène, and G. Patrick, Structure learning in bayesian network based video indexing, in Proceedings of: International Conference on Multimedia and Expo, p.62, 2008.

M. Baillie and J. M. Jose, An audio-based sports video segmentation and event detection algorithm, Conference on Computer Vision and Pattern Recognition Workshop, p.62, 2004.

B. Claude, Z. Xuan, M. Sylvain, and G. Jean-luc, Multistage speaker diarization of broadcast news, IEEE Transactions on Acoustics, Speech and Signal Processing, vol.14, issue.5, pp.1505-1512, 2006.

B. Mathieu, B. Michaël, B. Frédéric, and G. Guillaume, Speaker diarization using bottom-up clustering based on a parameter-derived distance between adapted GMMs

B. Meriem, Indexation audio-visuelle des personnes dans un contexte de télévision, PhD thesis, pp.34-146, 2011.

B. Meriem, C. Delphine, and C. Gérard, Lip activity detection for talking faces classification in TV-content, in Proceedings of: International Conference on Machine Vision, p.83, 2010.

B. Meriem, C. Delphine, and C. Gérard, Talking faces indexing in TV-content, in Proceedings of: Content-Based Multimedia Indexing, p.83, 2010.

B. Rachid, G. Eric, and H. Benoit, EURECOM at TRECVid 2007 : extraction of high level features. In Proceedings of: International Workshop on Video Retrieval Evaluation, p.62, 2007.

B. Benjamin, F. Isabelle, and P. Julien, Exploiting speaker segmentations for automatic role detection. An application to broadcast news documents, in Proceedings of: International Workshop on Content-Based Multimedia Indexing, 0141.

B. Benjamin, F. Isabelle, P. Julien, and A. Régine, Speaker role recognition to help spontaneous conversational speech detection, in Proceedings of: ACM Workshop on Searching for Spontaneous Conversational Speech, 0141.

B. Isabelle, Fusion d'informations en traitement du signal et des images, Hermes Lavoisier, p.63, 2003.

B. Jean-françois, La reconnaissance du locuteur : un problème résolu ? In Proceedings of: Journées d'études sur la Parole, p.86, 2008.

B. Pierre, Sur la télévision. Raisons d'agir, p.30, 1996.

B. Simon, E. Nicholas, and F. Corinne, The LIA-EURECOM RT'09 speaker diarization system : enhancements in speaker modelling and cluster purification, International Conference on Acoustics, Speech, and Signal Processing, pp.88-89, 2010.

B. Simon, V. Félicien, E. Nick, E. Slim, R. Gael et al., A multimodal approach to initialisation for top-down speaker diarization of television shows, European Signal Processing Conference, pp.34-46, 2010.

B. Gary and K. Adrian, Learning OpenCV : computer vision with the OpenCV library. O'Reilly Media, p.169, 2008.

B. Hervé and C. Gérard, Measuring audio and visual speech synchrony : methods and applications, International Conference on Visual Information Engineering, p.82, 2006.

B. Hervé and C. Gérard, Audio-visual speech synchrony measure for talking-face identity verification, International Conference on Acoustics, Speech, and Signal Processing, p.125, 2007.

B. Paul, S. Alan, M. Noel, N. E. Connor, M. Sean et al., Evaluation and combining digital video shot boundary detection algorithms, in Proceedings of: Irish Machine Vision and Information Processing Conference, p.59, 2000.

B. Roberto, M. Ornella, and M. Carla-maria, A survey on the automatic indexing of video data, Journal of Visual Communication and Image Representation, vol.10, issue.13, pp.78-112, 1999.

C. Marine and H. Pierre, Sémantique et multimodalité en analyse de l'information, p.63, 2011.

C. Jean, A. Simone, B. Sebastien, F. Mike, G. Mael et al., The AMI meeting corpus : a pre-announcement, Machine Learning for Multimodal Interaction : Second International Workshop, p.81, 2005.

C. Jean, Document description for audiovisual archiving, corpora, technologies and uses. In Proceedings of: Content-Based Multimedia Indexing, 2007.

C. Chih-chung and L. Chih-jen, LIBSVM : a library for support vector machines, ACM Transactions on Intelligent Systems and Technology, vol.2, issue.3, pp.27:1-27:27, 2011.

C. Patrick, Les conditions d'une typologie des genres télévisuels d'information. Réseaux, pp.79-101, 1997.

S. S. Chen and P. S. Gopalakrishnan, Speaker, environment and channel change detection and clustering via the bayesian information criterion. In Proceedings of: DARPA Broadcast News Transcription and Understanding Workshop, p.79, 1998.

C. Corinna and V. Vladimir, Support-vector networks, Machine Learning, pp.273-297, 1995.

C. Timothée, J. Chris, M. Eleni, and T. Ben, Movie/script : alignment and parsing of video and text transcription, European Conference on Computer Vision, p.16, 2008.

C. Nello and S. John, An introduction to support vector machines and other kernel-based learning methods, p.172, 2000.

D. Perrine and C. J. Wellekens, DISTBIC : a speaker-based segmentation for audio data indexing, Speech Communication, vol.32, issue.12, pp.111-126, 2000.

D. Manolis, Multimodal tennis video structure analysis with segment models, PhD thesis, p.16, 2006.

D. Manolis, G. Guillaume, and G. Patrick, Audiovisual integration with segment models for tennis video parsing, Computer Vision and Image Understanding, vol.111, issue.16, pp.142-154, 2008.

D. Paul, E. Yannick, M. Sylvain, and M. Teva, The LIUM speech transcription system : a CMU Sphinx III-based system for French broadcast news. In Proceedings of: International Speech Communication Association, p.17, 2005.

D. Alfred, Unsupervised detection of multimodal clusters in edited recordings, in Proceedings of: Multimedia Signal Processing, p.83, 2010.

D. Nevenka, Z. Hong-jiang, S. Behzad, S. Ibrahim, H. Thomas et al., Applications of video-content analysis and retrieval, IEEE Transactions on Multimedia, vol.9, issue.3, pp.42-55, 2002.

D. Frédéric, D. Manuel, and D. Christian, An online kernel change detection algorithm, IEEE Transactions on Signal Processing, vol.53, issue.8, pp.2961-2974, 2005.

R. O. Duda, P. E. Hart, and D. G. Stork, Pattern classification, p.103, 2000.

E. Slim, Classification automatique des signaux audio-fréquences : reconnaissance des instruments de musique, PhD thesis, p.118, 2005.

E. Mark, S. Josef, and Z. Andrew, Hello ! My name is... Buffy ? automatic naming of characters in TV video, British Machine Vision Conference, p.115, 2006.

P. F. Felzenszwalb and D. P. Huttenlocher, Pictorial structures for object recognition, International Journal of Computer Vision, vol.61, issue.1, pp.57-79, 2005.

F. Belkacem, D. Manuel, and H. Amrane, Speaker diarization using one-class support vector machines, Speech Communication, vol.50, issue.80, pp.355-365, 2008.

F. John and D. Trevor, Signal level fusion for multimodal perceptual user interface. In Proceedings of: Workshop on Perceptive User Interfaces, p.83, 2001.

F. John, D. Trevor, F. William, and V. Paul, Learning joint statistical models for audio-visual fusion and segregation, in Proceedings of: Neural Information Processing Systems, p.83, 2000.

F. Corinne, B. Simon, and E. Nicholas, The LIA-EURECOM RT'09 speaker diarization system. In Proceedings of: RT'09 NIST Rich Transcription Workshop, pp.81-89, 2009.

Y. Freund and R. E. Schapire, A decision-theoretic generalization of on-line learning and an application to boosting, Journal of Computer and System Sciences, vol.55, issue.1, pp.119-139, 1997.

Y. Freund and R. E. Schapire, A short introduction to boosting, Journal of Japanese Society for Artificial Intelligence, vol.14, issue.5, pp.771-780, 1999.

F. Gerald, H. Hayley, and Y. Chuohao, Multi-modal speaker diarization of real-world meetings using compressed-domain video features, International Conference on Acoustics, Speech and Signal Processing, p.99, 2009.

F. Gerald, V. Oriol, H. Yan, and M. Christian, Prosodic and other longterm features for speaker diarization, IEEE Transactions on Acoustics, Speech and Signal Processing, vol.17, issue.76, pp.985-993, 2009.

F. Gerald, Y. Chuohao, and H. Hayley, Visual speaker localization aided by acoustic models, in Proceedings of: ACM International Conference on Multimedia, p.108, 2009.

G. Sylvain, G. Guillaume, and C. Laura, The ESTER 2 evaluation campaign for the rich transcription of French radio broadcasts, in Proceedings of: Annual Conference of the International Speech Communication Association, p.76, 2009.

G. Rodolphe and C. Patrick, Un genre télévisuel : le talk show. Dunod, p.30, 1997.

G. Herbert, S. Man-hung, and R. Robin, Segregation of speakers for speech recognition and speaker identification, International Conference on Acoustics, Speech, and Signal Processing, p.80, 1991.

G. Erving, Forms of talk, p.30, 1981.

G. Yihong, S. Lim, and C. Chua, Automatic parsing of TV soccer programs, Proceedings of the International Conference on Multimedia Computing and Systems, p.62, 1995.
DOI : 10.1109/MMCS.1995.484921

G. Camille, G. Guillaume, and S. Pascale, Can automatic speech transcripts be used for large scale TV stream description and structuring ? In Proceedings of: Workshop on Content-Based Audio Video Analysis for Novel TV Services, p.17, 2009.

G. Isabelle and E. André, An introduction to feature and variable selection, Journal of Machine Learning Research, vol.3, pp.1157-1182, 2003.

G. André, Tracking pitches for broadcast television, IEEE Computer, vol.35, issue.16, pp.38-43, 2002.

H. Raffay, R. Krishan, K. Matthias, G. Kihwan, K. Irfan et al., Player localization using multiple static cameras for sports visualization, Conference on Computer Vision and Pattern Recognition, p.62, 2010.

H. Yina, L. Guizhong, C. Gérard, and R. Joseph, Person identity clustering in TV show videos. In Proceedings of: Visual Information Engineering, p.46, 2008.

H. Zaïd, V. Félicien, L. Alexandre, and C. Olivier, A regularized kernel-based approach to unsupervised audio segmentation, in Proceedings of: International Conference on Acoustics, Speech and Signal Processing, p.80, 2009.

H. John and M. Javier, Audio-vision : Using audiovisual synchrony to locate sounds, Advances in Neural Information Processing, pp.813-819, 2000.

H. Cyril and W. J. Christmas, Cepstral features for classification of an impulse response with varying sample size dataset, European Signal Processing Conference, p.61, 2007.

H. Harold, Relations between two sets of variates, Biometrika, vol.28, issue.3-4, pp.321-377, 1936.

H. Hayley and F. Gerald, Towards audio-visual on-line diarization of participants in group meetings. In Proceedings of: Workshop on Multi-camera and Multi-modal Sensor Fusion Algorithms and Applications, p.83, 2008.

I. Ichiro, M. Hiroshi, K. Norio, and S. Shinichi, Topic threading for structuring a large-scale news video archive, Lecture Notes in Computer Science, vol.3115, issue.1, pp.2128-2129, 2004.

J. Gaël, Indexation de la vidéo par le costume, PhD thesis, p.95, 2005.

J. Gaël and J. Philippe, Costume : a new feature for automatic video content indexing, International Conference on Adaptivity, Personalization and Fusion of Heterogeneous Information, p.95, 2004.

E. El Khoury, Unsupervised video indexing based on audiovisual characterization of persons, PhD thesis, p.17, 2010.
URL : https://hal.archives-ouvertes.fr/tel-00515424

E. El Khoury, S. Meignier, and C. Senac, Speaker diarization : combination of the LIUM and IRIT systems, Technical report, IRIT / LIUM, vol.80, p.81, 2008.

E. El Khoury, C. Senac, and J. Philippe, Face-and-clothing based people clustering in video content, in Proceedings of: ACM International Conference on Multimedia Information Retrieval, p.95, 2010.
URL : https://hal.archives-ouvertes.fr/hal-01433881

K. Ewa, G. Guillaume, O. Lionel, and G. Patrick, Audiovisual integration for tennis broadcast structuring, Multimedia Tools and Applications, pp.289-311, 2006.

K. Kihwan, G. Matthias, S. Ariel, M. Iain, J. Hodgins et al., Motion fields to predict play evolution in dynamic sport scenes, Conference on Computer Vision and Pattern Recognition, p.62, 2010.

L. Julien, G. Gregory, G. Jean-luc, G. Guillaume, L. Lori et al., VoxaleadNews : robust automatic segmentation of video content into browsable and searchable subjects, p.16, 2010.

L. Marie and N. Erik, Quelques dispositifs de talk-shows français, Réseaux, vol.118, pp.201-207, 1998.

L. Dongge, D. Nevenka, L. Mingkun, and S. Ishwar, Multimedia content processing through cross-modal association, in Proceedings of: ACM International Conference on Multimedia, p.125, 2003.

L. Rainer, Comparison of automatic shot boundary detection algorithms. In Proceedings of: Storage and Retrieval for Image and Video Databases, p.59, 1998.

L. Guy, Débats, talk-shows : de la radio filmée ? Communication et langages, pp.92-100, 1990.

D. G. Lowe, Distinctive image features from scale-invariant keypoints, International Journal of Computer Vision, vol.60, issue.2, pp.91-110, 2004.

L. Bruce and K. Takeo, An iterative image registration technique with an application to stereo vision, in Proceedings of: International Joint Conference on Artificial Intelligence, p.168, 1981.

M. Subhransu, A. C. Berg, and M. Jitendra, Classification using intersection kernel support vector machines is efficient. In Proceedings of: Computer Vision and Pattern Recognition, p.124, 2008.

B. S. Manjunath, P. Salembier, and S. Thomas, Introduction to MPEG-7 multimedia content description interface, p.100, 2002.

M. Gaël and B. Sid-ahmed, Automatic TV broadcast structuring, Journal of Digital Multimedia Broadcasting, vol.2010, issue.1, pp.1-16, 2010.

M. Benoît, E. Slim, F. Thomas, P. Jacques, and R. Gaël, YAAFE, an easy to use and efficient audio feature extraction software, in Proceedings of: International Society for Music Information Retrieval Conference, p.94

M. Sylvain, B. Jean-françois, and I. Stéphane, E-HMM approach for learning and adapting sound models for speaker indexing. In Proceedings of: Odyssey Speaker and Language Recognition Workshop, p.88, 2001.

M. Daniel, B. Mathieu, and G. Guillaume, Experiments on speaker tracking and segmentation in radio broadcast news, in Proceedings of: International Conference on Spoken Language Processing, p.81, 2005.

M. Katharina, B. Peter, and J. Thorsten, Combining statistical learning with a knowledge-based approach -a case study in intensive care monitoring, International Conference on Machine Learning, p.172, 1999.

M. Wayne, All talk : the talk show in media culture, p.30, 1993.

M. Kevin, Dynamic bayesian networks : representation, inference and learning, PhD thesis, p.62, 2002.

N. Xavier and G. Patrick, Detecting repeats for video structuring. Multimedia Tools and Applications, pp.233-252, 2008.

H. J. Nock, I. Giridharan, and N. Chalapathy, Speaker localisation using audiovisual synchrony : an empirical study, in Proceedings of: International Conference on Image and Video Retrieval, p.82, 2003.

O. Nuria and H. Eric, A comparison of HMMs and dynamic bayesian networks for recognizing office activities, in Proceedings of: International Conference on User Modeling, p.16, 2005.

E. K. Patterson, G. Sabri, T. Zekeriya, and J. N. Gowdy, CUAVE : a new audiovisual database for multimodal human-computer interface research, International Conference on Acoustics, Speech and Signal Processing, p.82, 2002.

P. Hermine, Language and control in American TV talk shows : an Analysis of linguistic strategies, p.43, 1996.

P. Christian, Fraunhofer Heinrich Hertz Institute at TRECVID 2004 : shot boundary detection system, p.59, 2004.

P. Milan, M. Vojkan, J. Willem, and S. Djordjevic-kajan, Multi-modal extraction of highlights from TV formula 1 programs, International Conference on Multimedia and Expo, p.62, 2002.

P. Julien and A. Régine, Jingle detection and identification in audio documents, International Conference on Acoustics, Speech and Signal Processing, p.61, 2004.

J. C. Platt, Probabilistic outputs for support vector machines and comparisons to regularized likelihood methods, p.120, 1999.

P. Jean-philippe, Structuration automatique de flux télévisuels, PhD thesis, p.58, 2007.

P. Jean-philippe, An automatic television stream structuring system for television archives holder. Multimedia systems, pp.255-275, 2008.

P. Jacques, Conversion de fréquence, p.145, 2009.

R. Lawrence and J. Biing-hwang, Fundamentals of speech recognition, p.94, 1993.

L. R. Rabiner, A tutorial on hidden Markov models and selected applications in speech recognition, Proceedings of the IEEE, pp.257-286, 1989.

J. Ramírez, J. M. Górriz, and J. C. Segura, Voice activity detection. Fundamentals and speech recognition system robustness. Robust Speech Recognition and Understanding, pp.1-22, 2007.

D. A. Reynolds and P. Torres-Carrasquillo, Approaches and applications of audio diarization, International Conference on Acoustics, Speech and Signal Processing, p.77, 2005.

R. Gaël, R. Mathieu, and E. Slim, Combined supervised and unsupervised approaches for automatic segmentation of radiophonic audio streams, International Conference on Acoustics, Speech and Signal Processing, p.79, 2007.

R. Jamal-eddine, R. Mohammed, A. Driss, G. Marc, and M. José, Fast incremental clustering of gaussian mixture speaker models for scaling up retrieval in on-line broadcast, International Conference on Acoustics, Speech and Signal Processing, p.81, 2006.

M. E. Sargin, Y. Yemez, E. Erzin, and A. M. Tekalp, Audiovisual synchronization and fusion using canonical correlation analysis, IEEE Transactions on Multimedia, vol.9, issue.7, pp.1396-1403, 2007.

S. John, Real-time discrimination of broadcast speech/music, International Conference on Acoustics, Speech, and Signal Processing, p.78, 1996.

S. Eric and S. Malcolm, Construction and evaluation of a robust multifeature speech/music discriminator, International Conference on Acoustics, Speech and Signal Processing, p.78, 1997.

S. Bernhard, J. C. Platt, S. John, A. J. Smola, C. Robert et al., Estimating the support of a high-dimensional distribution, Neural Computation, vol.13, issue.126, pp.1443-1471, 2001.

S. Bernhard and A. J. Smola, Learning with kernels : support vector machines, regularization, optimization and beyond, p.172, 2001.

H. W. Schussler, A stability theorem for discrete systems, IEEE Transactions on Acoustics, Speech and Signal Processing, vol.24, issue.1, pp.87-89, 1976.

S. Jianbo and T. Carlo, Good features to track, Proceedings of IEEE Conference on Computer Vision and Pattern Recognition CVPR-94, p.169, 1994.
DOI : 10.1109/CVPR.1994.323794

M. A. Siegler, U. Jain, B. Raj, and R. M. Stern, Automatic segmentation, classification and clustering of broadcast news audio. In Proceedings of: DARPA Speech Recognition Workshop, p.80, 1997.

S. Josef, E. Mark, and Z. Andrew, Who are you ? : Learning person specific classifiers from video, International Conference on Computer Vision and Pattern Recognition, p.114, 2009.

S. Malcolm and C. Michele, FaceSync : a linear operator for measuring synchronization of video facial images and audio tracks, in Proceedings of: Neural Information Processing Systems, p.82, 2000.

S. Han and W. Peter, Annotation by category - ELAN and ISO DCR, International Conference on Language Resources and Evaluation, p.48, 2008.

S. Alan, O. Paul, and D. Aiden, Video shot boundary detection : Seven years of TRECVid activity, Computer Vision and Image Understanding, vol.114, issue.4, pp.1-25, 2009.

G. W. Snedecor and W. G. Cochran, Statistical methods, p.116, 1967.

C. G. M. Snoek and M. Worring, Multimodal video indexing : a review of the state-of-the-art. Multimedia Tools and Applications, pp.5-35, 2005.

C. A. Sugar and G. M. James, Finding the number of clusters in a data set : an information theoretic approach, Journal of the American Statistical Association, vol.98, issue.103, pp.397-408, 2003.

M. J. Swain and D. H. Ballard, Color indexing, International Journal of Computer Vision, vol.7, issue.56, pp.11-32, 1991.

T. Bernard, Television talk : a history of TV talk show, p.30, 2002.

S. E. Tranter and D. A. Reynolds, An overview of automatic speaker diarization systems, IEEE Transactions on Audio, Speech, and Language Processing, vol.14, issue.5, pp.1557-1565, 2006.

T. Wei-ho, C. Shih-sian, and W. Hsin-min, Speaker clustering of speech utterances using a voice characteristic reference space, International Conference on Spoken Language Processing, p.77, 2004.

V. Himanshu, I. Tanmoy, S. Sudeep, S. Ravi, and K. Ranga, Audio segmentation and speaker localization in meeting videos, International Conference on Pattern Recognition, p.83, 2006.

V. Félicien, E. Slim, C. Jean, and R. Gaël, Robust visual features for the multimodal identification of unregistered speakers, International Conference on Image Processing, pp.111-140, 2010.

V. Félicien, R. Gaël, E. Slim, and C. Jean, Detecting artist performances in a TV show, p.141, 2008.

V. Jeroen and W. Marcel, Systematic evaluation of logical story unit segmentation, IEEE Transactions on Multimedia, vol.4, issue.4, pp.492-499, 2002.

V. Alessandro, D. Alfred, F. Sarah, and S. Hugues, Canal9 : a database of political debates for analysis of social interactions, International Conference on Affective Computing and Intelligent Interaction, p.81, 2009.

V. Paul and J. Michael, Robust real-time object detection. In Proceedings of: International Workshop on Statistical and Computational Theories of Vision-Modeling, Learning, Computing and Sampling, p.114, 2001.

V. Timo, T. Seyed, and T. James, RMIT university video retrieval experiments at TRECVid, p.59, 2004.

H. D. Wactlar, K. Takeo, M. A. Smith, and M. Scott, Intelligent access to digital video: Informedia project, Computer, vol.29, issue.5, pp.46-52, 1996.
DOI : 10.1109/2.493456

G. Williams and D. P. W. Ellis, Speech/music discrimination based on posterior probability features, European Conference on Speech Communication and Technology, p.78, 1999.

W. Raymond, Television : technology and cultural form, p.28, 1974.

W. Chuck, F. James, P. Barbara, and A. Xavier, Towards robust speaker segmentation : the ICSI-SRI fall 2004 diarization system, Rich Transcription Workshop, p.79, 2004.

W. Chuck and H. Marijn, The ICSI RT'07s speaker diarization system. Multimodal Technologies for Perception of Humans, pp.509-519, 2008.

T.-F. Wu, C.-J. Lin, and R. C. Weng, Probability estimates for multi-class classification by pairwise coupling, Journal of Machine Learning Research, vol.5, pp.975-1005, 2004.

X. Ziyou, R. Regunathan, and D. Ajay, Generation of sports highlights using motion activities in combination with a common audio feature extraction framework, International Conference on Image Processing, p.62, 2003.

X. Min, D. Ling-yu, X. Changsheng, K. Mohan, and T. Qi, Event detection in basketball video using multiple modalities, Proceedings of the 2003 Joint Conference of the Fourth International Conference on Information, Communications and Signal Processing and the Fourth Pacific Rim Conference on Multimedia, p.62, 2003.
DOI : 10.1109/ICICS.2003.1292722

Y. Minerva and Y. Boon-lock, Time-constrained clustering for segmentation of video into story units, International Conference on Pattern Recognition, p.59, 1996.

X. Yu, L. Li, and H. W. Leong, Interactive broadcast services for live soccer video based on instant semantics acquisition. Visual Communication and Image Representation, pp.117-130, 2009.

Z. Dongqing and C. Shih-fu, Event detection in baseball video using superimposed caption recognition, in Proceedings of: ACM Conference on Multimedia, p.62, 2002.

S. K. Zhou and R. Chellappa, From sample similarity to ensemble similarity : probabilistic distance measures in reproducing kernel Hilbert space, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.28, issue.6, pp.917-929, 2006.

Z. Wensheng, A. Vellaikal, and C. J. Kuo, Rule based video classification system for basketball video indexing, in Proceedings of: ACM Workshops on Multimedia, p.62, 2000.