Loudness

6.2.3.1 Parabolic Spectral Parameter (PSP)

Harmonic-to-Noise Ratio (HNR)

Quasi-Open Quotient (QOQ)

Normalized Amplitude Quotient (NAQ)

Amplitude difference between the first harmonics (H1H2 and H1A3)

Rd

[…] of voice quality derived from wavelets

Representations

Table

HCRF : Hidden Conditional Random Fields

7.2.1 Intra-speaker learning model

Suited to small corpora

Inter-speaker learning model

7.2.2.1 Presentation of the different interaction systems

One thing we noticed about the audio signal representations is that they were too raw for a relatively simple model such as an HCRF. An appropriate solution could be to use a higher-level representation of the speech signal. This type of representation can be obtained with […]

We tried an unsupervised approach with auto-encoders trained on the entire SEMAINE corpus, following the method of Freitag et al. (2017). However, this type of strategy is mostly a form of compression and did not yield better results. Conversely, a more complex model trained on an external video corpus could be a solution, for example the deep model of Abu-El-Haija et al. (2016), trained on the YouTube-8M corpus of more than 500,000 hours of video. This model makes it possible, in the manner of word2vec, […]

11.2 Research perspectives

Using a more complex language model. In the work presented in this manuscript we used word vectors from Mikolov et al. (2013). Since then, many advances have been made in natural language processing and in building distributed text representations. […] stochastic gradient and to train them jointly with a neural network. This could be a way to fuse the modalities before the input of the HCRF layer. We did in fact observe an improvement in the performance of the multimodal systems on the SEMAINE-Léger corpus.

A more precise annotation of the corpus would make it possible to build a system that identifies the targets associated with opinions. One of the several implemented versions of the annotation platform set up for annotating the SEMAINE corpus can retrieve this kind of precise information at word granularity. This would allow the use of an ABSA-type system ([…], 2012).

One limitation of this work is that the dialogic context taken into account is restricted to a single pair of speaking turns. In an interaction between two individuals, however, the general context matters, and extending the dialogic context to the whole conversation would be a notable improvement to our system ([…], 2013).

Figure A.2: Presentation page of the annotation platform, the annotators' first contact with the task

Figure A.3: Presentation page of the annotation platform

Figure A.4: Presentation page of the annotation platform, the annotators' first contact with the task (4/4)

Figure A.5: Instructions of the platform […]

To avoid reproducing a similar pattern, we chose to use the raw material provided with the SEMAINE corpus in order to obtain good-quality metadata. We decided to have the database annotated from the hand-made transcriptions, which we checked automatically so that the previous type of problem would not occur. Moreover, by using the manual transcriptions, we can use all the other information they provide, such as punctuation and paralinguistic annotations.

Figure D.1: Complete cycle of the glottal flow (top) and its derivatives (bottom), as described in Scherer et al. (2013).

Global F1, in order to have an overall metric other than the recognition rate, which is biased because the corpus is not fully balanced
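The point about the recognition rate being biased on an unbalanced corpus can be illustrated with a small sketch. It assumes that the "global F1" is a macro-averaged F1 (the unweighted mean of per-class F1 scores); the source does not state which averaging was used. The labels and toy data below are hypothetical and are not taken from the corpus.

```python
def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label equals the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores (assumed averaging scheme)."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never appears.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical unbalanced toy data: 9 "neutral" samples and 1 "positive".
y_true = ["neutral"] * 9 + ["positive"]
# A degenerate classifier that always predicts the majority class.
y_pred = ["neutral"] * 10

acc = accuracy(y_true, y_pred)
f1 = macro_f1(y_true, y_pred)
print(f"accuracy = {acc:.3f}, macro F1 = {f1:.3f}")
```

On this toy data the majority-class predictor reaches a high recognition rate while its macro F1 stays below 0.5, because the minority class contributes an F1 of zero.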

J. Abric, Psychologie de la communication : théories et méthodes, vol.21, p.26, 2008.

S. Abu-El-Haija, N. Kothari, J. Lee, P. Natsev, G. Toderici et al., «YouTube-8M : A Large-Scale Video Classification Benchmark», vol.47, p.216, 2016.

M. Airas and P. Alku, «Comparison of multiple voice source parameters in different phonation types», 2007.

K. Albrecht, «Social Intelligence», vol.22, 2006.

P. Alku, «Glottal wave analysis with Pitch Synchronous Iterative Adaptive Inverse Filtering», vol.11, pp.109-118, 1992.

P. Alku, T. Bäckström, and E. Vilkman, «Normalized amplitude quotient for parametrization of the glottal flow», The Journal of the Acoustical Society of America, vol.112, issue.2, pp.701-710, 2002.

P. Alku, H. Strik, and E. Vilkman, «Parabolic spectral parameter - A new method for quantification of the glottal flow», Speech Communication, vol.22, issue.1, p.135, 1997.

S. Amiriparian, N. Cummins, S. Ottl, and M. Gerczuk, «Sentiment Analysis Using Image-based Deep Spectrum Features», in ACIIW, ISBN 9781538606803, vol.47, pp.26-29, 2017.

S. Angelidis and M. Lapata, «Multiple Instance Learning Networks for Fine-Grained Sentiment Analysis», TACL, vol.62, p.217, 2017.

P. Arias, P. Belin, and J. J. Aucouturier, «Auditory smiles trigger unconscious facial imitation», 2018.

P. K. Atrey, M. A. Hossain, A. Saddik, and M. S. Kankanhalli, «Multimodal fusion for multimedia analysis : A survey», Multimedia Systems, vol.16, issue.6, pp.345-379, 2010.

S. Baccianella, A. Esuli, and F. Sebastiani, «SentiWordNet 3.0 : An Enhanced Lexical Resource for Sentiment Analysis and Opinion Mining», Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10), vol.3, p.124, 2010.

C. F. Baker, C. J. Fillmore, and J. B. Lowe, «The Berkeley FrameNet Project», in Proceedings of the 36th annual meeting on Association for Computational Linguistics, vol.1, p.86, 1998.

C. Barras, E. Geoffrois, Z. Wu, and M. Liberman, «Transcriber : Development and use of a tool for assisting speech corpora production», Speech Communication, vol.33, issue.1-2, pp.5-22, 2001.

V. Barriere, «Hybrid Models for Opinion Analysis in Speech Interactions», in ICMI, ISBN 9781450355438, vol.34, pp.647-651, 2017.

V. Barriere, C. Clavel, and S. Essid, «Opinion Dynamics Modeling for Movie Review Transcripts Classification with Hidden Conditional Random Fields», 2017.

V. Barriere, C. Clavel, and S. Essid, «Attitude Classification in Adjacency Pairs of a Human-Agent Interaction with Hidden Conditional Random Fields», 2018.

V. Barriere, C. Clavel, and S. Essid, «Classification d'attitude dans des paires adjacentes à l'aide de champs aléatoires conditionnels cachés», 2018.

A. Ben-Youssef, M. Chollet, H. Jones, N. Sabouret, C. Pelachaud et al., «Towards a Socially Adaptive Virtual Agent», in IVA, pp.1-14, 2015.

F. Benamara, M. Taboada, and Y. Mathieu, «Evaluative Language Beyond Bags of Words : Linguistic Insights and Computational Applications», 2016.

Y. Bengio, R. Ducharme, P. Vincent, and C. Janvin, «A Neural Probabilistic Language Model», vol.3, pp.1137-1155, 2003.

J. Biel and D. Gatica-Perez, «The YouTube lens : Crowdsourced personality impressions and audiovisual analysis of vlogs», IEEE Transactions on Multimedia, vol.15, issue.1, pp.41-55, 2013.

S. Bilakhia, S. Petridis, A. Nijholt, and M. Pantic, «The MAHNOB Mimicry Database : A database of naturalistic human interactions», Pattern Recognition Letters, vol.66, p.112, 2015.

V. Bisot, R. Serizel, S. Essid, and G. Richard, «Acoustic scene classification with matrix factorization for unsupervised feature learning», in 2016 IEEE International Conference on Acoustics, Speech and Signal Processing, pp.6445-6449, 2016.

K. Bousmalis, L. Morency, and M. Pantic, «Modeling hidden dynamics of multimodal cues for spontaneous agreement and disagreement recognition», in 2011 IEEE International Conference on Automatic Face and Gesture Recognition and Workshops, vol.60, p.146, 2011.

K. Bousmalis, S. Zafeiriou, L. Morency, and M. Pantic, «Infinite hidden conditional random fields for human behavior analysis.», IEEE transactions on neural networks and learning systems, vol.24, pp.170-177, 2013.

B. Bozkurt, T. Dutoit, B. Doval, and C. d'Alessandro, «Improved differential phase spectrum processing for formant tracking», in Proceedings of the Conference of the International Speech Communication Association (INTERSPEECH) - ICSLP, pp.1-4, 2004.

M. M. Bradley and P. J. Lang, «Measuring Emotion : The Self-Assessment Manikin and the Semantic Differential», Journal of Behavior Therapy and Experimental Psychiatry, 1994.

M. M. Bradley and P. J. Lang, «Affective Norms for English Words (ANEW) : Affective ratings of words and instruction manual», p.42, 2010.

M. M. Bradley and P. J. Lang, «Affective Norms for English Words (ANEW) : Instruction Manual and Affective Ratings», 1999.

E. Breck, Y. Choi, and C. Cardie, «Identifying expressions of opinion in context», IJCAI International Joint Conference on Artificial Intelligence, vol.41, p.66, 2007.

M. Brilman and S. Scherer, «A Multimodal Predictive Model of Successful Debaters or How I Learned to Sway Votes», Proceedings of the 23rd ACM international conference on Multimedia, vol.52, pp.149-158, 2015.

T. Brychcin, M. Konkol, and J. Steinberger, «UWB : Machine Learning Approach to Aspect-Based Sentiment Analysis», Proceedings of the 8th International Workshop on Semantic Evaluation, pp.817-822, 2014.

C. Busso, M. Bulut, C. C. Lee, A. Kazemzadeh, E. Mower et al., «IEMOCAP : Interactive emotional dyadic motion capture database», Language Resources and Evaluation, vol.42, p.82, 2008.

C. Busso, S. Parthasarathy, A. Burmania, M. Abdelwahab, N. Sadoughi et al., «MSP-IMPROV : An Acted Corpus of Dyadic Interactions to Study Emotion Perception», IEEE Transactions on Affective Computing, vol.8, issue.1, p.84, 2017.

A. Cafaro, N. Glas, and C. Pelachaud, «The Effects of Interrupting Behavior on Interpersonal Attitude and Engagement in Dyadic Interactions», Proceedings of the 15th International Conference on Autonomous Agents and Multiagent Systems (AAMAS 2016), pp.911-920, 2016.

A. Cafaro, J. Wagner, T. Baur, S. Dermouche, M. Torres et al., «The NoXi database : multimodal recordings of mediated novice-expert interactions», vol.79, p.82, 2017.

Z. Callejas, B. Ravenet, M. Ochs, and C. Pelachaud, «A computational model of social attitudes for a virtual recruiter», in 13th International Conference on Autonomous Agents and Multiagent Systems, vol.1, pp.93-100, 2014.

E. Cambria and A. Hussain, «Sentic Computing», vol.7, pp.183-185, 2015.

E. Cambria, D. Olsher, and D. Rajagopal, «SenticNet 3 : A Common and Common-Sense Knowledge Base for Cognition-Driven Sentiment Analysis», Proceedings of the Twenty-Eighth AAAI Conference on Artificial Intelligence, vol.3, p.68, 2014.

E. Cambria, S. Poria, D. Hazarika, and K. Kwok, «SenticNet 5 : Discovering Conceptual Primitives for Sentiment Analysis by Means of Context Embeddings», vol.5, p.69, 2018.

N. Campbell and P. Mokhtari, «Voice Quality : the 4th Prosodic Dimension», 15th ICPhS, vol.136, pp.2417-2420, 2003.

E. Campione and J. Véronis, «A large-scale multilingual study of pause duration», in Speech Prosody, Proceedings of the 1st International Conference on Speech Prosody, vol.48, p.145, 2002.

D. Cer, Y. Yang, S. Kong, N. Hua, N. Limtiaco et al., «Universal Sentence Encoder», vol.44, p.217, 2018.

M. Chelliah and S. Sarkar, «Product Recommendations Enhanced with Reviews», Proceedings of the Eleventh ACM Conference on Recommender Systems - RecSys '17, vol.77, pp.398-399, 2017.

J. Chen, X. Qiu, P. Liu, and X. Huang, «Meta Multi-Task Learning for Sequence Modeling», AAAI, p.217, 2018.

M. Chen, S. Wang, P. P. Liang, T. Baltrušaitis, A. Zadeh, and L.-P. Morency, «Multimodal Sentiment Analysis with Word-Level Fusion and Reinforcement Learning», in ICMI 2017, vol.72, p.193, 2017.

Y. Choi, E. Breck, and C. Cardie, «Joint extraction of entities and relations for opinion recognition», Proceedings of the 2006 Conference on Empirical Methods in Natural Language Processing, vol.42, p.123, 2006.

Y. Choi, C. Cardie, E. Riloff, and S. Patwardhan, «Identifying sources of opinions with conditional random fields and extraction patterns», Proceedings of the conference on Human Language Technology and Empirical Methods in Natural Language Processing HLT 05, vol.66, p.123, 2003.

F. Chollet, «Keras», vol.163, p.174, 2015.

J. Chung, C. Gulcehre, K. Cho, and Y. Bengio, «Gated feedback recurrent neural networks», Proceedings of the 32nd International Conference on Machine Learning, ICML 2015, vol.37, pp.2067-2075, 2015.

C. Clavel, Analyse et reconnaissance des manifestations acoustiques des émotions de type peur en situations anormales, PhD thesis, vol.46, p.118, 2007.

C. Clavel and Z. Callejas, «Sentiment Analysis : From Opinion Mining to Human-Agent Interaction», vol.28, p.54, 2016.

J. Cohen, «A Coefficient of Agreement for Nominal Scales», Educational and Psychological Measurement, 1960.

R. Collobert and J. Weston, «A unified architecture for natural language processing : Deep neural networks with multitask learning», Proceedings of the 25th international conference on Machine learning, vol.126, p.217, 2008.

R. Collobert, J. Weston, L. Bottou, M. Karlen, K. Kavukcuoglu et al., «Natural Language Processing (Almost) from Scratch», vol.12, p.127, 2011.

A. Conneau, D. Kiela, H. Schwenk, L. Barrault, and A. Bordes, «Supervised Learning of Universal Sentence Representations from Natural Language Inference Data», vol.44, p.45, 2017.

L. J. Cronbach, «Coefficient alpha and the internal structure of tests», Psychometrika, 1951.

P. Dai, U. Iurgel, and G. Rigoll, «A Novel Feature Combination Approach for Spoken Document Classification with Support Vector Machines», Multimedia Information Retrieval Workshop, vol.120, pp.1-5, 2003.

C. d'Alessandro, Analyse, synthèse et codage de la parole, 2002.

T. Dang, V. Sethu, and E. Ambikairajah, «Dynamic Multi-Rater Gaussian Mixture Regression Incorporating Temporal Dependencies of Emotion Uncertainty Using Kalman Filters», in ICASSP, ISBN 9781538646588, pp.4929-4933, 2018.

G. Degottex, J. Kane, T. Drugman, T. Raitio, and S. Scherer, «COVAREP - A collaborative voice analysis repository for speech technologies», in ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings, ISBN 9781479928927, vol.52, p.145, 2014.

G. Degottex, A. Roebel, and X. Rodet, «Function of phase-distortion for glottal model estimation», in ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings, ISBN 9781457705397, pp.4608-4611, 2011.

D. Devault, R. Artstein, G. Benn, T. Dey, E. Fast et al., «SimSensei Kiosk : A Virtual Human Interviewer for Healthcare Decision Support», 2014 International Conference on Autonomous Agents and Multi-Agent Systems. International Foundation for Autonomous Agents and Multiagent Systems, 2014.

E. Douglas-Cowie, R. Cowie, I. Sneddon, C. Cox, O. Lowry et al., «The HUMAINE Database : Addressing the Collection and Annotation of Naturalistic and Induced Emotional Data», Affective Computing and Intelligent Interaction, pp.488-500, 2007.

T. Drugman and A. Alwan, «Joint robust voicing detection and pitch estimation based on residual harmonics», in Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH, ISSN 19909772, vol.133, pp.1973-1976, 2011.

T. Drugman and T. Dutoit, «Glottal closure and opening instant detection from speech signals», in Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH, vol.49, pp.2891-2894, 2009.

G. Dubuisson-Duplessis, Modèle de comportement communicatif conventionnel pour un agent en interaction avec des humains : Approche par jeux de dialogue, PhD thesis, vol.25, p.26, 2014.

F. Eyben, «openSMILE - The Munich Versatile and Fast Open-Source Audio Feature Extractor», vol.51, pp.1459-1462, 2010.

F. Eyben, K. Scherer, B. Schuller, J. Sundberg, E. Andre et al., «The Geneva Minimalistic Acoustic Parameter Set (GeMAPS) for Voice Research and Affective Computing», vol.51, p.145, 2015.

F. Eyben, F. Weninger, F. Gross, and B. Schuller, «Recent Developments in openSMILE, the Munich Open-Source Multimedia Feature Extractor», in Proceedings of the 21st ACM International Conference on Multimedia (MM 2013), vol.140, p.145, 2013.

F. Eyben, M. Wöllmer, and B. Schuller, «OpenEAR - Introducing the Munich open-source emotion and affect recognition toolkit», in Proceedings - 2009 3rd International Conference on Affective Computing and Intelligent Interaction and Workshops, ACII 2009, ISBN 9781424447992, 2009.

G. Fant, Acoustic Theory of Speech Production : With Calculations Based on X-Ray Studies of Russian Articulations, ISBN 9027916004, 1970.

G. Fant, «The source filter concept in voice production», research report, p.49, 1981.

G. Fant, «The LF-model revisited. Transformations and frequency domain analysis», Speech Trans. Lab. Q. Rep., Royal Inst. of Tech, vol.2, issue.3, p.233, 1995.

G. Fant, J. Liljencrants, and Q. Lin, «A four-parameter model of glottal flow», STL-QPSR, vol.4, p.233, 1985.

C. Fellbaum, WordNet : An Electronic Lexical Database, ISBN 026206197X, 423 p, vol.71, p.124, 1998.

J. L. Fleiss, «Measuring nominal scale agreement among many raters», Psychological Bulletin, 1971.

V. Freeman, J. Chan, G. Levow, and R. Wright, «ATAROS Technical Report 1 : Corpus collection and initial task validation», Depts.Washington, vol.1, p.82, 2014.

M. Freitag, S. Amiriparian, S. Pugachevskiy, N. Cummins, and B. Schuller, «auDeep : Unsupervised Learning of Representations from Audio with Deep Recurrent Neural Networks», vol.46, p.216, 2017.

J. H. Friedman, «Greedy function approximation : A gradient boosting machine», Annals of Statistics, 2001.

A. Garcia, «Annotation guide Movie review corpus», p.78, 2017.

D. Ghosh, A. R. Fabbri, and S. Muresan, «The Role of Conversation Context for Sarcasm Detection in Online Interactions», vol.55, p.174, 2017.

D. Ghosh, A. R. Fabbri, and S. Muresan, «Sarcasm Analysis using Conversation Context», 1990.

S. Ghosh, E. Laksana, L. Morency, and S. Scherer, «Learning Representations of Affect from Speech», pp.1-8, 2016.

S. Ghosh, E. Laksana, L. Morency, and S. Scherer, «Representation Learning for Speech Emotion Recognition», pp.3603-3607, 2016.

C. Gobl and A. Ní Chasaide, «The role of voice quality in communicating emotion, mood and attitude», Speech Communication, vol.40, p.233, 2003.

D. Goleman, «The Socially Intelligent Leader.», Educational Leadership, 2006.

T. Hacki, «Klassifizierung von glottisdysfunktionen mit hilfe der elektroglottographie», Folia Phoniatrica et Logopaedica, vol.136, 1989.

J. Hall and W. H. Watson, «The Effects of a Normative Intervention on Group Decision-Making Performance», 1970.

K. A. Hallgren, «Computing Inter-Rater Reliability for Observational Data : An Overview and Tutorial», Tutorials in Quantitative Methods for Psychology, vol.8, issue.1, pp.23-34, 2012.

Z. S. Harris, «Distributional Structure», Word, vol.10, pp.146-162, 1954.

D. Hazarika, E. Cambria, and R. Zimmermann, «Conversational Memory Network for Emotion Recognition in Dyadic Dialogue Videos», in NAACL, vol.32, p.64, 2018.

D. Hazarika, S. Poria, R. Mihalcea, E. Cambria, and R. Zimmermann, «ICON : Interactive Conversational Memory Network for Multimodal Emotion Detection», in EMNLP, vol.32, pp.2594-2604, 2018.

X. He, Q. H. Tran, W. Havard, L. Besacier, I. Zukerman et al., «Exploring Textual and Speech information in Dialogue Act Classification with Speaker Domain Adaptation», p.54, 2018.

F. Heider, The Psychology of Interpersonal Relations, 1958.

L. Hemamou, G. Felhi, V. Vandenbussche, J. Martin, and C. Clavel, «HireNet : a Hierarchical Attention Model for the Automatic Analysis of Asynchronous Video Job Interviews», 2019.

M. Henderson, «Machine Learning for Dialog State Tracking : A Review», Proceedings of The First International Workshop on Machine Learning in Spoken Language Processing, p.54, 2015.

H. Hermansky, «Perceptual linear predictive (PLP) analysis of speech», The Journal of the Acoustical Society of America, 1990.

G. E. Hinton, J. L. Mcclelland, and D. E. Rumelhart, Parallel Distributed Processing, p.127, 1986.

S. Hochreiter and J. Schmidhuber, «Long Short-Term Memory», Neural Computation, vol.9, issue.8, p.174, 1997.

J. Howard and S. Ruder, «Fine-tuned Language Models for Text Classification», vol.44, p.63, 2018.

M. Hu and B. Liu, «Mining and summarizing customer reviews», Proceedings of the 2004 ACM SIGKDD international conference on Knowledge discovery and data mining KDD 04, vol.04, p.168, 2004.

M. A. Hughes and D. E. Garrett, «Intercoder Reliability Estimation Approaches in Marketing : A Generalizability Theory Framework for Quantitative Data», Journal of Marketing Research, 1990.

C. J. Hutto and E. Gilbert, «VADER : A parsimonious rule-based model for sentiment analysis of social media text», Eighth International AAAI Conference on Weblogs and Social Media, vol.43, p.67, 2014.

O. Irsoy and C. Cardie, «Bidirectional Recursive Neural Networks for Token-Level Labeling with Structure», vol.62, pp.1-9, 2013.

O. Irsoy and C. Cardie, «Opinion mining with deep recurrent neural networks», in Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing, vol.45, p.62, 2014.

R. Johansson and A. Moschitti, «Reranking models in fine-grained opinion analysis», Proceedings of the 23rd International Conference on Computational Linguistics, p.41, 2010.

J. Kane and C. Gobl, «Identifying regions of non-modal phonation using features of the wavelet transform», in Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH, ISSN 19909772, vol.138, pp.177-180, 2011.

J. Kane and C. Gobl, «Evaluation of glottal closure instant detection in a range of voice qualities», Speech Communication, vol.55, issue.2, pp.295-314, 2013.

J. Kane and C. Gobl, «Wavelet maxima dispersion for breathy to tense voice discrimination», IEEE Transactions on Audio, Speech and Language Processing, vol.21, issue.6, pp.1170-1179, 2013.

A. Kennedy and D. Inkpen, «Sentiment classification of movie reviews using contextual valence shifters», in Computational Intelligence, vol.22, p.123, 2006.

J. Devlin, M.-W. Chang, K. Lee, and K. Toutanova, «BERT : Pre-training of Deep Bidirectional Transformers for Language Understanding», p.217, 2018.

S. Khosla, N. Chhaya, and K. Chawla, , vol.2, p.45, 2017.

M. Kim and V. Pavlovic, «Structured output ordinal regression for dynamic facial emotion intensity prediction», in Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), ISBN 364215557X, 2010.

D. Kingma and J. Ba, «Adam : A Method for Stochastic Optimization», International Conference on Learning Representations, vol.164, p.174, 2014.

S. Kiritchenko, X. Zhu, and S. M. Mohammad, «Sentiment analysis of short informal texts», Journal of Artificial Intelligence Research, vol.50, pp.723-762, 2014.

R. Kiros, Y. Zhu, R. Salakhutdinov, R. S. Zemel, A. Torralba et al., «Skip-Thought Vectors», arXiv, vol.45, p.118, 2015.

K. Krippendorff, Content Analysis : An Introduction to Its Methodology, ISBN 0761915451, 2004.

K. Krippendorff, «Computing Krippendorff's Alpha-Reliability», 2011.

K. Krippendorff, «Content Analysis : An Introduction to Its Methodology», in Content Analysis : An Introduction to Its Methodology, ISBN 9781412983150, vol.109, p.111, 2013.

A. Kumar, D. Kawahara, and S. Kurohashi, «Knowledge-enriched Two-layered Attention Network for Sentiment Analysis», p.68, 2018.

C. Lai, J. Carletta, and S. Renals, «Modelling Participant Affect in Meetings with Turn-Taking Features», Proceedings of WASSS 2013, p.54, 2013.

C. Langlet, Analyse des Sentiments dans les Conversations Humain-Agent : Vers un Modèle des Goûts de l'Utilisateur, vol.10, p.94, 2018.

C. Langlet and C. Clavel, «Modelling user's attitudinal reactions to the agent utterances : focus on the verbal content», in 5th International Workshop on Corpora for Research on Emotion, Sentiment & Social Signals, vol.85, p.95, 2014.

C. Langlet and C. Clavel, «Adapting sentiment analysis to face-to-face human-agent interactions : From the detection to the evaluation issues», International Conference on Affective Computing and Intelligent Interaction, ACII 2015, vol.94, pp.14-20, 2015.

C. Langlet and C. Clavel, «Improving social relationships in face-to-face human-agent interactions : when the agent wants to know user's likes and dislikes», vol.43, p.95, 2015.

C. Langlet and C. Clavel, «Grounding the detection of the user's likes and dislikes on the topic structure of human-agent interactions», Knowledge-Based Systems, vol.106, p.104, 2016.

C. Langlet, G. D. Duplessis, and C. Clavel, «A web-based platform for annotating sentiment-related phenomena in human-agent conversations», in Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), ISBN 9783319674001, 2017.

Q. Le and T. Mikolov, «Distributed Representations of Sentences and Documents», International Conference on Machine Learning - ICML 2014, vol.32, pp.1188-1196, 2014.

M. Lewis, D. Yarats, Y. N. Dauphin, D. Parikh, and D. Batra, «Deal or No Deal ? End-to-End Learning for Negotiation Dialogues», in EMNLP, 2017.

P. P. Liang, Z. Liu, A. Zadeh, and L. Morency, «Multimodal Language Analysis with Recurrent Multistage Fusion», 2018.

R. Likert, «A technique for the measurement of attitudes», 1932.

B. Liu, «Sentiment Analysis and Opinion Mining», Synthesis Lectures on Human Language Technologies, vol.5, pp.1-167, 2012.

M. Lombard, J. Snyder-duch, and C. C. Bracken, «Content Analysis in Mass Communication : Assessment and Reporting of Intercoder Reliability», vol.108, p.109, 2002.

R. Lotfian and C. Busso, «Building Naturalistic Emotionally Balanced Speech Corpus by Retrieving Emotional Speech From Existing Podcast Recordings», IEEE Transactions on Affective Computing, vol.80, p.82, 2017.

Y. Ma, H. Peng, and E. Cambria, «Targeted Aspect-Based Sentiment Analysis via Embedding Commonsense Knowledge into an Attentive LSTM», in AAAI Conference on Artificial Intelligence (AAAI-18), p.68, 2018.

A. L. Maas, R. E. Daly, P. T. Pham, D. Huang, A. Y. Ng et al., «Learning Word Vectors for Sentiment Analysis», Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics : Human Language Technologies, vol.71, pp.142-150, 2011.

F. Mairesse, J. Polifroni, and G. D. Fabbrizio, «Can prosody inform sentiment analysis ? Experiments on short spoken reviews», IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings, pp.5093-5096, 2012.

N. Majumder, S. Poria, D. Hazarika, R. Mihalcea, A. Gelbukh et al., «DialogueRNN : An Attentive RNN for Emotion Detection in Conversations», p.32, 2018.

B. Mandelbrot, «[…] fréquences des mots dans le discours», 1957.

E. Marchi, A. Batliner, B. Schuller, S. Fridenzon, S. Tal et al., «Speech, emotion, age, language, task, and typicality : Trying to disentangle performance and feature relevance», in Proceedings - 2012 ASE/IEEE International Conference on Privacy, Security, Risk and Trust and 2012 ASE/IEEE International Conference on Social Computing, SocialCom/PASSAT 2012, ISBN 9780769548487, pp.961-968, 2012.

J. R. Martin and P. R. White, «The Language of Evaluation : The Appraisal Framework», Lecture Notes in Computer Science, vol.93, p.94, 2003.

O. Martin, I. Kotsia, B. Macq, and I. Pitas, «The eNTERFACE'05 audio-visual emotion database», International Conference on Data Engineering Workshops, vol.94, pp.8-15, 2006.

J. D. Mayer, P. Salovey, and D. R. Caruso, «Emotional Intelligence : New Ability or Eclectic Traits ?», American Psychologist, vol.63, pp.503-517, 2008.

A. McCallum, D. Freitag, and F. Pereira, «Maximum Entropy Markov Models for Information Extraction and Segmentation», ICML, vol.59, pp.591-598, 2000.

B. McCann, N. S. Keskar, C. Xiong, and R. Socher, «The Natural Language Decathlon : Multitask Learning as Question Answering», NIPS, vol.62, p.63, 2018.

I. McCowan, J. Carletta, and W. Kraaij, «The AMI meeting corpus», Proceedings Methods and Techniques in Behavioral Research, vol.54, p.79, 2005.

G. McKeown, M. Valstar, R. Cowie, M. Pantic, and M. Schröder, «The SEMAINE database : Annotated multimodal records of emotionally colored conversations between a person and a limited agent», IEEE Transactions on Affective Computing, vol.3, issue.1, p.98, 2012.

G. McKeown, M. F. Valstar, R. Cowie, and M. Pantic, «The SEMAINE corpus of emotionally coloured character interactions», IEEE International Conference on Multimedia and Expo, ICME 2010, vol.82, p.83, 2010.

A. Mehrabian, Silent messages, ISBN 0534000592, viii, pp.152-182, 1971.

T. Mikolov, G. Corrado, K. Chen, and J. Dean, «Efficient Estimation of Word Representations in Vector Space», Proceedings of the International Conference on Learning Representations, vol.119, p.216, 2013.

T. Mikolov, E. Grave, P. Bojanowski, C. Puhrsch, and A. Joulin, «Advances in Pre-Training Distributed Word Representations», vol.1, p.44, 2017.

T. Mikolov, I. Sutskever, K. Chen, G. Corrado, and J. Dean, «Distributed Representations of Words and Phrases and their Compositionality», in Proc. NIPS, pp.1-9, 2013.

T. Mikolov, W.-T. Yih, and G. Zweig, «Linguistic regularities in continuous space word representations», Proceedings of NAACL-HLT, vol.45, pp.746-751, 2013.

G. A. Miller, R. Beckwith, C. Fellbaum, D. Gross, and K. J. Miller, «Introduction to WordNet : An on-line lexical database», International Journal of Lexicography, vol.3, pp.235-244, 1990.

M. Mitchell, J. Aguilar, T. Wilson, and B. V. Durme, «Open Domain Targeted Sentiment», vol.61, p.67, 2013.

S. M. Mohammad, «Obtaining Reliable Human Ratings of Valence, Arousal, and Dominance for 20,000 English Words», Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics, vol.42, pp.1-11, 2018.

I. Mordatch and P. Abbeel, «Emergence of Grounded Compositional Language in Multi-Agent Populations», p.21, 2017.

L. Morency, «Hidden-state Conditional Random Field (HCRF) Library», vol.164, p.175, 2007.

L. Morency, R. Mihalcea, and P. Doshi, «Towards Multimodal Sentiment Analysis : Harvesting Opinions from the Web», Proceedings of the 13th International Conference on Multimodal Interfaces, vol.9, p.146, 2011.

L. Morency, A. Quattoni, and T. Darrell, «Latent-Dynamic Discriminative Models for Continuous Gesture Recognition», IEEE Computer Society Conference on Computer Vision and Pattern Recognition, p.58, 2007.

M. Munezero, C. S. Montero, E. Sutinen, and J. Pajunen, «Are they different ? affect, feeling, emotion, sentiment, and opinion detection in text», IEEE Transactions on Affective Computing, vol.5, issue.2, p.30, 2014.

C. Musto, G. Semeraro, and M. Polignano, «A comparison of Lexicon-based approaches for Sentiment Analysis of microblog posts», in DART 2014 8th International Workshop on Information Filtering and Retrieval, 2014.

S. Narayanan and A. Potamianos, «Creating conversational interfaces for children», IEEE Transactions on Speech and Audio Processing, vol.10, issue.2, pp.65-78, 2002.

F. Å. Nielsen, «A new ANEW : Evaluation of a word list for sentiment analysis in microblogs», in CEUR Workshop Proceedings, vol.718, pp.93-98, 2011.

B. Nojavanasghari, D. Gopinath, J. Koushik, T. Baltrušaitis, and L. Morency, «Deep Multimodal Fusion for Persuasiveness Prediction», in ICMI 2016 - Proceedings of the 2016 ACM International Conference on Multimodal Interaction, ISBN 9781450345569, p.71, 2016.

R. M. Ochshorn and M. Hawkins, p.101, 2017.

R. M. Palau and M.-F. Moens, «Argumentation Mining : The Detection, Classification and Structure of Arguments in Text», in ICAIL, vol.25, pp.98-107, 2009.

M. Paleari and B. Huet, «Toward emotion indexing of multimedia excerpts», in 2008 International Workshop on Content-Based Multimedia Indexing, pp.425-432, 2008.

B. Pang and L. Lee, «A sentimental education : Sentiment analysis using subjectivity summarization based on minimum cuts», Proceedings of the 42nd Annual Meeting on Association for Computational Linguistics, vol.40, p.77, 2004.

S. Park, J. Gratch, and L. Morency, «I already know your answer : Using nonverbal behaviors to predict immediate outcomes in a dyadic negotiation», Proceedings of the 14th ACM International Conference on Multimodal Interaction, vol.82, pp.19-22, 2012.

S. Park, S. Scherer, J. Gratch, P. J. Carnevale, and L. Morency, «I can already guess your answer : Predicting respondent reactions during dyadic negotiation», IEEE Transactions on Affective Computing, vol.6, issue.2, pp.86-96, 2015.

S. Park, H. S. Shim, M. Chatterjee, K. Sagae, and L. Morency, «Computational Analysis of Persuasiveness in Social Multimedia : A Novel Dataset and Multimodal Prediction Approach», Proceedings of the 16th International Conference on Multimodal Interaction - ICMI '14, vol.52, p.78, 2014.

R. Paul, A. Augustyn, A. Klin, and F. R. Volkmar, «Perception and production of prosody by speakers with autism spectrum disorders», Journal of Autism and Developmental Disorders, vol.26, p.48, 2005.

F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion et al., «Scikit-learn : Machine Learning in Python», Journal of Machine Learning Research, vol.12, p.174, 2012.

T. Pellegrini and V. Barriere, «Time-continuous estimation of emotion in music with recurrent neural networks», in CEUR Workshop Proceedings, vol.1436, 2015.

J. Pennington, R. Socher, and C. D. Manning, «GloVe : Global Vectors for Word Representation», Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing, p.127, 2014.

A. Pentland, «Social Signal Processing», 2007.

V. Perez-Rosas, R. Mihalcea, and L. Morency, «Multimodal sentiment analysis of Spanish online videos», IEEE Intelligent Systems, vol.28, pp.38-45, 2013.

V. Perez-Rosas, R. Mihalcea, and L. Morency, «Utterance-Level Multimodal Sentiment Analysis», Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics, vol.1, p.164, 2013.

M. E. Peters, M. Neumann, M. Iyyer, M. Gardner, C. Clark et al., «Deep contextualized word representations», in NAACL, vol.44, p.45, 2018.

M. E. Peters, M. Neumann, L. Zettlemoyer, and W. Yih, «Dissecting Contextual Word Embeddings : Architecture and Representation», vol.217, pp.1499-1509, 2018.

L. Polanyi and A. Zaenen, «Contextual valence shifters», Computing attitude and affect in text : Theory and Applications, vol.20, p.154, 2006.

S. Poria, E. Cambria, R. Bajpai, and A. Hussain, «A Review of Affective Computing : From Unimodal Analysis to Multimodal Fusion», vol.9, p.88, 2017.

S. Poria, E. Cambria, and A. Gelbukh, «Deep Convolutional Neural Network Textual Features and Multiple Kernel Learning for Utterance-level Multimodal Sentiment Analysis», in Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, vol.45, p.127, 2015.

S. Poria, E. Cambria, and A. Gelbukh, «Aspect Extraction for Opinion Mining with a Deep Convolutional Neural Network», Knowledge-Based Systems, vol.108, p.68, 2016.

S. Poria, E. Cambria, D. Hazarika, N. Mazumder et al., «Context-Dependent Sentiment Analysis in User-Generated Videos», in ACL 2017, vol.72, p.183, 2017.

M. Porter, «An algorithm for suffix stripping», Program : electronic library and information systems, vol.14, pp.130-137, 1980.

A. Quattoni, S. Wang, L. Morency, M. Collins, and T. Darrell, «Hidden-state conditional random fields.», IEEE transactions on pattern analysis and machine intelligence, vol.29, p.149, 2007.

L. Rabiner and B. Juang, «An Introduction to Hidden Markov Models», IEEE ASSP Magazine, 1986.

A. Radford and T. Salimans, «Improving Language Understanding by Generative Pre-Training», vol.63, pp.1-12, 2018.

S. S. Rajagopalan, L. Morency, T. Baltrušaitis, and R. Goecke, «Extending long short-term memory for multi-view structured learning», in Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), vol. 9911 LNCS, ISBN 9783319464770, pp.338-353, 2016.

N. Rakicevic, O. Rudovic, S. Petridis, and M. Pantic, «Multi-modal Neural Conditional Ordinal Random Fields for agreement level estimation», Proceedings - International Conference on Pattern Recognition, pp.2228-2233, 2017.

G. A. Ramirez, T. Baltrušaitis, and L. Morency, «Modeling latent discriminative dynamic of multi-dimensional affective signals», Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), vol.6975, pp.396-406, 2011.

F. Ringeval, Ancrages et modèles dynamiques de la prosodie : application à la reconnaissance des émotions actées et spontanées, PhD thesis, vol.22, p.25, 2011.

F. Ringeval, A. Sonderegger, J. Sauer, and D. Lalanne, «Introducing the RECOLA multimodal corpus of remote collaborative and affective interactions», in 10th IEEE International Conference and Workshops on Automatic Face and Gesture Recognition, FG 2013, ISBN 9781467355452, 2013.

A. A. Rizzo, S. Scherer, D. Devault, J. Gratch, R. Artstein et al., «Detection and computational analysis of psychological signals using a virtual human interviewing agent», Proceedings of the 10th Intl Conf. Disability, Virtual Reality & Associated Technologies, pp.2-4, 2014.

A. Röbel and X. Rodet, «Efficient spectral envelope estimation and its application to pitch shifting and envelope preservation», DAFx-05, vol.132, pp.1-6, 2005.

X. Rong, «word2vec Parameter Learning Explained», vol.11, p.236, 2014.

S. Sahay, S. H. Kumar, R. Xia, J. Huang, and L. Nachman, «Multimodal Relational Tensor Network for Sentiment and Emotion Classification», vol.24, p.145, 2018.

V. Sanh, T. Wolf, and S. Ruder, «A Hierarchical Multi-task Approach for Learning Embeddings from Semantic Tasks», p.217, 2018.

K. R. Scherer, «Vocal communication of emotion : A review of research paradigms», Speech Communication, vol.40, pp.227-256, 2003.

K. R. Scherer, «What are emotions ? And how can they be measured ?», Social Science Information, vol.44, p.130, 2005.

S. Scherer, Z. Hammal, Y. Yang, L. Morency, and J. F. Cohn, «Dyadic Behavior Analysis in Depression Severity Assessment Interviews», in Proceedings of the 16th International Conference on Multimodal Interaction - ICMI '14, ISBN 9781450328852, vol.52, p.136, 2014.

S. Scherer, J. Kane, C. Gobl, and F. Schwenker, «Investigating fuzzy-input fuzzy-output support vector machines for robust voice quality classification», Computer Speech and Language, vol.27, issue.1, p.234, 2013.

B. Schuller, «Intelligent Audio Analysis», Signals and Communication Technology, p.135, 2013.

B. Schuller, A. Batliner, D. Seppi, S. Steidl, T. Vogt et al., «The relevance of feature type for the automatic classification of emotional user states : Low level descriptors and functionals», in Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH, vol.2, pp.881-884, 2007.

B. Schuller and G. Rigoll, «Recognising interest in conversational speech - Comparing bag of frames and supra-segmental features», in Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH, vol.49, pp.1999-2002, 2009.

B. Schuller, J. Schenk, G. Rigoll, and T. Knaup, «"The Godfather" vs. "Chaos" : Comparing linguistic analysis based on on-line knowledge sources and bags-of-N-grams for movie review valence estimation», Proceedings of the International Conference on Document Analysis and Recognition, ICDAR, vol.119, p.164, 2009.

B. Schuller, S. Steidl, and A. Batliner, «The INTERSPEECH 2009 emotion challenge», in Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH, vol.13, p.118, 2009.

B. Schuller, S. Steidl, A. Batliner, J. Hirschberg, J. K. Burgoon et al., «The INTERSPEECH 2016 Computational Paralinguistics Challenge : Deception, Sincerity & Native Language», in Proceedings of the Annual Conference of the International Speech Communication Association, 2016.

B. Schuller, S. Steidl, A. Batliner, F. Schiel, and J. Krajewski, «The INTERSPEECH 2011 speaker state challenge», in Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH, ISSN 1990-9772, vol.51, pp.3201-3204, 2011.

B. Schuller, S. Steidl, A. Batliner, A. Vinciarelli, K. Scherer et al., «The INTERSPEECH 2013 computational paralinguistics challenge : Social signals, conflict, emotion, autism», in Proceedings of the Annual Conference of the International Speech Communication Association, vol.51, pp.148-152, 2013.

B. Schuller, M. Valstar, F. Eyben, G. McKeown, R. Cowie et al., «AVEC 2011 - The first international audio/visual emotion challenge», in Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), ISBN 9783642245701, vol.115, p.217, 2011.

S. Schuster and C. D. Manning, «Enhanced English Universal Dependencies : An Improved Representation for Natural Language Understanding Tasks», Proceedings of LREC 2016, vol.99, p.173, 2016.

W. A. Scott, «Reliability of Content Analysis : The Case of Nominal Scale Coding», Public Opinion Quarterly, 1955.

C. E. Shannon, «A Mathematical Theory of Communication», Bell System Technical Journal, vol.27, p.22, 1948.

B. Shin, T. Lee, and J. D. Choi, «Lexicon Integrated CNN Models with Attention for Sentiment Analysis», vol.67, pp.149-158, 2016.

E. Shouse, «Feeling, emotion, affect», 2005.

E. Shriberg, A. Stolcke, D. Hakkani-Tür, and G. Tür, «Prosody-based automatic segmentation of speech into sentences and topics», Speech Communication, vol.25, p.48, 2000.

M. Sigmund, «Statistical analysis of fundamental frequency based features in speech under stress», Information Technology and Control, vol.42, pp.286-291, 2013.

R. Socher, A. Perelygin, and J. Wu, «Recursive deep models for semantic compositionality over a sentiment treebank», EMNLP-2013 : Conference on Empirical Methods in Natural Language Processing, vol.62, p.63, 2013.

M. Soleymani, D. Garcia, B. Jou, B. Schuller, S. F. Chang et al., «A survey of multimodal sentiment analysis», Image and Vision Computing, vol.65, pp.3-14, 2017.

S. Somasundaran, J. Wiebe, and J. Ruppenhofer, «Discourse Level Opinion Interpretation», Proceedings of the 22nd International Conference on Computational Linguistics, vol.54, pp.801-808, 2008.

Y. Song, L. Morency, and R. Davis, «Multi-view latent variable discriminative models for action recognition», Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, vol.60, p.211, 2012.

Y. Song, L. Morency, and R. Davis, «Multimodal Human Behavior Analysis : Learning Correlation and Interaction Across Modalities», Proceedings of the 14th ACM International Conference on Multimodal Interaction - ICMI '12, vol.152, p.211, 2012.

Y. Song, L.-P. Morency, and R. Davis, «Action Recognition by Hierarchical Sequence Summarization», in IEEE Conference on Computer Vision and Pattern Recognition, 2013.

N. Srivastava, G. Hinton, A. Krizhevsky, I. Sutskever, and R. Salakhutdinov, «Dropout : A Simple Way to Prevent Neural Networks from Overfitting», Journal of Machine Learning Research, vol.15, p.174, 2014.

C. Strapparava and A. Valitutti, «WordNet-Affect : an affective extension of WordNet», Proceedings of the 4th International Conference on Language Resources and Evaluation, vol.42, pp.1083-1086, 2004.

I. Sutskever, O. Vinyals, and Q. V. Le, «Sequence to Sequence Learning with Neural Networks», NIPS, p.9, 2014.

M. Taboada, J. Brooke, M. Tofiloski, K. Voll, and M. Stede, «Lexicon-Based Methods for Sentiment Analysis», Computational Linguistics, vol.37, issue.2, p.125, 2011.

O. Täckström and R. McDonald, «Discovering Fine-Grained Sentiment with Latent Variable Structured Prediction Models», in European Conference on Information Retrieval, p.153, 2011.

M. Tahon, G. Degottex, and L. Devillers, «Usual voice quality features and glottal features for emotional valence detection», Proceedings of Speech, pp.2-5, 2012.

M. Tahon and L. Devillers, «Towards a Small Set of Robust Acoustic Features for Emotion Recognition : Challenges», IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol.24, issue.1, pp.16-28, 2016.

D. Tang, F. Wei, N. Yang, M. Zhou, T. Liu et al., «Learning Sentiment-Specific Word Embedding for Twitter Sentiment Classification», ACL, pp.1555-1565, 2014.

L. ten Bosch, N. Oostdijk, and L. Boves, «On temporal aspects of turn taking in conversational dialogues», 2005.

I. R. Titze and J. Sundberg, «Vocal intensity in speakers and singers», Journal of the Acoustical Society of America, vol.91, issue.5, pp.2936-2946, 1992.

K. Tokuda, T. Kobayashi, T. Masuko, and S. Imai, «Mel-generalized cepstral analysis - A unified approach to speech spectral estimation», in Proc. of ICSLP, vol.133, pp.1043-1046, 1994.

G. Trigeorgis, F. Ringeval, R. Brueckner, E. Marchi, M. A. Nicolaou et al., «Adieu features ? End-to-end speech emotion recognition using a deep convolutional recurrent network», in 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), vol.47, p.140, 2016.

P. D. Turney, «Thumbs up or thumbs down ? Semantic Orientation applied to Unsupervised Classification of Reviews», Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics (ACL), pp.417-424, 2002.

P. Tzirakis, G. Trigeorgis, M. A. Nicolaou, B. Schuller, and S. Zafeiriou, «End-to-End Multimodal Emotion Recognition using Deep Neural Networks», vol.14, p.47, 2017.

G. Varni, I. Hupont, C. Clavel, and M. Chetouani, «Computational Study of Primitive Emotional Contagion in Dyadic Interactions», IEEE Transactions on Affective Computing, vol.14, issue.8, p.112, 2017.

A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones et al., «Attention Is All You Need», 2017.

V. Venek, S. Scherer, L. Morency, A. S. Rizzo, and J. Pestian, «Adolescent suicidal risk assessment in clinician-patient interaction : A study of verbal and acoustic behaviors», in SLT 2014 - Proceedings, ISBN 9781479971299, pp.277-282, 2014.

A. Vinciarelli, M. Pantic, and H. Bourlard, «Social signal processing : Survey of an emerging domain», Image and Vision Computing, vol.27, pp.1743-1759, 2009.

A. Vinciarelli, M. Pantic, H. Bourlard, and A. Pentland, «Social Signal Processing : State-of-the-Art and Future Perspectives of an Emerging Domain», Proceedings of the 16th ACM international conference on Multimedia, pp.1061-1070, 2008.

A. Vinciarelli, M. Pantic, D. Heylen, C. Pelachaud, I. Poggi et al., «Bridging the Gap Between Social Animal and Unsocial Machine : A Survey of Social Signal Processing», IEEE Transactions on Affective Computing, vol.3, pp.1-20, 2011.

W. Wang, S. J. Pan, D. Dahlmeier, and X. Xiao, «Recursive Neural Conditional Random Fields for Aspect-based Sentiment Analysis», Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, p.67, 2016.

P. Warren, «Prosody and Parsing : An Introduction», Language and Cognitive Processes, vol.25, p.48, 1996.

A. B. Warriner, V. Kuperman, and M. Brysbaert, «Norms of valence, arousal, and dominance for 13,915 English lemmas», Behavior Research Methods, vol.45, p.124, 2013.

F. Weninger, F. Eyben, B. Schuller, M. Mortillaro, and K. R. Scherer, «On the acoustics of emotion in audio : What speech, music, and sound have in common», Frontiers in Psychology, vol.4, pp.1-12, 2013.

J. Wiebe, T. Wilson, and C. Cardie, «Annotating expressions of opinions and emotions in language», Language Resources and Evaluation, vol.39, issue.2-3, p.65, 2005.

A. Wierzbicka, Emotions Across Languages and Cultures : Diversity and Universals, vol.29, 1999.

T. Wilson, J. Wiebe, and P. Hoffmann, «Recognizing contextual polarity in phrase-level sentiment analysis», ACL, vol.7, p.67, 2005.

M. Wöllmer, F. Eyben, S. Reiter, B. Schuller, C. Cox et al., «Abandoning emotion classes - Towards continuous emotion recognition with modelling of long-range dependencies», in Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH, vol.58, pp.597-600, 2008.

M. Wöllmer, M. Kaiser, F. Eyben, B. Schuller, and G. Rigoll, «LSTM-modeling of continuous emotions in an audiovisual affect recognition framework», Image and Vision Computing, vol.31, issue.2, p.162, 2013.

M. Wöllmer, F. Weninger, T. Knaup, B. Schuller, C. Sun et al., «YouTube Movie Reviews : Sentiment Analysis in an Audio-Visual Context», IEEE Intelligent Systems, vol.28, p.189, 2013.

C. Xu, D. Tao, and C. Xu, «A Survey on Multi-view Learning», pp.1-59, 2013.

B. Yang and C. Cardie, «Extracting opinion expressions with semi-Markov conditional random fields», in Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning, vol.66, pp.1335-1345, 2012.

B. Yang and C. Cardie, «Joint Inference for Fine-grained Opinion Extraction», Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics, vol.41, p.67, 2013.

Z. Yang, D. Yang, C. Dyer, X. He, A. Smola et al., «Hierarchical Attention Networks for Document Classification», Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics : Human Language Technologies, vol.62, p.173, 2016.

S. Yildirim, S. Narayanan, and A. Potamianos, «Detecting emotional state of a child in a conversational computer game», Computer Speech and Language, vol.25, issue.1, pp.29-44, 2011.

A. Zadeh, M. Chen, S. Poria, E. Cambria, and L. Morency, «Tensor Fusion Network for Multimodal Sentiment Analysis», in EMNLP. 45, vol.52, 2017.

A. Zadeh, P. P. Liang, N. Mazumder, S. Poria, E. Cambria et al., «Memory Fusion Network for Multi-view Sequential Learning», in AAAI, vol.52, p.73, 2018.

A. Zadeh, P. P. Liang, S. Poria, E. Cambria, and L. Morency, «Multimodal Language Analysis in the Wild : CMU-MOSEI Dataset and Interpretable Dynamic Fusion Graph», Proceedings of ACL, vol.52, p.154, 2018.

A. Zadeh, P. P. Liang, S. Poria, P. Vij, E. Cambria, and L.-P. Morency, «Multi-attention Recurrent Network for Human Communication Comprehension», in AAAI. 45, vol.52, p.193, 2018.

A. Zadeh, L.-P. Morency, P. Liang, S. Poria, E. Cambria et al., «First Grand Challenge and Workshop on Human Multimodal Language (Challenge-HML)», in Workshop on Human Multimodal Language (Challenge-HML) - ACL, ISBN 9781948087469, vol.32, p.78, 2018.

A. Zadeh, R. Zellers, E. Pincus, and L.-P. Morency, «MOSI : Multimodal Corpus of Sentiment Intensity and Subjectivity Analysis in Online Opinion Videos», vol.78, p.108, 2016.

M. Zhang, Y. Zhang, and D. T. Vo, «Neural Networks for Open Domain Targeted Sentiment», Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, p.61, 2015.

Y. Zhou, S. Scherer, D. Devault, J. Gratch, G. Stratou et al., «Multimodal Prediction of Psychological Disorders : Learning Verbal and Nonverbal Commonalities in Adjacency Pairs», Proceedings of the 17th Workshop on the Semantics and Pragmatics of Dialogue, vol.59, p.60, 2013.