However, current approaches rely on predicting simplified representations of affective expressions. For example, one can restrict the task to the recognition of