Interactive learning of words and objects for a humanoid robot

Abstract : Future applications of robotics, especially personal service robots, will require continuous adaptability to the environment, and particularly the ability to recognize new objects and learn new words through interaction with humans. Although machine learning has brought tremendous progress, current computational models for object detection and representation still rely heavily on good training data and ideal learning supervision. In contrast, two-year-old children have an impressive ability to learn to recognize new objects and, at the same time, to learn the object names during interaction with adults and without precise supervision. Therefore, following the developmental robotics approach, we develop in this thesis learning approaches that associate object names with their corresponding features. These approaches are inspired by infants' capabilities, and in particular by the ambiguous interaction that occurs between children and parents.
The general idea is to use cross-situational learning (finding the common points between different presentations of an object or a feature) and to implement multi-modal concept discovery based on two latent topic discovery approaches: Non-negative Matrix Factorization (NMF) and Latent Dirichlet Allocation (LDA). Based on vision descriptors and sound/voice inputs, the proposed approaches find the underlying regularities in the raw data flow to produce sets of words and their associated visual meanings (e.g. the name of an object and its shape, or a color adjective and its correspondence in images). We developed a complete approach based on these algorithms and compared their behavior when faced with two sources of uncertainty: referential ambiguities, in situations where multiple words are given that describe multiple object features; and linguistic ambiguities, in situations where the keywords we intend to learn are embedded in complete sentences.
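The cross-situational idea described above can be illustrated with a minimal sketch. It is not the implementation from the thesis: the toy vocabulary, the one-hot "visual" features, the scene generator and the helper names (`nmf`, `learned_meaning`) are all assumptions made for illustration, and the factorization uses the standard Lee-Seung multiplicative updates rather than whatever NMF variant the thesis employs. Each scene stacks a word histogram on top of a visual histogram, and one or two concepts are present per scene, so no single scene says which word goes with which feature.

```python
import numpy as np

def nmf(V, k, n_iter=500, seed=0, eps=1e-9):
    """Approximate V (features x samples) as W @ H with non-negative factors,
    using Lee-Seung multiplicative updates for the Frobenius objective."""
    rng = np.random.default_rng(seed)
    W = rng.random((V.shape[0], k)) + eps
    H = rng.random((k, V.shape[1])) + eps
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

# Hypothetical toy vocabulary and visual features (assumptions, not thesis data).
words = ["red", "blue", "ball"]
visuals = ["hue_red", "hue_blue", "round_shape"]
n_words = len(words)

# Ground truth used only to generate data: concept i pairs word i with visual i.
concepts = np.vstack([np.eye(3), np.eye(3)])  # shape (6, 3)

# Ambiguous scenes: one or two concepts are present at once, and the learner
# only sees the summed word and visual histograms.
rng = np.random.default_rng(1)
scenes = []
for _ in range(60):
    n_present = int(rng.integers(1, 3))  # 1 or 2 concepts
    present = rng.choice(3, size=n_present, replace=False)
    scenes.append(concepts[:, present].sum(axis=1))
V = np.array(scenes).T  # shape (6, 60): rows 0-2 are words, rows 3-5 are visuals

W, H = nmf(V, k=3)
rel_error = np.linalg.norm(V - W @ H) / np.linalg.norm(V)

def learned_meaning(word_index):
    """Visual feature with the largest weight in the topic most tied to the word."""
    topic = int(np.argmax(W[word_index, :]))
    return visuals[int(np.argmax(W[n_words:, topic]))]

for i, word in enumerate(words):
    print(word, "->", learned_meaning(i))
print("relative reconstruction error:", round(float(rel_error), 4))
```

Each column of `W` plays the role of a multi-modal topic: its upper rows weight words and its lower rows weight visual features, so reading a column top to bottom gives a candidate word-meaning pair. Multiplicative updates only reach a local optimum, so the recovered pairs can depend on the initialization.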
This thesis highlights the algorithmic solutions required to learn these word-referent associations efficiently from data acquired in a simplified but realistic acquisition setup, which made it possible to perform extensive simulations and preliminary experiments in real human-robot interactions. We also gave solutions for the automatic estimation of the number of topics for both NMF and LDA.
We finally proposed two active learning strategies, Maximum Reconstruction Error Based Selection (MRES) and Confidence Based Exploration (CBE), to improve the quality and speed of incremental learning by letting the algorithms choose the next learning samples. We compared the behaviors produced by these algorithms and showed their common points and differences with those of humans in similar learning situations.
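As a rough illustration of what a selection rule based on maximum reconstruction error could look like, the sketch below scores each candidate sample by how poorly the current NMF dictionary reconstructs it and picks the worst-explained one. This is an assumption drawn from the name of the strategy alone; the abstract does not specify the actual MRES criterion, and the dictionary `W`, the candidates and the helper names (`encode`, `select_next_sample`) are hypothetical.

```python
import numpy as np

def encode(W, x, n_iter=50, eps=1e-9):
    """Non-negative activations h for sample x with the dictionary W held fixed,
    obtained with multiplicative updates on ||x - W h||."""
    h = np.ones(W.shape[1])
    for _ in range(n_iter):
        h *= (W.T @ x) / (W.T @ W @ h + eps)
    return h

def reconstruction_error(W, x):
    """Euclidean distance between x and its reconstruction W @ h."""
    return float(np.linalg.norm(x - W @ encode(W, x)))

def select_next_sample(W, candidates):
    """Return the index of the candidate the dictionary explains worst,
    together with the error of every candidate."""
    errors = [reconstruction_error(W, x) for x in candidates]
    return int(np.argmax(errors)), errors

# Hypothetical dictionary with two learned concepts over 3 word dimensions
# (rows 0-2) and 3 visual dimensions (rows 3-5).
W = np.array([[1, 0],
              [0, 1],
              [0, 0],
              [1, 0],
              [0, 1],
              [0, 0]], dtype=float)

candidates = [
    np.array([1, 0, 0, 1, 0, 0], dtype=float),  # known concept 0
    np.array([1, 1, 0, 1, 1, 0], dtype=float),  # both known concepts together
    np.array([0, 0, 1, 0, 0, 1], dtype=float),  # concept not yet in the dictionary
]

chosen, errors = select_next_sample(W, candidates)
print("selected candidate:", chosen)
```

In this toy setup the third candidate uses a word and a visual feature that the dictionary has never encoded, so it has the largest error and is the one selected for the next learning step.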
Contributor : ABES STAR
Submitted on : Thursday, August 10, 2017 - 4:05:06 PM
Last modification on : Wednesday, May 11, 2022 - 3:20:03 PM


Version validated by the jury (STAR)


  • HAL Id : tel-01573823, version 1


Yuxin Chen. Interactive learning of words and objects for a humanoid robot. Machine Learning [cs.LG]. Université Paris Saclay (COmUE), 2017. English. ⟨NNT : 2017SACLY003⟩. ⟨tel-01573823⟩


