Emotion identification systems based on facial recognition now offer innovations that can improve our daily lives.
For example, they can make our dialogues with voice assistants more natural. In this paper we will focus on the recognition of emotions expressed through non-verbal language, based on image processing techniques.
Nowadays, the recognition of human emotions by machine learning techniques (e.g. neural networks) is a rapidly growing field of research that contributes to innovations in many industries.
For example, in the automotive industry, these AI systems help improve safety on board a vehicle by detecting and identifying the driver's behavior (signs of fatigue, stress, unusual behaviors) and then alerting the driver to a danger (2). Emotion recognition technologies are also a tool to help children with autism. Indeed, with Google Glass (among others), autistic children have been able to improve their social skills by using an application that helps them identify the emotions conveyed by the facial expressions of their interlocutors (3).
Facial emotion recognition is performed in two steps. First, a facial recognition algorithm detects the face of the targeted individual using a camera. Then a second AI system identifies the key points of the face, such as the mouth, eyes and nose, and uses this information to recognize emotions and micro-expressions. Training this type of neural network requires a large dataset of images of human faces expressing various emotions.
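To make this two-step pipeline more concrete, here is a minimal Python sketch. Step one uses a stock OpenCV Haar cascade for face detection; step two is deliberately left as a hypothetical classify_emotion function standing in for a trained neural network, and portrait.jpg is a placeholder filename.

```python
import cv2

# The six emotions Ekman considered common to all humans; a typical
# label set for emotion classifiers (see below).
EMOTIONS = ["happiness", "sadness", "disgust", "surprise", "anger", "fear"]

# Step 1: face detection, here with a stock OpenCV Haar cascade.
face_detector = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml"
)

def detect_faces(image):
    """Return (x, y, w, h) bounding boxes for the faces found in the image."""
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    return face_detector.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)

def classify_emotion(face_crop):
    """Step 2 (hypothetical): map a face crop to one of the EMOTIONS labels.

    In a real system this would be a neural network trained on a large
    dataset of labeled facial expressions, typically working from facial
    key points (mouth, eyes, nose) rather than raw pixels.
    """
    raise NotImplementedError("plug a trained emotion classifier in here")

image = cv2.imread("portrait.jpg")  # placeholder filename
if image is not None:
    for (x, y, w, h) in detect_faces(image):
        face_crop = image[y:y + h, x:x + w]
        # emotion = classify_emotion(face_crop)  # step 2 would run here
```

In a production system the Haar cascade would typically be replaced by a deep face detector, but the two-step structure, detect the face first, then classify its expression, remains the same.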
However, emotion detection through facial recognition technology has its limits. Misinterpretation is one of the major risks for these emotion detection algorithms, given the highly subjective nature of emotions.
This concern is shared by many specialists, including Kate Crawford (researcher at Microsoft):
"Alongside the deployment of these technologies, a large number of studies show that there is ... no substantial evidence that there is a perfect match between the emotion you feel and the expression on your face." (4).
Among these limitations, one was detailed by the researcher Paul Ekman during a cross-cultural experiment conducted in Papua New Guinea with a tribe called the Fore. Ekman sought to test the controversial hypothesis that all humans exhibit a small number of universal emotions, or affects, that are innate and identical throughout the world.
Largely isolated from Western culture, the Fore were an ideal population for this experiment. The results led him to deduce that facial expressions are context-dependent and diverge according to the culture studied (1). Thus, the performance of an AI system that analyzes human emotions may differ depending on the socio-cultural context in which it is deployed compared to the one for which it was designed. Ekman's work nevertheless noted that, among a large panel of emotions, six would be common to all humans: happiness, sadness, disgust, surprise, anger and fear.
AI systems and their algorithms are configured to deliver satisfactory performance within a predetermined domain of use. This performance may deteriorate, however, when the AI system is exposed to situations too far removed from what it knows. Thus, for an AI system that aims to identify emotions through facial recognition, every cultural difference or particularity must be taken into account.
For example, consider the head wobble characteristic of Indian culture. A system developed in the West might not interpret this behavior correctly, since it is unfamiliar to a Westerner. A multicultural design team can limit these risks; while large companies may be able to assemble sufficiently diverse staff, this is more difficult for small teams.
The interpretation of the smile is another good example of how complex it is to set the domain of use of an AI system. While in Japan the smile is seen as a way to show respect or to hide what one really feels, in Western societies it is perceived as a way to express satisfaction.
To conclude, despite the identification of six primary emotions, there remains a multitude of subtler and more complex emotions for an AI system to define. These primary emotions can be the elementary building blocks for reading behavior, but they do not constitute the complete grammar of all emotions.
The detection of an emotion by an AI system is a first step; its interpretation and understanding, which vary according to the cultural environment, are a second and much more complex step for which much progress remains to be made.
Written by Arnault Ioualalen & Léo Limousin
Picture credits: Header: Hello I'm Nik (Unsplash); Core: bruce mars, 张 学欢, Marcos Paulo Prado (Unsplash)