In just a few years, voice assistants have become part of our daily lives. It is very likely that one is with you at all times, whether through connected devices such as smart speakers (Amazon’s Alexa, Google Home) or through the voice-activated personal assistant on your smartphone (Apple’s Siri, Microsoft’s Cortana or, more recently, Google Assistant). This large-scale deployment of natural language processing technologies is due in particular to significant advances in artificial intelligence systems.
What is natural language processing (NLP)?
Today, AI-powered speech-to-text technologies offer very satisfactory performance and are used in many industries. For example, medical staff use voice dictation tools to transcribe their notes. These automatic speech transcription tools are also used daily by thousands of internet users who use Google voice input for their searches. Indeed, Google has revealed that 20% of searches on the Android Google App are now made by voice [1]. And the trend is growing.
However, while converting a sequence of sounds into a sequence of words is relatively easy, getting an AI system to understand the meaning of a sentence is a different challenge, and on this point work is still needed. For nearly 50 years, researchers have been working to build computer algorithms capable of understanding human language semantically in real time. In the 2000s, a giant leap forward was made through the use of neural networks. These connectionist AI systems have revolutionised language processing. Although the first voice dictation software was launched in 1997 by the company Dragon, it was not until 2011, with the release of the Siri application powered by a Deep Neural Network (DNN), that the general public was able to take advantage of real-time automatic language processing [2].
There is understanding and there is understanding
To understand, we must first distinguish between Natural Language Processing (NLP) and Natural Language Understanding (NLU). NLP allows a machine to process what a human says at the grammatical level, whereas NLU provides a semantic understanding of a conversational exchange between a human and a machine. Having a machine grasp the meaning of words and the intent behind them is still a challenge for researchers, and errors in the recognition and semantic processing of text remain frequent.
For example, when an individual listening to music on a Google Home speaker dictates the command “Ok Google, change”, the connected speaker, instead of changing the track, will list the currency exchange offices near his or her location. In this example, the speaker’s AI system recognised every word that was spoken but failed to interpret them semantically. So, as we discussed in our article “AI: the difference between learning and generalizing“, AI is not really intelligent: it simply follows predefined rules.
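A minimal sketch can make this failure mode concrete. The keyword rules and intent names below are invented for illustration, not taken from Google’s actual system; the point is that a purely rule-based mapping can recognise every word yet still pick the wrong intent because it ignores context:

```python
# Hypothetical keyword-based intent matcher: every rule and intent name below
# is invented for illustration, not drawn from any real assistant.
INTENT_RULES = {
    "change": "find_currency_exchange",  # "change" also evokes currency exchange
    "next": "skip_track",
    "play": "play_music",
}

def interpret(transcript: str) -> str:
    """Map a perfectly transcribed command to an intent by keyword lookup alone."""
    for word in transcript.lower().split():
        if word in INTENT_RULES:
            return INTENT_RULES[word]
    return "unknown"

# Speech-to-text succeeded, but the semantic step ignores the music context:
print(interpret("Ok Google, change"))  # -> find_currency_exchange, not skip_track
```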
What makes language processing difficult?
The semantic understanding of human language by a machine is very complex and can be influenced by many factors, whether it is the (sometimes incorrect) grammatical structure, or words repeated in moments of hesitation or for emphasis: the variations in sentence structure are almost endless. In addition, the meaning of an interaction can vary with context, and non-literal language such as irony is still difficult for AI systems to detect and interpret.
What are symbolic and connectionist AI?
Symbolic AI
The first NLP methods were based on a symbolic approach developed by Noam Chomsky in the late 1960s [3]. Symbolic AI relied on logic or ontologies to model knowledge and perform NLP tasks [4]. Linguistic knowledge was thus manually encoded as algorithms combining grammatical rules with lexical databases, as in the toy sketch below.
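Here is a toy illustration of that symbolic approach, assuming an invented five-word lexicon and two hand-written grammar rules; real systems of that era were vastly larger but followed the same principle:

```python
# Toy symbolic NLP: linguistic knowledge is encoded by hand as a lexicon
# (word -> part of speech) plus explicit grammar rules. All entries invented.
LEXICON = {
    "the": "DET", "dog": "NOUN", "cat": "NOUN",
    "chases": "VERB", "sleeps": "VERB",
}

# A sentence is accepted only if its tag sequence matches a hand-written rule.
GRAMMAR = [
    ("DET", "NOUN", "VERB"),
    ("DET", "NOUN", "VERB", "DET", "NOUN"),
]

def parse(sentence: str) -> bool:
    """Tag each word using the lexicon, then check the sequence against the rules."""
    tags = tuple(LEXICON.get(w, "UNK") for w in sentence.lower().split())
    return tags in GRAMMAR

print(parse("The dog chases the cat"))  # True: matches the second rule
print(parse("Dog the chases"))          # False: no rule covers this order
```

Every sentence the system must handle has to be anticipated by a rule, which is precisely the brittleness discussed later in this article.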
Connectionist AI for NLP
Since the 1990s, a so-called statistical approach has emerged in artificial intelligence applied to NLP. Connectionist AI technologies arrived with the rise of machine learning, including neural networks and support vector machines. Connectionist AI refers to algorithms that draw inspiration from living organisms to model behaviour: AI systems employing convolutional neural networks (CNNs), for instance, are schematically inspired by the functioning of the biological brain for image processing [5].
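To contrast with the symbolic sketch above, here is a minimal connectionist sketch: a tiny feed-forward network that learns intents from a handful of labelled utterances instead of hand-written rules. The utterances, intent labels, and network size are all illustrative assumptions:

```python
import numpy as np

# Invented toy data: four labelled utterances, three intents (0=play, 1=skip, 2=stop).
UTTERANCES = ["play some music", "next song please", "stop the music", "play next track"]
LABELS = [0, 1, 2, 1]
VOCAB = sorted({w for u in UTTERANCES for w in u.split()})

def featurize(text: str) -> np.ndarray:
    """Bag-of-words vector: 1.0 if the vocabulary word occurs in the text."""
    words = set(text.split())
    return np.array([1.0 if w in words else 0.0 for w in VOCAB])

X = np.stack([featurize(u) for u in UTTERANCES])
Y = np.eye(3)[LABELS]  # one-hot targets

rng = np.random.default_rng(0)
W1 = rng.normal(0.0, 0.5, (len(VOCAB), 8))  # input -> hidden weights
W2 = rng.normal(0.0, 0.5, (8, 3))           # hidden -> output weights

for _ in range(1000):  # plain batch gradient descent
    H = np.tanh(X @ W1)                       # hidden activations
    P = np.exp(H @ W2)
    P /= P.sum(axis=1, keepdims=True)         # softmax over intents
    G = (P - Y) / len(X)                      # cross-entropy gradient
    gW2 = H.T @ G
    gW1 = X.T @ ((G @ W2.T) * (1.0 - H**2))   # backprop through tanh
    W1 -= 0.5 * gW1
    W2 -= 0.5 * gW2

# "next track please" was never seen as a whole; the network infers the intent.
h = np.tanh(featurize("next track please") @ W1)
print((h @ W2).argmax())  # expected: 1, the skip intent, learned from data
```

No rule anywhere says that “track” relates to skipping: the association is learned from the statistics of the training examples.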
Neural networks have demonstrated their ability to scale on speech-to-text tasks, compared with traditional machine learning models such as hidden Markov models or logistic regression [6]. However, connectionist AI systems have their own limitations when it comes to understanding semantics and sustaining a human-machine conversational exchange [7].
Indeed, although they are better than symbolic AIs at generalising in real-life situations, they struggle to model human-machine interaction. Moreover, they cannot model reasoning or build an understanding of the context they are confronted with.
Towards hybrid AI to improve semantic understanding?
Combining symbolic AI with connectionist AI could allow more complex tasks involving the semantics of natural language to be performed more accurately. The deployment of connectionist AI has already given language processing a new lease of life.
While symbolic AI is better at modelling reasoning, it has difficulty operating in a real-world environment with perturbations or uncertainty, due to its lack of robustness. Indeed, it is probably impossible to manually characterise a domain of use covering all the situations that an AI system might face in conversational exchanges: even though each sentence is generally short, the number of possible texts quickly becomes too large, as the quick calculation below suggests.
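A back-of-the-envelope calculation makes the point; the vocabulary size below is an assumed round figure, not a measured one:

```python
# Even short commands span an astronomically large space of possible sentences.
vocabulary_size = 20_000   # assumed everyday vocabulary (illustrative figure)
sentence_length = 5        # words in a short spoken command
print(f"{vocabulary_size ** sentence_length:.1e}")  # -> 3.2e+21 combinations
```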
Connectionist AI can then complement symbolic AI, bringing its ability to generalise in real-life conditions. Indeed, a connectionist AI trains and learns to generalise from a vast amount of data illustrating the real situations it will have to deal with. For example, to perform natural language processing tasks, engineers feed their connectionist AI systems data containing audio recordings of conversations. When such a system faces a situation it has never encountered before, it falls back on the most similar situation in its training base to extrapolate a response.
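A crude nearest-neighbour analogy, with invented commands and intents, illustrates this generalisation-by-similarity; deployed assistants use far richer models, but the intuition is the same:

```python
# Invented training base mapping known commands to intents (illustrative only).
TRAINING_BASE = {
    "play some music": "play_music",
    "skip this song": "skip_track",
    "turn the volume up": "volume_up",
}

def similarity(a: str, b: str) -> float:
    """Jaccard word-overlap between two commands."""
    wa, wb = set(a.split()), set(b.split())
    return len(wa & wb) / len(wa | wb)

def respond(command: str) -> str:
    """Answer an unseen command with the intent of the most similar known one."""
    nearest = max(TRAINING_BASE, key=lambda known: similarity(command, known))
    return TRAINING_BASE[nearest]

# "skip this track" was never seen, but it overlaps most with "skip this song":
print(respond("skip this track"))  # -> skip_track
```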
However, connectionist AI currently faces two challenges that symbolic AI can address. First, it cannot mobilise reasoning the way symbolic AI can. Second, connectionist AI remains very hard to characterise, as it achieves relevant results through statistical models without understanding the symbolic concepts behind them.
By hybridising these two approaches, researchers are trying to take the best of both types of AI, developing a hybrid AI system built around a contextual learning process. The system could thus generalise by learning how the model should be structured based on what it perceives of a real situation (connectionist AI), while also being able to mobilise reasoning (symbolic AI) over what it has perceived in order to make the most appropriate decision for the situation [8].
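To make the idea tangible, here is a minimal sketch of such a hybrid pipeline, with all intent names, rules, and contexts invented for illustration: a learned component perceives the utterance, and a symbolic component reasons over that perception together with the current context, resolving the “Ok Google, change” ambiguity from earlier:

```python
def perceive(utterance: str) -> str:
    """Stand-in for a trained connectionist model mapping words to a concept."""
    return "change" if "change" in utterance.lower() else "other"

# Symbolic layer: explicit, inspectable rules over (concept, context) pairs.
RULES = {
    ("change", "playing_music"): "skip_track",          # context disambiguates
    ("change", "idle"):          "find_currency_exchange",
}

def decide(utterance: str, context: str) -> str:
    concept = perceive(utterance)                       # connectionist perception
    return RULES.get((concept, context), "ask_clarification")  # symbolic reasoning

print(decide("Ok Google, change", "playing_music"))  # -> skip_track
print(decide("Ok Google, change", "idle"))           # -> find_currency_exchange
```

Here the learned component handles noisy, open-ended perception, while the symbolic layer keeps the decision logic explicit and inspectable.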
References
1. Étude sur la recherche vocale : facteurs de ranking en 2019 (Study on voice search: ranking factors in 2019).
2. Hey Siri: An On-device DNN-powered Voice Trigger for Apple’s Personal Assistant, Apple Machine Learning Research.
3. Intelligence artificielle et langage, le nouvel essor du NLP (Artificial intelligence and language, the new rise of NLP), Leihia.
4. Pour une culture critique de l’IA (Towards a critical culture of AI), Cairn.info.
5. Application à la reconnaissance vocale et de caractère (Application to speech and character recognition), intelligenceartificielle.com.
6. Deep Learning for NLP: An Overview of Recent Trends, DAIR.AI, Medium.
7. Intelligence Artificielle : quelle différence entre NLP et NLU ? (Artificial intelligence: what is the difference between NLP and NLU?), LeMagIT.
8. A DARPA Perspective on Artificial Intelligence.