Smooth talker like an AI

Artificial intelligences learn to speak thanks to “language models”. The simplest models power the autocomplete feature on smartphones: they suggest the next word. But the prowess and progress of the most modern language models such as GPT-3, LaMDA, PaLM or ChatGPT are breathtaking, with, for example, computer programs capable of writing in the style of a given poet, simulating deceased people, explaining jokes, translating languages, and even producing and correcting computer code, which would have been unthinkable just a few months ago. To do this, the models rely on increasingly complex artificial neural networks.


When artificial intelligences talk nonsense

That said, the models are more superficial than these examples lead us to believe. We compared stories generated by language models to stories written by humans and found the generated stories to be less coherent and less surprising, even if they remain engaging.

More importantly, we can show that current language models have problems even with simple reasoning tasks. For example, when we ask:

“The lawyer visited the doctor; did the doctor visit the lawyer?”

…simple language models tend to say yes. GPT-3 even replies that the lawyer did not visit the doctor. One possible reason we are exploring is that these language models encode word positions symmetrically, and therefore do not distinguish between “before the verb” and “after the verb”, which makes it harder to tell the subject of a sentence from its object.
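To see why this matters, here is a toy illustration in Python (a deliberate caricature, far simpler than a real transformer): if a model’s internal representation treats word positions symmetrically, in the extreme case ignoring order altogether, the two sentences above collapse to the same representation, and with them the distinction between who visited whom.

from collections import Counter

def order_blind_representation(sentence: str) -> Counter:
    # A caricature of a position-insensitive encoding:
    # the sentence is reduced to the multiset of its words.
    return Counter(sentence.lower().split())

s1 = "The lawyer visited the doctor"
s2 = "The doctor visited the lawyer"

# Both sentences map to exactly the same representation,
# so nothing downstream can recover who is subject and who is object.
print(order_blind_representation(s1) == order_blind_representation(s2))  # True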

Additionally, because of theoretical limitations, “transformer”-based language models cannot tell whether a given element occurs an even or an odd number of times in a sequence when its occurrences are interspersed with other elements. In practice, this means that the models cannot solve a task we call the “pizza task”, a simple riddle of the form:

“The light is off. I press the light switch. I’m eating a pizza. I press the light switch. Is the light on?”

Here, an even number of presses of the light switch means the light is off, but a BERT model fails to learn this. The most powerful current models (GPT-3 and ChatGPT) categorically refuse to conclude that the light is off.
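The reasoning the riddle calls for is elementary: it boils down to counting whether the switch was pressed an even or an odd number of times, ignoring the distractor sentences. Here is a minimal sketch in Python (a hand-coded solution to the pizza task written for this article, not the output of a language model):

def light_is_on(story: list[str]) -> bool:
    # The light starts off; each press of the switch toggles its state.
    # Any other sentence (such as "I'm eating a pizza.") is irrelevant.
    state = False
    for sentence in story:
        if "press the light switch" in sentence:
            state = not state
    return state

story = [
    "The light is off.",
    "I press the light switch.",
    "I'm eating a pizza.",
    "I press the light switch.",
]
print(light_is_on(story))  # False: two presses, so the light is off again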


Today’s language models also have difficulty with negation, and they generally do poorly on reasoning tasks as soon as these become more complex. For example, consider the following riddle from China’s national civil service exam:

“David knows Mr. Zhang’s friend, Jack, and Jack knows David’s friend, Ms. Lin. Everyone who knows Jack has a master’s degree, and everyone who knows Ms. Lin is from Shanghai. Who is from Shanghai and has a master’s degree?”

Current models answer correctly in only 45% of cases, and ChatGPT refuses to answer… while the best human performance is 96%.
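For comparison, the riddle can be solved by mechanically applying its two rules to its facts. Here is a minimal sketch in Python, with one assumption that the riddle leaves implicit and that we add here: people know their own friends.

# Facts stated in the riddle, plus the implicit assumption that
# people know their own friends (marked below).
knows = {
    ("David", "Jack"),       # "David knows Mr. Zhang's friend, Jack"
    ("Jack", "Ms. Lin"),     # "Jack knows David's friend, Ms. Lin"
    ("Mr. Zhang", "Jack"),   # assumption: Mr. Zhang knows his friend Jack
    ("David", "Ms. Lin"),    # assumption: David knows his friend Ms. Lin
}

# Rule 1: everyone who knows Jack has a master's degree.
has_masters = {person for (person, other) in knows if other == "Jack"}
# Rule 2: everyone who knows Ms. Lin is from Shanghai.
from_shanghai = {person for (person, other) in knows if other == "Ms. Lin"}

print(has_masters & from_shanghai)  # {'David'}

This kind of mechanical rule application is exactly what the symbolic methods discussed below are designed for.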

The model does not know that it does not know

Most likely, the main drawback of current language models is that they are designed to draw analogies, not to memorize a very large set of items exactly. For example, they cannot remember a list of a million customers or a set of ten thousand proteins. The model, at its own discretion, will forget some of the items. An even bigger problem is that it can also, again at its own discretion, invent new items (a process aptly called “hallucination”). This is a fundamental consequence of the machine-learning techniques behind language models, which are optimized for interpolation and generalization, not for memorization.

The problem is that it is impossible to tell from the output of a language model whether an answer was memorized or invented. For example, when GPT-3 is asked where Alan Turing died, the model deadpans “Warrington” instead of the correct answer (Wilmslow). ChatGPT answers this question correctly, but when asked about lesser-known researchers, such as the authors of this article, it makes up facts. And when asked more complicated questions, it provides elegant but false answers:

“Is the sum of two odd numbers odd?”

… the model politely replies:

“Yes, when you add two odd numbers, the result is always an odd number. This is because an odd number can be expressed as 2n+1, where n is an integer. When you add two numbers in this form, the result is (2n+1)+(2m+1) = 2(n+m)+2, which is also in the form 2p+1, where p is an integer. This means that the result is an odd number.”

In its form, the answer is very convincing, but it is false.
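For the record, the correct derivation (elementary algebra, not taken from the model) is just as short:

(2n + 1) + (2m + 1) = 2n + 2m + 2 = 2(n + m + 1)

The result is a multiple of 2, so the sum of two odd numbers is always even. The model even computes 2(n+m)+2 correctly, then misreads its own result as being of the form 2p+1.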

With these examples, we want to show that it is currently imprudent to rely on a language model to reason or to make decisions. Models get better with time, they know more things, and they are increasingly able to refrain from answering when they lack the information. However, beyond simple questions, a language model can easily invent an answer, along with an equally invented and approximate explanation or proof.

Other methods excel at reasoning about exact facts

None of this is to say that language models are not amazing tools with mind-blowing capabilities, nor that they can never overcome these challenges, nor that other deep-learning methods will not be developed for this purpose. Rather, it is to say that, at the time of this writing in 2022, language models are not the tool of choice for reasoning or for storing exact data.

For these functions, the preferred tool currently remains “symbolic representations”: databases, knowledge bases and logic. These representations store data not implicitly, but as sets of entities (such as people, commercial products or proteins) and relationships between those entities (who bought what, what contains what, and so on). Logical rules or constraints are then used to reason about these relationships in a way that is provably correct, although usually disregarding probabilistic information. Such reasoning was used, for example, by the Watson computer in 2011 during the game show Jeopardy! to answer the following question:

“Who is the Spanish king whose portrait, painted by Titian, was stolen in an armed robbery from an Argentinian museum in 1987?”

Indeed, the question can be translated into logic rules applied to a knowledge base, and only King Philip II matches all the criteria. Language models currently do not know how to answer this question, probably because they cannot memorize and manipulate enough knowledge (links between known entities).
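As a rough illustration, here is a toy knowledge base and query in Python. It is not Watson’s actual system, and the facts below are simplified and partly invented for the example; only the final answer, Philip II, is taken from the discussion above.

# A toy knowledge base: entities and relationships stored explicitly.
facts = [
    ("Philip II of Spain", "position_held", "King of Spain"),
    ("Portrait of Philip II", "depicts", "Philip II of Spain"),
    ("Portrait of Philip II", "creator", "Titian"),
    ("Portrait of Philip II", "stolen_from", "a museum in Argentina"),
    ("Portrait of Philip II", "stolen_in", 1987),
    ("a museum in Argentina", "country", "Argentina"),
]

def objects(subject, relation):
    # All objects o such that the fact (subject, relation, o) is stored.
    return {o for (s, r, o) in facts if s == subject and r == relation}

# The question becomes a conjunction of conditions over the stored facts:
# a Spanish king, depicted in a portrait by Titian,
# stolen from an Argentinian museum in 1987.
answers = {
    person
    for (painting, relation, person) in facts if relation == "depicts"
    if "Titian" in objects(painting, "creator")
    and 1987 in objects(painting, "stolen_in")
    and any("Argentina" in objects(museum, "country")
            for museum in objects(painting, "stolen_from"))
    and "King of Spain" in objects(person, "position_held")
}
print(answers)  # {'Philip II of Spain'}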

A very simple example of a “knowledge graph”. These objects make it possible to connect concepts and entities. They are widely used by search engines and social networks.
Fuzheado/Wikidata, CC BY-SA

It is probably no coincidence that the same large companies that build some of the most powerful language models (Google, Facebook, IBM) also build some of the largest knowledge bases. These symbolic representations are today often constructed by extracting information from natural-language text, i.e., an algorithm tries to build a knowledge base by analyzing press articles or an encyclopedia. The methods used for this extraction are, precisely, language models: here, the language model is not the end goal but a means to build knowledge bases. Language models are well suited to this task because they are very resistant to noise, both in their training data and in their inputs, and ambiguous or noisy inputs are ubiquitous in human language.
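To make the extraction step concrete, here is a caricature of it in Python: a single hand-written pattern that turns one sentence form into a knowledge-base fact. Real systems use language models for this step precisely because human language expresses the same fact in countless, noisier ways than any fixed pattern can cover.

import re

# One hand-written pattern: "<Person> was born in <Place>."
PATTERN = re.compile(r"(?P<person>[A-Z][\w .]+?) was born in (?P<place>[A-Z][\w ]+)\.")

text = "Alan Turing was born in London. He is considered a father of computer science."

# Each match becomes an (entity, relation, entity) triple for the knowledge base.
triples = [(m["person"], "born_in", m["place"]) for m in PATTERN.finditer(text)]
print(triples)  # [('Alan Turing', 'born_in', 'London')]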

Language models and symbolic representations are complementary: language models excel at parsing and generating natural-language text, while symbolic methods are the tool of choice when it comes to storing exact items and reasoning about them. An analogy with the human brain can be instructive: some tasks are easy enough for the human brain to perform unconsciously, intuitively, in a few milliseconds (reading simple words or computing the sum “2 + 2”), while abstract operations require laborious, conscious and logical thinking (e.g., memorizing telephone numbers, solving equations or comparing the price/quality ratio of two washing machines).

Daniel Kahneman has dichotomized this spectrum into “System 1” for subconscious reasoning and “System 2” for effortful reasoning. With current technology, it seems that language models solve “System 1” problems. Symbolic representations, on the other hand, are suitable for “System 2” problems. At least for the moment, therefore, it appears that both approaches have their raison d’être. Moreover, a whole spectrum between the two remains to be explored. Researchers are already exploring the coupling between language models and databases and some see the future in merging neural and symbolic models into “neurosymbolic” approaches.
