Scientists say they have developed a method that uses functional magnetic resonance imaging brain recordings to reconstruct continuous speech. These results are the next step in the search for better brain-computer interfaces, which are being developed as assistive technology for people who cannot speak, write or type text.
In a preprint posted September 29 on bioRxiv, a team from the University of Texas at Austin presents a “decoder”, or algorithm, capable of “reading” the words that a person hears or thinks during a functional magnetic resonance imaging (fMRI) brain scan. While other teams have previously reported some success in reconstructing language or images from the signals of brain implants, the new decoder is the first to achieve this with a non-invasive method. “If you had asked any cognitive neuroscientist in the world twenty years ago if this was doable, they would have laughed you out of the room,” says Alexander Huth, a neuroscientist at the University of Texas at Austin and co-author of the study.
Yukiyasu Kamitani, a computational neuroscientist at Kyoto University who was not involved in the research, writes that it is exciting to see intelligible language sequences generated by a non-invasive decoder. “This study…lays a solid foundation for brain-computer interface applications,” he adds.
fMRI data is difficult to use for this type of research because it is slow compared to the speed of human thought. Rather than detecting the firing of neurons, which happens on a millisecond timescale, fMRI machines measure changes in blood flow in the brain as a proxy for brain activity, and these changes unfold over several seconds. According to Huth, the setup used in this research works because the system does not decode language word for word, but instead discerns the higher-level meaning of a sentence or thought.
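The sluggishness described above comes from the hemodynamic response: each word evokes a blood-flow wave that peaks seconds later, so the responses to consecutive words blur together. A toy simulation (the gamma-shaped response here is my own simplification, not the canonical HRF used in fMRI analysis) makes the smearing visible:

```python
import numpy as np

dt = 0.1                      # seconds per sample
t = np.arange(0, 20, dt)

# Crude gamma-shaped hemodynamic response, peaking around 5 s
# (an assumption; real analyses use a more elaborate canonical HRF).
hrf = (t ** 5) * np.exp(-t)
hrf /= hrf.sum()

# Word onsets arriving ~2.5 per second, as in natural speech.
onsets = np.zeros_like(t)
onsets[::4] = 1.0             # one word every 0.4 s

# The measured BOLD signal is (roughly) onsets convolved with the HRF:
bold = np.convolve(onsets, hrf)[: len(t)]

# The sharp word-level spikes are gone; only a slow envelope remains,
# which is why the decoder targets sentence-level meaning, not words.
```

Because individual words are unrecoverable from the smooth `bold` trace, a word-for-word decoder is hopeless here; a meaning-level decoder is not.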
Huth and his colleagues trained their algorithm using fMRI brain recordings taken while three study subjects (one woman and two men, all between the ages of 20 and 30) listened to 16 hours of podcasts and radio reports. To build an accurate and widely applicable decoder, Huth says it was important for research subjects to listen to a wide range of media. He notes that the amount of fMRI data collected is comparable to that of most other studies using fMRI recordings, although his study had fewer subjects.
Having learned from the 16 hours of fMRI recordings of each individual’s brain, the decoder generated a series of predictions of what the fMRI readings would look like for candidate word sequences. According to Huth, using these guesses is key to ensuring the decoder can translate thoughts unrelated to any of the audio recordings used in training. The guesses were then compared to the actual fMRI recording, and the prediction closest to the real reading determined which words the decoder ultimately generated.
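The guess-and-check loop just described can be sketched as a beam search. Everything concrete below is invented for illustration (the tiny vocabulary, random word embeddings, and a linear “encoding model” stand in for the paper’s language model and learned fMRI encoding model); only the overall loop — propose continuations, predict the fMRI response each would evoke, keep the candidates whose prediction best matches the actual recording — follows the article’s description:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-ins: a tiny vocabulary, random embeddings, and a
# linear map from a sequence's mean embedding to 20 "voxel" values.
VOCAB = ["the", "dog", "ran", "home", "cat", "sat"]
EMBED = {w: rng.normal(size=8) for w in VOCAB}
W = rng.normal(size=(8, 20))

def predict_fmri(words):
    """Encoding model: predicted brain response for a word sequence."""
    mean_emb = np.mean([EMBED[w] for w in words], axis=0)
    return mean_emb @ W

def decode_step(candidates, actual_fmri, beam=3):
    """Extend each candidate by one word; keep those whose predicted
    response is closest to the actual recording."""
    extended = [c + [w] for c in candidates for w in VOCAB]
    extended.sort(key=lambda c: np.linalg.norm(predict_fmri(c) - actual_fmri))
    return extended[:beam]

# Simulate a "true" stimulus and its noisy fMRI reading.
true_seq = ["the", "dog", "ran", "home"]
actual = predict_fmri(true_seq) + rng.normal(scale=0.01, size=20)

candidates = [[w] for w in VOCAB]
for _ in range(len(true_seq) - 1):
    candidates = decode_step(candidates, actual)
```

In the actual study a neural language model proposes fluent continuations, which is why the decoder can output sequences never heard during training, as Huth notes.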
To measure the decoder’s success, the researchers scored the similarity between the decoder’s output and the stimulus presented to the subject. They also scored language generated by the same decoder without reference to any fMRI recording, as a baseline. They then compared the two sets of scores and tested the statistical significance of the difference.
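That evaluation logic — similarity scores for recording-driven output versus a null baseline, followed by a significance test — can be sketched as follows. The word-overlap metric and the score values are placeholders of my own (the paper uses proper language-similarity measures), but the comparison structure matches the article:

```python
import numpy as np

rng = np.random.default_rng(1)

def similarity(generated, stimulus):
    # Placeholder metric: Jaccard overlap between word sets. The real
    # study uses proper language-similarity measures instead.
    a, b = set(generated), set(stimulus)
    return len(a & b) / len(a | b)

# Placeholder scores: decoder output matched to real recordings vs. a
# null baseline (output generated with no recording to constrain it).
decoded_scores = rng.normal(0.6, 0.1, size=200)
null_scores = rng.normal(0.4, 0.1, size=200)

# Permutation test on the difference in mean scores.
observed = decoded_scores.mean() - null_scores.mean()
pooled = np.concatenate([decoded_scores, null_scores])
n_perm, count = 2000, 0
for _ in range(n_perm):
    rng.shuffle(pooled)
    if pooled[:200].mean() - pooled[200:].mean() >= observed:
        count += 1
p_value = (count + 1) / (n_perm + 1)
```

A small `p_value` indicates the recordings genuinely constrain the output — the decoder is not just producing plausible-sounding language by chance.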
Limits of the method
The results indicate that the algorithm’s guessing and checking process eventually generates a complete story from the fMRI recordings, which Huth says matches the real story told in the audio recording quite well. However, this method has some shortcomings; for example, it fails to retain pronouns and often confuses the first and third person. According to Huth, the decoder knows quite precisely what is going on, but not who is doing things.
Sam Nastase, a researcher and lecturer at the Princeton Neuroscience Institute who was not involved in this research, says the use of fMRI recordings for this kind of brain decoding is mind-boggling, because such data is usually very slow and noisy. “What they show in this paper is that if you have a smart enough modeling framework, you can actually extract a surprising amount of information from fMRI recordings,” he says.
Because the decoder uses non-invasive fMRI recordings, Huth believes its potential for real-world application is greater than that of invasive methods, although the cost and inconvenience of MRI machines are an obvious hurdle. Magnetoencephalography, another non-invasive brain imaging technique that is more portable and temporally precise than fMRI, could be paired with a similar computational decoder to give people with language impairments a means of communication, he adds.
The decoder opens a window into how the human brain works
According to Huth, the most exciting aspect of the decoder’s success is the insight it provides into how the brain works. For example, he notes, the results reveal which parts of the brain are responsible for creating meaning. By running the decoder on recordings from specific areas, such as the prefrontal cortex or the parietal-temporal cortex, the team could determine which region represented which semantic information. Notably, the team found that these two regions represented the same information for the decoder, which worked equally well with recordings from either one.
Most surprisingly, Huth adds, the decoder was able to reconstruct stimuli that involved no spoken language at all, even though it was trained on subjects listening to speech. For example, after training, the algorithm reconstructed the gist of a silent film the subjects watched, as well as the imagined experience of a participant telling a story. The fact that these things overlap so much in the brain is something we are only just beginning to appreciate, he explains.
For Kamitani and Nastase, the results from Huth’s lab, which have not yet been peer-reviewed, raise questions about how decoders handle underlying meaning as opposed to text or speech. Since the new decoder detects meaning, or semantics, rather than individual words, its success can be hard to measure, as many combinations of words could count as a good output, Nastase explains. It’s an interesting problem that they introduce, he adds.
The issue of privacy protection
Huth acknowledges that, for some, technology that can read minds can be a little scary. He says his team thought deeply about the implications of the research and, out of concern for people’s privacy, considered whether the decoder could work without the willing cooperation of the participant. In some trials, while the audio was playing, the researchers asked the subjects to distract themselves by performing other mental tasks, such as counting, naming and imagining animals, and imagining telling another story. They found that naming and imagining animals was the most effective in making decoding inaccurate.
From a privacy perspective, it is also worth noting that a decoder trained on one person’s brain scans could not reconstruct another person’s language, Huth explains: applied across subjects, it produced virtually no usable information. A person would therefore have to undergo extensive training sessions before their thoughts could be decoded accurately.
For Nastase, the fact that the researchers proactively tested for mental privacy is encouraging. “We could very well have published this article six months ago without any of these privacy-protection experiments,” he says. However, he adds that he is not convinced the authors have definitively shown that privacy will not be an issue in the future, since later research may find ways around the safeguards the researchers describe. The question is whether the advantages of such technology outweigh the possible disadvantages, Nastase concludes.