There has been another scientific first, this one brought to us by Columbia neuroengineers. They have harnessed the power of speech synthesizers and artificial intelligence to open new ways for computers to communicate directly with the brain. In doing so, they created a system that translates thought into intelligible, recognizable speech.
This technology can reconstruct the words a person hears, with unprecedented clarity, by monitoring their brain activity. The research lays the groundwork for helping people who cannot speak, such as those living with amyotrophic lateral sclerosis (ALS) or recovering from a stroke, to regain their ability to communicate with the outside world. The study findings were published in Scientific Reports.
Nima Mesgarani, Ph.D., the paper’s senior author and a principal investigator at Columbia University’s Mortimer B. Zuckerman Mind Brain Behavior Institute, said:
“Our voices help connect us to our friends, family and the world around us, which is why losing the power of one’s voice due to injury or disease is so devastating. With today’s study, we have a potential way to restore that power. We’ve shown that, with the right technology, these people’s thoughts could be decoded and understood by any listener.”
When people speak, or even just imagine speaking, telltale patterns of activity appear in their brains. Decades of research had already established this. Researchers have since found that distinct, but recognizable, patterns of signals also emerge when we listen to someone speak, or imagine listening.
Speaking, imagining speaking, listening to someone speak, and imagining listening all produce patterns of signals. Experts have tried for years to record and decode these patterns. They see a future in which thoughts need not remain hidden inside the brain, but instead could be translated into verbal speech at will. Theorizing about such a translation, however, proved far easier than achieving it; the feat was much more challenging than expected.
Dr. Mesgarani is an associate professor of electrical engineering at Columbia’s Fu Foundation School of Engineering and Applied Science. At first, to decode the brain signals, he and others focused on simple computer models that analyzed spectrograms – which are visual representations of sound frequencies.
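To make the idea of a spectrogram concrete, here is a minimal sketch of how one can be computed: the audio is cut into short overlapping frames, each frame is windowed, and a Fourier transform gives the frequency content over time. This is a generic illustration of the concept, not the models used in the study.

```python
import numpy as np

def spectrogram(signal, frame_len=256, hop=128):
    """Magnitude spectrogram: a time-frequency picture of a signal,
    the kind of representation early decoding models analyzed."""
    window = np.hanning(frame_len)
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = signal[start:start + frame_len] * window
        # FFT of each windowed frame gives its frequency content
        frames.append(np.abs(np.fft.rfft(frame)))
    return np.array(frames)  # shape: (num_frames, frame_len // 2 + 1)

# A pure 440 Hz tone sampled at 8 kHz: its energy concentrates
# in a single frequency band of the spectrogram.
fs = 8000
t = np.arange(fs) / fs
tone = np.sin(2 * np.pi * 440 * t)
spec = spectrogram(tone)
```

For the tone above, the column of `spec` with the most energy sits near bin 14, since 440 Hz divided by the bin spacing of 8000/256 = 31.25 Hz is about 14.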
When that didn’t work well (it failed to produce anything resembling intelligible speech), Dr. Mesgarani’s team turned to a vocoder – a computer algorithm that can synthesize speech after being trained on recordings of people talking. Dr. Mesgarani explains, “This is the same technology used by Amazon Echo and Apple Siri to give verbal responses to our questions.”
Above: representation of researchers’ new approach to audio reconstruction that uses a vocoder and deep neural network model.
Dr. Mesgarani teamed up with Ashesh Dinesh Mehta MD, Ph.D. to teach the vocoder to interpret brain activity. Dr. Mehta is a neurosurgeon at Northwell Health Physician Partners Neuroscience Institute and co-author of today’s paper, who treats epilepsy patients, some of whom must undergo regular surgeries. Dr. Mesgarani said:
“Working with Dr. Mehta, we asked epilepsy patients already undergoing brain surgery to listen to sentences spoken by different people, while we measured patterns of brain activity. These neural patterns trained the vocoder.”
After training the vocoder, the process of producing the recording was as follows:
- They asked those same patients to listen to speakers reciting digits from 0 to 9.
- As they did this, the researchers recorded the patients’ brain signals so that the recordings could be run through the vocoder.
- They then used neural networks – a type of artificial intelligence that mimics the structure of neurons in the biological brain – to analyze and clean up the sound the vocoder produced in response to the recorded brain signals it was fed.
- The final result was a robotic-sounding voice reciting a sequence of numbers.
Dr. Mesgarani and his team wanted to test the clarity and accuracy of the recording. To do so, they asked individuals who didn’t know what the recording was supposed to say to listen to it and report what they heard. A remarkable three-quarters of listeners understood the recording. Dr. Mesgarani said:
“We found that people could understand and repeat the sounds about 75% of the time, which is well above and beyond any previous attempts. The sensitive vocoder and powerful neural networks represented the sounds the patients had originally listened to with surprising accuracy.”
The team is moving forward with this idea, first by testing more complicated words and sentences, then by modifying the experiment so that, rather than listening to someone speak, the patient will think about speaking. They hope that someday soon their system could be part of an implant, similar to those worn by some epilepsy patients, that translates the wearer’s thoughts directly into words. Dr. Mesgarani explained:
“In this scenario, if the wearer thinks ‘I need a glass of water,’ our system could take the brain signals generated by that thought, and turn them into synthesized, verbal speech…This would be a game changer. It would give anyone who has lost their ability to speak, whether through injury or disease, the renewed chance to connect to the world around them.”