Synthesizing Speech From Neural Activity

Tech ID: 33460 / UC Case 2023-556-0

Abstract

Researchers at the University of California, Davis have developed a computer-based method to synthesize continuous speech from biosignals, including brain activity, in real time.

Full Description

Neurological injuries and diseases such as stroke or amyotrophic lateral sclerosis (ALS) can result in speech disorders that prevent an individual from communicating effectively. One emerging solution for speech-impaired populations is to bypass damaged parts of the nervous system using a brain-computer interface (BCI), which provides a direct communication pathway between the brain and an external device. Such a device can decode the individual’s intended speech directly from the brain activity that normally controls the tongue, jaw, lips, voice box, diaphragm, and other muscles associated with speech. To date, speech BCIs have achieved accurate output of the user’s intended words as text, thanks in part to reducing the problem to phoneme prediction and to the statistical power of language models. However, these text-first BCI devices do not capture the full expressive range of speech or the richness and immediacy of the human voice.

Researchers at UC Davis have developed a novel computer-based technique to train a speech decoder that instantaneously outputs the intended voice of individuals who cannot speak. The training technique uses a sensor (e.g., video, audio, or intracortical) to obtain biosignal data from an individual attempting to produce a speech sequence. A speech synthesis model then generates a synthetic target speech signal that is representative of the user’s intended speech sequence. The biosignal data and the synthetic target speech signal are then aligned, which allows them to be used together to train causal algorithms (e.g., using deep learning) that accurately map biosignals to the targeted speech signal. This voice synthesis approach works with very low latency and can provide intelligible, naturalistic voice even for a BCI user with anarthria who is unable to provide any ground truth speech data for algorithm training.
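
The training steps described above (obtain biosignals, generate a synthetic target, align the two, fit a causal decoder) can be illustrated with a minimal Python sketch. It is not the inventors' implementation. The listing does not specify the alignment method or the model, so the sketch assumes a simple linear time-stretch for alignment and substitutes ridge regression for the deep-learning decoder. All function names, array shapes, and the random stand-in data are hypothetical.

```python
import numpy as np

def align_target_to_neural(target_feats, n_neural_bins):
    """Assumed alignment: linearly time-stretch target speech features
    (n_target_frames x n_features) onto the neural time base."""
    t_src = np.linspace(0.0, 1.0, target_feats.shape[0])
    t_dst = np.linspace(0.0, 1.0, n_neural_bins)
    cols = [np.interp(t_dst, t_src, target_feats[:, f])
            for f in range(target_feats.shape[1])]
    return np.column_stack(cols)

def causal_windows(neural, n_lags):
    """Build decoder inputs from the current bin and the n_lags - 1 bins
    before it. The past is zero-padded, so no future data is ever used."""
    n_bins, n_channels = neural.shape
    padded = np.vstack([np.zeros((n_lags - 1, n_channels)), neural])
    return np.hstack([padded[lag:lag + n_bins] for lag in range(n_lags)])

def fit_ridge(X, Y, lam=1.0):
    """Ridge regression, used here only as a stand-in for a deep network."""
    gram = X.T @ X + lam * np.eye(X.shape[1])
    return np.linalg.solve(gram, X.T @ Y)

# Random stand-in data (hypothetical sizes, not real recordings).
rng = np.random.default_rng(0)
neural = rng.normal(size=(200, 8))        # 200 time bins x 8 channels
target_feats = rng.normal(size=(150, 4))  # 150 frames x 4 speech features

aligned = align_target_to_neural(target_feats, neural.shape[0])
X = causal_windows(neural, n_lags=5)
W = fit_ridge(X, aligned)
predicted = X @ W                         # one speech-feature frame per bin
```

Because each decoder input contains only the current and earlier bins, the fitted mapping can run bin by bin as data arrive, which is the property a low-latency, real-time synthesizer needs.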

Applications

  • Synthesis of continuous speech from neural data or other biosignals
  • Real-time speech facilitation

Features/Benefits

  • Biosignal to synthetic speech conversion within 10 milliseconds 
  • Real-time speech generation for individuals who are incapable of speech (e.g., stroke patients, ALS patients) or who cannot speak intelligibly
  • Speech restoration that allows an individual not only to speak but also to sing or convey tone

Patent Status

Patent Pending

Inventors

  • Brandman, David
  • Stavisky, Sergey
  • Wairagkar, Maitreyee
