Researchers at the University of California, Davis have developed a computer-based method to synthesize continuous speech from biosignals, including brain activity, in real time.
Neurological conditions such as stroke or amyotrophic lateral sclerosis (ALS) can cause speech disorders that leave an individual unable to communicate properly. One emerging solution for speech-impaired populations is to bypass damaged parts of the nervous system using a brain-computer interface (BCI), which provides a direct communication pathway between the brain and an external device. Such a device can directly decode the individual's intended speech from brain activity that is normally used to control the tongue, jaw, lips, voice box, diaphragm, and other muscles associated with speech. To date, speech BCIs have achieved accurate output of the user’s intended words as text, thanks in part to reducing the problem to phoneme prediction and to the statistical power of language models. However, these text-first BCI devices do not capture the full expressive range of speech or the richness and immediacy of communication with the human voice.
Researchers at UC Davis have developed a novel computer-based technique to train a speech decoder that instantaneously outputs the intended voice of individuals who cannot speak. The training technique uses a sensor (e.g., video, audio, or intracortical) to obtain biosignal data from an individual attempting to produce a speech sequence. A speech synthesis model then generates a synthetic target speech signal that is representative of the user’s intended speech sequence. The biosignal data and the synthetic target speech signal are then aligned, so that they can be used together to train causal algorithms (e.g., using deep learning) that accurately map biosignals to the targeted speech signal. This voice synthesis approach works with very low latency and can provide intelligible, naturalistic voice even for a BCI user with anarthria who is unable to provide any ground truth speech data for algorithm training.
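The align-then-train idea described above can be illustrated with a minimal sketch. It is not the researchers' implementation: the data are random stand-ins, the alignment is a simple linear time-stretch, and a ridge regression over current and past frames takes the place of the deep-learning decoder. The function names (stretch_target, causal_windows, fit_ridge) and all sizes are hypothetical choices made for illustration only.

```python
import numpy as np


def stretch_target(target, n_frames):
    """Linearly resample target features (T_src x D) to n_frames rows.

    Assumption: a uniform time-stretch is a stand-in for whatever
    alignment procedure is actually used.
    """
    src = np.linspace(0.0, 1.0, target.shape[0])
    dst = np.linspace(0.0, 1.0, n_frames)
    cols = [np.interp(dst, src, target[:, d]) for d in range(target.shape[1])]
    return np.stack(cols, axis=1)


def causal_windows(x, n_lags):
    """Stack each frame with its n_lags previous frames.

    Row t of the output depends only on x[t - n_lags : t + 1], so a
    decoder trained on it never sees future biosignal samples.
    The start of the recording is zero-padded.
    """
    n_frames, n_channels = x.shape
    padded = np.vstack([np.zeros((n_lags, n_channels)), x])
    lagged = [padded[n_lags - k : n_lags - k + n_frames] for k in range(n_lags + 1)]
    return np.hstack(lagged)


def fit_ridge(features, targets, alpha=1.0):
    """Closed-form ridge regression: solve (F^T F + alpha I) W = F^T Y."""
    gram = features.T @ features + alpha * np.eye(features.shape[1])
    return np.linalg.solve(gram, features.T @ targets)


# Illustrative data only (not real recordings).
rng = np.random.default_rng(0)
target = rng.standard_normal((80, 4))    # stand-in for synthetic speech features
aligned = stretch_target(target, 200)    # stretched to the biosignal frame count
mixing = rng.standard_normal((4, 16))    # hypothetical feature-to-channel mapping
neural = aligned @ mixing + 0.1 * rng.standard_normal((200, 16))

feats = causal_windows(neural, n_lags=3)  # current frame plus 3 past frames
weights = fit_ridge(feats, aligned)
pred = feats @ weights                    # decoded speech features, frame by frame
```

Because each output frame is computed only from the current and preceding biosignal frames, a decoder of this shape can in principle run frame by frame as data arrive, which is the property that makes low-latency synthesis possible.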
Patent Pending