Using Automatic Speech Recognition to Measure the Intelligibility of Speech Synthesized from Brain Signals

Tech ID: 33459 / UC Case 2023-555-0

Abstract

Researchers at the University of California, Davis have developed a computer-based method to quantify the intelligibility of speech synthesized by a brain-computer interface or other speech prosthesis.

Full Description

Neurological injuries and diseases such as stroke or amyotrophic lateral sclerosis (ALS) can cause speech disorders that leave an individual unable to communicate effectively. One emerging solution for speech-impaired populations is to bypass damaged parts of the nervous system using a brain-computer interface (BCI), which provides a direct communication pathway between the brain and an external device. Such a device can decode the individual's intended speech directly from the brain activity normally used to move the tongue, jaw, lips, voice box, and diaphragm. Closely related types of speech prostheses may instead generate speech from non-invasively measured biosignals, for example muscle activity from movement retained after laryngectomy, or for silent speech applications. A current limitation of these technologies is that there are no established metrics to quantify how well the speech prosthesis is functioning (e.g., whether the synthesized speech is intelligible to an intended listener). This problem is particularly acute when there is no ground-truth target voice to compare the synthesized voice against, only the text of the intended speech.

Researchers at UC Davis have developed a deep-learning-based automatic speech recognition (ASR) technique to assess how intelligible speech is. The resulting ‘Artificial Intelligence (AI) Listener’ generates a numeric score that quantifies the quality and understandability of speech produced by a BCI, by another speech-producing technology, or naturally (including potentially dysarthric speech). The synthesized speech signals of interest (e.g., originating from the brain via a BCI) are input into an ASR model for analysis. The ASR generates an intelligibility metric by comparing the intended speech sequence with the synthesized output speech. The system can produce a variety of intelligibility assessment metrics, ranging from holistic phrase-level word error rates down to phoneme error rates and lower-level acoustic feature scores.
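To make the word and phoneme error rate metrics concrete, the following is a minimal sketch of how such scores are conventionally computed: the edit distance between the intended token sequence and the recognized token sequence, divided by the length of the intended sequence. It is not the inventors' implementation. The ASR step is not shown; the transcript is assumed to be already available as text, and the function names, sample sentence, and phoneme labels are hypothetical and chosen only for illustration.

```python
from typing import Sequence


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance between two token sequences."""
    prev = list(range(len(hyp) + 1))  # distances from an empty reference
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance from ref[:i] to an empty hypothesis
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # reference token missing from hypothesis
                curr[j - 1] + 1,     # extra token inserted in hypothesis
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1]


def error_rate(ref_tokens: Sequence[str], hyp_tokens: Sequence[str]) -> float:
    """Edit distance normalized by reference length (can exceed 1.0)."""
    if not ref_tokens:
        raise ValueError("reference must contain at least one token")
    return edit_distance(ref_tokens, hyp_tokens) / len(ref_tokens)


def word_error_rate(intended_text: str, asr_transcript: str) -> float:
    """Word error rate using simple lowercase whitespace tokenization."""
    return error_rate(intended_text.lower().split(),
                      asr_transcript.lower().split())


# Hypothetical example: the ASR model missed one word of a 7-word phrase.
intended = "I would like a glass of water"
recognized = "I would like glass of water"  # assumed ASR output
wer = word_error_rate(intended, recognized)

# The same function gives a phoneme error rate when the tokens are phonemes
# (illustrative labels; one substitution out of four phonemes).
per = error_rate(["W", "AO", "T", "ER"], ["W", "AA", "T", "ER"])

print(f"WER = {wer:.3f}, PER = {per:.3f}")
```

A lower score indicates more intelligible speech under this metric, and a score of 0 means the ASR output matched the intended text exactly. Because only the intended text is needed as a reference, a score of this kind can be computed even when no ground-truth target voice exists.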

Applications

Using automated speech recognition to assess speech intelligibility.
Speech restoration in hearing-impaired individuals or those affected by neurologic injuries or diseases resulting in speech disorders.
Quantifying and comparing the performance of different voice synthesis technologies and algorithms.
Providing a cost function for training voice synthesis algorithms.
Setting objective targets for device performance, for example for clinical trial efficacy or reimbursement criteria.

Features/Benefits

Generation of a numeric score to gauge the quality of BCI-synthesized speech, with benchmarks to compare this score to human listener assessments.
Applications in evaluating the quality of non-BCI-synthesized speech (e.g., for consumer or non-invasive medical applications).
Measurement of speech improvement in an individual over time.

Patent Status

Country	Type	Number	Dated	Case
Patent Cooperation Treaty	Published Application	WO 2024/186818	09/12/2024	2023-555

Additional Patent Pending

Inventors

Brandman, David
Miller, Lee M.
Stavisky, Sergey
Varshney, Suvi

Other Information

Keywords

speech, automated speech recognition, brain-computer interface, artificial intelligence, deep-learning, computerized speech, neurodegenerative disease, neurotechnology, speech intelligibility, neuroprosthesis

Categorized As

Medical
- Devices
Engineering
- Other