Methods For Dysfluent Speech Transcription And Detection

Tech ID: 33377 / UC Case 2024-062-0

Patent Status

Country                    Type                    Number        Dated        Case
United States Of America   Published Application   20250246187   07/31/2025   2024-062

Brief Description

Researchers at UC Berkeley have developed Hierarchical Unconstrained Disfluency Modeling (H-UDM), a systematic AI framework that addresses a long-standing bottleneck in speech processing. Traditional speech-to-text systems struggle with disfluencies such as repetitions, fillers, and stutters; H-UDM instead treats disfluent speech as a multi-layered, hierarchical problem. By integrating transcription and detection into a single unified model, H-UDM eliminates the historical dependency on labor-intensive manual annotation of speech errors. The hierarchical design helps the model capture the structural context of disfluent speech, improving accuracy for speech therapy applications and language learning tools, where precise feedback on fluency is critical.

Advantages

  • Reduced Data Requirements: The H-UDM approach significantly lowers the need for expensive, human-labeled datasets by using its hierarchical structure to infer disfluencies.

  • Dual-Task Efficiency: Unlike previous models that only transcribe or only detect, this framework performs both tasks simultaneously, ensuring the transcription is contextually aware of the detected errors.

  • Systematic AI Framework: Provides a formalized, scalable method for modeling disfluent speech where previous solutions were fragmented or ineffective.

  • High Reliability: Experimental results demonstrate that the hierarchical approach is more robust across different types of disfluency than standard linear speech models.

  • Seamless Integration: Designed to be compatible with existing automatic speech recognition (ASR) pipelines, making it a viable "drop-in" enhancement for current software.


Suggested Uses

  • Digital Speech Therapy: Powering remote health platforms that provide real-time feedback and progress tracking for individuals with speech disorders or stutters.

  • Language Learning (ESL): Enhancing educational software by identifying specific fluency gaps and disfluency patterns in non-native speakers to guide personalized practice.

  • Inclusive Voice User Interfaces (VUI): Improving the reliability of smart assistants (such as Alexa or Siri) for users with diverse speech patterns or neurological conditions.

  • Automated Transcription Services: Increasing the readability of raw interview transcripts or court proceedings by accurately detecting and tagging "um," "ah," and repeated phrases.

  • Clinical Diagnostics: Assisting medical professionals in the early detection of cognitive decline or neurological disorders through the automated analysis of speech fluency.
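
To make the transcript-tagging use case above concrete, the following is a minimal, rule-based sketch of what "detecting and tagging" fillers and repeated words might produce. It is not the H-UDM method, which is a learned hierarchical model; the filler list, the function name `tag_disfluencies`, and the label names are all assumptions made for illustration only.

```python
import re

# Hypothetical filler inventory, chosen for illustration only.
FILLERS = {"um", "uh", "ah", "er", "hmm"}


def tag_disfluencies(transcript):
    """Return (token, label) pairs, where label is 'filler', 'repetition', or 'fluent'.

    This is a simple rule-based stand-in, not the learned H-UDM model.
    """
    # Lowercase the text and keep only alphabetic tokens (with apostrophes).
    tokens = re.findall(r"[a-z']+", transcript.lower())
    tagged = []
    previous = None  # last non-filler token seen
    for token in tokens:
        if token in FILLERS:
            label = "filler"
        elif token == previous:
            label = "repetition"
        else:
            label = "fluent"
        tagged.append((token, label))
        # Fillers do not reset the comparison, so "I um I" still counts as a repeat.
        if label != "filler":
            previous = token
    return tagged


if __name__ == "__main__":
    for token, label in tag_disfluencies("I um I want want to go"):
        print(f"{token}\t{label}")
```

A rule-based tagger like this only catches exact word repeats and a fixed filler list; handling sound-level events such as prolongations or blocks without hand-written rules is the kind of gap a learned model like the one described here is meant to address.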


Inventors

  • Anumanchipalli, GopalaKrishna
