Patent Pending
Accurate automatic pronunciation assessment, particularly the core subtask of phonetic error detection, is significantly hampered by speech variability stemming from accents and dysfluencies, which current models fail to capture effectively. This innovation, developed by UC Berkeley researchers, addresses the problem by disclosing a verbatim phoneme recognition framework designed to transcribe what speakers actually say rather than what they are supposed to say. The framework combines multi-task training with novel phoneme similarity modeling. The present disclosure also includes the development and open-sourcing of VCTK-accent, a simulated dataset containing phonetic errors, and proposes two novel metrics for assessing pronunciation differences. This work establishes a new, more accurate benchmark for phonetic error detection, enabling more precise and effective articulatory feedback for pronunciation training.
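To illustrate the general idea behind phoneme similarity modeling, the sketch below computes a similarity-weighted edit distance between a verbatim phoneme transcript and the canonical (target) sequence, so that articulatorily close substitutions (e.g., "th" mispronounced as "s") are penalized less than distant ones. The phoneme labels, similarity values, and function names here are illustrative assumptions, not the disclosed framework or its actual metrics.

```python
# Toy articulatory-feature similarity table: 1.0 = identical phonemes,
# smaller values = more different. Values here are made up for illustration.
SIMILARITY = {
    ("s", "z"): 0.8,   # differ only in voicing
    ("th", "s"): 0.6,  # a common non-native substitution
}

def sub_cost(a: str, b: str) -> float:
    """Substitution cost discounted by phoneme similarity."""
    if a == b:
        return 0.0
    sim = SIMILARITY.get((a, b)) or SIMILARITY.get((b, a)) or 0.0
    return 1.0 - sim

def weighted_distance(said: list, target: list) -> float:
    """Levenshtein distance with similarity-weighted substitutions."""
    m, n = len(said), len(target)
    dp = [[0.0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        dp[i][0] = float(i)
    for j in range(1, n + 1):
        dp[0][j] = float(j)
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            dp[i][j] = min(
                dp[i - 1][j] + 1.0,  # deletion
                dp[i][j - 1] + 1.0,  # insertion
                dp[i - 1][j - 1] + sub_cost(said[i - 1], target[j - 1]),
            )
    return dp[m][n]

# "think" mispronounced as "sink": the th -> s substitution incurs a small
# cost because the two phonemes are articulatorily close.
print(weighted_distance(["s", "ih", "ng", "k"], ["th", "ih", "ng", "k"]))
```

A scalar distance like this can be thresholded to flag likely phonetic errors, while the per-cell alignment identifies which phoneme was substituted, inserted, or deleted.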
- To power automatic pronunciation assessment (APA) systems for language learning and speech therapy.
- To provide phonetic error-based articulatory feedback to language learners.
- To transcribe speech more accurately by accounting for accents and dysfluencies.
- To serve as the new benchmark for phonetic error detection research and development.
- Achieves verbatim phoneme recognition, capturing what is actually said.
- Effectively handles speech variability from accents and dysfluencies.
- Employs multi-task training and novel phoneme similarity modeling for enhanced accuracy.
- Includes the VCTK-accent simulated dataset and two novel metrics for assessment.
- Establishes a new benchmark for the field of phonetic error detection.