Historically, speech therapy apps have relied primarily on online cloud-based speech recognition systems like those used in digital assistants (Cortana, Siri, Google Assistant), which are designed to best guess speech rather than critically evaluate articulation errors. For children with cleft palate specifically, affecting 1 in 700 babies globally, speech therapy is essential follow-up care after reconstructive surgery. Approximately 25% of children with clefts use compensatory articulation errors, and when these patterns become habituated during ages 3-5, they become particularly resistant to change in therapy. Traditional approaches to mobile speech therapy apps have included storybook-style narratives that proved expensive with low replayability and engagement, as well as fast-paced arcade-style games that failed to maintain user interest. Common speech therapy applications require a facilitator to evaluate speech performance and typically depend on continuous internet connectivity, creating barriers for users in areas with poor network coverage or those concerned about data privacy and roaming costs. The shift toward gamified therapy solutions showed that game elements can serve as powerful motivators for otherwise tedious activities. Speech recognition systems face inherent limitations in accuracy compared to cloud-based solutions and require substantial processing power and memory that can impact device performance and battery life, particularly on older mobile devices. Automatic speech recognition (ASR) models struggle significantly with children's speech due to non-fluent pronunciation and variability in speech patterns, with phoneme error rates reaching almost 12%, and consonant recognition errors affecting the reliability of speech disorder detection. The challenge becomes even more pronounced for populations with speech impairments, as conventional ASR systems are optimized for typical adult speech rather than atypical articulation patterns of cleft palate speech or developmental disabilities. Moreover, maintaining user engagement over extended therapy periods is hard, and many apps fail to provide sufficient motivation for daily practice, which is essential for speech improvement.