Controllable Spoken Dialogue Generation: An LLM-Driven Grading System for K-12 Non-Native English Learners

April 24, 2026 · arXiv:2604.22542

Haidong Yuan, Haokun Zhao, Wanshi Xu, Songjun Cao, Qingyu Zhou + 2 more

cs.CL, cs.AI

TLDR

This paper introduces a proficiency-aligned LLM framework, built on a four-tier grading system and trained with the DDPO algorithm, to generate spoken dialogues matched to the level of K-12 non-native English learners.

Key contributions

Introduces a proficiency-aligned LLM framework for K-12 non-native English learners, adaptable to curricula.
Develops a four-tier grading system for precise control over lexical complexity in generated dialogues.
Releases new resources: graded vocabulary lists and a multi-turn dialogue corpus for English learning.
Proposes DDPO, a GRPO-based algorithm, to optimize dialogue quality and diversity for pedagogical value.
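
As a rough illustration of the GRPO family that DDPO builds on, the sketch below computes group-relative advantages: several responses are sampled for the same prompt, and each reward is normalized against the group's mean and standard deviation. This shows only the generic GRPO normalization step, not the paper's DDPO algorithm. The function name, the `eps` constant, and the use of the population standard deviation are assumptions made for this example.

```python
from statistics import mean, pstdev

def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each reward against its sampling group (generic GRPO-style step).

    `rewards` holds one scalar reward per sampled response to the same prompt.
    `eps` is an illustrative constant that avoids division by zero when all
    rewards in the group are identical.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; a sample std is an equally plausible choice
    return [(r - mu) / (sigma + eps) for r in rewards]

# Hypothetical rewards for four sampled dialogues from one prompt.
advantages = group_relative_advantages([0.2, 0.8, 0.5, 0.5])
```

Responses that score above the group average receive positive advantages and are reinforced; those below average are discouraged. How DDPO extends this to multi-turn dialogue and diversity preservation is described in the paper itself.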

Why it matters

This paper addresses the critical challenge of LLMs failing to meet K-12 non-native English learners' pedagogical needs due to proficiency mismatch. It offers a scalable, personalized speaking practice platform, crucial for learners in non-immersive environments.

Original Abstract

Large language models (LLMs) often fail to meet the pedagogical needs of K-12 English learners in non-native contexts due to a proficiency mismatch. To address this widespread challenge, we introduce a proficiency-aligned framework that adapts LLM outputs to learner abilities, using China's national curriculum (CSE) as a representative case. Our framework enables precise control over lexical complexity through a four-tier grading system, supported by a comprehensive suite of new resources: graded vocabulary lists and a multi-turn dialogue corpus. Our core technical contribution is the DDPO (Diversity Driven Policy Optimization) algorithm, a multi-turn GRPO-based approach designed to preserve dialogue diversity while holistically optimizing dialogue quality. This method significantly outperforms conventional approaches, achieving low out-of-vocabulary rates and high diversity while enhancing conversational naturalness and pedagogical value. While grounded in the CSE, our framework is designed for flexibility and can be readily adapted to other educational standards. Our models, data, and code will all be open-sourced, providing a scalable platform for personalized English speaking practice that effectively addresses the unique challenges faced by K-12 learners in non-immersive environments.
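
The out-of-vocabulary (OOV) rate mentioned in the abstract can be pictured with a small sketch: given a graded vocabulary list, count the share of tokens in an utterance that fall outside the words allowed at the learner's tier. The word lists, the cumulative-tier assumption (a higher tier includes all lower-tier words), the regex tokenizer, and the function names below are all hypothetical and are not taken from the paper's released resources.

```python
import re

# Hypothetical graded word lists; the paper's actual lists are not reproduced here.
TIER_VOCAB = {
    1: {"i", "you", "like", "a", "the", "cat", "dog", "is", "my"},
    2: {"because", "friend", "school", "play", "favorite"},
}

def allowed_vocab(tier):
    """Union of all word lists up to and including `tier` (assumed cumulative)."""
    words = set()
    for level, vocab in TIER_VOCAB.items():
        if level <= tier:
            words |= vocab
    return words

def oov_rate(utterance, tier):
    """Fraction of tokens in `utterance` that are not allowed at `tier`."""
    tokens = re.findall(r"[a-z']+", utterance.lower())
    if not tokens:
        return 0.0
    vocab = allowed_vocab(tier)
    oov_count = sum(1 for token in tokens if token not in vocab)
    return oov_count / len(tokens)

# "friend" is a tier-2 word in this toy list, so it counts as OOV at tier 1.
rate_tier1 = oov_rate("I like my friend", tier=1)
rate_tier2 = oov_rate("I like my friend", tier=2)
```

A real evaluation would likely need lemmatization so that inflected forms such as "dogs" or "played" match their base entries; this sketch matches surface forms only.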
