Children's English Reading Story Generation via Supervised Fine-Tuning of Compact LLMs with Controllable Difficulty and Safety
Qian Shen, Fanghua Cao, Min Yao, Shlok Gilda, Bonnie J. Dorr, et al.
TLDR
Fine-tuning compact 8B LLMs with expert curricula generates children's English stories with controllable difficulty and safety, outperforming larger models.
Key contributions
- Fine-tuned compact 8B LLMs to generate English reading stories for children.
- Leveraged an expert-designed curriculum and stories from GPT-4o/Llama 3.3 70B for SFT.
- Achieved controllable difficulty and high safety in the generated children's stories.
- Outperformed zero-shot GPT-4o and Llama 3.3 70B on difficulty-related metrics.
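The contributions above hinge on pairing entries from an expert-designed curriculum with stories generated by larger teacher models (GPT-4o and Llama 3.3 70B) to form SFT training data. A minimal sketch of how such prompt/completion pairs might be assembled follows; the field names and prompt template are assumptions for illustration, not the paper's actual format:

```python
# Sketch: building SFT pairs that condition story generation on
# curriculum difficulty. Field names and the prompt template are
# hypothetical -- the paper does not publish its exact data format.

def build_sft_pair(entry):
    """Turn one curriculum entry plus a teacher-model story into a
    prompt/completion pair for supervised fine-tuning of an 8B model."""
    prompt = (
        "Write an English reading story for children.\n"
        f"Reading level: {entry['level']}\n"
        f"Target vocabulary: {', '.join(entry['vocabulary'])}\n"
        f"Topic: {entry['topic']}\n"
    )
    # The completion is a story produced by a larger teacher model
    # (GPT-4o or Llama 3.3 70B in the paper's setup).
    return {"prompt": prompt, "completion": entry["teacher_story"]}

example = {
    "level": "Grade 2",
    "vocabulary": ["garden", "seed", "grow"],
    "topic": "planting a flower",
    "teacher_story": "Mia put a seed in the garden. Every day she ...",
}
pair = build_sft_pair(example)
```

Encoding the target reading level directly in the prompt is what makes difficulty controllable at inference time: the fine-tuned model learns to associate the level tag with the vocabulary and sentence complexity of the paired stories.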
Why it matters
This paper addresses the challenge of generating age-appropriate, safe children's stories with LLMs, which often produce content that is too difficult and are costly to run at scale. By fine-tuning compact 8B models instead, the work offers an affordable, controllable alternative for educators and parents, enabling wider adoption in classrooms and at home.
Original Abstract
Large Language Models (LLMs) are widely applied in educational practices, such as for generating children's stories. However, the generated stories are often too difficult for children to read, and the operational cost of LLMs hinders their widespread adoption in educational settings. We used an existing expert-designed children's reading curriculum and its corresponding generated stories from GPT-4o and Llama 3.3 70B to design different experiments for fine-tuning three 8B-parameter LLMs, which then generated new English reading stories that were subjected to quantitative and qualitative evaluation. Our method prioritizes controllability over scale, enabling educators to target reading levels and error patterns with a compact, affordable model. Our evaluation results show that with appropriate fine-tuning designs, children's English reading stories generated by 8B LLMs perform better on difficulty-related metrics than those from zero-shot GPT-4o and Llama 3.3 70B, with almost no discernible safety issues. Such fine-tuned LLMs could be more broadly used by teachers, parents, and children in classrooms and at home to generate engaging English reading stories with children's interests, controllable difficulty and safety.
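The "difficulty-related metrics" the abstract evaluates against are typically readability formulas. As one concrete example (the paper's exact metric set is not listed here), the Flesch-Kincaid grade level can be computed from sentence length and syllable density; the syllable counter below is a common heuristic, not an exact linguistic rule:

```python
import re

def count_syllables(word):
    """Rough syllable count: runs of vowels, minus a silent trailing 'e'."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and count > 1 and not word.endswith(("le", "ee")):
        count -= 1
    return max(count, 1)

def fk_grade(text):
    """Flesch-Kincaid grade level:
    0.39 * (words/sentences) + 11.8 * (syllables/words) - 15.59
    Lower scores mean easier text, roughly tracking US school grades."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    return (0.39 * len(words) / len(sentences)
            + 11.8 * syllables / len(words)
            - 15.59)
```

A story made of short, monosyllabic sentences ("The cat sat. The dog ran.") scores far lower than dense academic prose, which is exactly the axis along which the fine-tuned 8B models are reported to beat zero-shot GPT-4o and Llama 3.3 70B.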