From Standard English to Singlish: A Retrieval-Augmented Approach for Code-Switched Creole Generation in Large Language Models
Foong Ming Lai, Yujin Tan, Han Meng, Yi-Chieh Lee
TLDR
A RAG framework generates code-switched Singlish by externalizing dialectal knowledge into a lexicon, matching zero-shot prompting on perceived naturalness while making minimal edits and better preserving meaning.
Key contributions
- Proposes a RAG framework for controlled code-switching in LLMs without fine-tuning.
- Externalizes dialectal knowledge into a lexicon, enabling sparse lexical substitution for creole generation.
- Human evaluation with 164 Singaporean participants finds RAG-generated Singlish as natural and appropriate as zero-shot output.
- RAG makes minimal edits (median 1 token edit vs. 23 for zero-shot) and preserves meaning better (mean cosine similarity 0.978 vs. 0.926).
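To make "sparse lexical substitution" concrete, here is a minimal sketch of the retrieve-then-substitute idea. It is not the authors' implementation: the summary does not describe the lexicon format, the retriever, or how the LLM is prompted, so this sketch swaps in a deterministic whole-word lookup where the paper uses an LLM guided by retrieved candidates. The lexicon entries and the names `LEXICON`, `retrieve_candidates`, `substitute`, and `max_edits` are all hypothetical.

```python
import re

# Hypothetical toy lexicon (Standard English word -> Singlish expression).
# The paper's curated lexicon is not reproduced here; these entries are
# illustrative only.
LEXICON = {
    "embarrassed": "paiseh",
    "reserve": "chope",
    "nosy": "kaypoh",
    "eat": "makan",
}


def retrieve_candidates(sentence, lexicon):
    """Return sorted (start, end, replacement) spans for whole-word lexicon hits."""
    candidates = []
    for source, target in lexicon.items():
        pattern = r"\b" + re.escape(source) + r"\b"
        for match in re.finditer(pattern, sentence, flags=re.IGNORECASE):
            candidates.append((match.start(), match.end(), target))
    return sorted(candidates)


def substitute(sentence, lexicon, max_edits=1):
    """Apply at most `max_edits` substitutions, leaving all other text untouched.

    Assumes lexicon keys are single words, so retrieved spans never overlap.
    """
    spans = retrieve_candidates(sentence, lexicon)[:max_edits]
    # Replace from right to left so earlier character offsets stay valid.
    for start, end, target in reversed(spans):
        sentence = sentence[:start] + target + sentence[end:]
    return sentence


print(substitute("I am too embarrassed to ask him.", LEXICON))
print(substitute("Can you reserve a seat before we eat?", LEXICON, max_edits=2))
```

Because every change traces back to a lexicon entry, the output can be audited entry by entry, and updating the variety means editing the lexicon rather than retraining a model.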
Why it matters
This paper introduces a RAG framework for generating code-switched creoles like Singlish, addressing limited parallel data and rapid lexical evolution. It gives LLMs a controlled, auditable method that preserves meaning, which matters for contact varieties whose vocabulary changes quickly.
Original Abstract
Code-switching in contact varieties like Singaporean English (Singlish) challenges natural language generation due to limited parallel data and rapid lexical evolution. We propose a retrieval-augmented generation (RAG) framework that externalizes dialectal knowledge into a curated lexicon, enabling controlled lexical code-switching without fine-tuning. Our approach retrieves candidate Singlish expressions and guides generation through sparse lexical substitution. Human evaluation with 164 Singaporean participants found RAG and zero-shot prompting equally natural and appropriate. Automatic analyses reveal different transformation regimes: zero-shot prompting induces extensive paraphrasing (median 23 token edits), whereas RAG performs minimal substitutions (median 1 edit) with higher semantic preservation (mean cosine similarity 0.978 vs. 0.926). Our results demonstrate that externalizing code-switching into lexical resources enables control and auditability without sacrificing perceived quality, offering practical advantages for rapidly evolving contact varieties.
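The two automatic measures named in the abstract, token edits and cosine similarity, can be sketched as follows. This is an assumption-laden illustration, not the paper's evaluation code: the abstract does not state the tokenizer, the exact edit metric, or the embedding model, so the sketch uses whitespace tokens with Levenshtein distance and computes cosine similarity over plain vectors that stand in for sentence embeddings. The function names are hypothetical.

```python
import math


def token_edit_distance(source, output):
    """Levenshtein distance over whitespace tokens (an assumed tokenization)."""
    a, b = source.split(), output.split()
    prev = list(range(len(b) + 1))
    for i, tok_a in enumerate(a, start=1):
        curr = [i]
        for j, tok_b in enumerate(b, start=1):
            cost = 0 if tok_a == tok_b else 1
            curr.append(min(prev[j] + 1,          # delete tok_a
                            curr[j - 1] + 1,      # insert tok_b
                            prev[j - 1] + cost))  # keep or substitute
        prev = curr
    return prev[-1]


def cosine_similarity(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


# A single-word swap counts as one token edit.
print(token_edit_distance("I am too embarrassed to ask him.",
                          "I am too paiseh to ask him."))

# Placeholder vectors; in practice these would come from a sentence encoder.
print(cosine_similarity([1.0, 2.0, 3.0], [1.0, 2.0, 2.5]))
```

Under this reading, a median of 1 edit corresponds to a single-token swap per sentence, while a median of 23 indicates that most of the sentence was rewritten.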