ArXiv TLDR

Adapting Large Language Models to a Low-Resource Agglutinative Language: A Comparative Study of LoRA and QLoRA for Bashkir

arXiv:2605.04948

Mullosharaf K. Arabov, Svetlana S. Khaybullina

cs.CL

TLDR

This paper compares LoRA and QLoRA for adapting LLMs to Bashkir, finding that QLoRA on 7B-scale models offers a strong quality-cost trade-off.

Key contributions

  • Compared LoRA and QLoRA for adapting LLMs to Bashkir, a low-resource agglutinative language.
  • QLoRA on 7B-scale models achieved perplexity comparable to full fine-tuning with over 40x fewer trainable parameters.
  • QLoRA-tuned models produced more coherent, monolingual Bashkir output than the lowest-perplexity model, which frequently switched to English.

Why it matters

This research offers practical guidance for adapting LLMs to low-resource languages like Bashkir. It demonstrates that QLoRA provides an effective balance of quality and computational efficiency, and it highlights that perplexity alone is not sufficient for evaluating output coherence.
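
For context, perplexity is simply the exponential of the mean token-level cross-entropy, so a single aggregate number can hide behaviors such as switching into English mid-generation. Below is a minimal sketch of how it is typically computed with Hugging Face transformers; the model name and evaluation text are placeholders, not the paper's actual evaluation setup.

```python
import math
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Hypothetical choice; the paper evaluates DistilGPT2, GPT-2 (base, medium),
# Phi-2, Qwen2.5-7B, DeepSeek-7B, and Mistral-7B.
model_name = "gpt2-medium"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name).eval()

def perplexity(text: str) -> float:
    # Perplexity = exp(mean cross-entropy over the predicted tokens).
    enc = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        out = model(**enc, labels=enc["input_ids"])
    return math.exp(out.loss.item())

# Placeholder string; in the paper this would be held-out Bashkir test text.
print(perplexity("Sample Bashkir evaluation text would go here."))
```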

Original Abstract

This paper presents a comparative study of parameter-efficient fine-tuning (PEFT) methods, including LoRA and QLoRA, applied to the task of adapting large language models to the Bashkir language, a low-resource agglutinative language of the Turkic family. Experimental evaluation is conducted on a Bashkir text corpus of 71k documents (46.9M tokens) using models of various architectures: DistilGPT2, GPT-2 (base, medium), Phi-2, Qwen2.5-7B, DeepSeek-7B, and Mistral-7B. To improve the reliability of results, each configuration was trained with three different random seeds. The lowest perplexity on the test set was obtained for GPT-2 medium with full fine-tuning (3.34). Meanwhile, QLoRA applied to Mistral-7B (3.79) and Phi-2 (3.81) achieved comparable quality with over 40 times fewer trainable parameters. However, we also observed cases of significant quality degradation when using PEFT for certain architectures (e.g., DeepSeek-7B with rank 8, perplexity = 129.55), indicating that the outcome depends critically on the choice of the base model and its tokenizer. Additionally, a qualitative analysis of generated texts based on Bashkir prompts revealed that models with the best perplexity do not necessarily produce the most coherent outputs: QLoRA-tuned models generated monolingual Bashkir continuations, whereas the fully fine-tuned model with the lowest perplexity frequently switched to English. The results suggest that QLoRA on 7B-scale models offers an effective compromise between quality and computational cost for Bashkir. To ensure reproducibility, open data, code, and trained adapters will be released upon acceptance.
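
For readers unfamiliar with the setup, here is a minimal sketch of what a QLoRA configuration along these lines might look like using the Hugging Face peft and bitsandbytes integrations. The base model choice, alpha, dropout, and target modules below are assumptions rather than the paper's reported hyperparameters; only rank 8 is mentioned in the abstract (for DeepSeek-7B).

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

# Hypothetical base model; the paper also covers Qwen2.5-7B, DeepSeek-7B, etc.
base_model = "mistralai/Mistral-7B-v0.1"

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                    # QLoRA: 4-bit quantized base weights
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained(base_model)
model = AutoModelForCausalLM.from_pretrained(base_model, quantization_config=bnb_config)
model = prepare_model_for_kbit_training(model)

lora_config = LoraConfig(
    r=8,                                  # rank 8, as mentioned in the abstract
    lora_alpha=16,                        # assumed
    lora_dropout=0.05,                    # assumed
    target_modules=["q_proj", "v_proj"],  # assumed target modules
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()        # illustrates the "over 40x fewer parameters" effect
```

With the adapters attached, training proceeds on the Bashkir corpus with a standard causal language modeling objective; only the low-rank adapter weights are updated while the quantized base model stays frozen.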
