ArXiv TLDR

Retrieval-Augmented Reasoning for Chartered Accountancy

arXiv:2605.00257

Jatin Gupta, Akhil Sharma, Saransh Singhania, Ali Imam Abidi

cs.CL · cs.AI · cs.IR

TLDR

CA-ThinkFlow is a parameter-efficient RAG framework that matches large LLMs on complex Indian Chartered Accountancy tasks using a 14B model.

Key contributions

  • Introduces CA-ThinkFlow, a parameter-efficient RAG framework for complex financial tasks.
  • Combines a 14B, 4-bit-quantized model with a layout-aware document extraction system.
  • Achieves performance comparable to GPT-4o and Claude 3.5 Sonnet on the CA-Ben benchmark.
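The "basic RAG" design the paper describes amounts to injecting retrieved passages directly into the prompt and letting the model's built-in Chain-of-Thought do the reasoning. A minimal sketch of that prompt-assembly step (the function name and prompt wording are illustrative assumptions, not taken from the paper):

```python
def build_rag_prompt(question: str, retrieved_chunks: list[str]) -> str:
    """Assemble a basic RAG prompt: retrieved context is injected
    verbatim ahead of the question, and the model's own
    Chain-of-Thought behavior is relied on for step-by-step reasoning."""
    # Number the chunks so the model (and a human reader) can
    # trace which passage supports which part of the answer.
    context = "\n\n".join(
        f"[{i + 1}] {chunk}" for i, chunk in enumerate(retrieved_chunks)
    )
    return (
        "You are assisting with Indian Chartered Accountancy questions.\n"
        "Use only the context below and reason step by step.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )
```

The resulting string would be passed unchanged to the 14B reasoning model; no agentic re-retrieval or reranking loop is implied by the abstract.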

Why it matters

This paper introduces CA-ThinkFlow, a significant step towards making advanced AI accessible for specialized financial tasks like Chartered Accountancy. By achieving high performance with a smaller, efficient model, it addresses resource constraints in real-world applications. This opens doors for broader AI adoption in regulated financial sectors.

Original Abstract

The inception of Large Language Models (LLMs) has catalyzed AI adoption in the finance sector, yet their reliability in complex, jurisdiction-specific tasks like Indian Chartered Accountancy (CA) remains limited. Models struggle with multi-step numerical tasks that also demand advanced knowledge of legal regulations, and scaling them up is infeasible in resource-constrained settings. We present CA-ThinkFlow, a parameter-efficient Retrieval-Augmented Generation (RAG) framework built on a 14B, 4-bit-quantized reasoning model (14B-DeepSeek-R1) and a layout-aware Docling extraction pipeline that preserves document structure. CA-ThinkFlow uses a basic RAG scheme that injects retrieved information directly into the prompt and relies on the model's built-in Chain-of-Thought (CoT) capabilities to build context and produce correct answers. Evaluated on the multi-level CA-Ben benchmark, the system matches large proprietary models, achieving a Scholastic Reliability Coefficient (SRC) of 68.75%, on par with GPT-4o and Claude 3.5 Sonnet. The framework is highly parameter-efficient and robust, but its core reasoning abilities still falter on the complex regulatory texts found in fields such as Taxation.
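The abstract does not specify how retrieval over the Docling-extracted chunks is performed. As a toy stand-in only (not the paper's retriever), a top-k selection by lexical overlap can illustrate where retrieved context comes from before it is injected into the prompt:

```python
def top_k_chunks(query: str, chunks: list[str], k: int = 3) -> list[str]:
    """Score each document chunk by word overlap with the query and
    return the k best matches -- a deliberately simple illustration;
    the actual retriever used by CA-ThinkFlow is not described in
    the abstract and may be embedding-based."""
    q_terms = set(query.lower().split())
    scored = sorted(
        chunks,
        key=lambda c: len(q_terms & set(c.lower().split())),
        reverse=True,
    )
    return scored[:k]
```

In a real deployment the chunks would be the structure-preserving segments produced by the layout-aware extraction step, so that tables and headings in regulatory texts survive into the retrieved context.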
