Less Approximates More: Harmonizing Performance and Confidence Faithfulness via Hybrid Post-Training for High-Stakes Tasks
Haokai Ma, Lee Yan Zhen, Gang Yang, Yunshan Ma, Ee-Chien Chang, et al.
TLDR
HyTuning improves LLM accuracy and confidence faithfulness on high-stakes tasks by adaptively combining reasoning distillation with reinforcement learning from internal feedback.
Key contributions
- Addresses confidence faithfulness in LLMs for high-stakes tasks, where confidently wrong outputs can cause severe real-world harm.
- Introduces Progressive Reasoning Gain (PRG) to measure whether reasoning steps progressively strengthen support for the final answer (a sketch follows this list).
- Proposes HyTuning, a hybrid post-training framework that adaptively reweights Reasoning Distillation (RD) and Reinforcement Learning from Internal Feedback (RLIF).
- Leverages scarce supervised reasoning traces and abundant unlabeled queries for scalability.
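The digest does not spell out how PRG is computed. As a minimal illustration, one plausible reading is that PRG averages the step-to-step increase in the model's support for the final answer across a reasoning trace. The function and names below (`progressive_reasoning_gain`, `step_supports`) are hypothetical, not the paper's definition.

```python
# Hypothetical sketch of a PRG-style score. The paper's exact formula is not
# given in this digest; here "support" is any per-step confidence the model
# assigns to the final answer (e.g., answer probability after each step).

def progressive_reasoning_gain(step_supports: list[float]) -> float:
    """Average step-to-step gain in support for the final answer.

    A positive score means the reasoning trace progressively strengthens
    support for the answer; a negative score means support erodes.
    """
    if len(step_supports) < 2:
        return 0.0  # a single step carries no notion of progression
    gains = [b - a for a, b in zip(step_supports, step_supports[1:])]
    return sum(gains) / len(gains)

# Example: support rises across three reasoning steps vs. eroding support.
print(progressive_reasoning_gain([0.2, 0.5, 0.9]))  # 0.35  -> faithful progression
print(progressive_reasoning_gain([0.8, 0.5, 0.3]))  # -0.25 -> eroding support
```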
Why it matters
Confidently wrong LLM outputs in high-stakes applications can cause significant real-world harm. By improving both accuracy and confidence faithfulness under limited supervision, HyTuning makes LLMs more reliable for sensitive tasks and supports safer, more trustworthy AI deployment.
Original Abstract
Large language models are increasingly deployed in high-stakes tasks, where confident yet incorrect inferences may cause severe real-world harm, bringing the previously overlooked issue of confidence faithfulness back to the forefront. A promising solution is to jointly optimize unsupervised Reinforcement Learning from Internal Feedback (RLIF) with reasoning-trace-guided Reasoning Distillation (RD), yet this combination faces three persistent challenges: scarcity of high-quality training corpora, factually unwarranted overconfidence, and indiscriminate fusion that amplifies erroneous updates. Inspired by how human confidence accumulates from uncertainty to certainty, we propose Progressive Reasoning Gain (PRG) to measure whether reasoning steps progressively strengthen support for the final answer. Furthermore, we introduce HyTuning, a hybrid post-training framework that adaptively reweights RD and RLIF via a PRG-style metric, using scarce supervised reasoning traces as a stable anchor while exploiting abundant unlabeled queries for scalability. Experiments on several domain-specific and general benchmarks demonstrate that HyTuning improves accuracy while achieving confidence faithfulness under limited supervision, supporting a practical "Less Approximates More" effect.
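The abstract describes HyTuning as adaptively reweighting the RD and RLIF objectives via a PRG-style metric, with the supervised traces acting as a stable anchor. Below is a minimal sketch of one way such a reweighted loss could look; the sigmoid weighting, the `rd_loss`/`rlif_loss` names, and the temperature parameter are assumptions for illustration, not the paper's implementation.

```python
# Hypothetical sketch of a PRG-weighted hybrid objective. Assumes rd_loss is a
# supervised distillation loss on labeled reasoning traces and rlif_loss is an
# unsupervised RLIF objective on unlabeled queries; both names are illustrative.
import math

def hybrid_loss(rd_loss: float, rlif_loss: float, prg: float,
                temperature: float = 1.0) -> float:
    """Blend RD and RLIF losses with a PRG-dependent weight.

    Low PRG (weak progressive support) pushes weight toward the supervised
    RD anchor; high PRG lets the unsupervised RLIF signal contribute more.
    """
    # Squash the PRG score into (0, 1) to obtain the RLIF weight.
    w_rlif = 1.0 / (1.0 + math.exp(-prg / temperature))
    return (1.0 - w_rlif) * rd_loss + w_rlif * rlif_loss

# Example: a low-PRG batch leans on the RD anchor, a high-PRG batch on RLIF.
print(hybrid_loss(rd_loss=1.2, rlif_loss=0.8, prg=-2.0))  # mostly RD
print(hybrid_loss(rd_loss=1.2, rlif_loss=0.8, prg=+2.0))  # mostly RLIF
```

This kind of gating matches the abstract's intuition that erroneous updates should not be fused indiscriminately: batches whose reasoning fails to build support fall back toward the supervised anchor rather than amplifying a noisy unsupervised signal.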