Budget-Aware Uncertainty for Radiotherapy Segmentation QA Using nnU-Net
Ricardo Coimbra Brioso, Lorenzo Mondo, Damiano Dei, Nicola Lambri, Pietro Mancosu + 2 more
TLDR
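This paper proposes a budget-aware, uncertainty-driven QA framework for radiotherapy segmentation built on nnU-Net. It pairs post-hoc calibration with ensembling so that voxel-wise uncertainty maps flag likely segmentation errors for targeted manual review.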
Key contributions
- Develops a budget-aware QA framework for radiotherapy segmentation using nnU-Net and uncertainty quantification.
- Generates voxel-wise uncertainty maps based on predictive entropy to guide targeted manual review of segmentations.
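- Compares temperature scaling, deep ensembles, checkpoint ensembles, and test-time augmentation on TMLI, and finds that uncertainty-error alignment improves most with calibrated checkpoint-based inference while segmentation accuracy stays stable.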
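The entropy-based uncertainty maps named in the contributions above can be sketched roughly as follows. This is a minimal NumPy illustration, not the authors' implementation: the function names, the `(C, ...)` array layout, and the choice to average softmax outputs across ensemble members before taking the entropy are all assumptions made for the example. In practice the temperature would be fitted on held-out data; here it is simply passed in.

```python
import numpy as np


def softmax(logits, temperature=1.0, axis=0):
    """Temperature-scaled softmax over the class axis (numerically stable)."""
    z = logits / temperature
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def predictive_entropy(probs, axis=0, eps=1e-12):
    """Voxel-wise entropy H = -sum_c p_c * log(p_c) over the class axis."""
    p = np.clip(probs, eps, 1.0)
    return -(p * np.log(p)).sum(axis=axis)


def ensemble_entropy(logits_per_member, temperature=1.0):
    """Entropy of the mean softmax across members (hypothetical helper).

    logits_per_member: sequence of arrays, each shaped (C, ...) with the
    class axis first. Members could be checkpoints, independently trained
    models, or augmented views; that mapping is an assumption here.
    """
    probs = np.stack([softmax(l, temperature, axis=0) for l in logits_per_member])
    mean_probs = probs.mean(axis=0)
    return predictive_entropy(mean_probs, axis=0)
```

A temperature above 1 flattens the softmax and so raises the entropy of a confident prediction, which is why calibration changes the uncertainty map even when the predicted labels do not change.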
Why it matters
Accurate radiotherapy segmentation is critical but time-consuming, and auto-segmentation can only be deployed safely if reviewers know where the model may be wrong. This framework aims to provide those cues, so that a limited review budget is spent on the voxels most likely to need editing. If the uncertainty maps hold up in practice, that could make manual QA both safer and less laborious.
Original Abstract
Accurate delineation of the Clinical Target Volume (CTV) is essential for radiotherapy planning, yet remains time-consuming and difficult to assess, especially for complex treatments such as Total Marrow and Lymph Node Irradiation (TMLI). While deep learning-based auto-segmentation can reduce workload, safe clinical deployment requires reliable cues indicating where models may be wrong. In this work, we propose a budget-aware uncertainty-driven quality assurance (QA) framework built on nnU-Net, combining uncertainty quantification and post-hoc calibration to produce voxel-wise uncertainty maps (based on predictive entropy) that can guide targeted manual review. We compare temperature scaling (TS), deep ensembles (DE), checkpoint ensembles (CE), and test-time augmentation (TTA), evaluated both individually and in combination on TMLI as a representative use case. Reliability is assessed through ROI-masked calibration metrics and uncertainty-error alignment under realistic revision constraints, summarized as AUC over the top 0-5% most uncertain voxels. Across configurations, segmentation accuracy remains stable, whereas TS substantially improves calibration. Uncertainty-error alignment improves most with calibrated checkpoint-based inference, leading to uncertainty maps that highlight more consistently regions requiring manual edits. Overall, integrating calibration with efficient ensembling seems a promising strategy to implement a budget-aware QA workflow for radiotherapy segmentation.
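The abstract summarizes uncertainty-error alignment as an AUC over the top 0-5% most uncertain voxels but does not spell out the computation here. The sketch below shows one plausible reading, offered only as an assumption: rank the voxels inside an ROI by uncertainty, measure what fraction of all erroneous voxels falls within each review budget, and integrate that curve from 0 to 5%. The function names, the normalization by the maximum budget, and the number of budget steps are hypothetical choices for illustration.

```python
import numpy as np


def error_capture_curve(uncertainty, error_mask, roi_mask, budgets):
    """Fraction of ROI errors found among the top-b most uncertain voxels."""
    u = uncertainty[roi_mask]
    e = error_mask[roi_mask].astype(bool)
    order = np.argsort(-u, kind="stable")  # most uncertain first
    cum_errors = np.cumsum(e[order])
    total = e.sum()
    n = u.size
    captured = []
    for b in budgets:
        k = int(round(b * n))  # number of voxels a reviewer may inspect
        if k == 0 or total == 0:
            captured.append(0.0)
        else:
            captured.append(cum_errors[k - 1] / total)
    return np.array(captured)


def budget_auc(uncertainty, error_mask, roi_mask, max_budget=0.05, steps=51):
    """Trapezoidal area under the capture curve on [0, max_budget].

    Dividing by max_budget (an assumed normalization) maps the score to
    [0, 1], where 1 means every error is captured at any nonzero budget.
    """
    budgets = np.linspace(0.0, max_budget, steps)
    curve = error_capture_curve(uncertainty, error_mask, roi_mask, budgets)
    area = np.sum((curve[1:] + curve[:-1]) * np.diff(budgets)) / 2.0
    return area / max_budget
```

Under this reading, a map whose highest-entropy voxels coincide with the segmentation errors scores close to 1, while a map that ranks the errors last scores 0. Restricting the integral to a small budget reflects the constraint that a reviewer can only inspect a few percent of the volume.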