LLM StructCore: Schema-Guided Reasoning Condensation and Deterministic Compilation
TLDR
LLM StructCore fills clinical CRFs with a two-stage pipeline: schema-guided LLM reasoning followed by deterministic, contract-driven compilation.
Key contributions
- Introduces LLM StructCore, a two-stage pipeline for accurate clinical Case Report Form (CRF) filling.
- Stage 1 uses an LLM for schema-guided reasoning, generating a concise 9-key JSON summary.
- Stage 2 is a deterministic, 0-LLM compiler for normalization, filtering, and 134-item expansion.
- Achieves macro-F1 of 0.6543 (EN) and 0.6905 (IT) on dev80 and 0.63 (EN) on the hidden test200, with no language-specific engineering.
Why it matters
Clinical CRF filling is hard because clinical notes are noisy, the output contract is strict, and false positives are costly. LLM StructCore splits the task into two stages: LLM reasoning, then deterministic compilation. The design targets higher accuracy and reliability with fewer costly false positives.
Original Abstract
Automatically filling Case Report Forms (CRFs) from clinical notes is challenging due to noisy language, strict output contracts, and the high cost of false positives. We describe our CL4Health 2026 submission for Dyspnea CRF filling (134 items) using a contract-driven two-stage design grounded in Schema-Guided Reasoning (SGR). The key task property is extreme sparsity: the majority of fields are unknown, and official scoring penalizes both empty values and unsupported predictions. We shift from a single-step "LLM predicts 134 fields" approach to a decomposition where (i) Stage 1 produces a stable SGR-style JSON summary with exactly 9 domain keys, and (ii) Stage 2 is a fully deterministic, 0-LLM compiler that parses the Stage 1 summary, canonicalizes item names, normalizes predictions to the official controlled vocabulary, applies evidence-gated false-positive filters, and expands the output into the required 134-item format. On the dev80 split, the best teacher configuration achieves macro-F1 0.6543 (EN) and 0.6905 (IT); on the hidden test200, the submitted English variant scores 0.63 on Codabench. The pipeline is language-agnostic: Italian results match or exceed English with no language-specific engineering.
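The Stage 2 steps named in the abstract (parse, canonicalize, normalize, evidence-gate, expand) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the domain keys, item names, aliases, vocabulary, the "unknown" default, and the substring-based evidence check are all hypothetical, and the toy schema uses 3 keys and 4 items instead of the real 9 keys and 134 items.

```python
import json

# All names below are hypothetical placeholders, not taken from the paper.
DOMAIN_KEYS = {"symptoms", "vitals", "history"}  # the real summary has 9 keys
CRF_ITEMS = ["dyspnea", "fever", "heart_rate_abnormal", "smoking_history"]  # real CRF: 134
ALIASES = {"shortness of breath": "dyspnea", "smoker": "smoking_history"}
VOCAB = {"yes": "yes", "y": "yes", "present": "yes", "no": "no", "absent": "no"}
UNKNOWN = "unknown"  # assumed default for items with no supported prediction


def canonicalize(name):
    """Map a free-form item name to a canonical CRF item name."""
    key = name.strip().lower()
    return ALIASES.get(key, key.replace(" ", "_"))


def compile_crf(stage1_json, note_text):
    """Deterministically turn a Stage 1 summary into the full item list."""
    summary = json.loads(stage1_json)
    if set(summary) != DOMAIN_KEYS:
        raise ValueError("Stage 1 summary must contain exactly the expected keys")

    note = note_text.lower()
    filled = {}
    for findings in summary.values():
        for finding in findings:
            item = canonicalize(finding["item"])
            value = VOCAB.get(str(finding["value"]).strip().lower())
            evidence = str(finding.get("evidence", "")).strip().lower()
            # Drop items outside the CRF and values outside the vocabulary.
            if item not in CRF_ITEMS or value is None:
                continue
            # Evidence gate (assumed form): the quoted span must occur in the note.
            if not evidence or evidence not in note:
                continue
            filled[item] = value

    # Expand to every CRF item, in a fixed order, defaulting to UNKNOWN.
    return [{"item": i, "value": filled.get(i, UNKNOWN)} for i in CRF_ITEMS]
```

Because no step calls a model, the same Stage 1 summary and note always produce the same output, and a prediction without a supporting span in the note falls back to the default instead of becoming a false positive.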