ArXiv TLDR

Temporal Data Requirement for Predicting Unplanned Hospital Readmissions

2605.00738

Ramin Mohammadi, Vahab Vahdat, Sarthak Jain, Amir T. Namin, Ramya Palacholla + 1 more

cs.LG

TLDR

This study finds that the optimal historical time window for predicting hospital readmissions differs markedly between structured EHR data and unstructured clinical notes.

Key contributions

  • Investigates EHR observation windows (from the day of surgery to 3 years prior) for predicting 30-day readmission after hip and knee arthroplasty.
  • Finds that unstructured notes perform best with 3 to 6 months of pre-surgery data, while structured data keeps improving up to 12 months and then plateaus.
  • Challenges the assumption that more historical data always improves predictive model accuracy.

Why it matters

This paper offers modality-specific guidance for choosing historical data windows in readmission prediction models. By showing that more data is not always better data, especially for clinical notes, it suggests that models can rely on shorter histories without giving up accuracy, which may reduce data collection and processing costs in healthcare settings.

Original Abstract

With the proliferation of Electronic Health Records (EHRs), a critical challenge in building predictive models is determining the optimal historical data time window to maximize accuracy. This study investigates the impact of various observation windows, ranging from the day of surgery to three years prior, on predicting 30-day readmission following hip and knee arthroplasties. The dataset encompasses both structured encounter records (over 4 million) and unstructured clinical notes (80,000) from 7,174 patients. To extract meaning from the clinical notes, we employed a suite of non-neural (BOW, count BOW, TF-IDF, LDA) and neural encoders (BERT, 1D CNN, BiLSTM, Average). We subsequently evaluated models utilizing clinical notes alone, structured data alone, and a combination of both modalities. Our results demonstrate that the optimal time window for unstructured clinical notes is significantly shorter than for structured data: maximum predictive performance was achieved using notes from just three to six months prior to surgery. In contrast, performance using structured data improved as the time window lengthened, but strictly plateaued after twelve months. These modality-specific temporal patterns remained consistent regardless of model complexity or encoder type. Ultimately, these findings challenge the general assumption that more historical data inherently yields better machine learning predictions, establishing targeted time-window guidelines for optimizing readmission prediction models.
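
To make the idea of an observation window concrete, here is a minimal sketch of how records might be restricted to a lookback period before surgery. It is an illustration only, not the authors' pipeline: the function name `records_in_window`, the record layout, the sample dates, and the day counts used for each window (for example, 90 days for "3 months") are all assumptions.

```python
from datetime import date, timedelta

def records_in_window(records, surgery_date, window_days):
    """Keep records dated within `window_days` before surgery (surgery day included).

    Hypothetical helper: the abstract does not describe how windows were implemented.
    """
    start = surgery_date - timedelta(days=window_days)
    return [r for r in records if start <= r["date"] <= surgery_date]

# Assumed day counts for a few windows in the range the abstract describes.
WINDOWS_DAYS = {
    "day of surgery": 0,
    "3 months": 90,
    "6 months": 180,
    "12 months": 365,
    "3 years": 1095,
}

# Made-up records for a single patient, mixing notes and structured encounters.
surgery = date(2020, 6, 1)
records = [
    {"date": date(2020, 6, 1), "kind": "note"},
    {"date": date(2020, 4, 15), "kind": "note"},
    {"date": date(2019, 9, 1), "kind": "encounter"},
    {"date": date(2018, 1, 10), "kind": "encounter"},
]

# Count how many records each window would feed into a model.
counts = {
    label: len(records_in_window(records, surgery, days))
    for label, days in WINDOWS_DAYS.items()
}
print(counts)
```

In a study like this one, each filtered subset would be encoded and used to train a separate model, so that performance can be compared across windows and modalities.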
