Interpretable Machine Learning for Antepartum Prediction of Pregnancy-Associated Thrombotic Microangiopathy Using Routine Longitudinal Laboratory Data
Chuanchuan Sun, Zhen Yu, Qin Fan, Qingchao Chen, Feng Yu
TLDR
Gradient boosting on routine longitudinal laboratory data predicts pregnancy-associated thrombotic microangiopathy (P-TMA) antepartum, reaching an AUROC of 0.872 on a held-out test set.
Key contributions
- An interpretable ML model was developed to predict pregnancy-associated thrombotic microangiopathy (P-TMA).
- The model used 146 longitudinal laboratory predictors from 300 pregnancies (142 P-TMA cases and 158 controls).
- Gradient boosting achieved an AUROC of 0.872 and AUPRC of 0.883 on a held-out test set.
- Cystatin C at week 6 was identified as a promising early monitoring indicator for P-TMA risk.
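As a reminder of what the two discrimination metrics in the list above measure, here is a minimal sketch in plain Python. The function names and the toy labels and scores are illustrative assumptions, not the study's code or data. The average-precision function is one common estimator of AUPRC and assumes no tied scores.

```python
def auroc(labels, scores):
    """Probability that a random case outscores a random control (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one case and one control")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(labels, scores):
    """Mean of the precision values at the rank of each true case (AUPRC estimate)."""
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    hits, total = 0, 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / rank  # precision among the top `rank` predictions
    if hits == 0:
        raise ValueError("need at least one case")
    return total / hits


# Hypothetical toy data: 1 = P-TMA case, 0 = control.
toy_labels = [1, 1, 0, 1, 0, 0]
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]

print(round(auroc(toy_labels, toy_scores), 3))
print(round(average_precision(toy_labels, toy_scores), 3))
```

AUROC is insensitive to class prevalence, while AUPRC depends on it, so the reported AUPRC should be read against the roughly balanced case-control mix in this cohort.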
Why it matters
Early prediction of P-TMA matters because the condition is life-threatening and its early laboratory abnormalities overlap with benign obstetric and renal conditions. In this retrospective study, machine learning picked up subtle, time-dependent risk signatures in routine laboratory data. If the results hold up in further validation, this approach could support earlier intervention.
Original Abstract
Background: Pregnancy-associated thrombotic microangiopathy (P-TMA) is rare but life-threatening. Early risk prediction before overt clinical presentation remains challenging, as the associated laboratory abnormalities are subtle, multidimensional, and frequently masked by common physiological changes such as gestational thrombocytopenia and pregnancy-related proteinuria, thus overlapping heavily with benign obstetric and renal conditions. This complexity is poorly captured by univariate or rule-based approaches; however, it is addressable by machine learning, which can extract latent, time-dependent risk signatures from longitudinal clinical tests. Methods: This retrospective study included 300 pregnancies comprising 142 P-TMA cases and 158 controls. After exclusion of identifiers and non-informative variables, 146 longitudinal laboratory predictors were retained. Participants were divided into a training cohort (80%) and a held-out test cohort (20%) using stratified sampling. Five algorithms were evaluated: logistic regression, support vector machine with radial basis function kernel, random forest, extra trees, and gradient boosting. The final model was selected by mean cross-validated AUROC, refitted on the full training cohort, and evaluated once in the held-out test cohort. Interpretability analyses examined global feature importance and distributional patterns of leading predictors. Results: Gradient boosting was prespecified by cross-validation in the training cohort. The model achieved an AUROC of 0.872 (95% CI: 0.769-0.952) and an AUPRC of 0.883 (95% CI: 0.780-0.959) in a held-out test cohort, with sensitivity of 0.750 and specificity of 0.812. Conclusions: Longitudinal clinical laboratory tests obtained during routine care contained informative and clinically plausible signals for P-TMA risk. Notably, cystatin C at week 6 showed promise as an early monitoring indicator.
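The stratified 80/20 split and the confidence interval reported in the abstract can be illustrated with a short stdlib-only sketch. Several things here are assumptions: the abstract does not say how the 95% CIs were computed, so the percentile bootstrap is a common choice rather than the authors' confirmed method; the helper names are hypothetical; and the scores are synthetic stand-ins for model output, not study data.

```python
import random


def stratified_split(labels, test_fraction, rng):
    """Hold out the same fraction of each class, as in a stratified split."""
    test_idx = []
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        test_idx.extend(idx[: round(test_fraction * len(idx))])
    held_out = set(test_idx)
    train_idx = [i for i in range(len(labels)) if i not in held_out]
    return train_idx, sorted(held_out)


def auroc(labels, scores):
    """Pairwise (Mann-Whitney) AUROC; ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def bootstrap_ci(labels, scores, rng, n_boot=1000, alpha=0.05):
    """Percentile bootstrap CI for AUROC (assumed method, not from the paper)."""
    n = len(labels)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if 0 < sum(ys) < n:  # skip resamples that contain only one class
            stats.append(auroc(ys, [scores[i] for i in idx]))
    stats.sort()
    return stats[int(alpha / 2 * n_boot)], stats[int((1 - alpha / 2) * n_boot) - 1]


rng = random.Random(0)
labels = [1] * 142 + [0] * 158  # class counts from the abstract
train_idx, test_idx = stratified_split(labels, 0.2, rng)

# Synthetic scores standing in for model predictions on the test cohort.
test_labels = [labels[i] for i in test_idx]
test_scores = [rng.gauss(0.65 if y else 0.35, 0.2) for y in test_labels]

lo, hi = bootstrap_ci(test_labels, test_scores, rng)
print(len(train_idx), len(test_idx), sum(test_labels))
print(round(auroc(test_labels, test_scores), 3), round(lo, 3), round(hi, 3))
```

With per-class rounding, this split holds out 60 pregnancies. A test cohort of that size is small, which fits the fairly wide intervals reported in the abstract.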