Patterns in Individual Blood Count Trajectories in the UK Biobank Characterise Disease-Specific Signatures and Anticipate Pan-Cancer Risk
Riya Nagar, Abicumaran Uthamacumaran, Adelaide de Vecchi, Hector Zenil
TLDR
This paper uses machine learning on longitudinal Complete Blood Count (CBC) data from the UK Biobank to identify disease-specific patterns and anticipate pan-cancer risk early.
Key contributions
- Analyzes longitudinal Complete Blood Count (CBC) data to identify disease-specific patterns.
- Applies machine learning to detect disease signatures from CBC trajectories before symptom onset.
- Demonstrates that CBC markers contribute the majority of the predictive signal across diseases, including pan-cancer risk, while biochemistry and other blood panels add only a modest gain.
- Suggests routine CBC monitoring with ML can advance precision healthcare and predictive medicine.
Why it matters
This research shows how readily available Complete Blood Count (CBC) tests, combined with machine learning, can reveal early disease patterns. It offers a scalable and cost-effective approach to predictive medicine, potentially improving early diagnosis and personalized healthcare on a mass scale.
Original Abstract
We investigate the longitudinal behaviour of blood markers from common haematological tests as a marker of disease and as a function of disease progression in a variety of conditions including cancer, cardiovascular disease, and infections. We study confounding and non-confounding factors to allow for the earlier detection of disease and conditions based on their longitudinal signatures from biomarker patterns commonly measured in popular and scalable common blood tests across routine clinical tests, in particular the Complete Blood Count (CBC or FBC). Our analysis with normalised temporal profiles and machine learning techniques even before any symptoms appear demonstrates that analyte-group patterns found in blood testing are disease sensitive and disease specific. We demonstrate that CBC markers contribute to the majority of the predictive signal, while biochemistry and other blood panels provide only a modest additional gain, mostly associated with the individual disease for which the test was designed (e.g. CRP, liver enzymes, blood sugar). Our results demonstrate how regular monitoring, computational intelligence, and machine learning applied to longitudinal CBC data can converge to uncover disease patterns, advancing the potential for precision healthcare and predictive medicine on a mass scale leveraging an existing and pervasive blood test.
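The abstract refers to "normalised temporal profiles" without spelling out the procedure. As a rough illustration only, the sketch below shows one common way such a profile could be built: z-scoring a single analyte against the individual's own history, then summarising the direction of change with a least-squares slope. This is an assumption for illustration, not the authors' method. The function names `normalise_trajectory` and `slope` and the haemoglobin readings are all hypothetical.

```python
from statistics import mean, pstdev


def normalise_trajectory(values):
    """Z-score one analyte's series against the individual's own mean and spread.

    Assumption: per-individual z-scoring; the paper may normalise differently.
    """
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # A flat series carries no within-person variation.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


def slope(times, values):
    """Least-squares slope of values over times (value units per time unit)."""
    t_bar = mean(times)
    v_bar = mean(values)
    denom = sum((t - t_bar) ** 2 for t in times)
    if denom == 0:
        # All measurements share one time point, so no trend is defined.
        return 0.0
    numer = sum((t - t_bar) * (v - v_bar) for t, v in zip(times, values))
    return numer / denom


# Hypothetical haemoglobin readings (g/dL) at three visits, in years since the first visit.
visit_years = [0.0, 1.0, 2.0]
haemoglobin = [14.0, 13.0, 12.0]

profile = normalise_trajectory(haemoglobin)
trend = slope(visit_years, profile)
print(profile, trend)
```

In a setup like this, the normalised values and the slope for each CBC analyte could serve as input features to a classifier. Working in within-person units lets a steady drift stand out even while every raw reading stays inside the population reference range.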