A Machine Learning Framework for Turbofan Health Estimation via Inverse Problem Formulation
Milad Leyli-Abadi, Lucas Thil, Sebastien Razakarivony, Guillaume Doquet, Jesse Read
TLDR
This paper introduces a new turbofan dataset with realistic degradation and maintenance patterns, and uses it to benchmark data-driven models, Bayesian filters, and self-supervised methods for component-level health estimation.
Key contributions
- Introduces a new turbofan health dataset with realistic degradation and maintenance patterns.
- Establishes an initial benchmark comparing steady-state and nonstationary data-driven models with Bayesian filters.
- Explores self-supervised learning (SSL) for health estimation without true health labels.
- Provides a practical lower bound on problem difficulty using unsupervised representations.
Why it matters
Turbofan health estimation is crucial for predictive maintenance, yet earlier comparisons were limited by unrealistic datasets. By introducing a realistic dataset and benchmarking data-driven models, Bayesian filters, and self-supervised methods on it, the paper gives future research a common starting point. Its results show that traditional filters remain strong baselines and highlight the need for more advanced and interpretable inference strategies.
Original Abstract
Estimating the health state of turbofan engines is a challenging ill-posed inverse problem, hindered by sparse sensing and complex nonlinear thermodynamics. Research in this area remains fragmented, with comparisons limited by the use of unrealistic datasets and insufficient exploration of the exploitation of temporal information. This work investigates how to recover component-level health indicators from operational sensor data under realistic degradation and maintenance patterns. To support this study, we introduce a new dataset that incorporates industry-oriented complexities such as maintenance events and usage changes. Using this dataset, we establish an initial benchmark that compares steady-state and nonstationary data-driven models, and Bayesian filters, classic families of methods used to solve this problem. In addition to this benchmark, we introduce self-supervised learning (SSL) approaches that learn latent representations without access to true health labels, a scenario reflective of real-world operational constraints. By comparing the downstream estimation performance of these unsupervised representations against the direct prediction baselines, we establish a practical lower bound on the difficulty of solving this inverse problem. Our results reveal that traditional filters remain strong baselines, while SSL methods reveal the intrinsic complexity of health estimation and highlight the need for more advanced and interpretable inference strategies. For reproducibility, both the generated dataset and the implementation used in this work are made accessible.
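To make the Bayesian filter family mentioned in the abstract concrete, here is a minimal sketch of a scalar Kalman filter that recovers a hidden health indicator from one noisy sensor. It is a toy illustration, not the paper's implementation: the random-walk health model, the linear sensor sensitivity, and the names `kalman_health_filter` and `simulate_degradation` are all assumptions introduced for this example.

```python
import random


def kalman_health_filter(measurements, sensitivity, process_var, meas_var,
                         init_mean=0.0, init_var=1.0):
    """Scalar Kalman filter for a hidden health indicator.

    Assumed toy model (not taken from the paper):
      health_t = health_{t-1} + w_t,          w_t ~ N(0, process_var)
      sensor_t = sensitivity * health_t + v_t, v_t ~ N(0, meas_var)
    Returns the filtered health estimate after each measurement.
    """
    mean, var = init_mean, init_var
    estimates = []
    for y in measurements:
        # Predict: a random walk keeps the mean and inflates the variance.
        var = var + process_var
        # Update: correct the prediction with the new sensor reading.
        innovation = y - sensitivity * mean
        innovation_var = sensitivity ** 2 * var + meas_var
        gain = var * sensitivity / innovation_var
        mean = mean + gain * innovation
        var = (1.0 - gain * sensitivity) * var
        estimates.append(mean)
    return estimates


def simulate_degradation(n_steps, sensitivity, drift, meas_std, seed=0):
    """Generate a linearly degrading health signal and noisy sensor readings."""
    rng = random.Random(seed)
    health = [-drift * t for t in range(n_steps)]
    sensors = [sensitivity * h + rng.gauss(0.0, meas_std) for h in health]
    return health, sensors


if __name__ == "__main__":
    # Hypothetical settings chosen only for illustration.
    true_health, readings = simulate_degradation(
        n_steps=200, sensitivity=2.0, drift=0.001, meas_std=0.05)
    filtered = kalman_health_filter(
        readings, sensitivity=2.0, process_var=1e-5, meas_var=0.05 ** 2)
    print("true final health:     ", true_health[-1])
    print("filtered final health: ", filtered[-1])
```

The real problem is harder than this sketch suggests: the abstract describes several component-level indicators, sparse sensing, and nonlinear thermodynamics, so a single linear sensor and a scalar state only illustrate the predict-and-update structure of the filter.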