Automated Batch Distillation Process Simulation for a Large Hybrid Dataset for Deep Anomaly Detection
Jennifer Werner, Justus Arweiler, Indra Jungjohann, Jochen Schmid, Fabian Jirasek + 2 more
TLDR
This paper introduces a novel hybrid dataset for deep anomaly detection in batch distillation, combining experimental data with automated simulations.
Key contributions
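- Developed a novel Python-based process simulator for batch distillation that uses a tailored index-reduction strategy for the underlying differential-algebraic equations.
- Created a large, fully annotated hybrid dataset by combining experimental and simulated data.
- Automated the translation of experimental records into simulation scenarios; after calibration to a single reference experiment, the dynamics of the other experiments are well predicted.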
Why it matters
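Deep learning for anomaly detection in chemical processes requires large, diverse, well-annotated datasets, which are rarely available from industrial operations. This paper addresses that gap with an openly released hybrid dataset that pairs experimental batch distillation runs with corresponding simulations. The authors position it as a basis for research on deep AD methods, simulation-to-experiment style transfer, and the generation of pseudo-experimental data.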
Original Abstract
Anomaly detection (AD) in chemical processes based on deep learning offers significant opportunities but requires large, diverse, and well-annotated training datasets that are rarely available from industrial operations. In a recent work, we introduced a large, fully annotated experimental dataset for batch distillation under normal and anomalous operating conditions. In the present study, we augment this dataset with a corresponding simulation dataset, creating a novel hybrid dataset. The simulation data is generated in an automated workflow with a novel Python-based process simulator that employs a tailored index-reduction strategy for the underlying differential-algebraic equations. Leveraging the rich metadata and structured anomaly annotations of the experimental database, experimental records are automatically translated into simulation scenarios. After calibration to a single reference experiment, the dynamics of the other experiments are well predicted. This enabled the fully automated, consistent generation of time-series data for a large number of experimental runs, covering both normal operation and a wide range of actuator- and control-related anomalies. The resulting hybrid dataset is released openly. From a process simulation perspective, this work demonstrates the automated, consistent simulation of large-scale experimental campaigns, using batch distillation as an example. From a data-driven AD perspective, the hybrid dataset provides a unique basis for simulation-to-experiment style transfer, the generation of pseudo-experimental data, and future research on deep AD methods in chemical process monitoring.
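The abstract describes translating annotated experimental records into simulation scenarios but does not give the record schema or the simulator interface. The following is a minimal illustrative sketch of that idea only, not the authors' implementation. All names here (`ExperimentRecord`, `AnomalyAnnotation`, `Scenario`, `record_to_scenario`, the anomaly kinds `"heating_power"` and `"reflux_ratio"`, and the multiplicative `factor`) are hypothetical assumptions. It turns a nominal operating point plus anomaly time windows into a piecewise-constant input schedule that a simulator could consume.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class AnomalyAnnotation:
    """Hypothetical annotation: one actuator deviates during a time window."""
    kind: str        # assumed labels: "heating_power" or "reflux_ratio"
    start_s: float   # window start in seconds
    end_s: float     # window end in seconds (exclusive)
    factor: float    # assumed multiplicative deviation from the nominal value


@dataclass
class ExperimentRecord:
    """Hypothetical metadata of one experimental run."""
    run_id: str
    heating_power_w: float
    reflux_ratio: float
    duration_s: float
    anomalies: list[AnomalyAnnotation] = field(default_factory=list)


@dataclass
class Scenario:
    """Piecewise-constant inputs: (start time, heating power, reflux ratio)."""
    run_id: str
    duration_s: float
    schedule: list[tuple[float, float, float]]


def record_to_scenario(record: ExperimentRecord) -> Scenario:
    """Translate one annotated record into a simulation input schedule."""
    # Inputs can only change at t = 0 or at an anomaly window boundary.
    breakpoints = {0.0}
    for a in record.anomalies:
        if a.kind not in ("heating_power", "reflux_ratio"):
            raise ValueError(f"unknown anomaly kind: {a.kind}")
        breakpoints.add(max(0.0, a.start_s))
        breakpoints.add(min(record.duration_s, a.end_s))

    schedule = []
    for t in sorted(breakpoints):
        if t >= record.duration_s:
            continue  # nothing to simulate after the run ends
        power = record.heating_power_w
        reflux = record.reflux_ratio
        # Apply every anomaly whose window is active at time t.
        for a in record.anomalies:
            if a.start_s <= t < a.end_s:
                if a.kind == "heating_power":
                    power *= a.factor
                else:
                    reflux *= a.factor
        schedule.append((t, power, reflux))
    return Scenario(record.run_id, record.duration_s, schedule)
```

Under these assumptions, a run with nominal heating power 1000 W and one `"heating_power"` anomaly with factor 0.5 between 600 s and 1200 s yields three schedule segments: nominal, halved power, nominal again. Applying the same function to every record is what would make the generation of simulated time series consistent across a whole experimental campaign.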