Automated Batch Distillation Process Simulation for a Large Hybrid Dataset for Deep Anomaly Detection

April 10, 2026 | arXiv:2604.09166

Jennifer Werner, Justus Arweiler, Indra Jungjohann, Jochen Schmid, Fabian Jirasek + 2 more

cs.LG

TLDR

This paper introduces a novel hybrid dataset for deep anomaly detection in batch distillation, combining experimental data with automated simulations.

Key contributions

Developed a novel Python simulator for automated batch distillation process simulation.
Created a large, fully annotated hybrid dataset by combining experimental and simulated data.
Automated the translation of experimental records into simulation scenarios; after calibration to a single reference experiment, the simulator predicts the dynamics of the other experiments well.

Why it matters

Deep learning for anomaly detection (AD) in chemical processes requires large, diverse, well-annotated datasets, which are rarely available from industrial operations. This paper addresses that gap with an openly available hybrid dataset that pairs experimental runs with corresponding simulations. The dataset provides a basis for research on deep AD methods and on simulation-to-experiment style transfer.

Original Abstract

Anomaly detection (AD) in chemical processes based on deep learning offers significant opportunities but requires large, diverse, and well-annotated training datasets that are rarely available from industrial operations. In a recent work, we introduced a large, fully annotated experimental dataset for batch distillation under normal and anomalous operating conditions. In the present study, we augment this dataset with a corresponding simulation dataset, creating a novel hybrid dataset. The simulation data is generated in an automated workflow with a novel Python-based process simulator that employs a tailored index-reduction strategy for the underlying differential-algebraic equations. Leveraging the rich metadata and structured anomaly annotations of the experimental database, experimental records are automatically translated into simulation scenarios. After calibration to a single reference experiment, the dynamics of the other experiments are well predicted. This enabled the fully automated, consistent generation of time-series data for a large number of experimental runs, covering both normal operation and a wide range of actuator- and control-related anomalies. The resulting hybrid dataset is released openly. From a process simulation perspective, this work demonstrates the automated, consistent simulation of large-scale experimental campaigns, using batch distillation as an example. From a data-driven AD perspective, the hybrid dataset provides a unique basis for simulation-to-experiment style transfer, the generation of pseudo-experimental data, and future research on deep AD methods in chemical process monitoring.
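To make the idea of translating annotated experimental records into simulation scenarios more concrete, here is a minimal Python sketch. It is not the authors' workflow or data schema: the class names, the fields (`heater_power_w`, `factor`, and so on), and the anomaly label are all hypothetical, and the sketch assumes a single actuator with non-overlapping anomaly intervals.

```python
from dataclasses import dataclass, field


@dataclass
class Anomaly:
    # Hypothetical annotation: an actuator disturbance over a time window.
    kind: str
    start_s: float
    end_s: float
    factor: float  # multiplicative effect on the nominal actuator value


@dataclass
class ExperimentRecord:
    # Hypothetical metadata for one experimental run.
    run_id: str
    duration_s: float
    heater_power_w: float
    anomalies: list = field(default_factory=list)


@dataclass
class Scenario:
    # Simulation input: piecewise-constant heater power as (time_s, power_w).
    run_id: str
    duration_s: float
    heater_schedule: list


def to_scenario(rec):
    """Turn a record and its anomaly annotations into a heater schedule.

    Assumes anomaly intervals do not overlap.
    """
    schedule = [(0.0, rec.heater_power_w)]
    for a in sorted(rec.anomalies, key=lambda a: a.start_s):
        schedule.append((a.start_s, rec.heater_power_w * a.factor))
        schedule.append((a.end_s, rec.heater_power_w))  # back to nominal
    return Scenario(rec.run_id, rec.duration_s, schedule)


def heater_power_at(scenario, t):
    """Evaluate the piecewise-constant schedule at time t (seconds)."""
    power = scenario.heater_schedule[0][1]
    for time_s, p in scenario.heater_schedule:
        if time_s <= t:
            power = p
    return power


record = ExperimentRecord(
    run_id="run-001",
    duration_s=3600.0,
    heater_power_w=1000.0,
    anomalies=[Anomaly("heater_power_drop", 600.0, 900.0, 0.5)],
)
scenario = to_scenario(record)
print(scenario.heater_schedule)
```

A simulator could query `heater_power_at` at each time step, so the same annotation that labels the experimental time series also drives the simulated one. That shared annotation is what keeps the two halves of a hybrid dataset consistent.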
