This is the code and dataset repository for Smart-Manufacturing (SM) Anomaly Detection (AD) that supports a paper in preparation:
"Anomaly detection in Smart-manufacturing era: A review"
Faults and unscheduled stops resulting from machine breakdowns in manufacturing processes have a substantial impact on both global direct and indirect production costs, as well as compliance with established delivery deadlines. With the continuous expansion of smart manufacturing environments, where shared data are readily available in real time, the implementation of anomaly detection strategies has become increasingly necessary. However, this task has also become more complex due to the multitude of diverse scenarios and methods that need to be considered. The paper presents a comprehensive state-of-the-art review, supported by experiments conducted on a repository of real manufacturing datasets included here. We provide a practical classification framework specifically tailored to smart manufacturing environments, which describes recent and successful methods and algorithms that have been applied to real scenarios. Furthermore, we introduce an experimental evaluation by executing state-of-the-art anomaly detection algorithms, aimed at gaining practical insights into the most suitable methods for various manufacturing environments.
SmartManuAD is an effort of the Department of Statistics, Informatics and Mathematics of the Public University of Navarra (UPNA), in a collaborative work between Miguel Pagola, PhD professor in Computer Science and Artificial Intelligence at the UPNA, and Iñaki Elía, PhD student in Artificial Intelligence as well as Robotics & Automation Engineer for different Aerospace companies.
SM is a specific application of the Industrial Internet of Things (IIoT) that utilizes embedded sensors to collect operational data and integrate it with physical and digital processes within factories. Through the analysis of data streams across entire factories or multiple facilities, manufacturing engineers and data analysts can, among other purposes, identify faults or anomalies in different processes, diagnose their potential causes, assess the health of systems or prognose the future state of health based on their current condition.
As shown in the following image, we established 5 categories related to SM concepts and 6 categories associated with common manufacturing processes or systems. Our intention was to map publications and datasets to a reformulated classification that is more focused on the emerging SM scenarios.
Our research evaluates 16 state-of-the-art AD algorithms on 29 publicly available real-world datasets in the field of Smart-Manufacturing. These datasets were selected from a pool of 100 publicly available ones, covering various industrial domains, such as additive and subtractive manufacturing, machinery and equipment, automotive, aerospace, rail, energy, logistics, chemicals, electronics, semiconductors, and multi-stage mass manufacturing, among others. The 29 chosen datasets contain real anomalies, which were extrapolated from the information provided by the sources, processed, and curated to create balanced datasets for the analysis.
As an example of the results obtained, see the following figure that shows the AUC-ROC and AUC-PR metrics for the Motors & Drivetrains category and for the 16 selected algorithms.
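For readers unfamiliar with the two metrics, the following is a minimal pure-Python sketch of how they can be computed from anomaly scores. It is not the evaluation code used in the study. The function names `auc_roc` and `average_precision` and the toy labels and scores are illustrative assumptions, and average precision is used here as one common way of summarizing the area under the precision-recall curve.

```python
def auc_roc(y_true, scores):
    """Probability that a random anomaly outscores a random normal point.

    Ties between an anomaly and a normal point count as half a win.
    """
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def average_precision(y_true, scores):
    """Mean of the precision values measured at the rank of each true anomaly."""
    ranked = sorted(zip(scores, y_true), key=lambda pair: pair[0], reverse=True)
    hits = 0
    total = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            total += hits / rank
    if hits == 0:
        raise ValueError("at least one anomaly is required")
    return total / hits


# Toy example (hypothetical data): 1 marks an anomaly, higher score = more anomalous.
y_true = [0, 0, 1, 0, 1, 0]
scores = [0.1, 0.4, 0.35, 0.2, 0.9, 0.3]

print(f"AUC-ROC: {auc_roc(y_true, scores):.3f}")
print(f"AUC-PR (average precision): {average_precision(y_true, scores):.3f}")
```

AUC-ROC is insensitive to the anomaly ratio, whereas AUC-PR drops quickly when anomalies are rare, which is why reporting both is informative for datasets with very different anomaly percentages.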
Our study relies on established anomaly detection libraries for tabular data (PyOD) and time-series data (TODS). We also use ADBench as the basis for our work, with the intention of extending its benchmark of AD algorithms to the SM domain. Please refer to the links above for more information.
The code used from ADBench is written in Python 3 (see related dependencies). We adapted a notebook ("SmartManuAD_Test_Notebook.ipynb") to handle the specifics of Smart-Manufacturing environments.
Below is the list of the 29 publicly available SM datasets used in our research, all of which include real anomalies. The datasets folder contains the expanded list of 100 collected datasets; most of them do not include real anomalies, so we injected 10% synthetic ones. Both lists include links to the original dataset sources, where you can find the original metadata for each dataset.
The datasets are processed and converted to the NumPy .npz file format, which is a compressed archive of array files. This also allows you to work with some of the utilities and code introduced in ADBench.
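As a minimal sketch of how such a file can be read, the snippet below writes a small synthetic .npz archive and loads it back. It assumes the ADBench convention of a feature matrix stored under the key `X` and binary labels under the key `y`; check `data.files` on a real file from this repository to confirm. The helper names and the file name `demo_dataset.npz` are hypothetical.

```python
import os
import tempfile

import numpy as np


def load_npz_dataset(path):
    """Load features and labels, assuming ADBench-style keys 'X' and 'y'."""
    with np.load(path) as data:
        X = data["X"]
        y = data["y"].astype(int).ravel()
    return X, y


def summarize(X, y):
    """Return the same kind of counts reported in the dataset table."""
    n_anomalies = int(y.sum())
    return {
        "instances": X.shape[0],
        "features": X.shape[1],
        "anomalies": n_anomalies,
        "pct_anomalies": 100.0 * n_anomalies / len(y),
    }


# Build a toy archive (synthetic data, not one of the repository datasets).
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 5))
y_demo = np.zeros(200, dtype=int)
y_demo[:20] = 1  # mark 10% of the rows as anomalies

demo_path = os.path.join(tempfile.mkdtemp(), "demo_dataset.npz")
np.savez_compressed(demo_path, X=X_demo, y=y_demo)

X, y = load_npz_dataset(demo_path)
stats = summarize(X, y)
print(stats)
```

To inspect a real dataset, replace `demo_path` with the path to one of the .npz files in the datasets folder.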
ID | Transformed Name | Link to source | Description | Original_Format | Economic Activity | Industrial Sector | SM_Category | SM_Subcategory | Labels | Data_Type | # Processed Instances | # Processed Features | # Anomalies | % Anomalies |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
5 | cfrp | Anonymous | CFRP (Carbon Fiber Reinforced Polymer): Automatic Fiber Placement fabrication process for aerostructures. | CSV | Aerospace | CFRP aerostructures manufacturing | Cutting & Additive Manufacturing | Additive Manufacturing | Real | Time Series Multivariate | 52.268 | 49 | 262 | 0,50% |
14 | PlantMonitoring | link | Production Plant Data for Condition Monitoring | CSV | Machinery & Equipment | Plant generic fabrication | Syst. Integration & Multistage manufacturing | Multistage manufacturing | Real | Time Series Multivariate | 18.429 | 22 | 1.394 | 7,56% |
15 | Multistage | link | Multi-stage continuous-flow manufacturing process | CSV | Machinery & Equipment | Machining manufacturing | Syst. Integration & Multistage manufacturing | Multistage manufacturing | Real | Time Series Multivariate | 14.088 | 41 | 13.050 | 92,63% |
18 | Robotfail | link | Robot Execution Failures Data Set: This dataset contains force and torque measurements on a robot after failure detection. Each failure is characterized by 15 force/torque samples collected at regular time intervals. | CSV | Machinery & Equipment | Robotics | Robotics & AGVs | Robots | Real | Tabular Multivariate | 164 | 89 | 17 | 10,37% |
19 | Engine | link | Public D3M datasets: Engine (Simple engine data) | CSV | Automotive components | Automotive Motors | Motors & Drivetrains | Motors | Real | Time Series Multivariate | 383 | 6 | 126 | 32,90% |
31 | Milling | link | NASA - Mill Data Set | MAT | Machinery & Equipment | Machining manufacturing | Cutting & Additive Manufacturing | CNC Machining | Real | Time Series Multivariate | 117.000 | 6 | 27.000 | 23,08% |
35 | IMS | link | IMS bearing defect data provided by the University of Cincinnati. The vibration data were obtained by attaching an accelerometer to the four bearings connected to the shaft. Each dataset comprised three test datasets, each of which was the test-to-failure experimental data of the bearing. The data were recorded at intervals of 1 s for 10 min at a sampling rate of 20 kHz. | ASCII | Machinery & Equipment | Machinery & Equipment | Motors & Drivetrains | Bearings | Real | Time Series Multivariate | 29.154.880 | 8 | 6.654.880 | 22,83% |
36 | PHM | link | PRONOSTIA Bearing Dataset: The PRONOSTIA (also called FEMTO) bearing dataset consists of 17 accelerated run-to-failures on a small bearing test rig. Both acceleration and temperature data were collected for each experiment. The dataset was used in the 2012 IEEE Prognostic Challenge. The dataset is from FEMTO-ST Institute in France. | CSV | Machinery & Equipment | Machinery & Equipment | Motors & Drivetrains | Bearings | Real | Time Series Multivariate | 7.175.680 | 2 | 11.750 | 0,16% |
42 | Motorcondition1 | link | Condition Monitoring Dataset (AI4I 2021) - HTW Berlin: Measured time-series data for 8 different operating conditions. | CSV | Machinery & Equipment | Machinery & Equipment | Motors & Drivetrains | Motors | Real | Time Series Multivariate | 24.000 | 3 | 6.000 | 25,00% |
43 | Motorcondition2 | link | Condition Monitoring Dataset (AI4I 2021) - HTW Berlin: Measured frequency data for 8 different operating conditions. | CSV | Machinery & Equipment | Machinery & Equipment | Motors & Drivetrains | Motors | Real | Time Series Multivariate | 2.000 | 169 | 500 | 25,00% |
47 | FordB | link | FordB: This data was originally used in a competition in the IEEE World Congress on Computational Intelligence, 2008. The classification problem is to diagnose whether a certain symptom exists or does not exist in an automotive subsystem. | TXT | Automotive components | Automotive Motors | Motors & Drivetrains | Motors | Real | Time Series Multivariate | 1.818.000 | 1 | 930.000 | 51,16% |
54 | CNCMachining | link | CNC Machining Data: The dataset provided is a collection of real-world industrial vibration data collected from a brownfield CNC milling machine. The acceleration has been measured using a tri-axial accelerometer (Bosch CISS Sensor) mounted inside the machine. The X- Y- and Z-axes of the accelerometer have been recorded using a sampling rate equal to 2 kHz. | H5 | Machinery & Equipment | Machining manufacturing | Cutting & Additive Manufacturing | CNC Machining | Real | Time Series Multivariate | 99.399 | 3 | 38.983 | 39,22% |
55 | Boschline | link | Bosch Production Line Performance: A good chocolate soufflé is decadent, delicious, and delicate. But, it's a challenge to prepare. When you pull a disappointingly deflated dessert out of the oven, you instinctively retrace your steps to identify at what point you went wrong. | CSV | Machinery & Equipment | Chocolate packaging line | Syst. Integration & Multistage manufacturing | Multistage manufacturing | Real | Time Series Multivariate | 1.183.747 | 158 | 6.879 | 0,58% |
65 | MachineryFault | link link | Machinery Fault Dataset - Induction Motor Faults Database: Recorded on the SpectraQuest Inc. Alignment/Balance Vibration Trainer (ABVT) Machinery Fault Simulator (MFS). | CSV | Automotive components | Induction Motors | Motors & Drivetrains | Bearings | Real | Time Series Multivariate | 2.250.000 | 8 | 1.250.000 | 55,56% |
66 | Cuttingblade | link | One Year Industrial Component Degradation - Degradation of a cutting blade: This dataset contains the machine data of a degrading component recorded over a total duration of 12 months. It was initiated in the European research and innovation project IMPROVE. | CSV | Food and beverage packaging | Beverage packaging line | Syst. Integration & Multistage manufacturing | Multistage manufacturing | Real | Time Series Multivariate | 12.288 | 7 | 6.144 | 50,00% |
71 | Multimachine | link | Multiple Machine sensors data | CSV | Machinery & Equipment | Machining manufacturing | Syst. Integration & Multistage manufacturing | Multistage manufacturing | Real | Time Series Multivariate | 220.320 | 50 | 14.484 | 6,57% |
72 | CNCturning | link | CNC turning: roughness, forces and tool wear: Made in the COMPETENCE CENTER IN MANUFACTURING (CCM), a laboratory of the AERONAUTICS INSTITUTE OF TECHNOLOGY (ITA). | CSV | Machinery & Equipment | Machining manufacturing | Cutting & Additive Manufacturing | CNC Machining | Real | Tabular Multivariate | 612 | 20 | 96 | 15,69% |
76 | UCIAccelerometer | link | Accelerometer Dataset - UCI Datasets: Accelerometer data from vibrations of a cooler fan with weights on its blades. It can be used for predictions, classification and other tasks that require vibration analysis, especially in engines. | CSV | Machinery & Equipment | Electric Motors | Motors & Drivetrains | Motors | Real | Time Series Multivariate | 153.000 | 3 | 102.000 | 66,67% |
80 | UCIRobotExectFail | link | Robot Execution Failures Data Set - UCI Datasets: This dataset contains force and torque measurements on a robot after failure detection. Each failure is characterized by 15 force/torque samples collected at regular time intervals. | TXT | Machinery & Equipment | Robotics | Robotics & AGVs | Robots | Real | Time Series Multivariate | 458 | 90 | 291 | 63,54% |
84 | VersatileProduction | link | Versatile Production: Popcorn production process data with multiple process steps. | CSV | Food and beverage packaging | Popcorn packaging line | Syst. Integration & Multistage manufacturing | Multistage manufacturing | Real | Tabular Multivariate | 10.529 | 73 | 8 | 0,08% |
86 | HighStorageSystem | link | High Storage System Anomaly Detection: Storage test rig process data for anomaly detection. | CSV | Machinery & Equipment | Plant generic fabrication | Syst. Integration & Multistage manufacturing | Multistage manufacturing | Real | Time Series Multivariate | 49.552 | 18 | 5.670 | 11,44% |
88 | GenesisPickPlace | link | Genesis Pick-and-Place Demonstrator: Material sorting test rig process data for anomaly detection. | CSV | Machinery & Equipment | Pick & Place manipulator | Robotics & AGVs | Robots | Real | Tabular Multivariate | 16.220 | 18 | 50 | 0,31% |
91 | PlantFaultDetection | link | Plant Fault Detection - PHM 2015: Anonymized process data for plant fault detection. | CSV | Machinery & Equipment | Plant generic fabrication | Syst. Integration & Multistage manufacturing | Multistage manufacturing | Real | Tabular Multivariate | 672.530 | 7 | 201.854 | 30,01% |
94 | ROSIndustrialArmAnomaly | link | ROS Anomaly Detector package - Anomaly Detection for Industrial Arm Applications: The ROS Anomaly Detector Module (ADM) is designed to execute alongside industrial robotic arm tasks to detect unintended deviations at the application level. The ADM utilizes a learning based technique to achieve this. The process has been made efficient by building the ADM as a ROS package that aptly fits in the ROS ecosystem. | CSV | Machinery & Equipment | Robotics | Robotics & AGVs | Robots | Real | Tabular Multivariate | 20.221 | 6 | 4.692 | 23,20% |
95 | CWRUBearing | link | CWRU Bearing Data: Bearing test rig accelerometer data for fault detection. | MAT | Machinery & Equipment | Bearings | Motors & Drivetrains | Bearings | Real | Time Series Multivariate | 1.210.422 | 2 | 242.616 | 20,04% |
97 | MFPT | link | Condition Based Maintenance Fault Database for Testing of Diagnostic and Prognostics Algorithms | MAT | Machinery & Equipment | Oil Pumps | Motors & Drivetrains | Bearings | Real | Time Series Univariate | 1.318.356 | 2 | 146.468 | 11,11% |
98 | DieselEngineFaults | link | Diesel Engine Faults Features: Fault detection based on pressure curves and vibration. | MAT | Automotive components | Automotive Motors | Motors & Drivetrains | Motors | Real | Time Series Univariate | 3.500 | 97 | 3.250 | 92,86% |
99 | TurningChatterDiagnosis | link | Turning Dataset for Chatter Diagnosis Using Machine Learning: Sensory data of a turning test rig and varying strengths of chatter. | MAT | Machinery & Equipment | Machining manufacturing | Cutting & Additive Manufacturing | CNC Machining | Real | Time Series Multivariate | 6.400.000 | 5 | 3.200.000 | 50,00% |
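
The counts and percentages in the table use European number formatting (a period as the thousands separator and a comma as the decimal mark). If you parse the table programmatically, a small conversion step is needed. The sketch below is one possible way to do it; the helper names `parse_count` and `parse_percent` are hypothetical, and the sample values are taken from the PlantMonitoring row.

```python
def parse_count(text):
    """Convert a count such as '18.429' (period = thousands separator) to an int."""
    return int(text.strip().replace(".", ""))


def parse_percent(text):
    """Convert a percentage such as '7,56%' (comma = decimal mark) to a float."""
    return float(text.strip().rstrip("%").replace(",", "."))


# Values from the PlantMonitoring row (ID 14) of the table above.
instances = parse_count("18.429")
anomalies = parse_count("1.394")
reported = parse_percent("7,56%")

# Recompute the anomaly percentage as a consistency check on the parsed values.
recomputed = round(100.0 * anomalies / instances, 2)

print(instances, anomalies, reported, recomputed)
```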