The first version, "Relevant Alarms Detection v1", covers data exploration and analysis of the alarms, as well as feature engineering.
We label an alarm as irrelevant if it is cleared within a short period of time, denoted "n", and as relevant otherwise. Ideally, a domain expert should choose the value of "n"; for this study, we use n = 7 minutes.
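The labeling rule can be sketched as a small function. This is a minimal sketch, not the project's code: it assumes each alarm has a first-occurrence timestamp and a clear timestamp, measures the duration from first occurrence, treats the 7-minute boundary as inclusive, and labels alarms that were never cleared as relevant. The helper `label_alarm` and its parameter names are hypothetical.

```python
from datetime import datetime, timedelta
from typing import Optional

# Threshold "n" from the text; ideally chosen by a domain expert.
N = timedelta(minutes=7)


def label_alarm(first_occurrence: datetime,
                cleared_at: Optional[datetime],
                n: timedelta = N) -> str:
    """Hypothetical helper: "irrelevant" if the alarm cleared within n, else "relevant".

    Assumptions: the duration is measured from the first occurrence, the
    boundary is inclusive, and an alarm that was never cleared
    (cleared_at is None) counts as relevant.
    """
    if cleared_at is None:
        return "relevant"
    duration = cleared_at - first_occurrence
    return "irrelevant" if duration <= n else "relevant"


t0 = datetime(2024, 1, 1, 12, 0)
print(label_alarm(t0, t0 + timedelta(minutes=3)))   # irrelevant
print(label_alarm(t0, t0 + timedelta(minutes=30)))  # relevant
print(label_alarm(t0, None))                        # relevant
```

Passing `n` explicitly makes it easy to rerun the labeling with a different threshold if a domain expert suggests one.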
- Modeling:
  - Split the data into 80% training and 20% testing sets.
  - Trained a baseline Random Forest Classifier and evaluated it with precision, recall, F1-score, and accuracy.
  - Applied a stratified shuffle split so that, given the class imbalance, the training and test sets keep the same class proportions, then re-evaluated the model.
  - Fine-tuned the model's hyperparameters with Bayesian optimization to improve performance.
  - Addressed the class imbalance with SMOTE, oversampling the minority class (relevant alarms).
  - Re-trained and re-evaluated the model after oversampling, which improved its performance significantly.
- Feature Importance Analysis:
  - Analyzed the feature importances of the Random Forest model; the key contributors are Severity, Technical ID, FM Receive Time, and First Occurrence.
  - Documented how each feature affects the model's predictions.
- Results and Conclusions:
  - The oversampled model reached 96% on each of F1-score, recall, precision, and accuracy.
  - Recommended future improvements, including collecting more labeled data from domain experts to improve model training.
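The split, baseline training, evaluation, and feature-importance steps in the list above can be sketched with scikit-learn. This is a minimal sketch on synthetic data, not the project's pipeline: `make_classification` stands in for the engineered alarm features, the entries in `FEATURES` are placeholder names borrowed from the list, the hyperparameters are illustrative rather than the Bayesian-optimized values, and no oversampling is applied. `StratifiedShuffleSplit` with `n_splits=1` and `test_size=0.2` produces a single 80/20 split that keeps the class ratio the same in both sets.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score
from sklearn.model_selection import StratifiedShuffleSplit

# Placeholder feature names; the real engineered features are not reproduced here.
FEATURES = ["severity", "technical_id", "fm_receive_time", "first_occurrence"]

# Synthetic, imbalanced stand-in data: class 1 ("relevant") is the minority.
X, y = make_classification(
    n_samples=2000, n_features=len(FEATURES), n_informative=3, n_redundant=0,
    weights=[0.85, 0.15], random_state=42,
)

# One stratified 80/20 split: both sets keep the original class proportions.
splitter = StratifiedShuffleSplit(n_splits=1, test_size=0.2, random_state=42)
train_idx, test_idx = next(splitter.split(X, y))
X_train, X_test = X[train_idx], X[test_idx]
y_train, y_test = y[train_idx], y[test_idx]

# Baseline Random Forest with illustrative (untuned) hyperparameters.
model = RandomForestClassifier(n_estimators=200, random_state=42)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

# Precision, recall, and F1 are computed for the positive (relevant) class.
metrics = {
    "precision": precision_score(y_test, y_pred),
    "recall": recall_score(y_test, y_pred),
    "f1": f1_score(y_test, y_pred),
    "accuracy": accuracy_score(y_test, y_pred),
}
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")

# Impurity-based feature importances, highest first.
ranking = sorted(zip(FEATURES, model.feature_importances_),
                 key=lambda pair: pair[1], reverse=True)
for name, importance in ranking:
    print(f"{name}: {importance:.3f}")
```

On real data, `X` and `y` would come from the labeled alarm table, and the fixed seeds only make the sketch reproducible. Impurity-based importances can favor high-cardinality features such as an identifier column, so scikit-learn's permutation importance is a useful cross-check for a feature like Technical ID.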
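The SMOTE step in the list above creates synthetic minority samples by interpolating between a minority sample and one of its k nearest minority-class neighbors. The sketch below is a simplified from-scratch NumPy version meant only to illustrate the idea; in practice, the `SMOTE` class from the imbalanced-learn library is the usual choice. The function `smote_oversample` and its parameters are hypothetical. Oversampling should be applied to the training split only, so that synthetic points derived from test samples do not leak into evaluation.

```python
import numpy as np


def smote_oversample(X, y, minority_label=1, k=5, random_state=0):
    """Hypothetical simplified SMOTE: add synthetic minority rows until both classes match."""
    rng = np.random.default_rng(random_state)
    X_min = X[y == minority_label]
    n_needed = int(np.sum(y != minority_label)) - len(X_min)
    if n_needed <= 0 or len(X_min) < 2:
        return X, y  # already balanced, or too few minority points to interpolate

    k = min(k, len(X_min) - 1)
    # Pairwise Euclidean distances between minority samples
    # (quadratic in the minority count; fine for a sketch).
    dists = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=-1)
    np.fill_diagonal(dists, np.inf)  # a sample is not its own neighbor
    neighbors = np.argsort(dists, axis=1)[:, :k]  # k nearest minority neighbors

    base = rng.integers(0, len(X_min), size=n_needed)           # random minority anchors
    chosen = neighbors[base, rng.integers(0, k, size=n_needed)]  # one random neighbor each
    gap = rng.random((n_needed, 1))                              # interpolation factor in [0, 1)
    synthetic = X_min[base] + gap * (X_min[chosen] - X_min[base])

    X_res = np.vstack([X, synthetic])
    y_res = np.concatenate([y, np.full(n_needed, minority_label, dtype=y.dtype)])
    return X_res, y_res


rng = np.random.default_rng(1)
X = rng.normal(size=(100, 3))
y = np.array([0] * 90 + [1] * 10)
X_res, y_res = smote_oversample(X, y)
print(np.bincount(y_res))  # [90 90]
```

Because every synthetic point lies on a segment between two real minority samples, the new rows stay inside the region the minority class already occupies, which is what distinguishes SMOTE from simply duplicating minority rows.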