Author Identifier (ORCID)
Ahmad Mohsin: https://orcid.org/0000-0001-9023-0851
Iqbal H. Sarker: https://orcid.org/0000-0003-1740-5517
Abstract
Artificial Intelligence, particularly machine learning (ML) algorithms, plays a crucial role in detecting cyberattacks, including anomalies and intrusions. However, ML models trained on imbalanced cybersecurity datasets often struggle to accurately detect minority data instances and potential threats, thereby weakening overall system security. Despite extensive research, a persistent challenge is the inadequate explanation of model predictions concerning minority data classes. This study addresses these limitations by developing a generative AI-based approach to managing minority classes in anomaly detection, incorporating concept drift handling and explainability analysis. We introduce an over-sampling technique, cGGReaT, designed to enhance the presence of minority classes in the anomaly detection domain. Leveraging Large Language Models (LLMs) in a hybrid approach, we use the pre-trained transformer-based LLM DistilGPT-2 to generate synthetic tabular data. Extensive experiments on two publicly available benchmark datasets, UNSW-NB15 and CIC-IDS2017, underscore the efficacy of the proposed approach. We employ concept drift detection and adaptation techniques to maintain reliable and sustainable ML performance. To enhance interpretability, eXplainable Artificial Intelligence (XAI) methods, including SHAP and LIME, are used to quantify feature contributions to model outputs. The experiments reveal that ML algorithms tested on datasets balanced with synthetic samples generated by cGGReaT achieve higher prediction accuracy on UNSW-NB15 and CIC-IDS2017 than classifiers tested on the original imbalanced datasets.
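The core idea behind LLM-based tabular over-sampling (as in the GReaT family of methods the abstract builds on) is to serialise each tabular record into a natural-language sentence, fine-tune a model such as DistilGPT-2 on those sentences, and then condition generation on the minority label to synthesise new rows. The sketch below shows only the encoding/decoding step; the feature names are illustrative assumptions, not the datasets' exact schema or the paper's implementation.

```python
import random

def encode_row(row, shuffle=True, seed=None):
    """Serialise one tabular record into a GReaT-style sentence.

    Each feature becomes a clause "name is value"; clause order can be
    permuted so the language model does not overfit to a fixed column
    ordering during fine-tuning.
    """
    clauses = [f"{name} is {value}" for name, value in row.items()]
    if shuffle:
        random.Random(seed).shuffle(clauses)
    return ", ".join(clauses)

def decode_row(text):
    """Parse a generated sentence back into a feature dictionary."""
    row = {}
    for clause in text.split(", "):
        name, _, value = clause.partition(" is ")
        row[name] = value
    return row

# A hypothetical minority-class ("attack") flow record.
record = {"proto": "tcp", "duration": "0.12", "bytes": "4521", "label": "attack"}
sentence = encode_row(record, seed=0)
```

Conditioning a fine-tuned model on a prompt such as "label is attack, " would then let it complete new synthetic minority-class rows, which `decode_row` maps back into tabular form before the balanced dataset is passed to the downstream classifiers.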
Keywords
Anomaly detection, concept drift, cybersecurity, data imbalance, explainable AI, LLM
Document Type
Journal Article
Date of Publication
5-1-2026
Volume
44
Issue
2
Publication Title
New Generation Computing
Publisher
Springer
School
Centre for Securing Digital Futures
Creative Commons License

This work is licensed under a Creative Commons Attribution 4.0 License.
Comments
Mwiga, K. J., Dida, M. A., Mohsin, A., & Sarker, I. H. (2026). A generative AI method for minority class handling in anomaly detection with drift and explainability analysis. New Generation Computing, 44. https://doi.org/10.1007/s00354-026-00318-8