Author Identifier (ORCID)

Ahmad Mohsin: https://orcid.org/0000-0001-9023-0851

Iqbal H. Sarker: https://orcid.org/0000-0003-1740-5517

Abstract

Artificial Intelligence, particularly machine learning (ML) algorithms, plays a crucial role in detecting cyberattacks, including anomalies and intrusions. However, ML models trained on imbalanced cybersecurity datasets often struggle to accurately detect minority data instances and potential threats, thereby weakening overall system security. Despite extensive research, a persistent challenge is the inadequate explanation of model predictions for minority data classes. This study addresses these limitations by developing a generative AI-based approach to handling minority classes in anomaly detection, incorporating concept drift handling and explainability analysis. We introduce an over-sampling technique, CGGReaT, designed to enhance the presence of minority classes in the anomaly detection domain. Leveraging Large Language Models (LLMs) in a hybrid approach, we use the pre-trained transformer-based LLM DistilGPT-2 to generate synthetic tabular data. Extensive experiments on two publicly available benchmark datasets, UNSW-NB15 and CIC-IDS2017, underscore the efficacy of our proposed approach. We employ concept drift detection and adaptation techniques to maintain reliable and sustainable ML performance. To enhance interpretability, eXplainable Artificial Intelligence (XAI) methods, including SHAP and LIME, are employed to quantify feature contributions to model outputs. The experiments reveal that testing ML algorithms on datasets balanced with synthetic samples generated by CGGReaT boosts prediction accuracy on the UNSW-NB15 and CIC-IDS2017 datasets, compared to classifiers tested on imbalanced datasets.

Keywords

Anomaly detection, concept drift, cybersecurity, data imbalance, explainable AI, LLM

Document Type

Journal Article

Date of Publication

5-1-2026

Volume

44

Issue

2

Publication Title

New Generation Computing

Publisher

Springer

School

Centre for Securing Digital Futures

Creative Commons License

Creative Commons Attribution 4.0 License
This work is licensed under a Creative Commons Attribution 4.0 License.

Comments

Mwiga, K. J., Dida, M. A., Mohsin, A., & Sarker, I. H. (2026). A generative AI method for minority class handling in anomaly detection with drift and explainability analysis. New Generation Computing, 44. https://doi.org/10.1007/s00354-026-00318-8


Link to publisher version (DOI)

10.1007/s00354-026-00318-8