Document Type

Journal Article

Publication Title

ACM Transactions on Management Information Systems


Association for Computing Machinery


School of Science




This is an Author's Accepted Manuscript of: Rashid, A. N. M. B., Ahmed, M., Sikos, L. F., & Haskell-Dowland, P. (2022). Anomaly detection in cybersecurity datasets via cooperative co-evolution-based feature selection. ACM Transactions on Management Information Systems, 13(3), article 29.


Anomaly detection from Big Cybersecurity Datasets is very important; however, this is a very challenging and computationally expensive task. Feature selection (FS) is an approach to remove irrelevant and redundant features and select a subset of features, which can improve the machine learning algorithms’ performance. In fact, FS is an effective preprocessing step of anomaly detection techniques. This article’s main objective is to improve and quantify the accuracy and scalability of both supervised and unsupervised anomaly detection techniques. In this effort, a novel anomaly detection approach using FS, called Anomaly Detection Using Feature Selection (ADUFS), has been introduced. Experimental analysis was performed on five different benchmark cybersecurity datasets with and without feature selection and the performance of both supervised and unsupervised anomaly detection techniques were investigated. The experimental results indicate that instead of using the original dataset, a dataset with a reduced number of features yields better performance in terms of true positive rate (TPR) and false positive rate (FPR) than the existing techniques for anomaly detection. For example, with FS, a supervised anomaly detection technique, multilayer perception increased the TPR by over 200% and decreased the FPR by about 97% for the KDD99 dataset. Similarly, with FS, an unsupervised anomaly detection technique, local outlier factor increased the TPR by more than 40% and decreased the FPR by 15% and 36% for Windows 7 and NSL-KDD datasets, respectively. In addition, all anomaly detection techniques require less computational time when using datasets with a suitable subset of features rather than entire datasets. Furthermore, the performance results have been compared with six other state-of-the-art techniques based on a decision tree (J48).