Author Identifier

A N M Bazlur Rashid

https://orcid.org/0000-0002-8672-5023

Date of Award

2021

Document Type

Thesis

Publisher

Edith Cowan University

Degree Name

Doctor of Philosophy

School

School of Science

First Supervisor

Dr Mohiuddin Ahmed

Second Supervisor

Dr Leslie F Sikos

Third Supervisor

Associate Professor Paul Haskell-Dowland

Fourth Supervisor

Dr Tonmoy Choudhury

Abstract

The rapid progress of modern technologies generates a massive amount of high-throughput data, called Big Data, which provides opportunities to find new insights using machine learning (ML) algorithms. Big Data consists of many features (attributes). However, irrelevant features may degrade the classification performance of ML algorithms. Feature selection (FS) is a combinatorial optimisation technique used to select a subset of relevant features that represent the dataset. For example, FS is an effective preprocessing step for anomaly detection techniques in Big Cybersecurity Datasets. Evolutionary algorithms (EAs) are widely used search strategies for feature selection. A variant of EAs, called a cooperative co-evolutionary algorithm (CCEA) or simply cooperative co-evolution (CC), uses a divide-and-conquer approach and is a good choice for large-scale optimisation problems. The goal of this thesis is to investigate three key research problems related to feature selection in Big Data and to anomaly detection using feature selection in Big Cybersecurity Data, and to develop solutions to them.

The first research problem of this thesis is to investigate and develop a feature selection framework using CCEA. The objective of feature selection is twofold: selecting a suitable subset of features (in other words, reducing the number of features to decrease computation) and improving classification accuracy. These two goals are contradictory, but both can be pursued using a single objective function. When only classification accuracy is used as the objective function for FS, EAs such as CCEA achieve higher accuracy, but with a higher number of features. Hence, this thesis proposes a penalty-based wrapper single objective function. This function has been used to evaluate the FS process using CCEA, henceforth called Cooperative Co-Evolutionary Algorithm-Based Feature Selection (CCEAFS). Experimental analysis was performed using six widely used classifiers on six different datasets, with and without FS. The experimental results indicate that the proposed objective function is efficient at reducing the number of features in the final feature subset without significantly reducing classification accuracy. Furthermore, the performance results have been compared with four other state-of-the-art techniques.
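The trade-off described above can be illustrated with a minimal sketch of a penalty-based objective. The exact function proposed in the thesis is not given in this abstract, so the linear form below, the name `penalised_fitness`, and the `penalty_weight` parameter are all assumptions made purely for illustration.

```python
def penalised_fitness(accuracy, n_selected, n_total, penalty_weight=0.1):
    """Hypothetical penalty-based wrapper objective (not the thesis's formula).

    Rewards classification accuracy and subtracts a penalty that grows with
    the fraction of features kept, so a single score reflects both goals.
    """
    if n_total <= 0:
        raise ValueError("n_total must be positive")
    if not 0 <= n_selected <= n_total:
        raise ValueError("n_selected must lie between 0 and n_total")
    return accuracy - penalty_weight * (n_selected / n_total)


# A slightly less accurate subset with far fewer features scores higher.
large_subset = penalised_fitness(accuracy=0.91, n_selected=80, n_total=100)
small_subset = penalised_fitness(accuracy=0.90, n_selected=20, n_total=100)
```

With accuracy alone, the 80-feature subset would win; under this assumed penalty, the 20-feature subset is preferred, which is the behaviour the paragraph attributes to a penalty-based objective.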

CC decomposes a large and complex problem into several subproblems, optimises each subproblem independently, and combines the subproblems only to build a complete solution to the problem. The existing decomposition solutions perform poorly because of limitations such as not considering feature interactions, dealing with only an even number of features, and decomposing the dataset statically. Moreover, for real-world problems without any prior information about how the features in a dataset interact, it is difficult to find a suitable problem decomposition technique for feature selection. Hence, the second research problem of this thesis is to investigate and develop a decomposition method that can decompose Big Datasets dynamically and can increase the probability of grouping interacting features into the same subcomponent. Accordingly, this thesis proposes random feature grouping (RFG) with three variants. RFG has been used in the CC-based FS process, hence called Cooperative Co-Evolution-Based Feature Selection with Random Feature Grouping (CCFSRFG). Experimental analysis performed using six widely used ML classifiers on seven different datasets, with and without FS, indicates that, in most cases, the proposed CCFSRFG-1 outperforms CCEAFS and CCFSRFG-2, and also outperforms using all features. Furthermore, the performance results have been compared with five other state-of-the-art techniques.
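As a rough illustration of dynamic decomposition, the sketch below shuffles the feature indices and deals them into subcomponents, so that calling it again yields a different grouping and any feature count is handled. It is a generic random grouping written under stated assumptions, not one of the three RFG variants from the thesis; the function name and parameters are hypothetical.

```python
import random


def random_feature_grouping(n_features, n_groups, rng=None):
    """Randomly partition feature indices into n_groups subcomponents.

    Illustrative only: group sizes differ by at most one, so the feature
    count does not need to be even or divisible by n_groups.
    """
    if not 1 <= n_groups <= n_features:
        raise ValueError("n_groups must be between 1 and n_features")
    if rng is None:
        rng = random.Random()
    indices = list(range(n_features))
    rng.shuffle(indices)
    # Deal the shuffled indices round-robin into the groups.
    return [indices[g::n_groups] for g in range(n_groups)]


# Regrouping (for example, once per co-evolutionary cycle) gives two
# interacting features repeated chances to land in the same subcomponent.
groups = random_feature_grouping(n_features=7, n_groups=3, rng=random.Random(0))
```

Each subcomponent would then be optimised by its own subpopulation, with the best partial solutions joined to evaluate a complete feature subset.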

Anomaly detection from Big Cybersecurity Datasets is very important; however, it is a very challenging and computationally expensive task. Feature selection in cybersecurity datasets may improve and quantify the accuracy and scalability of both supervised and unsupervised anomaly detection techniques. The third research problem of this thesis is to investigate and develop an anomaly detection approach using feature selection that can improve anomaly detection performance and also reduce execution time. Accordingly, this thesis proposes an approach called Anomaly Detection Using Feature Selection (ADUFS) to deal with this research problem. Experiments were performed on five different benchmark cybersecurity datasets, with and without feature selection, and the performance of both supervised and unsupervised anomaly detection techniques was investigated using ADUFS. The experimental results indicate that, instead of using the original dataset, a dataset with a reduced number of features yields better performance in terms of true positive rate (TPR) and false positive rate (FPR) than the existing techniques for anomaly detection. In addition, all anomaly detection techniques require less computational time when using datasets with a suitable subset of features rather than entire datasets. Furthermore, the performance results have been compared with six other state-of-the-art techniques.
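For readers unfamiliar with the two metrics named above, the following sketch computes TPR and FPR from their standard definitions, TPR = TP / (TP + FN) and FPR = FP / (FP + TN), treating label 1 as an anomaly. The function name and the toy labels are illustrative assumptions and are not data from the thesis.

```python
def tpr_fpr(y_true, y_pred):
    """Return (TPR, FPR) for binary labels where 1 marks an anomaly."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == 1 and pred == 1:
            tp += 1
        elif truth == 1 and pred == 0:
            fn += 1
        elif truth == 0 and pred == 1:
            fp += 1
        else:
            tn += 1
    # Guard against division by zero when a class is absent.
    tpr = tp / (tp + fn) if (tp + fn) else 0.0
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    return tpr, fpr


# Toy example: 3 true anomalies (2 detected), 3 normal records (1 flagged).
tpr, fpr = tpr_fpr([1, 1, 0, 0, 0, 1], [1, 0, 0, 1, 0, 1])
```

A good detector pushes TPR towards 1 while keeping FPR close to 0, which is the sense in which the abstract reports improved performance.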

Related Publications

Rashid, A. N. M. B., Ahmed, M., & Islam, S. R. (2021). A supervised rare anomaly detection technique via cooperative co-evolution-based feature selection using benchmark UNSW_NB15 dataset. In G. Wang, K.-K. R. Choo, R. K. L. Ko, Y. Xu, & B. Crispo (Eds.), Ubiquitous Security. UbiSec 2021. Communications in Computer and Information Science, vol. 1557 (pp. 279-291). Springer, Singapore. https://doi.org/10.1007/978-981-19-0468-4_21 and https://ro.ecu.edu.au/ecuworks2022-2026/151/

Rashid, A. N. M. B., Ahmed, M., Sikos, L. F., & Haskell-Dowland, P. (2022). Anomaly detection in cybersecurity datasets via cooperative co-evolution-based feature selection. ACM Transactions on Management Information Systems, 13(3), article 29, 39 pages. https://doi.org/10.1145/3495165 and https://ro.ecu.edu.au/ecuworks2022-2026/150/

Rashid, A. N. M. B., Ahmed, M., & Pathan, A. S. K. (2021). Infrequent pattern detection for reliable network traffic analysis using robust evolutionary computation. Sensors, 21(9), article 3005. https://doi.org/10.3390/s21093005 and https://ro.ecu.edu.au/ecuworkspost2013/10165/

Rashid, A. N. M. B., & Choudhury, T. (2021). Cooperative co-evolution and MapReduce: A review and new insights for large-scale optimisation. International Journal of Information Technology Project Management (IJITPM), 12(1), 29-62. https://doi.org/10.4018/IJITPM.2021010102 Article available https://ro.ecu.edu.au/ecuworkspost2013/9390/

Rashid, A. N. M. B., & Choudhury, T. (2019). Knowledge management overview of feature selection problem in high-dimensional financial data: Cooperative co-evolution and MapReduce perspectives. Problems and Perspectives in Management, 17(4), 340-359. https://doi.org/10.21511/ppm.17(4).2019.28 Article available https://ro.ecu.edu.au/ecuworkspost2013/7533/

Rashid, A. N. M. B., Ahmed, M., Sikos, L. F., & Haskell-Dowland, P. (2020). Cooperative co-evolution for feature selection in big data with random feature grouping. Journal of Big Data, 7, article 107. https://doi.org/10.1186/s40537-020-00381-y Article available https://ro.ecu.edu.au/ecuworkspost2013/9318/

Rashid, A. N. M. B., Ahmed, M., Sikos, L. F., & Haskell-Dowland, P. (2020). Correction to: Cooperative co‑evolution for feature selection in big data with random feature grouping. Journal of Big Data, 7, article 111. https://doi.org/10.1186/s40537-020-00403-9 Article available https://ro.ecu.edu.au/ecuworkspost2013/9365/

Rashid, A. N. M. B., Ahmed, M., Sikos, L. F., & Haskell-Dowland, P. (2020). A Novel Penalty-Based Wrapper Objective Function for Feature Selection in Big Data Using Cooperative Co-Evolution. IEEE Access, 8, 150113-150129. https://doi.org/10.1109/ACCESS.2020.3016679 Article available https://ro.ecu.edu.au/ecuworkspost2013/8539/

Recommended Citation

Rashid, A. N. M. B. (2021). Cooperative co-evolution-based feature selection for big data analytics. https://ro.ecu.edu.au/theses/2428
