BEWARE: A methodical approach to develop BEnign and MalWARE datasets
Date of Award
2024
Document Type
Thesis
Publisher
Edith Cowan University
Degree Name
Doctor of Philosophy
School
School of Science
First Supervisor
Ahmed Ibrahim
Second Supervisor
Peter Hannay
Third Supervisor
Mohi Ahmed
Abstract
Cyber-Physical Systems (CPSes) are continually advancing in many Critical Infrastructure (CI) sectors, such as energy, healthcare, the military, and telecommunications. These systems are persistently targeted by state-sponsored cybercriminal groups, who continuously refine their techniques and attack methods. Malicious software (malware) attacks are the most common of these tactics and can have severe consequences, as observed in the Stuxnet, BlackEnergy, Industroyer, and Triton attacks.
A dataset consisting of validated benign and malware samples is a fundamental ingredient in developing solutions that improve the security posture of interconnected computer systems and networks. The research community has used various datasets to build Artificial Intelligence (AI) models. However, these datasets are limited to static features and lack fine detail on how they were developed or acquired. They were also generated using sandboxes vulnerable to anti-analysis and anti-sandbox techniques, and they contain no malware targeting CPSes.
This research proposes a systematic approach to develop labelled benchmark datasets of benign samples, general-purpose malware, and CPS malware, collectively termed BEWARE. The developed datasets were then evaluated for their effectiveness using various machine learning algorithms. In doing so, this research makes significant and novel contributions to the body of knowledge by:
1. Proposing a systematic approach to gather and validate benign samples and acquire malware samples.
2. Assembling a curated dataset of real-world malware targeting CPSes that can be used to further extend CPS malware research.
3. Developing benign and malware datasets that include static and dynamic features, the latter obtained by executing the samples in an industry-standard sandbox environment.
4. Utilising a Natural Language Processing (NLP) technique to derive a comprehensive and effective set of features in the benign and malware datasets, resulting in fewer False Positives (FPs) and False Negatives (FNs).
DOI
10.25958/gz0e-xn50
Access Note
Access to this thesis is embargoed until 15th August 2029.
Recommended Citation
Malik, M. I. (2024). BEWARE: A methodical approach to develop BEnign and MalWARE datasets. Edith Cowan University. https://doi.org/10.25958/gz0e-xn50