BEWARE: A methodical approach to develop BEnign and MalWARE datasets

Author Identifier

Muhammad Imran Malik

https://orcid.org/0000-0003-3829-9282

Date of Award

2024

Document Type

Thesis

Publisher

Edith Cowan University

Degree Name

Doctor of Philosophy

School

School of Science

First Supervisor

Ahmed Ibrahim

Second Supervisor

Peter Hannay

Third Supervisor

Mohi Ahmed

Abstract

Cyber-Physical Systems (CPSes) are continually advancing in many Critical Infrastructure (CI) sectors, such as energy, healthcare, the military, and telecommunications. These systems are persistently targeted by nation-sponsored cybercriminal groups, who continuously refine their techniques and attack methods. Malicious software (malware) attacks are the most common of these tactics and can have severe consequences, as observed in the Stuxnet, BlackEnergy, Industroyer, and Triton attacks.

A dataset of validated benign and malware samples is a fundamental ingredient in developing solutions that improve the security posture of interconnected computer systems and networks. The research community has used various datasets to build Artificial Intelligence (AI) models. However, these datasets are limited to static features, lack detail on how they were developed or acquired, were generated using sandboxes susceptible to anti-analysis and anti-sandbox techniques, and contain no malware targeting CPSes.

This research proposes a systematic approach to developing labelled benchmark datasets of benign samples, general-purpose malware, and Cyber-Physical System malware, collectively termed BEWARE. The effectiveness of these datasets was then evaluated using various machine learning algorithms. In doing so, this research makes significant and novel contributions to the body of knowledge by:

1. Proposing a systematic approach to gather and validate benign samples and acquire malware samples.

2. Assembling a curated dataset of real-world malware targeting CPSes that can be used to further extend CPS malware research.

3. Developing benign and malware datasets that include static and dynamic features obtained after executing these samples in an industry-standard sandbox environment.

4. Utilising a Natural Language Processing (NLP) technique to derive a comprehensive and effective set of features in the benign and malware datasets, resulting in fewer False Positives (FPs) and False Negatives (FNs).

DOI

10.25958/gz0e-xn50

Access Note

Access to this thesis is embargoed until 15th August 2029.

Access to this thesis is restricted. Please see the Access Note above for access details.
