Matthew Gaber: Peekaboo

Publication Date

2024

Document Type

Dataset

Publisher

Edith Cowan University

Faculty

School of Science

Description

Cyber-attacks continue to evolve, increasing in frequency and sophistication, and Artificial Intelligence (AI) is becoming essential for detecting modern malware. However, the accuracy of AI in malware detection depends on the quality of the features it is trained on. Static and dynamic analysis of malware is limited by the widespread use of obfuscation and anti-analysis techniques by malware authors: if an analysis environment is detected, the malware hides its malicious behavior. Dynamic Binary Instrumentation (DBI) allows deep and precise control of the malware sample, thereby facilitating the extraction of authentic features from sophisticated and evasive malware. We developed Peekaboo, a DBI tool that defeats anti-analysis techniques and extracts authentic behavior from live malware samples. We collected 18,527 malware samples across ransomware, spyware, trojans, botnets, worms, Advanced Persistent Threats (APT) and post-exploitation tools, where every sample includes type, family, and variant information, for example Ransomware-WannaCry-SHA256. We also collected 1,973 benign software samples for analysis.
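
The sample naming convention mentioned above can be illustrated with a minimal sketch. It assumes labels follow the Ransomware-WannaCry-SHA256 pattern, with the type first, the hash last, and everything in between treated as the family (and variant, if present). The function and field names are hypothetical and are not part of the published scripts.

```python
from typing import NamedTuple


class SampleLabel(NamedTuple):
    """Hypothetical container for the parts of a sample label."""
    malware_type: str
    family: str
    sha256: str


def parse_sample_label(label: str) -> SampleLabel:
    """Split a label such as 'Ransomware-WannaCry-<sha256>' into its parts.

    Assumption: the first hyphen-separated field is the type and the last
    is the hash; anything in between (which may itself contain hyphens)
    is kept together as the family.
    """
    malware_type, _, rest = label.partition("-")
    family, _, sha256 = rest.rpartition("-")
    if not (malware_type and family and sha256):
        raise ValueError(f"unexpected label format: {label!r}")
    return SampleLabel(malware_type, family, sha256)
```

If the actual file names in the repository use a different separator or field order, the split logic would need to be adjusted accordingly.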

This dataset contains the results for each sample. Each sample was run for up to 15 minutes to observe not only the anti-analysis techniques used but also its complete behavior. For each malware sample, the network traffic, every opcode executed and every evasive technique used are captured.

Additional Information

There are three main folders in the linked repository.

  1. The Peekaboo Data folder contains zip files of the timestamped raw json files extracted by Peekaboo for each sample, organised by malware family. There is also a csv file for each family, generated with analysis.py.
  2. The Peekaboo Network Traffic folder contains zip files of the .pcap files extracted by Peekaboo for every sample, organised by family.
  3. The Python Scripts folder contains the Python scripts detailed below.
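
As a rough illustration of working with this layout, the sketch below unpacks the json members of every zip archive found under a data folder, placing each archive's contents in a sub-folder named after the archive. It is not the published extract-results.py; the function name, the one-folder-per-archive output layout, and the assumption that the archives contain files ending in .json are all illustrative.

```python
import zipfile
from pathlib import Path


def extract_family_archives(data_dir: Path, out_dir: Path) -> dict[str, int]:
    """Extract only the .json members of each zip archive under data_dir.

    Returns a mapping of archive file name to the number of json files
    extracted. Assumption: one output sub-folder per archive is wanted.
    """
    counts: dict[str, int] = {}
    for archive in sorted(data_dir.rglob("*.zip")):
        target = out_dir / archive.stem
        target.mkdir(parents=True, exist_ok=True)
        with zipfile.ZipFile(archive) as zf:
            # Skip anything that is not a json data file.
            json_members = [
                name for name in zf.namelist() if name.lower().endswith(".json")
            ]
            zf.extractall(target, members=json_members)
        counts[archive.name] = len(json_members)
    return counts
```

A call such as extract_family_archives(Path("Peekaboo Data"), Path("extracted")) would be one way to use it, where both paths are placeholders for wherever the repository has been downloaded.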

DOI

https://doi.org/10.25958/85p1-4w32

Research Activity Title

Defeating Evasive Malware with Peekaboo: Extracting Authentic Malware Behavior with Dynamic Binary Instrumentation

Research Activity Description

The accuracy and effectiveness of AI for malware detection depend on the quality and quantity of the features it is trained on. That is, both an analysis tool that forces malware to expose its malicious intent and then extracts genuine features, and large and diverse repositories of malware and benign software, are necessary to train accurate AI models. This research had two primary objectives: investigation of the evasive techniques used by modern malware and the creation of the Peekaboo DBI tool to extract authentic behavior from live malware samples.

Methodology

Dynamic Binary Instrumentation

Start of data collection time period

2024

End of data collection time period

2024

File Format(s)

json, csv, python, pcap

Viewing Instructions

Notes for the Python scripts:

  • analysis.py extracts the evasive techniques recorded in the individual data json files, aggregates them by family, and writes a csv with the results.
  • check-json-pcap.py deletes network traffic pcap files that do not have a corresponding data json file, which reconciles the two folders for samples that did not execute.
  • count-asm.py counts and categorizes the individual Assembly (ASM) instructions in the data json files.
  • count-techniques.py counts the various evasive techniques. For each family csv file generated by analysis.py, it counts the number of times each technique was used by the individual samples, that is, the rows in the analysis.csv file.
  • extract-results.py is used to extract the data json files from the individual archives in the data folder.
  • get-opcodes-threads.py extracts the Opcodes and Threads columns for each analysis csv file in a folder.
  • splitjson.py is used to split huge json files into more manageable sizes.
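
The reconciliation step described for check-json-pcap.py can be sketched as follows. This is not the published script: it assumes a pcap file and its data json file share the same base name, and it only lists the unmatched pcap files rather than deleting them, so the result can be reviewed first. The function name is hypothetical.

```python
from pathlib import Path


def find_orphan_pcaps(json_dir: Path, pcap_dir: Path) -> list[Path]:
    """Return pcap files that have no data json file with the same base name.

    Assumption: 'sample.pcap' corresponds to 'sample.json'. Nothing is
    deleted here; the caller decides what to do with the returned paths.
    """
    json_stems = {path.stem for path in json_dir.rglob("*.json")}
    return sorted(
        path for path in pcap_dir.rglob("*.pcap") if path.stem not in json_stems
    )
```

Returning the list instead of removing files keeps the sketch non-destructive; a caller could pass each returned path to Path.unlink() once the matching rule has been confirmed against the real file names.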

Creative Commons License

Creative Commons Attribution-Noncommercial 4.0 License

Contact

Matthew Gaber

This item is not available for download.
