Author Identifiers
Matthew G. Gaber: https://orcid.org/0000-0003-1684-1392
Mohiuddin Ahmed: https://orcid.org/0000-0002-4559-4768
Michael N Johnstone: https://orcid.org/0000-0001-7192-7098
Publication Date
7-10-2025
Document Type
Dataset
Publisher
Edith Cowan University
School or Research Centre
School of Science
Description
ECU-MALNETT (ECU MALware NETwork Traffic) is a real-world, reproducible dataset of labeled benign and malicious network flows built from the Peekaboo execution corpus. Peekaboo runs evasive malware under dynamic binary instrumentation and records raw host-level PCAPs while granting full Internet access, yielding noisy captures that include background OS activity and concurrent processes. To derive trustworthy labels from these traces, we apply Construct, a baseline-aware, zero-trust labeling framework. Construct first ingests a baseline capture to establish reference sets (DNS qnames, HTTP hosts, TLS SNIs, and socket endpoints) and grows a conservative benign IP pool only via whitelisted DNS resolutions. During per-sample analysis, flows are marked benign only if they match the baseline or an explicit whitelist; all others are treated as suspicious. Malicious evidence then propagates: DNS resolutions outside the benign pool label dependent flows as malicious, while beacon-like timing and anomalous HTTP/port usage extend labels across related endpoints. The result is a corpus of automatically inferred, reproducible, and explainable flow labels that preserves real-world noise and avoids synthetic ground-truth assumptions, enabling rigorous, comparable benchmarking for AI-based malware-traffic analytics. Both Construct and the ECU-MALNETT labels are released to support transparent evaluation and accelerate research on network-based detection of evasive malware.
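The labeling rules described above can be illustrated with a minimal sketch. It is not Construct's implementation: the `Baseline` class, the flow dictionary keys (`dst_ip`, `dst_port`, `sni`, `http_host`), and all function names are hypothetical, chosen only to show the default-suspicious logic and how an out-of-pool DNS resolution can mark dependent flows as malicious.

```python
from dataclasses import dataclass, field


@dataclass
class Baseline:
    """Reference sets taken from a baseline capture (hypothetical structure)."""
    dns_qnames: set = field(default_factory=set)
    http_hosts: set = field(default_factory=set)
    tls_snis: set = field(default_factory=set)
    endpoints: set = field(default_factory=set)   # (ip, port) pairs
    benign_ips: set = field(default_factory=set)  # conservative benign IP pool


def grow_benign_pool(baseline, dns_answers, whitelist):
    """Add resolved IPs to the benign pool only for whitelisted query names."""
    for qname, ips in dns_answers.items():
        if qname in whitelist:
            baseline.benign_ips.update(ips)


def label_flow(flow, baseline, whitelist):
    """Return 'benign' only on a baseline or whitelist match, else 'suspicious'."""
    name = flow.get("sni") or flow.get("http_host")
    if name and (name in whitelist
                 or name in baseline.tls_snis
                 or name in baseline.http_hosts):
        return "benign"
    if (flow["dst_ip"], flow["dst_port"]) in baseline.endpoints:
        return "benign"
    if flow["dst_ip"] in baseline.benign_ips:
        return "benign"
    return "suspicious"


def label_flows(flows, baseline, dns_answers, whitelist):
    """Label every flow, escalating suspicious flows that depend on tainted DNS."""
    # IPs resolved from names that are neither whitelisted nor in the baseline.
    tainted = set()
    for qname, ips in dns_answers.items():
        if qname not in whitelist and qname not in baseline.dns_qnames:
            tainted.update(ip for ip in ips if ip not in baseline.benign_ips)

    labels = []
    for flow in flows:
        label = label_flow(flow, baseline, whitelist)
        if label == "suspicious" and flow["dst_ip"] in tainted:
            label = "malicious"
        labels.append(label)
    return labels
```

In this sketch a flow never becomes benign by default: it must match a reference set, an explicit whitelist entry, or the benign IP pool. The released Construct code at the repository linked in the Comments field is the authoritative reference for the actual rules.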
Additional Information
From Peekaboo’s 20,500 executed samples, ECU-MALNETT comprises a stratified random subset capped at 20 samples per family, covering 58 families across worms, ransomware, trojans, spyware, botnets, post-exploitation tools, APTs, and benign software. The result pairs realistic, noisy captures with automatically inferred labels, enabling rigorous benchmarking of malware traffic analytics without unrealistic ground-truth assumptions.
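A stratified random subset with a per-family cap can be drawn as in the sketch below. This is an illustration only: the record does not describe the exact selection procedure, so the `(sample_id, family)` input format, the function name, and the fixed seed are assumptions.

```python
import random
from collections import defaultdict


def capped_stratified_sample(samples, cap=20, seed=0):
    """Pick at most `cap` sample IDs per family, uniformly at random per family.

    `samples` is an iterable of (sample_id, family) pairs (assumed format).
    """
    by_family = defaultdict(list)
    for sample_id, family in samples:
        by_family[family].append(sample_id)

    rng = random.Random(seed)  # fixed seed so the selection is repeatable
    subset = {}
    for family in sorted(by_family):
        ids = sorted(by_family[family])
        # Families with fewer than `cap` samples are kept in full.
        subset[family] = sorted(rng.sample(ids, min(cap, len(ids))))
    return subset
```

Sorting the families and IDs before sampling keeps the result independent of input order, so the same seed always yields the same subset.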
Research Activity Title
ECU-MALNETT: A Reproducible Dataset of Benign and Malicious Network Traffic
Research Activity Description
Because Peekaboo does not map processes to network flows, there is no PID-to-flow ground truth. We therefore introduce Construct, a baseline-aware, zero-trust labeling framework for PCAPs. Construct first ingests a baseline capture to establish reference sets (DNS qnames, HTTP hosts, TLS SNIs, and socket endpoints) and grows a conservative benign IP pool only via whitelisted DNS resolutions. During per-file analysis, flows are labeled benign only if they match the baseline sets or explicit whitelists; all others are treated as suspicious and escalated. Malicious signals then propagate: resolutions outside the benign pool mark dependent flows as malicious, while beacon-like timing or anomalous HTTP/port usage extends labels across related endpoints. This zero-trust approach reduces false positives while retaining sensitivity, yielding reproducible, explainable flow labels.
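One common way to test for beacon-like timing is to check whether connections to the same endpoint are spaced at near-constant intervals. The sketch below uses the coefficient of variation of the inter-arrival gaps. The record does not state which criterion Construct applies, so the function name and the `min_events` and `max_cv` thresholds are hypothetical.

```python
from statistics import mean, pstdev


def looks_like_beacon(timestamps, min_events=5, max_cv=0.1):
    """Flag near-constant spacing between connections to one endpoint.

    `timestamps` are connection start times in seconds for a single endpoint.
    """
    if len(timestamps) < min_events:
        return False  # too few events to judge periodicity
    ordered = sorted(timestamps)
    gaps = [later - earlier for earlier, later in zip(ordered, ordered[1:])]
    average_gap = mean(gaps)
    if average_gap <= 0:
        return False  # all events at the same instant, no period to measure
    # Coefficient of variation: low values mean regular, timer-driven contact.
    return pstdev(gaps) / average_gap <= max_cv
```

A real detector would also need to tolerate deliberate jitter and missed check-ins, which this sketch does not attempt.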
Methodology
Packet Captures, Python, Dynamic Binary Instrumentation
Start of data collection time period
2025
End of data collection time period
2025
File Format(s)
csv, json, python
File Size
3.47GB
Creative Commons License
This work is licensed under a Creative Commons Attribution-Noncommercial 4.0 License
Contact
m.gaber@ecu.edu.au
Citation
Gaber, M. G., Ahmed, M., & Johnstone, M. N. (2025). ECU-MALNETT. Edith Cowan University. https://doi.org/10.25958/f4sc-7402
APT-20251006T071928Z-1-001.zip (1683687 kB)
Benign-20251006T072421Z-1-001.zip (864486 kB)
Botnet-20251006T072707Z-1-002.zip (19300 kB)
Ransomware-20251006T073342Z-1-002.zip (43660 kB)
Spyware-20251007T030534Z-1-002.zip (51721 kB)
Tool-20251007T033036Z-1-002.zip (17104 kB)
Trojan-20251007T033637Z-1-002.zip (85976 kB)
Worm-20251007T045948Z-1-001.zip (866150 kB)
Comments
The dataset contains 10 folders. The first folder, model_reports-20251007T052804Z-1-001, is available as the primary downloadable file; the remaining folders are listed as additional files. Supplementary data is available at https://github.com/MatthewGaber/Construct