Author Identifiers

Matthew G. Gaber: https://orcid.org/0000-0003-1684-1392

Mohiuddin Ahmed: https://orcid.org/0000-0002-4559-4768

Michael N Johnstone: https://orcid.org/0000-0001-7192-7098

Publication Date

7-10-2025

Document Type

Dataset

Publisher

Edith Cowan University

School or Research Centre

School of Science

Comments

The dataset contains 10 folders. The first folder, model_reports-20251007T052804Z-1-001, is available as the primary downloadable file; the remaining folders are listed as additional files. Supplementary data is available at https://github.com/MatthewGaber/Construct

Description

ECU-MALNETT (ECU MALware NETwork Traffic) is a real-world, reproducible dataset of labeled benign and malicious network flows built from the Peekaboo execution corpus. Peekaboo runs evasive malware under dynamic binary instrumentation and records raw host-level PCAPs while granting full Internet access, yielding noisy captures that include background OS activity and concurrent processes. To derive trustworthy labels from these traces, we apply Construct, a baseline-aware, zero-trust labeling framework. Construct first ingests a baseline capture to establish reference sets (DNS qnames, HTTP hosts, TLS SNIs, and socket endpoints) and grows a conservative benign IP pool only via whitelisted DNS resolutions. During per-sample analysis, flows are marked benign only if they match the baseline or an explicit whitelist; all others are treated as suspicious. Malicious evidence then propagates: DNS resolutions outside the benign pool label dependent flows as malicious, while beacon-like timing and anomalous HTTP/port usage extend labels across related endpoints. The result is a corpus of automatically inferred, reproducible, and explainable flow labels that preserves real-world noise and avoids synthetic ground-truth assumptions, enabling rigorous, comparable benchmarking for AI-based malware-traffic analytics. Both Construct and the ECU-MALNETT labels are released to support transparent evaluation and accelerate research on network-based detection of evasive malware.
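The baseline-matching step described above can be illustrated with a minimal Python sketch. It is not the Construct implementation: the `Baseline` class, the `grow_benign_pool` and `label_flow` functions, and the flow dictionary keys (`qname`, `http_host`, `sni`, `dst_ip`, `dst_port`) are hypothetical names chosen for illustration, and the order of the checks is an assumption.

```python
from dataclasses import dataclass, field


@dataclass
class Baseline:
    """Reference sets gathered from a baseline capture (hypothetical structure)."""
    qnames: set = field(default_factory=set)
    http_hosts: set = field(default_factory=set)
    tls_snis: set = field(default_factory=set)
    endpoints: set = field(default_factory=set)   # (ip, port) pairs
    benign_ips: set = field(default_factory=set)  # conservative benign IP pool


def grow_benign_pool(baseline, dns_answers, whitelist):
    """Add resolved IPs to the benign pool only for whitelisted query names."""
    for qname, ips in dns_answers.items():
        if qname in whitelist:
            baseline.benign_ips.update(ips)


def label_flow(flow, baseline, whitelist):
    """Return 'benign' on a baseline or whitelist match, otherwise 'suspicious'."""
    # Each name-type field is compared against its own baseline set.
    checks = [
        (flow.get("qname"), baseline.qnames),
        (flow.get("http_host"), baseline.http_hosts),
        (flow.get("sni"), baseline.tls_snis),
    ]
    for name, reference in checks:
        if name and (name in reference or name in whitelist):
            return "benign"
    if (flow["dst_ip"], flow["dst_port"]) in baseline.endpoints:
        return "benign"
    if flow["dst_ip"] in baseline.benign_ips:
        return "benign"
    # Zero-trust default: anything unmatched is not trusted.
    return "suspicious"
```

In this sketch the default return value carries the zero-trust idea: a flow is never benign by omission, only by an explicit match.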

Additional Information

ECU-MALNETT is a stratified random subset of Peekaboo’s 20,500 executed samples, capped at 20 samples per family and covering 58 families across worms, ransomware, trojans, spyware, botnets, post-exploitation tools, APTs, and benign software. The result pairs realistic, noisy captures with automatically inferred labels, enabling rigorous benchmarking of malware traffic analytics without unrealistic ground-truth assumptions.
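A per-family cap of this kind can be sketched as follows. This is an illustrative assumption about how such a subset might be drawn, not the authors' selection script: the function name `capped_stratified_sample`, the `(sample_id, family)` input shape, and the fixed seed are all hypothetical.

```python
import random
from collections import defaultdict


def capped_stratified_sample(samples, cap=20, seed=0):
    """Draw at most `cap` random samples from each family.

    samples: iterable of (sample_id, family) pairs.
    Returns a dict mapping family -> list of selected sample ids.
    """
    by_family = defaultdict(list)
    for sample_id, family in samples:
        by_family[family].append(sample_id)

    rng = random.Random(seed)  # fixed seed keeps the draw reproducible
    subset = {}
    for family in sorted(by_family):
        ids = sorted(by_family[family])  # stable order before sampling
        subset[family] = rng.sample(ids, min(cap, len(ids)))
    return subset
```

Families with fewer samples than the cap are kept whole, so small families are not dropped from the subset.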

Research Activity Title

ECU-MALNETT: A Reproducible Dataset of Benign and Malicious Network Traffic

Research Activity Description

Because Peekaboo does not map processes to network flows, there is no PID-to-flow ground truth. We therefore introduce Construct, a baseline-aware, zero-trust labeling framework for PCAPs. Construct first ingests a baseline capture to establish reference sets (DNS qnames, HTTP hosts, TLS SNIs, and socket endpoints) and grows a conservative benign IP pool only via whitelisted DNS resolutions. During per-file analysis, flows are labeled benign only if they match baseline sets or explicit whitelists; all others are treated as suspicious and escalated. Malicious signals then propagate: resolutions outside the benign pool mark dependent flows as malicious, while beacon-like timing or anomalous HTTP/port usage extends labels across related endpoints. This zero-trust approach reduces false positives while retaining sensitivity, yielding reproducible, explainable flow labels.
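The propagation step can be sketched in Python under stated assumptions. The beacon test below (a low coefficient of variation in inter-arrival gaps) is one common heuristic and is swapped in for illustration; the source does not specify how Construct detects beacon-like timing. The function names, thresholds (`min_events`, `max_cv`), and flow dictionary keys are hypothetical, and anomalous HTTP/port checks are omitted.

```python
from statistics import mean, pstdev


def is_beacon_like(timestamps, min_events=5, max_cv=0.1):
    """True when inter-arrival gaps are near-constant (low coefficient of variation)."""
    if len(timestamps) < min_events:
        return False
    ts = sorted(timestamps)
    gaps = [b - a for a, b in zip(ts, ts[1:])]
    avg = mean(gaps)
    if avg <= 0:
        return False
    return pstdev(gaps) / avg <= max_cv


def propagate_malicious(flows, dns_answers, benign_ips):
    """Escalate suspicious flows to malicious using DNS and timing evidence."""
    # IPs resolved during the sample run that fall outside the benign pool.
    bad_ips = {ip for ips in dns_answers.values() for ip in ips
               if ip not in benign_ips}

    # Group start times of suspicious flows by destination endpoint.
    by_endpoint = {}
    for f in flows:
        if f["label"] == "suspicious":
            key = (f["dst_ip"], f["dst_port"])
            by_endpoint.setdefault(key, []).append(f["start"])
    beacons = {ep for ep, ts in by_endpoint.items() if is_beacon_like(ts)}

    for f in flows:
        if f["label"] != "suspicious":
            continue  # benign labels are never overwritten here
        if f["dst_ip"] in bad_ips or (f["dst_ip"], f["dst_port"]) in beacons:
            f["label"] = "malicious"
    return flows
```

Flows that match neither kind of evidence stay suspicious in this sketch, which keeps unresolved traffic distinguishable from confirmed benign and malicious flows.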

Methodology

Packet Captures, Python, Dynamic Binary Instrumentation

Start of data collection time period

2025

End of data collection time period

2025

File Format(s)

csv, json, python

File Size

3.47GB

Creative Commons License

Creative Commons Attribution-Noncommercial 4.0 License

Contact

m.gaber@ecu.edu.au

Included in

Cybersecurity Commons
