SRI Security Research Institute, Edith Cowan University, Perth, Western Australia
Shafiq et al. (2009a) propose a non–signature-based technique for detecting malware which applies data mining techniques to features extracted from executable files. Their technique has a high level of accuracy, a low false positive rate, and a speed on par with commercial anti-virus products. One portion of their technique uses a multi-layer perceptron as a classifier, which provides little insight into the reasons for classification. Our experience is that network security analysts prefer tools which provide human-comprehensible reasons for a classification, rather than operating as “black boxes”. We therefore build on the results of Shafiq et al. by demonstrating a technique which uses decision trees to distinguish packed from non-packed files, producing a classification diagram which can be understood by analysts. We show that the resulting detector still provides high accuracy and classifies files rapidly.