An empirical evaluation for feature selection methods in phishing email classification
Crl Publishing Ltd
Faculty of Health, Engineering and Science
Phishing email detection depends heavily on the accuracy of anti-phishing classifiers. According to the literature, classifiers that use Machine-Learning techniques achieve the highest phishing email classification accuracy. Selecting effective features is a critical step in raising a Machine-Learning classifier's detection accuracy. This study evaluates a number of feature subset selection methods as they relate to the phishing email classification domain. To perform the evaluation, a total of 47 classification features were constructed as previously proposed in the literature. The primary outcome is that the Wrapper evaluator combined with the Best-First: Forward search method found the most effective feature subset among all evaluated methods. The study addresses the gap between fragmented items in the literature by evaluating them together under common evaluation metrics. Using the best-performing feature selection method, an effective feature subset was found among the 47 previously proposed features, which resulted in a highly accurate anti-phishing email classifier with an F1 score of 99.396%. This also shows that a highly competitive anti-phishing email classifier can still be constructed using only existing Machine-Learning techniques and previously proposed features, provided an effective feature subset is found.
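To make the idea of a wrapper evaluator driven by a best-first forward search more concrete, the following is a minimal Python sketch. It is not the implementation used in this study. The function names (`f1_score`, `best_first_forward`, `toy_evaluate`), the `patience` stopping rule, and the toy scoring function are all assumptions introduced for illustration. In a real wrapper setting, `evaluate` would train the classifier on the candidate feature subset and return a cross-validated score such as F1.

```python
import heapq
from itertools import count
from typing import Callable, FrozenSet, Tuple


def f1_score(tp: int, fp: int, fn: int) -> float:
    """F1 from confusion counts: harmonic mean of precision and recall."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def best_first_forward(
    n_features: int,
    evaluate: Callable[[FrozenSet[int]], float],
    patience: int = 5,  # assumed stopping rule, not taken from the study
) -> Tuple[FrozenSet[int], float]:
    """Best-first forward search over feature subsets.

    Starts from the empty subset, repeatedly expands the highest-scoring
    unexpanded subset by adding one feature at a time, and stops after
    `patience` consecutive expansions that do not improve the best score.
    """
    start: FrozenSet[int] = frozenset()
    best_subset, best_score = start, evaluate(start)
    tie = count()  # tie-breaker so the heap never compares frozensets
    frontier = [(-best_score, next(tie), start)]  # max-heap via negation
    seen = {start}
    stale = 0

    while frontier and stale < patience:
        _, _, subset = heapq.heappop(frontier)
        improved = False
        for feature in range(n_features):
            if feature in subset:
                continue
            child = subset | {feature}
            if child in seen:
                continue
            seen.add(child)
            score = evaluate(child)
            heapq.heappush(frontier, (-score, next(tie), child))
            if score > best_score:
                best_subset, best_score = child, score
                improved = True
        stale = 0 if improved else stale + 1

    return best_subset, best_score


def toy_evaluate(subset: FrozenSet[int]) -> float:
    """Hypothetical stand-in for a cross-validated classifier score.

    Features 0, 2 and 5 help; every other feature slightly hurts.
    """
    useful = {0, 2, 5}
    return len(subset & useful) - 0.1 * len(subset - useful)


if __name__ == "__main__":
    subset, score = best_first_forward(n_features=8, evaluate=toy_evaluate)
    print(sorted(subset), score)
    print(f1_score(tp=8, fp=2, fn=2))
```

Unlike a purely greedy forward search, the frontier keeps previously scored subsets, so the search can return to an earlier candidate when the current branch stops improving. The trade-off is that every call to `evaluate` implies training a classifier, which is why wrapper methods are typically more expensive than filter-based evaluators.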