A study of feature subset evaluators and feature subset searching methods for phishing classification
Faculty of Computing, Health and Science
School of Computer and Security Science / Security Research Centre (secAU)
Phishing is a semantic attack that aims to take advantage of the naivety of users of electronic services (e.g. e-banking). A number of solutions have been proposed to minimize the impact of phishing attacks. The most accurate publicly known email phishing classifiers use machine learning techniques. Previous work in phishing email classification via machine learning has primarily focused on enhancing classification accuracy by studying the addition of novel features, ensembles, or classification algorithms. This study follows a different path by taking advantage of previously proposed features. The primary focus of this paper is to enhance the classification accuracy of phishing email classifiers by evaluating various feature selection methods to find an effective feature subset among a number of previously proposed features. The feature subset selected in this study resulted in a classification model with an F1 score of 99.396% using 21 heuristic features and a single classifier.