RAIDER: Reinforcement-aided spear phishing detector
Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
School of Science
Spear phishing is one of the most difficult-to-detect cyber attacks facing businesses and individuals worldwide. In recent years, considerable research has been conducted into the use of Machine Learning (ML) techniques for spear-phishing detection. ML-based solutions are vulnerable to zero-day attacks: when the algorithms do not have access to the relevant historical data, they cannot be reliably trained. Furthermore, email address spoofing is a low-effort yet widely applied forgery technique in spear phishing, which the standard email protocol SMTP fails to detect without the use of extensions. Detecting this type of spear-phishing threat requires (i) a close investigation of each sender within the mailbox; and (ii) a thorough exploration of the similarity of each sender's characteristics to the spoofed email. This raises scalability challenges because the number of features relevant for investigation and comparison grows in proportion to the number of senders within a particular mailbox. This differs from traditional phishing detection, which typically looks at email bodies and is generally limited to a binary classification between ‘phishing’ and ‘benign’ emails. We offer a possible solution to these problems, which we label RAIDER: Reinforcement AIded Spear Phishing DEtectoR, a reinforcement-learning-based feature evaluation system that can automatically find the optimum features for detecting different types of attacks. By leveraging a reward and penalty system, RAIDER allows for autonomous feature selection. RAIDER also keeps the number of features to a minimum by selecting only the significant features to represent phishing emails and detect spear phishing attacks. After extensive evaluation of RAIDER on over 11,000 emails and across 3 attack scenarios, our results suggest that using reinforcement learning to automatically identify the significant features could reduce the dimensions of the required features by 55 % in comparison to existing ML-based systems. 
It also increases the accuracy of detecting spoofing attacks by 4 percentage points, from 90 % to 94 %. Furthermore, RAIDER demonstrates reasonable detection accuracy against a sophisticated attack named “Known Sender”, in which spear phishing emails greatly resemble those of the impersonated sender. By evaluating and updating the feature set, RAIDER is able to increase accuracy by 13 percentage points, from 49 % to 62 %, when detecting Known Sender attacks.
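To make the idea of reward-and-penalty feature selection concrete, the following is a minimal sketch, not RAIDER's actual agent. It assumes synthetic data in place of real emails, a nearest-centroid classifier in place of the detector, and a simplified scoring loop in place of a full reinforcement-learning policy. All names (`make_emails`, `centroid_accuracy`, `select_features`) and all parameter values are hypothetical and chosen for illustration only.

```python
import random

def make_emails(n, rng):
    """Synthetic stand-in for emails: features 0-1 carry signal, 2-4 are noise."""
    rows = []
    for _ in range(n):
        label = rng.randint(0, 1)  # 1 = spear phishing, 0 = benign (assumed)
        signal = [label + rng.gauss(0, 0.3), 2 * label + rng.gauss(0, 0.3)]
        noise = [rng.gauss(0, 1.0) for _ in range(3)]
        rows.append((signal + noise, label))
    return rows

def centroid_accuracy(train, test, features):
    """Accuracy of a nearest-centroid classifier restricted to `features`."""
    if not features:
        return 0.0
    centroids = {}
    for label in (0, 1):
        members = [x for x, y in train if y == label]
        centroids[label] = [sum(x[f] for x in members) / len(members)
                            for f in features]
    correct = 0
    for x, y in test:
        dist = {label: sum((x[f] - c) ** 2 for f, c in zip(features, cs))
                for label, cs in centroids.items()}
        correct += int(min(dist, key=dist.get) == y)
    return correct / len(test)

def select_features(train, test, n_features, rounds, rng):
    """Reward a feature when adding it raises accuracy, penalise it otherwise."""
    scores = [0.0] * n_features
    for _ in range(rounds):
        candidate = rng.randrange(n_features)
        # Random context: a subset of the other features.
        without = [f for f in range(n_features)
                   if f != candidate and rng.random() < 0.5]
        with_candidate = sorted(without + [candidate])
        gain = (centroid_accuracy(train, test, with_candidate)
                - centroid_accuracy(train, test, without))
        scores[candidate] += 1.0 if gain > 0 else -1.0
    selected = [f for f, s in enumerate(scores) if s > 0]
    return scores, selected

rng = random.Random(0)
train, test = make_emails(150, rng), make_emails(150, rng)
scores, selected = select_features(train, test, n_features=5, rounds=200, rng=rng)
print("scores:", scores)
print("selected features:", selected)
```

Features that end with a positive score are kept and the rest are dropped, which mirrors, in a very reduced form, the goal of representing emails with only the significant features. A feature that is redundant given a stronger one may be penalised in this sketch even though it carries signal on its own.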