Seven pitfalls of using data science in cybersecurity
Document Type
Book Chapter
Publication Title
Data Science in Cybersecurity and Cyberthreat Intelligence
Publisher
Springer
School
School of Science
RAS ID
30916
Abstract
Machine learning, a subset of artificial intelligence, is used for many problems where a data-driven approach is required and the problem space involves either classification or prediction. The hype surrounding machine learning, coupled with the ease of use of machine learning tools, can lead to a (mistaken) belief that machine learning is a panacea for all problems and that simply feeding large volumes of data to an algorithm will generate a sensible and usable answer. In this chapter, we explore several pitfalls that a data scientist must evaluate in order to obtain some tangible meaning from the results provided by a machine learning algorithm. There is some evidence to suggest that algorithm choice is not a discriminator. In particular, we explore the importance of feature set selection and evaluate the inherent problems in relying on synthetic data.
DOI
10.1007/978-3-030-38788-4_6
Access Rights
subscription content
Comments
Johnstone, M., & Peacock, M. (2020). Seven pitfalls of using data science in cybersecurity. In L. F. Sikos & K.-K. R. Choo (Eds.), Data science in cybersecurity and cyberthreat intelligence. Springer. https://doi.org/10.1007/978-3-030-38788-4_6