Title

Seven pitfalls of using data science in cybersecurity

Document Type

Book Chapter

Publication Title

Data Science in Cybersecurity and Cyberthreat Intelligence

Publisher

Springer

School

School of Science

RAS ID

30916

Comments

Johnstone, M., & Peacock, M. (2020). Seven pitfalls of using data science in cybersecurity. In L. F. Sikos & K.-K. R. Choo (Eds.), Data science in cybersecurity and cyberthreat intelligence. Springer. https://doi.org/10.1007/978-3-030-38788-4_6

Abstract

Machine learning, a subset of artificial intelligence, is used for many problems where a data-driven approach is required and the problem space involves either classification or prediction. The hype surrounding machine learning, coupled with the ease of use of machine learning tools, can lead to a (mistaken) belief that machine learning is a panacea for all problems and that simply feeding large volumes of data to an algorithm will generate a sensible and usable answer. In this chapter, we explore several pitfalls that a data scientist must evaluate in order to obtain some tangible meaning from the results provided by a machine learning algorithm. There is some evidence to suggest that algorithm choice is not a discriminator. In particular, we explore the importance of feature set selection and evaluate the inherent problems in relying on synthetic data.

DOI

10.1007/978-3-030-38788-4_6
