Early screening of breast cancer using machine learning algorithms: A comparative study
Author Identifier (ORCID)
Syed Mohammed Shamsul Islam: https://orcid.org/0000-0002-3200-2903
Abstract
Breast cancer is one of the most common types of cancer in women and the second leading cause of death for women worldwide. Early detection and treatment can increase the likelihood of a full recovery and reduce the risk of the cancer spreading. Advances in breast cancer prediction and detection are therefore crucial, and high prognostic accuracy is essential for refining treatment decisions and improving patient survival. Machine learning techniques have become a leading area of research because of their significant impact on the early diagnosis of breast cancer. To detect breast cancer, we applied seven machine learning algorithms: Random Forest (RF), Naïve Bayes (NB), Extreme Gradient Boosting (XGB), Decision Tree (DT), Support Vector Machine (SVM), Logistic Regression (LR), and K-Nearest Neighbors (KNN). The main goal of this study is to identify the most effective of these algorithms for the prediction and diagnosis of breast cancer, evaluated through confusion matrices, accuracy, precision, and ROC curves with AUC scores. The algorithms were applied with feature scaling under two different splits of the data into training and testing sets, as well as with 10-fold cross-validation. All the selected classifiers performed well in detecting breast cancer. With a 75% training and 25% testing split, RF outperformed all other classifiers and achieved the best accuracy (97.9%). With an 80% training and 20% testing split, SVM outperformed all other classifiers with an accuracy of 98.20%. Although the average accuracy decreased slightly (97.40%) under 10-fold cross-validation, SVM still showed the best performance. This demonstrates that the way the data are split into training and testing sets can affect how well machine learning classifiers perform.
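The evaluation protocol described in the abstract (hold-out splits, 10-fold cross-validation, feature scaling, and confusion-matrix, accuracy, precision, and ROC-AUC scoring) can be sketched in plain Python. This is a minimal, hypothetical illustration, not the authors' code: the abstract does not say which libraries, scaling method, stratification, or random seeds were used. All function names below are invented for this sketch, z-score standardization is assumed as the "feature scaling" step, and the seven classifiers themselves are not reimplemented.

```python
"""Hypothetical evaluation-protocol sketch (stdlib only); not the paper's implementation."""
import random
from statistics import mean, pstdev


def holdout_split(n, test_fraction, seed=0):
    """Shuffle indices 0..n-1 and hold out `test_fraction` of them for testing."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_test = round(n * test_fraction)
    return idx[n_test:], idx[:n_test]  # (train indices, test indices)


def k_fold(n, k=10, seed=0):
    """Yield (train, test) index lists; every index lands in exactly one test fold."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for i, test in enumerate(folds):
        train = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield train, test


def fit_scaler(rows):
    """Assumed z-score scaling: per-feature mean and std from training rows only."""
    cols = list(zip(*rows))
    means = [mean(c) for c in cols]
    stds = [pstdev(c) or 1.0 for c in cols]  # constant feature: avoid dividing by zero
    return means, stds


def apply_scaler(rows, means, stds):
    """Standardize rows using statistics learned on the training set."""
    return [[(x - m) / s for x, m, s in zip(r, means, stds)] for r in rows]


def confusion(y_true, y_pred, positive=1):
    """Return (tp, fp, fn, tn) for a binary problem."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = len(pairs) - tp - fp - fn
    return tp, fp, fn, tn


def accuracy_precision(y_true, y_pred, positive=1):
    """Accuracy = (TP + TN) / N; precision = TP / (TP + FP)."""
    tp, fp, fn, tn = confusion(y_true, y_pred, positive)
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp) if tp + fp else 0.0
    return accuracy, precision


def roc_auc(y_true, scores, positive=1):
    """AUC = P(random positive scores above random negative); ties count 1/2."""
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    train, test = holdout_split(100, 0.25)
    print(len(train), len(test))  # 75 25
    print(roc_auc([1, 0, 1, 0], [0.9, 0.1, 0.4, 0.4]))  # 0.875
```

In a full pipeline, each `(train, test)` pair from `k_fold` would be scaled with `fit_scaler` on the training rows only, passed to one of the seven classifiers (for example, from a library such as scikit-learn), and scored with `accuracy_precision` and `roc_auc`; averaging the ten fold scores gives a cross-validated accuracy of the kind reported above. Fitting the scaler inside each training fold is a design choice assumed here to keep test data from leaking into the scaling statistics.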
Document Type
Conference Proceeding
Date of Publication
2024
Publisher
IEEE
School
School of Science
RAS ID
82333
Event
2024 IEEE International Conference on Future Machine Learning and Data Science, FMLDS 2024
Event Venue
Sydney, Australia
ISBN
979-8-3503-9121-3
First Page
51
Last Page
56
Comments
Haque, A., Majumder, M., Islam, S. (2024). Early screening of breast cancer using machine learning algorithms: A comparative study. Proceedings - 2024 IEEE International Conference on Future Machine Learning and Data Science, FMLDS 2024 (51-56). IEEE. https://doi.org/10.1109/FMLDS63805.2024.00019.