Author Identifier (ORCID)

Helge Janicke: https://orcid.org/0000-0002-1345-2829

Iqbal H. Sarker: https://orcid.org/0000-0003-1740-5517

Abstract

Short Message Service (SMS) is a widely used and cost-effective communication medium that has unfortunately become a frequent target for unsolicited messages, commonly known as SMS spam. With the rapid adoption of smartphones and increased Internet connectivity, SMS spam has emerged as a prevalent threat. Spammers have recognized the critical role SMS plays in modern communication, making it a prime target for abuse. As cybersecurity threats continue to evolve, the volume of SMS spam has increased substantially in recent years. Moreover, the unstructured format of SMS data creates significant challenges for SMS spam detection, making it more difficult to combat spam attacks. In this paper, we present an optimized and fine-tuned transformer-based language model to address the problem of SMS spam detection. We use a benchmark SMS spam dataset to analyze this spam detection model. Additionally, we apply pre-processing techniques to obtain clean, noise-free data and address the class imbalance problem with text augmentation techniques. Overall, the experiments showed that our optimized, fine-tuned RoBERTa model, a variant of BERT (Bidirectional Encoder Representations from Transformers), achieved an accuracy of 99.84%. To further enhance model transparency, we incorporate Explainable Artificial Intelligence (XAI) techniques that compute positive and negative coefficient scores, offering insight into the model's decision-making process. We also evaluate traditional machine learning models as a baseline for comparison. This comprehensive analysis demonstrates the significant impact language models can have on addressing complex text-based challenges within the cybersecurity landscape.

Document Type

Journal Article

Date of Publication

10-1-2025

Volume

11

Issue

5

Publication Title

Digital Communications and Networks

Publisher

Elsevier

School

Centre for Securing Digital Futures

Creative Commons License

This work is licensed under a Creative Commons Attribution-Noncommercial-No Derivative Works 4.0 License.

Comments

Uddin, M. A., Islam, M. N., Maglaras, L., Janicke, H., & Sarker, I. H. (2025). ExplainableDetector: Exploring transformer-based language modeling approach for SMS spam detection with explainability analysis. Digital Communications and Networks, 11(5), 1504–1518. https://doi.org/10.1016/j.dcan.2025.07.008

First Page

1504

Last Page

1518

Link to publisher version (DOI)

10.1016/j.dcan.2025.07.008