Research outputs 2022 to 2026

Cyberbullying text identification: A deep learning and transformer-based language modeling approach

Abstract

Social media platforms such as Facebook, Twitter, and YouTube serve as vital channels for individuals to express ideas and connect with others. Despite fostering increased connectivity, these platforms have inadvertently given rise to negative behaviors, particularly cyberbullying. While extensive research has been conducted on high-resource languages such as English, resources remain scarce for low-resource languages such as Bengali, Arabic, and Tamil, particularly in terms of language modeling. This study addresses this gap by developing a cyberbullying text identification system called BullyFilterNeT, tailored for social media texts, with Bengali as a test case. BullyFilterNeT overcomes the Out-of-Vocabulary (OOV) challenges associated with non-contextual embeddings and addresses the limitations of context-aware feature representations. For comparison, three non-contextual embedding models (GloVe, FastText, and Word2Vec) are developed for feature extraction in Bengali. These embeddings are used with three statistical classification models (SVM, SGD, Libsvm) and four deep learning models (CNN, VDCNN, LSTM, GRU). Additionally, the study employs six transformer-based language models (mBERT, bELECTRA, IndicBERT, XLM-RoBERTa, DistilBERT, and BanglaBERT) to overcome the limitations of the earlier models. BanglaBERT-based BullyFilterNeT achieves the highest accuracy, 88.04% on our test set, underscoring its effectiveness in cyberbullying text identification in the Bengali language.

Keywords

Cyberbullying, deep learning, fine tuning, harmful messages, large language modeling, natural language processing (NLP), OOV, transformers models

Document Type

Journal Article

Date of Publication

1-1-2024

Volume

11

Issue

1

Publication Title

EAI Endorsed Transactions on Industrial Networks and Intelligent Systems

Publisher

European Alliance for Innovation

School

School of Engineering

RAS ID

71518

Creative Commons License

This work is licensed under a Creative Commons Attribution-Noncommercial-Share Alike 4.0 License.

Comments

Saifullah, K., Khan, M. I., Jamal, S., & Sarker, I. H. (2024). Cyberbullying text identification based on deep learning and transformer-based Language models. EAI Endorsed Transactions on Industrial Networks and Intelligent Systems, 11(1), e5. https://doi.org/10.4108/EETINIS.V11I1.4703

Included in

Artificial Intelligence and Robotics Commons

Link to publisher version (DOI)

10.4108/EETINIS.V11I1.4703
