Chi square feature selection for improving sentiment analysis of news data privacy treats

Author Identifier

Ferry Jie: https://orcid.org/0000-0002-6287-8471

Document Type

Journal Article

Publication Title

Journal of Theoretical and Applied Information Technology

Volume

102

Issue

18

First Page

6601

Last Page

6610

Publisher

Little Lion Scientific

School

School of Business and Law

RAS ID

77115

Comments

Sami’un, D. C., Sugiharto, A., & Jie, F. (2024). Chi square feature selection for improving sentiment analysis of news data privacy treats. Journal of Theoretical and Applied Information Technology, 102(18), 6601-6610. http://www.jatit.org/volumes/Vol102No18/3Vol102No18.pdf

Abstract

Data security and privacy issues are increasingly pressing in the technology-driven digital era. In 2022, the issue became a major topic in Indonesia and triggered a range of responses on social media. YouTube, one of the primary platforms, plays a crucial role as a news source. This study uses sentiment analysis to understand public reactions to this news. Before classification, the data is preprocessed through cleaning, case folding, tokenization, slang correction, stemming, and stopword removal. The TF-IDF method is then used to weight the significance of words in documents, and Chi-Square feature selection is applied to improve the performance of the classification model. The main contribution of the study is the application of Chi-Square feature selection to improve sentiment analysis accuracy in the context of data privacy threat news. Chi-Square feature selection identified the most relevant features and eliminated irrelevant ones, which improved the accuracy of the classification model: the C5.0 algorithm combined with Chi-Square feature selection achieved the highest accuracy, 87.34%, compared with 80.14% without it. The results indicate that an appropriate feature selection method can substantially improve sentiment analysis model performance on data from social media platforms.
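
The Chi-Square feature selection step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes the common 2x2 contingency-table form of the statistic (term present or absent versus class), uses a tiny hypothetical corpus of already tokenized comments, and leaves out the TF-IDF weighting and the C5.0 classifier entirely. The function names `chi_square_scores` and `select_top_k` are made up for illustration.

```python
from collections import Counter


def chi_square_scores(docs, labels, target_label):
    """Score each term with the 2x2 chi-square statistic against one class."""
    n = len(docs)
    n_target = sum(1 for y in labels if y == target_label)
    n_other = n - n_target

    in_target = Counter()  # documents of the target class containing the term
    in_other = Counter()   # documents of any other class containing the term
    for tokens, y in zip(docs, labels):
        bucket = in_target if y == target_label else in_other
        for term in set(tokens):  # count document presence, not frequency
            bucket[term] += 1

    scores = {}
    for term in set(in_target) | set(in_other):
        a = in_target[term]   # term present, target class
        b = in_other[term]    # term present, other class
        c = n_target - a      # term absent, target class
        d = n_other - b       # term absent, other class
        denom = (a + c) * (b + d) * (a + b) * (c + d)
        # A term that appears in every document carries no class signal.
        scores[term] = n * (a * d - b * c) ** 2 / denom if denom else 0.0
    return scores


def select_top_k(scores, k):
    """Return the k highest-scoring terms; ties are broken alphabetically."""
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:k]]


# Hypothetical toy data: tokenized comments with sentiment labels.
docs = [
    ["data", "leak", "bad"],
    ["data", "leak", "worrying"],
    ["data", "safe", "good"],
    ["data", "protection", "good"],
]
labels = ["negative", "negative", "positive", "positive"]

scores = chi_square_scores(docs, labels, target_label="negative")
top_terms = select_top_k(scores, k=2)
```

In this toy corpus, "data" occurs in every comment and scores 0, while "leak" and "good" each occur in only one class and score highest. Keeping only the top-ranked terms before training the classifier is the kind of filtering the abstract credits for the accuracy gain.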

Access Rights

Free to read
