Chi square feature selection for improving sentiment analysis of news data privacy treats
Author Identifier
Ferry Jie: https://orcid.org/0000-0002-6287-8471
Document Type
Journal Article
Publication Title
Journal of Theoretical and Applied Information Technology
Volume
102
Issue
18
First Page
6601
Last Page
6610
Publisher
Little Lion Scientific
School
School of Business and Law
RAS ID
77115
Abstract
Data security and privacy issues are becoming increasingly pressing in the technology-driven digital era. In 2022, these issues became a major topic in Indonesia and triggered various responses on social media. YouTube, one of the primary platforms, plays a crucial role as a news source. To understand public reactions to this news, sentiment analysis is employed as a research method. Before sentiment analysis, the data is preprocessed through cleaning, case folding, tokenization, slang correction, stemming, and stopword removal. Subsequently, the TF-IDF method is used to weight the significance of words in documents, and Chi-Square feature selection is applied to enhance the performance of the classification model. The main contribution of this study lies in the application of Chi-Square feature selection to improve sentiment analysis accuracy in the context of data privacy threat news. Chi-Square feature selection proved effective in identifying the most relevant features, eliminating irrelevant ones and improving the accuracy of the classification model. The C5.0 algorithm combined with Chi-Square feature selection achieved the highest accuracy of 87.34%, compared with 80.14% without it, a gain of 7.20 percentage points. These results indicate that appropriate feature selection can substantially improve sentiment analysis model performance, providing a more accurate approach to analyzing sentiment data from social media platforms.
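The Chi-Square selection step described in the abstract can be illustrated with a minimal sketch. It assumes the common 2x2 term-versus-class formulation of the statistic, computed on term presence; the abstract does not state which variant the authors used, and their scores may instead have been computed on TF-IDF weights. The toy corpus, the labels, and the function names `chi_square_scores` and `select_top_k` are hypothetical and are not taken from the paper.

```python
from collections import Counter


def chi_square_scores(docs, labels, target):
    """Score each term with the 2x2 chi-square statistic against `target`.

    docs   : list of token lists (already preprocessed)
    labels : list of class labels, one per document
    target : the class the terms are tested against
    """
    n = len(docs)
    n_target = sum(1 for y in labels if y == target)
    n_other = n - n_target

    df_all = Counter()     # documents containing the term, any class
    df_target = Counter()  # documents containing the term, target class
    for tokens, y in zip(docs, labels):
        terms = set(tokens)  # presence/absence, not raw counts
        df_all.update(terms)
        if y == target:
            df_target.update(terms)

    scores = {}
    for term, df in df_all.items():
        a = df_target[term]   # target docs with the term
        b = df - a            # other docs with the term
        c = n_target - a      # target docs without the term
        d = n_other - b       # other docs without the term
        denom = (a + b) * (c + d) * (a + c) * (b + d)
        # A term present in every document carries no information: score 0.
        scores[term] = n * (a * d - b * c) ** 2 / denom if denom else 0.0
    return scores


def select_top_k(scores, k):
    """Return the k highest-scoring terms (ties broken alphabetically)."""
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]


# Hypothetical toy corpus, not data from the study.
docs = [
    ["data", "leak", "bad"],
    ["data", "leak", "worry"],
    ["data", "safe", "good"],
    ["data", "protect", "good"],
]
labels = ["negative", "negative", "positive", "positive"]

scores = chi_square_scores(docs, labels, target="negative")
top = select_top_k(scores, k=2)
print(top)
```

In this toy example the term "data" appears in every document and scores zero, so it would be dropped, while terms that occur in only one class score highest. The retained terms would then define the feature columns passed to the classifier; the study used C5.0 for that step, which this sketch does not implement.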
Access Rights
free_to_read
Comments
Sami’un, D. C., Sugiharto, A., & Jie, F. (2024). Chi square feature selection for improving sentiment analysis of news data privacy treats. Journal of Theoretical and Applied Information Technology, 102(18), 6601-6610. http://www.jatit.org/volumes/Vol102No18/3Vol102No18.pdf