Development of an Indonesian Hoax Detection System Using Logistic Regression Based on TF-IDF
DOI: https://doi.org/10.32664/j-intech.v13i02.2127
Keywords: Hoax Detection, Logistic Regression, Machine Learning, Text Classification, TF-IDF
Abstract
The massive spread of fake news (hoaxes) on digital platforms has become a serious challenge in Indonesia. It can disrupt social stability and undermine public trust, which makes an automated system for countering disinformation increasingly urgent. Previous work has relied on deep learning, which has a high computational cost. In contrast, this study shows that a lightweight approach remains highly effective for Indonesian hoax detection. The study develops and evaluates an automatic machine learning classifier for Indonesian-language hoaxes. Text content is represented numerically with Term Frequency-Inverse Document Frequency (TF-IDF) features, which are then classified with the Logistic Regression algorithm. This combination was chosen for its computational efficiency and ease of interpretation. The dataset, collected from verified sources, consists of 7,075 Indonesian-language news articles, split into 80% training data and 20% test data. On the test data, the model achieves an accuracy of 94.98%, a precision of 0.95, and an average F1-score of 0.95. In particular, it identifies hoaxes reliably, with a recall of 98% for the hoax class. The study concludes that combining TF-IDF and Logistic Regression is an efficient and accurate approach to Indonesian hoax detection. It offers a practical solution that can be developed further to combat disinformation.
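The pipeline summarized in the abstract can be sketched in plain Python. It covers TF-IDF features fed to a Logistic Regression classifier, evaluated with accuracy and with precision, recall, and F1 for the hoax class. This is a minimal illustration under stated assumptions, not the authors' implementation. The following are all hypothetical choices made for illustration:

- the whitespace tokenizer (no stemming or stopword removal);
- the smoothed IDF formula (taken from scikit-learn's default);
- the hand-written gradient-descent trainer and its hyperparameters (`lr`, `epochs`, `l2`);
- the toy sentences.

```python
import math
from collections import Counter

# Toy, hypothetical data (1 = hoax, 0 = valid); NOT taken from the study's dataset.
TRAIN = [
    ("vaksin mengandung chip pelacak sebarkan segera", 1),
    ("pesan berantai bank tutup besok sebarkan segera", 1),
    ("air rebusan daun sembuhkan kanker sebarkan", 1),
    ("pemerintah umumkan jadwal libur nasional", 0),
    ("bank indonesia umumkan suku bunga acuan", 0),
    ("kementerian kesehatan umumkan jadwal vaksinasi", 0),
]
TEST = [
    ("sebarkan segera pesan berantai vaksin chip", 1),
    ("pemerintah umumkan suku bunga", 0),
]


def tokenize(text):
    # Assumed preprocessing: lowercase + whitespace split (no stemming/stopwords).
    return text.lower().split()


def fit_idf(docs):
    """Smoothed IDF (scikit-learn's default form): ln((1 + n) / (1 + df)) + 1."""
    df = Counter()
    for doc in docs:
        df.update(set(tokenize(doc)))
    n = len(docs)
    return {t: math.log((1 + n) / (1 + d)) + 1 for t, d in df.items()}


def tfidf(doc, idf):
    """Raw term count x IDF, L2-normalized; terms unseen in training are dropped."""
    counts = Counter(t for t in tokenize(doc) if t in idf)
    vec = {t: c * idf[t] for t, c in counts.items()}
    norm = math.sqrt(sum(v * v for v in vec.values())) or 1.0
    return {t: v / norm for t, v in vec.items()}


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def score(w, b, x):
    # Linear score b + w.x over a sparse {term: weight} vector.
    return b + sum(w.get(t, 0.0) * v for t, v in x.items())


def train_logreg(X, y, lr=0.5, epochs=300, l2=0.01):
    """Batch gradient descent on L2-regularized log-loss (bias not regularized)."""
    w, b, n = {}, 0.0, len(X)
    for _ in range(epochs):
        grad_w, grad_b = Counter(), 0.0
        for x, label in zip(X, y):
            err = sigmoid(score(w, b, x)) - label  # d(log-loss)/d(score)
            for t, v in x.items():
                grad_w[t] += err * v
            grad_b += err
        for t in set(w) | set(grad_w):
            w[t] = w.get(t, 0.0) - lr * (grad_w[t] / n + l2 * w.get(t, 0.0))
        b -= lr * grad_b / n
    return w, b


def evaluate(y_true, y_pred, positive=1):
    """Accuracy plus precision/recall/F1 for the `positive` (hoax) class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


if __name__ == "__main__":
    idf = fit_idf([text for text, _ in TRAIN])
    X_train = [tfidf(text, idf) for text, _ in TRAIN]
    w, b = train_logreg(X_train, [label for _, label in TRAIN])
    preds = [int(sigmoid(score(w, b, tfidf(text, idf))) >= 0.5) for text, _ in TEST]
    print(preds, evaluate([label for _, label in TEST], preds))
```

Run as a script, the sketch does three things. It fits the IDF table on the six toy training sentences, trains the classifier, and prints predictions and metrics for the two held-out sentences. On the study's corpus, an 80/20 split of 7,075 articles gives 5,660 training and 1,415 test articles. `evaluate` reports scores for a single class, so averaged figures such as the abstract's F1-score would come from combining per-class results. The abstract does not say which averaging was used. A production system would more likely use a library implementation, for example scikit-learn's `TfidfVectorizer` and `LogisticRegression`. The hand-written version only makes each step visible.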
License
Copyright (c) 2025 J-INTECH

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

