Spam and Phishing Whatsapp Message Filtering Application Using TF - IDF and Machine Learning Methods

Author(s): Ferdinand Aprillian Manurung, Munawir, Deden Pradeka
Author(s) information:
Program Studi Teknik Komputer, Universitas Pendidikan Indonesia, Indonesia

The rapid development of communication technology has been accompanied by a growing volume of unwanted messages, such as spam and phishing attempts. User awareness of basic safe technology use has not kept pace with this development. In addition, enforcement of laws on internet-based crime remains unclear, which further increases the risk that internet users fall victim to such crimes. Because WhatsApp is one of the channels prone to spam and phishing, this research focuses on it and aims to develop an application that filters spam and phishing messages. The application combines TF-IDF (Term Frequency-Inverse Document Frequency) feature extraction with a Random Forest machine learning model. It is built on the MVVM (Model-View-ViewModel) architecture, which separates business logic from the user interface and thereby makes development and maintenance more efficient. The results show that the combination of TF-IDF and Random Forest classifies spam and phishing messages with high accuracy: evaluation with a confusion matrix gives an accuracy of 92%. For the safe message class, precision, recall, and F1 score are 89%, 95%, and 92%, respectively; for the dangerous message class, they are 95%, 88%, and 92%. The integration of the model into the application also performed well in black-box testing: all test scenarios passed, and test messages were detected with 98% accuracy. The developed application therefore provides WhatsApp users with additional protection against digital threats.
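The per-class figures reported above (precision, recall, F1) and the overall accuracy are all derived from a confusion matrix. As a rough illustration of how such figures are computed, the sketch below uses only the Python standard library. The function names (`accuracy`, `per_class_metrics`) and the message counts are hypothetical and chosen for illustration; they are not the authors' code or data.

```python
# Minimal sketch: accuracy and per-class precision/recall/F1 from a
# confusion matrix. Rows are actual classes, columns are predicted classes.
# The counts below are hypothetical, NOT the results reported in the article.

def accuracy(matrix):
    """Fraction of all messages that lie on the diagonal (correct predictions)."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total if total else 0.0


def per_class_metrics(matrix, labels):
    """Return {label: {"precision", "recall", "f1"}} for every class."""
    metrics = {}
    for k, label in enumerate(labels):
        tp = matrix[k][k]                          # correctly predicted as class k
        fn = sum(matrix[k]) - tp                   # class k predicted as something else
        fp = sum(row[k] for row in matrix) - tp    # other classes predicted as class k
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        if precision + recall:
            f1 = 2 * precision * recall / (precision + recall)
        else:
            f1 = 0.0
        metrics[label] = {"precision": precision, "recall": recall, "f1": f1}
    return metrics


# Hypothetical evaluation set: 50 safe and 50 dangerous messages.
labels = ["safe", "dangerous"]
cm = [
    [45, 5],    # actual safe:      45 predicted safe, 5 predicted dangerous
    [10, 40],   # actual dangerous: 10 predicted safe, 40 predicted dangerous
]

print(f"accuracy: {accuracy(cm):.2f}")
for name, scores in per_class_metrics(cm, labels).items():
    print(name, {metric: round(value, 2) for metric, value in scores.items()})
```

In this layout a dangerous message classified as safe counts as a false negative for the dangerous class and as a false positive for the safe class, which is why the two classes can show mirrored precision and recall values.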

About this article

SUBMITTED: 29 November 2024
ACCEPTED: 10 January 2025
PUBLISHED: 18 January 2025
SUBMITTED to ACCEPTED: 43 days
DOI: https://doi.org/10.53623/gisa.v5i1.551

Cite this article
Manurung, F. A., Munawir, & Pradeka, D. (2025). Spam and Phishing Whatsapp Message Filtering Application Using TF - IDF and Machine Learning Methods. Green Intelligent Systems and Applications, 5(1), 1–13. https://doi.org/10.53623/gisa.v5i1.551