A Sentiment Analysis of Hate Speech in Philippine Election-Related Posts Using BERT Combined with Convolutional Neural Networks and Model Variations Incorporating Hashtags and ALL-CAPS

Micah Collette O. Mendoza; Wayne Gabriel S. Nadurata; Mark Gabriel E. Oritz; Joshua Mari L. Padlan; Charmaine S. Ponay

doi:10.53623/gisa.v4i2.491

As the number of X users continues to grow, so does hate speech, creating a pressing need for automatic detection of posts that promote it. This study used the datasets gathered and validated in the base study to categorize posts as hate or non-hate and to classify them as positive, negative, or neutral using Convolutional Neural Networks. The labeled data were partitioned into training and testing sets under three split ratios: 70%-30%, 80%-20%, and 90%-10%. The model of this study, BERT-CNN, performed better overall than the base study's model, fastText CNN. Among the three splits, the BERT-CNN model for binary classification without the Hashtags and ALL-CAPS features achieved the best performance with the 90:10 split, with an accuracy of 93.55%, precision of 93.59%, and F1-score of 93.55%. For multi-label classification, the BERT-CNN model performed best when incorporating hashtags, again with the 90:10 split, achieving an accuracy of 69.14%, precision of 68.44%, recall of 68.40%, and an F1-score of 67.41%. Pairing BERT word embeddings with a CNN proved effective for classifying Philippine election-related posts as hate or non-hate.
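The evaluation protocol summarized in the abstract (three train/test ratios, scored by accuracy, precision, recall, and F1) can be illustrated with a minimal sketch. Everything below is an assumption for illustration only: the helper names `split` and `binary_metrics`, the toy posts and labels, the random shuffle, and the choice of "hate" as the positive class are hypothetical. The abstract does not say how the data were shuffled or stratified, nor which averaging scheme the reported scores use.

```python
import random


def split(items, train_fraction, seed=0):
    """Shuffle a copy of items and cut it into (train, test) parts.

    Hypothetical helper: the abstract gives only the ratios, not the procedure.
    """
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def binary_metrics(y_true, y_pred, positive="hate"):
    """Accuracy, precision, recall and F1 with respect to one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    correct = sum(t == p for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {
        "accuracy": correct / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Toy placeholder data, not the study's dataset.
posts = [f"post {i}" for i in range(100)]

# Sizes of the train/test parts under the three ratios named in the abstract.
split_sizes = {}
for fraction in (0.7, 0.8, 0.9):
    train, test = split(posts, fraction)
    split_sizes[fraction] = (len(train), len(test))

# Made-up gold labels and predictions, used only to exercise the metric code.
y_true = ["hate", "hate", "non-hate", "non-hate", "hate"]
y_pred = ["hate", "non-hate", "non-hate", "hate", "hate"]
scores = binary_metrics(y_true, y_pred)
```

In practice each split would be used to train and score a separate model, and the three-class sentiment task would need a per-class or averaged variant of these metrics.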


References

Hate speech and incitement to hatred or violence. (accessed on 1 September 2024) Available online: https://www.ohchr.org/en/special-procedures/sr-religion-or-belief/hate-speech-and-incitement-hatred-or-violence#:~:text=As%20a%20matter%20of%20principle,peaceful%2C%20inclusive%20and%20just%20societies.

Alfina, I.; Sigmawaty, D.; Nurhidayati, F.; Hidayanto, A.N. (2017). Utilizing hashtags for sentiment analysis of tweets in the political domain. Proceedings of the 9th International Conference on Machine Learning and Computing, 43–47. https://doi.org/10.1145/3055635.3056631.

Hidayatullah, A.F.; Cahyaningtyas, S.; Hakim, A.M. (2021). Sentiment analysis on Twitter using neural network: Indonesian presidential election 2019 dataset. IOP Conference Series: Materials Science and Engineering, 1077(1), 012001. https://doi.org/10.1088/1757-899x/1077/1/012001.

Malik, P.; Aggrawal, A.; Vishwakarma, D.K. (2021, April). Toxic speech detection using traditional machine learning models and BERT and fastText embedding with deep neural networks. 2021 5th International Conference on Computing Methodologies and Communication (ICCMC), 1254–1259. IEEE. https://doi.org/10.1109/ICCMC51019.2021.9418229.

Alzahrani, E.; Jololian, L. (2021). How different text-preprocessing techniques using the BERT model affect the gender profiling of authors. Advances in Machine Learning, 1–8. https://doi.org/10.5121/csit.2021.111501.

Solitana, N.T.; Cheng, C.K. (2021, December). Analyses of hate and non-hate expressions during election using NLP. 2021 International Conference on Asian Language Processing (IALP), 385–390.

Velankar, A.; Patil, H.; Gore, A.; Salunke, S.; Joshi, R. (2022). L3cube-mahahate: A tweet-based Marathi hate speech detection dataset and BERT models. arXiv preprint arXiv:2203.13778.

Alim, M.M.F. (2021, June). A sentiment analysis study for Twitter using the various model of convolutional neural network. Journal of Physics: Conference Series, 1918(4), 042136. https://doi.org/10.1088/1742-6596/1918/4/042136.

Arganosa, S.; Marasigan, R.; Villanueva, J.; Wenceslao, K.; Ponay, C. (2022). Hate Speech in Filipino Election-Related Tweets: A Sentiment Analysis Using Convolutional Neural Networks. http://dx.doi.org/10.13140/RG.2.2.20961.52326.

Cabasag, N.; Chan, V.; Lim, S.; Gonzales, M.E.; Cheng, C. (2019). Hate Speech in Philippine Election-Related Tweets: Automatic Detection and Classification Using Natural Language Processing. Philippine Computing Journal, XIV, 1–14.

Where is the love? Identifying hate speech in Philippine election-related tweets. (accessed on 1 September 2024) Available online: https://asite.aim.edu/data_science/where-is-the-love-identifying-hate-speech-in-philippine-election-related-tweets/.

Mehta, R.P.; Sanghvi, M.A.; Shah, D.K.; Singh, A. (2019). Sentiment analysis of tweets using supervised learning algorithms. First International Conference on Sustainable Technologies for Computational Intelligence, 323–338. https://doi.org/10.1007/978-981-15-0029-9_26.

Bello, A.; Ng, S.-C.; Leung, M.-F. (2023). A BERT framework to sentiment analysis of tweets. Sensors, 23(1), 506. https://doi.org/10.3390/s23010506.

Mastering BERT: A comprehensive guide from beginner to advanced in natural language processing. (accessed on 1 September 2024) Available online: https://medium.com/@shaikhrayyan123/a-comprehensive-guide-to-understanding-bert-from-beginners-to-advanced-2379699e2b51.

De Goma, J.; Hungria, C.; Boquiren, A.; Garcia, R. (2022). Tagalog Sentiment Analysis Using Deep Learning Approach with Backward Slang Inclusion. 3rd African International Conference on Industrial Engineering and Operations Management. https://doi.org/10.46254/AF03.20220180.

Kaviani, M.; Rahmani, H. (2020). EmHash: Hashtag recommendation using neural network based on BERT embedding. 6th International Conference on Web Research (ICWR). http://doi.org/10.1109/ICWR49608.2020.9122275.

BERT transformers – how do they work? (accessed on 1 September 2024) Available online: https://www.exxactcorp.com/blog/Deep-Learning/how-do-bert-transformers-work.

Statistical significance tests for comparing machine learning algorithms. (accessed on 1 September 2024) Available online: https://machinelearningmastery.com/statistical-significance-tests-for-comparing-machine-learning-algorithms/.

Chan, S.; Fyshe, A. (2018). Social and Emotional Correlates of Capitalization on Twitter. Proceedings of the Second Workshop on Computational Modeling of People’s Opinions, Personality, and Emotions in Social Media. Association for Computational Linguistics: New Orleans, Louisiana, USA. pp. 10–15.

Mingua, J.; Padilla, D.; Celino, E.J. (2021, November). Classification of fire-related posts on Twitter using Bidirectional Encoder Representations from Transformers (BERT). 2021 IEEE 13th International Conference on Humanoid, Nanotechnology, Information Technology, Communication and Control, Environment, and Management (HNICEM), 1–6. https://doi.org/10.1109/HNICEM.2021.9604867.

Nanli, Z.; Ping, Z.; Weiguo, L.I.; Meng, C. (2012, November). Sentiment analysis: A literature review. 2012 International Symposium on Management of Technology (ISMOT), 572–576.

Imperial, J.M.; Orosco, J.; Mazo, S.M.; Maceda, L. (2019). Sentiment analysis of typhoon related tweets using standard and bidirectional recurrent neural networks. arXiv preprint arXiv:1908.01765.

Chiorrini, A.; Diamantini, C.; Mircoli, A.; Potena, D. (2021). Emotion and sentiment analysis of tweets using BERT. Workshop Proceedings of the EDBT/ICDT 2021 Joint Conference, Nicosia, Cyprus.

Galinato, V.; Amores, L.; Magsino, G.B.; Sumawang, D.R. (2023). Context-based profanity detection and censorship using Bidirectional Encoder Representations from Transformers. SSRN, 4341604. http://doi.org/10.2139/ssrn.4341604.

Kaur, K.; Kaur, P. (2023). Improving BERT model for requirements classification by bidirectional LSTM-CNN deep model. Computers and Electrical Engineering, 108, 108699. https://doi.org/10.1016/j.compeleceng.2023.108699.

Dao, T.A.; Aizawa, A. (2023). Evaluating the effect of letter case on named entity recognition performance. In Natural Language Processing and Information Systems. NLDB 2023. Métais, E., Meziane, F., Sugumaran, V., Manning, W., Reiff-Marganiec, S., Eds.; Lecture Notes in Computer Science, 13913; Springer: Cham, Switzerland. https://doi.org/10.1007/978-3-031-35320-8_45.

Devlin, J.; Chang, M.W.; Lee, K.; Toutanova, K. (2018). BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805.

Regalado, R.V.J.; Cheng, C.K. (2012, November). Feature-based subjectivity classification of Filipino text. 2012 International Conference on Asian Language Processing, 57–60.

Sunarya, P.A.; Refianti, R.; Mutiara, A.B.; Octaviani, W. (2019). Comparison of accuracy between convolutional neural networks and Naïve Bayes classifiers in sentiment analysis on Twitter. International Journal of Advanced Computer Science and Applications, 10(5). http://doi.org/10.14569/IJACSA.2019.0100511.


About this article

SUBMITTED: 05 September 2024
ACCEPTED: 05 October 2024
PUBLISHED: 24 October 2024
SUBMITTED to ACCEPTED: 31 days
DOI: https://doi.org/10.53623/gisa.v4i2.491

Cite this article

Mendoza, M. C. O., Nadurata, W. G. S., Oritz, M. G. E., Padlan, J. M. L., & Ponay, C. S. (2024). A Sentiment Analysis of Hate Speech in Philippine Election-Related Posts Using BERT Combined with Convolutional Neural Networks and Model Variations Incorporating Hashtags and ALL-CAPS. Green Intelligent Systems and Applications, 4(2), 66–79. https://doi.org/10.53623/gisa.v4i2.491

