Machine Learning Predictive Models Analysis on Telecommunications Service Churn Rate

Author(s): Teuku Alif Rafi Akbar, Catur Apriono
Author(s) information:
Department of Electrical Engineering, Faculty of Engineering, Universitas Indonesia, Kampus Baru UI Depok, Jawa Barat, 16424, Indonesia

Customer churn occurs frequently in the telecommunications service industry and can be detrimental to companies. A predictive model can help identify and analyze the causes of customer churn. This paper implements and analyzes machine learning models that predict churn using a customer churn dataset from Kaggle. The models considered are the XGBoost Classifier, Bernoulli Naïve Bayes, and Decision Tree algorithms. The research covers data preparation, cleaning, and transformation; exploratory data analysis (EDA); prediction model design; and evaluation by accuracy, F1 score, receiver operating characteristic (ROC) curve, and area under the ROC curve (AUC). The EDA results indicate that contract type, length of tenure, monthly invoice, and total bill are the features that most influence churn. Among the models considered, the XGBoost Classifier achieved the highest accuracy and F1 score, at 81.59% and 74.76%, respectively. In terms of efficiency, however, the Bernoulli Naïve Bayes and Decision Tree algorithms outperformed XGBoost, with AUC scores of 0.7469 and 0.7468, respectively.
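As an illustration of the evaluation metrics named above, the following minimal sketch computes accuracy, F1 score, and AUC for a binary churn classifier using only the Python standard library. The labels and scores are hypothetical toy values, not the Kaggle data or the results reported in this paper, and the function names are chosen here for illustration. AUC is computed through its rank interpretation: the probability that a randomly chosen churner receives a higher score than a randomly chosen non-churner, with ties counted as one half.

```python
from typing import Sequence


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of predictions that match the true labels."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def f1_score(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Harmonic mean of precision and recall for the positive (churn) class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the share of (positive, negative) pairs ranked correctly."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a correct ranking
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = churned, 0 = stayed.
y_true = [1, 0, 1, 0, 0, 1, 0, 0]
scores = [0.9, 0.2, 0.4, 0.6, 0.1, 0.8, 0.3, 0.7]  # predicted churn probabilities
y_pred = [1 if s >= 0.5 else 0 for s in scores]    # 0.5 threshold is an assumption

print(f"accuracy = {accuracy(y_true, y_pred):.4f}")
print(f"F1 score = {f1_score(y_true, y_pred):.4f}")
print(f"AUC      = {roc_auc(y_true, scores):.4f}")
```

Accuracy and F1 depend on the chosen decision threshold, whereas AUC is computed from the raw scores and is threshold-independent, which is one reason a model can rank first on one metric and not on another.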

About this article

SUBMITTED: 19 April 2023
ACCEPTED: 31 May 2023
PUBLISHED: 7 June 2023
SUBMITTED to ACCEPTED: 42 days
DOI: https://doi.org/10.53623/gisa.v3i1.249

Cite this article
Akbar, T. A. R., & Apriono, C. (2023). Machine Learning Predictive Models Analysis on Telecommunications Service Churn Rate. Green Intelligent Systems and Applications, 3(1), 22–34. https://doi.org/10.53623/gisa.v3i1.249