Cost-Sensitive Fake Profile Detection in Online Social Networks Using Random Forest Feature Selection and LightGBM

Hedia Zardi; Raneem Alreshoodi

doi:10.48084/etasr.15297

Authors

Hedia Zardi, Department of Computer Science, College of Computer, Qassim University, Buraydah, Saudi Arabia
Raneem Alreshoodi, Department of Computer Science, College of Computer, Qassim University, Buraydah, Saudi Arabia

Volume: 16 | Issue: 1 | Pages: 30906-30912 | February 2026 | https://doi.org/10.48084/etasr.15297

Received: 3 October 2025 | Revised: 23 October 2025 and 8 November 2025 | Accepted: 10 November 2025 | Online: 7 December 2025

Corresponding author: Hedia Zardi

Abstract

The proliferation of fake profiles on Online Social Networks (OSNs) presents serious risks to privacy, security, and trust. Traditional detection methods often struggle with large-scale data and fail to keep up with the evolving tactics of malicious actors, highlighting the need for scalable, interpretable machine learning solutions. This study introduces a cost-sensitive and interpretable framework for identifying fake profiles that combines Random Forest (RF) feature selection with gradient boosting models, specifically eXtreme Gradient Boosting (XGBoost) and Light Gradient Boosting Machine (LightGBM). The framework was tested on the MIB Twitter dataset using a user-level split to prevent data leakage and ensure a realistic evaluation. LightGBM achieved the highest Cost-Sensitive Accuracy (CSA) of 0.96, surpassing XGBoost by 2.8% and RF by 4.6%, while training approximately 20% faster than XGBoost. These findings indicate that, among the models compared, LightGBM strikes the best balance among predictive accuracy, cost sensitivity, and computational efficiency. By focusing on CSA as a key performance metric, this work highlights the importance of reducing false negatives in OSNs, where undetected fake accounts can cause more harm than false positives. Overall, the proposed framework offers a practical, scalable, and interpretable solution for real-time detection of fake profiles on OSNs, and shows that combining feature selection with cost-sensitive boosting can improve trust and security on large social platforms.
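Two ideas in the abstract, the user-level split and a cost-sensitive accuracy that penalizes false negatives more heavily, can be illustrated with a minimal Python sketch. It is not the authors' implementation. The abstract does not give the CSA formula, so `cost_sensitive_accuracy` below uses one common cost-weighted formulation as an assumption. The cost values (`fn_cost=5.0`, `fp_cost=1.0`), the `user_id` field name, and both function names are hypothetical.

```python
import random
from collections import defaultdict


def user_level_split(records, test_fraction=0.25, seed=0):
    """Split records so that all rows of a given user fall on one side only.

    Assumes each record is a dict with a "user_id" key (hypothetical field name).
    Keeping a user's rows together avoids leaking that user into both sets.
    """
    by_user = defaultdict(list)
    for rec in records:
        by_user[rec["user_id"]].append(rec)

    users = sorted(by_user)                 # deterministic order before shuffling
    random.Random(seed).shuffle(users)
    n_test = max(1, round(len(users) * test_fraction))
    test_users = set(users[:n_test])

    train = [r for u in users if u not in test_users for r in by_user[u]]
    test = [r for u in users if u in test_users for r in by_user[u]]
    return train, test


def cost_sensitive_accuracy(y_true, y_pred, fn_cost=5.0, fp_cost=1.0):
    """Cost-weighted accuracy with label 1 = fake and label 0 = genuine.

    Assumed formulation (not taken from the paper): each fake profile carries
    weight fn_cost and each genuine profile carries weight fp_cost, so a missed
    fake lowers the score more than a wrongly flagged genuine account.
    """
    tp = fn = tn = fp = 0
    for t, p in zip(y_true, y_pred):
        if t == 1 and p == 1:
            tp += 1
        elif t == 1:
            fn += 1
        elif p == 0:
            tn += 1
        else:
            fp += 1

    total = fn_cost * (tp + fn) + fp_cost * (tn + fp)
    if total == 0:
        raise ValueError("y_true is empty")
    return (fn_cost * tp + fp_cost * tn) / total
```

With equal costs the score reduces to plain accuracy. With `fn_cost > fp_cost`, two classifiers that have the same plain accuracy are ranked by how few fake profiles they miss, which is the behaviour the abstract attributes to CSA.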

Keywords:

fake profile detection, OSNs, CSA, user-level leakage, machine learning, scalable fake account detection
