Optimized Transfer Learning Models for Rail Surface Defect Classification with Explainable AI Validation

Murat Alparslan Gungor; Kenan Gencol

doi:10.48084/etasr.19215

Authors

Murat Alparslan Gungor Department of Electrical and Electronics Engineering, Faculty of Engineering and Natural Sciences, Hitit University, Corum, Turkiye
Kenan Gencol Department of Electrical and Electronics Engineering, Faculty of Engineering and Natural Sciences, Hitit University, Corum, Turkiye

Volume: 16 | Issue: 4 | Pages: 38037-38044 | August 2026 | https://doi.org/10.48084/etasr.19215

Received: 10 April 2026 | Revised: 28 May 2026 | Accepted: 1 June 2026 | Online: 24 June 2026

Corresponding author: Kenan Gencol

Abstract

Rail surface defects such as flaking, spalling, and squat are critical indicators of railway track degradation and require reliable and efficient automated inspection systems to ensure operational safety. This study presents a comparative evaluation of seven state-of-the-art pre-trained Convolutional Neural Network (CNN) architectures for rail surface defect classification: Inception-V3, MobileNet-V1, MobileNet-V2, MobileNet-V3, NASNetMobile, ResNet50, and EfficientNet-B0. Transfer learning was employed by freezing the convolutional backbones and tuning the classifier head with an Optuna-based hyperparameter search. The models were trained and evaluated on a benchmark rail surface defect dataset that, after redundancy reduction and class equalization, contains balanced samples of the three defect classes. Classification performance was assessed using quality metrics, and computational efficiency was analyzed in terms of Floating-Point Operations (FLOPs) and parameter count. Experimental results show that MobileNet-V2 achieves the highest classification accuracy, 81.6% at 599M FLOPs, whereas MobileNet-V3 reaches a competitive 78.5% at the lowest computational cost of only 132M FLOPs. Inception-V3 also performs strongly, with 80.8% accuracy, but requires substantially more computation (11.4B FLOPs). Pareto analysis confirmed that the two MobileNet variants offer the best efficiency–accuracy trade-off among all evaluated models. To improve interpretability, Gradient-weighted Class Activation Mapping (Grad-CAM) was applied to the Pareto-optimal models. The visualizations revealed that MobileNet-V2 generally focuses more consistently on defect-relevant regions, particularly for flaking defects. These findings highlight the importance of combining performance benchmarking with Explainable Artificial Intelligence (XAI) analysis in safety-critical railway monitoring applications.
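The Pareto claim can be checked against the figures quoted in the abstract. The following is a minimal Python sketch, not the authors' code. A model is treated as Pareto-optimal if no other model reaches at least the same accuracy with at most the same FLOPs while being strictly better on one of the two. Only the three models whose accuracy and FLOPs appear in the abstract are included. The other four are left out because their figures are not given, so the sketch reproduces the result only for this subset. The names ModelResult, dominates, and pareto_front are hypothetical.

```python
from typing import NamedTuple


class ModelResult(NamedTuple):
    name: str
    accuracy: float  # classification accuracy in percent, as reported in the abstract
    flops: float     # floating-point operations per inference


# Only the three models with figures stated in the abstract; the remaining
# four evaluated models are omitted because their numbers are not given here.
RESULTS = [
    ModelResult("MobileNet-V2", 81.6, 599e6),
    ModelResult("MobileNet-V3", 78.5, 132e6),
    ModelResult("Inception-V3", 80.8, 11.4e9),
]


def dominates(a: ModelResult, b: ModelResult) -> bool:
    """True if a is no worse than b on both objectives and strictly better on one."""
    no_worse = a.accuracy >= b.accuracy and a.flops <= b.flops
    strictly_better = a.accuracy > b.accuracy or a.flops < b.flops
    return no_worse and strictly_better


def pareto_front(results: list[ModelResult]) -> list[ModelResult]:
    """Keep every model that no other model dominates (maximize accuracy, minimize FLOPs)."""
    return [r for r in results if not any(dominates(o, r) for o in results if o is not r)]


if __name__ == "__main__":
    for r in pareto_front(RESULTS):
        print(f"{r.name}: {r.accuracy:.1f}% at {r.flops / 1e6:.0f}M FLOPs")
```

Run as a script, it prints "MobileNet-V2: 81.6% at 599M FLOPs" and "MobileNet-V3: 78.5% at 132M FLOPs". Inception-V3 drops out because MobileNet-V2 is both more accurate and far cheaper.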
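The core Grad-CAM computation (Selvaraju et al.) is short. Each feature map of the chosen convolutional layer is weighted by the spatial average of the gradient of the target-class score with respect to that map. The weighted maps are then summed, and a ReLU keeps only positive evidence. The sketch below is dependency-free Python and assumes the target layer's activations and gradients have already been extracted from the trained network, for example with the framework's automatic differentiation. The function name grad_cam and the nested-list layout are illustrative assumptions, and upsampling the map to the input image size is omitted.

```python
Matrix = list[list[float]]  # H x W grid of floats (hypothetical layout)


def grad_cam(activations: list[Matrix], gradients: list[Matrix]) -> Matrix:
    """Grad-CAM heatmap for one target class.

    activations[k][i][j]: value of feature map k at position (i, j) in the chosen layer.
    gradients[k][i][j]: derivative of the target-class score with respect to that value.
    Returns an H x W map scaled so its largest value is 1 (all zeros if nothing is positive).
    """
    num_maps = len(activations)
    h, w = len(activations[0]), len(activations[0][0])
    # alpha_k: global average pooling of the gradients of map k.
    alphas = [sum(sum(row) for row in gradients[k]) / (h * w) for k in range(num_maps)]
    # ReLU(sum_k alpha_k * A^k): keep only regions that raise the class score.
    cam = [
        [max(0.0, sum(alphas[k] * activations[k][i][j] for k in range(num_maps))) for j in range(w)]
        for i in range(h)
    ]
    # Scale to [0, 1] for overlaying on the input image (a common display choice).
    peak = max(max(row) for row in cam)
    if peak > 0:
        cam = [[v / peak for v in row] for row in cam]
    return cam


if __name__ == "__main__":
    # Toy example: map 0 pushes the class score up, map 1 pushes it down.
    acts = [[[1.0, 0.0], [0.0, 0.0]], [[0.0, 0.0], [0.0, 2.0]]]
    grads = [[[1.0, 1.0], [1.0, 1.0]], [[-1.0, -1.0], [-1.0, -1.0]]]
    print(grad_cam(acts, grads))  # [[1.0, 0.0], [0.0, 0.0]]
```

In the toy example, the region highlighted by the negatively weighted map is suppressed by the ReLU. Only the location supported by the positively weighted map survives.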

Keywords:

rail surface defect, Convolutional Neural Network (CNN), classification, Explainable AI (XAI)
