A Deep Learning-Based Framework Using CNN+LSTM for Karate Kata Classification and Correctness Evaluation

Nur Abdulrahman; Zahir Zainuddin; Ingrid Nurtanio

doi:10.48084/etasr.12147

Authors

Nur Abdulrahman, Department of Informatics Engineering, Faculty of Engineering, Hasanuddin University, Indonesia
Zahir Zainuddin, Department of Informatics Engineering, Faculty of Engineering, Hasanuddin University, Indonesia
Ingrid Nurtanio, Department of Informatics Engineering, Faculty of Engineering, Hasanuddin University, Indonesia

Volume: 16 | Issue: 1 | Pages: 32619-32624 | February 2026 | https://doi.org/10.48084/etasr.12147

Received: 15 May 2025 | Revised: 5 August 2025, 12 October 2025, and 25 October 2025 | Accepted: 29 October 2025 | Online: 9 February 2026

Corresponding author: Zahir Zainuddin

Abstract

This paper presents a deep learning-based framework for the automated classification and correctness evaluation of karate kata sequences from human pose data. The method uses MediaPipe BlazePose to extract 3D body keypoints from video frames; these keypoints are then transformed into 132-dimensional vectors and temporally normalized into fixed-length sequences. Two neural architectures are evaluated: a baseline Convolutional Neural Network (CNN) and a hybrid that combines a CNN with Long Short-Term Memory (CNN+LSTM). Both models produce two outputs: a multi-class kata identification and a binary correctness classification. Experimental results show that the CNN+LSTM model outperforms the CNN baseline in both classification accuracy and movement assessment, reaching up to 95.74% accuracy and F1-scores exceeding 95% for several kata classes on unseen data. The findings highlight the importance of temporal modeling for structured movement analysis and establish a foundation for intelligent martial arts evaluation systems.
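The preprocessing described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that each 132-dimensional vector comes from BlazePose's 33 landmarks with four values each (x, y, z, visibility), and that temporal normalization is done by linear interpolation to a fixed number of frames. The function names and the target length of 64 are hypothetical; the abstract specifies neither the interpolation method nor the sequence length.

```python
import numpy as np

NUM_LANDMARKS = 33        # BlazePose full-body landmark count
VALUES_PER_LANDMARK = 4   # assumption: x, y, z, visibility
FEATURE_DIM = NUM_LANDMARKS * VALUES_PER_LANDMARK  # 33 * 4 = 132


def flatten_frame(landmarks):
    """Flatten one frame of (33, 4) landmark values into a 132-d vector."""
    arr = np.asarray(landmarks, dtype=np.float32)
    if arr.shape != (NUM_LANDMARKS, VALUES_PER_LANDMARK):
        raise ValueError(f"expected shape (33, 4), got {arr.shape}")
    return arr.reshape(FEATURE_DIM)


def resample_sequence(frames, target_len=64):
    """Resample a (n_frames, D) sequence to (target_len, D).

    Assumption: linear interpolation along the time axis; the paper may
    use a different scheme (padding, truncation, or frame sampling).
    """
    frames = np.asarray(frames, dtype=np.float32)
    if frames.ndim != 2 or frames.shape[0] == 0:
        raise ValueError("frames must be a non-empty 2D array")
    n_frames, dim = frames.shape
    if n_frames == 1:
        # A single frame cannot be interpolated, so repeat it.
        return np.repeat(frames, target_len, axis=0)
    src = np.linspace(0.0, 1.0, n_frames)    # original time stamps
    dst = np.linspace(0.0, 1.0, target_len)  # resampled time stamps
    columns = [np.interp(dst, src, frames[:, j]) for j in range(dim)]
    return np.stack(columns, axis=1).astype(np.float32)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic stand-in for a 50-frame clip of pose landmarks.
    clip = np.stack([flatten_frame(rng.random((33, 4))) for _ in range(50)])
    fixed = resample_sequence(clip, target_len=64)
    print(fixed.shape)
```

Under these assumptions, every clip becomes an array of shape (64, 132) regardless of its original duration, which is the kind of fixed-length input a CNN or CNN+LSTM with two output heads would consume.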

Keywords

human pose estimation, karate kata, CNN, LSTM, multitask learning
