Multi-Modality Abnormal Crowd Detection with Self-Attention and Knowledge Distillation
Received: 20 June 2024 | Revised: 24 July 2024 | Accepted: 25 July 2024 | Online: 14 August 2024
Corresponding author: Huong-Giang Doan
Abstract
Deep Neural Networks (DNNs) have become a promising solution for detecting abnormal human behaviors. However, building a DNN model that is efficient in terms of both computational cost and classification accuracy remains a challenging problem. Furthermore, only a limited number of abnormal behavior detection datasets exist, and each focuses on a specific context. Consequently, a DNN model trained on one dataset tends to adapt to that particular context and may not be suitable for others. This study proposes a DNN framework with efficient attention and Knowledge Distillation (KD) mechanisms. Attention units capture key information from multiple inputs: RGB, optical flow, and heatmap. KD is applied to reduce model size. Experiments were performed on several benchmark datasets and evaluated in terms of both the Area Under the Curve (AUC) and accuracy. The results show that the proposed framework outperformed other state-of-the-art methods in detection accuracy. Furthermore, the proposed framework with KD also addresses the trade-off between detection performance and computational cost.
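The abstract names two mechanisms without giving their exact form. The sketch below illustrates generic versions of both under our own assumptions; it is not the authors' architecture. Per-modality feature vectors (the names "rgb", "flow", and "heatmap", the single query vector, and the example numbers are all hypothetical) are fused with scaled dot-product attention. A standard Hinton-style distillation loss, with illustrative values for the temperature `T` and the weight `alpha`, shows how a small student can be trained against a larger teacher.

```python
import math


def softmax(xs, temperature=1.0):
    """Numerically stable softmax; temperature > 1 softens the distribution."""
    scaled = [x / temperature for x in xs]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def attention_fuse(features, query):
    """Fuse per-modality feature vectors with scaled dot-product attention.

    features: dict mapping a modality name (hypothetical: "rgb", "flow",
              "heatmap") to a feature vector; all vectors share one length.
    query:    a vector of the same length that scores each modality; in a
              trained model this would be learned (assumption).
    Returns (weights per modality, fused feature vector).
    """
    names = list(features)
    dim = len(query)
    # One relevance score per modality, scaled by sqrt(dim) as in Transformers.
    scores = [
        sum(q * f for q, f in zip(query, features[n])) / math.sqrt(dim)
        for n in names
    ]
    weights = softmax(scores)
    # Weighted sum of the modality vectors, element by element.
    fused = [
        sum(w * features[n][i] for w, n in zip(weights, names))
        for i in range(dim)
    ]
    return dict(zip(names, weights)), fused


def kd_loss(student_logits, teacher_logits, label, temperature=4.0, alpha=0.5):
    """Standard (Hinton-style) distillation loss for one sample.

    alpha * cross-entropy(student, true label)
    + (1 - alpha) * T^2 * KL(teacher_T || student_T),
    where *_T are softmax outputs at temperature T (illustrative defaults).
    """
    hard = -math.log(softmax(student_logits)[label])
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    soft = sum(t * math.log(t / s) for t, s in zip(p_teacher, p_student) if t > 0)
    return alpha * hard + (1 - alpha) * temperature ** 2 * soft


if __name__ == "__main__":
    weights, fused = attention_fuse(
        {"rgb": [0.2, 0.9, 0.1], "flow": [0.8, 0.1, 0.4], "heatmap": [0.5, 0.5, 0.5]},
        query=[1.0, 0.0, 0.0],
    )
    print(weights)  # sums to 1; "flow" has the largest weight for this query
    # Hypothetical two-class example (normal vs. abnormal).
    print(kd_loss(student_logits=[0.5, 1.0], teacher_logits=[-1.0, 3.0], label=1))
```

The `T^2` factor on the soft term is the usual choice in Hinton-style KD: it keeps the gradient contribution of the softened targets roughly constant as the temperature changes, so `alpha` can be tuned independently of `T`.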
Keywords:
abnormal behavior detection, attention, knowledge distillation
License
Copyright (c) 2024 Anh-Dung Ho, Huong-Giang Doan, Thi Thanh Thuy Pham
This work is licensed under a Creative Commons Attribution 4.0 International License.
Authors who publish with this journal agree to the following terms:
- Authors retain the copyright and grant the journal the right of first publication with the work simultaneously licensed under a Creative Commons Attribution License that allows others to share the work with an acknowledgement of the work's authorship and initial publication in this journal.
- Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgement of its initial publication in this journal.
- Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) after its publication in ETASR with an acknowledgement of its initial publication in this journal.