A Differentiable Gating Mechanism for DETR: Improving Attention Efficiency in Real-Time Road Anomaly Detection
Received: 3 November 2025 | Revised: 24 November 2025 | Accepted: 7 December 2025 | Online: 9 February 2026
Corresponding author: S. Srinath
Abstract
Accurate detection of road-surface anomalies such as potholes and bumps, along with safety-critical dynamic objects including vehicles and pedestrians, is essential for ensuring traffic safety and enabling reliable autonomous navigation. In this work, "anomalies" refer specifically to static road defects, whereas dynamic objects are treated as safety-relevant events that require immediate attention by intelligent systems. Conventional convolution-based detectors such as the Faster Region-based Convolutional Neural Network (Faster R-CNN), the Single Shot MultiBox Detector (SSD), and You Only Look Once (YOLO) perform well on structured objects but struggle to capture long-range contextual dependencies, limiting performance in complex scenes. Transformer-based models such as the Detection Transformer (DETR) overcome these limitations through global self-attention but suffer from redundant attention activations and slow convergence. To address this, we introduce a Differentiable Gating Mechanism integrated into the encoder's self-attention layers of DETR, employing learnable sigmoid-based gates to selectively emphasize informative heads while suppressing redundant ones. Experiments on a custom COCO-annotated dataset of over 4,700 road images demonstrate that the proposed model improves mean Average Precision (mAP@0.5) from 82.9% to 96.2%, increases mean Intersection over Union (mIoU) from 0.79 to 0.84, reduces trainable parameters by 56%, and achieves 4.58× faster per-image inference (from 147.6 ms to 32.2 ms). These results confirm that adaptive gating enhances attention efficiency, accelerates convergence, and significantly improves detection accuracy for real-time road anomaly detection.
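As a minimal illustration of the gating idea summarized above, the PyTorch-style sketch below scales each self-attention head's output by a learnable sigmoid gate before the output projection. This is a sketch under our own assumptions, not the authors' released implementation: the module and parameter names (GatedSelfAttention, gate_logits) and the hyperparameters are illustrative.

```python
# Hypothetical sketch of per-head differentiable gating in a DETR-style
# encoder self-attention layer. Names and hyperparameters are assumptions
# for illustration, not the paper's implementation.
import torch
import torch.nn as nn


class GatedSelfAttention(nn.Module):
    """Multi-head self-attention whose heads are reweighted by learnable
    sigmoid gates, so uninformative heads can be smoothly suppressed."""

    def __init__(self, embed_dim: int = 256, num_heads: int = 8):
        super().__init__()
        self.num_heads = num_heads
        self.head_dim = embed_dim // num_heads
        self.qkv = nn.Linear(embed_dim, 3 * embed_dim)
        self.out_proj = nn.Linear(embed_dim, embed_dim)
        # One learnable logit per head; the sigmoid keeps each gate in
        # (0, 1) and the whole mechanism differentiable end to end.
        self.gate_logits = nn.Parameter(torch.zeros(num_heads))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, embed_dim), e.g. flattened backbone features.
        b, n, d = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        # Reshape each to (batch, heads, seq_len, head_dim).
        q = q.view(b, n, self.num_heads, self.head_dim).transpose(1, 2)
        k = k.view(b, n, self.num_heads, self.head_dim).transpose(1, 2)
        v = v.view(b, n, self.num_heads, self.head_dim).transpose(1, 2)
        # Standard scaled dot-product attention per head.
        attn = torch.softmax(q @ k.transpose(-2, -1) / self.head_dim ** 0.5, dim=-1)
        heads = attn @ v  # (batch, heads, seq_len, head_dim)
        # Scale each head's output by its gate before the output projection.
        gates = torch.sigmoid(self.gate_logits).view(1, self.num_heads, 1, 1)
        out = (gates * heads).transpose(1, 2).reshape(b, n, d)
        return self.out_proj(out)


if __name__ == "__main__":
    layer = GatedSelfAttention()
    tokens = torch.randn(2, 100, 256)
    print(layer(tokens).shape)  # torch.Size([2, 100, 256])
```

Because each gate is a smooth function of a single learnable logit, the gates train jointly with the rest of the network by ordinary backpropagation; heads whose gates converge toward zero contribute little to the output, matching the redundancy-suppression behavior the abstract describes.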
Keywords:
DETR, transformer-based object detection, differentiable gating, attention mechanism, road anomaly detection, autonomous driving, deep learning
References
Y. Safyari, M. Mahdianpari, and H. Shiri, "A Review of Vision-Based Pothole Detection Methods Using Computer Vision and Machine Learning," Sensors, vol. 24, no. 17, Sept. 2024, Art. no. 5652. DOI: https://doi.org/10.3390/s24175652
A. K. Bhatt et al., "Advancements in pothole detection techniques: a comprehensive review and comparative analysis," Discover Artificial Intelligence, vol. 5, no. 1, Oct. 2025, Art. no. 255. DOI: https://doi.org/10.1007/s44163-025-00297-7
T. Li and G. Li, "Road Defect Identification and Location Method Based on an Improved ML-YOLO Algorithm," Sensors, vol. 24, no. 21, Nov. 2024, Art. no. 6783. DOI: https://doi.org/10.3390/s24216783
S. Ren, K. He, R. Girshick, and J. Sun, "Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks," in Proceedings of the 29th International Conference on Neural Information Processing Systems, Montreal, Canada, 2015, pp. 91–99.
W. Liu et al., "SSD: Single Shot MultiBox Detector," in 14th European Conference on Computer Vision, Amsterdam, Netherlands, 2016, pp. 21–37. DOI: https://doi.org/10.1007/978-3-319-46448-0_2
A. Bochkovskiy, C.-Y. Wang, and H.-Y. M. Liao, "YOLOv4: Optimal Speed and Accuracy of Object Detection." arXiv, Apr. 23, 2020.
N. Bhavana, M. M. Kodabagi, B. M. Kumar, P. Ajay, N. Muthukumaran, and A. Ahilan, "POT-YOLO: Real-Time Road Potholes Detection Using Edge Segmentation-Based Yolo V8 Network," IEEE Sensors Journal, vol. 24, no. 15, pp. 24802–24809, Aug. 2024. DOI: https://doi.org/10.1109/JSEN.2024.3399008
P. Mutabarura, N. Muchuka, and D. Segera, "Comparative Evaluation of YOLO Models on an African Road Obstacles Dataset for Real-Time Obstacle Detection," Engineering, Technology & Applied Science Research, vol. 15, no. 1, pp. 19045–19051, Feb. 2025. DOI: https://doi.org/10.48084/etasr.9135
T.-Y. Lin, P. Goyal, R. Girshick, K. He, and P. Dollár, "Focal Loss for Dense Object Detection," in 2017 IEEE International Conference on Computer Vision, Venice, Italy, 2017, pp. 2999–3007. DOI: https://doi.org/10.1109/ICCV.2017.324
N. Carion, F. Massa, G. Synnaeve, N. Usunier, A. Kirillov, and S. Zagoruyko, "End-to-End Object Detection with Transformers," in 16th European Conference on Computer Vision, Glasgow, UK, 2020, pp. 213–229. DOI: https://doi.org/10.1007/978-3-030-58452-8_13
X. Zhu, W. Su, L. Lu, B. Li, X. Wang, and J. Dai, "Deformable DETR: Deformable Transformers for End-to-End Object Detection," in 9th International Conference on Learning Representations, Virtual Event, Austria, 2021.
D. Meng et al., "Conditional DETR for Fast Training Convergence," in 2021 IEEE/CVF International Conference on Computer Vision, Montreal, Canada, 2021, pp. 3631–3640. DOI: https://doi.org/10.1109/ICCV48922.2021.00363
F. Li, H. Zhang, S. Liu, J. Guo, L. M. Ni, and L. Zhang, "DN-DETR: Accelerate DETR Training by Introducing Query DeNoising," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 46, no. 4, pp. 2239–2251, Apr. 2024. DOI: https://doi.org/10.1109/TPAMI.2023.3335410
H. Zhang et al., "DINO: DETR with Improved DeNoising Anchor Boxes for End-to-End Object Detection." arXiv, July 11, 2022.
C. Zuo, N. Huang, C. Yuan, and Y. Li, "Pavement-DETR: A High-Precision Real-Time Detection Transformer for Pavement Defect Detection," Sensors, vol. 25, no. 8, Apr. 2025, Art. no. 2426. DOI: https://doi.org/10.3390/s25082426
H. Zhao et al., "Improved object detection method for unmanned driving based on Transformers," Frontiers in Neurorobotics, vol. 18, May 2024, Art. no. 1342126. DOI: https://doi.org/10.3389/fnbot.2024.1342126
Y. Ye, Q. Sun, K. Cheng, X. Shen, and D. Wang, "A lightweight mechanism for vision-transformer-based object detection," Complex & Intelligent Systems, vol. 11, no. 7, May 2025, Art. no. 302. DOI: https://doi.org/10.1007/s40747-025-01904-x
Y. Zhang and C. Liu, "Vision-enhanced multi-modal learning framework for non-destructive pavement damage detection," Automation in Construction, vol. 177, Sept. 2025, Art. no. 106389. DOI: https://doi.org/10.1016/j.autcon.2025.106389
E. Voita, D. Talbot, F. Moiseev, R. Sennrich, and I. Titov, "Analyzing Multi-Head Self-Attention: Specialized Heads Do the Heavy Lifting, the Rest Can Be Pruned," in Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, Florence, Italy, 2019, pp. 5797–5808. DOI: https://doi.org/10.18653/v1/P19-1580
P. Michel, O. Levy, and G. Neubig, "Are Sixteen Heads Really Better than One?," in Proceedings of the 33rd International Conference on Neural Information Processing Systems, Vancouver, Canada, 2019, pp. 14037–14047.
R. Child, S. Gray, A. Radford, and I. Sutskever, "Generating Long Sequences with Sparse Transformers." arXiv, Apr. 23, 2019.
Y. Tay, D. Bahri, D. Metzler, D.-C. Juan, Z. Zhao, and C. Zheng, "Synthesizer: Rethinking Self-Attention for Transformer Models," in Proceedings of the 38th International Conference on Machine Learning, Online, 2021, pp. 10183–10192.
M. Everingham, L. Van Gool, C. K. I. Williams, J. Winn, and A. Zisserman, "The Pascal Visual Object Classes (VOC) Challenge," International Journal of Computer Vision, vol. 88, no. 2, pp. 303–338, June 2010. DOI: https://doi.org/10.1007/s11263-009-0275-4
T.-Y. Lin et al., "Microsoft COCO: Common Objects in Context," in 13th European Conference on Computer Vision, Zurich, Switzerland, 2014, pp. 740–755. DOI: https://doi.org/10.1007/978-3-319-10602-1_48