SHIELD-DBOps: An Admissibility-and-Verification Framework for Guardrail-Based Database Operations

Raghu Gollapudi

doi:10.48084/etasr.19469

Authors

Raghu Gollapudi, Fiserv Inc., Frisco, Texas, USA | ORCID: https://orcid.org/0009-0006-6861-3796

Volume: 16 | Issue: 4 | Pages: 37239-37244 | August 2026 | https://doi.org/10.48084/etasr.19469

Received: 22 April 2026 | Revised: 19 May 2026 and 25 May 2026 | Accepted: 27 May 2026 | Online: 3 June 2026

Corresponding author: Raghu Gollapudi

Abstract

Enterprise production databases emit abundant telemetry, yet recurring database incidents are still often handled through disconnected alerts, scripts, Oracle-native point tools, and manual runbooks. The unresolved operational problem is not detection alone; it is the gap between detection and verified closure. In always-on financial and enterprise data environments, this gap is now operationally expensive because teams must control mixed Real Application Clusters (RAC), standby, and non-production estates under uptime, auditability, and staffing pressure. This paper presents SHIELD-DBOps (Standardized Handling of Incidents with Escalation Limits and Decision-guardrails), an external orchestration framework that standardizes four recurring incident classes through one supervisory control contract: TEMP/UNDO pressure, archive/FRA saturation, blocking sessions, and Data Guard apply lag. The technical contribution is a reusable admission-action-verification-handoff contract that separates database-operations policy from execution mechanics. Operational validation is reported on an 11-month ServiceNow incident series with class-level monthly counts and Mean Time To Resolve (MTTR). After production enablement in mid-May 2025, class-level MTTR remained below that of the transition month across all four classes, and incident counts declined overall. The results are descriptive and deployment-specific; phased rollout, mid-series policy tuning, and the absence of a control group preclude causal claims.
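As an illustration only, the admission-action-verification-handoff contract named in the abstract could be sketched as follows. This is a minimal sketch under stated assumptions, not the framework's actual implementation: the names `IncidentPolicy`, `run_contract`, `Outcome`, the `max_attempts` limit, the dictionary-based signal, and the 80% threshold are all hypothetical and do not come from the paper. The point it shows is the separation the abstract describes: the policy object carries the per-class rules, while one generic runner carries the execution mechanics.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable


class Outcome(Enum):
    REJECTED = "rejected"      # admission refused; no action was taken
    VERIFIED = "verified"      # action ran and the post-check passed
    HANDED_OFF = "handed_off"  # attempt limit reached; escalated to a human


@dataclass
class IncidentPolicy:
    """Per-class policy (hypothetical shape, not taken from the paper)."""
    incident_class: str
    admit: Callable[[dict], bool]   # is automated action allowed here?
    act: Callable[[dict], None]     # the bounded remediation step
    verify: Callable[[dict], bool]  # did the condition actually clear?
    max_attempts: int = 2           # assumed escalation limit


def run_contract(policy: IncidentPolicy, signal: dict,
                 handoff: Callable[[str, dict, str], None]) -> Outcome:
    """Generic runner: admit, act, verify, and hand off on failure."""
    if not policy.admit(signal):
        handoff(policy.incident_class, signal, "not admissible")
        return Outcome.REJECTED
    for _ in range(policy.max_attempts):
        policy.act(signal)
        if policy.verify(signal):
            return Outcome.VERIFIED
    handoff(policy.incident_class, signal, "verification failed")
    return Outcome.HANDED_OFF


# Toy usage: TEMP pressure on a simulated (in-memory) database state.
escalations: list[tuple[str, str]] = []


def escalate(incident_class: str, signal: dict, reason: str) -> None:
    # Stand-in for opening or updating a ticket; records the handoff only.
    escalations.append((incident_class, reason))


temp_policy = IncidentPolicy(
    incident_class="TEMP/UNDO pressure",
    admit=lambda s: not s["is_standby"],  # assumed guardrail
    act=lambda s: s.update(temp_used_pct=s["temp_used_pct"] - 20),
    verify=lambda s: s["temp_used_pct"] < 80,  # assumed threshold
)

result = run_contract(temp_policy, {"temp_used_pct": 93, "is_standby": False},
                      escalate)
```

In this sketch a run ends in exactly one of three states, so "closure" is only ever recorded after the verification step passes; anything else produces an explicit handoff rather than a silent failure.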

Keywords:

self-healing database operations, database reliability engineering, Oracle operations, bounded automation, Data Guard, runbook orchestration
