SHIELD-DBOps: An Admissibility-and-Verification Framework for Guardrail-Based Database Operations
Received: 22 April 2026 | Revised: 19 May 2026 and 25 May 2026 | Accepted: 27 May 2026 | Online: 3 June 2026
Corresponding author: Raghu Gollapudi
Abstract
Enterprise production databases emit abundant telemetry, yet recurring database incidents are still often handled through disconnected alerts, scripts, Oracle-native point tools, and manual runbooks. The unresolved operational problem is not detection alone; it is the gap between detection and verified closure. In always-on financial and enterprise data environments, this gap is operationally expensive because teams must control mixed Real Application Clusters (RAC), standby, and non-production estates under uptime, auditability, and staffing pressure. This paper presents SHIELD-DBOps (Standardized Handling of Incidents with Escalation Limits and Decision-guardrails), an external orchestration framework that standardizes the handling of four recurring incident classes (TEMP/UNDO pressure, archive/Fast Recovery Area (FRA) saturation, blocking sessions, and Data Guard apply lag) through one supervisory control contract. The technical contribution is a reusable admission-action-verification-handoff contract that separates database-operations policy from execution mechanics. Operational validation is reported on an 11-month ServiceNow incident series with class-level monthly incident counts and Mean Time To Resolve (MTTR). After production enablement in mid-May 2025, class-level MTTR remained below that of the transition month across all four classes, and incident counts declined overall. The results are descriptive and deployment-specific; phased rollout, mid-series policy tuning, and the absence of a control group preclude causal claims.
Keywords:
self-healing database operations, database reliability engineering, Oracle operations, bounded automation, Data Guard, runbook orchestration
License
Copyright (c) 2026 Raghu Gollapudi

This work is licensed under a Creative Commons Attribution 4.0 International License.
Authors who publish with this journal agree to the following terms:
- Authors retain the copyright and grant the journal the right of first publication with the work simultaneously licensed under a Creative Commons Attribution License that allows others to share the work with an acknowledgement of the work's authorship and initial publication in this journal.
- Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgement of its initial publication in this journal.
- Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) after its publication in ETASR with an acknowledgement of its initial publication in this journal.
