ServiceNow, OPA, and Ansible: Policy-Gated Agentic Remediation Explained
By Luca Berton · Published 2024-01-01 · Category: troubleshooting
How OPA policy gates and ServiceNow ITSM keep agentic CVE remediation with Ansible Automation Platform safe, auditable, and enterprise-ready.
Autonomous remediation sounds risky until you look at what actually gates it. At Red Hat Tech Day Netherlands 2026 in Bunnik, Fred van Zwieten and Ismail Dhaoui demonstrated exactly that during their "Ansible Automation Platform 2.7 and Beyond" session: an AI agent detecting a critical CVE, opening a ServiceNow change record, clearing an Open Policy Agent (OPA) review, and patching four production servers — without a human touching a keyboard mid-flight. The headline isn't the AI. It's the governance stack that never lost authority.
This article breaks down the two components that made the demo enterprise-credible rather than a stunt: the OPA policy engine acting as a hard gate, and ServiceNow acting as the system of record. Together they answer the question every platform team asks about agentic AI: who's actually in control?
The scenario: a critical CVE, four servers, zero manual steps
The demo used an agent called OpenClaw, running as a pod inside OpenShift, orchestrating work through Ansible Automation Platform (AAP). The pipeline had five stages:
- Detect and validate — the agent identifies CVE-2026-31337 (CVSS 9.8, critical) affecting four production servers and cross-checks it against the CVE database.
- Open the change — a ServiceNow Change Request, CHG0012847, is created with critical priority and a 03:00–05:00 EST maintenance window. PagerDuty notifies app owners, and the four affected servers are attached to the change record.
- Policy review — OPA evaluates the change against a set of hard conditions.
- Execute — a rolling patch-and-reboot runs host by host.
- Close and report — the ticket closes, stakeholders are notified, CMDB is updated, and a compliance report is filed.
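The five stages above form a strictly ordered chain: if one stage fails, the later ones never run. A minimal Python sketch of that ordering follows. The `run_pipeline` function and the stage callables are hypothetical placeholders for illustration, not part of OpenClaw or AAP.

```python
from typing import Callable, List, Tuple

# Hypothetical stage type: a name plus a callable that returns True on success.
Stage = Tuple[str, Callable[[], bool]]


def run_pipeline(stages: List[Stage]) -> List[str]:
    """Run stages in order and stop at the first failure.

    Returns the names of the stages that completed successfully.
    """
    completed: List[str] = []
    for name, action in stages:
        if not action():
            break  # later stages never run once one stage fails
        completed.append(name)
    return completed


# Illustrative wiring only: each lambda stands in for real integration code.
demo_stages: List[Stage] = [
    ("detect_and_validate", lambda: True),
    ("open_change", lambda: True),
    ("policy_review", lambda: False),  # simulate a policy deny
    ("execute", lambda: True),
    ("close_and_report", lambda: True),
]

print(run_pipeline(demo_stages))
```

With the simulated deny at the policy stage, only the first two stage names are returned, so the execute stage is never reached.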
See also: AAP Automation Orchestrator: ITSM Ticket Integration with ServiceNow
Why OPA is the gate, not a suggestion
Open Policy Agent evaluates structured input against declarative Rego policy and returns an allow/deny decision. In this pipeline, before AAP is permitted to launch patch_and_reboot.yml against a single host, OPA checks:
- The ITSM ticket (CHG0012847) exists and is validated.
- The maintenance window is confirmed and current.
- The playbook being invoked is pre-approved for this specific CVE class.
- A rollback plan is present and attached to the change.
A simplified version of the payload AAP would submit to that gate, and of the workflow step that queries it, looks like this:

```yaml
# policy-input.yml: example payload AAP would submit to OPA before execution
change_request: "CHG0012847"
cve_id: "CVE-2026-31337"
cvss_score: 9.8
maintenance_window:
  start: "2026-06-03T03:00:00-05:00"
  end: "2026-06-03T05:00:00-05:00"
playbook: "patch_and_reboot.yml"
playbook_preapproved_for_cve_class: true
rollback_plan_attached: true
itsm_ticket_validated: true
```

```yaml
# opa-gate.yml: illustrative AAP workflow step calling the policy check
- name: Evaluate OPA policy before remediation
  hosts: localhost
  gather_facts: false
  tasks:
    - name: Query OPA decision endpoint
      ansible.builtin.uri:
        url: "https://opa.internal.example.com/v1/data/aap/remediation/allow"
        method: POST
        body_format: json
        # OPA's Data API reads the document from a top-level "input" key.
        body:
          input: "{{ lookup('file', 'policy-input.yml') | from_yaml }}"
        return_content: true
      register: opa_decision

    - name: Fail the workflow if policy denies remediation
      ansible.builtin.fail:
        msg: "OPA policy check failed: remediation blocked pending manual review."
      when: not (opa_decision.json.result | default(false))
```

That second task is the entire point of the architecture: a failed policy check is a hard stop, surfaced as a workflow failure in AAP, not a warning the agent can reason its way past.
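To make the four conditions concrete, the sketch below re-expresses them in Python against the same example payload. It is written in Python rather than Rego and is not the policy used in the demo. The function name `evaluate_gate` and the deny-reason strings are assumptions for illustration, and `now` is assumed to be a timezone-aware datetime.

```python
from datetime import datetime
from typing import Any, Dict, List


def evaluate_gate(payload: Dict[str, Any], now: datetime) -> List[str]:
    """Return a list of deny reasons; an empty list means 'allow'."""
    reasons: List[str] = []

    # Condition 1: the ITSM ticket exists and is validated.
    if not payload.get("change_request") or not payload.get("itsm_ticket_validated"):
        reasons.append("ITSM ticket missing or not validated")

    # Condition 2: the maintenance window is present and currently open.
    window = payload.get("maintenance_window") or {}
    try:
        start = datetime.fromisoformat(window["start"])
        end = datetime.fromisoformat(window["end"])
        if not (start <= now <= end):
            reasons.append("outside maintenance window")
    except (KeyError, ValueError, TypeError):
        reasons.append("maintenance window missing or malformed")

    # Condition 3: the playbook is pre-approved for this CVE class.
    if not payload.get("playbook_preapproved_for_cve_class"):
        reasons.append("playbook not pre-approved for this CVE class")

    # Condition 4: a rollback plan is attached to the change.
    if not payload.get("rollback_plan_attached"):
        reasons.append("rollback plan not attached")

    return reasons


payload = {
    "change_request": "CHG0012847",
    "maintenance_window": {
        "start": "2026-06-03T03:00:00-05:00",
        "end": "2026-06-03T05:00:00-05:00",
    },
    "playbook": "patch_and_reboot.yml",
    "playbook_preapproved_for_cve_class": True,
    "rollback_plan_attached": True,
    "itsm_ticket_validated": True,
}

inside_window = datetime.fromisoformat("2026-06-03T03:30:00-05:00")
print(evaluate_gate(payload, inside_window))
```

Returning reasons instead of a bare boolean is a design choice in this sketch: it gives the change record something specific to log when a request is denied.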
Why ServiceNow is more than a notification target
It's tempting to read the ServiceNow step as "send an alert." In the demo it did four structural jobs:
| ServiceNow function | What it enforces |
|---|---|
| Change Request (CHG0012847) | Creates an auditable record before any host is touched |
| Attached servers | Scopes exactly which four production hosts are in play — no silent scope creep |
| Maintenance window (03:00–05:00 EST) | Gives OPA a concrete time boundary to validate against |
| PagerDuty notification to app owners | Ensures humans are aware before automated execution begins |
The closing record documents the outcome: the kernel moved from 5.14.0-427.el9 to 5.14.0-503.el9, with the previous kernel retained as a GRUB boot entry for rollback. The CMDB was updated and a compliance report was filed automatically. Nothing about "what happened" lives only in agent memory or chat logs; it lives in ServiceNow, where auditors already know how to look for it.
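The "attached servers" row in the table amounts to a set comparison: every host the automation targets must already be attached to the change record. A minimal Python sketch of that check follows, with a hypothetical function name and the host names from the demo. How the attached list is read from ServiceNow is left out.

```python
from typing import Iterable, Set


def hosts_outside_change_scope(
    requested_hosts: Iterable[str], attached_hosts: Iterable[str]
) -> Set[str]:
    """Return hosts targeted for execution that are not attached to the change."""
    return set(requested_hosts) - set(attached_hosts)


attached = ["prod-web-01", "prod-web-02", "prod-api-01", "prod-db-01"]

# In scope: every requested host is attached to the change record.
print(hosts_outside_change_scope(["prod-web-01", "prod-db-01"], attached))

# Scope creep: prod-web-03 was never attached, so it is flagged.
print(hosts_outside_change_scope(["prod-web-01", "prod-web-03"], attached))
```

A non-empty result would be grounds to block the run before any host is touched.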
The execution stage: rolling, drained, reversible
Once OPA approved and the change record was live, AAP ran the rolling patch across prod-web-01, prod-web-02, prod-api-01, and finally prod-db-01 — one host at a time, with the database node drained before its reboot. Monitoring was silenced for the maintenance window and restored afterward, and health checks passed on every host before moving to the next. The result was a zero-downtime kernel upgrade, with a same-generation rollback path preserved via the retained GRUB entry.
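The rolling pattern described above (one host at a time, drain where needed, health check before moving on) can be sketched in Python as follows. This is an illustration of the control flow only. In the demo the work was done by an AAP playbook, and the callables passed in here are hypothetical stand-ins.

```python
from typing import Callable, Iterable, List


def rolling_patch(
    hosts: Iterable[str],
    patch_and_reboot: Callable[[str], None],
    is_healthy: Callable[[str], bool],
    drain: Callable[[str], None],
    needs_drain: Callable[[str], bool],
) -> List[str]:
    """Patch hosts one at a time; stop at the first failed health check.

    Returns the hosts that were patched and passed their health check.
    """
    done: List[str] = []
    for host in hosts:
        if needs_drain(host):
            drain(host)  # move work away from the node before its reboot
        patch_and_reboot(host)
        if not is_healthy(host):
            break  # leave the remaining hosts untouched
        done.append(host)
    return done


log: List[str] = []
order = ["prod-web-01", "prod-web-02", "prod-api-01", "prod-db-01"]
patched = rolling_patch(
    order,
    patch_and_reboot=lambda h: log.append(f"patch {h}"),
    is_healthy=lambda h: True,
    drain=lambda h: log.append(f"drain {h}"),
    needs_drain=lambda h: h.startswith("prod-db-"),
)
print(patched)
print(log)
```

Stopping at the first unhealthy host limits the blast radius to a single node, which is what keeps the retained GRUB entry a practical rollback path.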
What the agent actually did — and didn't do
This is the detail worth repeating to any skeptical stakeholder: the agent, named sre-sally and running on Claude Sonnet 4.6 via litellm, did not write the remediation. Its visible tool calls in the OpenClaw control panel — memory_search, runtime_search, runtime_exec — show it searching context, checking prior runs, and executing tools. The actual fix was a pre-approved playbook exposed through the AAP MCP server, selected by the agent from an existing, human-reviewed catalog. The agent's job was orchestration and speed: detect, file, wait on policy, execute the known-good playbook, report. Every decision point that mattered — is this ticket valid, is this the approved fix, is there a way back out — was answered by systems that predate the agent entirely.
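The "selects, doesn't author" constraint can be pictured as an allowlist lookup: the agent may request a playbook, but only one already in the reviewed catalog is returned. The Python sketch below is a hypothetical illustration. The catalog structure, the `kernel` class label, and the function name are assumptions, not the AAP MCP server's interface.

```python
from typing import Dict, FrozenSet

# Hypothetical catalog: CVE class -> playbooks a human has already reviewed.
APPROVED_CATALOG: Dict[str, FrozenSet[str]] = {
    "kernel": frozenset({"patch_and_reboot.yml"}),
}


def select_playbook(cve_class: str, requested: str) -> str:
    """Return the playbook only if it is pre-approved for the CVE class."""
    approved = APPROVED_CATALOG.get(cve_class, frozenset())
    if requested not in approved:
        raise PermissionError(
            f"{requested!r} is not pre-approved for CVE class {cve_class!r}"
        )
    return requested


print(select_playbook("kernel", "patch_and_reboot.yml"))
```

Anything outside the catalog raises an error rather than falling back to a generated fix, which matches the article's point that the agent orchestrates but does not improvise.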
Key Takeaways
- OPA is a hard gate, not a lint check. Execution is blocked, not warned, when ITSM validation, maintenance window, playbook approval, or rollback plan is missing.
- ServiceNow anchors the audit trail. The Change Request created before remediation is the same record closed after it, giving auditors a single source of truth.
- The agent selects, it doesn't author. Remediation ran through a pre-approved playbook (patch_and_reboot.yml) via the AAP MCP server; the AI didn't improvise a fix.
- Rollback is designed in, not bolted on. Retaining the prior kernel as a GRUB entry meant the "undo" path existed before the "do" path ran.
- Speed doesn't require lowering guardrails. CVE-2026-31337 went from detection to closed, compliant change record without skipping ITSM, policy, or CMDB steps — automation made the existing process faster, not thinner.