Agent Identity and Guardrails: Securing OpenClaw Access to Ansible Automation Platform
By Luca Berton · Published 2024-01-01 · Category: database-automation
How pre-approved playbooks, an OPA policy gate, and full audit trails let OpenClaw agents drive Ansible Automation Platform safely, without writing fixes.
Why "Agent Identity" Is Now a Security Problem, Not a Buzzword
The moment you let an AI agent touch production infrastructure, you have created a new kind of identity: not a human operator, not a service account running a fixed script, but a reasoning system that decides what to run and when. Red Hat's "Ansible Automation Platform 2.7 and Beyond" session at Red Hat Tech Day Netherlands 2026 in Bunnik (presented by Fred van Zwieten and Ismail Dhaoui) put that problem on stage with a live demo: an agent called OpenClaw autonomously remediating a critical CVE across four production servers, end to end, without a human writing a single line of the fix.
The demo worked because of what OpenClaw was not allowed to do. This article walks through the guardrail architecture that made fully autonomous CVE remediation safe enough to run in production: identity, pre-approval, policy gating, and audit trail — the same controls a well-run operations team already relies on, just driven at machine speed.
See also: AAP MCP Security and Compliance Tool Set: Audit Trails via AI Agents
The Demo, in Five Guardrailed Steps
The OpenClaw agent — visible in its control panel as sre-sally, running on Claude Sonnet 4.6 via litellm inside an OpenShift Pod — executed a five-step pipeline against CVE-2026-31337, a critical CVSS 9.8 vulnerability affecting four production servers.
- Detect and validate. The agent identified the CVE and cross-checked it against the CVE database before doing anything else.
- Open the change record. It created ServiceNow Change Request CHG0012847, marked it priority critical, set a maintenance window of 03:00–05:00 EST, notified app owners via PagerDuty, and attached all four affected servers to the change.
- Policy gate. An Open Policy Agent (OPA) engine ran an automated review — ITSM ticket validated, maintenance window confirmed, playbook pre-approved for this CVE class, rollback plan present. This check had to pass before the pipeline could move forward.
- Execute the rolling patch. Using patch_and_reboot.yml, the agent patched and rebooted one host at a time (prod-web-01, prod-web-02, prod-api-01, then prod-db-01, drained before reboot), silencing and restoring monitoring around the maintenance window, with health checks passing on every server.
- Close and report. The ITSM ticket was closed, app owners and the SRE lead notified, the CMDB updated, and a compliance report filed.
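The policy gate in the third step is an all-or-nothing check, and its logic can be sketched in a few lines. The following is a minimal illustration in plain Python, not the Rego policy used in the demo; the `ChangeRequest` class and its field names are assumptions made up for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ChangeRequest:
    """Hypothetical snapshot of what the gate inspects; field names are illustrative."""
    ticket_validated: bool
    window_confirmed: bool
    playbook_preapproved: bool
    rollback_plan_present: bool


def gate_failures(change: ChangeRequest) -> list[str]:
    """Return the name of every failed check; an empty list means the gate passes."""
    checks = {
        "itsm_ticket_validated": change.ticket_validated,
        "maintenance_window_confirmed": change.window_confirmed,
        "playbook_preapproved": change.playbook_preapproved,
        "rollback_plan_present": change.rollback_plan_present,
    }
    return [name for name, ok in checks.items() if not ok]


def may_execute(change: ChangeRequest) -> bool:
    """Hard gate: execution is allowed only when no check fails."""
    return not gate_failures(change)
```

Returning the list of failed checks, rather than a bare boolean, is a design choice for this sketch: a refusal that names its reason is easier to record in an audit trail.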
The Guardrail That Matters Most: The Agent Never Wrote the Fix
The single point Red Hat emphasized hardest in the session is easy to miss amid the automation spectacle: OpenClaw did not author the remediation. It selected a pre-approved playbook exposed through the AAP MCP server. The reasoning model decided when and in what sequence to act — it never had the authority to decide what code runs on a production host.
That distinction is the whole security model in one sentence. An agent with tool calls like memory_search, runtime_search, and runtime_exec visible in its control panel is powerful enough to improvise — which is exactly why it was scoped so it couldn't. Every guardrail an enterprise already trusts — the policy engine, the ITSM system, the CMDB, the audit log — stayed authoritative. OpenClaw didn't replace those controls; it drove them faster than a human on-call engineer could.
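One way to picture that scoping is a lookup that can only ever return a reviewed playbook name. This is a hedged sketch, not the AAP MCP server's actual interface: the catalog dictionary, the `kernel-critical` class label, and the `select_playbook` function are all hypothetical.

```python
# Hypothetical catalog: CVE class -> the one playbook reviewed for it.
APPROVED_PLAYBOOKS = {
    "kernel-critical": "patch_and_reboot.yml",
}


class NotPreApproved(Exception):
    """Raised when no reviewed playbook exists for the requested CVE class."""


def select_playbook(cve_class: str) -> str:
    """Return the pre-approved playbook name for a CVE class, or refuse."""
    try:
        return APPROVED_PLAYBOOKS[cve_class]
    except KeyError:
        raise NotPreApproved(f"no pre-approved playbook for {cve_class!r}") from None
```

The caller chooses which catalog entry to run and when, but there is no code path that accepts playbook content as input, so improvised task logic has nowhere to enter.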
See also: Ansible splunk.es Collection: Automating Security Incident Response Workflows
Comparing Guardrail Layers
| Guardrail | What it enforces | Failure mode if removed |
|---|---|---|
| Pre-approved playbook catalog (AAP MCP server) | Agent can only invoke vetted, version-controlled playbooks — never generate new task logic | Agent could improvise untested changes against production |
| OPA policy engine | Hard gate: ITSM ticket, maintenance window, playbook approval, rollback plan must all be true | Remediation could proceed without change control or a way back out |
| ServiceNow change record (CHG0012847 / CHG9679226) | Human-visible change history, ownership, and notification trail | No traceability for what changed, when, or why |
| PagerDuty notification | App owners aware of impact before execution | Silent changes to systems owners don't know are being touched |
| Rolling execution with drain-before-reboot | Zero-downtime failure containment, one host at a time | A bad patch could take down all four servers simultaneously |
| GRUB rollback boot entry | Fast recovery path if the new kernel misbehaves | No safety net if the patched kernel regresses |
| CMDB update + compliance report | Closed-loop record for audit and future queries | Configuration drift between reality and system of record |
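The failure-containment row in the table is worth making concrete. The sketch below models one-host-at-a-time execution in plain Python under simplifying assumptions: `patch` and `healthy` are hypothetical callables standing in for the playbook run and the post-reboot health check, and the function is an illustration of the idea behind `serial: 1`, not how Ansible implements it.

```python
from typing import Callable, Iterable


def rolling_patch(
    hosts: Iterable[str],
    patch: Callable[[str], None],
    healthy: Callable[[str], bool],
) -> tuple[list[str], list[str]]:
    """Patch hosts one at a time; stop at the first failed health check.

    Returns (patched_ok, untouched). The host that failed is in neither list.
    """
    queue = list(hosts)
    patched_ok: list[str] = []
    while queue:
        host = queue.pop(0)
        patch(host)
        if not healthy(host):
            break  # contain the failure: remaining hosts are never touched
        patched_ok.append(host)
    return patched_ok, queue
```

If the second host fails its health check, the remaining two are returned as untouched, which is the difference between one degraded server and four.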
What a Pre-Approved Playbook Looks Like
The playbook itself is unremarkable by design — that's the point. It's a normal, reviewed, version-controlled Ansible playbook that happens to be reachable through the AAP MCP server's exposed job template, not something the agent assembles on the fly.
```yaml
---
# patch_and_reboot.yml
# Pre-approved playbook exposed via the AAP MCP server job template
# for critical kernel CVE remediation. Not agent-authored.
- name: Rolling patch and reboot for critical CVE remediation
  hosts: "{{ target_hosts }}"
  serial: 1  # one host per batch
  become: true

  pre_tasks:
    - name: Confirm change record is approved before proceeding
      ansible.builtin.assert:
        that:
          - change_ticket is defined
          - change_status == "approved"
        fail_msg: "No approved change record; refusing to patch."

    - name: Silence monitoring for this host during maintenance window
      ansible.builtin.uri:
        url: "https://monitoring.internal/api/v1/silence"
        method: POST
        body_format: json
        body:
          host: "{{ inventory_hostname }}"
          window: "{{ maintenance_window }}"

    - name: Drain node from load balancer before reboot
      ansible.builtin.command: /usr/local/bin/drain-node.sh {{ inventory_hostname }}
      when: "'db' in group_names or 'web' in group_names or 'api' in group_names"

  tasks:
    - name: Apply kernel security update
      ansible.builtin.dnf:
        name: kernel
        state: latest

    - name: Reboot into patched kernel
      ansible.builtin.reboot:
        reboot_timeout: 600

  post_tasks:
    - name: Run health check after reboot
      ansible.builtin.uri:
        url: "http://{{ inventory_hostname }}:8080/healthz"
        status_code: 200
      register: health
      # An explicit until condition makes the retry loop unambiguous
      until: health.status == 200
      retries: 5
      delay: 15

    - name: Restore monitoring for this host
      ansible.builtin.uri:
        url: "https://monitoring.internal/api/v1/unsilence"
        method: POST
        body_format: json
        body:
          host: "{{ inventory_hostname }}"
```

Notice the pre_tasks assertion: the playbook itself refuses to run without an approved change ticket, independent of whatever the agent believes. That redundancy (a policy gate at the orchestration layer plus a check baked into the playbook) is deliberate defense in depth.
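The playbook reads four variables: `target_hosts`, `change_ticket`, `change_status`, and `maintenance_window`. A caller could mirror the playbook's own assertion before launching anything, as in this minimal sketch. The `build_extra_vars` helper is hypothetical, and how the resulting JSON is handed to a job template is deliberately left out because the session did not show it.

```python
import json

# Variable names taken from the playbook above.
REQUIRED_VARS = ("target_hosts", "change_ticket", "change_status", "maintenance_window")


def build_extra_vars(**values: str) -> str:
    """Return a JSON string of extra vars, refusing incomplete or unapproved input."""
    missing = [name for name in REQUIRED_VARS if not values.get(name)]
    if missing:
        raise ValueError(f"missing extra vars: {', '.join(missing)}")
    if values["change_status"] != "approved":
        raise ValueError("change record is not approved")
    return json.dumps({name: values[name] for name in REQUIRED_VARS}, sort_keys=True)
```

A check like this does not replace the assertion inside the playbook; it only fails earlier, before a job is ever queued.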
See also: HashiCorp Vault Integration with Ansible: OIDC, PKI, and Dynamic Credentials Complete Guide
Key Takeaways
- Identity, not intelligence, is the control surface. OpenClaw's authority came from what it was scoped to call through the AAP MCP server, not from how capable the underlying model was.
- Pre-approval beats generation. The agent selected from a catalog of reviewed playbooks; it never wrote or modified remediation logic itself.
- Policy is a gate, not a suggestion. The OPA check on ITSM validation, maintenance window, playbook approval, and rollback plan had to pass before any host was touched.
- Every existing enterprise control stayed authoritative. ServiceNow, PagerDuty, CMDB, and audit logging weren't bypassed for speed — they were the rails the agent ran on.
- Rollback was designed in, not bolted on. The retained GRUB boot entry for the previous kernel meant the fast path and the safe path were the same path.