Agent Identity and Guardrails: Securing OpenClaw Access to Ansible Automation Platform

By Luca Berton · Published 2024-01-01 · Category: database-automation

How pre-approved playbooks, an OPA policy gate, and full audit trails let OpenClaw agents drive Ansible Automation Platform safely, without the agents ever authoring the fixes themselves.

Why "Agent Identity" Is Now a Security Problem, Not a Buzzword

The moment you let an AI agent touch production infrastructure, you have created a new kind of identity: not a human operator, not a service account running a fixed script, but a reasoning system that decides what to run and when. Red Hat's "Ansible Automation Platform 2.7 and Beyond" session at Red Hat Tech Day Netherlands 2026 in Bunnik (presented by Fred van Zwieten and Ismail Dhaoui) put that problem on stage with a live demo: an agent called OpenClaw autonomously remediated a critical CVE across four production servers, end to end, without a human writing a single line of the fix.

The demo worked because of what OpenClaw was not allowed to do. This article walks through the guardrail architecture that made fully autonomous CVE remediation safe enough to run in production: identity, pre-approval, policy gating, and audit trail — the same controls a well-run operations team already relies on, just driven at machine speed.

The Demo, in Five Guardrailed Steps

The OpenClaw agent — visible in its control panel as sre-sally, running on Claude Sonnet 4.6 via litellm inside an OpenShift Pod — executed a five-step pipeline against CVE-2026-31337, a critical CVSS 9.8 vulnerability affecting four production servers.

1. Detect and validate. The agent identified the CVE and cross-checked it against the CVE database before doing anything else.
2. Open the change record. It created ServiceNow Change Request CHG0012847, marked it priority critical, set a maintenance window of 03:00–05:00 EST, notified app owners via PagerDuty, and attached all four affected servers to the change.
3. Policy gate. An Open Policy Agent (OPA) engine ran an automated review with four checks: ITSM ticket validated, maintenance window confirmed, playbook pre-approved for this CVE class, and rollback plan present. All four had to pass before the pipeline could move forward.
4. Execute the rolling patch. Using patch_and_reboot.yml, the agent patched and rebooted one host at a time (prod-web-01, prod-web-02, prod-api-01, then prod-db-01), draining each host before its reboot and silencing and restoring monitoring around the maintenance window. Health checks passed on every server.
5. Close and report. The ITSM ticket was closed, app owners and the SRE lead were notified, the CMDB was updated, and a compliance report was filed.

The final change record, CHG9679226, shows the kernel moved from 5.14.0-427.el9 to 5.14.0-503.el9, with the previous kernel retained as a GRUB boot entry for rollback, and zero downtime because each host was drained before its reboot.
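The policy gate in step three is the easiest part of the pipeline to reason about in code. Real OPA policies are written in Rego and the session's actual policy is not shown here, so the following is only a minimal Python sketch of the same deny-by-default logic. The class, field, and function names are hypothetical; the four checks are the ones listed in the demo.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class GateInput:
    """Facts the gate evaluates. Field names are illustrative assumptions."""
    itsm_ticket_validated: bool
    maintenance_window_confirmed: bool
    playbook_preapproved: bool
    rollback_plan_present: bool


def failed_checks(gate_input: GateInput) -> List[str]:
    """Return the names of every check that did not pass."""
    checks = {
        "itsm_ticket_validated": gate_input.itsm_ticket_validated,
        "maintenance_window_confirmed": gate_input.maintenance_window_confirmed,
        "playbook_preapproved": gate_input.playbook_preapproved,
        "rollback_plan_present": gate_input.rollback_plan_present,
    }
    return [name for name, passed in checks.items() if not passed]


def gate_allows(gate_input: GateInput) -> bool:
    """Deny by default: the pipeline proceeds only if no check failed."""
    return not failed_checks(gate_input)


if __name__ == "__main__":
    request = GateInput(True, True, True, False)
    print(gate_allows(request), failed_checks(request))
```

Returning the list of failed checks, rather than a bare boolean, gives the change record something concrete to log when the gate refuses.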

The Guardrail That Matters Most: The Agent Never Wrote the Fix

The single point Red Hat emphasized hardest in the session is easy to miss amid the automation spectacle: OpenClaw did not author the remediation. It selected a pre-approved playbook exposed through the AAP MCP server. The reasoning model decided when and in what sequence to act — it never had the authority to decide what code runs on a production host.

That distinction is the whole security model in one sentence. An agent with tool calls like memory_search, runtime_search, and runtime_exec visible in its control panel is powerful enough to improvise — which is exactly why it was scoped so it couldn't. Every guardrail an enterprise already trusts — the policy engine, the ITSM system, the CMDB, the audit log — stayed authoritative. OpenClaw didn't replace those controls; it drove them faster than a human on-call engineer could.
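The "select, never author" rule can be illustrated as a plain allowlist lookup. This is a sketch under stated assumptions, not the AAP MCP server's real interface: the catalog structure, the "kernel-critical" key, and the exception name are invented for illustration, and only the playbook name patch_and_reboot.yml comes from the demo.

```python
from typing import Mapping

# Hypothetical catalog mapping a CVE class to a reviewed playbook.
# Only the playbook file name is taken from the demo.
APPROVED_PLAYBOOKS: Mapping[str, str] = {
    "kernel-critical": "patch_and_reboot.yml",
}


class NotPreApprovedError(Exception):
    """Raised when no reviewed playbook exists for the requested CVE class."""


def select_playbook(
    cve_class: str,
    catalog: Mapping[str, str] = APPROVED_PLAYBOOKS,
) -> str:
    """Return the pre-approved playbook for a CVE class, or refuse.

    There is deliberately no fallback that generates or edits a playbook:
    the agent's only options are "pick from the catalog" or "stop".
    """
    try:
        return catalog[cve_class]
    except KeyError:
        raise NotPreApprovedError(
            f"no pre-approved playbook for CVE class {cve_class!r}"
        ) from None


if __name__ == "__main__":
    print(select_playbook("kernel-critical"))
```

The design choice worth noticing is the absence of an escape hatch. If the catalog has no entry, the correct outcome is an error that a human resolves by reviewing and adding a playbook.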

Comparing Guardrail Layers

Guardrail	What it enforces	Failure mode if removed
Pre-approved playbook catalog (AAP MCP server)	Agent can only invoke vetted, version-controlled playbooks — never generate new task logic	Agent could improvise untested changes against production
OPA policy engine	Hard gate: ITSM ticket, maintenance window, playbook approval, rollback plan must all be true	Remediation could proceed without change control or a way back out
ServiceNow change record (CHG0012847 / CHG9679226)	Human-visible change history, ownership, and notification trail	No traceability for what changed, when, or why
PagerDuty notification	App owners aware of impact before execution	Silent changes to systems owners don't know are being touched
Rolling execution with drain-before-reboot	Zero-downtime failure containment, one host at a time	A bad patch could take down all four servers simultaneously
GRUB rollback boot entry	Fast recovery path if the new kernel misbehaves	No safety net if the patched kernel regresses
CMDB update + compliance report	Closed-loop record for audit and future queries	Configuration drift between reality and system of record

Each row is a place where the agent's autonomy stops and an existing enterprise control takes over. None of these are new inventions for the AI era — they are the standard change-management stack. What's new is that an agent can walk the entire chain unattended.
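The failure-containment claim in the rolling-execution row can be made concrete with a small simulation. This is a hedged sketch: the function name and the callback signatures are assumptions, and the host names are the four from the demo. It shows why patching one host at a time and stopping at the first failed health check limits the damage to a single server.

```python
from typing import Callable, Iterable, List, Tuple


def rolling_patch(
    hosts: Iterable[str],
    patch_host: Callable[[str], None],
    is_healthy: Callable[[str], bool],
) -> Tuple[List[str], List[str]]:
    """Patch hosts strictly one at a time.

    Returns (touched, untouched). The loop stops as soon as a freshly
    patched host fails its health check, so later hosts are never touched.
    """
    queue = list(hosts)
    touched: List[str] = []
    for index, host in enumerate(queue):
        patch_host(host)
        touched.append(host)
        if not is_healthy(host):
            return touched, queue[index + 1:]
    return touched, []


if __name__ == "__main__":
    demo_hosts = ["prod-web-01", "prod-web-02", "prod-api-01", "prod-db-01"]
    # Simulate a bad patch that only shows up on the second host.
    touched, untouched = rolling_patch(
        demo_hosts,
        patch_host=lambda host: None,
        is_healthy=lambda host: host != "prod-web-02",
    )
    print("touched:", touched)
    print("untouched:", untouched)
```

In the simulated failure, two hosts have been patched and two are still running the previous kernel, which is the containment the table row describes.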

What a Pre-Approved Playbook Looks Like

The playbook itself is unremarkable by design — that's the point. It's a normal, reviewed, version-controlled Ansible playbook that happens to be reachable through the AAP MCP server's exposed job template, not something the agent assembles on the fly.

---
# patch_and_reboot.yml
# Pre-approved playbook exposed via the AAP MCP server job template
# for critical kernel CVE remediation. Not agent-authored.
- name: Rolling patch and reboot for critical CVE remediation
  hosts: "{{ target_hosts }}"
  serial: 1
  become: true

  pre_tasks:
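    - name: Confirm change record is approved before proceeding
      ansible.builtin.assert:
        that:
          - change_ticket is defined
          # Check definedness first so a missing variable yields fail_msg
          # instead of an undefined-variable error.
          - change_status is defined
          - change_status == "approved"
        fail_msg: "No approved change record; refusing to patch."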

    - name: Silence monitoring for this host during maintenance window
      ansible.builtin.uri:
        url: "https://monitoring.internal/api/v1/silence"
        method: POST
        body_format: json
        body:
          host: "{{ inventory_hostname }}"
          window: "{{ maintenance_window }}"

    - name: Drain node from load balancer before reboot
      ansible.builtin.command: /usr/local/bin/drain-node.sh {{ inventory_hostname }}
      when: "'db' in group_names or 'web' in group_names or 'api' in group_names"

  tasks:
    - name: Apply kernel security update
      ansible.builtin.dnf:
        name: kernel
        state: latest

    - name: Reboot into patched kernel
      ansible.builtin.reboot:
        reboot_timeout: 600

  post_tasks:
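    - name: Run health check after reboot
      ansible.builtin.uri:
        url: "http://{{ inventory_hostname }}:8080/healthz"
        status_code: 200
      register: health_check
      # An explicit `until` makes the retry condition unambiguous;
      # some Ansible versions ignore `retries` without it.
      until: health_check.status == 200
      retries: 5
      delay: 15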

    - name: Restore monitoring for this host
      ansible.builtin.uri:
        url: "https://monitoring.internal/api/v1/unsilence"
        method: POST
        body_format: json
        body:
          host: "{{ inventory_hostname }}"

Notice the pre_tasks assertion: the playbook itself refuses to run without an approved change ticket, independent of whatever the agent believes. That redundancy — policy gate at the orchestration layer, plus a check baked into the playbook — is deliberate defense in depth.
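The playbook reads four variables: target_hosts, change_ticket, change_status, and maintenance_window. How the demo supplied them is not shown, so the following is only a sketch of one way a caller could assemble them as a JSON string, the form that ansible-playbook's --extra-vars option accepts. The function name, the comma-separated host pattern, and the window format are assumptions.

```python
import json
from typing import Sequence


def build_extra_vars(
    target_hosts: Sequence[str],
    change_ticket: str,
    change_status: str,
    maintenance_window: str,
) -> str:
    """Serialize the variables patch_and_reboot.yml expects as JSON.

    Raises ValueError if any value is empty, so an incomplete request
    fails before a job is ever launched.
    """
    extra_vars = {
        # Assumption: hosts are passed as a comma-separated host pattern.
        "target_hosts": ",".join(target_hosts),
        "change_ticket": change_ticket,
        "change_status": change_status,
        "maintenance_window": maintenance_window,
    }
    missing = [name for name, value in extra_vars.items() if not value]
    if missing:
        raise ValueError(f"missing required variables: {missing}")
    return json.dumps(extra_vars)


if __name__ == "__main__":
    print(
        build_extra_vars(
            ["prod-web-01", "prod-web-02", "prod-api-01", "prod-db-01"],
            change_ticket="CHG0012847",
            change_status="approved",
            maintenance_window="03:00-05:00 EST",
        )
    )
```

This helper does not check that change_status equals "approved"; that decision stays with the policy gate and the playbook's own assertion, which keeps the authoritative checks in the places the article describes.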

Key Takeaways

Identity, not intelligence, is the control surface. OpenClaw's authority came from what it was scoped to call through the AAP MCP server, not from how capable the underlying model was.
Pre-approval beats generation. The agent selected from a catalog of reviewed playbooks; it never wrote or modified remediation logic itself.
Policy is a gate, not a suggestion. The OPA check on ITSM validation, maintenance window, playbook approval, and rollback plan had to pass before any host was touched.
Every existing enterprise control stayed authoritative. ServiceNow, PagerDuty, CMDB, and audit logging weren't bypassed for speed — they were the rails the agent ran on.
Rollback was designed in, not bolted on. The retained GRUB boot entry for the previous kernel meant the fast path and the safe path were the same path.

For teams evaluating agent-driven operations on Ansible Automation Platform 2.7, the Bunnik demo is a useful template: treat the agent as a very fast, very literal operator who is only ever allowed to press buttons that already exist.
