From Alert to Patched Fleet: An OpenClaw and AAP Remediation Walkthrough

By Luca Berton · Published 2024-01-01 · Category: database-automation

Step-by-step walkthrough of the OpenClaw and AAP live demo from Red Hat Tech Day Netherlands 2026: CVE detection to a patched, zero-downtime fleet.

At Red Hat Tech Day Netherlands 2026 in Bunnik, Fred van Zwieten and Ismail Dhaoui closed their "Ansible Automation Platform 2.7 and Beyond" session with a demo that made the whole room go quiet: an AI agent called OpenClaw detected a critical CVE, opened a change ticket, cleared a policy gate, patched four production servers on a rolling basis, and closed out the paperwork — without a human touching a keyboard in between. This article walks through that demo step by step, exactly as it ran, and explains why the "boring" enterprise plumbing underneath it is the actual story.

The setup: what OpenClaw is doing inside AAP

OpenClaw ran as an agent named sre-sally, deployed as a Pod on OpenShift, powered by Claude Sonnet 4.6 via litellm. On the OpenClaw control panel, the audience could watch its tool calls fire in real time: memory_search, runtime_search, and runtime_exec. Those calls weren't hitting some generic shell — they were routed through the AAP MCP server, which exposes a curated set of pre-approved playbooks and job templates as callable tools. That distinction matters more than it sounds: the agent never wrote a line of remediation code. It searched, matched a CVE class to an existing, tested playbook, and asked AAP to run it. Ansible Automation Platform remained the thing actually touching the servers.
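To make the "ask AAP to run it" idea concrete, here is a minimal sketch of launching an existing job template from Ansible. It is not how the demo was wired: the agent went through the AAP MCP server. The sketch assumes the awx.awx collection is installed and that controller credentials come from the environment. The template name and extra variables are hypothetical.

```yaml
---
# Sketch only: launch an existing, pre-approved job template instead of
# writing remediation code. Template name and variables are assumptions.
- name: Request a pre-approved remediation job from the controller
  hosts: localhost
  gather_facts: false
  tasks:
    - name: Launch the job template and wait for the result
      awx.awx.job_launch:
        job_template: "CVE remediation - patch and reboot"  # hypothetical name
        limit: "prod-web-01,prod-web-02,prod-api-01,prod-db-01"
        extra_vars:
          cve_id: CVE-2026-31337
          change_ticket: CHG0012847
        wait: true
      register: remediation_job

    - name: Show the job id and status for the audit trail
      ansible.builtin.debug:
        msg: "Controller job {{ remediation_job.id }} finished with status {{ remediation_job.status }}"
```

The controller only honors limit and extra_vars if the job template is configured to prompt for them on launch. That setting is one more place where the platform, not the caller, decides what can be changed.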

Step 1: Detect and validate

The pipeline opened with sre-sally flagging CVE-2026-31337, a critical vulnerability with a CVSS score of 9.8, affecting four production servers. Before doing anything else, the agent cross-checked the finding against the CVE database to confirm severity and applicability — no action is triggered off an unverified signal. This validation step is what separates an autonomous remediation pipeline from a reckless one: the agent's first job is to be sure, not to be fast.

Step 2: Open the change and notify the owners

Once validated, OpenClaw created a ServiceNow Change Request, CHG0012847, marked priority critical with a maintenance window of 03:00–05:00 EST. The four affected servers were attached directly to the change record, and application owners were paged through PagerDuty. Nothing here bypasses ITSM — the agent is using it exactly as an on-call engineer would, just without the 2 a.m. wake-up call to fill in the ticket fields.
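For readers who want to see what this step might look like as automation, the following is a rough sketch using the ServiceNow Table API and the PagerDuty Events API v2 through ansible.builtin.uri. The demo did not show its integration code. The instance name, credentials, routing key, field values, and the window_start and window_end variables are all assumptions that would have to be supplied.

```yaml
---
# Sketch only: instance name, credentials, and field values are assumptions.
- name: Open a change request and page the application owners
  hosts: localhost
  gather_facts: false
  vars:
    snow_instance: "example.service-now.com"   # hypothetical instance
    affected_hosts: [prod-web-01, prod-web-02, prod-api-01, prod-db-01]
  tasks:
    - name: Create the change request
      ansible.builtin.uri:
        url: "https://{{ snow_instance }}/api/now/table/change_request"
        method: POST
        url_username: "{{ snow_user }}"        # supply via vault or extra vars
        url_password: "{{ snow_password }}"
        force_basic_auth: true
        body_format: json
        body:
          short_description: "Remediate CVE-2026-31337 on {{ affected_hosts | join(', ') }}"
          priority: "1"                        # assumed to map to critical
          start_date: "{{ window_start }}"     # 03:00 EST on the chosen date
          end_date: "{{ window_end }}"         # 05:00 EST on the chosen date
        status_code: 201
      register: change

    - name: Page the owners through PagerDuty
      ansible.builtin.uri:
        url: "https://events.pagerduty.com/v2/enqueue"
        method: POST
        body_format: json
        body:
          routing_key: "{{ pagerduty_routing_key }}"
          event_action: trigger
          payload:
            summary: "{{ change.json.result.number }}: CVE-2026-31337 patch window scheduled"
            source: "aap-remediation"          # hypothetical source label
            severity: critical
        status_code: 202
```

Attaching the affected servers to the change as configuration items is left out here, since how that is done depends on the CMDB setup.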

Step 3: The policy gate that has to pass

This is the step Red Hat leaned on hardest during the session. Before any patch touches a host, an Open Policy Agent (OPA) engine runs an automated review against four conditions:

Check	What it confirms
ITSM ticket validated	CHG0012847 exists, is approved, and is linked to the right hosts
Maintenance window confirmed	Current or scheduled time falls inside 03:00–05:00 EST
Playbook pre-approved	The selected playbook is on the approved list for this CVE class
Rollback plan present	A documented rollback path exists before execution starts

All four must pass. There is no "warn and continue" path. If OPA rejects the change, the pipeline stops, full stop — the agent has no override. This is the crux of the whole demo: the policy engine is a hard gate, not a suggestion, and it sits between the AI agent and production infrastructure no matter how confident the agent's reasoning looks in the control panel.
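A hard gate of this kind can be sketched as a play that asks OPA for a decision over its REST Data API and fails unless the answer is an explicit allow. The OPA address, the policy path, and the shape of the input document below are assumptions. The four checks themselves would live in the policy on the OPA side, which the session did not show.

```yaml
---
# Sketch only: OPA URL, policy path, and input fields are assumptions.
- name: Policy gate - ask OPA before any host is touched
  hosts: localhost
  gather_facts: false
  vars:
    opa_url: "http://opa.internal:8181"        # hypothetical address
    policy_path: "aap/change/allow"            # hypothetical policy rule
    gate_input:
      change_ticket: CHG0012847
      cve_id: CVE-2026-31337
      playbook: patch_and_reboot.yml
      hosts: [prod-web-01, prod-web-02, prod-api-01, prod-db-01]
  tasks:
    - name: Query OPA for a decision
      ansible.builtin.uri:
        url: "{{ opa_url }}/v1/data/{{ policy_path }}"
        method: POST
        body_format: json
        body:
          input: "{{ gate_input }}"
      register: opa_decision

    - name: Stop unless OPA explicitly allows the change
      ansible.builtin.assert:
        that:
          # A missing result (undefined rule) is treated as a denial
          - opa_decision.json.result | default(false) | bool
        fail_msg: "OPA rejected {{ gate_input.change_ticket }}; pipeline stops here"
```

Note the absence of ignore_errors or any fallback branch: a denial, an undefined rule, or an unreachable OPA all end the play, which mirrors the "no warn and continue" behavior described above.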

Step 4: Rolling patch and reboot

With the policy check green, AAP executed patch_and_reboot.yml one host at a time, in this order: prod-web-01, prod-web-02, prod-api-01, then prod-db-01, with the database node drained of connections before its reboot. Monitoring was silenced for the maintenance window and automatically restored afterward, so the patching didn't fire a storm of false alerts. Health checks ran on every server after reboot, and all four came back clean.

An illustrative sketch of that playbook looks like this:

---
- name: Rolling CVE remediation - patch and reboot
  hosts: patch_targets
  serial: 1
  become: true
  vars:
    cve_id: CVE-2026-31337
    change_ticket: CHG0012847

  pre_tasks:
    - name: Silence monitoring for this host
      ansible.builtin.uri:
        url: "https://monitoring.internal/api/v1/silence"
        method: POST
        body_format: json
        body:
          host: "{{ inventory_hostname }}"
          reason: "{{ change_ticket }} - {{ cve_id }} remediation"

    - name: Drain host if it is a database node
      ansible.builtin.command: /usr/local/bin/drain-node.sh
      when: "'db' in group_names"

  tasks:
    - name: Apply kernel and package updates
      ansible.builtin.dnf:
        name: "*"
        state: latest
      register: patch_result

    - name: Reboot host
      ansible.builtin.reboot:
        reboot_timeout: 600
      when: patch_result.changed

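    - name: Run post-patch health check
      ansible.builtin.uri:
        url: "http://{{ inventory_hostname }}:8080/healthz"
        status_code: 200
      register: health_check
      # Retry until the endpoint answers 200, up to 5 times, 15 seconds apart
      until: health_check.status == 200
      retries: 5
      delay: 15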

  post_tasks:
    - name: Restore monitoring
      ansible.builtin.uri:
        url: "https://monitoring.internal/api/v1/unsilence"
        method: POST
        body_format: json
        body:
          host: "{{ inventory_hostname }}"

The end state, captured in the final change record CHG0012847, shows the kernel moving from 5.14.0-427.el9 to 5.14.0-503.el9 on all four hosts, with the previous kernel kept as a GRUB boot entry for instant rollback if anything regressed post-patch. Because the hosts were patched one at a time and the database node was drained before its reboot, the fleet saw zero downtime throughout.
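The rollback path mentioned here could look roughly like the sketch below, which points GRUB back at the previous kernel with grubby and reboots. The kernel file path is an assumption derived from the version string in this article. On a real host, the installed entries can be listed with `grubby --info=ALL`.

```yaml
---
# Sketch only: boot the previous kernel again if the new one regresses.
# The kernel path is an assumption; check real entries with grubby --info=ALL.
- name: Roll back to the previous kernel
  hosts: patch_targets
  serial: 1
  become: true
  vars:
    previous_kernel: "/boot/vmlinuz-5.14.0-427.el9.x86_64"  # hypothetical path
  tasks:
    - name: Confirm the previous kernel is still installed
      ansible.builtin.stat:
        path: "{{ previous_kernel }}"
      register: old_kernel

    - name: Refuse to continue if there is nothing to roll back to
      ansible.builtin.assert:
        that: old_kernel.stat.exists
        fail_msg: "{{ previous_kernel }} not found; rollback is not possible"

    - name: Make the previous kernel the default boot entry
      ansible.builtin.command: "grubby --set-default={{ previous_kernel }}"
      changed_when: true

    - name: Reboot into the previous kernel
      ansible.builtin.reboot:
        reboot_timeout: 600
```

This only changes which kernel boots. Other packages updated during the patch run stay at their new versions.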

Step 5: Close, notify, and file the evidence

The pipeline didn't stop at "servers are patched." OpenClaw closed the ITSM ticket, notified the app owners and the SRE lead that remediation was complete, updated the CMDB with the new kernel version and patch date, and filed a compliance report. For an auditor, the trail looks identical to a well-run human-executed change — because structurally, it is one.
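As one possible shape for the evidence step, the sketch below gathers the running kernel from each patched host and writes a single JSON file on the control node. The file name and record layout are assumptions. The demo's actual compliance report and CMDB update were not shown in detail.

```yaml
---
# Sketch only: collect the running kernel from each host and write one
# evidence file on the control node. File name and layout are assumptions.
- name: Record post-patch evidence
  hosts: patch_targets
  gather_facts: true
  vars:
    cve_id: CVE-2026-31337
    change_ticket: CHG0012847
  tasks:
    - name: Write the evidence file once, on the control node
      ansible.builtin.copy:
        dest: "./evidence-{{ change_ticket }}.json"
        content: "{{ evidence | to_nice_json }}"
        mode: "0644"
      vars:
        evidence:
          change_ticket: "{{ change_ticket }}"
          cve_id: "{{ cve_id }}"
          recorded_at: "{{ ansible_facts['date_time']['iso8601'] }}"
          # Map each host in the play to the kernel it is running now
          kernels: "{{ dict(ansible_play_hosts | zip(ansible_play_hosts | map('extract', hostvars, ['ansible_facts', 'kernel']))) }}"
      delegate_to: localhost
      run_once: true
```

Hosts that failed or were unreachable are absent from ansible_play_hosts, so a missing entry in the kernels map is itself a signal worth checking.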

What actually changed, and what didn't

It's tempting to read this demo as "AI patches your servers now." The more accurate reading, and the one Red Hat emphasized explicitly, is narrower and more useful: the agent did not write the fix. It selected a pre-approved playbook exposed through the AAP MCP server and drove existing enterprise guardrails — the policy engine, the ITSM system, the CMDB, the audit trail — faster than a human on-call team could. None of those systems became optional. All of them stayed authoritative. The agent's contribution was orchestration speed, not authority.

Key Takeaways

The 5-step pipeline was: detect and validate the CVE, open a ServiceNow change and notify owners, pass a hard OPA policy gate, execute a rolling patch-and-reboot playbook, then close and report.
OpenClaw (agent "sre-sally," Claude Sonnet 4.6 via litellm) never generated remediation code — it called pre-approved playbooks through the AAP MCP server using memory_search, runtime_search, and runtime_exec.
The OPA policy check (ticket validated, window confirmed, playbook pre-approved, rollback present) had to pass unanimously before any host was touched — a hard gate, not a soft warning.
The rolling patch_and_reboot.yml run drained prod-db-01 before rebooting it and kept the prior kernel as a GRUB entry, delivering the upgrade from 5.14.0-427.el9 to 5.14.0-503.el9 with zero downtime.
Every enterprise guardrail — ITSM, policy engine, CMDB, audit trail — remained authoritative; the agent only made the existing process faster, not less governed.
