AAP Automation Orchestrator: Reading the Execution Timeline for Auditing

By Luca Berton · Published 2024-01-01 · Category: troubleshooting

How to read the AAP Automation Orchestrator execution timeline for audit trails, using the CVE-2024-6387 regreSSHion remediation demo as a worked example.

Auditors do not care how elegant your automation is. They care about one thing: can you prove, step by step, exactly what happened, who approved it, and how long each stage took. Red Hat's upcoming Automation Orchestrator — announced for Q3 2026 at Red Hat Tech Day Netherlands 2026 in Bunnik — is built around that requirement from the ground up. It runs on the upstream Temporal durable-execution engine and gives every workflow, whether triggered by an event, a human, or an AI agent, a persistent, replayable execution timeline.

This article walks through that timeline using the live demo shown at the event: an 8-step remediation of CVE-2024-6387, the critical "regreSSHion" race condition in OpenSSH's sshd. The point isn't the vulnerability — it's what the timeline record proves about governance.

The Governing Principle: AI Acts Through AAP, Never Around It

The line Red Hat used at the event captures the design intent precisely:

> "AI isn't improvising against production infrastructure, it's acting through AAP."

That sentence is the entire audit story in miniature. An LLM can reason, correlate, and recommend, but every action it takes against real infrastructure passes through the same governed execution path as a human-triggered job — inventory lookups, job template runs, and approval gates all land in the same durable timeline. Nothing the AI does is invisible or unaccountable.

The Five-Step Orchestration Pipeline

Automation Orchestrator's canvas unifies task-based automation (playbooks, job templates) and event-based automation (Event-Driven Ansible) into a single pipeline:

Alerts from multiple sources — agents, events, and playbooks are orchestrated on one canvas
Events trigger a deterministic rulebook — EDA picks up the alert
AI analyzes and recommends — an LLM plus MCP tools investigate and propose remediation
Humans approve — a governance gate before anything touches production
Automated remediation at scale — deterministic, auditable execution via AAP

Every one of those five steps writes an entry to the execution timeline, which is what makes the whole chain reconstructible after the fact.

Walking the CVE-2024-6387 Timeline

The demo workflow ran end to end in eight steps, triggered first by an IBM Instana webhook (step 2) and then by a ServiceNow webhook (step 3), both posting to the EDA webhook endpoint with auto-generated API keys.

The AI agent, running on Red Hat AI/Nemotron 120b, was given a prompt with four instructions: query AAP inventory via MCP, correlate the affected hosts to the correct host group, match an existing remediation job template, and submit a plan for human approval that included a rollback strategy. Its toolset was scoped to four MCP tools: Splunk Query, Splunk Alert Search, Splunk Saved Search, and ServiceNow CMDB Lookup.
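Tool scoping is something an auditor can check mechanically. The following is a minimal sketch, not Automation Orchestrator's own enforcement mechanism: it assumes you have exported the agent's tool-call names as a plain list, and the `out_of_scope_calls` helper and the "Shell Exec" entry are hypothetical, made up for illustration.

```python
# Allow-list mirroring the four MCP tools named in the demo.
ALLOWED_MCP_TOOLS = frozenset({
    "Splunk Query",
    "Splunk Alert Search",
    "Splunk Saved Search",
    "ServiceNow CMDB Lookup",
})


def out_of_scope_calls(tool_calls):
    """Return the tool names in tool_calls that are not on the allow-list."""
    return [name for name in tool_calls if name not in ALLOWED_MCP_TOOLS]


# Hypothetical call log containing one tool that was never granted.
calls = ["Splunk Query", "ServiceNow CMDB Lookup", "Shell Exec"]
print(out_of_scope_calls(calls))  # ['Shell Exec']
```

An empty result means every recorded call stayed inside the granted toolset; anything else is a finding worth raising.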

Here is the recorded execution timeline from that run, shown as six stages:

Stage	Duration	Nature
Alert ingestion	0s	Automated
ITSM ticket creation	1.2s	Automated
Vulnerability analysis (AI + MCP)	4.8s	Automated
Human review	38.4s	Manual
Remediation execution	0.9s	Automated
Ticket close	2.1s	Automated

Add up the automated stages (0 + 1.2 + 4.8 + 0.9 + 2.1) and you get 9.0 seconds of machine time. The 38.4-second human review is the only manual, non-deterministic entry, and that is exactly the point of an audit trail: it should make the human decision point visually obvious against a backdrop of near-instantaneous machine execution, not bury it.
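The same arithmetic can be reproduced from the table in a few lines. This is only a sketch of how you might summarize an exported timeline; the tuple layout is an assumption, and the durations are copied from the table above.

```python
# Stage name, duration in seconds, and nature, copied from the timeline table.
TIMELINE = [
    ("Alert ingestion", 0.0, "Automated"),
    ("ITSM ticket creation", 1.2, "Automated"),
    ("Vulnerability analysis (AI + MCP)", 4.8, "Automated"),
    ("Human review", 38.4, "Manual"),
    ("Remediation execution", 0.9, "Automated"),
    ("Ticket close", 2.1, "Automated"),
]

automated = sum(d for _, d, nature in TIMELINE if nature == "Automated")
manual = sum(d for _, d, nature in TIMELINE if nature == "Manual")
total = automated + manual

print(f"automated: {automated:.1f}s")                    # automated: 9.0s
print(f"manual:    {manual:.1f}s")                       # manual:    38.4s
print(f"total:     {total:.1f}s")                        # total:     47.4s
print(f"human share of wall time: {manual / total:.0%}")  # human share of wall time: 81%
```

Roughly four fifths of the elapsed time sits in the single manual stage, which is why that entry stands out in the record.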

The remediation itself patched 12 hosts across prod, staging, and dev, applied as a rolling update in 3 batches of 4 with zero downtime, and every health check passed. The originating ServiceNow ticket, INC0038291, was resolved and closed automatically as the final timeline entry.

Why the Human Review Gate Is the Audit Anchor

Step 4 — Human Review — is where an auditor's eye should go first, because it's configurable and it's where accountability is assigned to a named person rather than a system. In the demo, the gate exposed:

Usernames to notify — a defined approver list, not a broadcast
A custom message — "Please approve this deployment to production"
A timeout — 1 day by default
An explicit on-timeout action — "Fail the workflow"

That last point matters more than it looks. There is no silent auto-approval path. If nobody responds within the timeout window, the workflow fails closed rather than proceeding unattended. For a compliance review, that single configuration choice is often the difference between a workflow that satisfies change-management policy and one that doesn't.
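To make "fails closed" concrete, here is a minimal sketch of the decision logic such a gate implies. It is not Automation Orchestrator's implementation: the `ReviewGate`, `Response`, and `resolve` names are hypothetical, the approver "alice" is made up, and treating a response from someone outside the approver list as a failure is an assumption of this sketch.

```python
from dataclasses import dataclass
from datetime import timedelta
from typing import Optional


@dataclass(frozen=True)
class ReviewGate:
    approvers: tuple                       # usernames to notify
    message: str
    timeout: timedelta = timedelta(days=1)
    on_timeout: str = "fail"               # the only action this sketch accepts


@dataclass(frozen=True)
class Response:
    username: str
    approved: bool
    elapsed: timedelta                     # time between notification and response


def resolve(gate: ReviewGate, response: Optional[Response]) -> str:
    """Return 'approved', 'rejected', or 'failed' for a review gate."""
    if gate.on_timeout != "fail":
        raise ValueError("this sketch only models the fail-closed action")
    # No answer, or an answer after the deadline: fail closed.
    if response is None or response.elapsed > gate.timeout:
        return "failed"
    # Assumption: an answer from outside the approver list never counts.
    if response.username not in gate.approvers:
        return "failed"
    return "approved" if response.approved else "rejected"


gate = ReviewGate(
    approvers=("alice",),
    message="Please approve this deployment to production",
)
print(resolve(gate, Response("alice", True, timedelta(seconds=38.4))))  # approved
print(resolve(gate, None))                                              # failed
```

Note that no branch returns "approved" without an explicit approval from a listed user inside the timeout window.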

Reading the Timeline as an Auditor

When you pull up an Automation Orchestrator execution record, treat it the same way you'd treat a deployment log during a SOX or ISO 27001 review:

Confirm the trigger source (webhook, schedule, manual) and that its API key was scoped and not shared.
Confirm that the AI agent's MCP tool calls were read-only investigation (queries, lookups) and that the actual remediation ran inside a job template.
Confirm the approver identity and timestamp on the human review step, not just "approved: true."
Confirm the timeout action matches policy — fail-closed, not auto-approve.
Confirm the rollback strategy was attached to the plan before approval, not improvised after.

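Because the underlying engine is Temporal, every one of these steps is a durable, replayable event in workflow history: not a log line that can be silently truncated, but a first-class record of the workflow's state machine. Temporal keeps the history of a closed workflow for a configurable retention period, so confirm that the retention configured in your deployment covers your audit window.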
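The checklist above lends itself to a scripted first pass. The sketch below assumes an execution record has been exported as a dictionary; every field name (`trigger`, `mcp_tool_calls`, `human_review`, `plan`, and their keys) and every sample value is hypothetical, since the article does not describe Automation Orchestrator's export format. Timestamps are assumed to be ISO 8601 strings in UTC, which compare correctly as plain strings.

```python
def audit_findings(record):
    """Return a list of findings; an empty list means every check passed."""
    findings = []

    trigger = record.get("trigger", {})
    if trigger.get("source") not in {"webhook", "schedule", "manual"}:
        findings.append("unknown trigger source")
    # A missing flag is treated as shared, so gaps surface as findings.
    if trigger.get("source") == "webhook" and trigger.get("api_key_shared", True):
        findings.append("webhook API key is shared or its scope is unrecorded")

    # Investigation must be read-only; remediation belongs to a job template.
    for call in record.get("mcp_tool_calls", []):
        if not call.get("read_only", False):
            findings.append(f"MCP tool call is not read-only: {call.get('tool')}")

    review = record.get("human_review", {})
    approved_at = review.get("approved_at")
    if not review.get("approver") or not approved_at:
        findings.append("approval lacks a named approver or timestamp")
    if review.get("on_timeout") != "fail":
        findings.append("timeout action is not fail-closed")

    plan = record.get("plan", {})
    attached_at = plan.get("rollback_attached_at")
    if not plan.get("rollback_strategy") or not attached_at:
        findings.append("no rollback strategy attached to the plan")
    elif approved_at and attached_at > approved_at:
        findings.append("rollback strategy was attached after approval")

    return findings


# Made-up record that satisfies every check.
record = {
    "trigger": {"source": "webhook", "api_key_shared": False},
    "mcp_tool_calls": [
        {"tool": "Splunk Query", "read_only": True},
        {"tool": "ServiceNow CMDB Lookup", "read_only": True},
    ],
    "human_review": {
        "approver": "alice",
        "approved_at": "2026-01-01T10:00:38Z",
        "on_timeout": "fail",
    },
    "plan": {
        "rollback_strategy": "placeholder rollback steps",
        "rollback_attached_at": "2026-01-01T10:00:00Z",
    },
}
print(audit_findings(record))  # []
```

A script like this does not replace the review; it only narrows down which records need a human to look closer.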

A Representative Job Template Task

While Automation Orchestrator's canvas is new, the remediation itself still executes as an ordinary AAP job template. A simplified play from a regreSSHion patch playbook, of the kind the AI agent's plan would match, looks like this:

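---
# Simplified play: patch OpenSSH in rolling batches and stop on the first failure.
- name: Remediate CVE-2024-6387 on affected sshd hosts
  hosts: "{{ target_host_group }}"   # assumed to be passed in as an extra variable
  become: true
  serial: 4                  # patch 4 hosts per batch
  max_fail_percentage: 0     # abort the play if any host in a batch fails

  tasks:
    - name: Ensure OpenSSH server package is at patched version
      ansible.builtin.package:
        name: openssh-server
        state: latest        # newest build the repositories offer; assumes it carries the fix

    - name: Restart sshd to apply patched binary
      ansible.builtin.service:
        name: sshd
        state: restarted

    - name: Verify sshd health check
      ansible.builtin.wait_for:   # runs on the target and waits for its local port 22
        port: 22
        timeout: 30

    - name: Record remediation event for audit trail
      ansible.builtin.debug:
        # ansible_play_batch is the list of hosts in the current batch, not a batch number
        msg: "CVE-2024-6387 remediated on {{ inventory_hostname }}; batch hosts: {{ ansible_play_batch | join(', ') }}"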

The serial: 4 directive maps directly to the "3 batches of 4" rolling update described in the demo: 12 hosts, patched 4 at a time. max_fail_percentage: 0 supports the zero-downtime requirement by aborting the play as soon as any host in a batch fails a task, including the health check, so later batches are never touched.
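The interaction between the two settings is easier to see in a small simulation. This is a simplified model written for illustration, not how Ansible schedules tasks internally: the `rolling_update` function, the `patch` callable, and the host names are all hypothetical. The one rule it borrows from Ansible is that the failure percentage must be exceeded, not merely equaled, for the play to abort.

```python
def rolling_update(hosts, serial, patch, max_fail_percentage=0):
    """Patch hosts in batches of `serial`; stop once a batch exceeds the failure budget.

    `patch` is a callable that returns True when a host was patched successfully.
    Returns (patched_hosts, aborted).
    """
    patched = []
    for start in range(0, len(hosts), serial):
        batch = hosts[start:start + serial]
        failed = [h for h in batch if not patch(h)]
        patched.extend(h for h in batch if h not in failed)
        # The threshold must be exceeded, so 0 tolerates no failures at all.
        if 100 * len(failed) / len(batch) > max_fail_percentage:
            return patched, True
    return patched, False


# Twelve made-up hosts, matching the host count in the demo.
hosts = [f"host{i:02d}" for i in range(1, 13)]

patched, aborted = rolling_update(hosts, 4, lambda h: True)
print(len(patched), aborted)  # 12 False

# One failure in the second batch stops the rollout before the third batch.
patched, aborted = rolling_update(hosts, 4, lambda h: h != "host06")
print(len(patched), aborted)  # 7 True
```

In the second run the third batch is never attempted, which is the property that keeps a bad package from reaching every host.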

Key Takeaways

Automation Orchestrator's execution timeline is built on Temporal, giving every workflow step — human or AI — a durable, replayable audit record.
In the CVE-2024-6387 demo, automated stages totaled under 10 seconds; the 38.4-second human review was the only manual, non-deterministic step.
The Human Review gate's configurable timeout and explicit "fail the workflow" on-timeout action mean approval gaps never resolve to silent auto-approval.
AI agents operate through scoped MCP tools (Splunk, ServiceNow CMDB) for investigation, while actual remediation stays inside governed AAP job templates.
Auditors should read the timeline for trigger provenance, approver identity, timeout policy, and rollback strategy — not just a pass/fail outcome.
