Ansible Automation Platform High Availability and Disaster Recovery: Single Topology Architecture

By Luca Berton · Published 2024-01-01 · Category: events

AAP HA/DR architecture with proven database failover in under 60 seconds, full AZ recovery in under 3 minutes, and a data layer built on the EDB PostgreSQL partnership.

Ansible Automation Platform (AAP) now provides a single topology that combines High Availability (HA) and Disaster Recovery (DR). It delivers proven platform resilience with quantified recovery times, documented failure boundaries, and a tested DR blueprint built on the EDB partnership.

Four Enterprise Guarantees

1. Platform Resilience Is Proven, Not Assumed

Verified automatic recovery from:

• Component failure: individual service restarts
• Database failover: under 60 seconds
• Full AZ loss: recovery under 3 minutes

Zero human intervention is required for all of these scenarios.

2. Failure Boundaries Are Documented, Not Discovered in Production

Test scenarios characterize exactly what survives a platform disruption and what does not — giving operators a clear, tested contract for how to design automation that is safe to re-run after any failure event.
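One way to design automation that is safe to re-run is to write tasks that converge on a desired state rather than perform an action unconditionally. The playbook below is a minimal sketch of that idea, not something taken from the AAP test scenarios. The host group `app_servers`, the package, the file paths, and the migration script are all hypothetical.

```yaml
---
# Sketch only: host group, package, paths, and script are illustrative assumptions.
- name: Tasks written to be safe to re-run after a failover
  hosts: app_servers
  become: true
  tasks:
    # State-based modules converge on a target state, so a second run changes nothing.
    - name: Ensure the web server package is present
      ansible.builtin.package:
        name: nginx
        state: present

    # The regexp matches any existing setting, so the line is replaced, never duplicated.
    - name: Ensure the setting exists exactly once in the config file
      ansible.builtin.lineinfile:
        path: /etc/myapp/app.conf
        regexp: '^max_connections='
        line: max_connections=200

    # Commands are not idempotent by default. `creates` skips the task when the
    # marker file exists; the script is assumed to write it only on success.
    - name: Run the one-time schema migration
      ansible.builtin.command:
        cmd: /opt/myapp/bin/migrate --apply
        creates: /var/lib/myapp/.migrated
```

If a job built from tasks like these is interrupted mid-run, launching it again after recovery should complete the remaining work without repeating the finished steps.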

3. Complete, Tested Disaster Recovery Blueprint

Full region DR scenarios validate the entire failover chain:

• Database promotion
• DNS cutover
• Execution reconnection

Every step is classified as automatic or manual and timed against real infrastructure. This is a tested blueprint, not a runbook on paper.

4. Built on the EDB Partnership

Any disaster recovery plan for a platform requires a strong data strategy. AAP leverages the EDB partnership to make data resiliency a core part of the topology.

Recovery Time Objectives

| Failure Scenario | Recovery Time | Intervention |
|---|---|---|
| Single component failure | Seconds | Automatic |
| Database failover | < 60 seconds | Automatic |
| Full Availability Zone loss | < 3 minutes | Automatic |
| Full region DR | Minutes | Semi-automatic (DNS) |
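A recovery time objective is only useful if you can check it in your own environment. As a rough sketch, the play below polls a health endpoint and fails if the platform does not answer within a budget that mirrors the three-minute figure for a full AZ loss. The hostname, the ping path (which differs between AAP versions), and the variable names are assumptions to adapt.

```yaml
---
# Sketch only: the URL and both timing variables are assumptions.
- name: Check that the platform answers within the recovery budget
  hosts: localhost
  gather_facts: false
  vars:
    aap_ping_url: https://aap.example.com/api/controller/v2/ping/
    rto_budget_seconds: 180
    poll_delay_seconds: 5
  tasks:
    # Retries for roughly rto_budget_seconds in total (plus request time),
    # then fails the play if the endpoint never returned HTTP 200.
    - name: Poll the health endpoint until it returns HTTP 200
      ansible.builtin.uri:
        url: "{{ aap_ping_url }}"
        method: GET
        status_code: 200
      register: ping_result
      until: ping_result is succeeded
      retries: "{{ (rto_budget_seconds / poll_delay_seconds) | int }}"
      delay: "{{ poll_delay_seconds }}"
```

Started right after a failure is injected in a test environment, a run like this gives a simple pass or fail signal against the stated recovery time.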

Architecture

┌─── Region A (Primary) ──────────────────────────────┐
│                                                       │
│  ┌─── AZ 1 ──────────┐    ┌─── AZ 2 ──────────┐    │
│  │ AAP Controller (P) │    │ AAP Controller (S) │    │
│  │ EDB PostgreSQL (P) │◄──►│ EDB PostgreSQL (S) │    │
│  │ Execution Nodes    │    │ Execution Nodes    │    │
│  └────────────────────┘    └────────────────────┘    │
│                   │                                   │
└───────────────────┼───────────────────────────────────┘
                    │ Async replication
┌─── Region B (DR) ─┼──────────────────────────────────┐
│                    ▼                                   │
│  ┌─── AZ 3 ──────────┐                               │
│  │ AAP Controller (S) │                               │
│  │ EDB PostgreSQL (S) │                               │
│  │ Execution Nodes    │                               │
│  └────────────────────┘                               │
└───────────────────────────────────────────────────────┘
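The layout in the diagram can be mirrored in an Ansible inventory so that plays can target a region, an availability zone, or all database nodes. The YAML inventory below is an illustrative sketch; every hostname and group name except `db_servers` is hypothetical.

```yaml
# Sketch only: hostnames and group names are hypothetical.
all:
  children:
    region_a:
      children:
        az1:
          hosts:
            controller-az1.example.com:
            db-az1.example.com:
            exec-az1.example.com:
        az2:
          hosts:
            controller-az2.example.com:
            db-az2.example.com:
            exec-az2.example.com:
    region_b:
      children:
        az3:
          hosts:
            controller-az3.example.com:
            db-az3.example.com:
            exec-az3.example.com:
    # Cross-cutting group so a play can target every database node at once.
    db_servers:
      hosts:
        db-az1.example.com:
        db-az2.example.com:
        db-az3.example.com:
```

With groups like these, a limit such as `--limit region_b` scopes a run to the DR region only.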

EDB PostgreSQL Configuration

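---
- name: Configure EDB Failover Manager
  hosts: db_servers
  roles:
    - role: edb.postgres.efm
      vars:
        efm_cluster_name: aap-cluster
        efm_notification_level: warning
        # Promote a standby automatically when the primary is lost.
        efm_auto_failover: true
        efm_auto_resume_period: 60
        # Virtual IP for client connections; the value comes from a vault variable.
        efm_virtual_ip: "{{ vault_efm_vip }}"
        # Bind to each host's default IPv4 address (requires gathered facts).
        efm_bind_address: "{{ ansible_default_ipv4.address }}"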
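After a failover it helps to confirm which node PostgreSQL itself considers the primary. The following sketch queries `pg_is_in_recovery()`, which returns true on a standby, on every host in `db_servers`. It assumes the `community.postgresql` collection and a PostgreSQL driver (psycopg2 or psycopg) on the database hosts; the database name and the credential variables are hypothetical.

```yaml
---
# Sketch only: database name and credential variables are assumptions.
- name: Report which database node is primary and which are standbys
  hosts: db_servers
  gather_facts: false
  tasks:
    - name: Ask PostgreSQL whether this node is in recovery (standby) mode
      community.postgresql.postgresql_query:
        login_db: postgres
        login_user: "{{ vault_pg_monitor_user }}"
        login_password: "{{ vault_pg_monitor_password }}"
        query: SELECT pg_is_in_recovery() AS in_recovery
      register: recovery_state

    - name: Show the role of each node
      ansible.builtin.debug:
        msg: >-
          {{ inventory_hostname }} is
          {{ 'standby' if recovery_state.query_result[0].in_recovery else 'primary' }}
```

In a healthy cluster exactly one host should report `primary`; any other result is worth investigating before relying on the topology.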

DR Failover Procedure

| Step | Action | Type | Time |
|---|---|---|---|
| 1 | Detect primary region failure | Automatic | ~30s |
| 2 | Promote EDB standby to primary | Automatic | ~30s |
| 3 | Update DNS to DR region | Manual/Automatic | ~60s |
| 4 | Execution nodes reconnect | Automatic | ~30s |
| 5 | Verify platform health | Automatic | ~30s |
| Total | | | < 3 minutes |
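Step 3 is the one step that may need an operator, so it is a natural candidate for a prepared playbook. The sketch below performs the DNS cutover with an RFC 2136 dynamic update through `community.general.nsupdate`. The article does not name a DNS provider, so this choice is an assumption, as are the DNS server, TSIG key, zone, record, and addresses. The module needs the `community.general` collection and the dnspython library on the control node.

```yaml
---
# Sketch only: DNS server, key, zone, record, and addresses are assumptions.
- name: Point the platform hostname at the DR region
  hosts: localhost
  gather_facts: false
  vars:
    dr_region_address: 203.0.113.10   # documentation-range address, replace
  tasks:
    # state: present replaces the existing A record value with the DR address.
    # A short TTL keeps the cutover, and any later failback, fast.
    - name: Replace the A record with the DR region address
      community.general.nsupdate:
        server: 192.0.2.53
        key_name: aap-dr-key
        key_secret: "{{ vault_dns_tsig_secret }}"
        key_algorithm: hmac-sha256
        zone: example.com
        record: aap
        type: A
        ttl: 60
        value: "{{ dr_region_address }}"
        state: present
```

Keeping the record's TTL low before an incident matters as much as the update itself, because clients keep using the cached primary address until the old TTL expires.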

FAQ

Is this topology available in AAP 2.7?

Yes. The single HA/DR topology is the recommended production deployment for AAP 2.7+.

Do I need EDB PostgreSQL or can I use standard PostgreSQL?

EDB PostgreSQL is recommended for the full HA/DR capability including automatic failover. Standard PostgreSQL works for non-HA deployments.

What happens to running jobs during failover?

Running jobs may fail during failover. The documented failure boundaries tell you exactly which jobs are safe to re-run after recovery.

Can I test DR without impacting production?

Yes. The DR blueprint includes test procedures for validating failover without affecting production workloads.
