

Ansible Automation Platform High Availability and Disaster Recovery: Single Topology Architecture

By Luca Berton · Published 2024-01-01 · Category: events

AAP HA/DR architecture with proven database failover in under 60 seconds, full availability zone (AZ) recovery in under 3 minutes, and an EDB PostgreSQL partnership.

Ansible Automation Platform (AAP) now offers a single deployment topology with high availability (HA) and disaster recovery (DR) built in. It delivers proven platform resilience: quantified recovery times, documented failure boundaries, and a tested DR blueprint built on the EDB partnership.

Four Enterprise Guarantees

1. Platform Resilience Is Proven, Not Assumed

Automatic recovery has been verified for:

• Component failure: individual services restart
• Database failover: completes in under 60 seconds
• Full AZ loss: recovery in under 3 minutes

None of these scenarios requires human intervention.

2. Failure Boundaries Are Documented, Not Discovered in Production

Test scenarios characterize exactly what survives a platform disruption and what does not. This gives operators a clear, tested contract for designing automation that is safe to re-run after any failure event.
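"Safe to re-run" in practice means idempotent: each task converges to a desired state instead of repeating an action, so a job interrupted by a failover can simply be launched again. The sketch below illustrates three common patterns with built-in modules; the `app_servers` group, the `/opt/myapp` paths, and the `init.sh` script are hypothetical examples, not part of the AAP documentation.

```yaml
---
- name: Re-run-safe configuration (illustrative sketch)
  hosts: app_servers                 # hypothetical inventory group
  become: true
  tasks:
    # Declarative state: an existing directory reports "ok", not "changed".
    - name: Ensure the application directory exists
      ansible.builtin.file:
        path: /opt/myapp             # hypothetical path
        state: directory
        mode: "0755"

    # Edits the matching line in place instead of appending a duplicate each run.
    - name: Ensure the listen port is set
      ansible.builtin.lineinfile:
        path: /opt/myapp/app.conf
        regexp: '^listen_port='
        line: listen_port=8080
        create: true
        mode: "0644"

    # No declarative module exists for a one-off script, so `creates` skips
    # the task once the marker file is present. Assumes the hypothetical
    # init.sh writes /opt/myapp/.initialized when it succeeds.
    - name: Run one-time initialization
      ansible.builtin.command:
        cmd: /opt/myapp/bin/init.sh
        creates: /opt/myapp/.initialized
```

Tasks written this way can be replayed after any of the failure scenarios without doubling up changes; shell commands without a guard like `creates` are the usual source of unsafe re-runs.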

3. Complete, Tested Disaster Recovery Blueprint

Full region DR scenarios validate the entire failover chain:

• Database promotion
• DNS cutover
• Execution reconnection

Every step is classified as automatic or manual and timed against real infrastructure. This is a tested blueprint, not a runbook on paper.

4. Built on the EDB Partnership

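Disaster recovery for any platform depends on a sound data strategy. Through the EDB partnership, the topology uses EDB PostgreSQL as its database layer, so data resiliency (replication, standby promotion, and automatic failover) is part of the design rather than an afterthought.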

See also: Ansible Disaster Recovery Automation: Backup, Failover, and Recovery Playbooks

Recovery Time Objectives

| Failure Scenario | Recovery Time | Intervention |
|---|---|---|
| Single component failure | Seconds | Automatic |
| Database failover | < 60 seconds | Automatic |
| Full Availability Zone loss | < 3 minutes | Automatic |
| Full region DR | Minutes | Semi-automatic (DNS) |
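To check these targets in your own environment, a failover drill can time how long the platform takes to answer again after a failure is injected. A minimal sketch, assuming a hypothetical `aap_host` load-balanced address and the controller `ping` endpoint (assumed reachable without authentication). On AAP versions that route API calls through the platform gateway the path is typically `/api/controller/v2/ping/`; older releases use `/api/v2/ping/`, so adjust `ping_path` to your version.

```yaml
---
- name: Time platform recovery during a failover drill (sketch)
  hosts: localhost
  gather_facts: false
  vars:
    aap_host: aap.example.com             # hypothetical platform address
    ping_path: /api/controller/v2/ping/   # assumption: /api/v2/ping/ on older releases
    rto_seconds: 180                      # "< 3 minutes" AZ-loss target from the table
  tasks:
    - name: Wait for the operator to inject the failure
      ansible.builtin.pause:
        prompt: "Stop the primary AZ or database now, then press Enter"

    - name: Record the drill start time
      ansible.builtin.set_fact:
        drill_start: "{{ now(fmt='%Y-%m-%d %H:%M:%S') }}"

    # Connection errors and non-200 answers both count as failed attempts and are retried.
    - name: Poll the ping endpoint until the platform answers
      ansible.builtin.uri:
        url: "https://{{ aap_host }}{{ ping_path }}"
        status_code: 200
      register: ping
      until: ping.status == 200
      retries: 60
      delay: 5

    # Resolution is limited by the 5-second poll interval.
    - name: Compute the observed recovery time
      ansible.builtin.set_fact:
        recovery_seconds: "{{ ((now(fmt='%Y-%m-%d %H:%M:%S') | to_datetime) - (drill_start | to_datetime)).total_seconds() | int }}"

    - name: Compare against the RTO target
      ansible.builtin.assert:
        that: recovery_seconds | int < rto_seconds | int
        success_msg: "Platform answered after {{ recovery_seconds }} s"
        fail_msg: "Recovery took {{ recovery_seconds }} s, over the {{ rto_seconds }} s target"
```

Running the same play with `rto_seconds: 60` during a database-only failover gives a matching check for the second row of the table.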

Architecture

```text
┌─── Region A (Primary) ────────────────────────────────┐
│                                                       │
│  ┌─── AZ 1 ───────────┐    ┌─── AZ 2 ───────────┐     │
│  │ AAP Controller (P) │    │ AAP Controller (S) │     │
│  │ EDB PostgreSQL (P) │◄──►│ EDB PostgreSQL (S) │     │
│  │ Execution Nodes    │    │ Execution Nodes    │     │
│  └────────────────────┘    └────────────────────┘     │
│                   │                                   │
└───────────────────┼───────────────────────────────────┘
                    │ Async replication
┌─── Region B (DR) ─┼───────────────────────────────────┐
│                   ▼                                   │
│  ┌─── AZ 3 ───────────┐                               │
│  │ AAP Controller (S) │                               │
│  │ EDB PostgreSQL (S) │                               │
│  │ Execution Nodes    │                               │
│  └────────────────────┘                               │
└───────────────────────────────────────────────────────┘
```
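Both the in-region standby and the Region B DR standby depend on replication from the primary, so it is worth checking routinely. A minimal sketch, assuming the `community.postgresql` collection (and psycopg2 on the database host) and a hypothetical `db_primary` group holding the Region A primary. It queries the standard `pg_stat_replication` view; the OS user and port defaults shown for EDB Postgres Advanced Server are assumptions to adjust, along with the socket directory, to your install.

```yaml
---
- name: Check streaming replication from the primary (sketch)
  hosts: db_primary                   # hypothetical group holding the Region A primary
  become: true
  become_user: "{{ pg_os_user }}"     # local socket connection with peer authentication
  vars:
    pg_os_user: enterprisedb          # assumption: EPAS default; "postgres" for community PostgreSQL
    pg_port: 5444                     # assumption: EPAS default; 5432 for community PostgreSQL
  tasks:
    # Casts to text keep inet/interval values JSON-serializable in the result.
    - name: Query pg_stat_replication
      community.postgresql.postgresql_query:
        login_user: "{{ pg_os_user }}"
        port: "{{ pg_port }}"
        db: postgres
        query: >-
          SELECT application_name, client_addr::text AS client_addr,
                 state, replay_lag::text AS replay_lag
          FROM pg_stat_replication
      register: repl

    - name: Show each standby with its replay lag
      ansible.builtin.debug:
        var: repl.query_result

    - name: Require at least one streaming standby
      ansible.builtin.assert:
        that:
          - repl.query_result | selectattr('state', 'equalto', 'streaming') | list | length > 0
        fail_msg: "No standby is currently streaming from this primary"
```

A growing `replay_lag` on the Region B standby is an early sign that a region-level failover would lose recent data.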

See also: AAP 2.6 Backup, Restore, and Disaster Recovery Guide

EDB PostgreSQL Configuration

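```yaml
---
# Role and variable names are as given for this topology; verify them
# against the documentation of the EDB collection you install.
- name: Configure EDB Failover Manager
  hosts: db_servers
  become: true                         # EFM configuration and services need root
  roles:
    - role: edb.postgres.efm
      vars:
        efm_cluster_name: aap-cluster
        efm_notification_level: warning
        efm_auto_failover: true        # promote the standby without operator action
        efm_auto_resume_period: 60
        efm_virtual_ip: "{{ vault_efm_vip }}"                   # address clients use; moves to the new primary
        efm_bind_address: "{{ ansible_default_ipv4.address }}"  # requires gathered facts
```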
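Once the role has run, the EDB Failover Manager CLI can confirm that every node has joined the cluster. The sketch below runs `efm cluster-status` for the `aap-cluster` name used in the play above. The binary path is an assumption: EFM installs under a version-specific directory, so set `efm_bin` to match your installation.

```yaml
---
- name: Verify EDB Failover Manager cluster membership (sketch)
  hosts: db_servers
  become: true
  vars:
    efm_bin: /usr/edb/efm-4.9/bin/efm   # assumption: version-specific install path
  tasks:
    - name: Read cluster status
      ansible.builtin.command:
        cmd: "{{ efm_bin }} cluster-status aap-cluster"
      register: efm_status
      changed_when: false               # read-only query, never reports a change
      run_once: true                    # all agents report the same cluster view

    - name: Print cluster status
      ansible.builtin.debug:
        var: efm_status.stdout_lines
      run_once: true
```

The output lists each agent with its role (primary, standby, witness) and its database status, which is a quick check to run before and after a failover drill.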

DR Failover Procedure

| Step | Action | Type | Time |
|---|---|---|---|
| 1 | Detect primary region failure | Automatic | ~30s |
| 2 | Promote EDB standby to primary | Automatic | ~30s |
| 3 | Update DNS to DR region | Manual/Automatic | ~60s |
| 4 | Execution nodes reconnect | Automatic | ~30s |
| 5 | Verify platform health | Automatic | ~30s |
| Total | | | < 3 minutes |
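Step 3 is the one that may be manual. As one way to automate it, the sketch below repoints the platform hostname with an RFC 2136 dynamic DNS update via `community.general.nsupdate` (which needs `dnspython` on the control node) and then performs step 5 by polling the API. Steps 2 and 4 are automatic in this topology and are not scripted here. The DNS server, zone, record, DR address, and TSIG key variables are hypothetical; the `ping_path` value carries the same version assumption as elsewhere. If you use a cloud DNS service, substitute its module.

```yaml
---
- name: Cut platform DNS over to the DR region (sketch)
  hosts: localhost
  gather_facts: false
  vars:
    dns_server: 192.0.2.53                # hypothetical authoritative DNS server
    dns_zone: example.com                 # hypothetical zone
    aap_record: aap                       # resolves as aap.example.com
    dr_region_ip: 198.51.100.10           # hypothetical DR-region load balancer
    ping_path: /api/controller/v2/ping/   # assumption: /api/v2/ping/ on older releases
  tasks:
    # Step 3: point the platform name at Region B; a short TTL speeds up client pickup.
    - name: Update the A record to the DR region
      community.general.nsupdate:
        server: "{{ dns_server }}"
        zone: "{{ dns_zone }}"
        record: "{{ aap_record }}"
        type: A
        value: "{{ dr_region_ip }}"
        ttl: 60
        key_name: "{{ vault_tsig_key_name }}"      # hypothetical vaulted TSIG key
        key_secret: "{{ vault_tsig_key_secret }}"
        key_algorithm: hmac-sha256
        state: present

    # Step 5: resolvers may serve the old record until its previous TTL expires.
    - name: Wait for the platform API to answer through the new address
      ansible.builtin.uri:
        url: "https://{{ aap_record }}.{{ dns_zone }}{{ ping_path }}"
        status_code: 200
      register: dr_ping
      until: dr_ping.status == 200
      retries: 24
      delay: 5
```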

See also: Ansible Private Automation Hub: Host & Manage Collections (Guide)

FAQ

Is this topology available in AAP 2.7?

Yes. The single HA/DR topology is the recommended production deployment for AAP 2.7+.

Do I need EDB PostgreSQL or can I use standard PostgreSQL?

EDB PostgreSQL is recommended for the full HA/DR capability, including automatic failover. Standard PostgreSQL works for non-HA deployments.

What happens to running jobs during failover?

Running jobs may fail during failover. The documented failure boundaries tell you exactly which jobs are safe to re-run after recovery.
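To find the jobs a failover interrupted, one option is to query the controller job list for jobs that ended as `failed` or `error` after the failover began. A minimal sketch using the REST API through `ansible.builtin.uri`; the host, credential variables, and timestamp are hypothetical, and the `/api/controller/v2` prefix assumes a gateway-based AAP version (older releases use `/api/v2`). If basic authentication is disabled in your deployment, send a token in an `Authorization` header instead.

```yaml
---
- name: List jobs interrupted by a failover (sketch)
  hosts: localhost
  gather_facts: false
  vars:
    aap_host: aap.example.com                # hypothetical platform address
    api_prefix: /api/controller/v2           # assumption: /api/v2 on older releases
    failover_start: "2024-01-01T10:00:00Z"   # hypothetical start of the failover window
  tasks:
    - name: Query failed and errored jobs since the failover began
      ansible.builtin.uri:
        url: "https://{{ aap_host }}{{ api_prefix }}/jobs/?status__in=failed,error&finished__gte={{ failover_start }}&page_size=200"
        user: "{{ vault_aap_user }}"
        password: "{{ vault_aap_password }}"
        force_basic_auth: true
      register: failed_jobs

    # Review each job against the documented failure boundaries before relaunching.
    - name: Show candidates to review and re-run
      ansible.builtin.debug:
        msg: "Job {{ item.id }} ({{ item.name }}) ended as {{ item.status }}"
      loop: "{{ failed_jobs.json.results }}"
      loop_control:
        label: "{{ item.id }}"
```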

Can I test DR without impacting production?

Yes. The DR blueprint includes test procedures for validating failover without affecting production workloads.
