AAP 2.6 Automation Mesh: Distributed Execution Across Sites and Networks

By Luca Berton · Published 2024-01-01 · Category: installation

Deploy and manage Automation Mesh in AAP 2.6 for distributed automation across data centers, DMZs, and cloud regions.

What Is Automation Mesh?

Automation Mesh is an overlay network in AAP 2.6 that separates control capacity from execution capacity. It lets you distribute automation execution across multiple sites, network zones, and cloud regions while maintaining centralized control through Automation Controller.

Mesh replaces the legacy isolated nodes concept from Ansible Tower with a more flexible, resilient, peer-to-peer architecture built on the Receptor protocol.

Why Automation Mesh Matters

Without Mesh, all automation runs on or directly from the Controller nodes. This creates bottlenecks:
• Network limitations: Controller must reach every managed host directly
• Security concerns: Controller needs firewall access to every network zone
• Scalability ceiling: Controller CPU/RAM limits concurrent job capacity
• Latency: remote sites experience slow execution over WAN links

Mesh solves all of these by offloading execution to distributed nodes that are close to the managed hosts.

Mesh Node Types

| Node Type | Role | Runs Jobs? | Description |
|-----------|------|------------|-------------|
| Control node | Automation Controller | No | Schedules and orchestrates jobs. Dispatches work to execution nodes. |
| Execution node | Job runner | Yes | Runs Ansible playbooks inside Execution Environments. Place near managed hosts. |
| Hop node | Network relay | No | Relays Receptor traffic between nodes. Does not run jobs. Used for DMZ/firewall traversal. |
| Hybrid node | Control + Execution | Yes | Acts as both control and execution. Default for single-node deployments. |

The Receptor Protocol

Automation Mesh uses Receptor (TCP port 27199) for all node-to-node communication:
• Bidirectional: control nodes and execution nodes communicate in both directions
• Encrypted: TLS by default
• Resilient: automatic reconnection on network interruption
• Efficient: multiplexed connections, reduced overhead vs SSH

How Receptor Differs from SSH

| Feature | Receptor (Mesh) | SSH (Legacy Isolated) |
|---------|-----------------|-----------------------|
| Protocol | TCP 27199 | TCP 22 |
| Direction | Bidirectional | Control → Isolated only |
| Routing | Multi-hop peer-to-peer | Direct connection only |
| Connection | Persistent, multiplexed | Per-job connection |
| Failover | Automatic path selection | Manual reconfiguration |
| Overhead | Low (single connection) | Higher (per-host SSH) |

Topology Patterns

Pattern 1: Simple Hub and Spoke

Best for single-site deployments with moderate scale.

        ┌──────────────┐
        │  Controller   │
        │  (Control)    │
        └──┬────────┬───┘
           │        │
     ┌─────┴──┐  ┌──┴─────┐
     │ Exec 1 │  │ Exec 2 │
     │(DC-A)  │  │(DC-A)  │
     └────────┘  └────────┘

[automationcontroller]
controller.example.org

[automationcontroller:vars]
peers=execution_nodes

[execution_nodes]
exec1.example.org
exec2.example.org

Pattern 2: DMZ Traversal with Hop Nodes

Execution nodes in a restricted network zone reached through a hop node in the DMZ.

     ┌──────────────┐
     │  Controller   │  Corporate Network
     │  (Control)    │
     └──────┬───────┘
            │ TCP 27199
     ═══════╪═══════════  Firewall / DMZ
            │
     ┌──────┴───────┐
     │   Hop Node   │   DMZ
     └──────┬───────┘
            │ TCP 27199
     ═══════╪═══════════  Firewall
        ┌───┴────────────┐
        │                │
  ┌─────┴──┐      ┌─────┴──┐
  │ Exec 1 │      │ Exec 2 │  Restricted Zone
  │(Secure) │      │(Secure) │
  └────────┘      └────────┘

[automationcontroller]
controller.example.org

[automationcontroller:vars]
peers=instance_group_hop

[instance_group_hop]
hop-dmz.example.org node_type='hop'

[instance_group_hop:vars]
peers=instance_group_secure

[instance_group_secure]
exec-secure1.example.org
exec-secure2.example.org

Pattern 3: Multi-Site with Regional Execution

Distribute execution across geographic regions for low-latency automation.

                ┌──────────────┐
                │  Controller   │
                │  (HQ - US)    │
                └───┬──────┬───┘
                    │      │
          ┌─────────┘      └─────────┐
          │                          │
   ┌──────┴───────┐          ┌──────┴───────┐
   │   Hop Node   │          │   Hop Node   │
   │  (US-East)   │          │  (EU-West)   │
   └──────┬───────┘          └──────┬───────┘
      ┌───┴───┐                 ┌───┴───┐
      │       │                 │       │
  ┌───┴──┐ ┌──┴───┐       ┌───┴──┐ ┌──┴───┐
  │Exec 1│ │Exec 2│       │Exec 3│ │Exec 4│
  │US-E  │ │US-E  │       │EU-W  │ │EU-W  │
  └──────┘ └──────┘       └──────┘ └──────┘

[automationcontroller]
controller-hq.example.org

[automationcontroller:vars]
peers=hop_nodes

[hop_nodes]
hop-us-east.example.org node_type='hop'
hop-eu-west.example.org node_type='hop'

[hop_nodes:vars]
peers=execution_nodes

[execution_nodes]
exec-use1.example.org peers=hop-us-east.example.org
exec-use2.example.org peers=hop-us-east.example.org
exec-euw1.example.org peers=hop-eu-west.example.org
exec-euw2.example.org peers=hop-eu-west.example.org
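
To make the multi-hop idea concrete, the sketch below models the Pattern 3 topology as a graph and finds the fewest-hop path from the Controller to a regional execution node. It is an illustration only: the link list follows the diagram (each hop node serves its own region), the function names are hypothetical, and a plain breadth-first search stands in for Receptor's own route selection, which is not reproduced here.

```python
from collections import deque

# Links taken from the Pattern 3 diagram. They are treated as undirected
# because an established mesh connection carries traffic both ways.
LINKS = [
    ("controller-hq.example.org", "hop-us-east.example.org"),
    ("controller-hq.example.org", "hop-eu-west.example.org"),
    ("hop-us-east.example.org", "exec-use1.example.org"),
    ("hop-us-east.example.org", "exec-use2.example.org"),
    ("hop-eu-west.example.org", "exec-euw1.example.org"),
    ("hop-eu-west.example.org", "exec-euw2.example.org"),
]


def build_graph(links):
    """Build an adjacency map {node: set(peers)} from (a, b) link pairs."""
    graph = {}
    for a, b in links:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def find_path(graph, source, target):
    """Breadth-first search: return the fewest-hop path as a list, or None."""
    queue = deque([[source]])
    seen = {source}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == target:
            return path
        for peer in sorted(graph.get(node, ())):
            if peer not in seen:
                seen.add(peer)
                queue.append(path + [peer])
    return None


graph = build_graph(LINKS)
path = find_path(graph, "controller-hq.example.org", "exec-euw1.example.org")
print(" -> ".join(path))
```

Removing a link from `LINKS` and re-running the search is a quick way to see which execution nodes would become unreachable if a given hop node failed.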

Pattern 4: Cloud Hybrid

On-premises Controller with execution nodes in multiple cloud providers.

     ┌──────────────┐
     │  Controller   │  On-Premises
     └──────┬───────┘
            │
     ┌──────┴───────┐
     │   Hop Node   │  On-Prem DMZ
     └──┬────────┬──┘
        │        │
  ┌─────┴──┐ ┌──┴─────┐
  │AWS Exec│ │Azure   │  Cloud
  │Node    │ │Exec    │
  └────────┘ │Node    │
             └────────┘

Configuring Mesh in the Installer

Container Enterprise Topology

[automationcontroller]
controller1.example.org
controller2.example.org

[execution_nodes]
hop1.example.org receptor_type='hop'
exec1.example.org
exec2.example.org

[all:vars]
# Receptor configuration
receptor_port=27199
receptor_protocol=tcp

# TLS for mesh communication
receptor_tls_cert=/path/to/receptor.crt
receptor_tls_key=/path/to/receptor.key

RPM Enterprise Topology

[automationcontroller]
controller1.example.org
controller2.example.org

[automationcontroller:vars]
peers=execution_nodes

[execution_nodes]
hop1.example.org node_type='hop'
exec1.example.org
exec2.example.org

Instance Groups

Instance groups assign execution nodes to specific automation workloads:

# Create an instance group for network automation
- name: Create network instance group
  ansible.controller.instance_group:
    controller_host: "{{ gateway_url }}"
    controller_username: "{{ controller_user }}"
    controller_password: "{{ controller_pass }}"
    name: "network-automation"
    policy_instance_minimum: 1
    policy_instance_percentage: 0
    state: present

# Add execution nodes to the group
- name: Assign exec node to network group
  ansible.controller.instance:
    controller_host: "{{ gateway_url }}"
    controller_username: "{{ controller_user }}"
    controller_password: "{{ controller_pass }}"
    hostname: "exec-network1.example.org"
    managed_by_policy: true
    node_type: "execution"

# Assign instance group to job template
- name: Use network group for router config
  ansible.controller.job_template:
    controller_host: "{{ gateway_url }}"
    controller_username: "{{ controller_user }}"
    controller_password: "{{ controller_pass }}"
    name: "Router Configuration Backup"
    instance_groups:
      - "network-automation"
    state: present

Instance Group Use Cases

| Instance Group | Execution Nodes | Use Case |
|----------------|-----------------|----------|
| default | All general-purpose nodes | Standard automation |
| network-automation | Nodes with network access | Router/switch management |
| dmz-servers | Nodes in DMZ | Web server automation |
| cloud-aws | Nodes in AWS VPC | AWS resource management |
| compliance | Dedicated secure nodes | CIS/STIG scanning |

Mesh System Requirements

Per Red Hat tested configurations, each Automation Mesh node requires:

| Requirement | Minimum |
|-------------|---------|
| RAM | 16 GB |
| CPUs | 4 |
| Local disk | 60 GB |
| Disk IOPS | 3000 |
| OS | RHEL 9.4+ or RHEL 10+ |

Network Requirements

| Port | Protocol | Source | Destination | Purpose |
|------|----------|--------|-------------|---------|
| 27199 | TCP Receptor | Controller | Execution node | Direct mesh communication |
| 27199 | TCP Receptor | Controller | Hop node | Relay mesh communication |
| 27199 | TCP Receptor | Hop node | Execution node | Hop to execution relay |
| 80/443 | TCP HTTPS | Execution node | Hub / Gateway | Pull EE images, report results |

Monitoring Mesh Health

Via the UI

Navigate to Administration → Topology View in Platform Gateway to see a visual map of all mesh nodes, their connections, and health status.

Via the API

# List all mesh instances
curl -s -k -H "Authorization: Bearer $TOKEN" \
  "https://gateway.example.org/api/controller/v2/instances/" | \
  jq '.results[] | {hostname: .hostname, node_type: .node_type, capacity: .capacity, errors: .errors}'

# Check receptor status on a node (the output includes the routing table)
receptorctl --socket /var/run/receptor/receptor.sock status

# Trace the mesh path to a specific node
receptorctl --socket /var/run/receptor/receptor.sock traceroute exec1.example.org

Health Check Indicators

| Indicator | Healthy | Unhealthy |
|-----------|---------|-----------|
| Node capacity | > 0 | 0 (node offline or overloaded) |
| Errors | Empty string | Error message present |
| Last heartbeat | Within last 120s | > 120s ago |
| Connection count | Expected peers | Missing peers |
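
The indicators above can be turned into a small check over instance records returned by the instances API. The following is a minimal sketch, not a supported tool: `hostname`, `node_type`, `capacity`, and `errors` come from the `jq` filter shown earlier, while the `last_seen` timestamp field and the exemption of hop nodes from the capacity check are assumptions to verify against your own API output. The connection-count indicator is left out because it depends on the topology you expect.

```python
from datetime import datetime, timedelta, timezone

HEARTBEAT_LIMIT = timedelta(seconds=120)  # threshold from the table above


def health_problems(instance, now=None):
    """Return a list of problems for one instance record; empty means healthy."""
    now = now or datetime.now(timezone.utc)
    problems = []

    # Assumption: hop nodes never run jobs, so zero capacity is expected there.
    if instance.get("node_type") != "hop" and instance.get("capacity", 0) <= 0:
        problems.append("zero capacity")

    if instance.get("errors"):
        problems.append("errors: " + instance["errors"])

    last_seen = instance.get("last_seen")  # assumed field name for the heartbeat
    if last_seen is None:
        problems.append("no heartbeat recorded")
    else:
        seen_at = datetime.fromisoformat(last_seen.replace("Z", "+00:00"))
        if now - seen_at > HEARTBEAT_LIMIT:
            problems.append("heartbeat older than 120s")

    return problems


# Hypothetical sample records, shaped like the jq output plus last_seen.
now = datetime(2024, 1, 1, 12, 1, 0, tzinfo=timezone.utc)
sample = [
    {"hostname": "exec1.example.org", "node_type": "execution",
     "capacity": 16, "errors": "", "last_seen": "2024-01-01T12:00:30Z"},
    {"hostname": "exec2.example.org", "node_type": "execution",
     "capacity": 0, "errors": "", "last_seen": "2024-01-01T11:50:00Z"},
]
for inst in sample:
    print(inst["hostname"], health_problems(inst, now) or "healthy")
```

In practice the `sample` list would be replaced by the `results` array from the API call, for example after saving it to a JSON file.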

Scaling Mesh

Adding Execution Nodes

Add new execution nodes to handle increased workload:
1. Provision a new RHEL 9 VM meeting minimum requirements
2. Add it to the installer inventory under [execution_nodes]
3. Re-run the installer
4. Assign it to the appropriate instance groups

Capacity Planning

Each execution node's capacity is calculated based on available CPU and memory. The formula:

capacity = min(mem_capacity, cpu_capacity)
mem_capacity = (total_memory - reserved) / per_fork_memory
cpu_capacity = cpus * forks_per_cpu

Default values:
• per_fork_memory: 100 MB
• forks_per_cpu: 4
• Reserved memory: ~2 GB for OS

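Applying the formula to a node with 16 GB RAM and 4 CPUs: mem_capacity = (16384 MB - 2048 MB) / 100 MB ≈ 143 and cpu_capacity = 4 × 4 = 16, so capacity = min(143, 16) = 16 forks, limited by CPU. The per-instance capacity adjustment setting in Controller can shift the effective value between the CPU-based and memory-based figures.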
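
As a worked example, the snippet below applies the formula exactly as written, using the default values listed above. The function and parameter names are illustrative, and integer division is an assumption about rounding; Controller's own calculation may differ in detail.

```python
def fork_capacity(total_memory_mb, cpus, reserved_mb=2048,
                  per_fork_memory_mb=100, forks_per_cpu=4):
    """Apply capacity = min(mem_capacity, cpu_capacity) with the stated defaults."""
    mem_capacity = max((total_memory_mb - reserved_mb) // per_fork_memory_mb, 0)
    cpu_capacity = cpus * forks_per_cpu
    return {
        "mem_capacity": mem_capacity,
        "cpu_capacity": cpu_capacity,
        "capacity": min(mem_capacity, cpu_capacity),
    }


# The minimum-spec mesh node from this article: 16 GB RAM, 4 CPUs.
result = fork_capacity(total_memory_mb=16 * 1024, cpus=4)
print(result)
```

For this node the memory-based figure is far larger than the CPU-based one, so adding CPUs raises capacity faster than adding RAM.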

Troubleshooting

Execution Node Not Connecting

ERROR: receptor connection to exec1.example.org:27199 failed

Check:
• Firewall allows TCP 27199 between Controller and execution node
• Receptor service is running: systemctl status receptor
• TLS certificates are valid and not expired
• DNS resolution works for the node hostname
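
The firewall and DNS checks can be scripted from the Controller side. This is a minimal sketch using only the Python standard library; the function name is hypothetical, and a successful TCP connection only shows that the port is reachable, not that Receptor's TLS handshake or peer configuration is correct.

```python
import socket


def receptor_port_open(host, port=27199, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout.

    DNS failures, refused connections, and timeouts all raise OSError
    subclasses, so each of them is reported as False.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


# Example call (hostname taken from the error message above):
# receptor_port_open("exec1.example.org")
```

Running the same check from a hop node toward its execution nodes helps narrow down which firewall boundary is blocking the mesh.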

Jobs Stuck in Pending

Check:
• Instance group has healthy execution nodes
• Execution nodes have available capacity
• EE image can be pulled from the registry on execution nodes

Hop Node Not Relaying

Check:
• Hop node has connectivity to both Controller and execution nodes on TCP 27199
• Receptor is running and configured with node_type='hop'
• Routing table in the receptorctl status output shows the expected paths

FAQ

Can execution nodes run without continuous Controller connectivity?

No. Execution nodes need a connection to the Controller (directly or through hop nodes) to receive job dispatches and return results. If the connection drops mid-job, Receptor will buffer and retry, but prolonged disconnection will cause job failures.

How many hop nodes do I need?

One hop node per network boundary is typical. For HA, deploy two hop nodes per boundary with both connected to execution nodes. Mesh automatically routes through available hop nodes.

Can I mix containerized and RPM execution nodes?

No. All mesh components in a single AAP deployment must use the same installation method. You cannot mix containerized Controller with RPM execution nodes.

What is the maximum mesh size?

Red Hat tests specific topologies (growth and enterprise). For very large meshes (50+ execution nodes), work with Red Hat to validate your topology. The practical limit depends on job frequency, Controller capacity, and network bandwidth.

Can execution nodes be in a different cloud than Controller?

Yes. This is the cloud hybrid pattern — Controller on-premises (or in one cloud) with execution nodes in other clouds. Use hop nodes in the DMZ to bridge network boundaries. Ensure latency between hop and execution nodes is acceptable (< 100ms recommended).

Conclusion

Automation Mesh is what transforms AAP from a single-site tool into an enterprise-scale distributed automation platform. By separating control from execution and adding hop-node routing, Mesh lets you automate anywhere — across data centers, DMZs, cloud regions, and air-gapped environments — while maintaining centralized visibility and control.
