AAP 2.6 Automation Mesh: Distributed Execution Across Sites and Networks
By Luca Berton · Published 2024-01-01 · Category: installation
Deploy and manage Automation Mesh in AAP 2.6 for distributed automation across data centers, DMZs, and cloud regions.
What Is Automation Mesh?
Automation Mesh is an overlay network in AAP 2.6 that separates control capacity from execution capacity. It lets you distribute automation execution across multiple sites, network zones, and cloud regions while maintaining centralized control through Automation Controller.
Mesh replaces the legacy isolated nodes concept from Ansible Tower with a more flexible, resilient, peer-to-peer architecture built on the Receptor protocol.
See also: Ansible Automation Mesh: Scalable Automation Across Hybrid Cloud Environments
Why Automation Mesh Matters
Without Mesh, all automation runs on or directly from the Controller nodes. This creates bottlenecks:
• Network limitations: Controller must reach every managed host directly
• Security concerns: Controller needs firewall access to every network zone
• Scalability ceiling: Controller CPU/RAM limits concurrent job capacity
• Latency: remote sites experience slow execution over WAN links
Mesh solves all of these by offloading execution to distributed nodes that are close to the managed hosts.
Mesh Node Types
| Node Type | Role | Runs Jobs? | Description |
|-----------|------|-----------|-------------|
| Control node | Automation Controller | No | Schedules and orchestrates jobs. Dispatches work to execution nodes. |
| Execution node | Job runner | Yes | Runs Ansible playbooks inside Execution Environments. Place near managed hosts. |
| Hop node | Network relay | No | Relays Receptor traffic between nodes. Does not run jobs. Used for DMZ/firewall traversal. |
| Hybrid node | Control + Execution | Yes | Acts as both control and execution. Default for single-node deployments. |
See also: Ansible Automation Platform 2.6 Architecture and Components: Complete Guide
The Receptor Protocol
Automation Mesh uses Receptor (TCP port 27199) for all node-to-node communication:
• Bidirectional: control nodes and execution nodes communicate in both directions
• Encrypted: TLS by default
• Resilient: automatic reconnection on network interruption
• Efficient: multiplexed connections, reduced overhead vs SSH
How Receptor Differs from SSH
| Feature | Receptor (Mesh) | SSH (Legacy Isolated) |
|---------|-----------------|----------------------|
| Protocol | TCP 27199 | TCP 22 |
| Direction | Bidirectional | Control → Isolated only |
| Routing | Multi-hop peer-to-peer | Direct connection only |
| Connection | Persistent, multiplexed | Per-job connection |
| Failover | Automatic path selection | Manual reconfiguration |
| Overhead | Low (single connection) | Higher (per-host SSH) |
Topology Patterns
Pattern 1: Simple Hub and Spoke
Best for single-site deployments with moderate scale.
   ┌──────────────┐
   │  Controller  │
   │  (Control)   │
   └──┬────────┬──┘
      │        │
┌─────┴──┐  ┌──┴─────┐
│ Exec 1 │  │ Exec 2 │
│ (DC-A) │  │ (DC-A) │
└────────┘  └────────┘
[automationcontroller]
controller.example.org
[automationcontroller:vars]
peers=execution_nodes
[execution_nodes]
exec1.example.org
exec2.example.org
Pattern 2: DMZ Traversal with Hop Nodes
Execution nodes in a restricted network zone reached through a hop node in the DMZ.
   ┌──────────────┐
   │  Controller  │   Corporate Network
   │  (Control)   │
   └──────┬───────┘
          │ TCP 27199
   ═══════╪═══════════   Firewall / DMZ
          │
   ┌──────┴───────┐
   │   Hop Node   │   DMZ
   └──────┬───────┘
          │ TCP 27199
   ═══════╪═══════════   Firewall
      ┌───┴────────────┐
      │                │
┌─────┴──┐       ┌─────┴──┐
│ Exec 1 │       │ Exec 2 │   Restricted Zone
│(Secure)│       │(Secure)│
└────────┘       └────────┘
[automationcontroller]
controller.example.org
[automationcontroller:vars]
peers=instance_group_hop
[instance_group_hop]
hop-dmz.example.org node_type='hop'
[instance_group_hop:vars]
peers=instance_group_secure
[instance_group_secure]
exec-secure1.example.org
exec-secure2.example.org
Pattern 3: Multi-Site with Regional Execution
Distribute execution across geographic regions for low-latency automation.
              ┌──────────────┐
              │  Controller  │
              │  (HQ - US)   │
              └───┬──────┬───┘
                  │      │
        ┌─────────┘      └─────────┐
        │                          │
 ┌──────┴───────┐           ┌──────┴───────┐
 │   Hop Node   │           │   Hop Node   │
 │  (US-East)   │           │  (EU-West)   │
 └──────┬───────┘           └──────┬───────┘
    ┌───┴───┐                  ┌───┴───┐
    │       │                  │       │
┌───┴──┐ ┌──┴───┐          ┌───┴──┐ ┌──┴───┐
│Exec 1│ │Exec 2│          │Exec 3│ │Exec 4│
│US-E  │ │US-E  │          │EU-W  │ │EU-W  │
└──────┘ └──────┘          └──────┘ └──────┘
[automationcontroller]
controller-hq.example.org
[automationcontroller:vars]
peers=hop_nodes
[hop_nodes]
hop-us-east.example.org node_type='hop'
hop-eu-west.example.org node_type='hop'
# Each execution node below peers only with its own regional hop node
[execution_nodes]
exec-use1.example.org peers=hop-us-east.example.org
exec-use2.example.org peers=hop-us-east.example.org
exec-euw1.example.org peers=hop-eu-west.example.org
exec-euw2.example.org peers=hop-eu-west.example.org
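To reason about which path traffic can take in a layout like this, the peer relationships can be modeled as an undirected graph and searched. The following is a minimal Python sketch under stated assumptions, not a description of how Receptor itself selects routes: the `PEERS` mapping, `build_graph`, and `find_path` are hypothetical names written for illustration, and the mapping uses only the controller-to-hop and execution-to-hop links shown in the diagram.

```python
from collections import deque

# Hypothetical peer map mirroring the Pattern 3 diagram:
# each key lists the nodes it peers with.
PEERS = {
    "controller-hq.example.org": ["hop-us-east.example.org", "hop-eu-west.example.org"],
    "exec-use1.example.org": ["hop-us-east.example.org"],
    "exec-use2.example.org": ["hop-us-east.example.org"],
    "exec-euw1.example.org": ["hop-eu-west.example.org"],
    "exec-euw2.example.org": ["hop-eu-west.example.org"],
}


def build_graph(peers):
    """Turn the peer map into an undirected adjacency map (links carry traffic both ways)."""
    graph = {}
    for node, targets in peers.items():
        graph.setdefault(node, set())
        for target in targets:
            graph[node].add(target)
            graph.setdefault(target, set()).add(node)
    return graph


def find_path(graph, source, destination, down=frozenset()):
    """Breadth-first search for a shortest path, skipping nodes listed in `down`."""
    if source in down or destination in down or source not in graph:
        return None
    queue = deque([[source]])
    visited = {source}
    while queue:
        path = queue.popleft()
        if path[-1] == destination:
            return path
        for neighbor in sorted(graph[path[-1]]):
            if neighbor not in visited and neighbor not in down:
                visited.add(neighbor)
                queue.append(path + [neighbor])
    return None


if __name__ == "__main__":
    graph = build_graph(PEERS)
    print(find_path(graph, "controller-hq.example.org", "exec-euw1.example.org"))
```

Passing `down={"hop-eu-west.example.org"}` to `find_path` returns `None` for the EU execution nodes in this model, because each region has a single hop node and therefore a single path.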
Pattern 4: Cloud Hybrid
On-premises Controller with execution nodes in multiple cloud providers.
   ┌──────────────┐
   │  Controller  │   On-Premises
   └──────┬───────┘
          │
   ┌──────┴───────┐
   │   Hop Node   │   On-Prem DMZ
   └──┬────────┬──┘
      │        │
┌─────┴──┐  ┌──┴─────┐
│AWS Exec│  │Azure   │   Cloud
│Node    │  │Exec    │
└────────┘  │Node    │
            └────────┘
See also: Ansible for Edge Computing and IoT: Managing Thousands of Distributed Devices
Configuring Mesh in the Installer
Container Enterprise Topology
[automationcontroller]
controller1.example.org
controller2.example.org
[execution_nodes]
hop1.example.org receptor_type='hop'
exec1.example.org
exec2.example.org
[all:vars]
# Receptor configuration
receptor_port=27199
receptor_protocol=tcp
# TLS for mesh communication
receptor_tls_cert=/path/to/receptor.crt
receptor_tls_key=/path/to/receptor.key
RPM Enterprise Topology
[automationcontroller]
controller1.example.org
controller2.example.org
[automationcontroller:vars]
peers=execution_nodes
[execution_nodes]
hop1.example.org node_type='hop'
exec1.example.org
exec2.example.org
Instance Groups
Instance groups assign execution nodes to specific automation workloads:
# Create an instance group for network automation
- name: Create network instance group
  ansible.controller.instance_group:
    controller_host: "{{ gateway_url }}"
    controller_username: "{{ controller_user }}"
    controller_password: "{{ controller_pass }}"
    name: "network-automation"
    policy_instance_minimum: 1
    policy_instance_percentage: 0
    state: present
# Register the execution node; group policy can then assign it
- name: Assign exec node to network group
  ansible.controller.instance:
    controller_host: "{{ gateway_url }}"
    controller_username: "{{ controller_user }}"
    controller_password: "{{ controller_pass }}"
    hostname: "exec-network1.example.org"
    managed_by_policy: true
    node_type: "execution"
    state: present
# Assign instance group to job template
- name: Use network group for router config
  ansible.controller.job_template:
    controller_host: "{{ gateway_url }}"
    controller_username: "{{ controller_user }}"
    controller_password: "{{ controller_pass }}"
    name: "Router Configuration Backup"
    instance_groups:
      - "network-automation"
    state: present
Instance Group Use Cases
| Instance Group | Execution Nodes | Use Case |
|----------------|----------------|----------|
| default | All general-purpose nodes | Standard automation |
| network-automation | Nodes with network access | Router/switch management |
| dmz-servers | Nodes in DMZ | Web server automation |
| cloud-aws | Nodes in AWS VPC | AWS resource management |
| compliance | Dedicated secure nodes | CIS/STIG scanning |
Mesh System Requirements
Per Red Hat tested configurations, each Automation Mesh node requires:
| Requirement | Minimum |
|-------------|---------|
| RAM | 16 GB |
| CPUs | 4 |
| Local disk | 60 GB |
| Disk IOPS | 3000 |
| OS | RHEL 9.4+ or RHEL 10+ |
Network Requirements
| Port | Protocol | Source | Destination | Purpose |
|------|----------|--------|-------------|---------|
| 27199 | TCP Receptor | Controller | Execution node | Direct mesh communication |
| 27199 | TCP Receptor | Controller | Hop node | Relay mesh communication |
| 27199 | TCP Receptor | Hop node | Execution node | Hop to execution relay |
| 80/443 | TCP HTTPS | Execution node | Hub / Gateway | Pull EE images, report results |
Monitoring Mesh Health
Via the UI
Navigate to Automation Execution → Infrastructure → Topology View in the Platform Gateway UI to see a visual map of all mesh nodes, their connections, and health status.
Via the API
# List all mesh instances
curl -s -k -H "Authorization: Bearer $TOKEN" \
"https://gateway.example.org/api/controller/v2/instances/" | \
jq '.results[] | {hostname: .hostname, node_type: .node_type, capacity: .capacity, errors: .errors}'
# Check receptor status on a node (the output includes the routing table)
receptorctl --socket /var/run/receptor/receptor.sock status
# Trace the mesh path to a specific node
receptorctl --socket /var/run/receptor/receptor.sock traceroute exec1.example.org
Health Check Indicators
| Indicator | Healthy | Unhealthy |
|-----------|---------|-----------|
| Node capacity | > 0 | 0 (node offline or overloaded) |
| Errors | Empty string | Error message present |
| Last heartbeat | Within last 120s | > 120s ago |
| Connection count | Expected peers | Missing peers |
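The indicators in this table can be turned into a small check over the records returned by the instances endpoint. The sketch below is a hedged illustration rather than an official health check: it assumes each record carries `hostname`, `node_type`, `capacity`, `errors`, and a `last_seen` ISO 8601 timestamp used as the heartbeat. The `last_seen` field name and the `instance_problems` helper are assumptions, so adjust them to what your API actually returns.

```python
from datetime import datetime, timedelta, timezone

HEARTBEAT_LIMIT = timedelta(seconds=120)  # threshold taken from the table above


def instance_problems(instance, now=None):
    """Return a list of reasons an instance record looks unhealthy (empty list = healthy)."""
    now = now or datetime.now(timezone.utc)
    problems = []

    # Hop nodes do not run jobs, so this sketch skips the capacity check for them.
    if instance.get("node_type") != "hop" and instance.get("capacity", 0) <= 0:
        problems.append("capacity is 0 (node offline or overloaded)")

    if instance.get("errors"):
        problems.append(f"errors reported: {instance['errors']}")

    last_seen = instance.get("last_seen")  # assumed field name for the heartbeat
    if last_seen is None:
        problems.append("no heartbeat recorded")
    else:
        # Accept a trailing "Z" as UTC so older Python versions can parse it.
        seen = datetime.fromisoformat(last_seen.replace("Z", "+00:00"))
        if now - seen > HEARTBEAT_LIMIT:
            problems.append("last heartbeat older than 120s")

    return problems
```

Feeding it each element of `.results` from the API response gives a per-node list of findings that can be logged or used to fail a monitoring job.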
Scaling Mesh
Adding Execution Nodes
Add new execution nodes to handle increased workload:
1. Provision a new RHEL 9 VM meeting minimum requirements
2. Add to the installer inventory under [execution_nodes]
3. Re-run the installer
4. Assign to appropriate instance groups
Capacity Planning
Each execution node's capacity is calculated based on available CPU and memory. The formula:
capacity = min(mem_capacity, cpu_capacity)
mem_capacity = (total_memory - reserved) / per_fork_memory
cpu_capacity = cpus * forks_per_cpu
Default values:
• per_fork_memory: 100 MB
• forks_per_cpu: 4
• Reserved memory: ~2 GB for OS
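Applying the formula with these defaults to a node with 16 GB RAM and 4 CPUs gives mem_capacity ≈ (16,384 MB − 2,048 MB) / 100 MB ≈ 143 and cpu_capacity = 4 × 4 = 16, so the node's capacity is 16 forks, limited by CPU.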
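The formula can also be evaluated directly. The sketch below is a literal transcription of it using the default values listed above. `node_capacity` is a hypothetical helper name, memory is expressed in MB, and the result is only what the formula as written yields, so the capacity the Controller actually reports for a node may differ.

```python
def node_capacity(total_memory_mb, cpus, reserved_mb=2048,
                  per_fork_memory_mb=100, forks_per_cpu=4):
    """Return (capacity, mem_capacity, cpu_capacity) for one execution node.

    capacity is the smaller of the memory-based and CPU-based fork counts,
    following the formula in this section.
    """
    # Memory-based limit: usable memory divided by the per-fork allowance.
    mem_capacity = max(0, (total_memory_mb - reserved_mb) // per_fork_memory_mb)
    # CPU-based limit: a fixed number of forks per CPU.
    cpu_capacity = cpus * forks_per_cpu
    return min(mem_capacity, cpu_capacity), mem_capacity, cpu_capacity


if __name__ == "__main__":
    capacity, mem_cap, cpu_cap = node_capacity(16 * 1024, 4)
    print(f"mem_capacity={mem_cap} cpu_capacity={cpu_cap} capacity={capacity}")
```

For the 16 GB, 4 CPU minimum node, the CPU term is the smaller one, which suggests that adding CPUs raises capacity under this formula sooner than adding memory does.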
Troubleshooting
Execution Node Not Connecting
ERROR: receptor connection to exec1.example.org:27199 failed
Check:
• Firewall allows TCP 27199 between Controller and execution node
• Receptor service is running: systemctl status receptor
• TLS certificates are valid and not expired
• DNS resolution works for the node hostname
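The DNS and firewall checks can be scripted from the Controller host. The following is a minimal Python sketch using only the standard library; `check_node` is a hypothetical helper, and it tests name resolution and a plain TCP connect only. It does not validate TLS certificates or confirm that the listener is actually Receptor.

```python
import socket

RECEPTOR_PORT = 27199  # mesh port used throughout this article


def check_node(hostname, port=RECEPTOR_PORT, timeout=3.0):
    """Return (ok, detail) after testing DNS resolution and a TCP connect to the node."""
    try:
        addresses = socket.getaddrinfo(hostname, port, type=socket.SOCK_STREAM)
    except socket.gaierror as exc:
        return False, f"DNS resolution failed: {exc}"

    last_error = None
    # Try every resolved address (IPv4 and IPv6) until one connects.
    for family, socktype, proto, _canonname, sockaddr in addresses:
        try:
            with socket.socket(family, socktype, proto) as sock:
                sock.settimeout(timeout)
                sock.connect(sockaddr)
            return True, f"TCP connect to {sockaddr[0]}:{port} succeeded"
        except OSError as exc:
            last_error = exc
    return False, f"TCP connect failed: {last_error}"


if __name__ == "__main__":
    ok, detail = check_node("exec1.example.org")
    print("OK" if ok else "FAIL", detail)
```

A DNS failure points at name resolution, while a timeout or refused connection points at the firewall or a stopped Receptor service, which narrows down which item in the checklist to look at first.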
Jobs Stuck in Pending
Check:
• Instance group has healthy execution nodes
• Execution nodes have available capacity
• EE image can be pulled from the registry on execution nodes
Hop Node Not Relaying
Check:
• Hop node has connectivity to both Controller and execution nodes on TCP 27199
• Receptor is running and configured with node_type='hop'
• Routing table in the receptorctl status output shows the expected paths
FAQ
Can execution nodes run without continuous Controller connectivity?
No. Execution nodes need a connection to the Controller (directly or through hop nodes) to receive job dispatches and return results. If the connection drops mid-job, Receptor will buffer and retry, but prolonged disconnection will cause job failures.
How many hop nodes do I need?
One hop node per network boundary is typical. For HA, deploy two hop nodes per boundary with both connected to execution nodes. Mesh automatically routes through available hop nodes.
Can I mix containerized and RPM execution nodes?
No. All mesh components in a single AAP deployment must use the same installation method. You cannot mix containerized Controller with RPM execution nodes.
What is the maximum mesh size?
Red Hat tests specific topologies (growth and enterprise). For very large meshes (50+ execution nodes), work with Red Hat to validate your topology. The practical limit depends on job frequency, Controller capacity, and network bandwidth.
Can execution nodes be in a different cloud than Controller?
Yes. This is the cloud hybrid pattern — Controller on-premises (or in one cloud) with execution nodes in other clouds. Use hop nodes in the DMZ to bridge network boundaries. Ensure latency between hop and execution nodes is acceptable (< 100ms recommended).
Conclusion
Automation Mesh is what transforms AAP from a single-site tool into an enterprise-scale distributed automation platform. By separating control from execution and adding hop-node routing, Mesh lets you automate anywhere — across data centers, DMZs, cloud regions, and air-gapped environments — while maintaining centralized visibility and control.
Related Articles
• AAP 2.6 Architecture and Components: Complete Guide
• AAP 2.6 Workflow Templates: Advanced Multi-Step Automation Guide
• AAP 2.6 RBAC and Gateway API
• AAP 2.6 Security Best Practices
• AAP 2.6 Execution Environments: Build, Manage, and Deploy Custom EEs