AAP 2.6 Job Scheduling and Capacity Planning Guide
By Luca Berton · Published 2024-01-01 · Category: troubleshooting
Plan and optimize AAP 2.6 capacity for enterprise workloads. Job scheduling strategies, instance group sizing, fork tuning, concurrent job limits, database sizing, and scaling patterns from 50 to 10,000+ managed hosts.
Capacity Planning for AAP 2.6
Proper capacity planning ensures jobs run on time without queue delays or resource exhaustion. AAP capacity depends on four factors: execution node count, memory per node, fork count per job, and concurrent job slots.
Understanding AAP Capacity
Capacity Formula
Each execution node's capacity is derived from two limits: a CPU-based limit (forks_per_cpu × CPU count) and a memory-based limit (usable memory ÷ fork_memory_MB). The lower of the two is the safe planning number.
Default values:
• fork_memory_MB = 100 MB per fork
• forks_per_cpu = 4
Example: a node with 16 GB RAM and 4 CPUs has a CPU-based limit of 4 × 4 = 16 forks, which is below its memory-based limit, so this node supports 16 concurrent forks across all jobs.
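The calculation can be sketched as a small helper. The 100 MB/fork and 4 forks/CPU defaults come from the text above; the 2 GB system reservation mirrors AWX's capacity algorithm and is an assumption here, not a value this guide states:

```python
import math

def node_capacity(mem_gb: float, cpus: int,
                  fork_memory_mb: int = 100, forks_per_cpu: int = 4,
                  system_reserve_mb: int = 2048) -> int:
    """Estimate an execution node's fork capacity as the lower of its
    CPU-based and memory-based limits. The 2 GB reservation for the
    OS and services is an assumption, not an AAP-documented constant."""
    cpu_capacity = cpus * forks_per_cpu
    mem_capacity = math.floor((mem_gb * 1024 - system_reserve_mb) / fork_memory_mb)
    return min(cpu_capacity, mem_capacity)

# The guide's example: 16 GB RAM, 4 CPUs -> CPU-bound at 16 forks.
print(node_capacity(16, 4))  # 16
```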
Sizing Reference
| Environment | Managed Hosts | Execution Nodes | Node Spec | Controller | Database |
|-------------|---------------|-----------------|-----------|------------|----------|
| Small | 50-500 | 1-2 | 4 CPU, 16 GB | 4 CPU, 16 GB | 4 CPU, 16 GB |
| Medium | 500-2,000 | 3-5 | 8 CPU, 32 GB | 8 CPU, 32 GB | 8 CPU, 32 GB |
| Large | 2,000-10,000 | 5-15 | 16 CPU, 64 GB | 16 CPU, 64 GB | 16 CPU, 64 GB, SSD |
| Enterprise | 10,000+ | 15-50+ | 16 CPU, 64 GB | 16 CPU, 64 GB (HA) | 32 CPU, 128 GB, NVMe |
Job Scheduling
Schedule Types
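AAP schedules attach an iCal recurrence rule (RFC 5545 RRULE) to a job template. A minimal sketch using the awx.awx.schedule module; the template names and dates are placeholders:

```yaml
---
- name: Define recurring job schedules
  hosts: localhost
  gather_facts: false
  tasks:
    # Nightly run at 02:00 UTC. "Patch Linux Servers" is a placeholder
    # job template name.
    - name: Nightly patch schedule
      awx.awx.schedule:
        name: nightly-patching
        unified_job_template: Patch Linux Servers
        rrule: "DTSTART:20240101T020000Z RRULE:FREQ=DAILY;INTERVAL=1"
        enabled: true
        state: present

    # Weekly run every Sunday at 04:00 UTC.
    - name: Weekly compliance scan
      awx.awx.schedule:
        name: weekly-compliance
        unified_job_template: CIS Compliance Scan
        rrule: "DTSTART:20240107T040000Z RRULE:FREQ=WEEKLY;BYDAY=SU"
        enabled: true
        state: present
```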
Maintenance Windows with Blackout Periods
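Blackout periods can be expressed as exclusion rules alongside the main recurrence rule. A sketch assuming complex-rrule (EXRULE) support in the schedule's rrule string, with a placeholder December change-freeze window:

```yaml
# Run nightly at 02:00 UTC, but skip Dec 24-31 (placeholder freeze
# window) via an EXRULE exclusion.
- name: Nightly schedule with a December change freeze
  awx.awx.schedule:
    name: nightly-with-freeze
    unified_job_template: Patch Linux Servers
    rrule: >-
      DTSTART:20240101T020000Z
      RRULE:FREQ=DAILY;INTERVAL=1
      EXRULE:FREQ=DAILY;BYMONTH=12;BYMONTHDAY=24,25,26,27,28,29,30,31
    enabled: true
    state: present
```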
Instance Groups and Job Routing
Define Instance Groups
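A sketch of defining instance groups with the awx.awx.instance_group module; the node hostnames are placeholders:

```yaml
# policy_instance_list pins specific execution nodes to the group,
# isolating it from general-purpose workloads.
- name: Instance group for network automation
  awx.awx.instance_group:
    name: network-devices
    policy_instance_list:
      - exec-node-3.example.com
      - exec-node-4.example.com
    state: present

# A percentage-based policy lets the group grow as nodes are added.
- name: General-purpose group sized by percentage
  awx.awx.instance_group:
    name: general
    policy_instance_percentage: 100
    state: present
```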
Route Jobs to Instance Groups
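Job templates are then pinned to a group. A sketch assuming the instance_groups parameter of awx.awx.job_template, with placeholder names; note the low fork count matching the network-device guidance below:

```yaml
- name: Pin a job template to an instance group
  awx.awx.job_template:
    name: Configure Network Switches
    job_type: run
    inventory: Network Inventory
    project: Network Automation
    playbook: configure_switches.yml
    instance_groups:
      - network-devices   # jobs from this template only run here
    forks: 10
    state: present
```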
Performance Tuning
Fork Optimization
| Workload | Recommended Forks | Why |
|----------|-------------------|-----|
| Configuration management | 50-100 | Standard SSH tasks |
| Network devices | 5-20 | Devices have limited concurrent sessions |
| Cloud provisioning | 10-30 | API rate limits |
| Windows (WinRM) | 20-50 | WinRM is heavier than SSH |
| Patching/updates | 10-25 | Downloads and reboots are sequential anyway |
ansible.cfg Performance Settings
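A starting-point ansible.cfg for typical SSH-based workloads; every value here is a tunable to adjust per environment, not a universal recommendation:

```ini
[defaults]
forks = 50                      # match workload type (see the fork table)
gathering = smart               # skip fact collection when facts are cached
fact_caching = jsonfile
fact_caching_connection = /tmp/ansible_facts
fact_caching_timeout = 86400    # reuse facts for 24 hours

[ssh_connection]
pipelining = True               # fewer SSH round-trips per task
ssh_args = -o ControlMaster=auto -o ControlPersist=60s
```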
Job Slicing
Job slicing splits a large inventory into independent slices that run across multiple execution nodes.
With 1,000 hosts and job_slice_count: 5, each slice manages ~200 hosts on different execution nodes simultaneously.
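A sketch of enabling slicing via the job_slice_count parameter of awx.awx.job_template (names are placeholders); slicing only helps when hosts are independent of each other:

```yaml
# Each of the 5 slices is scheduled as its own job, so the scheduler
# can spread them across execution nodes.
- name: Sliced patching template
  awx.awx.job_template:
    name: Patch All Linux
    inventory: Production Linux
    project: Patching
    playbook: patch.yml
    job_slice_count: 5
    forks: 50
    state: present
```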
Concurrent Job Limits
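Concurrency can be capped per instance group so a burst of launched jobs cannot exhaust a node pool. A sketch assuming the max_concurrent_jobs and max_forks fields on awx.awx.instance_group:

```yaml
- name: Cap concurrency on the patching group
  awx.awx.instance_group:
    name: patching
    max_concurrent_jobs: 4    # at most 4 jobs running in this group
    max_forks: 100            # total forks across those jobs
    state: present
```

Jobs launched beyond these caps queue as pending rather than overloading the nodes.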
Database Sizing
PostgreSQL Requirements
| Metric | Small | Medium | Large | Enterprise |
|--------|-------|--------|-------|------------|
| Job history (90 days) | 5 GB | 20 GB | 100 GB | 500 GB+ |
| IOPS | 1,000 | 3,000 | 5,000 | 10,000+ |
| Connections | 200 | 400 | 800 | 1,500+ |
| Backup frequency | Daily | Daily | 6-hourly | Continuous (WAL) |
Database Maintenance
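Routine maintenance keeps job history from bloating the database. A sketch using awx-manage cleanup commands and PostgreSQL's vacuumdb; the 90-day retention and the database name are assumptions to adjust for your environment:

```shell
# Prune old job and activity-stream history (run on a controller node);
# 90 days matches the sizing table's retention window.
awx-manage cleanup_jobs --days 90
awx-manage cleanup_activitystream --days 90

# Reclaim space and refresh planner statistics on the AAP database
# ("awx" is a placeholder database name; adjust if yours differs).
vacuumdb --analyze --dbname awx
```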
Monitoring Capacity
Key Metrics to Track
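The controller exposes Prometheus metrics at /api/v2/metrics/. A sketch of spot-checking capacity series (the hostname and token are placeholders; metric names such as awx_instance_capacity are as exposed by AWX/controller). Sustained pending jobs while capacity is fully consumed is the signal to add execution nodes:

```shell
curl -s -H "Authorization: Bearer $AAP_TOKEN" \
  https://controller.example.com/api/v2/metrics/ \
  | grep -E 'awx_instance_(capacity|consumed_capacity)|awx_pending_jobs_total'
```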
FAQ
How many execution nodes do I need?
Divide your peak concurrent fork demand by per-node capacity. Note that a running job consumes capacity roughly equal to its fork setting: patching 500 hosts with forks=50 consumes about 50 fork slots while it runs (the hosts are processed in 10 batches of 50). Add headroom for overlapping jobs: start with peak_concurrent_forks / per_node_capacity × 1.3 as a baseline.
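The rule of thumb above can be sketched as:

```python
import math

def execution_nodes_needed(peak_concurrent_forks: int,
                           per_node_capacity: int,
                           headroom: float = 1.3) -> int:
    """Baseline node count: peak forks / per-node capacity, rounded up
    after adding ~30% headroom for overlapping jobs and failover."""
    return math.ceil(peak_concurrent_forks / per_node_capacity * headroom)

# e.g. 100 peak concurrent forks on 16-fork nodes:
print(execution_nodes_needed(100, 16))  # 9
```

The example inputs (100 peak forks, 16-fork nodes) are illustrative, not from the guide.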
Should I use job slicing or serial execution?
Use job slicing when hosts are independent (patching, configuration management). Use serial execution (rolling-update patterns with serial:) when hosts depend on each other (load-balancer drain → update → restore).
What's the maximum number of managed hosts?
No hard limit. Largest known deployments manage 50,000+ nodes. The constraint is execution capacity and database I/O, not AAP software limits.
How do I prevent schedule collisions?
Stagger schedules by 15-30 minutes. Use instance groups to isolate workloads. Set max_concurrent_jobs on instance groups to prevent resource exhaustion. Use workflow templates to chain dependent jobs.
When should I scale horizontally vs vertically?
Scale execution nodes horizontally (add more nodes). Scale the database vertically (bigger instance). Controller nodes can be added for API high availability, but a single controller handles most workloads.
Conclusion
Capacity planning for AAP 2.6 requires understanding the relationship between forks, memory, concurrent jobs, and execution nodes. Start with the sizing reference for your host count, tune fork settings per workload type, and use instance groups to isolate different automation domains. Monitor queue times and capacity metrics to scale proactively.
Related Articles
• AAP 2.6 Architecture and Components: Complete Guide
• AAP 2.6 Automation Mesh: Distributed Execution Across Sites and Networks
• AAP 2.6 Monitoring and Logging: Prometheus, Grafana, and Log Aggregation
• AAP 2.6 Job Templates and Inventories: Complete Configuration Guide
• AAP 2.6 Troubleshooting Guide: Common Issues and Solutions