Ansible Performance Tuning: Speed Up Playbooks 10x with These Optimizations
By Luca Berton · Published 2024-01-01 · Category: installation
Optimize Ansible playbook performance. Configure pipelining, SSH multiplexing, fact caching, async tasks, mitogen, forks, and callback plugins. Reduce execution time from hours to minutes at scale.
Why Ansible Feels Slow
Without connection reuse, Ansible opens a new SSH connection for every task on every host. For a playbook with 20 tasks across 100 hosts, that's 2,000 SSH connections. Each connection involves a TCP handshake, key exchange, and authentication, followed by module transfer. At scale, this overhead dominates execution time.
The good news: most of this is fixable with configuration.
Quick Wins (ansible.cfg)
Impact of each setting:
| Setting | Default | Optimized | Speedup |
|---------|---------|-----------|---------|
| forks | 5 | 50 | ~10x on 50+ hosts |
| pipelining | False | True | ~2x per task |
| ControlPersist | 60s in Ansible's default ssh_args (lost if ssh_args is overridden) | 60s or longer | ~3x for multi-task plays |
| gathering | implicit | smart | Skips facts that are already gathered or cached |
| fact_caching | off | jsonfile | Skips fact gathering entirely while the cache is valid |
Pipelining
Pipelining is one of the biggest per-task improvements. Without pipelining, Ansible copies each module to the remote host via SFTP, executes it, and cleans up. With pipelining, the module is piped directly to the remote Python interpreter over the existing SSH connection, so there is no file transfer or cleanup step.
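In ansible.cfg the switch is `pipelining = True` under `[ssh_connection]`. The same setting can also be scoped to a group of hosts through the `ansible_pipelining` connection variable. A minimal sketch, assuming a `group_vars/all.yml` file next to the inventory (the path is an assumption, not something this guide prescribes):

```yaml
# group_vars/all.yml (hypothetical path)
# Per-group equivalent of "pipelining = True" in ansible.cfg.
ansible_pipelining: true
```

Scoping the variable to a group makes it possible to leave pipelining off for hosts that still enforce requiretty.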
Requirements: requiretty must NOT be set in /etc/sudoers on managed hosts. Most modern distributions don't set it. Check with:
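One way to run that check across the inventory is a small play, run before pipelining is enabled. This is a sketch only: the host pattern `all` and the assumption that `/etc/sudoers.d` exists on every host are mine.

```yaml
- name: Check managed hosts for requiretty
  hosts: all
  become: true
  gather_facts: false
  tasks:
    - name: Search sudoers files for an active (non-negated) requiretty default
      ansible.builtin.command:
        argv:
          - grep
          - -rhE
          - '^[[:space:]]*Defaults.*[^!]requiretty'
          - /etc/sudoers
          - /etc/sudoers.d
      register: requiretty_check
      changed_when: false
      # grep exits 1 when nothing matches; only higher codes are real errors
      failed_when: requiretty_check.rc > 1

    - name: Report hosts that still require a TTY
      ansible.builtin.debug:
        msg: "requiretty is set: {{ requiretty_check.stdout_lines }}"
      when: requiretty_check.rc == 0
```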
If it's set, remove it or add an exception:
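The exception can be delivered as a sudoers drop-in file. The task below is a sketch under two assumptions: the automation account is named `ansible`, and the drop-in file name is free to choose.

```yaml
- name: Exempt the automation user from requiretty
  ansible.builtin.copy:
    dest: /etc/sudoers.d/ansible-pipelining    # hypothetical file name
    content: "Defaults:ansible !requiretty\n"  # "ansible" is an assumed login user
    owner: root
    group: root
    mode: "0440"
    validate: /usr/sbin/visudo -cf %s          # refuse to install a broken sudoers file
  become: true
```

The `validate` step matters here, because a syntax error in any sudoers file can lock out sudo on the host.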
SSH Multiplexing
Reuse SSH connections instead of creating new ones for each task:
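In ansible.cfg these options belong in `ssh_args` under `[ssh_connection]`. The same value can be expressed as the `ansible_ssh_args` inventory variable. A minimal sketch, assuming a `group_vars/all.yml` file (hypothetical path):

```yaml
# group_vars/all.yml (hypothetical path)
# ControlMaster=auto opens or reuses a master connection;
# ControlPersist=60s keeps it alive between tasks.
ansible_ssh_args: "-o ControlMaster=auto -o ControlPersist=60s"
```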
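This creates a master SSH connection that subsequent connections reuse. ControlPersist=60s keeps the master connection open for 60 seconds after the last session on it closes. Ansible's default ssh_args already include ControlMaster=auto and ControlPersist=60s, so this matters mainly when ssh_args has been overridden or a longer timeout is wanted.
For long hostnames or deep home directories, shorten the control socket path (the `control_path` setting under `[ssh_connection]`) so that it stays within the operating system's limit on Unix socket path length.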
Forks — Parallel Execution
Set forks based on your control node's capacity:
• Laptop: 20-30
• Dedicated control node: 50-100
• AAP controller: 200+
Monitor CPU and memory on the control node. Each fork uses ~50-100 MB RAM.
Fact Caching
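Gathering facts (hostname, OS, IP, disks, etc.) takes 2-5 seconds per host. Cache them by setting `fact_caching = jsonfile` in ansible.cfg, together with `fact_caching_connection` (the cache directory) and `fact_caching_timeout` (expiry in seconds).
Or use Redis for shared caching across control nodes: switch `fact_caching` to a Redis cache plugin such as `community.general.redis` and point `fact_caching_connection` at the Redis server.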
Disable Facts When Not Needed
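Plays that never reference `ansible_facts` can skip the setup step entirely. A minimal sketch; the `webservers` group and the nginx service are placeholders:

```yaml
- name: Restart a service without collecting facts
  hosts: webservers          # hypothetical inventory group
  gather_facts: false        # skip the setup module for this play
  become: true
  tasks:
    - name: Restart nginx
      ansible.builtin.service:
        name: nginx
        state: restarted
```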
Async Tasks
Run long tasks asynchronously and poll or check later:
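A sketch of the fire-and-check-later pattern, assuming a hypothetical script at `/usr/local/bin/long_job.sh`. `async` sets the maximum runtime, `poll: 0` tells Ansible not to wait, and `ansible.builtin.async_status` collects the result later:

```yaml
- name: Start a long-running job in the background
  ansible.builtin.command: /usr/local/bin/long_job.sh   # hypothetical script
  async: 3600   # allow up to one hour
  poll: 0       # do not wait; continue with the next task
  register: long_job

- name: Wait for the job to finish
  ansible.builtin.async_status:
    jid: "{{ long_job.ansible_job_id }}"
  register: job_result
  until: job_result.finished
  retries: 120  # 120 checks x 30 s = up to one hour
  delay: 30
```

Any tasks placed between the two run while the job is still executing on the host.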
Parallel Package Installation
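One way to read "parallel" here is overlapping a slow package installation with other work on the same host. The sketch below follows that reading and assumes a Debian-family host and placeholder package names. Running several package-manager processes at once on one host usually just makes them wait on the package database lock, so the sketch starts a single background transaction instead.

```yaml
- name: Start package installation in the background
  ansible.builtin.apt:          # Debian-family assumed; swap for dnf elsewhere
    name:
      - nginx
      - postgresql
    state: present
  async: 600
  poll: 0
  register: pkg_job
  become: true

- name: Do unrelated work while packages install
  ansible.builtin.file:
    path: /srv/app              # hypothetical directory
    state: directory
    mode: "0755"
  become: true

- name: Wait for the package installation to finish
  ansible.builtin.async_status:
    jid: "{{ pkg_job.ansible_job_id }}"
  register: pkg_result
  until: pkg_result.finished
  retries: 60
  delay: 10
  become: true                  # must match the user that started the job
```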
Mitogen
Mitogen replaces Ansible's SSH module transfer mechanism with a persistent Python interpreter on remote hosts. It can provide 1.25x to 7x speedup depending on playbook complexity. It is enabled in ansible.cfg by pointing `strategy_plugins` at Mitogen's `ansible_mitogen/plugins/strategy` directory and setting `strategy = mitogen_linear`.
Caveats:
• Not always compatible with the latest Ansible versions
• Doesn't work with all connection types
• Some modules may behave differently
• Test thoroughly before production use
Free Strategy — True Parallel Execution
The default strategy (linear) waits for all hosts to complete a task before moving to the next. The free strategy lets each host proceed through the play independently:
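The strategy is chosen per play with the `strategy` keyword. A minimal sketch using a read-only check as the workload; the specific check is only an illustration:

```yaml
- name: Run compliance checks without waiting on slow hosts
  hosts: all
  strategy: free               # each host advances as soon as its own task finishes
  tasks:
    - name: Check that sshd_config forbids root login
      ansible.builtin.command: grep -q '^PermitRootLogin no' /etc/ssh/sshd_config
      changed_when: false      # read-only check, never reports a change
```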
When to use free:
• Tasks are independent (no cross-host dependencies)
• Hosts have different speeds (fast hosts don't wait for slow ones)
• Large-scale deployments where any parallelism helps
Don't use free when:
• Task order matters across hosts (rolling updates)
• You need serial execution (serial: 1)
Reduce Task Count
Use Package Lists Instead of Loops
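A loop runs the module once per item, while a list hands the whole set to the package manager in one call. A sketch of both forms with placeholder package names:

```yaml
# Slower: one module run per package
- name: Install packages one at a time
  ansible.builtin.package:
    name: "{{ item }}"
    state: present
  loop:
    - git
    - curl
    - htop

# Faster: one module run and one package-manager transaction
- name: Install packages in a single task
  ansible.builtin.package:
    name:
      - git
      - curl
      - htop
    state: present
```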
Use ansible.builtin.template with loop for Multiple Files
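Several near-identical template tasks can be folded into one looped task. A sketch with hypothetical template and destination names; note that the module still runs once per item, so the gain is mostly in playbook size and per-task overhead:

```yaml
- name: Render several config files from one task
  ansible.builtin.template:
    src: "{{ item.src }}"
    dest: "{{ item.dest }}"
    mode: "0644"
  loop:
    - { src: app.conf.j2, dest: /etc/app/app.conf }          # hypothetical files
    - { src: logging.conf.j2, dest: /etc/app/logging.conf }
```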
Use ansible.builtin.copy with directory Recursion
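Instead of one copy task per file, a single task can copy a whole directory tree. A sketch assuming a `files/app-config/` directory in the role or playbook (hypothetical name):

```yaml
- name: Copy a whole configuration directory in one task
  ansible.builtin.copy:
    src: files/app-config/     # trailing slash copies the directory's contents
    dest: /etc/app/
    owner: root
    group: root
    mode: "0644"               # applied to files
    directory_mode: "0755"     # applied to directories created during the copy
```

For trees with many files, recursive copy itself becomes slow, and a synchronize-based approach is usually worth testing.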
Profile Your Playbooks
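Enable Timing Callbacks
Add `profile_tasks` and `timer` to `callbacks_enabled` under `[defaults]` in ansible.cfg (`callback_whitelist` on ansible-core older than 2.11).
Output: `profile_tasks` prints the duration of each task plus a summary sorted by slowest task, and `timer` prints the total playbook run time.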
Custom Timing
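When only one step needs measuring, timestamps can be taken around it by hand. The sketch below is one possible approach, not a built-in feature: it uses the `now()` template function on the control node, and the script path and fact name are hypothetical.

```yaml
- name: Record the start time
  ansible.builtin.set_fact:
    step_started: "{{ now().timestamp() }}"   # evaluated once on the control node

- name: Run the step being measured
  ansible.builtin.command: /usr/local/bin/deploy.sh   # hypothetical script

- name: Report elapsed seconds
  ansible.builtin.debug:
    msg: "deploy step took {{ (now().timestamp() - (step_started | float)) | round(1) }} s"
```

The measurement includes connection and module-transfer overhead, which is usually what matters when tuning.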
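Optimized ansible.cfg Template
A combined ansible.cfg applies the settings from the Quick Wins table together: forks = 50, pipelining = True, ControlMaster=auto and ControlPersist=60s in ssh_args, gathering = smart, and fact_caching = jsonfile, with the timing callbacks enabled for profiling.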
FAQ
How much faster can Ansible get with tuning?
Typical improvement is 3-10x. A playbook taking 30 minutes can often be reduced to 5-10 minutes with pipelining, SSH multiplexing, increased forks, and fact caching. The biggest gains come from pipelining and forks.
Is Mitogen safe for production?
Mitogen is used in production by many organizations, but it has compatibility limitations with newer Ansible versions. Test thoroughly in staging first. The maintained fork at github.com/mitogen-hq/mitogen is the most reliable.
Should I always use strategy: free?
No. Use free only when tasks are independent across hosts. For most production deployments, linear (default) with serial for rolling updates is safer. free is best for read-only operations like compliance checks or fact gathering.
Why is gather_facts so slow?
Ansible runs the setup module which collects hundreds of facts (hardware, network, OS, mounts, etc.). Use gather_subset to collect only what you need:
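A sketch of a play that gathers only network facts. The `!all` and `!min` entries drop the default and minimal fact sets; the subset chosen here is just an example:

```yaml
- name: Gather only the facts the play needs
  hosts: all
  gather_facts: true
  gather_subset:
    - "!all"      # drop the full default set
    - "!min"      # drop the minimal set as well
    - network     # keep only network facts
  tasks:
    - name: Show the default IPv4 address
      ansible.builtin.debug:
        var: ansible_facts.default_ipv4.address
```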
How many forks should I use?
Rule of thumb: start with the number of CPU cores × 5. A 4-core machine handles 20 forks well. Monitor memory — each fork uses ~50-100 MB. For 100+ forks, use a dedicated control node with 8+ GB RAM.
Conclusion
Ansible performance tuning is about reducing connection overhead and maximizing parallelism. Enable pipelining, SSH multiplexing, and fact caching in ansible.cfg. Increase forks to match your infrastructure scale. Use async for long-running tasks, the free strategy for independent operations, and callback plugins to profile and find bottlenecks. Together these optimizations typically deliver a 3-10x speedup with little effort.