Ansible Troubleshooting: Common Errors, Solutions, and Debug Techniques
By Luca Berton · Published 2024-01-01 · Category: installation
Fix common Ansible errors fast. Debug playbook failures, SSH connection issues, module errors, variable problems, and permission errors.
Debug Modes
Verbosity Levels
# Level 1: Task results
ansible-playbook site.yml -v
# Level 2: Task input + results
ansible-playbook site.yml -vv
# Level 3: Connection debugging
ansible-playbook site.yml -vvv
# Level 4: Full SSH/WinRM debugging
ansible-playbook site.yml -vvvv
Debug Module
- name: Print variable value
  ansible.builtin.debug:
    var: my_variable
- name: Print formatted message
  ansible.builtin.debug:
    msg: "Host {{ inventory_hostname }} has IP {{ ansible_host }} and OS {{ ansible_distribution }}"
- name: Print all variables for a host
  ansible.builtin.debug:
    var: hostvars[inventory_hostname]
Debugger
# Enable debugger on failure
- name: Problematic task
  ansible.builtin.command:
    cmd: /opt/app/start
  debugger: on_failed
# Enable for entire play
- hosts: all
  debugger: on_failed
  tasks:
    # ...
Debugger commands:
• p task: print task info
• p task.args: print task arguments
• p host: print current host
• p result._result: print the task result
• p task_vars: print all variables
• r: redo (rerun) the task
• c: continue to the next task
• q: quit the debugger
Check Mode (Dry Run)
# Preview changes without applying
ansible-playbook site.yml --check --diff
Step Mode
# Confirm each task before running
ansible-playbook site.yml --step
See also: Ansible Troubleshooting Installation Issues on macOS and Python
Connection Errors
"UNREACHABLE! => No route to host"
fatal: [web-01]: UNREACHABLE! => {"msg": "Failed to connect to the host via ssh: ssh: connect to host 10.0.1.10 port 22: No route to host"}
Fixes:
# 1. Verify connectivity
ping 10.0.1.10
# 2. Check SSH directly
ssh -o ConnectTimeout=5 user@10.0.1.10
# 3. Check firewall on target
sudo iptables -L -n | grep 'dpt:22'
sudo firewall-cmd --list-ports
# 4. Verify inventory hostname/IP
ansible-inventory --host web-01
# 5. Show the address, user, and SSH command Ansible actually uses
ansible web-01 -m ping -vvv
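To tell these failure modes apart without Ansible in the loop, the Python sketch below opens a plain TCP connection to the SSH port and classifies the outcome. The function name `probe_ssh_port` is hypothetical, and the mapping from socket exceptions to likely causes is an assumption based on common behaviour: a timeout usually points at a firewall dropping packets, a refusal at a reachable host with no SSH listener.

```python
import errno
import socket


def probe_ssh_port(host: str, port: int = 22, timeout: float = 5.0) -> str:
    """Classify a plain TCP connection attempt to host:port."""
    try:
        # create_connection resolves the name and tries each returned address
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except socket.timeout:
        return "timeout"  # no answer at all: often a firewall dropping packets
    except ConnectionRefusedError:
        return "refused"  # host answered, but nothing listens on the port
    except socket.gaierror:
        return "dns-failure"  # the name could not be resolved
    except OSError as exc:
        if exc.errno == errno.EHOSTUNREACH:
            return "no-route"  # same condition as "No route to host"
        return f"error: {exc}"
```

For example, `probe_ssh_port("10.0.1.10")` returning `"no-route"` matches the error above, while `"refused"` suggests sshd is not running on the target.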
"Permission denied (publickey,password)"
fatal: [web-01]: UNREACHABLE! => {"msg": "Failed to connect to the host via ssh: Permission denied (publickey,password)."}
Fixes:
# 1. Verify SSH key
ssh -i ~/.ssh/id_ed25519 user@10.0.1.10
# 2. Check key permissions
chmod 600 ~/.ssh/id_ed25519
chmod 700 ~/.ssh
# 3. Specify key in inventory
# ansible_ssh_private_key_file: ~/.ssh/id_ed25519
# 4. Use ssh-agent
eval $(ssh-agent)
ssh-add ~/.ssh/id_ed25519
# 5. Fall back to password auth (temporary; usually requires sshpass on the control node)
ansible-playbook site.yml --ask-pass
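OpenSSH refuses to use a private key that is readable by group or others, which is one common cause of this error. A minimal Python sketch of that permission check, assuming private keys are files named `id_*` without a `.pub` suffix (`check_ssh_perms` is a hypothetical helper):

```python
import stat
from pathlib import Path


def check_ssh_perms(ssh_dir: str) -> list:
    """Return warnings for an SSH directory and private keys that are too permissive."""
    problems = []
    base = Path(ssh_dir)
    dir_mode = stat.S_IMODE(base.stat().st_mode)
    if dir_mode & 0o077:  # any group/other permission bits set
        problems.append(f"{base}: mode {dir_mode:o}, expected 700")
    for path in sorted(base.iterdir()):
        # Assumption: private keys are named id_* and public keys end in .pub
        if path.is_file() and path.name.startswith("id_") and path.suffix != ".pub":
            mode = stat.S_IMODE(path.stat().st_mode)
            if mode & 0o077:  # OpenSSH rejects keys accessible by group/others
                problems.append(f"{path}: mode {mode:o}, expected 600")
    return problems
```

Calling `check_ssh_perms(str(Path.home() / ".ssh"))` returns an empty list when the `chmod` commands above have been applied.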
"Host key verification failed"
# Quick fix (controlled environments)
export ANSIBLE_HOST_KEY_CHECKING=False
# Or in ansible.cfg
[defaults]
host_key_checking = False
# Proper fix: record the host keys (verify the fingerprints out of band first)
ssh-keyscan -H 10.0.1.10 >> ~/.ssh/known_hosts
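Keys added with `ssh-keyscan -H` are stored hashed, so `grep` cannot find them; `ssh-keygen -F 10.0.1.10` performs the lookup natively. The Python sketch below shows what that lookup does, assuming OpenSSH's hashed format `|1|base64(salt)|base64(HMAC-SHA1(salt, hostname))`. Wildcard patterns and `@revoked` markers are not handled, and `host_in_known_hosts` is a hypothetical name.

```python
import base64
import hashlib
import hmac


def host_in_known_hosts(host: str, lines: list, port: int = 22) -> bool:
    """True if host has an entry in known_hosts lines, hashed or plain."""
    # OpenSSH stores non-default ports as "[host]:port"
    name = host if port == 22 else f"[{host}]:{port}"
    for raw in lines:
        fields = raw.split()
        if not fields or fields[0].startswith("#"):
            continue
        if fields[0].startswith("@"):  # drop a leading marker such as @cert-authority
            fields = fields[1:]
            if not fields:
                continue
        hosts_field = fields[0]
        if hosts_field.startswith("|1|"):
            # Hashed entry: HMAC-SHA1 keyed with the salt over the host name
            _, _, salt_b64, digest_b64 = hosts_field.split("|")
            salt = base64.b64decode(salt_b64)
            digest = hmac.new(salt, name.encode(), hashlib.sha1).digest()
            if hmac.compare_digest(digest, base64.b64decode(digest_b64)):
                return True
        elif name in hosts_field.split(","):
            return True
    return False
```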
SSH Timeout
# ansible.cfg — increase timeouts
[defaults]
timeout = 30
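[ssh_connection]
# Setting ssh_args replaces the defaults, so keep the ControlMaster/ControlPersist options
ssh_args = -C -o ControlMaster=auto -o ControlPersist=60s -o ConnectTimeout=30 -o ServerAliveInterval=15 -o ServerAliveCountMax=3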
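The keepalive options decide how quickly a dead connection is noticed: OpenSSH sends a keepalive every ServerAliveInterval seconds and disconnects after ServerAliveCountMax of them go unanswered (the ssh_config(5) manual uses this same 15 and 3 example). With the values above:

$$
t_{\text{disconnect}} \approx \texttt{ServerAliveInterval} \times \texttt{ServerAliveCountMax} = 15\,\mathrm{s} \times 3 = 45\,\mathrm{s}
$$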
Module Errors
"MODULE FAILURE: No module named 'xyz'"
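# A missing collection usually fails with "couldn't resolve module/action"
# Verify collection is installed
ansible-galaxy collection list | grep community.general
# Install missing collection
ansible-galaxy collection install community.general
# Check module exists
ansible-doc community.general.xyz
# "No module named 'xyz'" in module output usually means a Python library is
# missing on the managed node; install it for the Python interpreter used there
ansible web-01 -m ansible.builtin.pip -a "name=xyz" -b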
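Module names are fully qualified as `namespace.collection.module`, so the collection to install can be read off the name. A small Python sketch (`collection_for` and `install_hint` are hypothetical helpers); modules under `ansible.builtin` ship with ansible-core rather than a separate collection:

```python
def collection_for(fqcn: str) -> str:
    """'community.general.ufw' -> 'community.general'."""
    parts = fqcn.split(".")
    if len(parts) < 3:
        raise ValueError(f"{fqcn!r} is not namespace.collection.module")
    return ".".join(parts[:2])


def install_hint(fqcn: str) -> str:
    """Suggest how to obtain the collection that provides a module."""
    collection = collection_for(fqcn)
    if collection == "ansible.builtin":
        return "part of ansible-core; check ansible --version"
    return f"ansible-galaxy collection install {collection}"
```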
"Unsupported parameters"
fatal: [web-01]: FAILED! => {"msg": "Unsupported parameters for (ansible.builtin.apt) module: autoremove."}
Fix: check the module documentation for the correct parameter name. The parameter may also be newer than your installed Ansible version:
# Show module documentation
ansible-doc ansible.builtin.apt
# Check Ansible version
ansible --version
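Modules validate their parameters against an argument spec before doing any work, which is where both this error and the next one come from. A simplified Python model of that validation, ignoring aliases, types, and mutually exclusive options (`check_task_args` and the spec in the usage note are illustrative, not Ansible's internal API):

```python
def check_task_args(task_args: dict, argument_spec: dict) -> list:
    """Report unsupported and missing required parameters for one task."""
    errors = []
    # Any key the spec does not declare is rejected
    unsupported = sorted(set(task_args) - set(argument_spec))
    if unsupported:
        errors.append("Unsupported parameters: " + ", ".join(unsupported))
    # Any declared-required key the task omits is rejected
    missing = sorted(
        name for name, spec in argument_spec.items()
        if spec.get("required", False) and name not in task_args
    )
    if missing:
        errors.append("missing required arguments: " + ", ".join(missing))
    return errors
```

With a spec such as `{"name": {"required": True}, "state": {"default": "present"}}`, a misspelled `autoremov` key is reported as unsupported, and omitting `name` reproduces the "missing required arguments" message.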
"Missing required arguments"
fatal: [web-01]: FAILED! => {"msg": "missing required arguments: name"}
Fix: read the module docs and supply every required parameter. In ansible-doc's text output, mandatory options are marked with "=":
ansible-doc ansible.builtin.user
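For scripting, `ansible-doc -j` emits a module's documentation as JSON. The sketch below lists mandatory options from that output; it assumes the JSON is keyed by module name with options under `doc.options`, each carrying an optional `required` flag. `required_options` is a hypothetical helper.

```python
import json


def required_options(doc_json: str, module: str) -> list:
    """Mandatory option names from `ansible-doc -j <module>` output.

    Assumed shape: {module: {"doc": {"options": {name: {"required": bool, ...}}}}}
    """
    data = json.loads(doc_json)
    options = data[module]["doc"].get("options") or {}
    return sorted(name for name, spec in options.items() if spec.get("required", False))
```

For example, save `ansible-doc -j ansible.builtin.user > user.json` and pass the file contents with `module="ansible.builtin.user"`.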
"No such file or directory"
# ❌ Template not found
- name: Deploy config
  ansible.builtin.template:
    src: config.j2  # Looks in templates/ directory of role
    dest: /etc/app/config.yml
# ✅ Specify full path or check role structure
# Role: roles/myapp/templates/config.j2
# Playbook: templates/config.j2 (relative to playbook)
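Relative `src` paths are resolved against a search path. The Python sketch below models only a simplified subset of it (the role's `templates/`, then `templates/` next to the playbook, then the playbook directory); Ansible's real list is longer and is typically printed in the error message under "Searched in". `find_template` is a hypothetical helper.

```python
from pathlib import Path
from typing import Optional


def find_template(name: str, role_dir: Optional[str], playbook_dir: str) -> Optional[Path]:
    """Return the first existing candidate for a relative template src (simplified search)."""
    candidates = []
    if role_dir:
        candidates.append(Path(role_dir) / "templates" / name)  # roles/myapp/templates/
    candidates.append(Path(playbook_dir) / "templates" / name)  # templates/ next to the playbook
    candidates.append(Path(playbook_dir) / name)                # the playbook directory itself
    for path in candidates:
        if path.is_file():
            return path
    return None
```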
See also: AAP 2.6 Troubleshooting Guide: Common Issues and Solutions
Variable Errors
"undefined variable"
fatal: [web-01]: FAILED! => {"msg": "The task includes an option with an undefined variable. The error was: 'app_version' is undefined"}
Fixes:
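# 1. Provide a default where the variable is used
msg: "Deploying version {{ app_version | default('latest') }}"
# (or set it in roles/<role>/defaults/main.yml; defining a variable in terms of itself causes a recursive loop)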
# 2. Check variable precedence
# Simplified, highest first: extra vars > task vars > block vars > role vars > play vars > host vars > group vars > role defaults
# 3. Verify variable is defined
- name: Check required variables
  ansible.builtin.assert:
    that:
      - app_version is defined
      - app_version | length > 0
    fail_msg: "app_version must be defined"
# 4. Print the value, or a placeholder if it is undefined
- name: Show app_version
  ansible.builtin.debug:
    msg: "{{ lookup('vars', 'app_version', default='NOT DEFINED') }}"
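Precedence is easier to reason about as layered dictionaries in which the highest layer that defines a name wins. A Python sketch using a simplified subset of Ansible's documented order, lowest first; `resolve` and the layer names are illustrative:

```python
# Simplified subset of Ansible's variable precedence, lowest first
PRECEDENCE = [
    "role defaults", "group vars", "host vars", "play vars",
    "role vars", "block vars", "task vars", "extra vars",
]


def resolve(name: str, layers: dict) -> tuple:
    """Return (value, source) from the highest-precedence layer that defines name."""
    for source in reversed(PRECEDENCE):
        if name in layers.get(source, {}):
            return layers[source][name], source
    raise KeyError(f"{name!r} is undefined")
```

If `app_port` is set both in role defaults and in group vars, `resolve` reports the group vars value, which is the kind of silent override the assert task above helps catch.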
"dict object has no attribute"
# ❌ Accessing nested dict that might not exist
msg: "{{ result.json.data.name }}"
# ✅ Safe access with default
msg: "{{ result.json.data.name | default('unknown') }}"
# ✅ Or check first
when: result.json.data is defined and result.json.data.name is defined
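The behaviour of `| default` on a nested path can be sketched in Python to make the semantics explicit: walk the path and fall back as soon as a key is missing. `dig` is a hypothetical helper; like Jinja's `default` without its second argument, it replaces missing keys but keeps values that are present but null.

```python
from typing import Any


def dig(data: Any, dotted_path: str, default: Any = None) -> Any:
    """Walk nested dicts/lists by a dotted path; return default at the first missing step."""
    current = data
    for key in dotted_path.split("."):
        if isinstance(current, dict) and key in current:
            current = current[key]
        elif isinstance(current, list) and key.isdigit() and int(key) < len(current):
            current = current[int(key)]  # numeric segments index into lists
        else:
            return default
    return current
```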
Variable Precedence Issues
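# Show inventory variables (host_vars, group_vars) for a host
ansible-inventory --host web-01 --yaml
# Show the value one variable resolves to from inventory (ad hoc runs do not load play or role vars)
ansible web-01 -m ansible.builtin.debug -a "var=app_version"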
Permission Errors
"sudo: a password is required"
# Option 1: Ask for become password
# ansible-playbook site.yml --ask-become-pass
# Option 2: Set in inventory
# ansible_become_password: "{{ vault_sudo_password }}"
# Option 3: Passwordless sudo (recommended)
# On target host: echo "ansible ALL=(ALL) NOPASSWD:ALL" | sudo tee /etc/sudoers.d/ansible && sudo visudo -cf /etc/sudoers.d/ansible
"Failed to set permissions on the temporary files"
fatal: [web-01]: FAILED! => {"msg": "Failed to set permissions on the temporary files Ansible needs to create when becoming an unprivileged user"}
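Fix: install the acl package on the target (Ansible tries setfacl first in this case), enable pipelining (requires that sudo does not enforce requiretty), or, as a less secure fallback, allow world-readable temporary files:
[ssh_connection]
# Best fix: enable pipelining
pipelining = True
[defaults]
# Alternative (less secure)
# allow_world_readable_tmpfiles = True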
See also: Ansible Undefined Variable Error: 12 Real Examples and Fixes
Playbook Errors
"Syntax Error"
# Check syntax before running
ansible-playbook site.yml --syntax-check
# Lint for best practices
ansible-lint site.yml
"ERROR! Unexpected Exception during execution"
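Usually a Python or dependency problem on the control node. Rerun with -vvv to see the full traceback, then:
# Check Python version (ansible --version also shows the interpreter Ansible runs under)
python3 --version
# Reinstall Ansible with the same interpreter
python3 -m pip install --upgrade ansible
# Check for conflicting packages
python3 -m pip check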
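A quick Python check of what is installed for the interpreter you run it with, using `importlib.metadata`. The minimum Python version below is a placeholder assumption; take the real value from the ansible-core support matrix for your release. Function names are hypothetical.

```python
import sys
from importlib import metadata
from typing import Optional, Tuple


def installed_version(dist: str) -> Optional[str]:
    """Installed version of a distribution such as 'ansible-core', or None."""
    try:
        return metadata.version(dist)
    except metadata.PackageNotFoundError:
        return None


def control_node_report(min_python: Tuple[int, int] = (3, 10)) -> list:
    """Summarize the interpreter and Ansible packages; min_python is a placeholder."""
    lines = [f"python {sys.version.split()[0]} at {sys.executable}"]
    if sys.version_info[:2] < min_python:
        lines.append(f"python is older than {min_python[0]}.{min_python[1]}")
    for dist in ("ansible-core", "ansible"):
        lines.append(f"{dist}: {installed_version(dist) or 'not installed'}")
    return lines


if __name__ == "__main__":
    print("\n".join(control_node_report()))
```

Run it with the same interpreter that `ansible --version` reports; a mismatch between that interpreter and `python3` on your PATH is a common source of these exceptions.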
"Conditional check failed"
# ❌ Wrong: Jinja2 in when clause
- name: Install package
  ansible.builtin.apt:
    name: nginx
  when: "{{ install_nginx }}"  # WRONG: double templating
# ✅ Correct: bare variable in when
- name: Install package
  ansible.builtin.apt:
    name: nginx
  when: install_nginx  # Correct
# ✅ Correct: string comparison
when: ansible_distribution == "Ubuntu"
# ✅ Correct: version comparison
when: ansible_distribution_version is version('22.04', '>=')
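The `version` test exists because plain string comparison is lexicographic. A small Python sketch of the difference, assuming purely numeric dotted versions (`as_version` is a hypothetical helper; Ansible's `version` test also handles non-numeric parts):

```python
def as_version(v: str) -> tuple:
    """'22.04' -> (22, 4), so versions compare numerically part by part."""
    return tuple(int(part) for part in v.split("."))


for left, right in [("9.0", "22.04"), ("22.10", "22.04")]:
    as_text = left >= right
    as_numbers = as_version(left) >= as_version(right)
    print(f"{left} >= {right}: as text {as_text}, as version {as_numbers}")
# 9.0 >= 22.04: as text True, as version False
# 22.10 >= 22.04: as text True, as version True
```

The first line is the trap: a `when: ansible_distribution_version >= "22.04"` string comparison would treat release 9.0 as newer.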
Task Hanging / Never Completing
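# Run in the background with a hard time limit
- name: Long-running task
  ansible.builtin.command:
    cmd: /opt/app/rebuild
  async: 600  # Fail if not finished within 10 minutes
  poll: 30    # Check every 30 seconds
# Or cap a task's runtime with the task keyword "timeout: 600"
# [defaults] timeout only limits connection setup, not how long a task runs
[defaults]
timeout = 60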
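With `async` and `poll`, Ansible starts the task in the background and checks its status every `poll` seconds until it finishes or the `async` limit is used up, so 600/30 allows roughly 20 checks. The Python sketch below models that loop; `run_async` is hypothetical and Ansible's exact check timing may differ (`poll: 0` means fire and forget instead).

```python
import time
from typing import Callable


def run_async(check: Callable[[], bool], async_limit: float, poll: float,
              sleep: Callable[[float], None] = time.sleep) -> int:
    """Wait `poll` seconds between status checks until done or `async_limit` is used up.

    Returns the number of checks made; raises TimeoutError if the job never finishes.
    """
    if poll <= 0:
        raise ValueError("poll must be positive (poll: 0 in Ansible means fire and forget)")
    elapsed = 0.0
    checks = 0
    while elapsed < async_limit:
        sleep(poll)
        elapsed += poll
        checks += 1
        if check():  # e.g. "has the background job finished?"
            return checks
    raise TimeoutError(f"job did not complete within {async_limit} seconds")
```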
Systematic Troubleshooting
Step 1: Isolate the Problem
# Test single host
ansible web-01 -m ping
# Test single task
ansible web-01 -m ansible.builtin.apt -a "name=nginx state=present" -b
# Run on one host, starting at a specific task
ansible-playbook site.yml --limit web-01 --start-at-task "Deploy config"
# Run with tags
ansible-playbook site.yml --tags deploy
Step 2: Check the Basics
# Verify inventory
ansible-inventory --graph
ansible-inventory --host problematic-host
# Verify connectivity
ansible all -m ping
# Verify Python on remote
ansible web-01 -m raw -a "python3 --version"
# Verify become works
ansible web-01 -m command -a "whoami" -b
Step 3: Read the Full Error
Ansible error messages contain useful details. Look for:
• msg: Human-readable error description
• stderr: Command stderr output
• stdout: Command stdout output
• rc: Return code (non-zero = failure)
• module_stderr: Module-specific errors
# Capture and display full error
- name: Run command
  ansible.builtin.command:
    cmd: /opt/app/start
  register: result
  ignore_errors: true
- name: Show error details
  ansible.builtin.debug:
    var: result
  when: result is failed
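When you only have console output, the JSON after `=>` on a `fatal:` line carries the same fields. A Python sketch that extracts them, assuming the default callback's single-line JSON (other output formats, such as YAML callbacks, will not match); `parse_fatal` is hypothetical.

```python
import json
import re

# fatal: [host]: STATUS! => {json}
FATAL_RE = re.compile(r"^fatal: \[(?P<host>[^\]]+)\]: (?P<status>[A-Z]+)! => (?P<payload>\{.*\})$")
KEY_FIELDS = ("msg", "rc", "stderr", "stdout", "module_stderr")


def parse_fatal(line: str) -> dict:
    """Extract host, status and the most useful result fields from a fatal: line."""
    match = FATAL_RE.match(line.strip())
    if match is None:
        raise ValueError("not a single-line 'fatal:' result")
    payload = json.loads(match.group("payload"))
    summary = {"host": match.group("host"), "status": match.group("status")}
    # Keep only the fields listed above, when present
    summary.update({key: payload[key] for key in KEY_FIELDS if key in payload})
    return summary
```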
Step 4: Use ansible-config
# Show current configuration
ansible-config dump --only-changed
# Show where config values come from
ansible-config dump | grep -i timeout
# Display the contents of the active config file
ansible-config view
Logging
# ansible.cfg
[defaults]
log_path = /var/log/ansible/ansible.log
# The directory must exist and be writable by the user running Ansible
# Or set environment variable
# export ANSIBLE_LOG_PATH=/var/log/ansible/ansible.log
FAQ
How do I debug a specific task?
Use ansible-playbook site.yml --start-at-task "Task Name" -vvv --limit single-host. This runs from that specific task with full debugging on a single host.
Why does my playbook work manually but fail in cron?
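Cron runs with a minimal environment. Common issues: (1) PATH doesn't include /usr/local/bin, where pip-installed Ansible often lives; (2) no SSH agent is available; (3) the project's ansible.cfg isn't found because cron doesn't start in the project directory. Fix: use full paths, set ANSIBLE_CONFIG, and pass --private-key instead of relying on ssh-agent.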
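The fix can be sketched as a small Python wrapper that builds an explicit argument list and environment instead of inheriting cron's. All paths below (`/opt/automation`, the key path) and the helper name are hypothetical placeholders.

```python
import os
from typing import Dict, List, Tuple


def cron_safe_invocation(project_dir: str, playbook: str,
                         private_key: str) -> Tuple[List[str], Dict[str, str]]:
    """Build argv and an explicit environment for running a playbook from cron."""
    env = {
        # cron's default PATH is minimal; pip-installed Ansible often lives in /usr/local/bin
        "PATH": "/usr/local/bin:/usr/bin:/bin",
        "HOME": os.path.expanduser("~"),
        # Point at the project's config instead of relying on the working directory
        "ANSIBLE_CONFIG": os.path.join(project_dir, "ansible.cfg"),
    }
    argv = [
        "ansible-playbook",
        os.path.join(project_dir, playbook),
        "--private-key", private_key,  # no ssh-agent under cron
    ]
    return argv, env
```

From the cron job, run it with `subprocess.run(argv, env=env, cwd=project_dir, check=True)` so the exit code reflects playbook failures.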
How do I find which variable is overriding mine?
Use ansible-inventory --host hostname --yaml to see the host's inventory variables, and a debug task inside the play to see the value the play actually uses. Then compare the two against the variable precedence order.
How do I handle intermittent failures?
Use retries, delay, and until:
- name: Wait for service
  ansible.builtin.uri:
    url: http://localhost:8080/health
  register: result
  retries: 10
  delay: 5
  until: result.status == 200
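The retries/delay/until pattern is a plain retry loop. A Python sketch (`retry_until` is hypothetical; the exact number of attempts Ansible makes for a given `retries` value may differ by version). With `retries: 10` and `delay: 5`, the worst case waits on the order of 50 seconds plus request time.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_until(action: Callable[[], T], condition: Callable[[T], bool],
                attempts: int, delay: float,
                sleep: Callable[[float], None] = time.sleep) -> T:
    """Run action until condition(result) holds, sleeping `delay` between attempts."""
    for attempt in range(1, attempts + 1):
        result = action()
        if condition(result):
            return result
        if attempt < attempts:  # no pointless sleep after the final attempt
            sleep(delay)
    raise RuntimeError(f"condition not met after {attempts} attempts")
```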
Conclusion
Ansible troubleshooting follows a consistent pattern: increase verbosity (-vvv), isolate the problem (single host, single task), check the basics (connectivity, permissions, Python), read the full error message, and fix systematically. Enable the profile_tasks callback (callbacks_enabled = ansible.posix.profile_tasks in ansible.cfg) to identify slow tasks, use --check --diff for safe previewing, and run ansible-lint to catch issues before they hit production.
Related Articles
• Ansible Performance Tuning
• Ansible Lint Complete Guide
• Ansible Documentation Complete Guide
• Ansible Handlers Complete Guide
• Install Ansible Complete Guide
• Fix 'Use loop or with_' Deprecated Error
• Failed Ansible Installation on Amazon Linux 2022
• Handle Non-Compliant Variable Names
• Handle Variable Dependencies Without Breaking
• Fixing Kubernetes PersistentVolume Error