About Luca Berton

Luca Berton is an Ansible automation expert, author of 8 Ansible books published by Apress and Leanpub including "Ansible for VMware by Examples" and "Ansible for Kubernetes by Example", and creator of the Ansible Pilot YouTube channel. He shares practical automation knowledge through tutorials, books, and video courses to help IT professionals and DevOps engineers master infrastructure automation.

Ansible Troubleshooting: Common Errors, Solutions, and Debug Techniques

By Luca Berton · Published 2024-01-01 · Category: installation

Fix common Ansible errors fast. Debug playbook failures, SSH connection issues, module errors, variable problems, and permission errors.

Debug Modes

Verbosity Levels

# Level 1: Task results
ansible-playbook site.yml -v

# Level 2: Task input + results
ansible-playbook site.yml -vv

# Level 3: Connection debugging
ansible-playbook site.yml -vvv

# Level 4: Full SSH/WinRM debugging
ansible-playbook site.yml -vvvv

Debug Module

- name: Print variable value
  ansible.builtin.debug:
    var: my_variable

- name: Print formatted message
  ansible.builtin.debug:
    msg: "Host {{ inventory_hostname }} has IP {{ ansible_host }} and OS {{ ansible_distribution }}"

- name: Print all variables for a host
  ansible.builtin.debug:
    var: hostvars[inventory_hostname]

Debugger

# Enable debugger on failure
- name: Problematic task
  ansible.builtin.command:
    cmd: /opt/app/start
  debugger: on_failed

# Enable for entire play
- hosts: all
  debugger: on_failed
  tasks:
    # ...

Debugger commands:
• p task: print task info
• p task.args: print task arguments
• p host: print current host
• p result._result: print task result data
• p task_vars: print all variables
• r: retry the task
• c: continue to next task
• q: quit

Check Mode (Dry Run)

# Preview changes without applying
ansible-playbook site.yml --check --diff
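
Some tasks need special handling during a dry run. As a minimal sketch (the play, the task names, and the /opt/app/version command are illustrative assumptions, not part of the original examples), the check_mode keyword and the ansible_check_mode variable control what runs under --check:

```yaml
- name: Check mode behaviour sketch
  hosts: all
  tasks:
    - name: Read current version (read-only, so run it even under --check)
      ansible.builtin.command:
        cmd: /opt/app/version      # hypothetical command
      register: app_version_out
      changed_when: false
      check_mode: false            # always execute, even in a dry run

    - name: Skip a step that cannot be simulated
      ansible.builtin.command:
        cmd: /opt/app/start
      when: not ansible_check_mode
```

Setting check_mode: false is only appropriate for tasks that do not change the system, since they run for real during the preview.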

Step Mode

# Confirm each task before running
ansible-playbook site.yml --step


Connection Errors

"UNREACHABLE! => No route to host"

fatal: [web-01]: UNREACHABLE! => {"msg": "Failed to connect to the host via ssh: ssh: connect to host 10.0.1.10 port 22: No route to host"}

Fixes:

# 1. Verify connectivity
ping 10.0.1.10

# 2. Check SSH directly
ssh -o ConnectTimeout=5 user@10.0.1.10

# 3. Check firewall on target
sudo iptables -L -n | grep 22
sudo firewall-cmd --list-ports

# 4. Verify inventory hostname/IP
ansible-inventory --host web-01

# 5. Check ansible_host variable
ansible web-01 -m ping -vvv

"Permission denied (publickey,password)"

fatal: [web-01]: UNREACHABLE! => {"msg": "Failed to connect to the host via ssh: Permission denied (publickey,password)."}

Fixes:

# 1. Verify SSH key
ssh -i ~/.ssh/id_ed25519 user@10.0.1.10

# 2. Check key permissions
chmod 600 ~/.ssh/id_ed25519
chmod 700 ~/.ssh

# 3. Specify key in inventory
# ansible_ssh_private_key_file: ~/.ssh/id_ed25519

# 4. Use ssh-agent
eval $(ssh-agent)
ssh-add ~/.ssh/id_ed25519

# 5. Enable password auth (temporary)
ansible-playbook site.yml --ask-pass
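
For fix 3, the key can be set per host in a YAML inventory. This is a sketch only: the file name inventory.yml and the web group are assumptions, while the host, IP, user, and key path reuse the values from the examples above.

```yaml
# inventory.yml (hypothetical file name)
all:
  children:
    web:                 # hypothetical group name
      hosts:
        web-01:
          ansible_host: 10.0.1.10
          ansible_user: user
          ansible_ssh_private_key_file: ~/.ssh/id_ed25519
```

Running ansible -i inventory.yml web-01 -m ping would then confirm whether the key is accepted.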

"Host key verification failed"

# Quick fix (controlled environments)
export ANSIBLE_HOST_KEY_CHECKING=False

# Or in ansible.cfg
[defaults]
host_key_checking = False

# Proper fix: accept host keys
ssh-keyscan -H 10.0.1.10 >> ~/.ssh/known_hosts
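
The same "proper fix" can be expressed as a play that runs on the control node. This is a minimal sketch, assuming ssh-keyscan is installed locally and the target offers an ed25519 host key; as with the shell version, it trusts whatever key the host presents, so verify the fingerprint out of band where that matters.

```yaml
- name: Record host keys on the control node
  hosts: localhost
  gather_facts: false
  tasks:
    - name: Add 10.0.1.10 to known_hosts
      ansible.builtin.known_hosts:
        name: 10.0.1.10
        # Assumption: the host has an ed25519 key; adjust -t for other key types
        key: "{{ lookup('ansible.builtin.pipe', 'ssh-keyscan -t ed25519 10.0.1.10') }}"
        state: present
```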

SSH Timeout

# ansible.cfg — increase timeouts
[defaults]
timeout = 30

[ssh_connection]
ssh_args = -o ConnectTimeout=30 -o ServerAliveInterval=15 -o ServerAliveCountMax=3
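
When hosts are slow to come up (for example right after provisioning or a reboot), raising timeouts may not be enough. One possible approach, sketched here with an assumed web group and illustrative timing values, is to wait for the connection explicitly before gathering facts:

```yaml
- name: Wait for slow hosts before running tasks
  hosts: web                   # hypothetical group name
  gather_facts: false          # fact gathering would fail if the host is not up yet
  tasks:
    - name: Wait up to 5 minutes for the connection to respond
      ansible.builtin.wait_for_connection:
        delay: 5
        timeout: 300

    - name: Gather facts once the host is reachable
      ansible.builtin.setup:
```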

Module Errors

"MODULE FAILURE: No module named 'xyz'"

# Verify collection is installed
ansible-galaxy collection list | grep community.general

# Install missing collection
ansible-galaxy collection install community.general

# Check module exists
ansible-doc community.general.xyz
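
To keep the required collections reproducible across control nodes, they can be listed in a requirements file. A minimal sketch, assuming the conventional collections/requirements.yml location:

```yaml
# collections/requirements.yml
collections:
  - name: community.general
```

It can be installed with ansible-galaxy collection install -r collections/requirements.yml.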

"Unsupported parameters"

fatal: [web-01]: FAILED! => {"msg": "Unsupported parameters for (ansible.builtin.apt) module: autoremove."}

Fix: Check documentation for correct parameter names and your Ansible version:

# Show module documentation
ansible-doc ansible.builtin.apt

# Check Ansible version
ansible --version

"Missing required arguments"

fatal: [web-01]: FAILED! => {"msg": "missing required arguments: name"}

Fix: Read the module docs and supply all required parameters:

ansible-doc ansible.builtin.apt | grep "required:"

"No such file or directory"

# ❌ Template not found
- name: Deploy config
  ansible.builtin.template:
    src: config.j2              # Looks in templates/ directory of role
    dest: /etc/app/config.yml

# ✅ Specify full path or check role structure
# Role: roles/myapp/templates/config.j2
# Playbook: templates/config.j2 (relative to playbook)
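
If the relative lookup keeps failing, the path can be made explicit with the playbook_dir and role_path magic variables. This sketch reuses the file names from the example above; which variant applies depends on where the template actually lives.

```yaml
- name: Deploy config from a playbook-level templates directory
  ansible.builtin.template:
    src: "{{ playbook_dir }}/templates/config.j2"
    dest: /etc/app/config.yml

- name: Deploy config from the current role (only valid inside a role)
  ansible.builtin.template:
    src: "{{ role_path }}/templates/config.j2"
    dest: /etc/app/config.yml
```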


Variable Errors

"undefined variable"

fatal: [web-01]: FAILED! => {"msg": "The task includes an option with an undefined variable. The error was: 'app_version' is undefined"}

Fixes:

# 1. Set default value
# Use a different name on the left-hand side (listen_port is illustrative);
# a variable that references itself triggers a recursive template error
listen_port: "{{ app_port | default(8080) }}"

# 2. Check variable precedence (simplified, highest first)
# Extra vars > task vars > block vars > role vars > play vars > host vars > group vars > defaults

# 3. Verify variable is defined
- name: Check required variables
  ansible.builtin.assert:
    that:
      - app_version is defined
      - app_version | length > 0
    fail_msg: "app_version must be defined"

# 4. Print the variable's value, or a marker if it is not defined
- ansible.builtin.debug:
    msg: "{{ lookup('vars', 'app_version', default='NOT DEFINED') }}"

"dict object has no attribute"

# ❌ Accessing nested dict that might not exist
msg: "{{ result.json.data.name }}"

# ✅ Safe access with default
msg: "{{ result.json.data.name | default('unknown') }}"

# ✅ Or check first
when: result.json.data is defined and result.json.data.name is defined

Variable Precedence Issues

# Show effective variables for a host
ansible-inventory --host web-01 --yaml

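# Run with extra verbosity; debug_vars is a user-defined extra var that your
# playbook must act on (it is not an Ansible built-in)
ansible-playbook site.yml -e "debug_vars=true" -vv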

Permission Errors

"sudo: a password is required"

# Option 1: Ask for become password
# ansible-playbook site.yml --ask-become-pass

# Option 2: Set in inventory
# ansible_become_password: "{{ vault_sudo_password }}"

# Option 3: Passwordless sudo (recommended)
# On target host:
# echo "ansible ALL=(ALL) NOPASSWD:ALL" > /etc/sudoers.d/ansible
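
Option 3 can also be applied with a task instead of a manual echo, which lets visudo validate the file before it is installed. This is a sketch under assumptions: the remote user is named ansible as in the example above, visudo lives at /usr/sbin/visudo, and the first run still needs a working become method (for instance --ask-become-pass).

```yaml
- name: Allow the ansible user to use sudo without a password
  ansible.builtin.copy:
    content: "ansible ALL=(ALL) NOPASSWD:ALL\n"
    dest: /etc/sudoers.d/ansible
    owner: root
    group: root
    mode: "0440"
    validate: /usr/sbin/visudo -cf %s   # assumed visudo path; rejects a broken file
  become: true
```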

"Failed to set permissions on the temporary files"

fatal: [web-01]: FAILED! => {"msg": "Failed to set permissions on the temporary files Ansible needs to create when becoming an unprivileged user"}

Fix: Enable pipelining or set allow_world_readable_tmpfiles:

[defaults]
# Best fix: enable pipelining
pipelining = True

# Alternative (less secure)
# allow_world_readable_tmpfiles = True
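
Another commonly used fix is to install POSIX ACL support on the target, so Ansible can grant the unprivileged become user access to its temporary files with setfacl. A minimal sketch, assuming the package is called acl on your distribution:

```yaml
- name: Install ACL support so temp files can be shared with the become user
  ansible.builtin.package:
    name: acl            # assumed package name; it may differ on some platforms
    state: present
  become: true
```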


Playbook Errors

"Syntax Error"

# Check syntax before running
ansible-playbook site.yml --syntax-check

# Lint for best practices
ansible-lint site.yml

"ERROR! Unexpected Exception during execution"

Usually a Python issue on the control node:

# Check Python version
python3 --version

# Reinstall Ansible
pip install --upgrade ansible

# Check for conflicting packages
pip check

"Conditional check failed"

# ❌ Wrong: Jinja2 in when clause
- name: Install package
  ansible.builtin.apt:
    name: nginx
  when: "{{ install_nginx }}"    # WRONG — double templating

# ✅ Correct: bare variable in when
- name: Install package
  ansible.builtin.apt:
    name: nginx
  when: install_nginx    # Correct

# ✅ Correct: string comparison
when: ansible_distribution == "Ubuntu"

# ✅ Correct: version comparison
when: ansible_distribution_version is version('22.04', '>=')
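
A related pitfall: values passed as -e "install_nginx=false" arrive as strings, and a bare string in a when clause may not evaluate the way you expect (the exact behaviour depends on the Ansible version). A small sketch of converting explicitly with the bool filter:

```yaml
- name: Install package
  ansible.builtin.apt:
    name: nginx
  when: install_nginx | bool    # converts "true"/"false"/"yes"/"no" strings to a boolean
```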

Task Hanging / Never Completing

# Add timeout
- name: Long-running task
  ansible.builtin.command:
    cmd: /opt/app/rebuild
  async: 600          # Max 10 minutes
  poll: 30            # Check every 30 seconds

# Or use timeout at connection level
[defaults]
timeout = 60
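
For jobs that should not hold the connection open at all, the task can be started in the background and polled separately. This sketch reuses the /opt/app/rebuild command from above; the registered variable names and the retry values are illustrative.

```yaml
- name: Start long-running task in the background
  ansible.builtin.command:
    cmd: /opt/app/rebuild
  async: 600          # Max 10 minutes
  poll: 0             # Do not wait; return immediately
  register: rebuild_job

- name: Wait for the background job to finish
  ansible.builtin.async_status:
    jid: "{{ rebuild_job.ansible_job_id }}"
  register: rebuild_result
  until: rebuild_result.finished
  retries: 20         # 20 checks x 30 seconds = 10 minutes
  delay: 30
```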

Systematic Troubleshooting

Step 1: Isolate the Problem

# Test single host
ansible web-01 -m ping

# Test single task
ansible web-01 -m ansible.builtin.apt -a "name=nginx state=present" -b

# Run single play
ansible-playbook site.yml --limit web-01 --start-at-task "Deploy config"

# Run with tags
ansible-playbook site.yml --tags deploy

Step 2: Check the Basics

# Verify inventory
ansible-inventory --graph
ansible-inventory --host problematic-host

# Verify connectivity
ansible all -m ping

# Verify Python on remote
ansible web-01 -m raw -a "python3 --version"

# Verify become works
ansible web-01 -m command -a "whoami" -b

Step 3: Read the Full Error

Ansible error messages contain useful details. Look for:
• msg: Human-readable error description
• stderr: Command stderr output
• stdout: Command stdout output
• rc: Return code (non-zero = failure)
• module_stderr: Module-specific errors

# Capture and display full error
- name: Run command
  ansible.builtin.command:
    cmd: /opt/app/start
  register: result
  ignore_errors: true

- name: Show error details
  ansible.builtin.debug:
    var: result
  when: result is failed
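
An alternative to ignore_errors is a block with a rescue section, which exposes the failure through the ansible_failed_task and ansible_failed_result variables. A minimal sketch around the same /opt/app/start command (the task names and messages are illustrative):

```yaml
- name: Start app and report details on failure
  block:
    - name: Run command
      ansible.builtin.command:
        cmd: /opt/app/start
  rescue:
    - name: Show which task failed and why
      ansible.builtin.debug:
        msg:
          - "Failed task: {{ ansible_failed_task.name }}"
          - "rc: {{ ansible_failed_result.rc | default('n/a') }}"
          - "stderr: {{ ansible_failed_result.stderr | default('') }}"
          - "msg: {{ ansible_failed_result.msg | default('') }}"

    - name: Re-raise so the play still fails
      ansible.builtin.fail:
        msg: "Run command failed; see details above"
```

The final fail task keeps the original behaviour of stopping on error; without it, a completed rescue section marks the host as recovered.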

Step 4: Use ansible-config

# Show current configuration
ansible-config dump --only-changed

# Show where config values come from
ansible-config dump | grep -i timeout

# View the active config file
ansible-config view

Logging

# ansible.cfg
[defaults]
log_path = /var/log/ansible/ansible.log

# Or set environment variable
# export ANSIBLE_LOG_PATH=/var/log/ansible/ansible.log

FAQ

How do I debug a specific task?

Use ansible-playbook site.yml --start-at-task "Task Name" -vvv --limit single-host. This starts the run at that task (later tasks still run) with connection-level debugging on a single host.

Why does my playbook work manually but fail in cron?

Cron has a minimal environment. Common issues: (1) PATH doesn't include /usr/local/bin, (2) SSH agent isn't available, (3) ansible.cfg isn't found. Fix: use full paths, specify ANSIBLE_CONFIG, and use --private-key instead of ssh-agent.
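
These fixes can be captured in the cron entry itself. The sketch below manages the crontab with ansible.builtin.cron; every path (/opt/ansible, /home/ansible/.ssh/id_ed25519, /usr/local/bin/ansible-playbook) and the schedule are assumptions to replace with your own values.

```yaml
- name: Schedule a playbook run with an explicit environment
  hosts: localhost
  gather_facts: false
  tasks:
    - name: Set PATH inside the crontab
      ansible.builtin.cron:
        name: PATH
        env: true
        job: /usr/local/bin:/usr/bin:/bin

    - name: Run site.yml nightly with full paths and an explicit key
      ansible.builtin.cron:
        name: nightly site.yml
        minute: "0"
        hour: "2"
        job: >-
          ANSIBLE_CONFIG=/opt/ansible/ansible.cfg
          /usr/local/bin/ansible-playbook /opt/ansible/site.yml
          --private-key /home/ansible/.ssh/id_ed25519
```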

How do I find which variable is overriding mine?

Use ansible-inventory --host hostname --yaml to see effective variables. Add -vvv to playbook runs to see where each variable is loaded from. Remember the variable precedence order.

How do I handle intermittent failures?

Use retries and delay:

- name: Wait for service
  ansible.builtin.uri:
    url: http://localhost:8080/health
  register: result
  retries: 10
  delay: 5
  until: result.status == 200

Conclusion

Ansible troubleshooting follows a consistent pattern: increase verbosity (-vvv), isolate the problem (single host, single task), check the basics (connectivity, permissions, Python), read the full error message, and fix systematically. Enable the profile_tasks callback to identify slow tasks, use --check --diff for safe previewing, and run ansible-lint to catch issues before they hit production.
