Ansible retries & until: Retry Failed Tasks Automatically (Guide)

By Luca Berton · Published 2024-01-01 · Category: installation

Complete guide to Ansible retries and until loops. Retry failed tasks automatically, wait for services, implement polling patterns, and handle transient errors.

Ansible's retries and until directives let you automatically retry tasks that fail due to transient errors — network timeouts, services still starting, APIs returning temporary errors. This is essential for reliable automation in real-world environments.

Basic Syntax

- name: Wait for service to be ready
  ansible.builtin.uri:
    url: http://localhost:8080/health
    status_code: 200
  register: result
  until: result.status == 200
  retries: 10
  delay: 5

• until: condition that must be true for the task to succeed
• retries: maximum number of retries after the first attempt (default: 3)
• delay: seconds to wait between attempts (default: 5)

The task runs once and, if the until condition is not yet true, is retried up to retries more times, waiting delay seconds between attempts. If the condition is still false after the last retry, the task fails.
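
When the retries run out, the task is marked as failed like any other task. The following is a minimal sketch, not the only way to do it, of wrapping the basic example in block/rescue so the play can report a clearer error. The message text is illustrative, and the default filter is only a guard in case attempts is missing.

```yaml
- name: Wait for the service, with a fallback when retries run out
  block:
    - name: Wait for service to be ready
      ansible.builtin.uri:
        url: http://localhost:8080/health
        status_code: 200
      register: result
      until: result.status == 200
      retries: 10
      delay: 5
  rescue:
    # Runs only if the task above is still failing after the last retry
    - name: Report that the service never became healthy
      ansible.builtin.fail:
        msg: "Service did not return HTTP 200 after {{ result.attempts | default('several') }} attempts"
```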

Wait for a Service to Start

- name: Start application
  ansible.builtin.service:
    name: myapp
    state: started

- name: Wait for application to respond
  ansible.builtin.uri:
    url: "http://localhost:{{ app_port }}/health"
    status_code: 200
  register: health_check
  until: health_check.status == 200
  retries: 30
  delay: 10
  # Total wait: up to 5 minutes (30 * 10s)
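
The worst-case wait is roughly retries × delay, plus the time each request itself takes. As a sketch, the retry count can be derived from a time budget instead of being hard-coded. The variable names health_wait_budget and health_poll_interval are hypothetical, and the example assumes retries and delay accept templated values, as task keywords generally do.

```yaml
- name: Wait for application within a time budget
  vars:
    health_wait_budget: 300     # hypothetical: total seconds we are willing to wait
    health_poll_interval: 10    # hypothetical: seconds between attempts
  ansible.builtin.uri:
    url: "http://localhost:{{ app_port }}/health"
    status_code: 200
  register: health_check
  until: health_check.status == 200
  # 300 / 10 = 30 retries, the same values as the hard-coded example
  retries: "{{ (health_wait_budget / health_poll_interval) | int }}"
  delay: "{{ health_poll_interval }}"
```

Keeping the budget in one variable makes it easier to tune the wait per environment without touching the task.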

Wait for a Port to Open

- name: Wait for PostgreSQL to accept connections
  ansible.builtin.wait_for:
    host: "{{ db_host }}"
    port: 5432
    state: started
    timeout: 300
  register: port_check
  until: port_check is success
  retries: 5
  delay: 10

Retry API Calls

- name: Create resource via API (retry on 503)
  ansible.builtin.uri:
    url: "https://api.example.com/resources"
    method: POST
    body:
      name: "{{ resource_name }}"
    body_format: json
    headers:
      Authorization: "Bearer {{ api_token }}"
    status_code: [200, 201]
  register: api_result
  until: api_result.status in [200, 201]
  retries: 5
  delay: 15
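
Note that the task above retries on any failure, not only on a 503 response. A possible variant that retries only while the API returns 503 and stops immediately on other errors is sketched below. It accepts 503 in status_code so the module does not fail by itself, and it relies on failed_when to decide the final result. The endpoint and variables are the same placeholders as above.

```yaml
- name: Create resource via API (retry only while the API returns 503)
  ansible.builtin.uri:
    url: "https://api.example.com/resources"
    method: POST
    body:
      name: "{{ resource_name }}"
    body_format: json
    headers:
      Authorization: "Bearer {{ api_token }}"
    status_code: [200, 201, 503]
  register: api_result
  # Stop looping as soon as the response is anything other than 503
  until: api_result.status != 503
  retries: 5
  delay: 15
  # Anything other than 200/201 still counts as a failure
  failed_when: api_result.status not in [200, 201]
```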

Retry Commands Based on Output

- name: Wait for cluster to be healthy
  ansible.builtin.command: kubectl get nodes
  register: kubectl_result
  changed_when: false
  until: "'NotReady' not in kubectl_result.stdout"
  retries: 20
  delay: 15

- name: Wait for database migration to complete
  ansible.builtin.command: /opt/myapp/check_migration_status.sh
  register: migration
  changed_when: false
  until: "'COMPLETE' in migration.stdout"
  retries: 30
  delay: 10

Retry Package Installation

- name: Install package (retry on network errors)
  ansible.builtin.package:
    name: nginx
    state: present
  register: pkg_result
  until: pkg_result is success
  retries: 3
  delay: 30

Retry SSH Connections

- name: Wait for host to come back after reboot
  ansible.builtin.wait_for_connection:
    timeout: 300
    delay: 10
  register: connection
  until: connection is success
  retries: 3
  delay: 60

Complex Until Conditions

Multiple Conditions (AND)

- name: Wait for app to be fully ready
  ansible.builtin.uri:
    url: http://localhost:8080/status
    return_content: true
  register: status
  until:
    - status.status == 200
    - "'ready' in status.content"
    - "'error' not in status.content"
  retries: 20
  delay: 5

OR Conditions

- name: Wait for one of multiple success states
  ansible.builtin.command: check_status.sh
  register: result
  changed_when: false
  until: "'RUNNING' in result.stdout or 'COMPLETED' in result.stdout"
  retries: 15
  delay: 10

Retry with Registered Variable Checks

- name: Check replication status
  community.postgresql.postgresql_query:
    db: myapp
    query: "SELECT state FROM pg_stat_replication WHERE client_addr = '{{ replica_ip }}'"
  register: repl_status
  until:
    - repl_status.rowcount > 0
    - repl_status.query_result[0].state == 'streaming'
  retries: 12
  delay: 10

Default Values

When you omit directives:

# These are the defaults:
retries: 3    # Retry up to 3 times after the first attempt
delay: 5      # Wait 5 seconds between attempts

If you specify until without retries, Ansible defaults to 3 retries. Without delay, it waits 5 seconds between attempts.

Retry Information in Output

Ansible shows retry progress in the output:

TASK [Wait for service] *************************
FAILED - RETRYING: Wait for service (10 retries left).
FAILED - RETRYING: Wait for service (9 retries left).
ok: [webserver]

Access Retry Information

- name: Task with retries
  ansible.builtin.uri:
    url: http://localhost:8080/health
  register: result
  until: result.status == 200
  retries: 10
  delay: 5

- name: Show retry info
  ansible.builtin.debug:
    msg: "Took {{ result.attempts }} attempts"

The result.attempts variable contains the number of attempts made.
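
As a small sketch of putting this value to use, a follow-up task can flag slow startups. The threshold of 3 is an arbitrary illustration, and the default filter only guards against the key being absent.

```yaml
- name: Warn when the health check needed several attempts
  ansible.builtin.debug:
    msg: "Health check passed after {{ result.attempts }} attempts; the service may be slow to start"
  # Only report when more than 3 attempts were needed (illustrative threshold)
  when: result.attempts | default(1) > 3
```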

Real-World Patterns

Wait After Reboot

- name: Reboot the server
  ansible.builtin.reboot:
    reboot_timeout: 600
    msg: "Rebooting for kernel update"

# Alternative manual approach:
- name: Reboot
  ansible.builtin.command: shutdown -r now
  async: 1
  poll: 0

- name: Wait for server to come back
  ansible.builtin.wait_for_connection:
    delay: 30
    timeout: 300

- name: Verify services after reboot
  ansible.builtin.service_facts:
  register: services
  until: "'nginx.service' in services.ansible_facts.services"
  retries: 10
  delay: 10

Wait for Cloud Instance

- name: Create EC2 instance
  amazon.aws.ec2_instance:
    name: webserver
    instance_type: t3.micro
    image_id: "{{ ami_id }}"
    wait: true
  register: ec2

- name: Wait for SSH on new instance
  ansible.builtin.wait_for:
    host: "{{ ec2.instances[0].public_ip_address }}"
    port: 22
    delay: 10
    timeout: 300

- name: Wait for cloud-init to finish
  ansible.builtin.command: cloud-init status --wait
  delegate_to: "{{ ec2.instances[0].public_ip_address }}"
  register: cloud_init
  changed_when: false
  until: cloud_init.rc == 0
  retries: 30
  delay: 10

Poll for Job Completion

- name: Start backup job
  ansible.builtin.uri:
    url: "https://backup-api.example.com/jobs"
    method: POST
    body: '{"type": "full"}'
    body_format: json
  register: job

- name: Wait for backup to complete
  ansible.builtin.uri:
    url: "https://backup-api.example.com/jobs/{{ job.json.id }}"
    method: GET
  register: job_status
  until: job_status.json.state in ['completed', 'failed']
  retries: 60
  delay: 30
  failed_when: job_status.json.state == 'failed'

retries vs async/poll

| Feature | retries/until | async/poll |
|---------|---------------|------------|
| Purpose | Retry on failure | Background execution |
| Blocks play | Yes (synchronous) | No (with poll: 0) |
| Condition | Custom until expression | Completion only |
| Delay type | Fixed between retries | Fixed polling interval |
| Use when | Transient failures, readiness | Long-running tasks |
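
The two mechanisms can also be combined: start a job in the background with async and poll: 0, then poll its status with ansible.builtin.async_status inside an until loop. The sketch below assumes a hypothetical script path, /opt/myapp/long_job.sh, and illustrative timing values.

```yaml
- name: Start long-running job in the background
  ansible.builtin.command: /opt/myapp/long_job.sh   # hypothetical script
  async: 3600   # allow the job up to one hour
  poll: 0       # do not wait here; continue with the play
  register: long_job

- name: Poll until the background job finishes
  ansible.builtin.async_status:
    jid: "{{ long_job.ansible_job_id }}"
  register: job_result
  until: job_result.finished
  retries: 120
  delay: 30
```

Other tasks can run between the two, which is the main reason to prefer this over a plain synchronous until loop for long jobs.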

FAQ

How do I retry a failed task in Ansible?

Add retries, delay, and until to any task. Register the result and set an until condition: the task retries up to retries times, waiting delay seconds between attempts, until the condition is true.

What are the default values for retries and delay?

The defaults are 3 retries with a 5-second delay between attempts. If you specify until without retries or delay, Ansible uses these defaults.

How do I wait for a service to be ready?

Use ansible.builtin.uri with until: result.status == 200 and set retries and delay based on how long the service typically takes to start. For port-level checks, use ansible.builtin.wait_for.

Can I use retries without until?

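You can, but it's not recommended, because the behavior depends on the Ansible version: older releases ignore retries when no until is set, while newer ansible-core releases retry until the task stops failing. With until, you define explicit success criteria, which is more reliable and behaves the same everywhere.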

How do I know how many retries a task needed?

The registered variable includes an attempts field: result.attempts contains the number of attempts made (1 = succeeded first try).

Conclusion

Ansible's retry mechanism is essential for reliable automation:

• until: condition defines what "success" looks like
• retries: N sets the maximum number of retries (default: 3)
• delay: N sets the seconds between attempts (default: 5)
• Always register the result to use in until conditions
• Combine with changed_when: false for read-only checks

Use retries whenever you interact with services that might not be immediately available — APIs, databases, cloud resources, or freshly started services.

Related Articles

• Ansible changed_when & failed_when: Control Task Status
• Ansible block, rescue, always: Error Handling Guide
• Ansible Ignore Errors Complete Guide
• Ansible async: Run Long Tasks in Background

See also

How to Retry a Failed Task in Ansible (retries, delay, until)
