How to Do Rolling Updates with Zero Downtime in Ansible
By Luca Berton · Published 2024-01-01 · Category: troubleshooting
How to perform rolling updates in Ansible with serial, load balancer integration, health checks, and rollback. Zero-downtime deployment patterns with examples.
Rolling updates deploy to a subset of servers at a time, keeping your service available throughout. Ansible's serial directive makes this straightforward.
See also: Ansible run_once vs delegate_to vs serial: Control Task Execution Scope
Basic Rolling Update
- hosts: webservers
  serial: 1  # One host at a time
  become: true
  tasks:
    - name: Deploy new version
      ansible.builtin.copy:
        src: "myapp-{{ version }}.tar.gz"
        dest: /opt/myapp/
    - name: Restart application
      ansible.builtin.systemd:
        name: myapp
        state: restarted
    - name: Wait for health check
      ansible.builtin.uri:
        url: "http://{{ inventory_hostname }}:8080/health"
        status_code: 200
      retries: 30
      delay: 2
      register: health
      until: health.status == 200
Serial Options
# Fixed number
serial: 2 # 2 hosts at a time
# Percentage
serial: "25%" # 25% of hosts at a time
# Ramping (canary pattern)
serial:
  - 1      # First: 1 host (canary)
  - 5      # Then: 5 at a time
  - "25%"  # Then: 25% at a time
With Load Balancer Integration
- hosts: webservers
  serial: 1
  become: true
  pre_tasks:
    - name: Remove from load balancer
      ansible.builtin.uri:
        url: "https://lb.example.com/api/nodes/{{ inventory_hostname }}/disable"
        method: POST
        headers:
          Authorization: "Bearer {{ vault_lb_token }}"
      delegate_to: localhost
    - name: Wait for connections to drain
      ansible.builtin.pause:
        seconds: 15
  tasks:
    - name: Deploy application
      ansible.builtin.unarchive:
        src: "releases/myapp-{{ version }}.tar.gz"
        dest: /opt/myapp/
    - name: Restart service
      ansible.builtin.systemd:
        name: myapp
        state: restarted
    - name: Health check
      ansible.builtin.uri:
        url: "http://localhost:8080/health"
        status_code: 200
      retries: 30
      delay: 3
      register: health
      until: health.status == 200
  post_tasks:
    - name: Re-enable in load balancer
      ansible.builtin.uri:
        url: "https://lb.example.com/api/nodes/{{ inventory_hostname }}/enable"
        method: POST
        headers:
          Authorization: "Bearer {{ vault_lb_token }}"
      delegate_to: localhost
Fail Fast: max_fail_percentage
- hosts: webservers
  serial: 5
  max_fail_percentage: 20  # Abort if more than 20% of the hosts in a batch fail
  tasks:
    - name: Deploy and verify
      # ...
Canary Deployment Pattern
# Step 1: Deploy to canary
- hosts: "webservers[0]"
  tasks:
    - name: Deploy canary
      ansible.builtin.include_tasks: deploy.yml
    - name: Run smoke tests
      ansible.builtin.uri:
        url: "http://{{ inventory_hostname }}:8080/api/smoke-test"
        status_code: 200
      # Pair retries with until so the retry loop applies on all Ansible versions
      register: smoke
      until: smoke.status == 200
      retries: 10
      delay: 5
    - name: Wait for manual approval
      ansible.builtin.pause:
        prompt: "Canary looks good? Press Enter to continue or Ctrl+C to abort"
# Step 2: Deploy to rest
- hosts: "webservers:!webservers[0]"
  serial: "25%"
  tasks:
    - name: Deploy to remaining hosts
      ansible.builtin.include_tasks: deploy.yml
FAQ
How does serial work in Ansible?
serial limits how many hosts a play processes at a time. With serial: 2 and 10 hosts, Ansible runs the entire play on 2 hosts, waits for them to finish, then moves on to the next 2. This prevents all hosts from being down simultaneously.
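To see the batching in practice, a minimal sketch like the following prints the hosts in each batch. It assumes the same webservers group used elsewhere in this article and relies on the ansible_play_batch magic variable; the play name and message text are illustrative only.

```yaml
- name: Show how serial splits hosts into batches
  hosts: webservers
  serial: 2
  gather_facts: false
  tasks:
    - name: Print the hosts in the current batch
      ansible.builtin.debug:
        msg: "Current batch: {{ ansible_play_batch | join(', ') }}"
      # With serial, run_once applies per batch, so this prints once for every batch
      run_once: true
```

With 10 hosts in the group, this should print five batches of two hosts each.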
What is max_fail_percentage?
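max_fail_percentage aborts the play when the share of failed hosts in a batch exceeds the given value. The threshold must be exceeded, not merely reached: with serial: 5 and max_fail_percentage: 20, one failed host (exactly 20%) lets the rollout continue, while two failures (40%) abort it. This keeps a bad deployment from rolling out to all servers.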
How do I do a canary deployment with Ansible?
Use a ramping serial: [1, 5, "25%"], or split the rollout into two plays: the first targets one host (the canary), the second targets the rest after validation. Add a pause for manual approval between stages.
Complete Rolling Update Playbook
---
- name: Rolling update with zero downtime
  hosts: webservers
  serial: 1
  max_fail_percentage: 0
  become: true
  pre_tasks:
    - name: Disable host in load balancer
      ansible.builtin.uri:
        url: "http://{{ lb_host }}/api/backends/{{ inventory_hostname }}/disable"
        method: POST
      delegate_to: localhost
    - name: Wait for connections to drain
      ansible.builtin.wait_for:
        timeout: 30
  tasks:
    - name: Update application
      ansible.builtin.dnf:
        name: myapp
        state: latest
      notify: restart myapp
    - name: Run database migrations
      ansible.builtin.command: /opt/myapp/bin/migrate
      # run_once would fire once per serial batch (here: on every host),
      # so pin the task to the first host of the whole play instead
      when: inventory_hostname == ansible_play_hosts_all[0]
      changed_when: true
  post_tasks:
    # Notified handlers run at the end of the tasks section, before post_tasks
    - name: Verify application health
      ansible.builtin.uri:
        url: "http://localhost:8080/health"
        status_code: 200
      register: health
      until: health.status == 200
      retries: 10
      delay: 5
    - name: Re-enable in load balancer
      ansible.builtin.uri:
        url: "http://{{ lb_host }}/api/backends/{{ inventory_hostname }}/enable"
        method: POST
      delegate_to: localhost
  handlers:
    - name: restart myapp
      ansible.builtin.service:
        name: myapp
        state: restarted
Serial Strategies
# Update one host at a time
serial: 1
# Update 2 hosts at a time
serial: 2
# Update 25% at a time
serial: "25%"
# Canary deployment: 1 first, then 25%, then all
serial:
  - 1
  - "25%"
  - "100%"
FAQ
What does serial do in Ansible?
serial controls how many hosts are updated simultaneously. Without it, Ansible runs each task across all hosts in the play in parallel (limited only by the forks setting). With serial: 1, hosts are updated one at a time.
How do I handle a failed update mid-roll?
Set max_fail_percentage: 0 to stop the entire rollout if any host fails. Use block/rescue to implement automatic rollback on failure.
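A block/rescue rollback could look roughly like the sketch below. It assumes the application ships as a versioned RPM named myapp and that version and previous_version are supplied by you (both the variables and their values are hypothetical placeholders); adapt the rollback step to however you actually package releases.

```yaml
- name: Rolling update with automatic rollback (sketch)
  hosts: webservers
  serial: 1
  max_fail_percentage: 0
  become: true
  vars:
    version: "2.0.0"           # hypothetical new release
    previous_version: "1.9.0"  # hypothetical known-good release
  tasks:
    - name: Deploy and verify
      block:
        - name: Install the new version
          ansible.builtin.dnf:
            name: "myapp-{{ version }}"
            state: present
        - name: Restart the application
          ansible.builtin.systemd:
            name: myapp
            state: restarted
        - name: Wait for health check
          ansible.builtin.uri:
            url: "http://localhost:8080/health"
            status_code: 200
          register: health
          until: health.status == 200
          retries: 10
          delay: 5
      rescue:
        - name: Roll back to the previous version
          ansible.builtin.dnf:
            name: "myapp-{{ previous_version }}"
            state: present
            allow_downgrade: true
        - name: Restart the application on the old version
          ansible.builtin.systemd:
            name: myapp
            state: restarted
        - name: Mark the host as failed so the rollout stops
          ansible.builtin.fail:
            msg: "Deploying {{ version }} failed on {{ inventory_hostname }}; rolled back to {{ previous_version }}"
```

The final fail task matters: a rescue section that completes successfully clears the error, so without it Ansible would treat the host as healthy and continue to the next batch.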
Can I do blue-green deployments with Ansible?
Yes. Maintain two groups (blue/green), deploy to the inactive group, run health checks, then switch the load balancer to point to the new group.
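As a rough sketch of that flow, the playbook below deploys to whichever group is currently inactive and then switches traffic. The inactive_group variable, the blue/green inventory groups, and the load balancer endpoint and request body are all assumptions for illustration; substitute your load balancer's real API. It reuses the deploy.yml task file and the vault_lb_token variable from the earlier examples.

```yaml
# Run with: ansible-playbook blue_green.yml -e inactive_group=green
- name: Deploy to the inactive group
  hosts: "{{ inactive_group }}"
  become: true
  any_errors_fatal: true  # Abort the whole playbook if any host fails
  tasks:
    - name: Deploy application
      ansible.builtin.include_tasks: deploy.yml
    - name: Health check
      ansible.builtin.uri:
        url: "http://localhost:8080/health"
        status_code: 200
      register: health
      until: health.status == 200
      retries: 10
      delay: 5

- name: Switch traffic to the freshly deployed group
  hosts: localhost
  gather_facts: false
  tasks:
    - name: Point the load balancer at the new group
      ansible.builtin.uri:
        # Hypothetical endpoint and payload, not a real load balancer API
        url: "https://lb.example.com/api/pools/web/active"
        method: POST
        headers:
          Authorization: "Bearer {{ vault_lb_token }}"
        body_format: json
        body:
          group: "{{ inactive_group }}"
        status_code: 200
```

Because the old group is left untouched, rolling back amounts to running the second play again with the previous group name.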