Ansible on Talos Linux: Reboot-aware Patching Workflow Complete Guide
By Luca Berton · Published 2024-01-01 · Category: installation
Automate a reboot-aware patching workflow on Talos Linux (Kubernetes-native, GA rolling) with Ansible.
Talos Linux is an immutable, API-managed Kubernetes OS: no SSH, no shell, no package manager, and no apt/dnf/rpm-ostree. You never patch a Talos node in place. Instead you upgrade the whole OS image with talosctl upgrade, which reboots the node into the new version. The "reboot-aware" part is making that rolling and non-disruptive: cordon and drain the node first, upgrade, wait for it to rejoin, then uncordon — one node at a time.
This guide automates that workflow with Ansible, driving talosctl and the kubernetes.core collection from the control node.
> This is not a Fedora CoreOS / rpm-ostree workflow. Talos has no rpm-ostree, no Zincati, and no SSH — everything below goes through the Talos API and the Kubernetes API.
How Talos upgrades work
• OS upgrade: talosctl upgrade --image, given an installer image reference, replaces the running Talos image and reboots the node. Talos keeps the previous version, so talosctl rollback reverts to it.
• Kubernetes upgrade: separate from the OS, done with talosctl upgrade-k8s --to followed by the target Kubernetes version.
• Reboot-aware: Ansible cordons and drains the node before the upgrade and uncordons it once it is Ready, so workloads move off first.
Prerequisites
No agent runs on the Talos nodes — everything happens from the control node:
• talosctl and the cluster talosconfig, plus kubectl and the kubeconfig (both produced during bootstrap).
• ansible-core 2.15+ with kubernetes.core 3.x+ and the kubernetes Python library.
• PodDisruptionBudgets on critical workloads so draining respects availability.
• For control plane nodes, a different, healthy control plane endpoint to route talosctl through, so the API stays up while one node reboots.
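These prerequisites can be checked before any node is touched. The following is a minimal sketch rather than part of the original workflow: the file name preflight.yml is hypothetical, and it assumes the kubeconfig variable from the inventory defined in this guide. It only confirms that both CLIs run and reports how many PodDisruptionBudgets exist in the cluster.

```yaml
---
# preflight.yml (hypothetical file name): fail early if tooling is missing.
- name: Pre-flight checks for the Talos rolling upgrade
  hosts: localhost
  connection: local
  gather_facts: false
  tasks:
    - name: Confirm talosctl and kubectl run on the control node
      ansible.builtin.command:
        cmd: "{{ item }}"
      loop:
        - talosctl version --client
        - kubectl version --client
      changed_when: false

    - name: List PodDisruptionBudgets across all namespaces
      kubernetes.core.k8s_info:
        kubeconfig: "{{ kubeconfig }}"
        api_version: policy/v1
        kind: PodDisruptionBudget
      register: pdbs

    - name: Report how many PodDisruptionBudgets were found
      ansible.builtin.debug:
        msg: "Found {{ pdbs.resources | length }} PodDisruptionBudget(s)"
```

A count of zero does not block the upgrade, but it is a hint that draining may evict every replica of a workload at once.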
Inventory
    # inventory/talos.ini
    [talos]
    localhost ansible_connection=local

    [talos:vars]
    kubeconfig=/path/to/talos/kubeconfig
    talosconfig=/path/to/talos/talosconfig
    talos_version=v1.8.3
Rolling upgrade playbook
The play upgrades one node at a time. For each node it drains, runs talosctl upgrade (which reboots into the new image), waits for the node to become Ready again, then uncordons it.
    ---
    - name: Reboot-aware Talos upgrade (rolling, one node at a time)
      hosts: localhost
      connection: local
      gather_facts: false
      vars:
        kubeconfig: "{{ hostvars['localhost'].kubeconfig }}"
        talosconfig: "{{ hostvars['localhost'].talosconfig }}"
        installer_image: "ghcr.io/siderolabs/installer:{{ hostvars['localhost'].talos_version }}"
        # name must match the Kubernetes node name; ip is the node's Talos API address.
        talos_nodes:
          - { name: cp1, ip: 192.168.0.2 }
          - { name: w1, ip: 192.168.0.10 }
          - { name: w2, ip: 192.168.0.11 }
      tasks:
        - name: Upgrade each node in turn
          ansible.builtin.include_tasks: upgrade-node.yml
          loop: "{{ talos_nodes }}"
          loop_control:
            loop_var: node
upgrade-node.yml:
    ---
    - name: Cordon and drain {{ node.name }}
      kubernetes.core.k8s_drain:
        kubeconfig: "{{ kubeconfig }}"
        name: "{{ node.name }}"
        state: drain
        delete_options:
          ignore_daemonsets: true
          delete_emptydir_data: true
          wait_timeout: 300

    - name: Upgrade Talos on {{ node.name }} (reboots into the new image)
      ansible.builtin.command:
        cmd: >-
          talosctl upgrade
          --talosconfig {{ talosconfig }}
          --nodes {{ node.ip }} --endpoints {{ node.ip }}
          --image {{ installer_image }} --preserve
      register: upgrade
      changed_when: upgrade.rc == 0

    - name: Wait for {{ node.name }} to rejoin and become Ready
      kubernetes.core.k8s_info:
        kubeconfig: "{{ kubeconfig }}"
        kind: Node
        name: "{{ node.name }}"
      register: n
      retries: 40
      delay: 15
      # default([]) keeps the check retrying if a lookup returns no resources.
      until:
        - n.resources | default([]) | length > 0
        - n.resources[0].status.conditions
          | selectattr('type', 'equalto', 'Ready')
          | selectattr('status', 'equalto', 'True') | list | length > 0

    - name: Uncordon {{ node.name }}
      kubernetes.core.k8s_drain:
        kubeconfig: "{{ kubeconfig }}"
        name: "{{ node.name }}"
        state: uncordon
> --preserve keeps the node's ephemeral data across the upgrade, which matters on control plane nodes so that etcd data survives. With a single control plane node, expect a short API outage while it reboots. In an HA cluster, point --endpoints at a different control plane node than the one being upgraded; the task above uses the node's own IP for simplicity.
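For an HA control plane, the endpoint advice in the note could be applied with a per-node override. This is a sketch under assumptions: the endpoint key on each talos_nodes entry is hypothetical (for example, an entry for cp1 that also carries the address of a second control plane node), and the task falls back to the node's own IP when the key is absent.

```yaml
# Variant of the upgrade task in upgrade-node.yml.
# node.endpoint is a hypothetical extra key on each talos_nodes entry;
# without it, the request goes to the node itself, as in the original task.
- name: Upgrade Talos on {{ node.name }} through a separate endpoint
  ansible.builtin.command:
    argv:
      - talosctl
      - upgrade
      - --talosconfig
      - "{{ talosconfig }}"
      - --nodes
      - "{{ node.ip }}"
      - --endpoints
      - "{{ node.endpoint | default(node.ip) }}"
      - --image
      - "{{ installer_image }}"
      - --preserve
  register: upgrade
  changed_when: upgrade.rc == 0
```

Passing the command as argv rather than a folded string avoids quoting problems if a path contains spaces.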
Upgrading Kubernetes itself
The OS upgrade above does not change the Kubernetes version. Do that separately, once the OS is current:
talosctl --talosconfig talos/talosconfig -n 192.168.0.2 upgrade-k8s --to 1.31.0
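The same command can run from a playbook, with a plan preview first. This is a sketch that assumes a hypothetical k8s_version variable and that the first talos_nodes entry is a control plane node; --dry-run asks talosctl upgrade-k8s to show the upgrade plan without applying it.

```yaml
- name: Preview the Kubernetes upgrade plan (no changes made)
  ansible.builtin.command:
    argv:
      - talosctl
      - --talosconfig
      - "{{ talosconfig }}"
      - --nodes
      - "{{ talos_nodes[0].ip }}"
      - upgrade-k8s
      - --to
      - "{{ k8s_version }}"   # hypothetical variable, for example "1.31.0"
      - --dry-run
  register: k8s_plan
  changed_when: false

- name: Show the plan for review
  ansible.builtin.debug:
    msg: "{{ k8s_plan.stdout_lines + k8s_plan.stderr_lines }}"

- name: Apply the Kubernetes upgrade
  ansible.builtin.command:
    argv:
      - talosctl
      - --talosconfig
      - "{{ talosconfig }}"
      - --nodes
      - "{{ talos_nodes[0].ip }}"
      - upgrade-k8s
      - --to
      - "{{ k8s_version }}"
  changed_when: true
```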
Validation
ansible-playbook -i inventory/talos.ini rolling-upgrade.yml
talosctl --talosconfig talos/talosconfig -n 192.168.0.2 version
kubectl --kubeconfig talos/kubeconfig get nodes -o wide
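Each node should report the new Talos version and a Ready status. Do not assume a re-run is a no-op: the drain and uncordon steps converge, but talosctl upgrade may reinstall the image and reboot a node that already runs the target version. If a second run should leave current nodes alone, check each node's reported version before upgrading it.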
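To make re-runs explicit, the play can compare each node's reported OS image with the target version before draining it. The following is a sketch under assumptions: the wrapper file name maybe-upgrade-node.yml is hypothetical, and it assumes Talos nodes report an osImage string that contains the version, such as "Talos (v1.8.3)". The playbook's loop would include this file in place of upgrade-node.yml.

```yaml
---
# maybe-upgrade-node.yml (hypothetical wrapper, included once per node)
- name: Read the OS image reported by {{ node.name }}
  kubernetes.core.k8s_info:
    kubeconfig: "{{ kubeconfig }}"
    kind: Node
    name: "{{ node.name }}"
  register: current

- name: Fail clearly if {{ node.name }} is not registered in Kubernetes
  ansible.builtin.assert:
    that: current.resources | length > 0
    fail_msg: "Node {{ node.name }} not found; the name must match kubectl get nodes"

# Substring match on the assumed osImage format; talos_version comes from the inventory.
- name: Drain, upgrade, and uncordon only when the version differs
  ansible.builtin.include_tasks: upgrade-node.yml
  when: talos_version not in current.resources[0].status.nodeInfo.osImage
```

A substring test is deliberately simple. It would treat v1.8.3 as present in a hypothetical v1.8.30, so a stricter comparison may be worth adding.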
Troubleshooting
| Symptom | Likely cause | Fix |
|---|---|---|
| Drain hangs or times out | A PodDisruptionBudget or an unmanaged Pod blocks eviction | Raise wait_timeout, fix the PDB, or set force: true in delete_options for Pods with no controller |
| Node never returns after upgrade | Bad image or failed boot | Run talosctl rollback with --nodes set to the affected node to revert to the previous Talos version |
| etcd unhealthy after a control plane upgrade | Upgraded without --preserve, or two control plane nodes at once | Upgrade control plane nodes one at a time with --preserve; check talosctl etcd status |
| certificate signed by unknown authority | Wrong or missing talosconfig | Pass the cluster's --talosconfig generated at bootstrap |
| Kubernetes version unchanged after upgrade | The OS upgrade does not bump Kubernetes | Run talosctl upgrade-k8s separately, passing the target version with --to |
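The first row's fix for a hanging drain can be sketched as a variant of the drain task. The 600-second timeout is an illustrative value, not a recommendation from this guide.

```yaml
# Variant of the drain task in upgrade-node.yml for stubborn nodes.
- name: Cordon and drain {{ node.name }} (longer timeout, evict unmanaged Pods)
  kubernetes.core.k8s_drain:
    kubeconfig: "{{ kubeconfig }}"
    name: "{{ node.name }}"
    state: drain
    delete_options:
      ignore_daemonsets: true
      delete_emptydir_data: true
      force: true        # also removes Pods that no controller will recreate
      wait_timeout: 600  # illustrative value, doubled from the original 300
```

Because force deletes Pods that nothing will reschedule, it is safer to fix a blocking PodDisruptionBudget first and use force only for Pods that are known to be disposable.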
FAQ
Q. Does Talos use apt, dnf, or rpm-ostree for patching?
None of them. Talos is a single immutable image with no package manager. You "patch" by upgrading the whole image with talosctl upgrade, which reboots the node into the new version.
Q. How do I roll back a bad upgrade?
talosctl rollback, pointed at the node with --nodes, boots it back into the previous Talos version, which Talos retains on the other partition.
Q. Does Ansible SSH into the Talos nodes to reboot them?
No — there is no SSH. The reboot happens as part of talosctl upgrade, driven over the Talos API; Ansible runs connection: local on the control node.
Q. How do I patch without downtime?
Upgrade one node at a time (the loop above), draining first so workloads reschedule, and keep at least three control plane nodes so the API stays available during each reboot.
Q. Is the OS upgrade the same as a Kubernetes upgrade?
No. talosctl upgrade changes the Talos OS image; talosctl upgrade-k8s changes the Kubernetes component versions. Run them as separate, deliberate steps.
Related guides
• bootstrap a Talos Linux 1.8 cluster
• install an ingress controller on Talos Linux
• keeping Kubernetes clusters up to date
• drain and cordon with the kubernetes.core collection
• manage cluster resources with the k8s module
Conclusion
Patching Talos Linux is image-based, not package-based: talosctl upgrade swaps the OS image and reboots, and talosctl rollback undoes it. Ansible makes that safe and repeatable by draining each node first, upgrading one at a time with --preserve, waiting for Ready, and uncordoning — a true reboot-aware rolling upgrade driven entirely through the Talos and Kubernetes APIs, with no SSH and no in-place package edits.