Ansible on Talos Linux 1.8: Cluster Bootstrap Complete Guide
By Luca Berton · Published 2024-01-01 · Category: installation
Automate cluster bootstrap on Talos Linux 1.8 (GA 2024-10) with Ansible. Bring up a fresh control plane and join workers idempotently.
Talos Linux 1.8 (released 2024) is a minimal, immutable, API-managed operating system built for one job: running Kubernetes. There is no SSH, no shell, and no package manager — you never log into a Talos node. The entire machine state is described by a declarative machine config (YAML) and applied over the Talos API with the talosctl client. That declarative model is exactly what makes Talos a natural fit for Ansible.
This guide automates a real cluster bootstrap on Talos Linux 1.8 end-to-end with Ansible: template the machine configs, apply them, run the one-time talosctl bootstrap, fetch the kubeconfig, and wait for the nodes to become Ready — all idempotently.
How Ansible fits Talos Linux
Because Talos has no SSH, Ansible does not connect to the nodes the usual way, and there is no official Talos Ansible collection. The pattern that works is:
• Run the play on the control node with connection: local.
• Use Ansible to template controlplane.yaml / worker.yaml from inventory (one version-controlled source of truth).
• Wrap talosctl with ansible.builtin.command, made idempotent with creates.
• Once Kubernetes is up, hand off to the kubernetes.core collection for in-cluster manifests (CNI, StorageClass, workloads).
Ansible owns the config and orchestration; talosctl owns the machine API; kubernetes.core owns in-cluster state.
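The hand-off to kubernetes.core in the last bullet can be sketched as a single task. This is a minimal illustration, assuming the kubeconfig has already been fetched into a talos/ directory next to the playbook; the kubeconfig_path variable and the namespace name platform are hypothetical and only serve the example.

```yaml
# Sketch only: runs on the control node once the Kubernetes API is reachable.
- name: Manage in-cluster state after bootstrap
  hosts: localhost
  connection: local
  gather_facts: false
  vars:
    kubeconfig_path: "{{ playbook_dir }}/talos/kubeconfig"  # assumed location
  tasks:
    - name: Ensure an example namespace exists
      kubernetes.core.k8s:
        kubeconfig: "{{ kubeconfig_path }}"
        state: present
        definition:
          apiVersion: v1
          kind: Namespace
          metadata:
            name: platform  # hypothetical name
```

Nothing in this task touches talosctl: from here on the cluster is managed like any other Kubernetes cluster.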
Prerequisites
On the control node:
• talosctl matching the Talos version you are installing — the talosctl that generates the config determines the installed Talos version. Install with brew install siderolabs/tap/talosctl.
• kubectl and ansible-core 2.15+ with kubernetes.core 3.x+ (ansible-galaxy collection install kubernetes.core) plus the kubernetes Python library for the post-bootstrap tasks.
• One machine booted off the Talos ISO (from the Image Factory) to be the control plane, and one or more for workers. Booted off the ISO, Talos runs in RAM in maintenance mode and writes nothing to disk until you apply a config.
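The version-matching requirement can be checked before anything else runs. The sketch below assumes that the output of talosctl version --client contains the release tag (for example v1.8.0); the expected_talos_minor variable is a hypothetical name introduced for this example.

```yaml
# Sketch only: fails fast when the local talosctl is not a 1.8.x build.
- name: Preflight the control node tooling
  hosts: localhost
  connection: local
  gather_facts: false
  vars:
    expected_talos_minor: "v1.8."  # assumption: prefix searched in the client version output
  tasks:
    - name: Read the talosctl client version
      ansible.builtin.command:
        cmd: talosctl version --client
      register: talosctl_version
      changed_when: false  # read-only query

    - name: Fail early when talosctl is not a 1.8.x build
      ansible.builtin.assert:
        that:
          - expected_talos_minor in talosctl_version.stdout
        fail_msg: "talosctl does not report a {{ expected_talos_minor }}x version"
```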
Network: the workstation needs direct access to each node on TCP 50000 (the Talos API) for the first apply, and to the control plane on TCP 6443 (the Kubernetes API) for the post-bootstrap tasks.
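Reachability of the Talos API port can be verified from the control node before the first apply. This is a minimal sketch, assuming the talos inventory group and the talos_ip host variable defined in the Inventory section of this guide; the 60 second timeout is an arbitrary choice.

```yaml
# Sketch only: checks TCP 50000 on every node listed in the talos group.
- name: Check that every Talos node answers on the API port
  hosts: localhost
  connection: local
  gather_facts: false
  tasks:
    - name: Wait for TCP 50000 on each node
      ansible.builtin.wait_for:
        host: "{{ hostvars[item].talos_ip }}"
        port: 50000
        state: started
        timeout: 60  # assumed budget, tune for your environment
      loop: "{{ groups['talos'] }}"
```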
Bootstrap flow at a glance
talosctl gen config # generate controlplane.yaml, worker.yaml, talosconfig
talosctl apply-config # push config to each node (--insecure on first apply)
talosctl bootstrap # ONCE, on a single control plane node -> forms etcd
talosctl kubeconfig # download the cluster kubeconfig
kubectl get nodes # verify
Inventory
Keep the cluster topology in inventory so the configs and commands stay data-driven:
# inventory/talos.ini
[talos_controlplane]
cp1 talos_ip=192.168.0.2
[talos_workers]
w1 talos_ip=192.168.0.10
w2 talos_ip=192.168.0.11
[talos:children]
talos_controlplane
talos_workers
[talos:vars]
ansible_connection=local
cluster_name=mycluster
cluster_endpoint=https://192.168.0.2:6443
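Because the play runs on localhost and reads the topology through groups and hostvars, a short sanity check on the inventory can catch mistakes early. The following is a sketch under the assumption that the group names and the talos_ip variable match the inventory above.

```yaml
# Sketch only: validates the inventory before any talosctl call is made.
- name: Sanity-check the Talos inventory
  hosts: localhost
  connection: local
  gather_facts: false
  tasks:
    - name: Require at least one control plane node
      ansible.builtin.assert:
        that:
          - groups['talos_controlplane'] | default([]) | length > 0
        fail_msg: "Inventory group talos_controlplane is empty"

    - name: Require talos_ip on every node
      ansible.builtin.assert:
        that:
          - hostvars[item].talos_ip is defined
        fail_msg: "{{ item }} has no talos_ip"
      loop: "{{ groups['talos'] | default([]) }}"
```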
Cluster bootstrap playbook
The play generates the machine configs once, applies the right config to each node, bootstraps a single control plane node, and retrieves the kubeconfig. The one-time steps are guarded with creates so re-runs are safe.
---
- name: Bootstrap a Talos Linux 1.8 cluster
  hosts: localhost
  connection: local
  gather_facts: false
  vars:
    work_dir: "{{ playbook_dir }}/talos"
    cp_ip: "{{ hostvars['cp1'].talos_ip }}"
    # localhost is not a member of the [talos] group, so the group vars
    # are read through the hostvars of a host that is.
    cluster_name: "{{ hostvars['cp1'].cluster_name }}"
    cluster_endpoint: "{{ hostvars['cp1'].cluster_endpoint }}"
  tasks:
    - name: Ensure working directory exists
      ansible.builtin.file:
        path: "{{ work_dir }}"
        state: directory
        mode: "0700"

    - name: Generate machine configs and talosconfig (once)
      ansible.builtin.command:
        cmd: >-
          talosctl gen config {{ cluster_name }} {{ cluster_endpoint }}
          --output-dir {{ work_dir }}
        creates: "{{ work_dir }}/controlplane.yaml"

    # --insecure is only accepted while a node is in maintenance mode, so
    # each node gets a marker file and is skipped on re-runs.
    - name: Apply control plane config (insecure, first boot)
      ansible.builtin.command:
        cmd: >-
          talosctl apply-config --insecure
          --nodes {{ hostvars[item].talos_ip }}
          --file {{ work_dir }}/controlplane.yaml
        creates: "{{ work_dir }}/.applied-{{ item }}"
      loop: "{{ groups['talos_controlplane'] }}"

    - name: Apply worker config (insecure, first boot)
      ansible.builtin.command:
        cmd: >-
          talosctl apply-config --insecure
          --nodes {{ hostvars[item].talos_ip }}
          --file {{ work_dir }}/worker.yaml
        creates: "{{ work_dir }}/.applied-{{ item }}"
      loop: "{{ groups['talos_workers'] }}"

    - name: Mark nodes as configured
      ansible.builtin.copy:
        dest: "{{ work_dir }}/.applied-{{ item }}"
        content: "applied {{ hostvars[item].talos_ip }}\n"
        mode: "0600"
      loop: "{{ groups['talos'] }}"

    # The node installs and reboots after the first apply, so retry until
    # its API accepts the authenticated bootstrap call.
    - name: Bootstrap etcd on one control plane node (only once)
      ansible.builtin.command:
        cmd: >-
          talosctl bootstrap
          --talosconfig {{ work_dir }}/talosconfig
          --nodes {{ cp_ip }} --endpoints {{ cp_ip }}
        creates: "{{ work_dir }}/.bootstrapped"
      register: bootstrap
      retries: 30
      delay: 10
      until: bootstrap is succeeded

    - name: Mark cluster as bootstrapped
      ansible.builtin.copy:
        dest: "{{ work_dir }}/.bootstrapped"
        content: "bootstrapped {{ cp_ip }}\n"
        mode: "0600"
      when: bootstrap is changed

    - name: Retrieve the kubeconfig
      ansible.builtin.command:
        cmd: >-
          talosctl kubeconfig {{ work_dir }}/kubeconfig
          --talosconfig {{ work_dir }}/talosconfig
          --nodes {{ cp_ip }} --endpoints {{ cp_ip }}
        creates: "{{ work_dir }}/kubeconfig"

    - name: Wait for all nodes to become Ready
      kubernetes.core.k8s_info:
        kubeconfig: "{{ work_dir }}/kubeconfig"
        kind: Node
      register: nodes
      retries: 30
      delay: 10
      until:
        # Checked first so a not-yet-reachable API is simply retried.
        - nodes is succeeded
        - nodes.resources | length > 0
        - >-
          nodes.resources | map(attribute='status.conditions') | flatten
          | selectattr('type', 'equalto', 'Ready')
          | selectattr('status', 'equalto', 'True')
          | list | length == nodes.resources | length
> The first apply-config uses --insecure because the node's PKI is not yet set up; later management uses the generated talosconfig. The bootstrap step must run exactly once on a single control plane node — the .bootstrapped marker plus creates enforces that on re-runs.
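For later management, the same config can be applied over the authenticated API. This is a minimal sketch, assuming the talos/ working directory produced by the bootstrap play and the inventory hosts cp1 and w1; node_ip is a hypothetical variable name, and the task makes no attempt to detect whether the config actually changed.

```yaml
# Sketch only: day-two apply against a node that already holds its config.
- name: Re-apply a machine config to an already configured node
  hosts: localhost
  connection: local
  gather_facts: false
  vars:
    work_dir: "{{ playbook_dir }}/talos"
    cp_ip: "{{ hostvars['cp1'].talos_ip }}"
    node_ip: "{{ hostvars['w1'].talos_ip }}"  # example target from the inventory
  tasks:
    - name: Apply worker.yaml over the authenticated API (no --insecure)
      ansible.builtin.command:
        cmd: >-
          talosctl apply-config
          --talosconfig {{ work_dir }}/talosconfig
          --endpoints {{ cp_ip }} --nodes {{ node_ip }}
          --file {{ work_dir }}/worker.yaml
      changed_when: true  # the sketch always reports a change
```

The request goes to the control plane endpoint, which forwards it to the target node, so only the endpoint has to be reachable from the workstation at this stage.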
Validation
ansible-playbook -i inventory/talos.ini bootstrap-talos.yml
# then, against the fetched artifacts:
talosctl --talosconfig talos/talosconfig -n 192.168.0.2 health
kubectl --kubeconfig talos/kubeconfig get nodes -o wide
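Run the playbook a second time to confirm idempotency: gen config, bootstrap, and kubeconfig are skipped by their creates guards. apply-config --insecure is only accepted while a node is in maintenance mode, so it must not run again against a node that already holds its config; later changes go through apply-config with --talosconfig.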
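The same checks can run from Ansible so that validation becomes part of the pipeline. The sketch below assumes the talos/ working directory and the cp1 host from this guide; it only reads state and reports the node names it finds.

```yaml
# Sketch only: read-only verification of the bootstrapped cluster.
- name: Verify the Talos cluster from Ansible
  hosts: localhost
  connection: local
  gather_facts: false
  vars:
    work_dir: "{{ playbook_dir }}/talos"
    cp_ip: "{{ hostvars['cp1'].talos_ip }}"
  tasks:
    - name: Run the Talos health check
      ansible.builtin.command:
        cmd: >-
          talosctl health
          --talosconfig {{ work_dir }}/talosconfig
          --nodes {{ cp_ip }} --endpoints {{ cp_ip }}
      changed_when: false  # health only inspects the cluster

    - name: List nodes through the fetched kubeconfig
      kubernetes.core.k8s_info:
        kubeconfig: "{{ work_dir }}/kubeconfig"
        kind: Node
      register: nodes

    - name: Show node names
      ansible.builtin.debug:
        msg: "{{ nodes.resources | map(attribute='metadata.name') | list }}"
```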
Troubleshooting
| Symptom | Likely cause | Fix |
|---|---|---|
| specified install disk does not exist: "/dev/sda" | Node's disk is vda/nvme0n1, not sda | Run talosctl disks --insecure -n NODE_IP (the node's address), set machine.install.disk in the machine config, re-apply |
| connection refused / timeout on apply-config | Talos API port 50000 not reachable | Open TCP 50000 to the node; the first apply must hit the node directly (no endpoint proxy yet) |
| certificate signed by unknown authority | Applying without --insecure before PKI exists, or wrong talosconfig | Use --insecure for the first apply; afterwards pass the generated --talosconfig |
| etcd never forms / API never comes up | bootstrap not run, or run on more than one node | Bootstrap exactly once, on a single control plane node |
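| Worker stuck NotReady | No CNI running: the default Flannel CNI was disabled (cluster.network.cni.name: none) and no replacement is installed yet | Apply your CNI (e.g. Cilium) with kubernetes.core.k8s after bootstrap |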
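The install disk fix from the first row can be expressed as a config patch written by Ansible and passed to apply-config. This is a sketch under stated assumptions: talos_install_disk is a hypothetical variable, /dev/vda is only an example value, and the node is assumed to still be in maintenance mode.

```yaml
# Sketch only: overrides machine.install.disk for one node at apply time.
- name: Apply a machine config with a corrected install disk
  hosts: localhost
  connection: local
  gather_facts: false
  vars:
    work_dir: "{{ playbook_dir }}/talos"
    node_ip: "{{ hostvars['cp1'].talos_ip }}"
    talos_install_disk: /dev/vda  # hypothetical variable; use the disk the node reports
  tasks:
    - name: Write a patch that overrides machine.install.disk
      ansible.builtin.copy:
        dest: "{{ work_dir }}/install-disk.patch.yaml"
        mode: "0600"
        content: |
          machine:
            install:
              disk: {{ talos_install_disk }}

    - name: Apply controlplane.yaml with the patch (node still in maintenance mode)
      ansible.builtin.command:
        cmd: >-
          talosctl apply-config --insecure
          --nodes {{ node_ip }}
          --file {{ work_dir }}/controlplane.yaml
          --config-patch @{{ work_dir }}/install-disk.patch.yaml
      changed_when: true  # the sketch always reports a change
```

Keeping the disk in a patch leaves the generated controlplane.yaml untouched, so the same base config can serve nodes with different disk names.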
FAQ
Q. How does Ansible connect to Talos if there's no SSH?
It doesn't connect to the nodes at all. The play runs locally on the control node (connection: local) and drives the cluster through talosctl, which speaks the Talos API on port 50000.
Q. Can I use apt/dnf or a shell task to change a Talos node?
No. Talos is immutable and ships no package manager or shell. Everything is set through the machine config; to change a node you edit its config and run talosctl apply-config, and you upgrade with talosctl upgrade.
Q. Which talosctl version should I use?
Match it to the Talos version you want to run — the talosctl that generates the machine config determines the installed Talos version. For Talos 1.8, use a 1.8.x talosctl.
Q. Is the bootstrap idempotent?
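Re-applying an unchanged machine config is safe because apply-config is declarative, but once a node has left maintenance mode the call must use --talosconfig instead of --insecure. bootstrap is a one-time operation, so the playbook guards it with a marker file and creates; a second run skips it.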
Q. How do I add more workers later?
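Boot the new machine off the Talos ISO, add it to the [talos_workers] group, and apply worker.yaml to it with talosctl apply-config --insecure; the node joins the existing cluster automatically. Re-running the play achieves this as long as nodes that are already configured are skipped, because they reject --insecure once they leave maintenance mode.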
Related guides
• install an ingress controller on Talos Linux 1.8
• provision StorageClass and PVCs on Talos Linux 1.8
• reboot-aware patching workflow for Talos Linux
• manage cluster resources with the kubernetes.core.k8s module
• bootstrap RKE2 with Ansible
Conclusion
Talos Linux 1.8 turns a cluster into pure declarative config, and Ansible is the ideal driver for it: template the machine configs from inventory, wrap talosctl for the gen config → apply-config → bootstrap → kubeconfig flow, and finish with kubernetes.core for in-cluster resources. Guard the one-time steps with creates, keep controlplane.yaml and worker.yaml in Git, and you get a repeatable bootstrap that scales from a single control plane node to a full HA cluster by editing inventory alone.