Ansible on Talos Linux 1.8: Cluster Bootstrap Complete Guide

By Luca Berton · Published 2024-01-01 · Category: installation

Automate cluster bootstrap on Talos Linux 1.8 (GA 2024-10) with Ansible: bring up a fresh control plane and join workers idempotently.

Talos Linux 1.8 (released 2024) is a minimal, immutable, API-managed operating system built for one job: running Kubernetes. There is no SSH, no shell, and no package manager — you never log into a Talos node. The entire machine state is described by a declarative machine config (YAML) and applied over the Talos API with the talosctl client. That declarative model is exactly what makes Talos a natural fit for Ansible.

This guide automates a real cluster bootstrap on Talos Linux 1.8 end-to-end with Ansible: generate the machine configs, apply them, run the one-time talosctl bootstrap, fetch the kubeconfig, and wait for the nodes to become Ready, all idempotently.

How Ansible fits Talos Linux

Because Talos has no SSH, Ansible does not connect to the nodes the usual way, and there is no official Talos Ansible collection. The pattern that works is:

Run the play on the control node with connection: local.
Use Ansible to drive controlplane.yaml / worker.yaml generation from inventory data (cluster name, endpoint, config patches), giving one version-controlled source of truth.
Wrap talosctl with ansible.builtin.command, guarding the one-time steps with creates so re-runs stay idempotent.
Once Kubernetes is up, hand off to the kubernetes.core collection for in-cluster manifests (CNI, StorageClass, workloads).

Ansible owns the config and orchestration; talosctl owns the machine API; kubernetes.core owns in-cluster state.
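A minimal sketch of that hand-off, assuming the bootstrap playbook has already written the kubeconfig to talos/kubeconfig next to the playbook. The `apps` namespace is a hypothetical example of in-cluster state, not something the cluster needs.

```yaml
---
# Sketch: once talosctl has produced a kubeconfig, in-cluster state is
# ordinary kubernetes.core work with no talosctl involved.
- name: Manage in-cluster resources after bootstrap
  hosts: localhost
  connection: local
  gather_facts: false
  vars:
    # Assumption: the path written by `talosctl kubeconfig` in the bootstrap play
    kubeconfig_path: "{{ playbook_dir }}/talos/kubeconfig"
  tasks:
    - name: Ensure a hypothetical 'apps' namespace exists
      kubernetes.core.k8s:
        kubeconfig: "{{ kubeconfig_path }}"
        state: present
        definition:
          apiVersion: v1
          kind: Namespace
          metadata:
            name: apps
```

Because kubernetes.core.k8s compares the desired object with the live one, a re-run reports ok instead of changed once the namespace exists.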

Prerequisites

On the control node:

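talosctl matching the Talos version you are installing, because talosctl gen config pins the installer image (and therefore the Talos version written to disk) to its own version. brew install siderolabs/tap/talosctl installs the current release; to pin a 1.8.x client, download the matching binary from the Talos GitHub releases page.
kubectl and ansible-core 2.15+ with kubernetes.core 3.0 or later (ansible-galaxy collection install kubernetes.core), plus the kubernetes Python library (pip install kubernetes) for the post-bootstrap tasks.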
One machine booted off the Talos ISO (from the Image Factory) to be the control plane, and one or more for workers. Booted off the ISO, Talos runs in RAM in maintenance mode and writes nothing to disk until you apply a config.

Network: the control node needs direct access to each node on TCP 50000 (the Talos API), which the first apply must reach without any proxy, and to the control plane on TCP 6443 (the Kubernetes API) for kubectl and the kubernetes.core tasks.
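These prerequisites can be checked before anything touches a node. The play below is a sketch that assumes the inventory shown later in this guide (group talos, per-host talos_ip) and that `talosctl version --client` prints the client tag (for example v1.8.x) somewhere in its output; `talos_minor` is a hypothetical variable.

```yaml
---
# Sketch: fail fast on the control node before touching any Talos machine.
- name: Pre-flight checks for a Talos 1.8 bootstrap
  hosts: localhost
  connection: local
  gather_facts: false
  vars:
    # Assumption: the Talos minor release you intend to install
    talos_minor: "1.8"
  tasks:
    - name: Read the local talosctl client version
      ansible.builtin.command: talosctl version --client
      register: talosctl_version
      changed_when: false

    - name: Require a talosctl client from the same release line
      ansible.builtin.assert:
        that:
          # Matches e.g. "v1.8.3"; regex_escape turns "1.8" into "1\.8"
          - "talosctl_version.stdout is search('v' ~ (talos_minor | regex_escape) ~ '[.]')"
        fail_msg: "Expected a v{{ talos_minor }}.x talosctl, got: {{ talosctl_version.stdout }}"

    - name: Confirm every node answers on the Talos API port
      ansible.builtin.wait_for:
        host: "{{ hostvars[item].talos_ip }}"
        port: 50000
        timeout: 10
      loop: "{{ groups['talos'] }}"
```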

Bootstrap flow at a glance

talosctl gen config     # generate controlplane.yaml, worker.yaml, talosconfig
talosctl apply-config   # push config to each node (--insecure on first apply)
talosctl bootstrap      # ONCE, on a single control plane node -> forms etcd
talosctl kubeconfig     # download the cluster kubeconfig
kubectl get nodes       # verify

Inventory

Keep the cluster topology in inventory so the configs and commands stay data-driven:

# inventory/talos.ini
[talos_controlplane]
cp1 talos_ip=192.168.0.2

[talos_workers]
w1 talos_ip=192.168.0.10
w2 talos_ip=192.168.0.11

[talos:children]
talos_controlplane
talos_workers

[talos:vars]
ansible_connection=local
cluster_name=mycluster
cluster_endpoint=https://192.168.0.2:6443
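To make the machine configs themselves data-driven, Ansible can render a config patch from inventory and pass it to talosctl gen config --config-patch. This is a sketch: talos_install_disk is a hypothetical inventory variable (for example talos_install_disk=/dev/vda under [talos:vars]), and the patch only sets machine.install.disk. Because the play runs on localhost, which is not in the [talos] group, it reads group vars through the first control plane node's hostvars.

```yaml
---
# Sketch: render a patch from inventory, then bake it into the generated configs.
- name: Render machine configs from inventory data
  hosts: localhost
  connection: local
  gather_facts: false
  vars:
    work_dir: "{{ playbook_dir }}/talos"
    cp_host: "{{ groups['talos_controlplane'] | first }}"
    # Hypothetical inventory var; falls back to the talosctl default disk
    install_disk: "{{ hostvars[cp_host].talos_install_disk | default('/dev/sda') }}"
    talos_patch:
      machine:
        install:
          disk: "{{ install_disk }}"
  tasks:
    - name: Ensure working directory exists
      ansible.builtin.file:
        path: "{{ work_dir }}"
        state: directory
        mode: "0700"

    - name: Write the cluster-wide config patch
      ansible.builtin.copy:
        dest: "{{ work_dir }}/patch-cluster.yaml"
        content: "{{ talos_patch | to_nice_yaml }}"
        mode: "0600"

    - name: Generate configs with the patch applied (once)
      ansible.builtin.command:
        cmd: >-
          talosctl gen config
          {{ hostvars[cp_host].cluster_name }} {{ hostvars[cp_host].cluster_endpoint }}
          --config-patch @{{ work_dir }}/patch-cluster.yaml
          --output-dir {{ work_dir }}
        creates: "{{ work_dir }}/controlplane.yaml"
```

Because of the creates guard, a later patch change does not regenerate the files. Deleting and regenerating them on a running cluster is not safe either: unless you feed gen config a stable secrets bundle (talosctl gen secrets, then --with-secrets), every run mints a fresh cluster CA.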

Cluster bootstrap playbook

The play generates the machine configs once, applies the right config to each node, bootstraps a single control plane node, retrieves the kubeconfig, and waits until every node reports Ready. The one-time steps are guarded with creates so re-runs are safe.

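---
- name: Bootstrap a Talos Linux 1.8 cluster
  hosts: localhost
  connection: local
  gather_facts: false
  vars:
    work_dir: "{{ playbook_dir }}/talos"
    # The play runs on localhost, which is not in the [talos] group, so group
    # vars such as cluster_name are read through the first control plane node.
    cp_host: "{{ groups['talos_controlplane'] | first }}"
    cp_ip: "{{ hostvars[cp_host].talos_ip }}"
    talos_cluster_name: "{{ hostvars[cp_host].cluster_name }}"
    talos_cluster_endpoint: "{{ hostvars[cp_host].cluster_endpoint }}"
  tasks:
    - name: Ensure working directory exists
      ansible.builtin.file:
        path: "{{ work_dir }}"
        state: directory
        mode: "0700"

    - name: Generate machine configs and talosconfig (once)
      ansible.builtin.command:
        cmd: >-
          talosctl gen config {{ talos_cluster_name }} {{ talos_cluster_endpoint }}
          --output-dir {{ work_dir }}
        creates: "{{ work_dir }}/controlplane.yaml"

    # --insecure only works while a node is still in maintenance mode, so each
    # node gets exactly one insecure apply, recorded by a per-node marker file.
    - name: Apply control plane config (insecure, first boot only)
      ansible.builtin.shell:
        cmd: >-
          talosctl apply-config --insecure
          --nodes {{ hostvars[item].talos_ip }}
          --file {{ work_dir }}/controlplane.yaml
          && touch {{ work_dir }}/.applied-{{ item }}
        creates: "{{ work_dir }}/.applied-{{ item }}"
      loop: "{{ groups['talos_controlplane'] }}"

    - name: Apply worker config (insecure, first boot only)
      ansible.builtin.shell:
        cmd: >-
          talosctl apply-config --insecure
          --nodes {{ hostvars[item].talos_ip }}
          --file {{ work_dir }}/worker.yaml
          && touch {{ work_dir }}/.applied-{{ item }}
        creates: "{{ work_dir }}/.applied-{{ item }}"
      loop: "{{ groups['talos_workers'] }}"

    - name: Bootstrap etcd on one control plane node (only once)
      ansible.builtin.command:
        cmd: >-
          talosctl bootstrap
          --talosconfig {{ work_dir }}/talosconfig
          --nodes {{ cp_ip }} --endpoints {{ cp_ip }}
        creates: "{{ work_dir }}/.bootstrapped"
      register: bootstrap
      # The node installs to disk and reboots after its first apply; retry
      # until its Talos API accepts the bootstrap request.
      retries: 20
      delay: 15
      until: bootstrap is succeeded

    - name: Mark cluster as bootstrapped
      ansible.builtin.copy:
        dest: "{{ work_dir }}/.bootstrapped"
        content: "bootstrapped {{ cp_ip }}\n"
        mode: "0600"
      when: bootstrap is changed

    - name: Retrieve the kubeconfig
      ansible.builtin.command:
        cmd: >-
          talosctl kubeconfig {{ work_dir }}/kubeconfig
          --talosconfig {{ work_dir }}/talosconfig
          --nodes {{ cp_ip }} --endpoints {{ cp_ip }}
        creates: "{{ work_dir }}/kubeconfig"
      register: kubeconfig_fetch
      retries: 20
      delay: 15
      until: kubeconfig_fetch is succeeded

    - name: Wait for all nodes to become Ready
      kubernetes.core.k8s_info:
        kubeconfig: "{{ work_dir }}/kubeconfig"
        kind: Node
      register: nodes
      retries: 30
      delay: 10
      # Conditions are checked in order, so a failed API call is retried
      # instead of erroring on a missing 'resources' key.
      until:
        - nodes is succeeded
        - nodes.resources | length == groups['talos'] | length
        - nodes.resources | map(attribute='status.conditions') | flatten
          | selectattr('type', 'equalto', 'Ready')
          | selectattr('status', 'equalto', 'True')
          | list | length == groups['talos'] | length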

> The first apply-config uses --insecure because a node in maintenance mode has no PKI yet; once configured, it only accepts requests authenticated with the generated talosconfig. The bootstrap step must run exactly once, on a single control plane node; the .bootstrapped marker plus creates enforces that on re-runs.
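The marker file only records what this control node did: if a run is interrupted between bootstrap and writing the marker, or someone bootstrapped by hand, the next run would try again. A hedged alternative is to ask the cluster instead. The sketch below assumes that `talosctl etcd members` exits non-zero while etcd is not yet running, and reuses the talos/ layout of the playbook above. The trade-off: if the node is merely unreachable, the bootstrap task runs and fails as well, which at least surfaces the real problem.

```yaml
---
# Sketch: decide whether to bootstrap by querying etcd, not a local marker.
- name: Bootstrap etcd only if it is not running yet
  hosts: localhost
  connection: local
  gather_facts: false
  vars:
    work_dir: "{{ playbook_dir }}/talos"
    cp_host: "{{ groups['talos_controlplane'] | first }}"
    cp_ip: "{{ hostvars[cp_host].talos_ip }}"
    talos_args: >-
      --talosconfig {{ work_dir }}/talosconfig
      --nodes {{ cp_ip }} --endpoints {{ cp_ip }}
  tasks:
    - name: List etcd members (expected to fail before bootstrap)
      ansible.builtin.command: "talosctl etcd members {{ talos_args }}"
      register: etcd_members
      changed_when: false
      failed_when: false

    - name: Bootstrap etcd on the first control plane node
      ansible.builtin.command: "talosctl bootstrap {{ talos_args }}"
      when: etcd_members.rc != 0
```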

Validation

ansible-playbook -i inventory/talos.ini bootstrap-talos.yml

# then, from the playbook directory, against the fetched artifacts:
talosctl --talosconfig talos/talosconfig -e 192.168.0.2 -n 192.168.0.2 health
kubectl --kubeconfig talos/kubeconfig get nodes -o wide

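Run the playbook a second time to confirm idempotency: the tasks guarded by creates (gen config, bootstrap, kubeconfig) are skipped. A node that already has its config no longer accepts apply-config --insecure, since that mode exists only in maintenance mode, so the first-boot apply must also be guarded per node (for example with a marker file and creates) or the re-run fails on it.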
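The same checks can run as their own play, which is convenient for CI after every bootstrap. A sketch, assuming the talos/ artifacts written by the bootstrap playbook; the 10m timeout is an arbitrary choice.

```yaml
---
# Sketch: read-only validation of a bootstrapped Talos cluster.
- name: Verify a bootstrapped Talos cluster
  hosts: localhost
  connection: local
  gather_facts: false
  vars:
    work_dir: "{{ playbook_dir }}/talos"
    cp_host: "{{ groups['talos_controlplane'] | first }}"
    cp_ip: "{{ hostvars[cp_host].talos_ip }}"
  tasks:
    - name: Run talosctl health from the first control plane node
      ansible.builtin.command:
        cmd: >-
          talosctl health
          --talosconfig {{ work_dir }}/talosconfig
          --nodes {{ cp_ip }} --endpoints {{ cp_ip }}
          --wait-timeout 10m
      changed_when: false

    - name: List nodes through the fetched kubeconfig
      kubernetes.core.k8s_info:
        kubeconfig: "{{ work_dir }}/kubeconfig"
        kind: Node
      register: node_list

    - name: Show node names
      ansible.builtin.debug:
        msg: "{{ node_list.resources | map(attribute='metadata.name') | list }}"
```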

Troubleshooting

Symptom	Likely cause	Fix
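`specified install disk does not exist: "/dev/sda"`	Node's disk is `vda`/`nvme0n1`, not `sda`	List the disks with `talosctl get disks --insecure -n <node-ip>` (older talosctl: `talosctl disks --insecure -n <node-ip>`), set `machine.install.disk` in the machine config, and re-apply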
`connection refused` / timeout on apply-config	Talos API port 50000 not reachable	Open TCP 50000 to the node; the first apply must hit the node directly (no endpoint proxy yet)
`certificate signed by unknown authority`	Applying without `--insecure` before PKI exists, or wrong `talosconfig`	Use `--insecure` for the first apply; afterwards pass the generated `--talosconfig`
etcd never forms / API never comes up	`bootstrap` not run, or run on more than one node	Bootstrap exactly once, on a single control plane node
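Nodes stuck `NotReady`	No CNI running: Talos deploys Flannel by default, so this usually means the CNI was set to `none` (e.g. to bring your own Cilium) and nothing is installed yet	Install your CNI (e.g. Cilium) with `kubernetes.core.k8s` or `kubernetes.core.helm` after bootstrap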

FAQ

Q. How does Ansible connect to Talos if there's no SSH? It doesn't connect to the nodes at all. The play runs locally on the control node (connection: local) and drives the cluster through talosctl, which speaks the Talos API on port 50000.

Q. Can I use apt/dnf or a shell task to change a Talos node? No. Talos is immutable and ships no package manager or shell. Everything is set through the machine config; to change a node you edit its config and run talosctl apply-config, and you upgrade with talosctl upgrade.
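In Ansible terms, day-2 changes are the same talosctl calls authenticated with the generated talosconfig instead of --insecure. A sketch, assuming the talos/ layout from the bootstrap playbook; talos_installer_image is a hypothetical variable you would set to an installer image reference (for example one from the Image Factory) only when you want to upgrade.

```yaml
---
# Sketch: day-2 changes go through the authenticated Talos API, never --insecure.
- name: Re-apply edited worker config and optionally upgrade
  hosts: localhost
  connection: local
  gather_facts: false
  vars:
    work_dir: "{{ playbook_dir }}/talos"
    cp_host: "{{ groups['talos_controlplane'] | first }}"
    cp_ip: "{{ hostvars[cp_host].talos_ip }}"
    # Hypothetical: define talos_installer_image to trigger the upgrade task
  tasks:
    - name: Preview how each worker would take the new config
      ansible.builtin.command:
        cmd: >-
          talosctl apply-config --dry-run
          --talosconfig {{ work_dir }}/talosconfig
          --nodes {{ hostvars[item].talos_ip }} --endpoints {{ cp_ip }}
          --file {{ work_dir }}/worker.yaml
      loop: "{{ groups['talos_workers'] }}"
      changed_when: false

    - name: Apply the edited worker config
      ansible.builtin.command:
        cmd: >-
          talosctl apply-config
          --talosconfig {{ work_dir }}/talosconfig
          --nodes {{ hostvars[item].talos_ip }} --endpoints {{ cp_ip }}
          --file {{ work_dir }}/worker.yaml
      loop: "{{ groups['talos_workers'] }}"

    - name: Upgrade workers in sequence
      ansible.builtin.command:
        cmd: >-
          talosctl upgrade
          --talosconfig {{ work_dir }}/talosconfig
          --nodes {{ hostvars[item].talos_ip }} --endpoints {{ cp_ip }}
          --image {{ talos_installer_image }}
      loop: "{{ groups['talos_workers'] }}"
      when: talos_installer_image is defined
```

The control plane node serves as the API endpoint and proxies each request to the worker named in --nodes, so workers never need to be reachable as endpoints themselves.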

Q. Which talosctl version should I use? Match it to the Talos version you want to run — the talosctl that generates the machine config determines the installed Talos version. For Talos 1.8, use a 1.8.x talosctl.

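Q. Is the bootstrap idempotent? bootstrap is a one-time operation, so the playbook guards it with a marker file and creates; a second run skips it. apply-config is declarative and safe to repeat, but once a node is configured it must go through the generated talosconfig; the --insecure form only works against nodes still in maintenance mode.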

Q. How do I add more workers later? Boot the new machine off the Talos ISO, add it to the [talos_workers] group, and re-run the play. Only the new node needs its first-boot apply; worker.yaml already carries the cluster endpoint and join token, so the node joins the existing cluster automatically.

Conclusion

Talos Linux 1.8 turns a cluster into pure declarative config, and Ansible is the ideal driver for it: drive config generation from inventory, wrap talosctl for the gen config → apply-config → bootstrap → kubeconfig flow, and finish with kubernetes.core for in-cluster resources. Guard the one-time steps with creates, and keep controlplane.yaml and worker.yaml in Git only in encrypted form (for example with Ansible Vault or SOPS), because they embed the cluster CA keys and join tokens. With cluster_endpoint pointing at a load balancer or VIP rather than a single node's IP, the same repeatable bootstrap scales from a single control plane node to a full HA cluster by editing inventory alone.
