AnsiblePilot — Master Ansible Automation

AnsiblePilot is the leading resource for learning Ansible automation, DevOps, and infrastructure as code. Browse over 1,400 tutorials covering Ansible modules, playbooks, roles, collections, and real-world examples. Whether you are a beginner or an experienced engineer, our step-by-step guides help you automate Linux, Windows, cloud, containers, and network infrastructure.

About Luca Berton

Luca Berton is an Ansible automation expert, author of 8 Ansible books published by Apress and Leanpub including "Ansible for VMware by Examples" and "Ansible for Kubernetes by Example", and creator of the Ansible Pilot YouTube channel. He shares practical automation knowledge through tutorials, books, and video courses to help IT professionals and DevOps engineers master infrastructure automation.

Ansible on Talos Linux 1.8: Cluster Bootstrap Complete Guide

By Luca Berton · Published 2024-01-01 · Category: installation

Automate cluster bootstrap on Talos Linux 1.8 (GA 2024-10) with Ansible. Bring up a fresh control plane and join workers idempotently.

Talos Linux 1.8 (released 2024) is a minimal, immutable, API-managed operating system built for one job: running Kubernetes. There is no SSH, no shell, and no package manager — you never log into a Talos node. The entire machine state is described by a declarative machine config (YAML) and applied over the Talos API with the talosctl client. That declarative model is exactly what makes Talos a natural fit for Ansible.

This guide automates a real cluster bootstrap on Talos Linux 1.8 end-to-end with Ansible: template the machine configs, apply them, run the one-time talosctl bootstrap, fetch the kubeconfig, and wait for the nodes to become Ready — all idempotently.

How Ansible fits Talos Linux

Because Talos has no SSH, Ansible does not connect to the nodes the usual way, and there is no official Talos Ansible collection. The pattern that works is:
• Run the play on the control node with connection: local.
• Use Ansible to template controlplane.yaml / worker.yaml from inventory (one version-controlled source of truth).
• Wrap talosctl with ansible.builtin.command, made idempotent with creates.
• Once Kubernetes is up, hand off to the kubernetes.core collection for in-cluster manifests (CNI, StorageClass, workloads).

Ansible owns the config and orchestration; talosctl owns the machine API; kubernetes.core owns in-cluster state.
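As a rough sketch of the hand-off to kubernetes.core, the play below creates a namespace once a kubeconfig is available. The kubeconfig location follows the working directory used later in this guide; the namespace name demo is a hypothetical example, not something the guide prescribes.

```yaml
---
# Sketch only: the namespace name "demo" is a hypothetical example.
- name: Hand off to kubernetes.core after bootstrap
  hosts: localhost
  connection: local
  gather_facts: false
  vars:
    # Assumed to be the kubeconfig fetched by the bootstrap play.
    kubeconfig_path: "{{ playbook_dir }}/talos/kubeconfig"
  tasks:
    - name: Ensure an example namespace exists
      kubernetes.core.k8s:
        kubeconfig: "{{ kubeconfig_path }}"
        state: present
        definition:
          apiVersion: v1
          kind: Namespace
          metadata:
            name: demo
```

The same module accepts any manifest under definition, so a CNI or StorageClass manifest would follow the same shape.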

See also: Ansible on Talos Linux 1.8: Ingress Controller Installation Complete Guide

Prerequisites

On the control node:
• talosctl matching the Talos version you are installing: the talosctl that generates the config determines the installed Talos version. Install with brew install siderolabs/tap/talosctl.
• kubectl and ansible-core 2.15+ with kubernetes.core 3.x+ (ansible-galaxy collection install kubernetes.core) plus the kubernetes Python library for the post-bootstrap tasks.
• One machine booted off the Talos ISO (from the Image Factory) to be the control plane, and one or more for workers. Booted off the ISO, Talos runs in RAM in maintenance mode and writes nothing to disk until you apply a config.
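One way to enforce the version-matching prerequisite is a small pre-flight play. This is a sketch under one assumption: that the output of talosctl version --client contains the release tag in the form v1.8.x. Adjust the string test if your client prints it differently.

```yaml
---
- name: Check the talosctl client version
  hosts: localhost
  connection: local
  gather_facts: false
  tasks:
    - name: Read the talosctl client version
      ansible.builtin.command:
        cmd: talosctl version --client
      register: talosctl_version
      changed_when: false

    # Assumption: the client output includes a tag such as "v1.8.0".
    - name: Fail early if the client is not a 1.8.x build
      ansible.builtin.assert:
        that:
          - "'v1.8.' in talosctl_version.stdout"
        fail_msg: "Expected a talosctl 1.8.x client"
```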

Network: the workstation needs direct access to each node on TCP 50000 (the Talos API) for the first apply, and the Kubernetes API runs on 6443.
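The network requirement can be checked before any config is applied. The sketch below assumes the talos inventory group and the talos_ip host variable defined in the Inventory section of this guide, and simply waits for TCP 50000 to accept connections on each node. The 60-second timeout is an arbitrary choice.

```yaml
---
- name: Check that the Talos API is reachable
  hosts: localhost
  connection: local
  gather_facts: false
  tasks:
    - name: Wait for TCP 50000 on every Talos node
      ansible.builtin.wait_for:
        host: "{{ hostvars[item].talos_ip }}"
        port: 50000
        timeout: 60
      loop: "{{ groups['talos'] }}"
```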

Bootstrap flow at a glance

talosctl gen config     # generate controlplane.yaml, worker.yaml, talosconfig
talosctl apply-config   # push config to each node (--insecure on first apply)
talosctl bootstrap      # ONCE, on a single control plane node -> forms etcd
talosctl kubeconfig     # download the cluster kubeconfig
kubectl get nodes       # verify

See also: Ansible on Talos Linux 1.8: StorageClass and PVC Provisioning Complete Guide

Inventory

Keep the cluster topology in inventory so the configs and commands stay data-driven:

# inventory/talos.ini
[talos_controlplane]
cp1 talos_ip=192.168.0.2

[talos_workers]
w1 talos_ip=192.168.0.10
w2 talos_ip=192.168.0.11

[talos:children]
talos_controlplane
talos_workers

[talos:vars]
ansible_connection=local
cluster_name=mycluster
cluster_endpoint=https://192.168.0.2:6443
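Before running anything against the nodes, it can help to confirm that the inventory parses as intended. The play below is a minimal sketch that relies only on the group names and the talos_ip variable shown above.

```yaml
---
- name: Validate the Talos inventory
  hosts: localhost
  connection: local
  gather_facts: false
  tasks:
    - name: Require at least one control plane node
      ansible.builtin.assert:
        that:
          - groups['talos_controlplane'] | length >= 1

    - name: Show the node IPs that the play will target
      ansible.builtin.debug:
        msg: "{{ groups['talos'] | map('extract', hostvars, 'talos_ip') | list }}"
```

Run it with the same -i inventory/talos.ini option as the main playbook.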

Cluster bootstrap playbook

The play generates the machine configs once, applies the right config to each node, bootstraps a single control plane node, and retrieves the kubeconfig. The one-time steps are guarded with creates so re-runs are safe.

---
- name: Bootstrap a Talos Linux 1.8 cluster
  hosts: localhost
  connection: local
  gather_facts: false
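  vars:
    work_dir: "{{ playbook_dir }}/talos"
    cp_ip: "{{ hostvars['cp1'].talos_ip }}"
    # The play targets localhost, which is not in the talos group, so the
    # group variables are read from the control plane host explicitly.
    cluster_name: "{{ hostvars['cp1'].cluster_name }}"
    cluster_endpoint: "{{ hostvars['cp1'].cluster_endpoint }}"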
  tasks:
    - name: Ensure working directory exists
      ansible.builtin.file:
        path: "{{ work_dir }}"
        state: directory
        mode: "0700"

    - name: Generate machine configs and talosconfig (once)
      ansible.builtin.command:
        cmd: >-
          talosctl gen config {{ cluster_name }} {{ cluster_endpoint }}
          --output-dir {{ work_dir }}
        creates: "{{ work_dir }}/controlplane.yaml"

    - name: Apply control plane config (insecure, first boot)
      ansible.builtin.command:
        cmd: >-
          talosctl apply-config --insecure
          --nodes {{ hostvars[item].talos_ip }}
          --file {{ work_dir }}/controlplane.yaml
      loop: "{{ groups['talos_controlplane'] }}"

    - name: Apply worker config (insecure, first boot)
      ansible.builtin.command:
        cmd: >-
          talosctl apply-config --insecure
          --nodes {{ hostvars[item].talos_ip }}
          --file {{ work_dir }}/worker.yaml
      loop: "{{ groups['talos_workers'] }}"

    - name: Bootstrap etcd on one control plane node (only once)
      ansible.builtin.command:
        cmd: >-
          talosctl bootstrap
          --talosconfig {{ work_dir }}/talosconfig
          --nodes {{ cp_ip }}
          --endpoints {{ cp_ip }}
        creates: "{{ work_dir }}/.bootstrapped"
      register: bootstrap

    - name: Mark cluster as bootstrapped
      ansible.builtin.copy:
        dest: "{{ work_dir }}/.bootstrapped"
        content: "bootstrapped {{ cp_ip }}\n"
        mode: "0600"
      when: bootstrap is changed

    - name: Retrieve the kubeconfig
      ansible.builtin.command:
        cmd: >-
          talosctl kubeconfig {{ work_dir }}/kubeconfig
          --talosconfig {{ work_dir }}/talosconfig
          --nodes {{ cp_ip }}
          --endpoints {{ cp_ip }}
        creates: "{{ work_dir }}/kubeconfig"

    - name: Wait for all nodes to become Ready
      kubernetes.core.k8s_info:
        kubeconfig: "{{ work_dir }}/kubeconfig"
        kind: Node
      register: nodes
      retries: 30
      delay: 10
      until:
        - nodes.resources | length > 0
        - >-
          nodes.resources | map(attribute='status.conditions') | flatten
          | selectattr('type', 'equalto', 'Ready')
          | selectattr('status', 'equalto', 'True')
          | list | length == nodes.resources | length

> The first apply-config uses --insecure because the node's PKI is not yet set up; later management uses the generated talosconfig. The bootstrap step must run exactly once on a single control plane node — the .bootstrapped marker plus creates enforces that on re-runs.
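For the later, authenticated management that the note mentions, a sketch of re-applying the worker config with the generated talosconfig could look like the following. It assumes the same working directory and inventory as the bootstrap play, and routes requests through the control plane node as the endpoint. Treat it as an illustration rather than the guide's own playbook.

```yaml
---
- name: Re-apply worker config to already configured nodes
  hosts: localhost
  connection: local
  gather_facts: false
  vars:
    work_dir: "{{ playbook_dir }}/talos"
    cp_ip: "{{ hostvars['cp1'].talos_ip }}"
  tasks:
    # No --insecure here: the node is authenticated via the talosconfig.
    - name: Apply worker config using the generated talosconfig
      ansible.builtin.command:
        cmd: >-
          talosctl apply-config
          --talosconfig {{ work_dir }}/talosconfig
          --endpoints {{ cp_ip }}
          --nodes {{ hostvars[item].talos_ip }}
          --file {{ work_dir }}/worker.yaml
      loop: "{{ groups['talos_workers'] }}"
```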

See also: Ansible on Microk8s: Cluster Bootstrap Complete Guide

Validation

ansible-playbook -i inventory/talos.ini bootstrap-talos.yml

# then, against the fetched artifacts:
talosctl --talosconfig talos/talosconfig -e 192.168.0.2 -n 192.168.0.2 health
kubectl --kubeconfig talos/kubeconfig get nodes -o wide
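The health check can also be run from Ansible so that validation is part of the automation. The sketch below assumes the same working directory and control plane host as the bootstrap play; the 10m wait timeout is an arbitrary value chosen for illustration.

```yaml
---
- name: Validate the Talos cluster
  hosts: localhost
  connection: local
  gather_facts: false
  vars:
    work_dir: "{{ playbook_dir }}/talos"
    cp_ip: "{{ hostvars['cp1'].talos_ip }}"
  tasks:
    - name: Run the Talos health check
      ansible.builtin.command:
        cmd: >-
          talosctl health
          --talosconfig {{ work_dir }}/talosconfig
          --nodes {{ cp_ip }}
          --endpoints {{ cp_ip }}
          --wait-timeout 10m
      # A health check reads state only, so never report a change.
      changed_when: false
```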

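Run the playbook a second time to confirm idempotency: gen config, bootstrap, and kubeconfig are skipped by their creates guards, so only the apply-config tasks re-execute. Those tasks pass --insecure, which Talos accepts only while a node is still in maintenance mode; once a node has its config, re-apply with the generated --talosconfig (plus --nodes and --endpoints) instead.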

Troubleshooting

| Symptom | Likely cause | Fix |
|---|---|---|
| specified install disk does not exist: "/dev/sda" | Node's disk is vda/nvme0n1, not sda | Run talosctl disks --insecure -n with the node's IP, set install.disk in the machine config, re-apply |
| connection refused / timeout on apply-config | Talos API port 50000 not reachable | Open TCP 50000 to the node; the first apply must hit the node directly (no endpoint proxy yet) |
| certificate signed by unknown authority | Applying without --insecure before PKI exists, or wrong talosconfig | Use --insecure for the first apply; afterwards pass the generated --talosconfig |
| etcd never forms / API never comes up | bootstrap not run, or run on more than one node | Bootstrap exactly once, on a single control plane node |
| Worker stuck NotReady | No CNI installed yet | Apply your CNI (e.g. Cilium) with kubernetes.core.k8s after bootstrap |
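For the install disk row, one option on a fresh working directory is to set the disk when the configs are generated. The sketch below uses the --install-disk option of talosctl gen config; install_disk is a hypothetical variable, and /dev/vda is only an example value. Because of the creates guard, this has no effect if controlplane.yaml already exists.

```yaml
---
# Sketch only: install_disk is a hypothetical variable; set it to the disk
# reported for your node.
- name: Generate Talos configs with an explicit install disk
  hosts: localhost
  connection: local
  gather_facts: false
  vars:
    work_dir: "{{ playbook_dir }}/talos"
    install_disk: /dev/vda
    cluster_name: "{{ hostvars['cp1'].cluster_name }}"
    cluster_endpoint: "{{ hostvars['cp1'].cluster_endpoint }}"
  tasks:
    - name: Generate machine configs targeting the chosen disk
      ansible.builtin.command:
        cmd: >-
          talosctl gen config {{ cluster_name }} {{ cluster_endpoint }}
          --install-disk {{ install_disk }}
          --output-dir {{ work_dir }}
        creates: "{{ work_dir }}/controlplane.yaml"
```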

FAQ

Q. How does Ansible connect to Talos if there's no SSH? It doesn't connect to the nodes at all. The play runs locally on the control node (connection: local) and drives the cluster through talosctl, which speaks the Talos API on port 50000.

Q. Can I use apt/dnf or a shell task to change a Talos node? No. Talos is immutable and ships no package manager or shell. Everything is set through the machine config; to change a node you edit its config and run talosctl apply-config, and you upgrade with talosctl upgrade.
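As an illustration of the upgrade path mentioned in this answer, the sketch below wraps talosctl upgrade for the worker nodes, one node per loop iteration. The talos_installer_image variable and its value are assumptions for the example; pick the installer image and tag that match the release you are moving to.

```yaml
---
- name: Upgrade Talos worker nodes
  hosts: localhost
  connection: local
  gather_facts: false
  vars:
    work_dir: "{{ playbook_dir }}/talos"
    cp_ip: "{{ hostvars['cp1'].talos_ip }}"
    # Hypothetical variable: replace the tag with your target release.
    talos_installer_image: ghcr.io/siderolabs/installer:v1.8.0
  tasks:
    - name: Upgrade each worker to the chosen installer image
      ansible.builtin.command:
        cmd: >-
          talosctl upgrade
          --talosconfig {{ work_dir }}/talosconfig
          --endpoints {{ cp_ip }}
          --nodes {{ hostvars[item].talos_ip }}
          --image {{ talos_installer_image }}
      loop: "{{ groups['talos_workers'] }}"
```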

Q. Which talosctl version should I use? Match it to the Talos version you want to run — the talosctl that generates the machine config determines the installed Talos version. For Talos 1.8, use a 1.8.x talosctl.

Q. Is the bootstrap idempotent? The apply-config step is declarative and safe to re-apply. bootstrap is a one-time operation, so the playbook guards it with a marker file and creates; a second run skips it.

Q. How do I add more workers later? Boot the new machine off the Talos ISO, add it to the [talos_workers] group, and re-run the play — only the new apply-config task changes, and the node joins the existing cluster automatically.

• install an ingress controller on Talos Linux 1.8
• provision StorageClass and PVCs on Talos Linux 1.8
• reboot-aware patching workflow for Talos Linux
• manage cluster resources with the kubernetes.core.k8s module
• bootstrap RKE2 with Ansible

Conclusion

Talos Linux 1.8 turns a cluster into pure declarative config, and Ansible is the ideal driver for it: template the machine configs from inventory, wrap talosctl for the gen config → apply-config → bootstrap → kubeconfig flow, and finish with kubernetes.core for in-cluster resources. Guard the one-time steps with creates, keep controlplane.yaml and worker.yaml in Git, and you get a repeatable bootstrap that scales from a single control plane node to a full HA cluster by editing inventory alone.
