Ansible for IoT and Edge Computing: Automate Device Fleets at Scale

By Luca Berton · Published 2024-01-01 · Category: installation

Automate IoT and edge computing infrastructure with Ansible: manage device fleets, deploy edge applications, configure network gateways, and update firmware.

Why Ansible for IoT and Edge?

Edge computing pushes workloads from centralized clouds to thousands of distributed locations — retail stores, factories, cell towers, vehicles, and remote sites. Managing these devices manually is impossible at scale.

Ansible is well suited for edge automation:

Agentless: no client software to install on resource-constrained devices
SSH-based: works over any IP connection, including cellular and satellite links
Idempotent: safe to re-run on unreliable connections (if a run fails partway, run it again)
Low overhead: managed devices need only Python and SSH
Pull mode: ansible-pull lets devices configure themselves without inbound connectivity; they only need outbound access to the Git repository
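Idempotence is what makes re-running safe: a task describes a desired state and only changes something when the current state differs. The following is a minimal Python model of a lineinfile-style task, assuming simple regex matching where the last matching line is replaced and the line is otherwise appended. It illustrates the idea and is not Ansible's implementation.

```python
import re


def ensure_line(lines: list[str], regexp: str, line: str) -> tuple[list[str], bool]:
    """Return (new_lines, changed), mimicking a lineinfile-style task.

    The last line matching `regexp` is replaced with `line`; if nothing
    matches, `line` is appended. Running it again on the result is a no-op.
    """
    pattern = re.compile(regexp)
    new_lines = list(lines)
    matches = [i for i, text in enumerate(new_lines) if pattern.search(text)]
    if matches:
        new_lines[matches[-1]] = line
    else:
        new_lines.append(line)
    return new_lines, new_lines != lines


if __name__ == "__main__":
    config = ["arm_64bit=1", "gpu_mem=64"]
    first, changed_first = ensure_line(config, r"^gpu_mem=", "gpu_mem=16")
    second, changed_second = ensure_line(first, r"^gpu_mem=", "gpu_mem=16")
    print(changed_first, changed_second)  # True False
```

The second call reports no change, which is exactly why an interrupted run can simply be repeated.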

Edge Architecture Patterns

Pattern 1: Centralized Push (AAP + Automation Mesh)

[AAP Controller] → [Hop Nodes] → [Edge Devices]
    (HQ)           (Regional)     (1000s of sites)

AAP's Automation Mesh extends execution across network boundaries:

# Automation Mesh topology
# Controller (HQ) → Hop Node (Region) → Execution Node (Edge)
#
# Hop nodes relay traffic without running playbooks
# Execution nodes run playbooks locally at the edge
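Conceptually, the controller reaches an execution node by relaying over peer links, and hop nodes only forward traffic. The toy model below uses hypothetical node names and finds the shortest relay path with breadth-first search. It illustrates the topology only; it is not how Receptor, the component underneath Automation Mesh, is implemented.

```python
from __future__ import annotations

from collections import deque

# Hypothetical mesh: each node lists the peers it has links with
MESH = {
    "controller-hq": ["hop-eu", "hop-us"],
    "hop-eu": ["controller-hq", "exec-berlin", "exec-madrid"],
    "hop-us": ["controller-hq", "exec-austin"],
    "exec-berlin": ["hop-eu"],
    "exec-madrid": ["hop-eu"],
    "exec-austin": ["hop-us"],
}


def relay_path(mesh: dict[str, list[str]], source: str, target: str) -> list[str] | None:
    """Shortest path from source to target over peer links (BFS), or None."""
    previous: dict[str, str | None] = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            # Walk back through the recorded predecessors
            path = []
            while node is not None:
                path.append(node)
                node = previous[node]
            return path[::-1]
        for peer in mesh.get(node, []):
            if peer not in previous:
                previous[peer] = node
                queue.append(peer)
    return None


if __name__ == "__main__":
    print(relay_path(MESH, "controller-hq", "exec-madrid"))
    # ['controller-hq', 'hop-eu', 'exec-madrid']
```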

Pattern 2: Pull-Based (ansible-pull)

For devices behind NAT or with intermittent connectivity:

# Cron-driven ansible-pull on each device
# /etc/cron.d/ansible-pull (cron entries cannot span lines, so keep the command on one line)
*/30 * * * * root ansible-pull -U https://github.com/company/edge-config.git -d /opt/ansible -i localhost, --accept-host-key site.yml >> /var/log/ansible-pull.log 2>&1
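Because every device runs on `*/30`, the whole fleet hits the Git server at :00 and :30. One way to spread the load is a stable per-device offset derived from the hostname. The sketch below, a hypothetical helper rather than part of ansible-pull, builds such a cron line. Alternatives are ansible-pull's own `-s`/`--sleep` option, which waits a random number of seconds before each run, or the `random` filter with `seed=inventory_hostname` when the cron entry is itself managed by Ansible.

```python
import hashlib


def splay_minute(hostname: str, interval_minutes: int = 30) -> int:
    """Stable per-device offset in [0, interval_minutes), derived from the hostname."""
    digest = hashlib.sha256(hostname.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % interval_minutes


def cron_line(hostname: str, interval_minutes: int = 30) -> str:
    """Build an /etc/cron.d entry that runs ansible-pull every interval, offset per device."""
    if 60 % interval_minutes != 0:
        raise ValueError("interval must divide 60 for a minute-field schedule")
    start = splay_minute(hostname, interval_minutes)
    minutes = ",".join(str(m) for m in range(start, 60, interval_minutes))
    return (
        f"{minutes} * * * * root ansible-pull "
        "-U https://github.com/company/edge-config.git -d /opt/ansible "
        "-i localhost, --accept-host-key site.yml >> /var/log/ansible-pull.log 2>&1"
    )


if __name__ == "__main__":
    for host in ("edge-001", "edge-002", "edge-003"):
        # Print just the schedule fields for each device
        print(host, cron_line(host).split(" root ")[0])
```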

# site.yml (in the Git repo)
---
- name: Edge device self-configuration
  hosts: localhost
  connection: local
  become: true

  tasks:
    - name: Render agent configuration from the repository template
      ansible.builtin.template:
        src: edge-config.yml.j2
        dest: /etc/edge-agent/config.yml
      notify: restart edge-agent

    - name: Ensure edge application is running
      ansible.builtin.systemd:
        name: edge-agent
        state: started
        enabled: true

    - name: Report status to central server
      ansible.builtin.uri:
        url: "https://management.company.com/api/devices/{{ ansible_hostname }}/heartbeat"
        method: POST
        body_format: json
        body:
          hostname: "{{ ansible_hostname }}"
          uptime: "{{ ansible_uptime_seconds }}"
          version: "{{ edge_agent_version | default('unknown') }}"
          ip: "{{ ansible_default_ipv4.address | default('unknown') }}"
      ignore_errors: true  # Don't fail if management server unreachable

  handlers:
    - name: restart edge-agent
      ansible.builtin.systemd:
        name: edge-agent
        state: restarted
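The heartbeat is only useful if something on the management side notices devices that stop reporting. Here is a minimal sketch of that check, assuming the management server (hypothetical here) stores the last heartbeat time per hostname and treats a device as stale after two missed pull intervals.

```python
from __future__ import annotations

from datetime import datetime, timedelta, timezone

PULL_INTERVAL = timedelta(minutes=30)  # matches the */30 cron schedule


def classify(last_seen: dict[str, datetime], now: datetime, missed_runs: int = 2) -> dict[str, list[str]]:
    """Split devices into online/stale by the time of their last heartbeat."""
    cutoff = now - PULL_INTERVAL * missed_runs
    result: dict[str, list[str]] = {"online": [], "stale": []}
    for host, seen in sorted(last_seen.items()):
        result["online" if seen >= cutoff else "stale"].append(host)
    return result


if __name__ == "__main__":
    now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
    heartbeats = {
        "edge-001": now - timedelta(minutes=12),
        "edge-002": now - timedelta(hours=3),
    }
    print(classify(heartbeats, now))
    # {'online': ['edge-001'], 'stale': ['edge-002']}
```

Stale devices are good candidates for a direct push or a site visit, which is where the hybrid pattern comes in.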

Pattern 3: Hybrid (Push + Pull)

# Normal operations: ansible-pull every 30 minutes
# Emergency updates: AAP pushes directly via Automation Mesh
# Firmware updates: AAP workflow with rolling update strategy

Manage Raspberry Pi Fleets

---
- name: Configure Raspberry Pi edge devices
  hosts: raspberry_pis
  become: true

  vars:
    wifi_ssid: "{{ vault_wifi_ssid }}"
    wifi_password: "{{ vault_wifi_password }}"
    ntp_server: "time.company.com"

  tasks:
    - name: Set hostname
      ansible.builtin.hostname:
        name: "edge-{{ site_id }}-{{ inventory_hostname_short }}"

    - name: Configure WiFi
      ansible.builtin.template:
        src: wpa_supplicant.conf.j2
        dest: /etc/wpa_supplicant/wpa_supplicant.conf
        mode: '0600'
      notify: restart networking

    - name: Set timezone
      community.general.timezone:
        name: "{{ device_timezone | default('UTC') }}"

    - name: Configure NTP
      ansible.builtin.template:
        src: timesyncd.conf.j2
        dest: /etc/systemd/timesyncd.conf
      notify: restart timesyncd

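    - name: Install edge packages
      ansible.builtin.apt:
        name:
          - docker.io
          - python3-docker  # Python deps for the community.docker modules
          - mosquitto-clients  # MQTT client
          - python3-pip
          - jq
          - monitoring-plugins-basic
          - watchdog  # provides /usr/sbin/watchdog used by the watchdog unit
        state: present
        update_cache: true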

    - name: Deploy edge application container
      community.docker.docker_container:
        name: edge-sensor
        image: "registry.company.com/edge-sensor:{{ app_version }}"
        state: started
        restart_policy: always
        ports:
          - "8080:8080"
        volumes:
          - /data/sensor:/data
        env:
          SITE_ID: "{{ site_id }}"
          MQTT_BROKER: "{{ mqtt_broker }}"
          DEVICE_ID: "{{ inventory_hostname }}"

    - name: Configure watchdog
      ansible.builtin.copy:
        content: |
          [Unit]
          Description=Hardware Watchdog
          [Service]
          # -F keeps the daemon in the foreground so systemd can supervise it
          ExecStart=/usr/sbin/watchdog -F
          [Install]
          WantedBy=multi-user.target
        dest: /etc/systemd/system/watchdog.service
      notify: enable watchdog

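    - name: Set GPU memory split (headless)
      ansible.builtin.lineinfile:
        # boot_config_path is illustrative; Raspberry Pi OS Bookworm and later use /boot/firmware/config.txt
        path: "{{ boot_config_path | default('/boot/config.txt') }}"
        regexp: '^gpu_mem='
        line: 'gpu_mem=16'
      notify: reboot required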

  handlers:
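    - name: restart networking
      # Re-read wpa_supplicant.conf without restarting the network stack (which can drop
      # the SSH session); applies to dhcpcd-based Raspberry Pi OS releases, while
      # Bookworm and later manage Wi-Fi with NetworkManager
      ansible.builtin.command:
        cmd: wpa_cli -i wlan0 reconfigure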

    - name: restart timesyncd
      ansible.builtin.systemd:
        name: systemd-timesyncd
        state: restarted

    - name: enable watchdog
      ansible.builtin.systemd:
        name: watchdog
        state: started
        enabled: true
        daemon_reload: true

    - name: reboot required
      ansible.builtin.debug:
        msg: "Reboot required on {{ inventory_hostname }}"
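The hostname task builds names from `site_id` and `inventory_hostname_short`, so a site ID such as `NYC_01` would produce an uppercase letter and an underscore, and a long one could exceed the length limit. The following pre-flight check is a hypothetical helper that assumes RFC 1123 label rules (letters, digits and hyphens, at most 63 characters, no leading or trailing hyphen) and could be run against inventory data before the play.

```python
import re

# One DNS label: 1-63 chars of a-z, 0-9 and '-', not starting or ending with '-'
_LABEL = re.compile(r"[a-z0-9]([a-z0-9-]{0,61}[a-z0-9])?")


def edge_hostname(site_id: str, short_name: str) -> str:
    """Build 'edge-<site>-<host>' and make sure it is a valid hostname label."""
    raw = f"edge-{site_id}-{short_name}".lower()
    # Replace anything outside the allowed character set with a hyphen
    candidate = re.sub(r"[^a-z0-9-]", "-", raw)
    if not _LABEL.fullmatch(candidate):
        raise ValueError(f"cannot build a valid hostname from {raw!r}")
    return candidate


if __name__ == "__main__":
    print(edge_hostname("NYC_01", "pi3"))  # edge-nyc-01-pi3
```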

Firmware and OS Updates

Rolling Updates with Serial

---
- name: Rolling firmware update
  hosts: edge_devices
  become: true
  serial: "{{ update_batch_size | default('10%') }}"
  max_fail_percentage: 5

  pre_tasks:
    - name: Health check before update
      ansible.builtin.uri:
        url: "http://localhost:8080/health"
      register: health
      failed_when: health.status != 200

  tasks:
    - name: Download firmware
      ansible.builtin.get_url:
        url: "{{ firmware_url }}"
        dest: /tmp/firmware-{{ firmware_version }}.bin
        checksum: "sha256:{{ firmware_checksum }}"

    - name: Stop application
      ansible.builtin.systemd:
        name: edge-agent
        state: stopped

    - name: Apply firmware update
      ansible.builtin.command:
        cmd: "/usr/local/bin/fw-update /tmp/firmware-{{ firmware_version }}.bin"
      register: fw_result
      failed_when: fw_result.rc != 0

    - name: Reboot device
      ansible.builtin.reboot:
        reboot_timeout: 300
        connect_timeout: 30
        post_reboot_delay: 30

  post_tasks:
    - name: Verify firmware version
      ansible.builtin.command:
        cmd: cat /sys/firmware/version
      register: current_fw
      changed_when: false
      failed_when: firmware_version not in current_fw.stdout

    - name: Health check after update
      ansible.builtin.uri:
        url: "http://localhost:8080/health"
      register: post_health
      retries: 5
      delay: 30
      until: post_health.status == 200
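With `serial` as a percentage and `max_fail_percentage: 5`, the effective behaviour depends on fleet size. Ansible's documentation states that the failure percentage is checked per batch and must be exceeded, not merely equaled. The sketch below models that rule, plus percentage batches rounded down with a minimum of one host, which is my reading of how Ansible sizes them. One consequence: with batches smaller than 20 hosts, a single failure already exceeds 5% and stops the rollout.

```python
from __future__ import annotations


def batch_sizes(num_hosts: int, serial: int | str) -> list[int]:
    """Split num_hosts into play batches the way a single `serial` value does.

    Percentages are rounded down, with a minimum batch size of 1.
    """
    if isinstance(serial, str) and serial.endswith("%"):
        size = int((int(serial[:-1]) / 100.0) * num_hosts) or 1
    else:
        size = int(serial)
    if size < 1:
        raise ValueError("this sketch only models positive batch sizes")
    sizes = []
    remaining = num_hosts
    while remaining > 0:
        sizes.append(min(size, remaining))
        remaining -= sizes[-1]
    return sizes


def tolerated_failures(batch_size: int, max_fail_percentage: int) -> int:
    """Largest number of failed hosts in one batch that does not abort the play.

    The play aborts when failed / batch_size is strictly greater than the percentage.
    """
    return (max_fail_percentage * batch_size) // 100


if __name__ == "__main__":
    for fleet in (60, 250):
        sizes = batch_sizes(fleet, "10%")
        print(fleet, sizes[0], tolerated_failures(sizes[0], 5))
    # 60 6 0
    # 250 25 1
```

For small fleets, raising `max_fail_percentage` or using a fixed `serial` count makes the abort threshold easier to reason about.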

OS Image Updates (A/B Partition)

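---
- name: A/B partition OS update
  hosts: edge_devices
  become: true
  # Assumes GPT partitions labelled partA/partB, GRUB menu entries with ids
  # partA/partB, GRUB_DEFAULT=saved in /etc/default/grub, and the image
  # already staged at /tmp/os-image-<os_version>.img

  tasks:
    - name: Identify inactive partition
      # The shell module uses /bin/sh by default, so avoid bash-only [[ ]]
      ansible.builtin.shell: |
        current=$(lsblk -n -o PARTLABEL "$(findmnt -n -o SOURCE /)")
        case "$current" in
          partA) echo "partB" ;;
          *) echo "partA" ;;
        esac
      register: inactive_partition
      changed_when: false

    - name: Write new image to inactive partition
      ansible.builtin.command:
        cmd: >
          dd if=/tmp/os-image-{{ os_version }}.img
          of=/dev/disk/by-partlabel/{{ inactive_partition.stdout }}
          bs=4M conv=fsync
      async: 600
      poll: 30

    - name: Boot the new partition once (GRUB falls back to the saved default afterwards)
      ansible.builtin.command:
        cmd: "grub-reboot {{ inactive_partition.stdout }}"  # grub2-reboot on RHEL-family systems

    - name: Reboot into new partition
      ansible.builtin.reboot:
        reboot_timeout: 300

    - name: Commit the new partition or roll back
      block:
        - name: Health check on the new partition
          ansible.builtin.uri:
            url: "http://localhost:8080/health"
          register: ab_health
          retries: 10
          delay: 30
          until: ab_health.status == 200

        - name: Make the new partition the permanent default
          ansible.builtin.command:
            cmd: "grub-set-default {{ inactive_partition.stdout }}"
      rescue:
        - name: Roll back by rebooting into the saved (previous) partition
          ansible.builtin.reboot:
            reboot_timeout: 300

        - name: Report the failed update
          ansible.builtin.fail:
            msg: "Health check failed on {{ inactive_partition.stdout }}; rolled back"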
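Whatever the boot loader, the A/B scheme is a small state machine: write the inactive slot, boot it on trial, commit it only after the health check passes, and otherwise fall back to the previous slot. The toy model below captures that logic with the partition names from the playbook; the class and its methods are illustrative and do not correspond to any real tool.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class BootState:
    """Toy model of an A/B device: which slot is committed and which is on trial."""

    committed: str = "partA"
    trial: str | None = None

    def inactive(self) -> str:
        return "partB" if self.committed == "partA" else "partA"

    def stage_update(self) -> str:
        """Mark the freshly written inactive slot as the one to try on the next boot."""
        self.trial = self.inactive()
        return self.trial

    def boot(self) -> str:
        """Boot the trial slot once if one is pending, otherwise the committed slot."""
        slot = self.trial or self.committed
        self.trial = None  # a trial boot is one-shot
        return slot

    def commit(self, slot: str) -> None:
        """Call only after the health check passes on `slot`."""
        self.committed = slot


if __name__ == "__main__":
    device = BootState()
    device.stage_update()
    running = device.boot()
    # Health check failed, so nothing is committed and the next boot falls back
    print(running, device.boot())  # partB partA
```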

Network Gateway Configuration

---
- name: Configure edge network gateways
  hosts: gateways
  become: true

  tasks:
    - name: Configure MQTT broker
      ansible.builtin.template:
        src: mosquitto.conf.j2
        dest: /etc/mosquitto/mosquitto.conf
      notify: restart mosquitto

    - name: Configure VPN tunnel to HQ
      ansible.builtin.template:
        src: wireguard.conf.j2
        dest: /etc/wireguard/wg0.conf
        mode: '0600'
      notify: restart wireguard

    - name: Enable IP forwarding
      ansible.posix.sysctl:
        name: net.ipv4.ip_forward
        value: '1'
        sysctl_set: true

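    - name: Configure NAT for edge network
      # The iptables module does not save rules; persist them separately
      # (for example with netfilter-persistent) so they survive a reboot
      ansible.builtin.iptables:
        table: nat
        chain: POSTROUTING
        out_interface: wg0
        jump: MASQUERADE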

    - name: Deploy local container registry mirror
      community.docker.docker_container:
        name: registry-mirror
        image: registry:2
        state: started
        restart_policy: always
        ports:
          - "5000:5000"
        volumes:
          - /data/registry:/var/lib/registry
        env:
          REGISTRY_PROXY_REMOTEURL: "https://registry.company.com"

  handlers:
    - name: restart mosquitto
      ansible.builtin.systemd:
        name: mosquitto
        state: restarted

    - name: restart wireguard
      ansible.builtin.systemd:
        name: wg-quick@wg0
        state: restarted
        enabled: true
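The `wireguard.conf.j2` template is not shown above. As a sketch of what it might render, here is a Python function that produces a wg-quick style `wg0.conf` for a gateway behind NAT. All keys, addresses and the endpoint are placeholders. `PersistentKeepalive` is set to 25 seconds, the interval the wg(8) man page suggests, so that the NAT mapping toward HQ stays open.

```python
from __future__ import annotations


def render_wg0(private_key: str, address: str, hq_public_key: str,
               hq_endpoint: str, allowed_ips: list[str], keepalive: int = 25) -> str:
    """Render a minimal wg-quick configuration with a single [Peer] (HQ)."""
    return "\n".join([
        "[Interface]",
        f"PrivateKey = {private_key}",
        f"Address = {address}",
        "",
        "[Peer]",
        f"PublicKey = {hq_public_key}",
        f"Endpoint = {hq_endpoint}",
        f"AllowedIPs = {', '.join(allowed_ips)}",
        # Keeps the NAT mapping alive so HQ can reach a gateway without a public IP
        f"PersistentKeepalive = {keepalive}",
        "",
    ])


if __name__ == "__main__":
    print(render_wg0("<gateway-private-key>", "10.10.5.1/32", "<hq-public-key>",
                     "hq.example.com:51820", ["10.0.0.0/8"]))
```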

Dynamic Inventory for Edge Devices

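#!/usr/bin/env python3
# edge_inventory.py - Dynamic inventory from device management API
import argparse
import json
import os
import sys

import requests

API_URL = os.environ.get('EDGE_API_URL', 'https://management.company.com/api')
API_TOKEN = os.environ.get('EDGE_API_TOKEN')


def get_inventory():
    if not API_TOKEN:
        sys.exit('EDGE_API_TOKEN is not set')
    headers = {'Authorization': f'Bearer {API_TOKEN}'}
    # Fail fast on a slow or erroring API instead of hanging the Ansible run
    response = requests.get(f'{API_URL}/devices', headers=headers, timeout=30)
    response.raise_for_status()
    devices = response.json()

    inventory = {
        '_meta': {'hostvars': {}},
        'all': {'children': ['edge_devices', 'gateways']},
        'edge_devices': {'hosts': []},
        'gateways': {'hosts': []}
    }

    # Group by site
    sites = {}
    for device in devices:
        hostname = device['hostname']
        site = device['site_id']

        # Add to site group
        site_group = f"site_{site}"
        if site_group not in sites:
            sites[site_group] = {'hosts': []}
            inventory['all']['children'].append(site_group)
        sites[site_group]['hosts'].append(hostname)

        # Add to type group
        device_type = 'gateways' if device['role'] == 'gateway' else 'edge_devices'
        inventory[device_type]['hosts'].append(hostname)

        # Host variables; the installed firmware is exposed as current_firmware_version
        # so it does not shadow the target firmware_version used by the update playbook
        inventory['_meta']['hostvars'][hostname] = {
            'ansible_host': device['ip_address'],
            'site_id': site,
            'device_type': device['hardware'],
            'current_firmware_version': device['firmware'],
            'app_version': device.get('app_version', 'unknown')
        }

    inventory.update(sites)
    return inventory


if __name__ == '__main__':
    # Ansible calls inventory scripts with --list, or with --host <name>
    parser = argparse.ArgumentParser()
    parser.add_argument('--list', action='store_true')
    parser.add_argument('--host')
    args = parser.parse_args()
    if args.host:
        # Host variables are already returned in _meta, so --host can return {}
        print(json.dumps({}))
    else:
        print(json.dumps(get_inventory(), indent=2))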
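Site IDs become group names (`site_<id>`), and Ansible warns about or rewrites group names that contain characters such as hyphens or dots; its `TRANSFORM_INVALID_GROUP_CHARS` setting controls which. A hypothetical helper can normalize names up front instead, assuming every invalid character should simply become an underscore.

```python
import re


def site_group_name(site_id) -> str:
    """Turn a raw site ID into a group name using only letters, digits and underscores."""
    return re.sub(r"[^A-Za-z0-9_]", "_", f"site_{site_id}")


if __name__ == "__main__":
    print(site_group_name("nyc-01"))  # site_nyc_01
```

In the inventory script it would replace the f-string in `site_group = f"site_{site}"`.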

Monitoring Edge Fleets

---
- name: Deploy monitoring to edge devices
  hosts: edge_devices
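  become: true

  vars:
    # Map Ansible's architecture fact to the names used in node_exporter release tarballs
    go_arch_map:
      x86_64: amd64
      aarch64: arm64
      armv7l: armv7
      armv6l: armv6
    go_arch: "{{ go_arch_map[ansible_architecture] }}"

  tasks:
    - name: Download node-exporter
      ansible.builtin.get_url:
        url: "https://github.com/prometheus/node_exporter/releases/download/v1.8.0/node_exporter-1.8.0.linux-{{ go_arch }}.tar.gz"
        dest: /tmp/node_exporter.tar.gz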

    - name: Deploy node-exporter
      ansible.builtin.unarchive:
        src: /tmp/node_exporter.tar.gz
        dest: /usr/local/bin/
        remote_src: true
        extra_opts: [--strip-components=1]
        creates: /usr/local/bin/node_exporter

    - name: Create systemd service
      ansible.builtin.copy:
        content: |
          [Unit]
          Description=Node Exporter
          After=network.target
          [Service]
          User=nobody
          ExecStart=/usr/local/bin/node_exporter \
            --collector.textfile.directory=/var/lib/node_exporter \
            --collector.systemd \
            --collector.processes
          [Install]
          WantedBy=multi-user.target
        dest: /etc/systemd/system/node-exporter.service
      notify: restart node-exporter

    - name: Create custom metrics directory
      ansible.builtin.file:
        path: /var/lib/node_exporter
        state: directory
        owner: nobody

    - name: Deploy edge-specific metric collector
      ansible.builtin.copy:
        content: |
          #!/bin/bash
          # Custom edge metrics
          echo "# HELP edge_sensor_temperature Edge device CPU temperature"
          echo "# TYPE edge_sensor_temperature gauge"
          temp=$(cat /sys/class/thermal/thermal_zone0/temp 2>/dev/null || echo 0)
          echo "edge_sensor_temperature $((temp / 1000))"
          
          echo "# HELP edge_uplink_status Edge WAN link status"
          echo "# TYPE edge_uplink_status gauge"
          ping -c 1 -W 2 8.8.8.8 >/dev/null 2>&1 && echo "edge_uplink_status 1" || echo "edge_uplink_status 0"
        dest: /usr/local/bin/edge-metrics.sh
        mode: '0755'

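    - name: Schedule metric collection
      ansible.builtin.cron:
        name: "edge metrics"
        minute: "*/5"
        # Write to a temp file, then rename, so node_exporter never reads a half-written file
        job: "/usr/local/bin/edge-metrics.sh > /var/lib/node_exporter/edge.prom.tmp && mv /var/lib/node_exporter/edge.prom.tmp /var/lib/node_exporter/edge.prom"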

  handlers:
    - name: restart node-exporter
      ansible.builtin.systemd:
        name: node-exporter
        state: restarted
        daemon_reload: true
        enabled: true
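The shell metric script could also be written in Python. The sketch below produces the same two metrics in the Prometheus text exposition format and writes them atomically: it writes to a temporary file in the same directory and then calls `os.replace`, so the textfile collector never sees a partial file. The thermal zone path and ping target come from the script above. The function names are illustrative, and the output directory is a parameter so the code can be tried outside `/var/lib/node_exporter`.

```python
from __future__ import annotations

import os
import subprocess
import tempfile
from pathlib import Path


def read_cpu_temperature(path: str = "/sys/class/thermal/thermal_zone0/temp") -> float:
    """CPU temperature in degrees Celsius, or 0.0 if the sensor is missing."""
    try:
        return int(Path(path).read_text().strip()) / 1000
    except (OSError, ValueError):
        return 0.0


def uplink_up(host: str = "8.8.8.8") -> int:
    """1 if a single ping to `host` succeeds within 2 seconds, else 0."""
    try:
        result = subprocess.run(["ping", "-c", "1", "-W", "2", host],
                                stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    except OSError:  # ping missing or not executable
        return 0
    return 1 if result.returncode == 0 else 0


def render_metrics(temperature: float, uplink: int) -> str:
    return (
        "# HELP edge_sensor_temperature Edge device CPU temperature\n"
        "# TYPE edge_sensor_temperature gauge\n"
        f"edge_sensor_temperature {temperature}\n"
        "# HELP edge_uplink_status Edge WAN link status\n"
        "# TYPE edge_uplink_status gauge\n"
        f"edge_uplink_status {uplink}\n"
    )


def write_atomically(directory: str, name: str, content: str) -> Path:
    """Write content to directory/name via a temp file and an atomic rename."""
    target = Path(directory) / name
    # The .tmp suffix means the collector (which reads only *.prom) ignores it
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w") as handle:
        handle.write(content)
    os.chmod(tmp_path, 0o644)  # mkstemp creates 0600; node_exporter runs as nobody
    os.replace(tmp_path, target)
    return target


if __name__ == "__main__":
    text = render_metrics(read_cpu_temperature(), uplink_up())
    out_dir = "/var/lib/node_exporter"
    if os.path.isdir(out_dir):
        write_atomically(out_dir, "edge.prom", text)
    else:
        print(text, end="")
```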

FAQ

Can Ansible manage thousands of edge devices?

Yes. AAP's Automation Mesh distributes job execution to execution nodes, reached through hop nodes where needed, so playbook runs are not concentrated on the controller. With ansible-pull each device runs its own playbook, so the main shared load falls on the Git server. Use dynamic inventory to track devices and serial for rolling updates.

What about devices with intermittent connectivity?

Use ansible-pull with a cron schedule: the device pulls configuration from a Git repository whenever it has connectivity. For critical updates pushed from AAP, devices that are offline show up as unreachable in the job results; relaunch the job against the failed hosts once they reconnect.

How do I handle device-specific configuration at scale?

Use dynamic inventory with host variables from a device management database or CMDB. Group variables handle site-level config, host variables handle device-specific settings. Jinja2 templates generate unique configurations per device.
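The layering works because more specific variables override less specific ones. The following simplified Python model assumes Ansible's default `hash_behaviour=replace`, where top-level keys are overridden and dictionaries are not deep-merged. It ignores the many other precedence levels, such as play vars and extra vars, and all values are hypothetical.

```python
from __future__ import annotations


def effective_vars(*layers: dict) -> dict:
    """Merge variable layers from least to most specific; later layers win per key."""
    merged: dict = {}
    for layer in layers:
        merged.update(layer)  # shallow, like hash_behaviour=replace
    return merged


if __name__ == "__main__":
    all_vars = {"ntp_server": "time.company.com", "device_timezone": "UTC"}
    site_vars = {"device_timezone": "Europe/Berlin", "mqtt_broker": "gw-ber-01"}
    host_vars = {"app_version": "2.4.1"}
    print(effective_vars(all_vars, site_vars, host_vars))
```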

Does Ansible work on ARM devices (Raspberry Pi, Jetson)?

Yes. Ansible's control node needs a standard Linux system, but managed nodes (including ARM) just need Python 3 and SSH. Raspberry Pi, NVIDIA Jetson, and ARM servers all work as managed nodes.

Conclusion

Ansible automates edge computing at scale through three patterns: centralized push with AAP Automation Mesh for managed environments, ansible-pull for devices behind NAT or with intermittent connectivity, and hybrid approaches combining both. From Raspberry Pi fleets to industrial gateways to retail edge servers, Ansible's agentless architecture and SSH-based communication make it a strong fit for managing distributed device infrastructure.
