Ansible Monitoring: Integrate with Prometheus, Grafana & Alerting (Complete Guide)

By Luca Berton · Published 2024-01-01 · Category: installation

How to use Ansible for monitoring automation. Deploy Prometheus, Grafana, and alerting. Monitor Ansible playbook execution with callback plugins and metrics.

Ansible serves two monitoring roles: deploying monitoring infrastructure (Prometheus, Grafana, alerting) and being monitored itself (tracking playbook execution, task metrics, and automation health). This guide covers both.

See also: AAP 2.6 Monitoring and Logging: Prometheus, Grafana, and Log Aggregation

Part 1: Deploy Monitoring Stack with Ansible

Install Prometheus

---
- name: Deploy Prometheus
  hosts: monitoring
  become: true
  vars:
    prometheus_version: "2.53.0"
  tasks:
    - name: Create prometheus user
      ansible.builtin.user:
        name: prometheus
        system: true
        shell: /usr/sbin/nologin

    - name: Download Prometheus
      ansible.builtin.get_url:
        url: "https://github.com/prometheus/prometheus/releases/download/v{{ prometheus_version }}/prometheus-{{ prometheus_version }}.linux-amd64.tar.gz"
        dest: /tmp/prometheus.tar.gz

    # Unpacks to /opt/prometheus-<version>.linux-amd64/
    - name: Extract Prometheus
      ansible.builtin.unarchive:
        src: /tmp/prometheus.tar.gz
        dest: /opt/
        remote_src: true

    # The template task below fails if this directory is missing
    - name: Create Prometheus config directory
      ansible.builtin.file:
        path: /etc/prometheus
        state: directory
        mode: "0755"

    - name: Deploy Prometheus config
      ansible.builtin.template:
        src: prometheus.yml.j2
        dest: /etc/prometheus/prometheus.yml
      notify: restart prometheus

    - name: Create systemd service
      ansible.builtin.template:
        src: prometheus.service.j2
        dest: /etc/systemd/system/prometheus.service
      notify: restart prometheus

    - name: Start Prometheus
      ansible.builtin.systemd:
        name: prometheus
        state: started
        enabled: true
        daemon_reload: true

  handlers:
    - name: restart prometheus
      ansible.builtin.systemd:
        name: prometheus
        state: restarted
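
The playbook above references a prometheus.yml.j2 template without showing it. The following is a minimal sketch of what such a template might contain, not the author's file. It assumes Node Exporter listens on its default port 9100 on every inventory host and that inventory hostnames resolve from the Prometheus server; the job names are illustrative.

```yaml
# templates/prometheus.yml.j2 (hypothetical contents, for illustration only)
global:
  scrape_interval: 15s

scrape_configs:
  # Prometheus scraping its own metrics endpoint
  - job_name: prometheus
    static_configs:
      - targets:
          - "localhost:9090"

  # One Node Exporter target per inventory host (assumes port 9100)
  - job_name: node
    static_configs:
      - targets:
{% for host in groups['all'] %}
          - "{{ host }}:9100"
{% endfor %}
```

The Jinja2 loop tags sit at column 0 because the template module trims the newline after a block tag by default but does not strip leading whitespace, so this layout keeps the rendered YAML indentation intact.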

Install Node Exporter on All Hosts

- name: Deploy Node Exporter
  hosts: all
  become: true
  vars:
    node_exporter_version: "1.8.0"
  tasks:
    - name: Download Node Exporter
      ansible.builtin.get_url:
        url: "https://github.com/prometheus/node_exporter/releases/download/v{{ node_exporter_version }}/node_exporter-{{ node_exporter_version }}.linux-amd64.tar.gz"
        dest: /tmp/node_exporter.tar.gz

    - name: Extract and install
      ansible.builtin.unarchive:
        src: /tmp/node_exporter.tar.gz
        dest: /usr/local/bin/
        remote_src: true
        extra_opts: [--strip-components=1]
        creates: /usr/local/bin/node_exporter

    # The unit below runs as this user, so it must exist first
    - name: Create node_exporter user
      ansible.builtin.user:
        name: node_exporter
        system: true
        shell: /usr/sbin/nologin

    - name: Create systemd service
      ansible.builtin.copy:
        content: |
          [Unit]
          Description=Node Exporter
          After=network.target

          [Service]
          User=node_exporter
          ExecStart=/usr/local/bin/node_exporter
          Restart=always

          [Install]
          WantedBy=multi-user.target
        dest: /etc/systemd/system/node_exporter.service

    - name: Start Node Exporter
      ansible.builtin.systemd:
        name: node_exporter
        state: started
        enabled: true
        daemon_reload: true
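
After the service starts, it can be worth confirming that the exporter actually answers. The play below is a sketch of one way to do that, assuming the default port 9100 and that the metric node_cpu_seconds_total is exposed (it is on Linux hosts with the default collectors). The play and variable names are illustrative.

```yaml
- name: Verify Node Exporter
  hosts: all
  gather_facts: false
  tasks:
    # wait_for checks 127.0.0.1 on the managed host by default
    - name: Wait for the exporter port
      ansible.builtin.wait_for:
        port: 9100
        timeout: 30

    # uri runs on the managed host, so localhost is the host itself
    - name: Fetch the metrics endpoint
      ansible.builtin.uri:
        url: http://localhost:9100/metrics
        return_content: true
      register: node_metrics

    - name: Confirm CPU metrics are present
      ansible.builtin.assert:
        that:
          - "'node_cpu_seconds_total' in node_metrics.content"
        fail_msg: "Node Exporter responded but exposed no CPU metrics"
```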

Install Grafana

- name: Deploy Grafana
  hosts: monitoring
  become: true
  tasks:
    # Import the signing key before adding the repository; otherwise the
    # cache update triggered by apt_repository fails signature verification.
    - name: Add Grafana GPG key
      ansible.builtin.apt_key:
        url: https://apt.grafana.com/gpg.key
        state: present

    - name: Add Grafana repository
      ansible.builtin.apt_repository:
        repo: "deb https://apt.grafana.com stable main"
        state: present

    - name: Install Grafana
      ansible.builtin.apt:
        name: grafana
        state: present
        update_cache: true

    - name: Configure Grafana datasource
      ansible.builtin.copy:
        content: |
          apiVersion: 1
          datasources:
            - name: Prometheus
              type: prometheus
              url: http://localhost:9090
              isDefault: true
              access: proxy
        dest: /etc/grafana/provisioning/datasources/prometheus.yml

    - name: Start Grafana
      ansible.builtin.systemd:
        name: grafana-server
        state: started
        enabled: true
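
The apt_key module relies on the apt-key tool, which newer Debian and Ubuntu releases deprecate. A possible alternative, sketched here under the assumption that the target's apt accepts ASCII-armored keys referenced through signed-by (apt 1.4 or later), stores the key in a dedicated keyring file. The keyring path and the filename value are illustrative choices, not taken from the original playbook.

```yaml
- name: Add Grafana repository with a dedicated keyring
  hosts: monitoring
  become: true
  tasks:
    - name: Ensure the keyring directory exists
      ansible.builtin.file:
        path: /etc/apt/keyrings
        state: directory
        mode: "0755"

    # The .asc extension tells apt the key is ASCII-armored
    - name: Download the Grafana signing key
      ansible.builtin.get_url:
        url: https://apt.grafana.com/gpg.key
        dest: /etc/apt/keyrings/grafana.asc
        mode: "0644"

    # signed-by limits this key to the Grafana repository only
    - name: Add the Grafana repository
      ansible.builtin.apt_repository:
        repo: "deb [signed-by=/etc/apt/keyrings/grafana.asc] https://apt.grafana.com stable main"
        filename: grafana
        state: present
```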

Part 2: Monitor Ansible Execution

Callback Plugins for Metrics

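Timer and profile_tasks Callbacks (ansible.posix)

# ansible.cfg
# timer and profile_tasks ship in the ansible.posix collection,
# which is bundled with the ansible community package
[defaults]
callbacks_enabled = ansible.posix.timer, ansible.posix.profile_tasks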
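
The timer and profile_tasks callbacks are distributed in the ansible.posix collection. If you run ansible-core on its own rather than the full ansible package, the collection has to be installed separately. A minimal requirements file for that, as a sketch (the file name requirements.yml is the usual convention, and no version is pinned here):

```yaml
# requirements.yml
collections:
  # Provides the timer, profile_tasks, and json callback plugins
  - name: ansible.posix
```

Install it with ansible-galaxy collection install -r requirements.yml before enabling the callbacks.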

JSON Logging

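# ansible.cfg
[defaults]
# json is a stdout callback, so it replaces the default console output
stdout_callback = ansible.posix.json
# log_path records the regular Ansible log; redirect stdout to keep the JSON
log_path = /var/log/ansible/playbook.log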

Custom Prometheus Metrics

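# Push playbook metrics to Prometheus Pushgateway
- name: Record playbook start time
  hosts: localhost
  gather_facts: false
  tasks:
    - name: Store start time (epoch seconds)
      ansible.builtin.set_fact:
        start_time: "{{ lookup('ansible.builtin.pipe', 'date +%s') }}"

# ... your other plays run here ...

- name: Report playbook metrics
  hosts: localhost
  gather_facts: false
  tasks:
    - name: Push execution metrics
      ansible.builtin.uri:
        url: "http://pushgateway:9091/metrics/job/ansible/instance/{{ playbook_dir | basename }}"
        method: POST
        # groups['all'] counts inventory hosts; this play itself only targets localhost
        body: |
          ansible_playbook_duration_seconds {{ (lookup('ansible.builtin.pipe', 'date +%s') | int) - (start_time | int) }}
          ansible_playbook_hosts_total {{ groups['all'] | length }}
          ansible_playbook_success 1
        headers:
          Content-Type: text/plain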
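
Pushed metrics only reach Prometheus if it scrapes the Pushgateway. The fragment below is a sketch of the matching scrape job, assuming the Pushgateway is reachable as pushgateway:9091 as in the example above. The honor_labels setting keeps the job and instance labels from the push URL instead of overwriting them with the scrape target's own labels.

```yaml
# Fragment for prometheus.yml
scrape_configs:
  - job_name: pushgateway
    # Preserve the job/instance labels set in the push URL
    honor_labels: true
    static_configs:
      - targets:
          - "pushgateway:9091"
```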

AAP / AWX Built-in Metrics

Ansible Automation Platform provides:

• Job success/failure rates
• Execution time per playbook
• Host status summaries
• REST API for custom dashboards
• Prometheus endpoint at /api/v2/metrics/
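
To collect the metrics endpoint listed above, Prometheus needs a scrape job that authenticates against the controller. The following is a sketch only: the hostname controller.example.com and the token placeholder are hypothetical, and it assumes you have created an OAuth2 token for a user allowed to read metrics.

```yaml
# Fragment for prometheus.yml
scrape_configs:
  - job_name: aap
    metrics_path: /api/v2/metrics/
    scheme: https
    authorization:
      type: Bearer
      # Placeholder: replace with a real token, or use credentials_file
      credentials: "REPLACE_WITH_OAUTH2_TOKEN"
    static_configs:
      - targets:
          - "controller.example.com"
```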

Alerting on Ansible Failures

# Prometheus alerting rule
groups:
  - name: ansible
    rules:
      - alert: AnsiblePlaybookFailed
        expr: ansible_playbook_success == 0
        for: 0m
        labels:
          severity: critical
        annotations:
          summary: "Ansible playbook failed"

Notify Slack/Teams from a Playbook

# In playbook: notify when no host has failed
- name: Deploy application
  hosts: webservers
  tasks:
    - name: Deploy
      ansible.builtin.include_role:
        name: deploy
  
  post_tasks:
    - name: Notify success
      ansible.builtin.uri:
        url: "{{ slack_webhook }}"
        method: POST
        body_format: json
        body:
          text: "✅ Deployment to {{ ansible_play_hosts | length }} hosts completed successfully"
      run_once: true
      when: ansible_play_hosts_all | length == ansible_play_hosts | length
      delegate_to: localhost
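
The post_tasks above only run when the deployment succeeds. One way to cover the failure path is a block with a rescue section, sketched below. It reuses the slack_webhook variable and the Pushgateway URL from the earlier examples as assumptions, and pushes ansible_playbook_success 0 so that the alerting rule shown earlier has something to fire on.

```yaml
- name: Deploy application with failure reporting
  hosts: webservers
  tasks:
    - name: Deploy and report failures
      block:
        - name: Deploy
          ansible.builtin.include_role:
            name: deploy
      rescue:
        # ansible_failed_task is set by Ansible inside rescue sections
        - name: Notify failure
          ansible.builtin.uri:
            url: "{{ slack_webhook }}"
            method: POST
            body_format: json
            body:
              text: "Deployment failed on {{ inventory_hostname }} at task '{{ ansible_failed_task.name }}'"
          delegate_to: localhost

        - name: Push failure metric
          ansible.builtin.uri:
            url: "http://pushgateway:9091/metrics/job/ansible/instance/{{ playbook_dir | basename }}"
            method: POST
            body: |
              ansible_playbook_success 0
            headers:
              Content-Type: text/plain
          delegate_to: localhost

        # rescue marks the host as recovered, so fail explicitly
        - name: Re-raise the failure
          ansible.builtin.fail:
            msg: "Deployment failed on {{ inventory_hostname }}"
```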

See also: Ansible Monitoring and Observability: Prometheus, Grafana, and ELK Stack Integration

Part 3: Self-Healing with Ansible

- name: Auto-remediate common issues
  hosts: all
  become: true
  tasks:
    - name: Check disk space
      ansible.builtin.shell: df -h / | tail -1 | awk '{print $5}' | tr -d '%'
      register: disk_usage
      changed_when: false

    - name: Clean up if disk usage > 85%
      when: disk_usage.stdout | int > 85
      block:
        - name: Clean apt cache
          ansible.builtin.apt:
            autoclean: true

        - name: Remove old logs
          ansible.builtin.shell: find /var/log -name "*.gz" -mtime +30 -delete
          changed_when: true

    # ignore_errors keeps item.failed set, so the alert task can see failures
    # (failed_when: false would force it to false and silence the alert)
    - name: Check if services are running
      ansible.builtin.systemd:
        name: "{{ item }}"
        state: started
      loop:
        - nginx
        - postgresql
      register: service_results
      ignore_errors: true

    - name: Alert on service failures
      ansible.builtin.uri:
        url: "{{ alertmanager_webhook }}"
        method: POST
        body_format: json
        body:
          alerts:
            - labels:
                alertname: "ServiceDown"
                host: "{{ inventory_hostname }}"
                service: "{{ item.item }}"
              annotations:
                summary: "{{ item.item }} is not running on {{ inventory_hostname }}"
      loop: "{{ service_results.results }}"
      when: item.failed | default(false)
      delegate_to: localhost
      become: false
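
The disk check above parses df output in a shell pipeline. The same figure can be derived from gathered facts, which avoids the shell entirely. This is a sketch of that alternative, assuming fact gathering is enabled and that / appears in ansible_mounts; the variable names root_mount and root_used_pct are illustrative.

```yaml
- name: Check root filesystem usage from facts
  hosts: all
  gather_facts: true
  tasks:
    - name: Compute used percentage for /
      ansible.builtin.set_fact:
        root_used_pct: "{{ (100 * (root_mount.size_total - root_mount.size_available) / root_mount.size_total) | round | int }}"
      vars:
        # Pick the fact entry whose mount point is exactly /
        root_mount: "{{ ansible_mounts | selectattr('mount', 'equalto', '/') | first }}"

    - name: Report hosts above the threshold
      ansible.builtin.debug:
        msg: "Root filesystem is {{ root_used_pct }}% full on {{ inventory_hostname }}"
      when: root_used_pct | int > 85
```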

FAQ

How do I monitor Ansible playbook execution?

Use callback plugins like timer and profile_tasks for execution metrics. For centralized monitoring, push metrics to Prometheus Pushgateway or use AAP/AWX, which provides built-in metrics endpoints and job tracking dashboards.

Can Ansible deploy Prometheus and Grafana?

Yes. Ansible is ideal for deploying monitoring stacks. Use playbooks to install Prometheus, Node Exporter, Grafana, and Alertmanager across your infrastructure, then configure dashboards and alerting rules as code.

How do I get alerts when Ansible playbooks fail?

Use callback plugins to send notifications to Slack, Teams, or email. In AAP/AWX, configure notification templates. For custom alerting, push metrics to Prometheus, define alert rules there, and let Alertmanager route the resulting notifications.

Can Ansible do self-healing automation?

Yes, especially with Event-Driven Ansible (EDA) in AAP. EDA listens for monitoring alerts and automatically triggers remediation playbooks. With open-source Ansible, schedule periodic health-check playbooks via cron.
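
As a rough illustration of the EDA approach, a rulebook can listen for Alertmanager webhooks and start a remediation playbook when a matching alert arrives. The sketch below assumes the ansible.eda collection is installed and that Alertmanager is configured to send webhooks to port 5000 on the EDA host. The rule name, port, and the playbook path playbooks/remediate.yml are hypothetical.

```yaml
# rulebook.yml (run with ansible-rulebook)
- name: Remediate on Alertmanager alerts
  hosts: all
  sources:
    # Receives webhook POSTs from Alertmanager
    - ansible.eda.alertmanager:
        host: 0.0.0.0
        port: 5000
  rules:
    - name: Restart service when ServiceDown fires
      condition: event.alert.labels.alertname == "ServiceDown" and event.alert.status == "firing"
      action:
        run_playbook:
          name: playbooks/remediate.yml
```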

See also: Integrate Automation Controller, Prometheus, and Grafana to IT Monitor Realtime

Conclusion

Ansible excels at both deploying monitoring infrastructure and being monitored. Use it to automate Prometheus, Grafana, and alerting setup, then monitor Ansible itself with callback plugins, metrics endpoints, and integration with your alerting stack.
