Ansible Monitoring: Integrate with Prometheus, Grafana & Alerting (Complete Guide)

By Luca Berton · Published 2024-01-01 · Category: installation

How to use Ansible for monitoring automation. Deploy Prometheus, Grafana, and alerting. Monitor Ansible playbook execution with callback plugins and metrics.


Ansible serves two monitoring roles: deploying monitoring infrastructure (Prometheus, Grafana, alerting) and being monitored itself (tracking playbook execution, task metrics, and automation health). This guide covers both.

Part 1: Deploy Monitoring Stack with Ansible

Install Prometheus

---
- name: Deploy Prometheus
  hosts: monitoring
  become: true
  vars:
    prometheus_version: "2.53.0"
  tasks:
    - name: Create prometheus user
      ansible.builtin.user:
        name: prometheus
        system: true
        shell: /usr/sbin/nologin
        create_home: false
    # The config dir must exist before the template task; the data dir holds the TSDB
    - name: Create config and data directories
      ansible.builtin.file:
        path: "{{ item }}"
        state: directory
        owner: prometheus
        mode: "0755"
      loop:
        - /etc/prometheus
        - /var/lib/prometheus
    - name: Download Prometheus
      ansible.builtin.get_url:
        url: "https://github.com/prometheus/prometheus/releases/download/v{{ prometheus_version }}/prometheus-{{ prometheus_version }}.linux-amd64.tar.gz"
        dest: /tmp/prometheus.tar.gz
    - name: Extract Prometheus
      ansible.builtin.unarchive:
        src: /tmp/prometheus.tar.gz
        dest: /opt/
        remote_src: true
        # Skip re-extraction on later runs
        creates: "/opt/prometheus-{{ prometheus_version }}.linux-amd64/prometheus"
    - name: Deploy Prometheus config
      ansible.builtin.template:
        src: prometheus.yml.j2
        dest: /etc/prometheus/prometheus.yml
      notify: restart prometheus
    - name: Create systemd service
      ansible.builtin.template:
        src: prometheus.service.j2
        dest: /etc/systemd/system/prometheus.service
      notify: restart prometheus
    - name: Start Prometheus
      ansible.builtin.systemd:
        name: prometheus
        state: started
        enabled: true
        daemon_reload: true

  handlers:
    - name: restart prometheus
      ansible.builtin.systemd:
        name: prometheus
        state: restarted
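The playbook references a prometheus.yml.j2 template that is not shown. Below is a minimal sketch. It assumes every inventory host runs Node Exporter on its default port 9100, and that `ansible_host` (when set) is reachable from the Prometheus server. The job names and scrape interval are illustrative choices.

```yaml
# templates/prometheus.yml.j2 (illustrative sketch)
global:
  scrape_interval: 15s

scrape_configs:
  # Prometheus scraping its own metrics
  - job_name: prometheus
    static_configs:
      - targets: ['localhost:9090']

  # One Node Exporter target per inventory host (default port 9100)
  - job_name: node
    static_configs:
      - targets:
{% for host in groups['all'] %}
          - '{{ hostvars[host].ansible_host | default(host) }}:9100'
{% endfor %}
```

The template module enables Jinja2 `trim_blocks` by default, so the `for`/`endfor` lines do not leave blank lines in the rendered file.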

Install Node Exporter on All Hosts

- name: Deploy Node Exporter
  hosts: all
  become: true
  vars:
    node_exporter_version: "1.8.0"
  tasks:
    # The unit file runs the exporter as this user, so it must exist
    - name: Create node_exporter user
      ansible.builtin.user:
        name: node_exporter
        system: true
        shell: /usr/sbin/nologin
        create_home: false
    - name: Download Node Exporter
      ansible.builtin.get_url:
        url: "https://github.com/prometheus/node_exporter/releases/download/v{{ node_exporter_version }}/node_exporter-{{ node_exporter_version }}.linux-amd64.tar.gz"
        dest: /tmp/node_exporter.tar.gz
    - name: Extract and install
      ansible.builtin.unarchive:
        src: /tmp/node_exporter.tar.gz
        dest: /usr/local/bin/
        remote_src: true
        # Drops the top-level directory; LICENSE and NOTICE also land in /usr/local/bin
        extra_opts: [--strip-components=1]
        creates: /usr/local/bin/node_exporter
    - name: Create systemd service
      ansible.builtin.copy:
        content: |
          [Unit]
          Description=Node Exporter
          After=network.target

          [Service]
          User=node_exporter
          ExecStart=/usr/local/bin/node_exporter
          Restart=always

          [Install]
          WantedBy=multi-user.target
        dest: /etc/systemd/system/node_exporter.service
        mode: "0644"
    - name: Start Node Exporter
      ansible.builtin.systemd:
        name: node_exporter
        state: started
        enabled: true
        daemon_reload: true
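A quick way to confirm the exporter came up is to request its metrics endpoint on each host. This is a minimal sketch that assumes the default listen port 9100. Because the uri module executes on the managed host, `localhost` here means the host being checked.

```yaml
- name: Verify Node Exporter
  hosts: all
  gather_facts: false
  tasks:
    - name: Wait for the metrics endpoint on port 9100
      ansible.builtin.uri:
        url: http://localhost:9100/metrics
      register: node_metrics
      # Retry while the service is still starting (status is -1 on connection refused)
      until: node_metrics.status == 200
      retries: 5
      delay: 3
```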

Install Grafana

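- name: Deploy Grafana
  hosts: monitoring
  become: true
  tasks:
    # Add the signing key before the repository; otherwise the apt cache update
    # run by apt_repository fails on an unsigned repo. apt-key is deprecated,
    # so the key goes into /etc/apt/keyrings and is referenced with signed-by.
    - name: Create apt keyring directory
      ansible.builtin.file:
        path: /etc/apt/keyrings
        state: directory
        mode: "0755"
    - name: Add Grafana GPG key
      ansible.builtin.get_url:
        url: https://apt.grafana.com/gpg.key
        dest: /etc/apt/keyrings/grafana.asc
        mode: "0644"
    - name: Add Grafana repository
      ansible.builtin.apt_repository:
        repo: "deb [signed-by=/etc/apt/keyrings/grafana.asc] https://apt.grafana.com stable main"
        state: present
    - name: Install Grafana
      ansible.builtin.apt:
        name: grafana
        state: present
        update_cache: true
    - name: Configure Grafana datasource
      ansible.builtin.copy:
        content: |
          apiVersion: 1
          datasources:
            - name: Prometheus
              type: prometheus
              url: http://localhost:9090
              isDefault: true
              access: proxy
        dest: /etc/grafana/provisioning/datasources/prometheus.yml
        mode: "0644"
    - name: Start Grafana
      ansible.builtin.systemd:
        name: grafana-server
        state: started
        enabled: true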
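Datasources are one half of Grafana's file-based provisioning. Dashboards can be kept as code in the same way. The sketch below assumes exported dashboard JSON files sit in a local `dashboards/` directory next to the playbook, which is a hypothetical layout. The provider name `default` and the handler name are also illustrative. Grafana reads provider files at startup, so a change triggers a restart.

```yaml
- name: Provision Grafana dashboards from files
  hosts: monitoring
  become: true
  tasks:
    - name: Create dashboard directory
      ansible.builtin.file:
        path: /var/lib/grafana/dashboards
        state: directory
        owner: grafana
        group: grafana
        mode: "0755"
    - name: Copy dashboard JSON files (hypothetical local dashboards/ dir)
      ansible.builtin.copy:
        src: dashboards/
        dest: /var/lib/grafana/dashboards/
        owner: grafana
        group: grafana
        mode: "0644"
    - name: Register a file-based dashboard provider
      ansible.builtin.copy:
        content: |
          apiVersion: 1
          providers:
            - name: default
              type: file
              options:
                path: /var/lib/grafana/dashboards
        dest: /etc/grafana/provisioning/dashboards/default.yml
        mode: "0644"
      notify: restart grafana
  handlers:
    - name: restart grafana
      ansible.builtin.systemd:
        name: grafana-server
        state: restarted
```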

Part 2: Monitor Ansible Execution

Callback Plugins for Metrics

Timer and Profiling Callbacks (ansible.posix)

# ansible.cfg
# timer and profile_tasks come from the ansible.posix collection, not ansible-core
[defaults]
callbacks_enabled = ansible.posix.timer, ansible.posix.profile_tasks
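The timer, profile_tasks and json callbacks are part of the ansible.posix collection. The full `ansible` package bundles that collection. On a bare ansible-core install, one way to add it is a collections requirements file, installed with `ansible-galaxy collection install -r requirements.yml`:

```yaml
# requirements.yml
collections:
  - name: ansible.posix
```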

JSON Logging

# json is a stdout callback, so set it with stdout_callback, not callbacks_enabled
[defaults]
stdout_callback = ansible.posix.json
log_path = /var/log/ansible/playbook.log

Custom Prometheus Metrics

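# Push playbook metrics to Prometheus Pushgateway. Both plays go in the same
# playbook: the first records the start time, the last one reports.
- name: Record start time
  hosts: localhost
  gather_facts: false
  tasks:
    - name: Save playbook start time (epoch seconds)
      ansible.builtin.set_fact:
        start_time: "{{ lookup('pipe', 'date +%s') }}"
# ... your deployment plays ...
- name: Report playbook metrics
  hosts: localhost
  gather_facts: false
  tasks:
    - name: Push execution metrics
      ansible.builtin.uri:
        url: "http://pushgateway:9091/metrics/job/ansible/instance/{{ playbook_dir | basename }}"
        method: POST
        # The text format needs a trailing newline, which the | block scalar keeps
        body: |
          ansible_playbook_duration_seconds {{ (lookup('pipe', 'date +%s') | int) - (start_time | int) }}
          ansible_playbook_hosts_total {{ groups['all'] | length }}
          ansible_playbook_success 1
        headers:
          Content-Type: text/plain
        status_code: [200, 202]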
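Pushed metrics only reach Prometheus if Prometheus scrapes the Pushgateway. Below is a minimal scrape job that assumes the same `pushgateway:9091` address used above. The Pushgateway documentation recommends `honor_labels: true` so that the `job="ansible"` and `instance` labels from the push URL survive the scrape.

```yaml
# prometheus.yml fragment: scrape the Pushgateway
scrape_configs:
  - job_name: pushgateway
    # Keep the job/instance labels from the push URL instead of
    # overwriting them with this scrape job's own labels
    honor_labels: true
    static_configs:
      - targets: ['pushgateway:9091']
```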

AAP / AWX Built-in Metrics

Ansible Automation Platform (and its upstream project, AWX) provides:
• Job success/failure rates
• Execution time per playbook
• Host status summaries
• A REST API for custom dashboards
• A Prometheus-format metrics endpoint at /api/v2/metrics/ (requires authentication)
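To chart these metrics next to the rest of the stack, Prometheus can scrape the controller's metrics endpoint directly. This is a sketch under stated assumptions: the hostname `awx.example.com` and the token file `/etc/prometheus/awx_token` are hypothetical, and the token belongs to a user allowed to read metrics. The `authorization` block needs Prometheus 2.26 or later, which the 2.53.0 release installed above satisfies.

```yaml
# prometheus.yml fragment: scrape the AWX / AAP controller metrics endpoint
scrape_configs:
  - job_name: awx
    metrics_path: /api/v2/metrics/
    scheme: https
    # Bearer token stored outside prometheus.yml (hypothetical path)
    authorization:
      type: Bearer
      credentials_file: /etc/prometheus/awx_token
    static_configs:
      - targets: ['awx.example.com']   # hypothetical controller host
```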

Alerting on Ansible Failures

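# Prometheus alerting rules (load via rule_files in prometheus.yml)
# Assumes the Pushgateway is scraped with honor_labels: true
groups:
  - name: ansible
    rules:
      # Fires only when a failed run pushes ansible_playbook_success 0
      - alert: AnsiblePlaybookFailed
        expr: ansible_playbook_success == 0
        for: 0m
        labels:
          severity: critical
        annotations:
          summary: "Ansible playbook failed"
      # push_time_seconds is set by the Pushgateway on every successful push
      - alert: AnsiblePlaybookNotReporting
        expr: time() - push_time_seconds{job="ansible"} > 86400
        labels:
          severity: warning
        annotations:
          summary: "No Ansible run has reported to the Pushgateway for 24 hours"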
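The AnsiblePlaybookFailed rule only fires if something pushes `ansible_playbook_success 0`, and the reporting play above only ever pushes 1. One way to cover the failure path is a `block`/`rescue` that pushes 0 before re-raising the error. This is a minimal sketch: it reuses the `deploy` role name and the Pushgateway URL from the surrounding examples, and both are placeholders.

```yaml
- name: Deploy and report failures to the Pushgateway
  hosts: webservers
  tasks:
    - name: Run the deployment, pushing a failure metric if it breaks
      block:
        - name: Deploy
          ansible.builtin.include_role:
            name: deploy
      rescue:
        - name: Push ansible_playbook_success 0
          ansible.builtin.uri:
            url: "http://pushgateway:9091/metrics/job/ansible/instance/{{ playbook_dir | basename }}"
            method: POST
            headers:
              Content-Type: text/plain
            body: |
              ansible_playbook_success 0
            status_code: [200, 202]
          delegate_to: localhost
          run_once: true
        # Re-raise so the host is still reported as failed
        - name: Keep the host marked as failed
          ansible.builtin.fail:
            msg: "Deployment failed; failure metric pushed"
```

A POST replaces metrics of the same name within the grouping key. As a result, the next successful run's push of `ansible_playbook_success 1` clears the alert.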

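Notify Slack/Teams on Success or Failure

# In playbook: notify on failure (rescue) and on success (post_tasks)
- name: Deploy application
  hosts: webservers
  tasks:
    - name: Deploy with failure notification
      block:
        - name: Deploy
          ansible.builtin.include_role:
            name: deploy
      rescue:
        - name: Notify failure
          ansible.builtin.uri:
            url: "{{ slack_webhook }}"
            method: POST
            body_format: json
            body:
              text: "❌ Deployment failed on {{ inventory_hostname }}"
          delegate_to: localhost
        # Re-raise so the host is still marked as failed
        - name: Fail the host
          ansible.builtin.fail:
            msg: "Deployment failed on {{ inventory_hostname }}"

  post_tasks:
    - name: Notify success
      ansible.builtin.uri:
        url: "{{ slack_webhook }}"
        method: POST
        body_format: json
        body:
          text: "✅ Deployment to {{ ansible_play_hosts | length }} hosts completed successfully"
      run_once: true
      # Only when no host failed or became unreachable during the play
      when: ansible_play_hosts_all | length == ansible_play_hosts | length
      delegate_to: localhost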

Part 3: Self-Healing with Ansible

- name: Auto-remediate common issues
  hosts: all
  become: true
  tasks:
    - name: Check disk space
      ansible.builtin.shell: df -h / | tail -1 | awk '{print $5}' | tr -d '%'
      register: disk_usage
      changed_when: false
    - name: Clean up if disk usage > 85%
      when: disk_usage.stdout | int > 85
      block:
        - name: Clean apt cache (Debian/Ubuntu)
          ansible.builtin.apt:
            autoclean: true
        - name: Remove old logs
          ansible.builtin.shell: find /var/log -name "*.gz" -mtime +30 -delete
          changed_when: true
    # state: started also starts a stopped service, so this task remediates as well as checks
    - name: Ensure services are running
      ansible.builtin.systemd:
        name: "{{ item }}"
        state: started
      loop:
        - nginx
        - postgresql
      register: service_results
      # ignore_errors keeps item.failed set; failed_when: false would reset it to false
      ignore_errors: true
    - name: Alert on service failures
      ansible.builtin.uri:
        # Alertmanager's alerts API, e.g. http://alertmanager:9093/api/v2/alerts
        url: "{{ alertmanager_webhook }}"
        method: POST
        body_format: json
        # The v2 alerts API expects a JSON list of alerts
        body:
          - labels:
              alertname: "ServiceDown"
              host: "{{ inventory_hostname }}"
              service: "{{ item.item }}"
            annotations:
              summary: "{{ item.item }} is not running on {{ inventory_hostname }}"
      loop: "{{ service_results.results }}"
      when: item.failed | default(false)
      delegate_to: localhost
      # Avoid needing sudo on the control node for a plain HTTP call
      become: false
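The remediation play above does nothing until something runs it. With open-source Ansible, one option is a cron entry on the control node, managed by Ansible itself. This is a minimal sketch: the inventory path, the playbook path `/opt/playbooks/self-heal.yml` and the log file are hypothetical. The entry goes into the crontab of the user running this play, and that user needs SSH access to the managed hosts.

```yaml
- name: Schedule the self-healing playbook
  hosts: localhost
  tasks:
    - name: Run remediation every 15 minutes
      ansible.builtin.cron:
        name: ansible-self-heal
        minute: "*/15"
        # Hypothetical paths: adjust to where the inventory and playbook live
        job: "ansible-playbook -i /etc/ansible/hosts /opt/playbooks/self-heal.yml >> /var/log/ansible/self-heal.log 2>&1"
```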

FAQ

How do I monitor Ansible playbook execution?

Use the timer and profile_tasks callback plugins (from the ansible.posix collection) for execution timing. For centralized monitoring, push metrics to a Prometheus Pushgateway or use AAP/AWX, which provide built-in metrics endpoints and job-tracking dashboards.

Can Ansible deploy Prometheus and Grafana?

Yes. Ansible is ideal for deploying monitoring stacks. Use playbooks to install Prometheus, Node Exporter, Grafana, and Alertmanager across your infrastructure, then configure dashboards and alerting rules as code.

How do I get alerts when Ansible playbooks fail?

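Use a block/rescue section that posts to a Slack or Teams webhook, or a notification callback plugin such as community.general.slack or community.general.mail. In AAP/AWX, configure notification templates. For custom alerting, push metrics to Prometheus, define alerting rules in Prometheus, and let Alertmanager route the resulting alerts to Slack, Teams, or email.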
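For the Prometheus route, Alertmanager decides where a firing alert goes. Below is a minimal sketch that sends every alert to a single Slack channel. The receiver name, the `#ops` channel and the webhook-URL file path are placeholders. `api_url_file` keeps the webhook URL out of the main config file.

```yaml
# alertmanager.yml: send every alert to one Slack channel
route:
  receiver: slack-ops
  group_by: ['alertname']
receivers:
  - name: slack-ops
    slack_configs:
      # File containing the Slack incoming-webhook URL (placeholder path)
      - api_url_file: /etc/alertmanager/slack_webhook_url
        channel: '#ops'
        send_resolved: true
```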

Can Ansible do self-healing automation?

Yes, especially with Event-Driven Ansible (EDA), which ships with AAP and is also available as the open-source ansible-rulebook CLI. EDA listens for events such as Alertmanager alerts and automatically triggers remediation playbooks. Without EDA, schedule periodic health-check playbooks via cron.
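As a sketch of the event-driven approach, the rulebook below listens for Alertmanager notifications and runs a playbook when a ServiceDown alert fires. It assumes the ansible.eda collection is installed and that an Alertmanager `webhook_configs` receiver points at this host on port 5000. `remediate.yml` is a hypothetical playbook name. A rulebook like this would be started with `ansible-rulebook --rulebook rulebook.yml -i inventory.yml`.

```yaml
# rulebook.yml: run a remediation playbook when Alertmanager reports ServiceDown
- name: Remediate from Alertmanager alerts
  hosts: all
  sources:
    # Receives Alertmanager webhook notifications on port 5000
    - ansible.eda.alertmanager:
        host: 0.0.0.0
        port: 5000
  rules:
    - name: Restart a stopped service
      condition: event.alert.labels.alertname == "ServiceDown" and event.alert.status == "firing"
      action:
        run_playbook:
          name: remediate.yml   # hypothetical playbook
```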

Conclusion

Ansible excels at both deploying monitoring infrastructure and being monitored. Use it to automate Prometheus, Grafana, and alerting setup, then monitor Ansible itself with callback plugins, metrics endpoints, and integration with your alerting stack.
