Ansible Monitoring: Integrate with Prometheus, Grafana & Alerting (Complete Guide)
By Luca Berton · Published 2024-01-01 · Category: installation
How to use Ansible for monitoring automation. Deploy Prometheus, Grafana, and alerting. Monitor Ansible playbook execution with callback plugins and metrics.
Ansible serves two monitoring roles: deploying monitoring infrastructure (Prometheus, Grafana, alerting) and being monitored itself (tracking playbook execution, task metrics, and automation health). This guide covers both.
See also: AAP 2.6 Monitoring and Logging: Prometheus, Grafana, and Log Aggregation
Part 1: Deploy Monitoring Stack with Ansible
Install Prometheus
---
- name: Deploy Prometheus
  hosts: monitoring
  become: true
  vars:
    prometheus_version: "2.53.0"
  tasks:
    - name: Create prometheus user
      ansible.builtin.user:
        name: prometheus
        system: true
        shell: /usr/sbin/nologin
    # The config template below fails if this directory does not exist yet
    - name: Create Prometheus config directory
      ansible.builtin.file:
        path: /etc/prometheus
        state: directory
        owner: prometheus
        mode: "0755"
    - name: Download Prometheus
      ansible.builtin.get_url:
        url: "https://github.com/prometheus/prometheus/releases/download/v{{ prometheus_version }}/prometheus-{{ prometheus_version }}.linux-amd64.tar.gz"
        dest: /tmp/prometheus.tar.gz
        mode: "0644"
    # Unpacks to /opt/prometheus-<version>.linux-amd64/
    - name: Extract Prometheus
      ansible.builtin.unarchive:
        src: /tmp/prometheus.tar.gz
        dest: /opt/
        remote_src: true
    - name: Deploy Prometheus config
      ansible.builtin.template:
        src: prometheus.yml.j2
        dest: /etc/prometheus/prometheus.yml
      notify: restart prometheus
    - name: Create systemd service
      ansible.builtin.template:
        src: prometheus.service.j2
        dest: /etc/systemd/system/prometheus.service
      notify: restart prometheus
    - name: Start Prometheus
      ansible.builtin.systemd:
        name: prometheus
        state: started
        enabled: true
        daemon_reload: true
  handlers:
    - name: restart prometheus
      ansible.builtin.systemd:
        name: prometheus
        state: restarted
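The playbook references prometheus.service.j2 without showing it. As a rough sketch, the same unit could be written inline with ansible.builtin.copy instead of a template. The data directory /var/lib/prometheus is an assumption (the playbook above does not create one), and the binary path assumes the tarball's top-level directory name is kept as extracted under /opt.

```yaml
# Sketch only: inline alternative to prometheus.service.j2.
# Assumed: /var/lib/prometheus as data directory, binary left where
# the Extract task unpacked it.
- name: Create Prometheus data directory
  ansible.builtin.file:
    path: /var/lib/prometheus
    state: directory
    owner: prometheus
    mode: "0755"

- name: Create systemd service
  ansible.builtin.copy:
    dest: /etc/systemd/system/prometheus.service
    mode: "0644"
    content: |
      [Unit]
      Description=Prometheus
      After=network-online.target

      [Service]
      User=prometheus
      ExecStart=/opt/prometheus-{{ prometheus_version }}.linux-amd64/prometheus \
        --config.file=/etc/prometheus/prometheus.yml \
        --storage.tsdb.path=/var/lib/prometheus
      Restart=always

      [Install]
      WantedBy=multi-user.target
  notify: restart prometheus
```

These two tasks would take the place of the "Create systemd service" task in the play, since they rely on its prometheus_version variable and its handler.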
Install Node Exporter on All Hosts
- name: Deploy Node Exporter
  hosts: all
  become: true
  vars:
    node_exporter_version: "1.8.0"
  tasks:
    # The unit file below runs as this user, so it has to exist first
    - name: Create node_exporter user
      ansible.builtin.user:
        name: node_exporter
        system: true
        shell: /usr/sbin/nologin
    - name: Download Node Exporter
      ansible.builtin.get_url:
        url: "https://github.com/prometheus/node_exporter/releases/download/v{{ node_exporter_version }}/node_exporter-{{ node_exporter_version }}.linux-amd64.tar.gz"
        dest: /tmp/node_exporter.tar.gz
        mode: "0644"
    - name: Extract and install
      ansible.builtin.unarchive:
        src: /tmp/node_exporter.tar.gz
        dest: /usr/local/bin/
        remote_src: true
        extra_opts: [--strip-components=1]
        creates: /usr/local/bin/node_exporter
    - name: Create systemd service
      ansible.builtin.copy:
        content: |
          [Unit]
          Description=Node Exporter
          After=network.target
          [Service]
          User=node_exporter
          ExecStart=/usr/local/bin/node_exporter
          Restart=always
          [Install]
          WantedBy=multi-user.target
        dest: /etc/systemd/system/node_exporter.service
        mode: "0644"
    - name: Start Node Exporter
      ansible.builtin.systemd:
        name: node_exporter
        state: started
        enabled: true
        daemon_reload: true
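Once Node Exporter runs on every host, Prometheus still needs to be told where to scrape. The prometheus.yml.j2 template used by the Prometheus playbook is not shown in this guide; a minimal sketch could look like the following. It assumes that inventory hostnames resolve from the Prometheus server, that Node Exporter listens on its default port 9100, and that a 15-second scrape interval is acceptable.

```yaml
# prometheus.yml.j2 (sketch, not the original template)
global:
  scrape_interval: 15s   # assumed value

scrape_configs:
  # Prometheus scraping itself
  - job_name: prometheus
    static_configs:
      - targets: ["localhost:9090"]

  # One target per inventory host running Node Exporter
  - job_name: node
    static_configs:
      - targets:
{% for host in groups['all'] %}
          - "{{ host }}:9100"
{% endfor %}
```

Because the targets are generated from the inventory, rerunning the Prometheus playbook after adding a host is enough to start scraping it.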
Install Grafana
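- name: Deploy Grafana
  hosts: monitoring
  become: true
  tasks:
    - name: Ensure keyrings directory exists
      ansible.builtin.file:
        path: /etc/apt/keyrings
        state: directory
        mode: "0755"
    # The key must be in place before the repository is added, otherwise the
    # cache update fails. apt_key is deprecated, so the key is stored as a
    # file and referenced with signed-by.
    - name: Add Grafana GPG key
      ansible.builtin.get_url:
        url: https://apt.grafana.com/gpg.key
        dest: /etc/apt/keyrings/grafana.asc
        mode: "0644"
    - name: Add Grafana repository
      ansible.builtin.apt_repository:
        repo: "deb [signed-by=/etc/apt/keyrings/grafana.asc] https://apt.grafana.com stable main"
        state: present
    - name: Install Grafana
      ansible.builtin.apt:
        name: grafana
        state: present
        update_cache: true
    - name: Configure Grafana datasource
      ansible.builtin.copy:
        content: |
          apiVersion: 1
          datasources:
            - name: Prometheus
              type: prometheus
              url: http://localhost:9090
              isDefault: true
              access: proxy
        dest: /etc/grafana/provisioning/datasources/prometheus.yml
        mode: "0644"
    - name: Start Grafana
      ansible.builtin.systemd:
        name: grafana-server
        state: started
        enabled: true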
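Dashboards can be provisioned the same way as the datasource. The tasks below are a sketch that would sit before the "Start Grafana" task: they register a file-based dashboard provider and copy dashboard JSON files to the host. The local dashboards/ directory and the target path /var/lib/grafana/dashboards are assumptions chosen for illustration.

```yaml
# Sketch only: file-based dashboard provisioning.
- name: Configure Grafana dashboard provider
  ansible.builtin.copy:
    dest: /etc/grafana/provisioning/dashboards/default.yml
    mode: "0644"
    content: |
      apiVersion: 1
      providers:
        - name: default
          type: file
          options:
            path: /var/lib/grafana/dashboards

- name: Copy dashboard JSON files
  ansible.builtin.copy:
    src: dashboards/                      # hypothetical directory next to the playbook
    dest: /var/lib/grafana/dashboards/
    mode: "0644"
```

Grafana reads provider files at startup, so if Grafana is already running, restart grafana-server after changing them.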
Part 2: Monitor Ansible Execution
Callback Plugins for Metrics
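Timer and Task Profiling Callbacks (ansible.posix)
# ansible.cfg
# timer and profile_tasks ship in the ansible.posix collection, not ansible.builtin
[defaults]
callbacks_enabled = ansible.posix.timer, ansible.posix.profile_tasks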
JSON Logging
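[defaults]
# json is a stdout callback: it replaces the default output, so it is set
# with stdout_callback rather than callbacks_enabled
stdout_callback = ansible.posix.json
log_path = /var/log/ansible/playbook.log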
Custom Prometheus Metrics
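# Push playbook metrics to Prometheus Pushgateway.
# Assumes start_time (epoch seconds) was passed at launch, for example:
#   ansible-playbook site.yml -e "start_time=$(date +%s)"
- name: Report playbook metrics
  hosts: localhost
  gather_facts: true  # ansible_date_time comes from gathered facts
  tasks:
    # ansible_play_hosts counts the hosts of this play (only localhost here);
    # swap in the group you deployed to if you want the size of that group
    - name: Push execution metrics
      ansible.builtin.uri:
        url: "http://pushgateway:9091/metrics/job/ansible/instance/{{ playbook_dir | basename }}"
        method: POST
        body: |
          ansible_playbook_duration_seconds {{ (ansible_date_time.epoch | int) - (start_time | int) }}
          ansible_playbook_hosts_total {{ ansible_play_hosts | length }}
          ansible_playbook_success 1
        headers:
          Content-Type: text/plain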
AAP / AWX Built-in Metrics
Ansible Automation Platform provides:
• Job success/failure rates
• Execution time per playbook
• Host status summaries
• REST API for custom dashboards
• Prometheus endpoint at /api/v2/metrics/
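To collect these metrics, Prometheus needs a scrape job that points at the endpoint and authenticates against the API. The fragment below is a sketch for prometheus.yml; the hostname awx.example.com and the token value are placeholders, and the token is assumed to be an API token you have created for a user with read access.

```yaml
# Fragment for prometheus.yml (sketch)
scrape_configs:
  - job_name: awx
    scheme: https
    metrics_path: /api/v2/metrics/
    authorization:
      type: Bearer
      credentials: "REPLACE_WITH_AWX_TOKEN"   # placeholder
    static_configs:
      - targets: ["awx.example.com"]          # hypothetical host
```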
Alerting on Ansible Failures
# Prometheus alerting rule
groups:
  - name: ansible
    rules:
      - alert: AnsiblePlaybookFailed
        expr: ansible_playbook_success == 0
        for: 0m
        labels:
          severity: critical
        annotations:
          summary: "Ansible playbook failed"
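This rule only fires if something actually pushes ansible_playbook_success 0. One way to do that, sketched below, is to wrap the work in a block and push the failure value from its rescue section. The deploy script path is a hypothetical stand-in for your real tasks; the Pushgateway URL is the one used in the push example above.

```yaml
# Sketch only: report a failed run to the Pushgateway.
- name: Deploy with failure reporting
  hosts: localhost
  tasks:
    - name: Run the work and record failures
      block:
        - name: Placeholder for the real tasks
          ansible.builtin.command: /usr/local/bin/deploy.sh   # hypothetical
      rescue:
        - name: Push failure metric
          ansible.builtin.uri:
            url: "http://pushgateway:9091/metrics/job/ansible/instance/{{ playbook_dir | basename }}"
            method: POST
            body: |
              ansible_playbook_success 0
            headers:
              Content-Type: text/plain
        # rescue clears the failure, so fail again to keep the exit code non-zero
        - name: Fail the play after reporting
          ansible.builtin.fail:
            msg: "Playbook failed; failure metric pushed"
```

The Pushgateway keeps the last pushed value, so the alert stays active until a later successful run pushes 1 again.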
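Notify Slack/Teams from the Playbook on Success or Failure
# In playbook: notify on failure (rescue) and on success (post_tasks)
- name: Deploy application
  hosts: webservers
  tasks:
    - name: Deploy and report failures
      block:
        - name: Deploy
          ansible.builtin.include_role:
            name: deploy
      rescue:
        - name: Notify failure
          ansible.builtin.uri:
            url: "{{ slack_webhook }}"
            method: POST
            body_format: json
            body:
              text: "❌ Deployment failed on {{ inventory_hostname }} in task '{{ ansible_failed_task.name }}'"
          delegate_to: localhost
        # rescue clears the failure, so fail again to keep the host marked as failed
        - name: Keep the host marked as failed
          ansible.builtin.fail:
            msg: "Deployment failed on {{ inventory_hostname }}"
  post_tasks:
    - name: Notify success
      ansible.builtin.uri:
        url: "{{ slack_webhook }}"
        method: POST
        body_format: json
        body:
          text: "✅ Deployment to {{ ansible_play_hosts | length }} hosts completed successfully"
      run_once: true
      # Only report success if no host dropped out of the play
      when: ansible_play_hosts_all | length == ansible_play_hosts | length
      delegate_to: localhost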
See also: Ansible Monitoring and Observability: Prometheus, Grafana, and ELK Stack Integration
Part 3: Self-Healing with Ansible
- name: Auto-remediate common issues
  hosts: all
  become: true  # cleaning /var/log and starting services requires root
  tasks:
    - name: Check disk space
      ansible.builtin.shell: df -h / | tail -1 | awk '{print $5}' | tr -d '%'
      register: disk_usage
      changed_when: false
    - name: Clean up if disk usage > 85%
      when: disk_usage.stdout | int > 85
      block:
        - name: Clean apt cache
          ansible.builtin.apt:
            autoclean: true
        - name: Remove old logs
          ansible.builtin.shell: find /var/log -name "*.gz" -mtime +30 -delete
          changed_when: true
    - name: Ensure services are running
      ansible.builtin.systemd:
        name: "{{ item }}"
        state: started
      loop:
        - nginx
        - postgresql
      register: service_results
      # ignore_errors keeps item.failed set for the alert task below;
      # failed_when: false would reset it and the alert would never fire
      ignore_errors: true
    # Assumes alertmanager_webhook points at Alertmanager's /api/v2/alerts
    # endpoint, which expects a JSON array of alerts
    - name: Alert on service failures
      ansible.builtin.uri:
        url: "{{ alertmanager_webhook }}"
        method: POST
        body_format: json
        body:
          - labels:
              alertname: "ServiceDown"
              host: "{{ inventory_hostname }}"
              service: "{{ item.item }}"
            annotations:
              summary: "{{ item.item }} is not running on {{ inventory_hostname }}"
      loop: "{{ service_results.results }}"
      when: item.failed | default(false)
      delegate_to: localhost
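A remediation play like this only helps if it runs regularly. One simple option with open-source Ansible is a cron entry on the control node, managed by Ansible itself. The sketch below assumes the play is saved as /opt/playbooks/self_heal.yml, that the inventory lives at /etc/ansible/hosts, and that a 15-minute interval is acceptable; all three are illustrative choices.

```yaml
# Sketch only: schedule the self-healing playbook on the control node.
- name: Schedule the self-healing playbook
  hosts: localhost   # assumed to be the control node
  become: true
  tasks:
    - name: Run remediation every 15 minutes
      ansible.builtin.cron:
        name: ansible-self-heal
        minute: "*/15"
        user: root
        job: "ansible-playbook -i /etc/ansible/hosts /opt/playbooks/self_heal.yml >> /var/log/ansible/self_heal.log 2>&1"
```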
FAQ
How do I monitor Ansible playbook execution?
Use callback plugins like timer and profile_tasks for execution metrics. For centralized monitoring, push metrics to Prometheus Pushgateway or use AAP/AWX which provides built-in metrics endpoints and job tracking dashboards.
Can Ansible deploy Prometheus and Grafana?
Yes. Ansible is ideal for deploying monitoring stacks. Use playbooks to install Prometheus, Node Exporter, Grafana, and Alertmanager across your infrastructure, then configure dashboards and alerting rules as code.
How do I get alerts when Ansible playbooks fail?
Use callback plugins to send notifications to Slack, Teams, or email. In AAP/AWX, configure notification templates. For custom alerting, push metrics to Prometheus, define alerting rules there, and let Alertmanager route the notifications.
Can Ansible do self-healing automation?
Yes, especially with Event-Driven Ansible (EDA) in AAP. EDA listens for monitoring alerts and automatically triggers remediation playbooks. Outside AAP, the upstream ansible-rulebook CLI offers the same event-driven model, or you can schedule periodic health-check playbooks via cron.
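As an illustration of the event-driven approach, the rulebook below is a sketch that listens for Alertmanager webhooks and runs a remediation playbook when a ServiceDown alert fires. It assumes the ansible.eda collection is installed, that Alertmanager is configured with a webhook receiver pointing at this listener, and that the port 5050 and the playbook path are free choices made for this example.

```yaml
# rulebook.yml (sketch)
- name: Remediate on Alertmanager alerts
  hosts: all
  sources:
    - ansible.eda.alertmanager:
        host: 0.0.0.0
        port: 5050                      # assumed port
  rules:
    - name: Run self-healing when ServiceDown fires
      condition: event.alert.labels.alertname == "ServiceDown" and event.alert.status == "firing"
      action:
        run_playbook:
          name: playbooks/self_heal.yml   # hypothetical path
```

A rulebook like this is typically started with ansible-rulebook --rulebook rulebook.yml -i inventory.yml and keeps running until stopped.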
See also: Integrate Automation Controller, Prometheus, and Grafana to IT Monitor Realtime
Conclusion
Ansible excels at both deploying monitoring infrastructure and being monitored. Use it to automate Prometheus, Grafana, and alerting setup, then monitor Ansible itself with callback plugins, metrics endpoints, and integration with your alerting stack.