Ansible Event-Driven Automation (EDA): Complete Guide with Rulebooks and Examples
By Luca Berton · Published 2024-01-01 · Category: installation
Master Ansible Event-Driven Automation (EDA) for real-time incident response and auto-remediation.
Ansible Event-Driven Automation (EDA) enables real-time, automated responses to events from monitoring systems, cloud providers, webhooks, and custom sources. Instead of running playbooks on a schedule, EDA listens for events and triggers automation instantly when conditions are met. This guide covers the complete EDA architecture, rulebook syntax, event sources, and production patterns.
What is Event-Driven Automation?
Traditional Ansible automation is operator-driven: a playbook runs when you, or a scheduler, decide to run it. EDA flips this model: you define rules that watch for specific events and automatically trigger actions when conditions match.
Event Source → Event   → Rule Matching       → Action         → Resolution
(webhook)      (alert)   (severity=critical)   (run playbook)   (service restored)
Key Concepts
| Concept | Description |
|---------|-------------|
| Rulebook | YAML file defining sources, rules, conditions, and actions |
| Event Source | Plugin that receives events (webhook, Kafka, alertmanager, etc.) |
| Condition | Expression in the rulebook condition syntax (Jinja2-like, but without filters) that evaluates event data |
| Action | What happens when a condition matches (run_playbook, run_job_template, etc.) |
| ansible-rulebook | CLI tool that executes rulebooks |
Installation
# Install ansible-rulebook
pip install ansible-rulebook
# Install required collections
ansible-galaxy collection install ansible.eda
# Verify installation
ansible-rulebook --version
Requirements: Python 3.9+, Java 17+ (for Drools rule engine).
Rulebook Syntax
Basic Structure
---
- name: Respond to monitoring alerts
  hosts: all
  sources:
    - ansible.eda.webhook:
        host: 0.0.0.0
        port: 5000
  rules:
    - name: Restart service on failure alert
      condition: event.payload.status == "firing" and event.payload.alert == "service_down"
      action:
        run_playbook:
          name: playbooks/restart-service.yml
          extra_vars:
            target_service: "{{ event.payload.service }}"
            target_host: "{{ event.payload.host }}"
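For orientation, the event that the webhook source hands to the rule engine for a matching request might look like the following, rendered as YAML. This is a sketch: the values are made up, and the `meta` keys shown are an assumption about what the webhook plugin attaches in current versions of the collection. Only the `payload` keys come from the rulebook above.

```yaml
# Illustrative event as seen by the rule above (values are hypothetical).
payload:              # JSON body of the incoming POST request
  status: firing
  alert: service_down
  service: nginx
  host: web-01
meta:                 # added by the source plugin; exact keys may vary by version
  endpoint: endpoint  # URL path the request was posted to
```

Conditions address this structure with dotted paths, which is why the rule refers to `event.payload.status` rather than `event.status`.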
Running a Rulebook
# Run with verbose output
ansible-rulebook --rulebook rulebook.yml -i inventory.yml --verbose
# Run with specific variables
ansible-rulebook --rulebook rulebook.yml -i inventory.yml \
--vars vars.yml
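The `-i inventory.yml` argument tells ansible-rulebook which hosts the triggered playbooks may run against. A minimal inventory could look like the sketch below; the host name and address are placeholders, not values from this guide.

```yaml
# inventory.yml (hypothetical hosts that run_playbook actions may target)
all:
  hosts:
    web-01:
      ansible_host: 192.0.2.10   # documentation address range, replace with a real one
```

Host names here need to line up with whatever the events carry (for example `event.payload.host`) if playbooks select their targets from event data.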
Event Sources
Webhook Source
sources:
  - ansible.eda.webhook:
      host: 0.0.0.0
      port: 5000
      token: "{{ webhook_secret }}"
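The `{{ webhook_secret }}` placeholder is resolved from variables supplied when the rulebook starts, for example through `--vars`. A sketch of such a file follows; the file name and the value are assumptions for illustration.

```yaml
# vars.yml (hypothetical), passed with: --vars vars.yml
webhook_secret: "change-me"   # placeholder; keep real secrets out of version control
```

With a token configured, senders are expected to present it as a bearer token in the `Authorization` header; requests without it should be rejected.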
Alertmanager Source
sources:
  - ansible.eda.alertmanager:
      host: 0.0.0.0
      port: 8888
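Unlike the plain webhook source, the alertmanager source splits an incoming notification into one event per alert, exposed under `event.alert`. A minimal sketch of a rulebook that matches on it, assuming the default plugin settings and a standard Alertmanager payload with an `instance` label:

```yaml
- name: React to Alertmanager alerts
  hosts: all
  sources:
    - ansible.eda.alertmanager:
        host: 0.0.0.0
        port: 8888
  rules:
    - name: Show firing ServiceDown alerts
      condition: >-
        event.alert.status == "firing"
        and event.alert.labels.alertname == "ServiceDown"
      action:
        debug:
          msg: "ServiceDown on {{ event.alert.labels.instance }}"
```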
Kafka Source
sources:
  - ansible.eda.kafka:
      host: kafka-broker.example.com
      port: 9092
      topic: infrastructure-events
      group_id: eda-consumer
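Messages consumed from Kafka are, to the best of my knowledge, exposed under `event.body` rather than `event.payload`. The sketch below assumes JSON messages with hypothetical `type` and `host` fields; adjust the paths to your own message schema.

```yaml
- name: React to Kafka messages
  hosts: all
  sources:
    - ansible.eda.kafka:
        host: kafka-broker.example.com
        port: 9092
        topic: infrastructure-events
        group_id: eda-consumer
  rules:
    - name: Show node failure messages
      # "type" and "host" are assumed message fields, not part of the plugin
      condition: event.body.type == "node_failure"
      action:
        debug:
          msg: "Node failure reported for {{ event.body.host }}"
```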
File Watch Source
sources:
  - ansible.eda.file_watch:
      path: /var/log/application/
      recursive: true
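Events from the file watch source describe what changed and where. A minimal sketch of a matching rule, assuming the plugin's `change` and `src_path` event fields:

```yaml
- name: React to file changes
  hosts: all
  sources:
    - ansible.eda.file_watch:
        path: /var/log/application/
        recursive: true
  rules:
    - name: Show modified files
      condition: event.change == "modified"
      action:
        debug:
          msg: "File changed: {{ event.src_path }}"
```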
URL Check Source
sources:
  - ansible.eda.url_check:
      urls:
        - https://api.example.com/health
        - https://web.example.com/health
      delay: 30
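The URL check source polls each URL every `delay` seconds and reports the result under `event.url_check`. A sketch of a rule that reacts when an endpoint stops responding; the playbook path is hypothetical:

```yaml
- name: React to failed health checks
  hosts: all
  sources:
    - ansible.eda.url_check:
        urls:
          - https://api.example.com/health
        delay: 30
  rules:
    - name: Recover endpoint that is down
      condition: event.url_check.status == "down"
      action:
        run_playbook:
          name: playbooks/recover-endpoint.yml   # hypothetical playbook
          extra_vars:
            failed_url: "{{ event.url_check.url }}"
```

Because this source emits an event on every poll, pairing the rule with a throttle is usually worthwhile.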
Production Rulebook Examples
Example 1: Auto-Remediation for Service Failures
---
- name: Service auto-remediation
  hosts: all
  sources:
    - ansible.eda.webhook:
        host: 0.0.0.0
        port: 5000
  rules:
    - name: Restart failed service
      condition: >-
        event.payload.status == "firing"
        and event.payload.labels.severity in ["critical", "warning"]
        and event.payload.labels.alertname == "ServiceDown"
      throttle:
        once_within: 5 minutes
        group_by:
          - event.payload.labels.instance
          - event.payload.labels.service
      action:
        run_playbook:
          name: playbooks/restart-service.yml
          extra_vars:
            target_host: "{{ event.payload.labels.instance }}"
            service_name: "{{ event.payload.labels.service }}"
            alert_severity: "{{ event.payload.labels.severity }}"
    # Conditions do not accept Jinja2 filters such as "| int", so the sender
    # must deliver cpu_percent and disk_percent as numbers, not strings.
    - name: Scale up on high CPU
      condition: >-
        event.payload.labels.alertname == "HighCPU"
        and event.payload.labels.cpu_percent > 90
      throttle:
        once_within: 15 minutes
        group_by:
          - event.payload.labels.cluster
      action:
        run_playbook:
          name: playbooks/scale-up.yml
          extra_vars:
            cluster: "{{ event.payload.labels.cluster }}"
    - name: Disk cleanup on low space
      condition: >-
        event.payload.labels.alertname == "DiskSpaceLow"
        and event.payload.labels.disk_percent > 85
      action:
        run_playbook:
          name: playbooks/disk-cleanup.yml
          extra_vars:
            target_host: "{{ event.payload.labels.instance }}"
            mount_point: "{{ event.payload.labels.mountpoint }}"
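The rulebook only decides when to act; the remediation itself lives in the referenced playbooks, which this guide does not show. A minimal sketch of what `playbooks/restart-service.yml` might contain, assuming the `target_host` value matches an inventory host name and the service is managed by the host's init system:

```yaml
# playbooks/restart-service.yml (hypothetical contents)
- name: Restart a failed service
  hosts: "{{ target_host }}"      # extra_var passed by the rule
  become: true
  gather_facts: false
  tasks:
    - name: Restart the service named in the alert
      ansible.builtin.service:
        name: "{{ service_name }}"
        state: restarted
```

Note that a Prometheus `instance` label often has the form `host:port`, so it may need to be mapped to an inventory name before it can be used in `hosts`.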
Example 2: Security Incident Response
---
- name: Security incident response
  hosts: all
  sources:
    - ansible.eda.webhook:
        host: 0.0.0.0
        port: 5001
  rules:
    # Numeric fields must arrive as numbers; conditions do not accept
    # Jinja2 filters such as "| int" or "| float".
    - name: Block brute force attacker
      condition: >-
        event.payload.alert_type == "brute_force"
        and event.payload.failed_attempts > 10
      action:
        run_playbook:
          name: playbooks/block-ip.yml
          extra_vars:
            attacker_ip: "{{ event.payload.source_ip }}"
            target_host: "{{ event.payload.target_host }}"
            block_duration: "24h"
    - name: Isolate compromised host
      condition: >-
        event.payload.alert_type == "malware_detected"
        and event.payload.confidence > 0.9
      action:
        run_playbook:
          name: playbooks/isolate-host.yml
          extra_vars:
            compromised_host: "{{ event.payload.hostname }}"
            malware_hash: "{{ event.payload.file_hash }}"
            alert_id: "{{ event.payload.alert_id }}"
    - name: Rotate credentials on exposure
      condition: event.payload.alert_type == "credential_exposure"
      action:
        run_playbook:
          name: playbooks/rotate-credentials.yml
          extra_vars:
            exposed_service: "{{ event.payload.service }}"
            exposure_source: "{{ event.payload.source }}"
Example 3: Cloud Auto-Scaling and Cost Management
---
- name: Cloud infrastructure automation
  hosts: all
  sources:
    - ansible.eda.webhook:
        host: 0.0.0.0
        port: 5002
  rules:
    # Numeric fields must arrive as numbers; conditions do not accept
    # Jinja2 filters such as "| int".
    - name: Auto-scale on queue depth
      condition: >-
        event.payload.metric == "sqs_queue_depth"
        and event.payload.value > 1000
      throttle:
        once_within: 10 minutes
        group_by:
          - event.payload.queue
      action:
        run_playbook:
          name: playbooks/scale-workers.yml
          extra_vars:
            queue_name: "{{ event.payload.queue }}"
            current_depth: "{{ event.payload.value }}"
    - name: Terminate idle instances
      condition: >-
        event.payload.metric == "instance_idle"
        and event.payload.idle_minutes > 60
        and event.payload.environment != "production"
      action:
        run_playbook:
          name: playbooks/terminate-instance.yml
          extra_vars:
            instance_id: "{{ event.payload.instance_id }}"
            region: "{{ event.payload.region }}"
    - name: Right-size over-provisioned instances
      condition: >-
        event.payload.alert_type == "rightsizing_recommendation"
        and event.payload.savings_percent > 30
      action:
        run_playbook:
          name: playbooks/rightsize-instance.yml
          extra_vars:
            instance_id: "{{ event.payload.instance_id }}"
            recommended_type: "{{ event.payload.recommended_instance_type }}"
Example 4: CI/CD Pipeline Triggers
---
- name: CI/CD event-driven deployment
  hosts: all
  sources:
    - ansible.eda.webhook:
        host: 0.0.0.0
        port: 5003
  rules:
    - name: Deploy on successful build
      condition: >-
        event.payload.event == "build_complete"
        and event.payload.status == "success"
        and event.payload.branch == "main"
      action:
        run_playbook:
          name: playbooks/deploy-application.yml
          extra_vars:
            app_version: "{{ event.payload.version }}"
            artifact_url: "{{ event.payload.artifact_url }}"
            commit_sha: "{{ event.payload.commit }}"
    - name: Rollback on failed deployment
      # consecutive_failures must arrive as a number; "| int" is not
      # supported in conditions.
      condition: >-
        event.payload.event == "health_check_failed"
        and event.payload.consecutive_failures >= 3
      action:
        run_playbook:
          name: playbooks/rollback-deployment.yml
          extra_vars:
            app_name: "{{ event.payload.application }}"
            failed_version: "{{ event.payload.version }}"
Integration with AAP Controller
EDA integrates natively with Ansible Automation Platform (AAP) Controller:
rules:
  - name: Trigger AAP job template
    condition: event.payload.alert == "critical"
    action:
      run_job_template:
        name: "Remediate Critical Alert"
        organization: "Operations"
        job_args:
          extra_vars:
            alert_data: "{{ event.payload }}"
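When remediation spans several job templates, the same pattern applies to workflow templates through the `run_workflow_template` action. A sketch, assuming a workflow with the hypothetical name below exists in the Operations organization:

```yaml
rules:
  - name: Trigger AAP workflow template
    condition: event.payload.alert == "critical"
    action:
      run_workflow_template:
        name: "Critical Alert Workflow"   # hypothetical workflow name
        organization: "Operations"
        job_args:
          extra_vars:
            alert_data: "{{ event.payload }}"
```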
EDA Controller Setup in AAP
1. Navigate to Automation Decisions → Rulebook Activations.
2. Create a new activation with your rulebook.
3. Select the appropriate Decision Environment (the execution environment for EDA).
4. Configure credentials for event sources.
5. Enable the activation.
Conditions Reference
Comparison Operators
# Equality
condition: event.payload.status == "critical"
# Numeric comparison (the value must arrive as a number; Jinja2 filters such as "| int" are not supported)
condition: event.payload.value > 100
# String contains (regex search anywhere in the string)
condition: event.payload.message is search("error")
# Regex match (anchored at the start of the string)
condition: event.payload.host is match("web-.*")
# List membership
condition: event.payload.severity in ["critical", "high"]
# Multiple conditions (AND)
condition: >-
  event.payload.status == "firing"
  and event.payload.severity == "critical"
  and event.payload.environment == "production"
# OR conditions (use any)
condition:
  any:
    - event.payload.alert == "disk_full"
    - event.payload.alert == "disk_readonly"
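A few further condition forms round out the reference. As with the list above, these are independent snippets rather than one YAML document, and the field names are illustrative.

```yaml
# Field presence
condition: event.payload.labels is defined
# Inequality
condition: event.payload.environment != "production"
# Value not in a list
condition: event.payload.severity not in ["info", "debug"]
# Case-insensitive regex search
condition: event.payload.message is search("timeout", ignorecase=true)
# Multiple events: every entry must be matched, possibly by different events
condition:
  all:
    - event.payload.alert == "disk_full"
    - event.payload.alert == "backup_failed"
```

The `all` form differs from chaining with `and`: `and` tests several fields of one event, while `all` waits until each listed condition has been satisfied by some event.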
Throttling and Deduplication
Prevent alert storms from triggering excessive automation:
rules:
  - name: Throttled remediation
    condition: event.payload.alert == "service_down"
    throttle:
      once_within: 5 minutes
      group_by:
        - event.payload.host
        - event.payload.service
    action:
      run_playbook:
        name: playbooks/remediate.yml
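`once_within` acts on the first matching event and suppresses the rest of the window. Its counterpart, `once_after`, waits for the window to elapse and then fires once, which suits cases where you would rather let a burst settle before acting. A sketch with an arbitrary two-minute window:

```yaml
rules:
  - name: Batched remediation
    condition: event.payload.alert == "service_down"
    throttle:
      once_after: 2 minutes
      group_by:
        - event.payload.host
    action:
      run_playbook:
        name: playbooks/remediate.yml
```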
Best Practices
1. Always Use Throttling
Without throttling, a flapping alert can trigger hundreds of remediation attempts. Always set once_within for production rulebooks.
2. Log All Events
rules:
  - name: Log all events for debugging
    condition: "true"
    action:
      debug:
        msg: "Received event: {{ event }}"
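An alternative for logging is the `print_event` action, which prints the matched event without a message template. The sketch below assumes webhook-shaped events and uses `event.payload is defined` as a catch-all condition:

```yaml
rules:
  - name: Print every webhook event
    condition: event.payload is defined
    action:
      print_event:
        pretty: true   # indent the output for readability
```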
3. Use Decision Environments
Package your EDA dependencies (Python packages, Java, collections) into a Decision Environment for reproducible execution.
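Decision Environments are typically built with ansible-builder from a definition file. The sketch below follows the version 3 definition format; the base image name, Python package name, and package manager path are assumptions modeled on Red Hat's minimal decision environment and will differ for community images.

```yaml
# de.yml (hypothetical decision environment definition)
version: 3

images:
  base_image:
    # Assumed base image; substitute the one your platform version provides
    name: 'registry.redhat.io/ansible-automation-platform-24/de-minimal-rhel8:latest'

dependencies:
  galaxy:
    collections:
      - ansible.eda
  python_interpreter:
    package_system: "python39"

options:
  package_manager_path: /usr/bin/microdnf
```

Any extra collections or Python packages your source plugins need would be added under `dependencies`.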
4. Test with Mock Events
# Send test event to webhook
curl -X POST http://localhost:5000/endpoint \
-H "Content-Type: application/json" \
-d '{"status": "firing", "alert": "service_down", "service": "nginx", "host": "web-01"}'
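Rules can also be exercised without any HTTP traffic by swapping the webhook for the `ansible.eda.generic` source, which replays events listed in the rulebook. In this sketch each list item becomes one event, and the nested `payload` key is added only to mimic the shape the webhook source produces; the action is a `debug` stand-in so nothing is restarted.

```yaml
- name: Replay canned events without a webhook
  hosts: all
  sources:
    - ansible.eda.generic:
        payload:
          # One list item per event; inner "payload" mimics the webhook shape
          - payload:
              status: firing
              alert: service_down
              service: nginx
              host: web-01
  rules:
    - name: Restart service on failure alert
      condition: event.payload.status == "firing" and event.payload.alert == "service_down"
      action:
        debug:
          msg: "Would restart {{ event.payload.service }} on {{ event.payload.host }}"
```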
5. Implement Escalation
rules:
  # occurrence must arrive as a number; "| int" is not supported in conditions.
  - name: Auto-remediate first occurrence
    condition: >-
      event.payload.alert == "service_down"
      and event.payload.occurrence <= 3
    action:
      run_playbook:
        name: playbooks/restart-service.yml
  - name: Escalate repeated failures
    condition: >-
      event.payload.alert == "service_down"
      and event.payload.occurrence > 3
    action:
      run_playbook:
        name: playbooks/escalate-to-oncall.yml
Frequently Asked Questions
What is Ansible Event-Driven Automation (EDA)?
EDA is a framework that listens for events from external sources (monitoring systems, cloud providers, webhooks) and automatically triggers Ansible playbooks or AAP job templates when specific conditions are met. It enables real-time, reactive automation instead of scheduled or manual execution.
What events can trigger Ansible EDA?
Any event that can be delivered via webhooks, message queues (Kafka, AMQP), APIs, or file changes. Common sources include Prometheus/Alertmanager, Dynatrace, PagerDuty, AWS EventBridge, ServiceNow, GitHub webhooks, and custom applications.
How is EDA different from running playbooks on a cron schedule?
Cron-based automation runs at fixed intervals regardless of need. EDA responds to events in real-time — when an alert fires, when a file changes, or when a webhook is received. This means faster response times and no wasted execution cycles.
Can EDA integrate with Ansible Automation Platform?
Yes. EDA is a core component of AAP 2.4+. The EDA Controller provides a web UI for managing rulebook activations, viewing event logs, and integrating with AAP Controller job templates. EDA can trigger any job template in your AAP environment.
How do I prevent alert storms from overwhelming EDA?
Use the throttle directive with once_within so that a rule fires at most once per time window for each group of events. The group_by list defines those groups, for example by host and service, so repeated alerts for the same host and service are deduplicated.
Conclusion
Event-Driven Automation transforms Ansible from a tool you run into a system that acts autonomously. By connecting event sources to rulebooks, you can auto-remediate service failures in seconds, respond to security incidents in real-time, and optimize cloud costs automatically. Start with simple webhook-based rules, add throttling for safety, and gradually expand to cover your most common operational scenarios. EDA is the bridge between monitoring and action.