Ansible Cost Optimization: Reduce Cloud Spend with Automation (Complete Guide)
By Luca Berton · Published 2024-01-01 · Category: containers-kubernetes
Complete guide to reducing cloud costs with Ansible automation. Schedule instance start/stop, right-size VMs, clean up unused resources, enforce tagging.
Cloud bills grow when no one's watching. Dev instances run 24/7, unused EBS volumes pile up, oversized VMs waste money daily. Ansible automates the boring cost control work — schedule on/off, clean up orphans, enforce tags, right-size resources. Here's how to cut 20-40% off your cloud spend.
Quick Wins: Scheduling and Cleanup
Schedule Dev/Test Instances (Save 65%)
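Dev environments don't need to run nights and weekends. On an 8 AM to 7 PM weekday schedule an instance runs 55 of the 168 hours in a week, so stopping it the rest of the time cuts its compute charges by about two-thirds (~65%). Attached EBS volumes continue to bill while the instance is stopped.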
---
# stop-dev.yml
- name: Stop dev instances outside business hours
  hosts: localhost
  connection: local
  vars:
    region: us-east-1
  tasks:
    - name: Find running dev instances
      amazon.aws.ec2_instance_info:
        region: "{{ region }}"
        filters:
          "tag:Environment": "development"
          "instance-state-name": "running"
      register: dev_instances

    - name: Stop dev instances
      amazon.aws.ec2_instance:
        instance_ids: "{{ dev_instances.instances | map(attribute='instance_id') | list }}"
        region: "{{ region }}"
        state: stopped
      when: dev_instances.instances | length > 0

    - name: Report savings
      ansible.builtin.debug:
        msg: "Stopped {{ dev_instances.instances | length }} dev instances"
---
# start-dev.yml
- name: Start dev instances at business hours
  hosts: localhost
  connection: local
  vars:
    region: us-east-1  # was undefined in this play; same default as stop-dev.yml
  tasks:
    # Only instances that opted in with AutoStart=true are restarted
    - name: Find stopped dev instances
      amazon.aws.ec2_instance_info:
        region: "{{ region }}"
        filters:
          "tag:Environment": "development"
          "tag:AutoStart": "true"
          "instance-state-name": "stopped"
      register: stopped_instances

    - name: Start dev instances
      amazon.aws.ec2_instance:
        instance_ids: "{{ stopped_instances.instances | map(attribute='instance_id') | list }}"
        region: "{{ region }}"
        state: running
      when: stopped_instances.instances | length > 0
Schedule with cron or AAP:
# Stop at 7 PM, start at 8 AM (weekdays only)
0 19 * * 1-5 ansible-playbook stop-dev.yml
0 8 * * 1-5 ansible-playbook start-dev.yml
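The two crontab entries above could also be installed by Ansible itself with the ansible.builtin.cron module. The following is a minimal sketch, not the author's setup: the `schedule_playbook_path` variable and its `/opt/ansible` value are hypothetical, and it assumes `ansible-playbook` is on the PATH that cron uses on the control node.

```yaml
---
- name: Install the dev start/stop schedule (sketch)
  hosts: localhost
  connection: local
  vars:
    # Assumption: hypothetical directory holding stop-dev.yml and start-dev.yml
    schedule_playbook_path: /opt/ansible
  tasks:
    - name: Stop dev instances at 7 PM on weekdays
      ansible.builtin.cron:
        name: stop-dev-instances
        minute: "0"
        hour: "19"
        weekday: "1-5"
        job: "ansible-playbook {{ schedule_playbook_path }}/stop-dev.yml"

    - name: Start dev instances at 8 AM on weekdays
      ansible.builtin.cron:
        name: start-dev-instances
        minute: "0"
        hour: "8"
        weekday: "1-5"
        job: "ansible-playbook {{ schedule_playbook_path }}/start-dev.yml"
```

Because each entry has a `name`, rerunning the play updates the existing crontab line instead of adding a duplicate.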
Clean Up Orphaned Resources
---
- name: Clean up orphaned AWS resources
  hosts: localhost
  connection: local
  vars:
    region: us-east-1
    dry_run: true  # Set to false to actually delete
    # ISO 8601 cutoffs; AWS timestamps are ISO strings, so they compare lexically
    volume_cutoff: "{{ '%Y-%m-%dT%H:%M:%S' | strftime((ansible_date_time.epoch | int) - 2592000) }}"    # 30 days
    snapshot_cutoff: "{{ '%Y-%m-%dT%H:%M:%S' | strftime((ansible_date_time.epoch | int) - 7776000) }}"  # 90 days
    # aws_account_id must be supplied, for example as an extra var
  tasks:
    # Unattached EBS volumes
    - name: Find unattached EBS volumes
      amazon.aws.ec2_vol_info:
        region: "{{ region }}"
        filters:
          status: available
      register: unattached_volumes

    - name: Calculate wasted EBS cost
      ansible.builtin.set_fact:
        ebs_waste_gb: "{{ unattached_volumes.volumes | map(attribute='size') | sum }}"

    - name: Report unattached volumes
      ansible.builtin.debug:
        msg: "Found {{ unattached_volumes.volumes | length }} unattached volumes ({{ ebs_waste_gb }} GB), ~${{ ((ebs_waste_gb | int) * 0.10) | round(2) }}/month"

    - name: Delete unattached volumes (older than 30 days)
      amazon.aws.ec2_vol:
        id: "{{ item.id }}"
        region: "{{ region }}"
        state: absent
      loop: "{{ unattached_volumes.volumes }}"
      when:
        - not (dry_run | bool)  # bool filter so -e dry_run=false works as expected
        - item.create_time < volume_cutoff

    # Unused Elastic IPs
    - name: Find unused Elastic IPs
      amazon.aws.ec2_eip_info:
        region: "{{ region }}"
      register: all_eips

    - name: Identify unassociated EIPs
      ansible.builtin.set_fact:
        unused_eips: "{{ all_eips.addresses | selectattr('association_id', 'undefined') | list }}"

    - name: Report unused EIPs
      ansible.builtin.debug:
        msg: "Found {{ unused_eips | length }} unused EIPs, ${{ ((unused_eips | length) * 3.65) | round(2) }}/month"

    - name: Release unused EIPs
      amazon.aws.ec2_eip:
        public_ip: "{{ item.public_ip }}"
        region: "{{ region }}"
        state: absent
      loop: "{{ unused_eips }}"
      when: not (dry_run | bool)

    # Old snapshots
    - name: Find old snapshots (>90 days)
      amazon.aws.ec2_snapshot_info:
        region: "{{ region }}"
        filters:
          "owner-id": "{{ aws_account_id }}"
      register: all_snapshots

    - name: Identify old snapshots
      ansible.builtin.set_fact:
        old_snapshots: >-
          {{ all_snapshots.snapshots
             | selectattr('start_time', 'lt', snapshot_cutoff)
             | list }}

    - name: Report old snapshots
      ansible.builtin.debug:
        msg: "Found {{ old_snapshots | length }} snapshots older than 90 days"

    # Summary
    - name: Cost savings summary
      ansible.builtin.debug:
        msg: |
          === Cost Savings Opportunity ===
          Unattached EBS: {{ unattached_volumes.volumes | length }} volumes ({{ ebs_waste_gb }} GB)
          Unused EIPs: {{ unused_eips | length }}
          Old snapshots: {{ old_snapshots | length }}
          Estimated monthly savings: ${{ (((ebs_waste_gb | int) * 0.10) + ((unused_eips | length) * 3.65)) | round(2) }}
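The playbook above only reports old snapshots. If deleting them is desired, a task along the following lines could follow the "Report old snapshots" task. This is a sketch under the same `dry_run` guard as the volume and EIP tasks, not part of the original playbook. Snapshots that back a registered AMI cannot be deleted until the AMI is deregistered, so the task will fail on those.

```yaml
    # Sketch: reuses old_snapshots, region and dry_run from the play above
    - name: Delete old snapshots
      amazon.aws.ec2_snapshot:
        snapshot_id: "{{ item.snapshot_id }}"
        region: "{{ region }}"
        state: absent
      loop: "{{ old_snapshots }}"
      when: not (dry_run | bool)
```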
Right-Sizing
Identify Oversized Instances
---
- name: Right-sizing analysis
  hosts: localhost
  connection: local
  vars:
    region: us-east-1
    # instance_ids (a list of instance IDs) must be supplied, for example as an extra var
  tasks:
    # Uses the AWS CLI (as the cost report play does) and parses its JSON output
    - name: Get CloudWatch CPU metrics (last 14 days)
      ansible.builtin.command: >
        aws cloudwatch get-metric-statistics
        --namespace AWS/EC2
        --metric-name CPUUtilization
        --dimensions Name=InstanceId,Value={{ item }}
        --start-time {{ '%Y-%m-%dT%H:%M:%S' | strftime((ansible_date_time.epoch | int) - 1209600) }}
        --end-time {{ ansible_date_time.iso8601 }}
        --period 3600
        --statistics Average Maximum
        --region {{ region }}
        --output json
      register: cpu_metrics
      changed_when: false
      loop: "{{ instance_ids }}"

    - name: Identify underutilized instances
      vars:
        datapoints: "{{ (item.stdout | from_json).Datapoints }}"
      ansible.builtin.debug:
        msg: >
          Instance {{ item.item }}:
          Avg CPU: {{ ((datapoints | map(attribute='Average') | sum) / (datapoints | length)) | round(1) }}%,
          Max CPU: {{ datapoints | map(attribute='Maximum') | max | round(1) }}%
          → OVERSIZED (consider downsizing)
      loop: "{{ cpu_metrics.results }}"
      when:
        - datapoints | length > 0
        - (datapoints | map(attribute='Maximum') | max) < 30
Automated Right-Sizing
---
- name: Downsize underutilized instances
  hosts: localhost
  connection: local
  vars:
    region: us-east-1
    # instance_id and current_type must be supplied, for example as extra vars
    downsize_map:
      t3.xlarge: t3.large
      t3.large: t3.medium
      m5.xlarge: m5.large
      m5.large: t3.large
      r5.xlarge: r5.large
  tasks:
    - name: Stop instance for resize
      amazon.aws.ec2_instance:
        instance_ids: ["{{ instance_id }}"]
        region: "{{ region }}"
        state: stopped
      when: current_type in downsize_map

    # Assumption: the installed amazon.aws release can change instance_type
    # on a stopped instance; verify on a test instance first
    - name: Change instance type
      amazon.aws.ec2_instance:
        instance_ids: ["{{ instance_id }}"]
        region: "{{ region }}"
        instance_type: "{{ downsize_map[current_type] }}"
      when: current_type in downsize_map

    - name: Start instance
      amazon.aws.ec2_instance:
        instance_ids: ["{{ instance_id }}"]
        region: "{{ region }}"
        state: running
      when: current_type in downsize_map
Tag Enforcement
Audit Missing Tags
---
- name: Enforce tagging policy
  hosts: localhost
  connection: local
  vars:
    required_tags:
      - Environment
      - Owner
      - Project
      - CostCenter
    region: us-east-1
  tasks:
    - name: Get all EC2 instances
      amazon.aws.ec2_instance_info:
        region: "{{ region }}"
      register: all_instances

    # Note: depending on the amazon.aws version, an untagged instance may be
    # returned with an empty tags dict instead of no tags key; it is then
    # counted under missing_tags below rather than here
    - name: Find untagged instances
      ansible.builtin.set_fact:
        untagged: >-
          {{ all_instances.instances
             | rejectattr('state.name', 'equalto', 'terminated')
             | selectattr('tags', 'undefined')
             | list }}

    - name: Find instances missing required tags
      ansible.builtin.set_fact:
        missing_tags: []

    - name: Check each instance for required tags
      ansible.builtin.set_fact:
        missing_tags: >-
          {{ missing_tags + [{
               'id': item.instance_id,
               'name': item.tags.Name | default('unnamed'),
               'missing': required_tags | difference(item.tags.keys() | list)
             }] }}
      loop: "{{ all_instances.instances }}"
      when:
        - item.state.name != 'terminated'
        - item.tags is defined
        - required_tags | difference(item.tags.keys() | list) | length > 0

    - name: Report compliance
      ansible.builtin.debug:
        msg: |
          === Tagging Compliance Report ===
          Total instances: {{ all_instances.instances | rejectattr('state.name', 'equalto', 'terminated') | list | length }}
          Fully tagged: {{ (all_instances.instances | rejectattr('state.name', 'equalto', 'terminated') | list | length) - (missing_tags | length) - (untagged | length) }}
          Missing tags: {{ missing_tags | length }}
          No tags at all: {{ untagged | length }}

    - name: Auto-tag with defaults
      amazon.aws.ec2_tag:
        resource: "{{ item.instance_id }}"  # untagged holds instance records, which expose instance_id
        region: "{{ region }}"
        tags:
          Owner: "unknown"
          Environment: "unknown"
          CostCenter: "unallocated"
        state: present
      loop: "{{ untagged }}"
      when: auto_tag | default(false) | bool
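The play above reports compliance but never fails, so a scheduled run would pass even with violations. One way to turn the report into a gate is an assert task appended to the same task list. This is a sketch: the `enforce_tags` switch is a hypothetical variable, not something defined in the original playbook.

```yaml
    # Sketch: reuses missing_tags and untagged from the play above
    - name: Fail when any instance violates the tagging policy
      ansible.builtin.assert:
        that:
          - missing_tags | length == 0
          - untagged | length == 0
        fail_msg: "{{ (missing_tags | length) + (untagged | length) }} instances violate the tagging policy"
        success_msg: "All instances carry the required tags"
      when: enforce_tags | default(false) | bool  # hypothetical opt-in switch
```

Keeping the gate opt-in lets the same playbook serve both as a daily report and as a blocking check in a pipeline.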
Multi-Cloud Cost Report
---
- name: Generate multi-cloud cost report
  hosts: localhost
  connection: local
  # start_date and end_date (YYYY-MM-DD) must be supplied, for example as extra vars
  tasks:
    # AWS
    - name: Get AWS cost (last 30 days)
      ansible.builtin.command: >
        aws ce get-cost-and-usage
        --time-period Start={{ start_date }},End={{ end_date }}
        --granularity MONTHLY
        --metrics BlendedCost
        --group-by Type=TAG,Key=Environment
      register: aws_costs
      changed_when: false  # read-only query

    # Azure
    - name: Get Azure resource groups
      azure.azcollection.azure_rm_resourcegroup_info:
      register: azure_rgs

    - name: Count Azure resources by group
      ansible.builtin.debug:
        msg: "{{ item.name }}: {{ item.tags | default({}) }}"
      loop: "{{ azure_rgs.resourcegroups }}"

    # Summary
    - name: Generate cost report
      ansible.builtin.template:
        src: cost-report.md.j2
        dest: "/reports/cost-report-{{ ansible_date_time.date }}.md"
      delegate_to: localhost
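The play above uses `start_date` and `end_date` without defining them, and it registers `aws_costs` without reading it. The sketch below shows one way to fill both gaps. It assumes the AWS CLI returns its default JSON output, in which each group carries a `Keys` list (for tag grouping, values such as `Environment$development`) and a `Metrics.BlendedCost` entry. Cost Explorer treats the end date as exclusive.

```yaml
    # Sketch: place before the "Get AWS cost" task
    - name: Compute the reporting window (last 30 days)
      ansible.builtin.set_fact:
        start_date: "{{ '%Y-%m-%d' | strftime((ansible_date_time.epoch | int) - 2592000) }}"
        end_date: "{{ ansible_date_time.date }}"

    # Sketch: place after the "Get AWS cost" task
    - name: Show AWS cost per Environment tag value
      ansible.builtin.debug:
        msg: "{{ item.Keys | first }}: {{ item.Metrics.BlendedCost.Amount | float | round(2) }} {{ item.Metrics.BlendedCost.Unit }}"
      loop: "{{ (aws_costs.stdout | from_json).ResultsByTime | map(attribute='Groups') | flatten }}"
```

With MONTHLY granularity a 30-day window usually spans two calendar months, so the same tag value can appear twice in the loop, once per month.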
Automated Savings Policies
Stop Idle Resources
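---
# Reports idle hosts only; stopping them is left as a separate, deliberate step
- name: Flag instances with no logins for 7 days
  hosts: all
  become: true
  gather_facts: true
  vars:
    # ISO 8601 timestamp 7 days ago; the string comparison below is accurate
    # to within a timezone offset, which is small against a 7-day window
    idle_cutoff: "{{ '%Y-%m-%dT%H:%M:%S' | strftime((ansible_date_time.epoch | int) - 604800) }}"
  tasks:
    - name: Check last login
      ansible.builtin.command: last -1 --time-format iso
      register: last_login
      changed_when: false

    # The first output line holds the most recent login; it is empty when there is none
    - name: Extract the timestamp of the most recent login
      ansible.builtin.set_fact:
        last_login_time: >-
          {{ (last_login.stdout_lines | first | default(''))
             | regex_search('[0-9]{4}-[0-9]{2}-[0-9]{2}T[0-9]{2}:[0-9]{2}:[0-9]{2}')
             | default('', true) }}

    - name: Flag idle instances
      ansible.builtin.set_fact:
        is_idle: true
      when: last_login_time == '' or last_login_time < idle_cutoff

    - name: Report idle instances
      ansible.builtin.debug:
        msg: "{{ inventory_hostname }} is idle (no login in 7+ days)"
      when: is_idle | default(false)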
Delete Old AMIs/Images
---
- name: Clean up old AMIs
  hosts: localhost
  connection: local
  vars:
    region: us-east-1
    dry_run: true  # Set to false to actually deregister
    # ISO 8601 timestamp 90 days ago; creation_date is an ISO string, so it compares lexically
    ami_cutoff: "{{ '%Y-%m-%dT%H:%M:%S' | strftime((ansible_date_time.epoch | int) - 7776000) }}"
  tasks:
    - name: Find AMIs owned by this account
      amazon.aws.ec2_ami_info:
        owners: self
        region: "{{ region }}"
      register: all_amis

    - name: Deregister AMIs older than 90 days
      amazon.aws.ec2_ami:
        image_id: "{{ item.image_id }}"
        region: "{{ region }}"
        state: absent
        delete_snapshot: true
      loop: "{{ all_amis.images }}"
      when:
        - not (dry_run | bool)
        - item.tags.Permanent is not defined  # tag an AMI Permanent to keep it
        - item.creation_date < ami_cutoff
FAQ
How much can Ansible automation save on cloud costs?
Typical savings: 20-40% through scheduling (65% on dev instances), cleanup (5-10% from orphaned resources), and right-sizing (10-20% from oversized instances). The biggest single win is usually dev/test scheduling.
Should I use Ansible or a dedicated FinOps tool?
Use both. FinOps tools (CloudHealth, Spot.io, Kubecost) provide visibility and recommendations. Ansible executes the actual changes — stopping instances, resizing, deleting resources. Ansible is the action layer; FinOps tools are the intelligence layer.
How do I prevent teams from bypassing cost controls?
Combine Ansible enforcement with AWS SCPs (Service Control Policies), Azure Policy, or GCP Organization Policies. Ansible handles the automation; cloud-native policies prevent workarounds.
What about Reserved Instances and Savings Plans?
Ansible can generate utilization reports to identify RI/SP candidates. The actual purchase should be a human decision, but Ansible automates the data collection and analysis that informs it.
How often should I run cost optimization playbooks?
• Scheduling (start/stop): Daily via cron
• Cleanup (orphaned resources): Weekly
• Right-sizing analysis: Monthly
• Tag compliance: Daily
• Cost reports: Weekly
Conclusion
Cloud cost optimization isn't a one-time project — it's a continuous process. Ansible automates the repetitive parts: scheduling dev environments, cleaning up orphaned resources, enforcing tagging policies, and generating cost reports. Start with instance scheduling (biggest immediate savings), add cleanup automation, then build toward continuous right-sizing. The playbooks pay for themselves in the first month.