Ansible Cost Optimization: A Complete Guide to Reducing Cloud Spend with Automation

By Luca Berton · Published 2024-01-01 · Category: containers-kubernetes

Complete guide to reducing cloud costs with Ansible automation. Schedule instance start/stop, right-size VMs, clean up unused resources, enforce tagging.

Cloud bills grow when no one's watching: dev instances run 24/7, unused EBS volumes pile up, and oversized VMs waste money every day. Ansible automates the routine cost-control work: scheduling instances on and off, cleaning up orphans, enforcing tags, and right-sizing resources. Here's how to cut a typical 20-40% off your cloud spend.

Quick Wins: Scheduling and Cleanup

Schedule Dev/Test Instances (Save 65%)

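Dev environments don't need to run nights and weekends. With the weekday 8 AM to 7 PM schedule shown below, instances run 55 of 168 hours a week, which cuts their compute charges by roughly 65%. Attached EBS volumes are still billed while an instance is stopped.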

---
- name: Stop dev instances outside business hours
  hosts: localhost
  connection: local
  vars:
    region: us-east-1
  tasks:
    - name: Find running dev instances
      amazon.aws.ec2_instance_info:
        region: "{{ region }}"
        filters:
          "tag:Environment": "development"
          "instance-state-name": "running"
      register: dev_instances

    - name: Stop dev instances
      amazon.aws.ec2_instance:
        instance_ids: "{{ dev_instances.instances | map(attribute='instance_id') | list }}"
        region: "{{ region }}"
        state: stopped
      when: dev_instances.instances | length > 0

    - name: Report savings
      ansible.builtin.debug:
        msg: "Stopped {{ dev_instances.instances | length }} dev instances"

---
- name: Start dev instances at business hours
  hosts: localhost
  connection: local
  vars:
    region: us-east-1
  tasks:
    - name: Find stopped dev instances
      amazon.aws.ec2_instance_info:
        region: "{{ region }}"
        filters:
          "tag:Environment": "development"
          "tag:AutoStart": "true"
          "instance-state-name": "stopped"
      register: stopped_instances

    - name: Start dev instances
      amazon.aws.ec2_instance:
        instance_ids: "{{ stopped_instances.instances | map(attribute='instance_id') | list }}"
        region: "{{ region }}"
        state: running
      when: stopped_instances.instances | length > 0

Schedule with cron or AAP:

# Stop at 7 PM, start at 8 AM (weekdays only)
0 19 * * 1-5 ansible-playbook stop-dev.yml
0 8  * * 1-5 ansible-playbook start-dev.yml
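
The savings estimate follows from simple arithmetic on this schedule. As a rough sketch, the playbook below computes it for an 11-hour weekday window. The variable names are made up for illustration, and it counts instance hours only, so storage for stopped instances is not included.

```yaml
---
# Illustrative arithmetic only: no cloud API calls are made.
- name: Estimate compute hours saved by a weekday schedule
  hosts: localhost
  connection: local
  gather_facts: false
  vars:
    hours_on_per_day: 11    # 8 AM to 7 PM, matching the cron entries
    days_on_per_week: 5     # Monday to Friday
  tasks:
    - name: Show running hours and percentage saved
      vars:
        hours_on: "{{ hours_on_per_day * days_on_per_week }}"
      ansible.builtin.debug:
        msg: >-
          Running {{ hours_on }} of 168 hours per week saves
          {{ (100 * (1 - (hours_on | int) / 168)) | round(1) }}% of instance hours
```

With these values the instances run 55 of 168 hours, which is about 67% fewer instance hours and in line with the ~65% estimate. A longer working day or weekend use lowers the figure accordingly.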

Clean Up Orphaned Resources

---
- name: Clean up orphaned AWS resources
  hosts: localhost
  connection: local
  vars:
    region: us-east-1
    dry_run: true    # Set to false to actually delete
  tasks:
    # Unattached EBS volumes
    - name: Find unattached EBS volumes
      amazon.aws.ec2_vol_info:
        region: "{{ region }}"
        filters:
          status: available
      register: unattached_volumes

    - name: Calculate wasted EBS cost
      ansible.builtin.set_fact:
        ebs_waste_gb: "{{ unattached_volumes.volumes | map(attribute='size') | sum }}"

    - name: Report unattached volumes
      ansible.builtin.debug:
        msg: "Found {{ unattached_volumes.volumes | length }} unattached volumes ({{ ebs_waste_gb }} GB) — ~${{ (ebs_waste_gb | int * 0.10) | round(2) }}/month"

    - name: Delete unattached volumes (older than 30 days)
      amazon.aws.ec2_vol:
        id: "{{ item.id }}"
        region: "{{ region }}"
        state: absent
      loop: "{{ unattached_volumes.volumes }}"
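      when:
        - not (dry_run | bool)
        # create_time is an ISO 8601 UTC string, so it compares lexicographically
        # against an ISO cutoff (strftime's utc option needs ansible-core 2.14+)
        - item.create_time < ('%Y-%m-%dT%H:%M:%S' | strftime((ansible_date_time.epoch | int) - 2592000, utc=true))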

    # Unused Elastic IPs
    - name: Find unused Elastic IPs
      amazon.aws.ec2_eip_info:
        region: "{{ region }}"
      register: all_eips

    - name: Identify unassociated EIPs
      ansible.builtin.set_fact:
        unused_eips: "{{ all_eips.addresses | selectattr('association_id', 'undefined') | list }}"

    - name: Report unused EIPs
      ansible.builtin.debug:
        msg: "Found {{ unused_eips | length }} unused EIPs — ${{ (unused_eips | length * 3.65) | round(2) }}/month"

    - name: Release unused EIPs
      amazon.aws.ec2_eip:
        public_ip: "{{ item.public_ip }}"
        region: "{{ region }}"
        state: absent
      loop: "{{ unused_eips }}"
      when: not (dry_run | bool)

    # Old snapshots
    - name: Find snapshots owned by this account
      amazon.aws.ec2_snapshot_info:
        region: "{{ region }}"
        owner_ids:
          - self
      register: all_snapshots

    - name: Identify old snapshots (>90 days)
      ansible.builtin.set_fact:
        # start_time is an ISO 8601 UTC string, so compare it with an ISO cutoff
        # (strftime's utc option needs ansible-core 2.14+)
        old_snapshots: >-
          {{ all_snapshots.snapshots
             | selectattr('start_time', 'lt', ('%Y-%m-%dT%H:%M:%S' | strftime((ansible_date_time.epoch | int) - 7776000, utc=true)))
             | list }}

    - name: Report old snapshots
      ansible.builtin.debug:
        msg: "Found {{ old_snapshots | length }} snapshots older than 90 days"

    # Summary
    - name: Cost savings summary
      ansible.builtin.debug:
        msg: |
          === Cost Savings Opportunity ===
          Unattached EBS: {{ unattached_volumes.volumes | length }} volumes ({{ ebs_waste_gb }} GB)
          Unused EIPs: {{ unused_eips | length }}
          Old snapshots: {{ old_snapshots | length }}
          Estimated monthly savings: ${{ ((ebs_waste_gb | int * 0.10) + (unused_eips | length * 3.65)) | round(2) }}
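
The playbook reports old snapshots but does not delete them. If deletion is wanted, a task along these lines could be appended to the same task list. This is a sketch rather than part of the original playbook: it reuses `old_snapshots`, `region`, and `dry_run` from above, and `ignore_errors` is an assumption that lets the loop continue past snapshots that cannot be deleted, such as those backing a registered AMI.

```yaml
    # Sketch: append to the tasks of the cleanup playbook above
    - name: Delete old snapshots
      amazon.aws.ec2_snapshot:
        snapshot_id: "{{ item.snapshot_id }}"
        region: "{{ region }}"
        state: absent
      loop: "{{ old_snapshots }}"
      loop_control:
        label: "{{ item.snapshot_id }}"
      when: not (dry_run | bool)
      ignore_errors: true    # assumption: skip snapshots that are still in use
```

Keeping the same `dry_run` guard means a first run only reports, and deletion happens once the flag is set to false.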

Right-Sizing

Identify Oversized Instances

---
- name: Right-sizing analysis
  hosts: localhost
  connection: local
  vars:
    region: us-east-1
    instance_ids: []    # Instance IDs to analyze, e.g. passed with -e
  tasks:
    # Uses the AWS CLI, like the cost report below, to fetch hourly datapoints
    - name: Get CloudWatch CPU metrics (last 14 days)
      ansible.builtin.command: >-
        aws cloudwatch get-metric-statistics
        --namespace AWS/EC2
        --metric-name CPUUtilization
        --dimensions Name=InstanceId,Value={{ item }}
        --start-time {{ '%Y-%m-%dT%H:%M:%SZ' | strftime((ansible_date_time.epoch | int) - 1209600, utc=true) }}
        --end-time {{ ansible_date_time.iso8601 }}
        --period 3600
        --statistics Average Maximum
        --region {{ region }}
        --output json
      register: cpu_metrics
      changed_when: false
      loop: "{{ instance_ids }}"

    - name: Identify underutilized instances
      vars:
        datapoints: "{{ (item.stdout | from_json).Datapoints }}"
        averages: "{{ datapoints | map(attribute='Average') | list }}"
        peak: "{{ datapoints | map(attribute='Maximum') | max }}"
      ansible.builtin.debug:
        msg: >
          Instance {{ item.item }}:
          Avg CPU: {{ ((averages | sum) / (averages | length)) | round(1) }}%,
          Max CPU: {{ peak | float | round(1) }}%
          → OVERSIZED (consider downsizing)
      loop: "{{ cpu_metrics.results }}"
      loop_control:
        label: "{{ item.item }}"
      when:
        - datapoints | length > 0
        # Flag only instances whose peak CPU stayed below 30%
        - peak | float < 30

Automated Right-Sizing

- name: Downsize underutilized instances
  hosts: localhost
  connection: local
  vars:
    region: us-east-1
    # instance_id and current_type are expected as extra vars (-e)
    downsize_map:
      t3.xlarge: t3.large
      t3.large: t3.medium
      m5.xlarge: m5.large
      m5.large: t3.large    # same vCPU and memory, cheaper burstable family
      r5.xlarge: r5.large
  tasks:
    - name: Stop instance for resize
      amazon.aws.ec2_instance:
        instance_ids: ["{{ instance_id }}"]
        region: "{{ region }}"
        state: stopped
      when: current_type in downsize_map

    # Assumption: ec2_instance does not change the type of an existing
    # instance, so this step calls the AWS CLI instead
    - name: Change instance type
      ansible.builtin.command: >-
        aws ec2 modify-instance-attribute
        --instance-id {{ instance_id }}
        --instance-type Value={{ downsize_map[current_type] }}
        --region {{ region }}
      when: current_type in downsize_map

    - name: Start instance
      amazon.aws.ec2_instance:
        instance_ids: ["{{ instance_id }}"]
        region: "{{ region }}"
        state: running
      when: current_type in downsize_map

Tag Enforcement

Audit Missing Tags

---
- name: Enforce tagging policy
  hosts: localhost
  connection: local
  vars:
    required_tags:
      - Environment
      - Owner
      - Project
      - CostCenter
    region: us-east-1
  tasks:
    - name: Get all EC2 instances
      amazon.aws.ec2_instance_info:
        region: "{{ region }}"
      register: all_instances

    - name: Find untagged instances
      ansible.builtin.set_fact:
        # ec2_instance_info returns an empty tags dict for instances without
        # tags, so select on an empty value rather than an undefined one
        untagged: >-
          {{ all_instances.instances
             | rejectattr('state.name', 'equalto', 'terminated')
             | rejectattr('tags')
             | list }}

    - name: Initialize list of instances missing required tags
      ansible.builtin.set_fact:
        missing_tags: []

    - name: Check each instance for required tags
      ansible.builtin.set_fact:
        missing_tags: >-
          {{ missing_tags + [{
            'id': item.instance_id,
            'name': item.tags.Name | default('unnamed'),
            'missing': required_tags | difference(item.tags.keys() | list)
          }] }}
      loop: "{{ all_instances.instances }}"
      when:
        - item.state.name != 'terminated'
        # Instances with no tags at all are counted in "untagged" instead
        - item.tags | length > 0
        - required_tags | difference(item.tags.keys() | list) | length > 0

    - name: Report compliance
      ansible.builtin.debug:
        msg: |
          === Tagging Compliance Report ===
          Total instances: {{ all_instances.instances | rejectattr('state.name', 'equalto', 'terminated') | list | length }}
          Fully tagged: {{ all_instances.instances | rejectattr('state.name', 'equalto', 'terminated') | list | length - missing_tags | length - untagged | length }}
          Missing tags: {{ missing_tags | length }}
          No tags at all: {{ untagged | length }}

    - name: Auto-tag with defaults
      amazon.aws.ec2_tag:
        resource: "{{ item.instance_id }}"
        region: "{{ region }}"
        tags:
          Owner: "unknown"
          Environment: "unknown"
          CostCenter: "unallocated"
        state: present
      loop: "{{ untagged }}"
      when: auto_tag | default(false)

Multi-Cloud Cost Report

---
- name: Generate multi-cloud cost report
  hosts: localhost
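  connection: local
  vars:
    # Reporting window: the last 30 days (Cost Explorer treats End as exclusive)
    end_date: "{{ ansible_date_time.date }}"
    start_date: "{{ '%Y-%m-%d' | strftime((ansible_date_time.epoch | int) - 2592000) }}"
  tasks: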
    # AWS
    - name: Get AWS cost (last 30 days)
      ansible.builtin.command: >
        aws ce get-cost-and-usage
        --time-period Start={{ start_date }},End={{ end_date }}
        --granularity MONTHLY
        --metrics BlendedCost
        --group-by Type=TAG,Key=Environment
      register: aws_costs
      changed_when: false

    # Azure
    - name: Get Azure resource groups
      azure.azcollection.azure_rm_resourcegroup_info:
      register: azure_rgs

    - name: List Azure resource groups and their tags
      ansible.builtin.debug:
        msg: "{{ item.name }}: {{ item.tags | default({}) }}"
      loop: "{{ azure_rgs.resourcegroups }}"

    # Summary
    - name: Generate cost report
      ansible.builtin.template:
        src: cost-report.md.j2
        dest: "/reports/cost-report-{{ ansible_date_time.date }}.md"
      delegate_to: localhost

Automated Savings Policies

Stop Idle Resources

- name: Flag instances with no logins for 7 days
  hosts: all
  become: true
  gather_facts: true
  vars:
    # Matches timestamps such as 2024-01-01T10:00:00+00:00 printed by "last"
    iso_ts: '[0-9]{4}-[0-9]{2}-[0-9]{2}T[0-9:]{8}[+-][0-9:]{4,5}'
  tasks:
    - name: Check last login
      ansible.builtin.command: last -1 --time-format iso
      register: last_login
      changed_when: false

    # The first timestamp in the output is the most recent login time
    - name: Flag idle instances
      ansible.builtin.set_fact:
        is_idle: true
      when: >
        last_login.stdout is not search(iso_ts) or
        ((ansible_date_time.epoch | int)
        - (last_login.stdout | regex_search(iso_ts) | to_datetime('%Y-%m-%dT%H:%M:%S%z')).timestamp())
        > 604800

    - name: Report idle instances
      ansible.builtin.debug:
        msg: "{{ inventory_hostname }} is idle (no login in 7+ days)"
      when: is_idle | default(false)

Delete Old AMIs/Images

- name: Clean up old AMIs
  hosts: localhost
  connection: local
  vars:
    region: us-east-1
    dry_run: true    # Set to false to actually deregister
  tasks:
    - name: Find all self-owned AMIs
      amazon.aws.ec2_ami_info:
        owners: self
        region: "{{ region }}"
      register: all_amis

    - name: Deregister AMIs older than 90 days
      amazon.aws.ec2_ami:
        image_id: "{{ item.image_id }}"
        region: "{{ region }}"
        state: absent
        delete_snapshot: true
      loop: "{{ all_amis.images }}"
      when:
        - not (dry_run | bool)
        - item.tags.Permanent is not defined
        # creation_date is an ISO 8601 UTC string, so it compares lexicographically
        # against an ISO cutoff (strftime's utc option needs ansible-core 2.14+)
        - item.creation_date < ('%Y-%m-%dT%H:%M:%S' | strftime((ansible_date_time.epoch | int) - 7776000, utc=true))

FAQ

How much can Ansible automation save on cloud costs?

Typical savings: 20-40% through scheduling (65% on dev instances), cleanup (5-10% from orphaned resources), and right-sizing (10-20% from oversized instances). The biggest single win is usually dev/test scheduling.

Should I use Ansible or a dedicated FinOps tool?

Use both. FinOps tools (CloudHealth, Spot.io, Kubecost) provide visibility and recommendations. Ansible executes the actual changes — stopping instances, resizing, deleting resources. Ansible is the action layer; FinOps tools are the intelligence layer.

How do I prevent teams from bypassing cost controls?

Combine Ansible enforcement with AWS SCPs (Service Control Policies), Azure Policy, or GCP Organization Policies. Ansible handles the automation; cloud-native policies prevent workarounds.

What about Reserved Instances and Savings Plans?

Ansible can generate utilization reports to identify RI/SP candidates. The actual purchase should be a human decision, but Ansible automates the data collection and analysis that informs it.

How often should I run cost optimization playbooks?

Scheduling (start/stop): Daily via cron
Cleanup (orphaned resources): Weekly
Right-sizing analysis: Monthly
Tag compliance: Daily
Cost reports: Weekly
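
One way to put these cadences in place is to let Ansible manage the cron entries itself. The playbook below is a sketch under stated assumptions: only stop-dev.yml and start-dev.yml are named in this guide, while the other playbook file names, the run times, and the /opt/cost-playbooks path are hypothetical placeholders.

```yaml
---
# Sketch: install the suggested schedule on the control node with cron.
# File names other than stop-dev.yml and start-dev.yml are hypothetical.
- name: Install cost optimization schedule
  hosts: localhost
  connection: local
  gather_facts: false
  vars:
    playbooks_path: /opt/cost-playbooks    # hypothetical location
    schedule:
      # Daily on weekdays: stop and start dev instances
      - { name: stop-dev, playbook: stop-dev.yml, minute: "0", hour: "19", day: "*", weekday: "1-5" }
      - { name: start-dev, playbook: start-dev.yml, minute: "0", hour: "8", day: "*", weekday: "1-5" }
      # Daily: tag compliance
      - { name: tag-audit, playbook: tag-audit.yml, minute: "0", hour: "6", day: "*", weekday: "*" }
      # Weekly (Monday): cleanup and cost report
      - { name: cleanup, playbook: cleanup.yml, minute: "0", hour: "7", day: "*", weekday: "1" }
      - { name: cost-report, playbook: cost-report.yml, minute: "30", hour: "7", day: "*", weekday: "1" }
      # Monthly (first day): right-sizing analysis
      - { name: rightsizing, playbook: rightsizing.yml, minute: "0", hour: "7", day: "1", weekday: "*" }
  tasks:
    - name: Create one cron entry per playbook
      ansible.builtin.cron:
        name: "cost-{{ item.name }}"
        minute: "{{ item.minute }}"
        hour: "{{ item.hour }}"
        day: "{{ item.day }}"
        weekday: "{{ item.weekday }}"
        job: "cd {{ playbooks_path }} && ansible-playbook {{ item.playbook }}"
      loop: "{{ schedule }}"
```

Because each entry has a stable `name`, rerunning the playbook updates the existing cron lines instead of adding duplicates.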

Conclusion

Cloud cost optimization isn't a one-time project — it's a continuous process. Ansible automates the repetitive parts: scheduling dev environments, cleaning up orphaned resources, enforcing tagging policies, and generating cost reports. Start with instance scheduling (biggest immediate savings), add cleanup automation, then build toward continuous right-sizing. The playbooks pay for themselves in the first month.
