AAP 2.6 Troubleshooting Guide: Common Issues and Solutions

By Luca Berton · Published 2024-01-01 · Category: installation

Troubleshoot common AAP 2.6 issues: job failures, connectivity problems, performance bottlenecks, database issues, mesh errors, EE problems, and upgrade failures. Diagnostic commands and solutions for every component.

Troubleshooting Approach

When something goes wrong in AAP, follow this diagnostic hierarchy:
1. Check the job output: most issues are visible in stdout.
2. Check service health: are all components running?
3. Check logs: system logs reveal infrastructure issues.
4. Check capacity: is the platform overloaded?
5. Check connectivity: can components reach each other?
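The hierarchy above amounts to running checks in a fixed order and stopping at the first layer that looks unhealthy. A minimal Python sketch of that idea follows; the probe functions are hypothetical placeholders (simple lambdas here), not AAP APIs, and would need to be replaced with real checks for your deployment.

```python
from typing import Callable, List, Optional, Tuple

# Each check is (label, callable returning True when that layer looks healthy).
Check = Tuple[str, Callable[[], bool]]


def first_failing_layer(checks: List[Check]) -> Optional[str]:
    """Run checks in order and return the label of the first one that fails."""
    for label, probe in checks:
        try:
            healthy = probe()
        except Exception:
            # A probe that raises is treated as a failure of that layer.
            healthy = False
        if not healthy:
            return label
    return None


# Hypothetical probes: replace the lambdas with real checks.
checks = [
    ("job output", lambda: True),
    ("service health", lambda: True),
    ("logs", lambda: True),
    ("capacity", lambda: False),  # pretend the platform is overloaded
    ("connectivity", lambda: True),
]

print(first_failing_layer(checks))
```

Stopping at the first failing layer keeps attention on the cheapest explanation before moving to deeper infrastructure checks.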

Service Health Checks

Quick Health Check

Containerized Deployment

RPM Deployment

Operator Deployment (OpenShift)

Job Failures

"Error creating pod" / EE Image Pull Failure

Causes and fixes:
• EE image doesn't exist in registry → Push image to Hub
• Registry authentication failed → Update Container Registry credential
• Network issue → Check execution node can reach Hub
• Image tag wrong → Verify exact image name and tag
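To verify the exact image name and tag, it can help to split the reference configured in Controller into its parts and compare them with what was pushed to Hub. The following is a simplified Python sketch of that split, not the parser used by podman; the host and image names in the example are hypothetical.

```python
from typing import NamedTuple, Optional


class ImageRef(NamedTuple):
    registry: Optional[str]
    repository: str
    tag: Optional[str]
    digest: Optional[str]


def parse_image_ref(ref: str) -> ImageRef:
    """Split an image reference into registry, repository, tag and digest."""
    digest = None
    if "@" in ref:
        ref, digest = ref.split("@", 1)

    tag = None
    last_slash = ref.rfind("/")
    last_colon = ref.rfind(":")
    # A colon after the last slash is a tag; before it, it is a registry port.
    if last_colon > last_slash:
        ref, tag = ref[:last_colon], ref[last_colon + 1:]

    registry = None
    first, sep, rest = ref.partition("/")
    # Treat the first path component as a registry if it looks like a host.
    if sep and ("." in first or ":" in first or first == "localhost"):
        registry, ref = first, rest

    return ImageRef(registry, ref, tag, digest)


# Hypothetical reference as it might appear in an Execution Environment record.
print(parse_image_ref("hub.example.com:5000/ee-custom:1.0"))
```

A reference without a tag is resolved as `latest` by container tools, so a `tag` of `None` here is worth checking against the tags that actually exist in the registry.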

"Host key verification failed"

Fix: Add host_key_checking = False to the [defaults] section of the project's ansible.cfg, or set the environment variable ANSIBLE_HOST_KEY_CHECKING=False.

"Permission denied (publickey,password)"

Causes:
• Wrong credential assigned to job template
• SSH key doesn't match authorized_keys on target
• User doesn't exist on target host
• Password expired

"No hosts matched"

Causes:
• Inventory doesn't contain the host group
• Dynamic inventory sync failed
• Host limit pattern wrong
• Inventory not updated since last host change

Fix: Check inventory sync status, verify host patterns.
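One way to verify a host pattern is to evaluate it against the groups you expect the inventory to contain. The Python sketch below is a deliberately simplified model of Ansible's pattern rules (unions with `:` or `,`, intersections with `&`, exclusions with `!`, shell-style wildcards); it does not handle regex patterns, slices, or IPv6 literals, and the inventory shown is hypothetical.

```python
from fnmatch import fnmatch
from typing import Dict, List, Set


def match_limit(limit: str, groups: Dict[str, List[str]]) -> Set[str]:
    """Return the hosts a simplified limit pattern would select."""
    all_hosts = {h for members in groups.values() for h in members}

    def expand(term: str) -> Set[str]:
        if term == "all":
            return set(all_hosts)
        # A term can name hosts or groups; both accept shell-style wildcards.
        selected = {h for h in all_hosts if fnmatch(h, term)}
        for name, members in groups.items():
            if fnmatch(name, term):
                selected.update(members)
        return selected

    terms = [t.strip() for t in limit.replace(",", ":").split(":") if t.strip()]
    included: Set[str] = set()
    required: List[Set[str]] = []
    excluded: Set[str] = set()
    for term in terms:
        if term.startswith("!"):
            excluded |= expand(term[1:])
        elif term.startswith("&"):
            required.append(expand(term[1:]))
        else:
            included |= expand(term)

    # Unions first, then intersections, then exclusions.
    for req in required:
        included &= req
    return included - excluded


# Hypothetical inventory: group name -> member hosts.
inventory = {"web": ["web1", "web2"], "db": ["db1"], "staging": ["web2"]}
print(sorted(match_limit("web:!staging", inventory)))
```

An empty result for a pattern such as `webservers` against this inventory corresponds to the "No hosts matched" condition: the group name in the limit does not exist in the inventory.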

Timeout Errors

Causes:
• become_password not set or wrong
• Slow network to target host
• Target host under heavy load
• Become method incompatible

"Module not found"

Fix: Module is not in the Execution Environment. Either:
• Build a custom EE with the required collection
• Add the collection to the project's collections/requirements.yml
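Before rebuilding an EE, it is worth confirming which collection is actually missing. The Python sketch below derives the collection from a fully qualified module name and compares it with the text output of `ansible-galaxy collection list` run inside the EE. The sample output and module name are illustrative, and the parser assumes the usual two-column "name version" layout.

```python
from typing import Dict, Optional


def parse_collection_list(output: str) -> Dict[str, str]:
    """Parse `ansible-galaxy collection list` text into {collection: version}."""
    installed: Dict[str, str] = {}
    for line in output.splitlines():
        parts = line.split()
        # Data rows look like "namespace.name  1.2.3"; skip headers and rules.
        if len(parts) == 2 and "." in parts[0] and parts[1][0].isdigit():
            installed[parts[0]] = parts[1]
    return installed


def missing_collection(module_fqcn: str, installed: Dict[str, str]) -> Optional[str]:
    """Return the collection a module needs if it is not installed, else None."""
    parts = module_fqcn.split(".")
    if len(parts) < 3:
        return None  # short module name: the collection cannot be derived
    collection = ".".join(parts[:2])
    # ansible.builtin ships with ansible-core and is not listed as a collection.
    if collection == "ansible.builtin" or collection in installed:
        return None
    return collection


# Illustrative output captured from inside an Execution Environment.
sample = """
# /usr/share/ansible/collections/ansible_collections
Collection        Version
----------------- -------
ansible.posix     1.5.4
community.general 8.1.0
"""

installed = parse_collection_list(sample)
print(missing_collection("community.mysql.mysql_db", installed))
```

If a collection name is returned, that is the one to add to the EE definition or to the project's collections/requirements.yml.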

Database Issues

Connection Errors

Check:

Database Full / Out of Space

Fix: Configure cleanup jobs:
• Administration → Management Jobs → Cleanup Job Details: purge old job records
• Cleanup Activity Stream: purge old activity log entries
• Set retention to 90-180 days for production
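The same cleanup can also be triggered from the command line on a controller node. The Python sketch below only builds the command lines for a given retention period and does not run them. It assumes the `cleanup_jobs` and `cleanup_activitystream` subcommands of `awx-manage` and their `--days` option are available in your installation, which is worth confirming with `awx-manage help` first.

```python
import shlex
from typing import List


def cleanup_commands(retention_days: int) -> List[List[str]]:
    """Build awx-manage cleanup command lines for a retention period in days."""
    if retention_days < 1:
        raise ValueError("retention_days must be at least 1")
    days = str(retention_days)
    return [
        # Purges job records older than the retention period.
        ["awx-manage", "cleanup_jobs", "--days", days],
        # Purges activity stream entries older than the retention period.
        ["awx-manage", "cleanup_activitystream", "--days", days],
    ]


for argv in cleanup_commands(90):
    print(shlex.join(argv))
```

Building the commands as argument lists keeps them safe to pass to `subprocess.run` later without shell quoting concerns.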

Slow Queries

Automation Mesh Issues

Execution Node Unreachable

Diagnostic steps:
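One step that lends itself to scripting is a plain TCP reachability test from a control or hop node to the execution node's Receptor listener. A minimal Python sketch follows. The host name in the comment is hypothetical, and port 27199 is assumed to be the Receptor listener port; verify it against your installer inventory.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures.
        return False


# Example call (hypothetical host; 27199 is an assumed Receptor port):
# can_connect("exec1.example.com", 27199)
```

A `False` result points at firewalls, routing, or a stopped Receptor service rather than at Controller itself; a `True` result only proves the port is open, not that the mesh peering is healthy.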

Node Shows Zero Capacity

Common causes:
• Execution node out of memory
• Receptor not connected
• Node disabled in settings
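These causes can be mapped onto an instance record retrieved from the Controller API. The Python sketch below inspects such a record, already decoded from JSON. The field names (`enabled`, `node_state`, `errors`, `mem_capacity`) are assumptions about the instances endpoint and should be checked against the response your version returns; the sample record is made up.

```python
from typing import Any, Dict, List


def zero_capacity_reasons(instance: Dict[str, Any]) -> List[str]:
    """List likely reasons why an instance record reports zero capacity."""
    reasons: List[str] = []
    if not instance.get("enabled", True):
        reasons.append("node is disabled")
    state = instance.get("node_state")
    if state not in (None, "ready"):
        # Any state other than ready suggests Receptor is not connected.
        reasons.append(f"node state is {state!r}")
    if instance.get("errors"):
        reasons.append(f"reported error: {instance['errors']}")
    if instance.get("mem_capacity") == 0:
        reasons.append("memory-based capacity is 0 (node may be out of memory)")
    return reasons


# Made-up record using the assumed field names.
record = {
    "hostname": "exec1.example.com",
    "enabled": False,
    "node_state": "unavailable",
    "errors": "",
    "mem_capacity": 0,
}

for reason in zero_capacity_reasons(record):
    print(reason)
```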

Jobs Stuck in "Waiting"

Performance Issues

Slow Job Execution

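Diagnostic checklist:
• Check the forks setting: increase it for more parallelism
• Check execution node capacity: add more nodes if it is exhausted
• Check network latency to managed hosts
• Check whether gather_facts is needed (disable it if not)
• Check for slow modules: enable a profiling callback such as profile_tasks (from the ansible.posix collection) through callbacks_enabled in ansible.cfg for per-task timing. The timer callback reports only total playbook run time, and callback_whitelist is the older, deprecated name for callbacks_enabled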
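The effect of the forks setting can be estimated with simple arithmetic: with the default linear strategy, each task runs on at most `forks` hosts at a time, so a task needs roughly `ceil(hosts / forks)` rounds. The Python sketch below is a rough model only; it assumes every host takes the same time per task and ignores controller overhead and capacity limits.

```python
import math


def rounds_per_task(hosts: int, forks: int) -> int:
    """Number of sequential host batches needed to run one task everywhere."""
    if hosts < 0 or forks < 1:
        raise ValueError("hosts must be >= 0 and forks must be >= 1")
    return math.ceil(hosts / forks)


def estimated_task_seconds(hosts: int, forks: int, seconds_per_host: float) -> float:
    """Rough wall-clock estimate for one task under the linear strategy."""
    return rounds_per_task(hosts, forks) * seconds_per_host


# 500 hosts at 2 seconds per host, comparing two forks values.
for forks in (5, 50):
    print(forks, rounds_per_task(500, forks), estimated_task_seconds(500, forks, 2.0))
```

In this model, raising forks from 5 to 50 cuts the rounds for 500 hosts from 100 to 10. The gain only materializes if the execution nodes have enough capacity to run that many forks.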

Platform Gateway Slow

High Memory Usage

Upgrade Issues

Pre-Upgrade Checklist

"Migration failed" During Upgrade

Services Won't Start After Upgrade

Useful CLI Commands

awx-manage (Controller)

receptorctl (Mesh)

FAQ

How do I enable debug logging?

For Controller: Set AWX_TASK_LOG_LEVEL=DEBUG in settings. For Gateway: Set GATEWAY_LOG_LEVEL=DEBUG. For Receptor: Set --log-level debug in receptor configuration. Remember to revert after debugging — debug logging is verbose and impacts performance.

How do I recover from a corrupted database?

Restore from the most recent backup. If no backup exists, check for PostgreSQL WAL files that might allow point-in-time recovery. As a last resort, the installer can rebuild the database (losing all data).

Why do jobs succeed in CLI but fail in Controller?

Common causes: different user context (Controller runs as awx), different Python environment (jobs run in EEs), different working directory, missing environment variables, or credential injection differences.

How do I reset the admin password?

Where are the log files?

| Component | Containerized | RPM |
|-----------|---------------|-----|
| Controller | podman logs automation-controller- | /var/log/tower/ |
| Gateway | podman logs automation-gateway | /var/log/automation-gateway/ |
| Hub | podman logs automation-hub- | /var/log/pulp/ |
| EDA | podman logs automation-eda-* | /var/log/automation-eda/ |
| Receptor | podman logs receptor | /var/log/receptor/ |

Conclusion

Effective troubleshooting in AAP 2.6 requires understanding the architecture — knowing which component owns which functionality and where to look for errors. Start with the job output, work through service health and logs, and use the diagnostic commands in this guide to pinpoint issues quickly.
