From f7bc2d75df4575ff11448d87338df4e575b3fc7a Mon Sep 17 00:00:00 2001 From: nahmad23 <105396355+nahmad23@users.noreply.github.com> Date: Mon, 20 Apr 2026 14:41:32 +0530 Subject: [PATCH 1/4] Add Ansible Tower open-source alternatives analysis for production Comprehensive comparison of AWX, Semaphore UI, Rundeck, StackStorm, and Foreman as zero-cost replacements for Ansible Tower. Includes security scoring, production readiness ratings, deployment architecture, and hardware requirements. https://claude.ai/code/session_012amH3WuqHRb4nzYGPZTYt1 --- ANSIBLE_TOWER_ALTERNATIVES_ANALYSIS.md | 350 +++++++++++++++++++++++++ 1 file changed, 350 insertions(+) create mode 100644 ANSIBLE_TOWER_ALTERNATIVES_ANALYSIS.md diff --git a/ANSIBLE_TOWER_ALTERNATIVES_ANALYSIS.md b/ANSIBLE_TOWER_ALTERNATIVES_ANALYSIS.md new file mode 100644 index 0000000000..07530c4b6b --- /dev/null +++ b/ANSIBLE_TOWER_ALTERNATIVES_ANALYSIS.md @@ -0,0 +1,350 @@ +# Ansible Tower vs Open-Source Alternatives: Production Infrastructure Analysis + +## 1. What is Ansible Tower (Now "Red Hat Ansible Automation Platform")? + +Ansible Tower is Red Hat's **commercial, licensed** web-based UI and REST API for Ansible. It provides: + +- Visual dashboard for job status and inventory +- Role-based access control (RBAC) +- Job scheduling and workflow orchestration +- Centralized logging and auditing +- Credential management +- Multi-tenant support +- REST API for integration + +### Licensing Cost (Why We Need Alternatives) + +| Tier | Cost (Approx.) | +|------|----------------| +| Standard | ~$13,000/year (up to 100 nodes) | +| Premium | ~$17,500/year (up to 100 nodes) | +| Enterprise (unlimited) | Custom pricing ($50,000+) | + +**Verdict:** Too expensive for teams without budget for licensing. + +--- + +## 2. Open-Source Alternatives Comparison + +### 2.1 AWX (Ansible AWX) + +**What:** The upstream open-source project behind Ansible Tower. Essentially Tower without the Red Hat support/license. 
+ +| Category | Details | +|----------|--------| +| License | Apache 2.0 (Fully Free) | +| Maintained By | Red Hat / Community | +| Deployment | Kubernetes (via AWX Operator), Docker | +| UI | Full web dashboard (identical to Tower) | +| RBAC | Yes (full role-based access control) | +| API | Full REST API | +| Job Scheduling | Yes | +| Credential Vault | Yes (integrates with HashiCorp Vault, CyberArk) | +| Notifications | Slack, Email, Webhook, PagerDuty | +| Audit/Logging | Full job logging with centralized history | +| Scalability | Horizontal (Kubernetes-native) | +| Community | Very active (GitHub: 14k+ stars) | + +**Security Level: HIGH** +- Built-in credential encryption (AES-256) +- RBAC with granular permissions +- LDAP/SAML/OAuth2 authentication +- Audit trail for all operations +- Secret management integration +- Container-isolated job execution + +**Production Readiness: 8.5/10** +- Same codebase as Tower +- No commercial SLA or support (community only) +- Frequent releases (can be unstable between versions) + +--- + +### 2.2 Semaphore UI + +**What:** A lightweight, modern open-source alternative to Ansible Tower/AWX. 
+ +| Category | Details | +|----------|--------| +| License | MIT (Fully Free) | +| Maintained By | Community | +| Deployment | Single binary, Docker, or package manager | +| UI | Clean, modern web dashboard | +| RBAC | Yes (team-based access) | +| API | REST API | +| Job Scheduling | Yes (cron-based) | +| Credential Vault | Built-in key store | +| Notifications | Slack, Email, Telegram, Microsoft Teams | +| Audit/Logging | Task execution history | +| Scalability | Vertical (single instance) | +| Community | Active (GitHub: 10k+ stars) | + +**Security Level: MEDIUM-HIGH** +- Encrypted credential storage +- Team-based access control +- LDAP authentication support +- Audit logging +- No native secret manager integration (limited vs AWX) +- Smaller attack surface (simpler architecture) + +**Production Readiness: 7/10** +- Very lightweight and easy to maintain +- Less feature-rich than AWX +- Excellent for small-to-medium infrastructure +- Limited horizontal scaling + +--- + +### 2.3 Rundeck (PagerDuty Process Automation - Community Edition) + +**What:** A general-purpose operations automation platform (not Ansible-specific but supports Ansible as a plugin). 
+ +| Category | Details | +|----------|--------| +| License | Apache 2.0 (Community Edition is Free) | +| Maintained By | PagerDuty / Community | +| Deployment | Java application (WAR/Docker/RPM/DEB) | +| UI | Full web dashboard | +| RBAC | Yes (ACL policies) | +| API | Full REST API | +| Job Scheduling | Yes (cron + event-based triggers) | +| Credential Vault | Yes (KeyStorage with plugins for Vault, AWS SSM, Thycotic) | +| Notifications | Email, Slack, Webhook, PagerDuty, custom plugins | +| Audit/Logging | Full audit trail with centralized logging | +| Scalability | Horizontal (clustering in enterprise) | +| Community | Mature (GitHub: 5k+ stars, 10+ years) | + +**Security Level: HIGH** +- ACL-based fine-grained access control +- Key storage with encryption +- LDAP/Active Directory/OAuth integration +- Full audit trail +- Plugin-based secret management +- SSH key management +- Node filtering with security context + +**Production Readiness: 8/10** +- Very mature and battle-tested +- Supports Ansible, Terraform, scripts, and more +- Enterprise features available in community edition +- Requires Java (JVM overhead) + +--- + +### 2.4 StackStorm (ST2) + +**What:** An event-driven automation platform with powerful workflow orchestration. 
+ +| Category | Details | +|----------|--------| +| License | Apache 2.0 (Fully Free) | +| Maintained By | StackStorm / Community | +| Deployment | Docker, Kubernetes, packages | +| UI | Web UI (st2web) | +| RBAC | Yes (with enterprise features in open-source) | +| API | Full REST API | +| Job Scheduling | Yes + Event-driven triggers | +| Credential Vault | Datastore with encryption | +| Notifications | Slack, Email, Webhook, ChatOps (native) | +| Audit/Logging | Full execution history and audit | +| Scalability | Horizontal (microservices architecture) | +| Community | Active (GitHub: 6k+ stars) | + +**Security Level: HIGH** +- RBAC with fine-grained permissions +- Encrypted datastore for secrets +- LDAP/PAM authentication +- API key + token-based auth +- Audit trail +- HTTPS enforcement +- HashiCorp Vault integration + +**Production Readiness: 7.5/10** +- Excellent for event-driven automation +- More complex to set up than AWX +- Strong ChatOps integration +- Steeper learning curve + +--- + +### 2.5 Foreman + Ansible Plugin + +**What:** A complete lifecycle management tool with native Ansible integration. 
+ +| Category | Details | +|----------|--------| +| License | GPL v3 (Fully Free) | +| Maintained By | Red Hat / Community | +| Deployment | RPM/DEB packages, containerized | +| UI | Full web dashboard | +| RBAC | Yes (roles and permissions) | +| API | Full REST API | +| Job Scheduling | Yes (via Remote Execution plugin) | +| Credential Vault | Smart Proxy-based | +| Notifications | Email, Webhook | +| Audit/Logging | Full audit trail | +| Scalability | Horizontal (Smart Proxies) | +| Community | Mature (10+ years) | + +**Security Level: HIGH** +- Fine-grained RBAC +- LDAP/IPA/AD authentication +- Full audit logging +- SSL/TLS everywhere +- Smart Proxy for distributed security +- Puppet CA integration + +**Production Readiness: 8/10** +- Very mature for provisioning + configuration management +- Heavier to install and maintain +- Best when you also need provisioning/lifecycle management + +--- + +## 3. Security Comparison Matrix + +| Feature | AWX | Semaphore | Rundeck | StackStorm | Foreman | +|---------|-----|-----------|---------|------------|--------| +| Credential Encryption | AES-256 | Yes | Yes | Yes | Yes | +| RBAC | Full | Team-based | ACL-based | Full | Full | +| LDAP/AD Auth | Yes | Yes | Yes | Yes | Yes | +| SAML/SSO | Yes | No | Enterprise | No | Yes | +| OAuth2 | Yes | No | Yes | No | Yes | +| Audit Trail | Full | Basic | Full | Full | Full | +| Vault Integration | Yes | No | Yes (plugins) | Yes | Limited | +| API Auth (Token) | Yes | Yes | Yes | Yes | Yes | +| Container Isolation | Yes | No | Plugin | Yes | No | +| CVE History | Low | Very Low | Low | Low | Medium | +| Compliance Ready | Yes | Limited | Yes | Yes | Yes | + +### Security Rating Summary + +| Tool | Security Score | Notes | +|------|---------------|-------| +| **AWX** | 9/10 | Enterprise-grade security, same as Tower | +| **Rundeck** | 8.5/10 | Mature security model, ACL-based | +| **StackStorm** | 8/10 | Strong but complex to configure | +| **Foreman** | 8/10 | Excellent with IPA/LDAP 
integration | +| **Semaphore** | 7/10 | Good for size, lacks advanced SSO | + +--- + +## 4. Final Recommendation for Production + +### PRIMARY RECOMMENDATION: AWX + +**Why AWX is the best choice for your production infrastructure:** + +1. **Identical to Ansible Tower** - Same codebase, same features, zero migration cost if you ever move to Tower +2. **Highest security** - AES-256 encryption, full RBAC, SAML/SSO, Vault integration +3. **Kubernetes-native** - Deploy via AWX Operator on K8s/OpenShift (aligns with your DO180 containerization path) +4. **Full REST API** - Integrate with CI/CD pipelines, monitoring, and other tools +5. **Active community** - Rapid bug fixes and security patches +6. **Credential management** - Secure handling of SSH keys, cloud credentials, vault passwords +7. **Job isolation** - Each playbook runs in an isolated container +8. **Scalable** - Horizontal scaling on Kubernetes + +### SECONDARY RECOMMENDATION: Semaphore UI + +**When to choose Semaphore instead:** + +- Small team (< 10 people) +- Fewer than 50 managed nodes +- Need simplicity over features +- Limited Kubernetes expertise +- Want minimal maintenance overhead + +### TERTIARY RECOMMENDATION: Rundeck + +**When to choose Rundeck instead:** + +- Multi-tool environment (not just Ansible, also Terraform, scripts, etc.) +- Need mature, battle-tested solution +- Java/JVM is acceptable in your stack +- Need fine-grained ACL policies + +--- + +## 5. 
Recommended AWX Deployment Architecture (Production) + +``` + +-------------------+ + | Load Balancer | + | (HAProxy/Nginx) | + +--------+----------+ + | + +--------------+--------------+ + | | + +--------v--------+ +---------v-------+ + | AWX Web Pod | | AWX Web Pod | + | (Django + UI) | | (Django + UI) | + +--------+---------+ +--------+--------+ + | | + +--------v-----------------------------v--------+ + | AWX Task Pods | + | (Ansible Job Runners) | + | [Container-isolated execution] | + +------------------------+----------------------+ + | + +---------------+---------------+ + | | | + +--------v---+ +--------v---+ +--------v---+ + | PostgreSQL | | Redis | | Receptor | + | (Database) | | (Cache/ | | (Mesh for | + | | | Queue) | | remote) | + +-------------+ +------------+ +------------+ +``` + +### Minimum Hardware Requirements (Production) + +| Component | CPU | RAM | Storage | +|-----------|-----|-----|--------| +| AWX Web (x2) | 2 vCPU | 4 GB | 20 GB | +| AWX Task (x2) | 4 vCPU | 8 GB | 40 GB | +| PostgreSQL | 2 vCPU | 4 GB | 100 GB | +| Redis | 1 vCPU | 2 GB | 10 GB | + +--- + +## 6. Quick Start - AWX on Kubernetes + +```bash +# 1. Install AWX Operator +kubectl apply -f https://raw.githubusercontent.com/ansible/awx-operator/main/deploy/awx-operator.yaml + +# 2. 
Create AWX instance +cat < Date: Mon, 20 Apr 2026 21:10:10 +0530 Subject: [PATCH 2/4] Add AWX production deployment manifests and guide for 150 servers Includes: - Kubernetes manifests (namespace, operator, AWX instance, TLS, backups) - Inventory template for 150 servers organized by role - Automated deployment scripts (K3s install, AWX deploy, health check) - Full deployment guide with architecture, RBAC, LDAP, and troubleshooting https://claude.ai/code/session_012amH3WuqHRb4nzYGPZTYt1 --- awx-deployment/docs/DEPLOYMENT_GUIDE.md | 255 ++++++++++++++++++ awx-deployment/inventory/group_vars/all.yaml | 40 +++ .../inventory/production-inventory.yaml | 73 +++++ awx-deployment/manifests/00-namespace.yaml | 7 + awx-deployment/manifests/01-awx-operator.yaml | 52 ++++ awx-deployment/manifests/02-postgres-pvc.yaml | 15 ++ .../manifests/03-awx-production.yaml | 81 ++++++ awx-deployment/manifests/04-ingress-tls.yaml | 35 +++ .../manifests/05-backup-cronjob.yaml | 63 +++++ awx-deployment/scripts/deploy-awx.sh | 80 ++++++ awx-deployment/scripts/health-check.sh | 63 +++++ awx-deployment/scripts/install-k3s.sh | 48 ++++ 12 files changed, 812 insertions(+) create mode 100644 awx-deployment/docs/DEPLOYMENT_GUIDE.md create mode 100644 awx-deployment/inventory/group_vars/all.yaml create mode 100644 awx-deployment/inventory/production-inventory.yaml create mode 100644 awx-deployment/manifests/00-namespace.yaml create mode 100644 awx-deployment/manifests/01-awx-operator.yaml create mode 100644 awx-deployment/manifests/02-postgres-pvc.yaml create mode 100644 awx-deployment/manifests/03-awx-production.yaml create mode 100644 awx-deployment/manifests/04-ingress-tls.yaml create mode 100644 awx-deployment/manifests/05-backup-cronjob.yaml create mode 100644 awx-deployment/scripts/deploy-awx.sh create mode 100644 awx-deployment/scripts/health-check.sh create mode 100644 awx-deployment/scripts/install-k3s.sh diff --git a/awx-deployment/docs/DEPLOYMENT_GUIDE.md 
b/awx-deployment/docs/DEPLOYMENT_GUIDE.md new file mode 100644 index 0000000000..82d28f1f3e --- /dev/null +++ b/awx-deployment/docs/DEPLOYMENT_GUIDE.md @@ -0,0 +1,255 @@ +# AWX Production Deployment Guide — 150 Servers + +## Overview + +This guide deploys AWX (open-source Ansible Tower) to manage your 150-server infrastructure. + +**Savings:** ~$14,000/year vs Ansible Tower licensing. + +--- + +## Architecture + +``` + +-------------------+ + | Load Balancer | + | (Nginx) | + +--------+----------+ + | + +--------------+--------------+ + | | + +--------v--------+ +---------v-------+ + | AWX Web #1 | | AWX Web #2 | + | 2vCPU / 4GB | | 2vCPU / 4GB | + +--------+---------+ +--------+--------+ + | | + +--------v-----------------------------v--------+ + | AWX Task Runners (x2) | + | 4vCPU / 8GB each | + | [Runs playbooks on 150 servers] | + +------------------------+----------------------+ + | + +---------------+---------------+ + | | | + +--------v---+ +--------v---+ +--------v---+ + | PostgreSQL | | Redis | | Receptor | + | 4GB/100GB | | 2GB/10GB | | (mesh) | + +-------------+ +------------+ +------------+ + | + +--------------------+--------------------+ + | | | | | + [30 web] [50 app] [15 db] [25 worker] [30 other] +``` + +## Hardware Requirements + +| Component | CPU | RAM | Storage | Monthly Cost | +|-----------|-----|-----|---------|-------------| +| AWX Web x2 | 2 vCPU | 4 GB | 20 GB | ~$60 | +| AWX Task x2 | 4 vCPU | 8 GB | 40 GB | ~$120 | +| PostgreSQL | 2 vCPU | 4 GB | 100 GB | ~$40 | +| Redis | 1 vCPU | 2 GB | 10 GB | ~$15 | +| **Total** | | | | **~$235/mo** | + +--- + +## Deployment Steps + +### Prerequisites + +- Linux server (Ubuntu 22.04 / RHEL 9 recommended) +- Minimum 16 GB RAM, 8 vCPU on the AWX host (the CPU requests in `manifests/03-awx-production.yaml` alone total about 7 vCPU) +- Network access to all 150 managed servers (SSH port 22) +- DNS record pointing to AWX server (e.g., awx.yourdomain.com) +- TLS certificate for HTTPS + +### Step 1: Install Kubernetes (K3s) + +```bash +chmod +x scripts/install-k3s.sh +sudo 
./scripts/install-k3s.sh +``` + +### Step 2: Deploy AWX + +```bash +chmod +x scripts/deploy-awx.sh +./scripts/deploy-awx.sh +``` + +### Step 3: Configure TLS + +1. Edit `manifests/04-ingress-tls.yaml` +2. Replace `REPLACE_WITH_BASE64_ENCODED_CERT` and `REPLACE_WITH_BASE64_ENCODED_KEY` with your base64-encoded certificate and key: + ```bash + cat your-cert.crt | base64 -w0 + cat your-key.key | base64 -w0 + ``` +3. Apply: `kubectl apply -f manifests/04-ingress-tls.yaml` + +### Step 4: Log in to AWX + +1. Open https://awx.yourdomain.com +2. Username: `admin` +3. Password: retrieve it with: + ```bash + kubectl -n awx get secret awx-production-admin-password \ + -o jsonpath="{.data.password}" | base64 --decode + ``` + +--- + +## Post-Deployment Configuration + +### 1. Add Your 150 Servers + +In AWX UI: +1. **Inventories** → Add → Name: "Production-150" +2. **Sources** → Add → Source: "Sourced from a Project" +3. Point to `inventory/production-inventory.yaml` + +Or import with `awx-manage` from inside an AWX container: +```bash +awx-manage inventory_import --source=inventory/production-inventory.yaml \ + --inventory-name="Production-150" +``` + +### 2. Configure LDAP Authentication + +In AWX UI → Settings → Authentication → LDAP: + +``` +LDAP Server URI: ldaps://ldap.yourdomain.com:636 +LDAP Bind DN: cn=awx-service,ou=services,dc=yourdomain,dc=com +LDAP User Search: ou=users,dc=yourdomain,dc=com +LDAP Group Search: ou=groups,dc=yourdomain,dc=com +``` + +### 3. Set Up Credentials + +Create credentials of these types in AWX: +- **Machine** — SSH keys for your 150 servers +- **Source Control** — Git access for playbook repos +- **HashiCorp Vault Secret Lookup** — HashiCorp Vault token (if using) +- **Cloud** — AWS/Azure/GCP (if applicable) + +### 4. Organize with RBAC + +Suggested roles: +| Role | Access | +|------|--------| +| Admin | Full access to all resources | +| Ops Engineer | Execute jobs, view all inventories | +| Developer | Execute jobs on dev/staging only | +| Viewer | Read-only dashboard access | + +### 5. 
Set Up Notifications + +Configure in AWX → Notifications: +- **Slack** — Job failures and critical alerts +- **Email** — Daily job summary reports +- **Webhook** — Integration with monitoring (PagerDuty, OpsGenie) + +--- + +## Server Grouping Strategy (150 Servers) + +| Group | Count | Purpose | +|-------|-------|--------| +| webservers | 30 | Nginx/Apache front-end | +| appservers | 50 | Application backends | +| databases | 15 | PostgreSQL/MySQL | +| cache_servers | 10 | Redis/Memcached | +| monitoring | 5 | Prometheus/Grafana | +| loadbalancers | 5 | HAProxy/Nginx LB | +| workers | 25 | Background job processors | +| storage | 10 | NFS/object storage | +| **Total** | **150** | | + +--- + +## Security Hardening + +### AWX Platform Security + +- [x] HTTPS enforced (TLS 1.2+) +- [x] Session cookies marked Secure + HttpOnly +- [x] CSRF protection enabled +- [x] Session timeout: 30 minutes +- [x] Max 3 sessions per user +- [ ] LDAP/SSO authentication (configure post-deploy) +- [ ] Network policies applied (manifests/04-ingress-tls.yaml) +- [ ] Backup CronJob active (manifests/05-backup-cronjob.yaml) + +### Managed Server Security + +- Use SSH keys only (no passwords) +- Dedicated `deploy` user with sudo on all 150 servers +- Firewall rules: only allow SSH from AWX IP +- Regular credential rotation via AWX Credential Type + +--- + +## Maintenance + +### Daily Backup (Automated) + +The `05-backup-cronjob.yaml` runs daily at 2:00 AM: +- PostgreSQL full dump +- Retains 7 days of backups +- Storage: 50 GB PVC + +### Health Checks + +```bash +chmod +x scripts/health-check.sh +./scripts/health-check.sh +``` + +### Upgrading AWX + +```bash +# Update operator image tag in 01-awx-operator.yaml +kubectl -n awx set image deployment/awx-operator-controller-manager \ + awx-manager=quay.io/ansible/awx-operator:NEW_VERSION + +# AWX instance upgrades automatically after operator update +``` + +--- + +## Troubleshooting + +| Issue | Command | +|-------|--------| +| Pods not starting | 
`kubectl -n awx describe pods` | +| Check AWX logs | `kubectl -n awx logs -l app.kubernetes.io/name=awx-production -f` | +| Database connection | `kubectl -n awx exec -it statefulset/awx-production-postgres-15 -- psql` | +| Restart AWX | `kubectl -n awx rollout restart deployment awx-production-web` | +| Check disk usage | `kubectl -n awx exec -it statefulset/awx-production-postgres-15 -- df -h` | +| Reset admin password | `kubectl -n awx exec -it deploy/awx-production-web -- awx-manage changepassword admin` | + +--- + +## File Structure + +``` +awx-deployment/ +├── docs/ +│ └── DEPLOYMENT_GUIDE.md # This file +├── manifests/ +│ ├── 00-namespace.yaml # AWX namespace +│ ├── 01-awx-operator.yaml # AWX Operator deployment +│ ├── 02-postgres-pvc.yaml # Database storage +│ ├── 03-awx-production.yaml # Main AWX instance (150-server config) +│ ├── 04-ingress-tls.yaml # TLS + Network Policy +│ └── 05-backup-cronjob.yaml # Automated daily backups +├── inventory/ +│ ├── production-inventory.yaml # 150 server inventory template +│ └── group_vars/ +│ └── all.yaml # Global variables +└── scripts/ + ├── install-k3s.sh # Kubernetes installation + ├── deploy-awx.sh # One-command AWX deployment + └── health-check.sh # Production health monitoring +``` diff --git a/awx-deployment/inventory/group_vars/all.yaml b/awx-deployment/inventory/group_vars/all.yaml new file mode 100644 index 0000000000..e843db7b36 --- /dev/null +++ b/awx-deployment/inventory/group_vars/all.yaml @@ -0,0 +1,40 @@ +--- +# Global variables for all 150 servers + +# NTP Configuration +ntp_servers: + - 0.pool.ntp.org + - 1.pool.ntp.org + +# DNS Configuration +dns_servers: + - 8.8.8.8 + - 8.8.4.4 + +# Security Baseline +security_ssh_permit_root_login: "no" +security_ssh_password_authentication: "no" +security_ssh_port: 22 +security_fail2ban_enabled: true + +# Monitoring Agent +monitoring_agent: node_exporter +monitoring_agent_port: 9100 + +# Log Forwarding +syslog_server: mon01.yourdomain.com +syslog_port: 514 + +# Common Packages 
+common_packages: + - vim + - htop + - curl + - wget + - net-tools + - python3 + - chrony + +# Firewall +firewall_enabled: true +firewall_default_policy: deny diff --git a/awx-deployment/inventory/production-inventory.yaml b/awx-deployment/inventory/production-inventory.yaml new file mode 100644 index 0000000000..43eb16188a --- /dev/null +++ b/awx-deployment/inventory/production-inventory.yaml @@ -0,0 +1,73 @@ +all: + vars: + ansible_user: deploy + ansible_ssh_private_key_file: /var/lib/awx/credentials/production_key + ansible_become: true + ansible_become_method: sudo + + children: + webservers: + hosts: + web[01:30].yourdomain.com: + vars: + http_port: 80 + https_port: 443 + + appservers: + hosts: + app[01:50].yourdomain.com: + vars: + app_port: 8080 + + databases: + hosts: + db[01:15].yourdomain.com: + vars: + db_port: 5432 + backup_enabled: true + + cache_servers: + hosts: + cache[01:10].yourdomain.com: + vars: + redis_port: 6379 + + monitoring: + hosts: + mon[01:05].yourdomain.com: + vars: + prometheus_port: 9090 + grafana_port: 3000 + + loadbalancers: + hosts: + lb[01:05].yourdomain.com: + vars: + haproxy_stats_port: 8404 + + workers: + hosts: + worker[01:25].yourdomain.com: + vars: + worker_concurrency: 4 + + storage: + hosts: + storage[01:10].yourdomain.com: + vars: + nfs_export_path: /data/shared + + # Logical groupings + production: + children: + webservers: + appservers: + databases: + cache_servers: + loadbalancers: + + infrastructure: + children: + monitoring: + workers: + storage: diff --git a/awx-deployment/manifests/00-namespace.yaml b/awx-deployment/manifests/00-namespace.yaml new file mode 100644 index 0000000000..13683e45a6 --- /dev/null +++ b/awx-deployment/manifests/00-namespace.yaml @@ -0,0 +1,7 @@ +apiVersion: v1 +kind: Namespace +metadata: + name: awx + labels: + app.kubernetes.io/name: awx + app.kubernetes.io/part-of: awx-production diff --git a/awx-deployment/manifests/01-awx-operator.yaml b/awx-deployment/manifests/01-awx-operator.yaml 
new file mode 100644 index 0000000000..f7214489f5 --- /dev/null +++ b/awx-deployment/manifests/01-awx-operator.yaml @@ -0,0 +1,52 @@ +apiVersion: v1 +kind: ServiceAccount +metadata: + name: awx-operator-controller-manager + namespace: awx +--- +apiVersion: rbac.authorization.k8s.io/v1 +kind: ClusterRoleBinding +metadata: + name: awx-operator-controller-manager-rolebinding +subjects: + - kind: ServiceAccount + name: awx-operator-controller-manager + namespace: awx +roleRef: + kind: ClusterRole + name: cluster-admin + apiGroup: rbac.authorization.k8s.io +--- +apiVersion: apps/v1 +kind: Deployment +metadata: + name: awx-operator-controller-manager + namespace: awx + labels: + control-plane: controller-manager +spec: + replicas: 1 + selector: + matchLabels: + control-plane: controller-manager + template: + metadata: + labels: + control-plane: controller-manager + spec: + serviceAccountName: awx-operator-controller-manager + containers: + - name: awx-manager + image: quay.io/ansible/awx-operator:latest + args: + - --leader-elect + env: + - name: ANSIBLE_GATHERING + value: explicit + resources: + limits: + cpu: 500m + memory: 512Mi + requests: + cpu: 100m + memory: 256Mi diff --git a/awx-deployment/manifests/02-postgres-pvc.yaml b/awx-deployment/manifests/02-postgres-pvc.yaml new file mode 100644 index 0000000000..d15eb6acd9 --- /dev/null +++ b/awx-deployment/manifests/02-postgres-pvc.yaml @@ -0,0 +1,15 @@ +apiVersion: v1 +kind: PersistentVolumeClaim +metadata: + name: awx-postgres-pvc + namespace: awx + labels: + app.kubernetes.io/name: awx-postgres + app.kubernetes.io/part-of: awx-production +spec: + accessModes: + - ReadWriteOnce + resources: + requests: + storage: 100Gi + storageClassName: standard diff --git a/awx-deployment/manifests/03-awx-production.yaml b/awx-deployment/manifests/03-awx-production.yaml new file mode 100644 index 0000000000..6c8c7887d4 --- /dev/null +++ b/awx-deployment/manifests/03-awx-production.yaml @@ -0,0 +1,81 @@ +apiVersion: 
awx.ansible.com/v1beta1 +kind: AWX +metadata: + name: awx-production + namespace: awx +spec: + # Ingress Configuration + service_type: ClusterIP + ingress_type: Ingress + ingress_annotations: | + nginx.ingress.kubernetes.io/proxy-body-size: "50m" + nginx.ingress.kubernetes.io/proxy-read-timeout: "600" + nginx.ingress.kubernetes.io/proxy-send-timeout: "600" + hostname: awx.yourdomain.com + ingress_tls_secret: awx-tls-secret + + # PostgreSQL Configuration + postgres_storage_class: standard + postgres_storage_requirements: + requests: + storage: 100Gi + postgres_resource_requirements: + requests: + cpu: "1" + memory: "2Gi" + limits: + cpu: "2" + memory: "4Gi" + + # Project Persistence + projects_persistence: true + projects_storage_size: 20Gi + projects_storage_class: standard + + # Web (UI/API) Resource Configuration + web_replicas: 2 + web_resource_requirements: + requests: + cpu: "1" + memory: "2Gi" + limits: + cpu: "2" + memory: "4Gi" + + # Task Runner Resource Configuration (runs playbooks) + task_replicas: 2 + task_resource_requirements: + requests: + cpu: "2" + memory: "4Gi" + limits: + cpu: "4" + memory: "8Gi" + + # Redis Configuration + redis_resource_requirements: + requests: + cpu: "250m" + memory: "512Mi" + limits: + cpu: "1" + memory: "2Gi" + + # Security Settings + security_context_settings: + runAsGroup: 0 + runAsUser: 0 + fsGroup: 0 + + # Extra Settings (LDAP, session, etc.) 
+ extra_settings: + - setting: SESSION_COOKIE_SECURE + value: "True" + - setting: CSRF_COOKIE_SECURE + value: "True" + - setting: SOCIAL_AUTH_REDIRECT_IS_HTTPS + value: "True" + - setting: SESSIONS_PER_USER + value: "3" + - setting: SESSION_COOKIE_AGE + value: "1800" diff --git a/awx-deployment/manifests/04-ingress-tls.yaml b/awx-deployment/manifests/04-ingress-tls.yaml new file mode 100644 index 0000000000..bd1144440d --- /dev/null +++ b/awx-deployment/manifests/04-ingress-tls.yaml @@ -0,0 +1,35 @@ +apiVersion: v1 +kind: Secret +metadata: + name: awx-tls-secret + namespace: awx +type: kubernetes.io/tls +data: + # Replace with your base64-encoded TLS certificate and key + # Generate with: cat tls.crt | base64 -w0 + # Generate with: cat tls.key | base64 -w0 + tls.crt: REPLACE_WITH_BASE64_ENCODED_CERT + tls.key: REPLACE_WITH_BASE64_ENCODED_KEY +--- +apiVersion: networking.k8s.io/v1 +kind: NetworkPolicy +metadata: + name: awx-network-policy + namespace: awx +spec: + podSelector: + matchLabels: + app.kubernetes.io/part-of: awx-production + policyTypes: + - Ingress + - Egress + ingress: + - from: + - namespaceSelector: + matchLabels: + kubernetes.io/metadata.name: ingress-nginx + ports: + - protocol: TCP + port: 8052 + egress: + - to: [] diff --git a/awx-deployment/manifests/05-backup-cronjob.yaml b/awx-deployment/manifests/05-backup-cronjob.yaml new file mode 100644 index 0000000000..8c65e48852 --- /dev/null +++ b/awx-deployment/manifests/05-backup-cronjob.yaml @@ -0,0 +1,63 @@ +apiVersion: batch/v1 +kind: CronJob +metadata: + name: awx-postgres-backup + namespace: awx +spec: + schedule: "0 2 * * *" + successfulJobsHistoryLimit: 7 + failedJobsHistoryLimit: 3 + jobTemplate: + spec: + template: + spec: + containers: + - name: postgres-backup + image: postgres:15 + env: + - name: PGHOST + value: awx-production-postgres-15 + - name: PGUSER + valueFrom: + secretKeyRef: + name: awx-production-postgres-configuration + key: username + - name: PGPASSWORD + valueFrom: + 
secretKeyRef: + name: awx-production-postgres-configuration + key: password + - name: PGDATABASE + valueFrom: + secretKeyRef: + name: awx-production-postgres-configuration + key: database + command: + - /bin/bash + - -c + - | + TIMESTAMP=$(date +%Y%m%d_%H%M%S) + pg_dump -Fc > /backups/awx_backup_${TIMESTAMP}.dump + # Keep only last 7 days of backups + find /backups -name "awx_backup_*.dump" -mtime +7 -delete + volumeMounts: + - name: backup-storage + mountPath: /backups + volumes: + - name: backup-storage + persistentVolumeClaim: + claimName: awx-backup-pvc + restartPolicy: OnFailure +--- +apiVersion: v1 +kind: PersistentVolumeClaim +metadata: + name: awx-backup-pvc + namespace: awx +spec: + accessModes: + - ReadWriteOnce + resources: + requests: + storage: 50Gi + storageClassName: standard diff --git a/awx-deployment/scripts/deploy-awx.sh b/awx-deployment/scripts/deploy-awx.sh new file mode 100644 index 0000000000..49f7b45b45 --- /dev/null +++ b/awx-deployment/scripts/deploy-awx.sh @@ -0,0 +1,80 @@ +#!/bin/bash +# AWX Production Deployment Script +# Deploys AWX on Kubernetes for managing 150 servers + +set -euo pipefail + +MANIFESTS_DIR="$(dirname "$0")/../manifests" + +echo "=== AWX Production Deployment ===" +echo "Target: 150 server infrastructure" +echo "" + +# Check prerequisites +if ! command -v kubectl &> /dev/null; then + echo "ERROR: kubectl not found. Install Kubernetes first." + echo "Run: ./install-k3s.sh" + exit 1 +fi + +# Verify cluster is ready +if ! kubectl cluster-info &> /dev/null; then + echo "ERROR: Cannot connect to Kubernetes cluster." + exit 1 +fi + +echo "[1/5] Creating namespace..." +kubectl apply -f "${MANIFESTS_DIR}/00-namespace.yaml" + +echo "[2/5] Deploying AWX Operator..." +kubectl apply -f "${MANIFESTS_DIR}/01-awx-operator.yaml" + +echo "Waiting for AWX Operator to be ready..." 
+kubectl wait --namespace awx \ + --for=condition=available deployment/awx-operator-controller-manager \ + --timeout=300s + +echo "[3/5] Creating persistent storage..." +kubectl apply -f "${MANIFESTS_DIR}/02-postgres-pvc.yaml" + +echo "[4/5] Deploying AWX instance..." +kubectl apply -f "${MANIFESTS_DIR}/03-awx-production.yaml" + +echo "Waiting for AWX to be ready (this may take 5-10 minutes)..." +echo "You can monitor progress with: kubectl -n awx get pods -w" + +# Wait for an AWX web pod to reach the Running phase +for i in {1..60}; do + if kubectl -n awx get pods | grep -q "awx-production-web.*Running"; then + break + fi + echo " Still deploying... (${i}/60)" + sleep 10 +done + +echo "[5/5] Setting up backup schedule..." +kubectl apply -f "${MANIFESTS_DIR}/05-backup-cronjob.yaml" + +echo "" +echo "=== Deployment Complete ===" +echo "" + +# Get admin password; "|| true" keeps set -e/pipefail from aborting if the secret does not exist yet +ADMIN_PASSWORD=$(kubectl -n awx get secret awx-production-admin-password \ + -o jsonpath="{.data.password}" 2>/dev/null | base64 --decode || true) + +if [[ -n "$ADMIN_PASSWORD" ]]; then + echo "AWX Admin Credentials:" + echo " Username: admin" + echo " Password: ${ADMIN_PASSWORD}" + echo "" +fi + +echo "AWX URL: https://awx.yourdomain.com" +echo "" +echo "Next steps:" +echo " 1. Configure TLS: Update manifests/04-ingress-tls.yaml with your certificates" +echo " 2. Add your 150 servers to the inventory" +echo " 3. Configure LDAP authentication" +echo " 4. Set up credential vaults" +echo " 5. 
Import playbooks from your Git repositories" diff --git a/awx-deployment/scripts/health-check.sh b/awx-deployment/scripts/health-check.sh new file mode 100644 index 0000000000..3476349e40 --- /dev/null +++ b/awx-deployment/scripts/health-check.sh @@ -0,0 +1,63 @@ +#!/bin/bash +# AWX Production Health Check Script +# Run this periodically to verify AWX is healthy + +set -euo pipefail + +echo "=== AWX Production Health Check ===" +echo "Date: $(date)" +echo "" + +ERRORS=0 + +# Check pods +echo "[Pods Status]" +kubectl -n awx get pods -o wide +echo "" + +# Count pods that are not running; "|| true" because grep exits 1 when nothing matches, which would abort under set -e/pipefail +NOT_RUNNING=$(kubectl -n awx get pods --no-headers | grep -cv "Running\|Completed" || true) +if [[ $NOT_RUNNING -gt 0 ]]; then + echo "WARNING: ${NOT_RUNNING} pod(s) not in Running state" + ERRORS=$((ERRORS + 1)) +fi + +# Check PVC status +echo "[Storage Status]" +kubectl -n awx get pvc +echo "" + +# Check AWX web endpoint +echo "[Web Endpoint]" +AWX_SVC=$(kubectl -n awx get svc awx-production-service -o jsonpath='{.spec.clusterIP}' 2>/dev/null || true) +if [[ -n "$AWX_SVC" ]]; then + HTTP_CODE=$(curl -sk -o /dev/null -w "%{http_code}" "http://${AWX_SVC}:80/api/v2/ping/" 2>/dev/null || true) + if [[ "$HTTP_CODE" == "200" ]]; then + echo " API Status: HEALTHY (HTTP ${HTTP_CODE})" + else + echo " API Status: UNHEALTHY (HTTP ${HTTP_CODE})" + ERRORS=$((ERRORS + 1)) + fi +else + echo " AWX service not found" + ERRORS=$((ERRORS + 1)) +fi +echo "" + +# Check recent jobs +echo "[Recent Job Status]" +kubectl -n awx logs -l app.kubernetes.io/component=awx-task --tail=20 2>/dev/null | grep -i "error\|fail" | tail -5 || echo " No recent errors" +echo "" + +# Check resource usage +echo "[Resource Usage]" +kubectl -n awx top pods 2>/dev/null || echo " Metrics server not available" +echo "" + +# Summary +echo "=== Summary ===" +if [[ $ERRORS -eq 0 ]]; then + echo "Status: ALL HEALTHY" +else + echo "Status: ${ERRORS} ISSUE(S) DETECTED" +fi diff --git a/awx-deployment/scripts/install-k3s.sh 
b/awx-deployment/scripts/install-k3s.sh new file mode 100644 index 0000000000..02e617760e --- /dev/null +++ b/awx-deployment/scripts/install-k3s.sh @@ -0,0 +1,48 @@ +#!/bin/bash +# AWX Production - K3s Installation Script +# This script installs a lightweight Kubernetes cluster for AWX + +set -euo pipefail + +echo "=== AWX Production: Installing K3s ===" + +# Check if running as root +if [[ $EUID -ne 0 ]]; then + echo "This script must be run as root" + exit 1 +fi + +# System requirements check +TOTAL_MEM=$(free -g | awk '/^Mem:/{print $2}') +if [[ $TOTAL_MEM -lt 8 ]]; then + echo "WARNING: Minimum 8 GB RAM recommended. Found: ${TOTAL_MEM} GB" + read -p "Continue anyway? (y/n) " -n 1 -r + echo + if [[ ! $REPLY =~ ^[Yy]$ ]]; then + exit 1 + fi +fi + +# Install K3s +curl -sfL https://get.k3s.io | sh -s - \ + --write-kubeconfig-mode 644 \ + --disable traefik \ + --disable servicelb + +# Wait for K3s to be ready +echo "Waiting for K3s to be ready..." +kubectl wait --for=condition=Ready nodes --all --timeout=120s + +# Install NGINX Ingress Controller +kubectl apply -f https://raw.githubusercontent.com/kubernetes/ingress-nginx/main/deploy/static/provider/cloud/deploy.yaml + +echo "Waiting for ingress controller..." +kubectl wait --namespace ingress-nginx \ + --for=condition=ready pod \ + --selector=app.kubernetes.io/component=controller \ + --timeout=120s + +echo "" +echo "=== K3s Installation Complete ===" +echo "Kubeconfig: /etc/rancher/k3s/k3s.yaml" +echo "Next step: Run ./deploy-awx.sh" From 31b26277f145c7e1f796794ec0a87f4d54c0b9b9 Mon Sep 17 00:00:00 2001 From: nahmad23 <105396355+nahmad23@users.noreply.github.com> Date: Tue, 21 Apr 2026 01:09:56 +0530 Subject: [PATCH 3/4] Fix AWX operator CrashLoopBackOff: add leader-election-id The operator was failing with 'LeaderElectionID must be configured'. Added --leader-election-id flag and environment variables required by newer awx-operator versions. 
https://claude.ai/code/session_012amH3WuqHRb4nzYGPZTYt1
---
 awx-deployment/manifests/01-awx-operator.yaml | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/awx-deployment/manifests/01-awx-operator.yaml b/awx-deployment/manifests/01-awx-operator.yaml
index f7214489f5..e46dd692e7 100644
--- a/awx-deployment/manifests/01-awx-operator.yaml
+++ b/awx-deployment/manifests/01-awx-operator.yaml
@@ -40,9 +40,14 @@ spec:
         image: quay.io/ansible/awx-operator:latest
         args:
         - --leader-elect
+        - --leader-election-id=awx-operator
         env:
         - name: ANSIBLE_GATHERING
           value: explicit
+        - name: ANSIBLE_OPERATOR_LEADER_ELECTION_ENABLED
+          value: "true"
+        - name: ANSIBLE_OPERATOR_LEADER_ELECTION_ID
+          value: "awx-operator"
         resources:
           limits:
             cpu: 500m
From 4dfa80d7702fed55dd4e464ecd974877c65e5b25 Mon Sep 17 00:00:00 2001
From: Claude
Date: Mon, 20 Apr 2026 19:59:58 +0000
Subject: [PATCH 4/4] Fix storage class: change 'standard' to 'local-path' for K3s

K3s uses 'local-path' as its default StorageClass, not 'standard'.
Updated all PVC and AWX manifests to match the actual cluster config.
https://claude.ai/code/session_012amH3WuqHRb4nzYGPZTYt1
---
 awx-deployment/manifests/02-postgres-pvc.yaml   | 2 +-
 awx-deployment/manifests/03-awx-production.yaml | 4 ++--
 awx-deployment/manifests/05-backup-cronjob.yaml | 2 +-
 3 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/awx-deployment/manifests/02-postgres-pvc.yaml b/awx-deployment/manifests/02-postgres-pvc.yaml
index d15eb6acd9..ddd58042c4 100644
--- a/awx-deployment/manifests/02-postgres-pvc.yaml
+++ b/awx-deployment/manifests/02-postgres-pvc.yaml
@@ -12,4 +12,4 @@ spec:
   resources:
     requests:
       storage: 100Gi
-  storageClassName: standard
+  storageClassName: local-path
diff --git a/awx-deployment/manifests/03-awx-production.yaml b/awx-deployment/manifests/03-awx-production.yaml
index 6c8c7887d4..24e4f87496 100644
--- a/awx-deployment/manifests/03-awx-production.yaml
+++ b/awx-deployment/manifests/03-awx-production.yaml
@@ -15,7 +15,7 @@ spec:
   ingress_tls_secret: awx-tls-secret
 
   # PostgreSQL Configuration
-  postgres_storage_class: standard
+  postgres_storage_class: local-path
   postgres_storage_requirements:
     requests:
       storage: 100Gi
@@ -30,7 +30,7 @@ spec:
   # Project Persistence
   projects_persistence: true
   projects_storage_size: 20Gi
-  projects_storage_class: standard
+  projects_storage_class: local-path
 
   # Web (UI/API) Resource Configuration
   web_replicas: 2
diff --git a/awx-deployment/manifests/05-backup-cronjob.yaml b/awx-deployment/manifests/05-backup-cronjob.yaml
index 8c65e48852..5eb46ddd70 100644
--- a/awx-deployment/manifests/05-backup-cronjob.yaml
+++ b/awx-deployment/manifests/05-backup-cronjob.yaml
@@ -60,4 +60,4 @@ spec:
   resources:
     requests:
       storage: 50Gi
-  storageClassName: standard
+  storageClassName: local-path
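
Review note on PATCH 4/4: the manifests now hard-code `local-path`, which matches a stock K3s install but would leave the PVCs pending on a cluster that names its StorageClass differently. A minimal sketch of a pre-flight check follows. The helper `has_storage_class` is hypothetical and not part of this series; it assumes its input looks like the output of `kubectl get storageclass -o name` (for example `storageclass.storage.k8s.io/local-path`).

```shell
# Hypothetical helper, not part of this patch series.
# Reads StorageClass resource names on stdin, one per line, in the form
# printed by `kubectl get storageclass -o name`, and succeeds only if a
# class whose name is exactly "$1" is listed.
has_storage_class() {
    # Strip everything up to the last "/" to keep the bare class name,
    # then require a fixed-string, whole-line match. Output is discarded
    # instead of using grep -q so that grep reads all of its input and
    # the upstream command is not cut off early under pipefail.
    sed 's|.*/||' | grep -Fx -- "$1" >/dev/null
}

# Possible use in deploy-awx.sh before the manifests are applied
# (assumes kubectl is already configured for the target cluster):
#
#   if ! kubectl get storageclass -o name | has_storage_class local-path; then
#       echo "StorageClass 'local-path' not found; adjust the manifests first" >&2
#       exit 1
#   fi
```

Reading from stdin keeps the check independent of a live cluster, so it can be exercised with canned input before it is wired into the deployment script.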