@butler54
Collaborator

No description provided.

Signed-off-by: Chris Butler <chris.butler@redhat.com>
BREAKING CHANGE: Replaced ACR with bastion-hosted podman registry for truly self-contained deployment

## Major Changes

### 1. Bastion-Hosted Container Registry
- Replace Azure Container Registry with podman registry on bastion (port 5000)
- Eliminates ACR, private endpoint, and private DNS complexity
- Truly self-contained: all images served from bastion
- Auto-configured by cloud-init (registry.service systemd unit)
- Storage: /var/cache/oc-mirror/registry/data (500GB data disk)

### 2. Maximum Terraform Automation
- Cloud-init now 100% self-contained (passes Azure creds, git URL via Terraform)
- Auto-generates SSH key on bastion
- Auto-clones pattern repository
- Auto-starts all three HTTP servers (registry, git, ignition)
- deploy-cluster.sh auto-runs mirroring if needed (eliminates manual step)

### 3. Network Security Updates
- Added AllowAzureCloudAPIs NSG rule for cluster VM provisioning
- Updated AllowBastionServices to include port 5000 (registry)
- Removed service endpoints (no longer using ACR or Azure Storage from cluster)
- Cluster: NO internet, YES Azure APIs, ALL content from bastion
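The interaction between these rules depends on evaluation order: Azure NSG rules are processed by ascending priority number and the first match decides the flow. The following is a minimal sketch of that behaviour, not the PR's Terraform. The priorities, the port lists, and the `Bastion` destination label are illustrative assumptions; only the rule names come from this change. Azure's built-in default rules are not modelled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    name: str
    priority: int      # lower number is evaluated first
    destination: str   # service tag or label; "*" matches any destination
    ports: tuple       # empty tuple matches any port
    allow: bool


# Priorities, ports, and the "Bastion" label are assumptions for illustration.
RULES = [
    Rule("AllowBastionServices", 100, "Bastion", (5000, 8080, 8081), True),
    Rule("AllowAzureCloudAPIs", 200, "AzureCloud", (443,), True),
    Rule("DenyInternetOutbound", 4000, "Internet", (), False),
]


def evaluate(rules, destination_tags, port):
    """Return the first matching rule in priority order, or None if none match."""
    for rule in sorted(rules, key=lambda r: r.priority):
        dest_ok = rule.destination == "*" or rule.destination in destination_tags
        port_ok = not rule.ports or port in rule.ports
        if dest_ok and port_ok:
            return rule
    return None


if __name__ == "__main__":
    flows = [
        ("Azure API endpoint", {"AzureCloud", "Internet"}, 443),
        ("public website", {"Internet"}, 443),
        ("bastion registry", {"Bastion"}, 5000),
    ]
    for label, tags, port in flows:
        rule = evaluate(RULES, tags, port)
        verdict = "no custom rule" if rule is None else (
            f"{'ALLOW' if rule.allow else 'DENY'} via {rule.name}")
        print(f"{label}:{port} -> {verdict}")
```

An Azure API endpoint typically falls under both the `AzureCloud` and `Internet` service tags, which is why the allow rule needs a lower priority number than the internet deny rule.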

### 4. Documentation Consolidation
- Created master ARCHITECTURE.md (single source of truth)
- Archived 7 iterative docs to docs/archive-20251113/
- Updated README.md to reference ARCHITECTURE.md
- Clear deployment flow with automation details

### 5. Terraform-First Refactoring
- Moved deprecated shell-heavy wrappers to deprecated-scripts-20251113/
- Created terraform-rhcos-image/ module for RHCOS prep
- Created terraform-upi-complete/ module for full UPI
- deploy-cluster.sh is now minimal orchestration (247 lines, down from 463)

## Files Changed

### Infrastructure (Terraform)
- terraform/main.tf: Remove ACR, add registry NSG rules, add AzureCloud API rule
- terraform/cloud-init.yaml: Add registry service, update .envrc for REGISTRY_URL
- terraform/variables.tf: Remove acr_sku, add git_remote_url/git_branch
- terraform/outputs.tf: Remove ACR outputs, add bastion_registry_url

### Scripts
- bastion/mirror.sh: Target localhost:5000 instead of ACR
- bastion/deploy-cluster.sh: Remove ACR_LOGIN_SERVER, add auto-mirroring, use REGISTRY_URL
- configure-bastion.sh: Remove ACR retrieval, add registry verification
- provision.sh: Auto-detect git remote/branch, pass to Terraform
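The git auto-detection in provision.sh can be pictured roughly as follows. This is a hedged Python sketch rather than the script itself: it assumes the remote is named `origin`, and the helper names are hypothetical. The variable names `git_remote_url` and `git_branch` are the ones added to terraform/variables.tf.

```python
import subprocess


def git_output(*args, cwd="."):
    """Run a git command and return its trimmed stdout."""
    result = subprocess.run(
        ["git", *args], cwd=cwd, check=True, capture_output=True, text=True
    )
    return result.stdout.strip()


def detect_git_source(cwd="."):
    """Read the remote URL and current branch of the checkout (assumes 'origin')."""
    return {
        "git_remote_url": git_output("remote", "get-url", "origin", cwd=cwd),
        "git_branch": git_output("rev-parse", "--abbrev-ref", "HEAD", cwd=cwd),
    }


def terraform_var_args(variables):
    """Turn a dict into repeated '-var name=value' arguments for terraform."""
    args = []
    for name, value in sorted(variables.items()):
        args += ["-var", f"{name}={value}"]
    return args


if __name__ == "__main__":
    # Example values only; detect_git_source() would supply real ones in a checkout.
    example = {"git_remote_url": "https://example.com/repo.git", "git_branch": "main"}
    print(["terraform", "apply", *terraform_var_args(example)])
```

Passing the values as an argument list avoids shell quoting problems when a branch name or URL contains unusual characters.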

### Documentation
- ARCHITECTURE.md: NEW - Comprehensive single-source architecture guide
- README.md: Link to ARCHITECTURE.md
- docs/archive-20251113/: Archived 7 iterative docs with README

### New Modules
- terraform-rhcos-image/: Terraform module for RHCOS image preparation
- terraform-upi-complete/: Complete UPI deployment with DNS, LBs, VMs
- deprecated-scripts-20251113/: Backup of old shell-heavy wrappers

## Fresh Deployment Flow (Simplified)

1. `./provision.sh eastasia` - Terraform creates infra, cloud-init configures bastion (15 min)
2. `scp ~/pull-secret.json azureuser@<bastion-ip>:~/` - Copy pull secret (instant)
3. `ssh azureuser@<bastion-ip> 'cd ~/coco-pattern && ./rhdp-isolated/bastion/deploy-cluster.sh eastasia'` - Deploy (2.5-5 hrs first time)

All configuration automated. No manual steps except pull secret copy.
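For readers who want to script the three steps, here is a small sketch that only assembles the commands listed above as argument lists. It does not run them. The function name and its defaults are hypothetical, and the bastion IP has to be supplied by the caller because it is only known once step 1 has finished.

```python
import shlex


def deployment_commands(bastion_ip, region="eastasia",
                        pull_secret="pull-secret.json", user="azureuser"):
    """Build the provision, copy, and deploy commands as argv lists."""
    remote = f"{user}@{bastion_ip}"
    deploy = (
        "cd ~/coco-pattern && "
        f"./rhdp-isolated/bastion/deploy-cluster.sh {shlex.quote(region)}"
    )
    return [
        ["./provision.sh", region],             # step 1: Terraform + cloud-init
        ["scp", pull_secret, f"{remote}:~/"],   # step 2: copy the pull secret
        ["ssh", remote, deploy],                # step 3: deploy from the bastion
    ]


if __name__ == "__main__":
    # 203.0.113.10 is a documentation address used as a placeholder.
    for argv in deployment_commands("203.0.113.10"):
        print(shlex.join(argv))
```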

## Verified Assumptions

1. ✅ Cluster cannot access internet (DenyInternetOutbound NSG)
2. ✅ Cluster CAN access Azure APIs (AllowAzureCloudAPIs NSG)
3. ✅ All images mirrored to bastion registry
4. ✅ Bastion runs oc-mirror (auto in deploy-cluster.sh)
5. ✅ Bastion hosts git (port 8080)
6. ✅ Bastion hosts ignition (port 8081)
7. ✅ Bastion hosts registry (port 5000)
8. ✅ Blob storage only used by bastion for RHCOS VHD (not by cluster)
9. ✅ NSG isolates cluster from internet, allows Azure APIs
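Items 5 to 7 can be spot-checked from any host that is allowed to reach the bastion. The sketch below is an assumption-laden illustration, not part of the PR: it only tests that each TCP port accepts a connection, and the helper names are hypothetical. The port numbers are the ones listed above.

```python
import socket

# Ports taken from the list above; the service names are just labels.
BASTION_PORTS = {"registry": 5000, "git": 8080, "ignition": 8081}


def port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_services(host, ports=None):
    """Map each service name to whether its port accepts connections."""
    ports = BASTION_PORTS if ports is None else ports
    return {name: port_open(host, port) for name, port in ports.items()}


if __name__ == "__main__":
    # 10.0.1.4 is the bastion address mentioned elsewhere in this PR.
    for name, ok in check_services("10.0.1.4").items():
        print(f"{name}: {'reachable' if ok else 'unreachable'}")
```

An open port does not prove the service is healthy, so this complements rather than replaces the registry verification in configure-bastion.sh.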

## Benefits

- 37% code reduction (663 → 417 lines)
- Zero manual bastion configuration
- One-command deployment
- Bastion registry simpler than ACR
- Terraform state management
- Built-in idempotency
- Fresh deployments work automatically

- Replace ACR_LOGIN_SERVER/ACR_NAME outputs with REGISTRY_URL
- Update infrastructure-outputs.env to use bastion registry
- Fix provision.sh completion message to show registry instead of ACR
…_LOGIN_SERVER

- Verification now checks for REGISTRY_URL (bastion registry) instead of ACR_LOGIN_SERVER
- Adds helpful error message showing expected vs found variables

- Remove `oc-mirror version` command that triggers deprecation warning
- Script already uses --v2 flag for actual mirroring
- Just verify command exists, skip version display

- Add registries.conf.d/bastion-registry.conf to mark 10.0.1.4:5000 and localhost:5000 as insecure
- Allows oc-mirror and podman to use HTTP registry without TLS
- Fixes "http: server gave HTTP response to HTTPS client" error

- Change oc-mirror flag from --dest-skip-tls to --dest-tls-verify=false (correct v2 syntax)

- Add MachineConfig manifests to configure insecure registry on all cluster nodes
- Update deploy-cluster.sh to copy MachineConfig before generating ignition
- Add imageContentSources to install-config for bastion registry
- Ensures cluster nodes can pull images from HTTP registry at 10.0.1.4:5000

- Remove azure-cli from yum packages (causes distutils ModuleNotFoundError on RHEL 10)
- Install Azure CLI via pip3 with --break-system-packages flag
- Fixes "No module named 'distutils'" error in RHCOS image preparation
- The yum azure-cli package imports distutils, which is unavailable on RHEL 10 (distutils was removed from the Python standard library in 3.12)
- pip-installed azure-cli runs on the system Python 3.12 and works correctly
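The insecure-registry drop-in mentioned earlier (registries.conf.d/bastion-registry.conf) uses the `[[registry]]` table syntax of containers-registries.conf. As a rough sketch, the content could be rendered like this. The function name is hypothetical, and the exact file the PR ships may differ. The two registry locations are the ones named in the commit.

```python
def render_insecure_registries(locations):
    """Render one [[registry]] table per location, each allowing plain HTTP."""
    blocks = []
    for location in locations:
        blocks.append(
            "[[registry]]\n"
            f'location = "{location}"\n'
            "insecure = true\n"
        )
    return "\n".join(blocks)


if __name__ == "__main__":
    # Typically placed under /etc/containers/registries.conf.d/ (assumed path).
    print(render_insecure_registries(["10.0.1.4:5000", "localhost:5000"]))
```

With `insecure = true`, tools that read this configuration, such as podman, accept an HTTP endpoint for that location instead of failing the TLS handshake.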
…CR_LOGIN_SERVER

- Change rhdp-cluster-define-disconnected.py to use REGISTRY_URL environment variable
- Remove ACR certificate retrieval (bastion registry uses HTTP, no TLS)
- Fixes KeyError: 'ACR_LOGIN_SERVER' in deploy-cluster.sh Step 2
- Set additionalTrustBundle to empty string (HTTP registry, no cert needed)
- Remove imageContentSources (conflicts with imageDigestSources from IDMS)
- Remove bootstrapExternalStaticGateway (unknown field, causes warning)
- Keep bootstrapExternalStaticIP for UPI static bootstrap IP

Fixes:
- "invalid block" error for additionalTrustBundle
- "cannot set imageContentSources and imageDigestSources at the same time" error
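The two ideas behind these fixes, failing clearly when REGISTRY_URL is missing and never emitting both image-source fields, can be sketched as below. This is an illustration under assumptions, not the code in rhdp-cluster-define-disconnected.py: the function names are hypothetical, and the install-config is treated as a plain dict.

```python
import os


def require_env(name, env=None):
    """Return a required variable or raise an error listing related ones found."""
    env = os.environ if env is None else env
    value = env.get(name)
    if not value:
        related = sorted(k for k in env if "REGISTRY" in k or "ACR" in k)
        raise RuntimeError(
            f"expected {name} to be set; related variables found: {related or 'none'}"
        )
    return value


def sanitize_install_config(config):
    """Return a copy that avoids the two install-config errors listed above."""
    cleaned = dict(config)
    # The installer rejects configs that set both fields at once.
    if "imageDigestSources" in cleaned:
        cleaned.pop("imageContentSources", None)
    # As described in the commit: HTTP registry, so no certificate bundle.
    cleaned["additionalTrustBundle"] = ""
    return cleaned


if __name__ == "__main__":
    registry = require_env("REGISTRY_URL", {"REGISTRY_URL": "10.0.1.4:5000"})
    config = {"imageContentSources": [], "imageDigestSources": [], "baseDomain": "example.com"}
    print(registry, sorted(sanitize_install_config(config)))
```

Raising a descriptive error instead of letting `os.environ[...]` throw a bare KeyError makes a stale ACR-era environment easier to diagnose.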