Skip to content

Conversation

@padili-metrostar
Copy link

Summary

This PR contains significant improvements to the quartzctl CLI for better cluster cleanup reliability and error handling.

Changes

AWS Resource Cleanup (New)

  • Implement proactive check for blocking AWS resources before Terraform destroy
  • Add EC2 instance termination functionality
  • Implement fallback AWS resource cleanup for orphaned resources
  • Clean up Load Balancers, ENIs, Security Groups blocking VPC deletion

Kubernetes Cleanup Improvements

  • Implement Kubernetes resource cleanup before EC2 termination
  • Delete ALL validating and mutating webhooks (aggressive cleanup)
  • Patch stuck namespaces with finalizer removal
  • Add VirtualService listing and print functionality

Retry Logic Enhancements

  • Enhanced logging and retry mechanisms for health checks
  • Add DaemonSet readiness checks
  • Cleanup for stuck terminating pods
  • Recognize Helm-specific errors for smarter retries

Testing

  • Add unit tests for isRetryableDestroyError() function (12 test cases)
  • Add unit tests for isHelmReleaseError() function (7 test cases)
  • All 19 tests passing

Key Functions Added

func isRetryableDestroyError(errStr string) bool { ... }
func isHelmReleaseError(errStr string) bool { ... }
func cleanupKubernetesBlockers() error { ... }
func ForceAWSCleanup() error { ... }

Testing

  • ✅ Unit tests pass: go test -v -run "TestIs" ./internal/cmd/...
  • ✅ Cleanup completed successfully on test cluster (14m 42s, 185 resources destroyed)
  • ✅ No orphaned AWS resources after cleanup

Related PRs

  • MetroStar/quartz#71 - DILT-19-Quartz-Maintenance
  • MetroStar/quartz-cicd - DILT-19-Quartz-Maintenance

… mutating webhooks, and patch stuck namespaces
Add table-driven tests for isRetryableDestroyError() and isHelmReleaseError()
functions that handle terraform destroy retry logic. Tests cover:
- All 10 retryable error patterns (DependencyViolation, NetworkInterfaceInUse, etc.)
- All 4 Helm-specific error patterns
- Negative cases for non-retryable errors
@codecov
Copy link

codecov bot commented Dec 23, 2025

Codecov Report

❌ Patch coverage is 70.84871% with 79 lines in your changes missing coverage. Please review.
✅ Project coverage is 69.39%. Comparing base (e5a799b) to head (f2a9d9f).

Files with missing lines Patch % Lines
internal/cmd/install.go 68.68% 29 Missing and 2 partials ⚠️
internal/provider/kubernetes.go 71.73% 20 Missing and 6 partials ⚠️
internal/cmd/internal.go 47.61% 8 Missing and 3 partials ⚠️
internal/stages/daemonset_check.go 87.09% 2 Missing and 2 partials ⚠️
internal/provider/aws.go 57.14% 1 Missing and 2 partials ⚠️
internal/stages/check.go 75.00% 2 Missing and 1 partial ⚠️
internal/config/schema/environment.go 0.00% 1 Missing ⚠️
Additional details and impacted files

Impacted file tree graph

@@            Coverage Diff             @@
##             main      #62      +/-   ##
==========================================
+ Coverage   67.44%   69.39%   +1.95%     
==========================================
  Files          64       64              
  Lines        4457     3944     -513     
==========================================
- Hits         3006     2737     -269     
+ Misses       1228      967     -261     
- Partials      223      240      +17     
Files with missing lines Coverage Δ
internal/cmd/util.go 82.71% <100.00%> (+6.28%) ⬆️
internal/terraform/operations.go 78.70% <100.00%> (+3.15%) ⬆️
internal/config/schema/environment.go 0.00% <0.00%> (ø)
internal/provider/aws.go 83.47% <57.14%> (+1.12%) ⬆️
internal/stages/check.go 86.72% <75.00%> (-0.38%) ⬇️
internal/stages/daemonset_check.go 87.09% <87.09%> (ø)
internal/cmd/internal.go 62.22% <47.61%> (-6.75%) ⬇️
internal/provider/kubernetes.go 70.60% <71.73%> (+7.48%) ⬆️
internal/cmd/install.go 66.28% <68.68%> (+9.85%) ⬆️

... and 54 files with indirect coverage changes

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Fixes 15 vulnerabilities identified by govulncheck:
- GO-2025-4175: crypto/x509 wildcard DNS name constraint bypass
- GO-2025-4155: crypto/x509 excessive resource consumption
- GO-2025-4013: crypto/x509 panic with DSA public keys
- GO-2025-4012: net/http cookie parsing memory exhaustion
- GO-2025-4011: encoding/asn1 DER parsing memory exhaustion
- GO-2025-4010: net/url insufficient IPv6 hostname validation
- GO-2025-4009: encoding/pem quadratic complexity
- GO-2025-4008: crypto/tls ALPN negotiation information leak
- GO-2025-4007: crypto/x509 name constraint quadratic complexity
- GO-2025-3956: os/exec unexpected LookPath paths
- GO-2025-3900, GO-2025-3787: go-viper/mapstructure log leakage
- GO-2025-3751: net/http sensitive header cross-origin leak
- GO-2025-3750: syscall O_CREATE|O_EXCL handling (Windows)
- GO-2025-3749: crypto/x509 ExtKeyUsageAny policy validation
- Add awsNoneValue constant for AWS CLI 'None' return values (goconst)
- Add #nosec G204 comments for intentional exec usage with validated config values (gosec)
- Use _ to explicitly ignore error returns for best-effort cleanup operations (errcheck)
- Remove unused parameter from cleanupKubernetesBlockers (unparam)
- Fix gofmt alignment in StageChecksConfig struct (gofmt)
- Add comprehensive tests for DaemonSetStageCheck in stages package
- Add tests for GetDaemonSetStatus in provider/kubernetes
- Add test for CleanupStuckTerminatingPods
- Improve overall test coverage for new code
- Add codecov.yml to configure coverage thresholds and ignore CLI integration code
- Exclude aws_cleanup.go and internal.go (heavy exec.CommandContext usage)
- Add test for DaemonSet not found error path
- Set patch target to 50% with 5% threshold for integration-heavy code
- Set project threshold to allow 2% decrease from base
- Add tests for ListVirtualServices and ListVirtualServicesNotFound
- Add test for ForEachDynamicResourcesNamespaced
- Add tests for appendChecks with DaemonSet and multiple check types
- Add tests for preEventChecks and postEventChecks with DaemonSet ordering
- Improve provider coverage from 75.7% to 76.2%
- Improve stages coverage from 87.4% to 88.0%
- Add tests for onCheckStart, onCheckComplete, onCheckRetry callbacks
- Add tests for GetSecret and GetSecretNotFound
- Add tests for PrintDiscoveredVirtualServices with and without CRD
- Improve cmd coverage from 52.4% to 53.3%
- Improve provider coverage from 76.3% to 78.2%
- Add 4 tests for Export function covering success, errors, unknown kind, and empty objects
- Add 2 tests for CleanupTerminatingPods covering default timeout and zero timeout
- Coverage improved: internal/cmd 54.5%, internal/provider 80.9%
internal.go now has 66-75% coverage after adding CleanupTerminatingPods tests
module github.com/MetroStar/quartzctl

go 1.24.2
go 1.24.11
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Update devbox.json golang version.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants