4 changes: 2 additions & 2 deletions .pre-commit-config.yaml
Original file line number Diff line number Diff line change
@@ -25,11 +25,11 @@ repos:
entry: just test-cmd
language: system
pass_filenames: false
files: ^cmd/.*\.go$
files: ^(cmd/.*\.go|cmd/go\.(mod|sum)|lib/go\.(mod|sum))$

- id: test-lib
name: Test lib
entry: just test-lib
language: system
pass_filenames: false
files: ^lib/.*\.go$
files: ^lib/(.*\.go|go\.(mod|sum))$
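The widened `files` patterns above can be checked against sample paths. The following is a minimal sketch, not part of the change itself: pre-commit evaluates these patterns with Python's `re`, while this sketch uses Go's `regexp` package, which behaves the same for these particular anchored patterns. The helper name `triggered` is hypothetical.

```go
package main

import (
	"fmt"
	"regexp"
)

// Patterns copied from the updated test-cmd and test-lib hook definitions.
var (
	testCmdFiles = regexp.MustCompile(`^(cmd/.*\.go|cmd/go\.(mod|sum)|lib/go\.(mod|sum))$`)
	testLibFiles = regexp.MustCompile(`^lib/(.*\.go|go\.(mod|sum))$`)
)

// triggered (hypothetical helper) reports which hooks a changed path would run.
func triggered(path string) (cmd, lib bool) {
	return testCmdFiles.MatchString(path), testLibFiles.MatchString(path)
}

func main() {
	paths := []string{
		"cmd/workon_test.go",
		"cmd/go.sum",
		"lib/go.mod",
		"lib/kube/proxy.go",
		"docs/CONFIGURATION.md",
	}
	for _, p := range paths {
		c, l := triggered(p)
		fmt.Printf("%-24s test-cmd=%-5v test-lib=%v\n", p, c, l)
	}
}
```

Under these patterns a change to `lib/go.mod` or `lib/go.sum` runs both hooks, which appears to be the intent given that `cmd` depends on `lib` through the `replace` directive in `cmd/go.mod`.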
8 changes: 7 additions & 1 deletion Justfile
@@ -7,7 +7,7 @@ check: check-python-pulumi

# Format all
[group('format')]
format: format-python-pulumi
format: format-python-pulumi format-go

alias fmt := format

@@ -186,3 +186,9 @@ validate-config-sync:
[group('format')]
format-python-pulumi:
cd {{ justfile_directory() }}/python-pulumi && just format

# Format Go code
[group('format')]
format-go:
cd {{ justfile_directory() }}/lib && go fmt ./...
cd {{ justfile_directory() }}/cmd && go fmt ./...
2 changes: 1 addition & 1 deletion cmd/go.mod
@@ -7,7 +7,7 @@ replace github.com/posit-dev/ptd/lib => ../lib
require (
github.com/charmbracelet/log v0.4.2
github.com/posit-dev/ptd/lib v0.0.0-00010101000000-000000000000
github.com/spf13/cobra v1.9.1
github.com/spf13/cobra v1.10.1
github.com/spf13/viper v1.20.1
github.com/stretchr/testify v1.11.1
golang.org/x/term v0.39.0
6 changes: 3 additions & 3 deletions cmd/go.sum
@@ -365,9 +365,9 @@ github.com/spf13/afero v1.14.0 h1:9tH6MapGnn/j0eb0yIXiLjERO8RB6xIVZRDCX7PtqWA=
github.com/spf13/afero v1.14.0/go.mod h1:acJQ8t0ohCGuMN3O+Pv0V0hgMxNYDlvdk+VTfyZmbYo=
github.com/spf13/cast v1.9.2 h1:SsGfm7M8QOFtEzumm7UZrZdLLquNdzFYfIbEXntcFbE=
github.com/spf13/cast v1.9.2/go.mod h1:jNfB8QC9IA6ZuY2ZjDp0KtFO2LZZlg4S/7bzP6qqeHo=
github.com/spf13/cobra v1.9.1 h1:CXSaggrXdbHK9CF+8ywj8Amf7PBRmPCOJugH954Nnlo=
github.com/spf13/cobra v1.9.1/go.mod h1:nDyEzZ8ogv936Cinf6g1RU9MRY64Ir93oCnqb9wxYW0=
github.com/spf13/pflag v1.0.6/go.mod h1:McXfInJRrz4CZXVZOBLb0bTZqETkiAhM9Iw0y3An2Bg=
github.com/spf13/cobra v1.10.1 h1:lJeBwCfmrnXthfAupyUTzJ/J4Nc1RsHC/mSRU2dll/s=
github.com/spf13/cobra v1.10.1/go.mod h1:7SmJGaTHFVBY0jW4NXGluQoLvhqFQM+6XSKD+P4XaB0=
github.com/spf13/pflag v1.0.9/go.mod h1:McXfInJRrz4CZXVZOBLb0bTZqETkiAhM9Iw0y3An2Bg=
github.com/spf13/pflag v1.0.10 h1:4EBh2KAYBwaONj6b2Ye1GiHfwjqyROoF4RwYO+vPwFk=
github.com/spf13/pflag v1.0.10/go.mod h1:McXfInJRrz4CZXVZOBLb0bTZqETkiAhM9Iw0y3An2Bg=
github.com/spf13/viper v1.20.1 h1:ZMi+z/lvLyPSCoNtFCpqjy0S4kPbirhpTMwl8BkW9X4=
2 changes: 1 addition & 1 deletion cmd/workon_test.go
@@ -115,4 +115,4 @@ func TestFindCustomStep(t *testing.T) {

func boolPtr(b bool) *bool {
return &b
}
}
33 changes: 33 additions & 0 deletions docs/CONFIGURATION.md
@@ -113,6 +113,7 @@ spec:
mp_instance_type: r6a.2xlarge
root_disk_size: 200
routing_weight: "100" # For blue/green: 0-255
force_maintenance: false # Set to true to bypass upgrade-blocking checks
components:
traefik_forward_auth_version: "0.0.14"

@@ -298,6 +299,38 @@ With 3 AZs and 2 nodes, there's no guarantee that nodes will cover all AZs where

**Note:** This setting only affects new VPCs. Changing this value on an existing workload will cause Pulumi to attempt to delete subnets in the removed AZ.

## Cluster Maintenance Options

### force_maintenance

The `force_maintenance` option enables cluster version upgrades to proceed even when they would normally be blocked by safety checks.

```yaml
clusters:
  "20250115":
    spec:
      cluster_version: "1.33"
      force_maintenance: true # Bypass upgrade-blocking checks
```

| Cloud Provider | Behavior |
|----------------|----------|
| AWS EKS | Sets `ForceUpdateVersion` on the cluster, which overrides upgrade-blocking readiness checks including EKS Insights validations (deprecated APIs, compatibility issues, cluster health checks) |
| Azure AKS | Sets `UpgradeSettings.OverrideSettings.ForceUpgrade` with an `Until` expiration 24 hours after the deployment runs, which bypasses PodDisruptionBudget (PDB) constraints and takes precedence over all other drain configurations |

**When to use:**
- During planned maintenance windows when you accept workload disruption
- When PDBs are blocking necessary security or version upgrades (Azure)
- When EKS upgrade insights are blocking an upgrade you've assessed as safe (AWS)
- When you need to force through an upgrade that has stalled

**Caution:**
- **Azure**: Bypasses PodDisruptionBudget protections, which may cause service disruption. Pods protected by PDBs may be evicted without respecting minimum availability guarantees.
- **AWS**: Bypasses pre-upgrade validation checks. Review EKS Insights warnings before forcing an upgrade to understand what issues are being overridden.
- Only enable temporarily during maintenance windows, then set it back to `false`.

**Default:** `false` (safety checks are respected during upgrades)

## See Also

- [Getting Started](GETTING_STARTED.md)
32 changes: 32 additions & 0 deletions docs/cli/PTD_CLI_REFERENCE.md
@@ -862,6 +862,38 @@ targets:
# Additional target-specific configuration
```

### Cluster Configuration Options

#### force_maintenance

The `force_maintenance` option enables cluster version upgrades to proceed even when they would normally be blocked by safety checks.

```yaml
clusters:
  "20250115":
    spec:
      cluster_version: "1.33"
      force_maintenance: true # Bypass upgrade-blocking checks
```

| Cloud Provider | Behavior |
|----------------|----------|
| AWS EKS | Sets `ForceUpdateVersion` on the cluster, which overrides upgrade-blocking readiness checks including EKS Insights validations (deprecated APIs, compatibility issues, cluster health checks) |
| Azure AKS | Sets `UpgradeSettings.OverrideSettings.ForceUpgrade` with an `Until` expiration 24 hours after the deployment runs, which bypasses PodDisruptionBudget (PDB) constraints and takes precedence over all other drain configurations |

**When to use:**
- During planned maintenance windows when you accept workload disruption
- When PDBs are blocking necessary security or version upgrades (Azure)
- When EKS upgrade insights are blocking an upgrade you've assessed as safe (AWS)
- When you need to force through an upgrade that has stalled

**Caution:**
- **Azure**: Bypasses PodDisruptionBudget protections, which may cause service disruption. Pods protected by PDBs may be evicted without respecting minimum availability guarantees.
- **AWS**: Bypasses pre-upgrade validation checks. Review EKS Insights warnings before forcing an upgrade to understand what issues are being overridden.
- Only enable temporarily during maintenance windows, then set it back to `false`.

**Default:** `false` (safety checks are respected during upgrades)

---

## Related Documentation
2 changes: 1 addition & 1 deletion lib/azure/aks.go
@@ -25,4 +25,4 @@ func GetKubeCredentials(ctx context.Context, creds *Credentials, subscriptionID,
}

return result.Kubeconfigs[0].Value, nil
}
}
2 changes: 1 addition & 1 deletion lib/kube/config.go
@@ -138,4 +138,4 @@ func AddProxyToKubeConfig(filePath string, proxyURL string) error {
}

return nil
}
}
2 changes: 1 addition & 1 deletion lib/kube/config_test.go
@@ -104,4 +104,4 @@ func TestAddProxyToKubeConfig(t *testing.T) {
cluster := clusters[0].(map[interface{}]interface{})
clusterInfo := cluster["cluster"].(map[interface{}]interface{})
assert.Equal(t, "socks5://localhost:1080", clusterInfo["proxy-url"])
}
}
2 changes: 1 addition & 1 deletion lib/kube/proxy.go
@@ -98,4 +98,4 @@ func GetCliPath(provider types.CloudProvider) string {
// Return empty string for unsupported providers
return ""
}
}
}
2 changes: 1 addition & 1 deletion lib/kube/setup.go
@@ -165,4 +165,4 @@ func setupAzureKubeConfig(ctx context.Context, t types.Target, creds types.Crede
}

return kubeconfigPath, nil
}
}
4 changes: 2 additions & 2 deletions lib/proxy/preflight_test.go
@@ -75,7 +75,7 @@ func TestPreflightWithDualPids(t *testing.T) {

// Check results
assert.Equal(t, tc.expectActive, active)

if tc.expectError {
require.Error(t, err)
if tc.errorSubstring != "" {
@@ -91,4 +91,4 @@ func TestPreflightWithDualPids(t *testing.T) {
}
})
}
}
}
28 changes: 24 additions & 4 deletions lib/steps/aks.go
@@ -4,6 +4,7 @@ import (
"context"
"errors"
"fmt"
"time"

"github.com/pulumi/pulumi-azure-native-sdk/containerservice/v3"
"github.com/pulumi/pulumi/sdk/v3/go/auto"
@@ -184,6 +185,25 @@ func (s *AKSStep) deploy(ctx *pulumi.Context, target types.Target) error {
		ignoreChanges = append(ignoreChanges, "agentPoolProfiles")
	}

	// Build cluster resource options
	clusterOpts := []pulumi.ResourceOption{
		pulumi.Protect(config.ProtectPersistentResources),
		pulumi.IgnoreChanges(ignoreChanges),
	}

	// Configure upgrade settings - use ForceUpgrade to bypass PDBs during maintenance
	var upgradeSettings *containerservice.ClusterUpgradeSettingsArgs
	if clusterConfig.ForceMaintenance {
		// ForceUpgrade bypasses PDB constraints during cluster upgrades
		// The 'Until' field sets when the override expires (required for it to take effect)
		upgradeSettings = &containerservice.ClusterUpgradeSettingsArgs{
			OverrideSettings: &containerservice.UpgradeOverrideSettingsArgs{
				ForceUpgrade: pulumi.Bool(true),
				Until:        pulumi.String(time.Now().Add(24 * time.Hour).Format(time.RFC3339)),
			},
		}
	}

aksCluster, err := containerservice.NewManagedCluster(ctx, fmt.Sprintf("aksCluster-%s", release), &containerservice.ManagedClusterArgs{
AadProfile: &containerservice.ManagedClusterAADProfileArgs{
EnableAzureRBAC: pulumi.Bool(true),
@@ -264,10 +284,10 @@ func (s *AKSStep) deploy(ctx *pulumi.Context, target types.Target) error {
Enabled: pulumi.Bool(true),
},
},
SupportPlan: pulumi.String("KubernetesOfficial"),
Tags: buildResourceTags(config.ResourceTags),
}, pulumi.Protect(config.ProtectPersistentResources),
pulumi.IgnoreChanges(ignoreChanges))
SupportPlan: pulumi.String("KubernetesOfficial"),
UpgradeSettings: upgradeSettings,
Tags: buildResourceTags(config.ResourceTags),
}, clusterOpts...)
if err != nil {
return err
}
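
The change to `deploy` follows a common Go pattern: an optional nested setting is held in a pointer that stays `nil` unless a flag is set, then passed along unconditionally. The sketch below illustrates that pattern only. The struct types and `buildClusterArgs` are hypothetical stand-ins, not the Pulumi SDK types used in the real code.

```go
package main

import "fmt"

// Hypothetical stand-ins for the SDK argument structs.
type overrideSettings struct {
	ForceUpgrade bool
	Until        string
}

type upgradeSettings struct {
	Override *overrideSettings
}

type clusterArgs struct {
	SupportPlan     string
	UpgradeSettings *upgradeSettings // nil means "not configured"
}

// buildClusterArgs (hypothetical) populates UpgradeSettings only when
// forceMaintenance is true; otherwise the field is left nil.
func buildClusterArgs(forceMaintenance bool, until string) clusterArgs {
	var us *upgradeSettings
	if forceMaintenance {
		us = &upgradeSettings{
			Override: &overrideSettings{ForceUpgrade: true, Until: until},
		}
	}
	return clusterArgs{SupportPlan: "KubernetesOfficial", UpgradeSettings: us}
}

func main() {
	off := buildClusterArgs(false, "")
	on := buildClusterArgs(true, "2025-01-16T09:30:00Z")
	fmt.Println("disabled, settings present:", off.UpgradeSettings != nil)
	fmt.Println("enabled, force upgrade:", on.UpgradeSettings.Override.ForceUpgrade)
}
```

Keeping the default at `nil` means clusters that never set `force_maintenance` send no upgrade override at all, rather than an explicit `ForceUpgrade: false`.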
58 changes: 29 additions & 29 deletions lib/types/controlroom.go
@@ -2,8 +2,8 @@ package types

// EKSAccessEntriesConfig holds configuration for EKS Access Entries
type EKSAccessEntriesConfig struct {
Enabled bool `json:"enabled" yaml:"enabled"`
AdditionalEntries []map[string]interface{} `json:"additional_entries" yaml:"additional_entries"`
Enabled bool `json:"enabled" yaml:"enabled"`
AdditionalEntries []map[string]interface{} `json:"additional_entries" yaml:"additional_entries"`
IncludeSameAccountPoweruser bool `json:"include_same_account_poweruser" yaml:"include_same_account_poweruser"`
}

@@ -29,31 +29,31 @@ type AWSControlRoomConfig struct {
TrueName string `json:"true_name" yaml:"true_name"`
EksAccessEntries *EKSAccessEntriesConfig `json:"eks_access_entries" yaml:"eks_access_entries"`
DBAllocatedStorage int `json:"db_allocated_storage" yaml:"db_allocated_storage"`
DBEngineVersion string `json:"db_engine_version" yaml:"db_engine_version"`
DBInstanceClass string `json:"db_instance_class" yaml:"db_instance_class"`
EksK8sVersion *string `json:"eks_k8s_version" yaml:"eks_k8s_version"`
EksNodeGroupMax int `json:"eks_node_group_max" yaml:"eks_node_group_max"`
EksNodeGroupMin int `json:"eks_node_group_min" yaml:"eks_node_group_min"`
EksNodeInstanceType string `json:"eks_node_instance_type" yaml:"eks_node_instance_type"`
HostedZoneID *string `json:"hosted_zone_id" yaml:"hosted_zone_id"`
ManageEcrRepositories bool `json:"manage_ecr_repositories" yaml:"manage_ecr_repositories"`
ProtectPersistentResources bool `json:"protect_persistent_resources" yaml:"protect_persistent_resources"`
Region string `json:"region" yaml:"region"`
ResourceTags map[string]string `json:"resource_tags" yaml:"resource_tags"`
TraefikDeploymentReplicas int `json:"traefik_deployment_replicas" yaml:"traefik_deployment_replicas"`
TrustedUsers []TrustedUser `json:"trusted_users" yaml:"trusted_users"`
FrontDoor *string `json:"front_door" yaml:"front_door"`
AwsFsxOpenzfsCsiVersion string `json:"aws_fsx_openzfs_csi_version" yaml:"aws_fsx_openzfs_csi_version"`
AwsLbcVersion string `json:"aws_lbc_version" yaml:"aws_lbc_version"`
ExternalDnsVersion string `json:"external_dns_version" yaml:"external_dns_version"`
GrafanaVersion string `json:"grafana_version" yaml:"grafana_version"`
KubeStateMetricsVersion string `json:"kube_state_metrics_version" yaml:"kube_state_metrics_version"`
MetricsServerVersion string `json:"metrics_server_version" yaml:"metrics_server_version"`
MimirVersion string `json:"mimir_version" yaml:"mimir_version"`
SecretStoreCsiAwsProviderVersion string `json:"secret_store_csi_aws_provider_version" yaml:"secret_store_csi_aws_provider_version"`
SecretStoreCsiVersion string `json:"secret_store_csi_version" yaml:"secret_store_csi_version"`
TailscaleEnabled bool `json:"tailscale_enabled" yaml:"tailscale_enabled"`
TigeraOperatorVersion string `json:"tigera_operator_version" yaml:"tigera_operator_version"`
TraefikForwardAuthVersion string `json:"traefik_forward_auth_version" yaml:"traefik_forward_auth_version"`
TraefikVersion string `json:"traefik_version" yaml:"traefik_version"`
DBEngineVersion string `json:"db_engine_version" yaml:"db_engine_version"`
DBInstanceClass string `json:"db_instance_class" yaml:"db_instance_class"`
EksK8sVersion *string `json:"eks_k8s_version" yaml:"eks_k8s_version"`
EksNodeGroupMax int `json:"eks_node_group_max" yaml:"eks_node_group_max"`
EksNodeGroupMin int `json:"eks_node_group_min" yaml:"eks_node_group_min"`
EksNodeInstanceType string `json:"eks_node_instance_type" yaml:"eks_node_instance_type"`
HostedZoneID *string `json:"hosted_zone_id" yaml:"hosted_zone_id"`
ManageEcrRepositories bool `json:"manage_ecr_repositories" yaml:"manage_ecr_repositories"`
ProtectPersistentResources bool `json:"protect_persistent_resources" yaml:"protect_persistent_resources"`
Region string `json:"region" yaml:"region"`
ResourceTags map[string]string `json:"resource_tags" yaml:"resource_tags"`
TraefikDeploymentReplicas int `json:"traefik_deployment_replicas" yaml:"traefik_deployment_replicas"`
TrustedUsers []TrustedUser `json:"trusted_users" yaml:"trusted_users"`
FrontDoor *string `json:"front_door" yaml:"front_door"`
AwsFsxOpenzfsCsiVersion string `json:"aws_fsx_openzfs_csi_version" yaml:"aws_fsx_openzfs_csi_version"`
AwsLbcVersion string `json:"aws_lbc_version" yaml:"aws_lbc_version"`
ExternalDnsVersion string `json:"external_dns_version" yaml:"external_dns_version"`
GrafanaVersion string `json:"grafana_version" yaml:"grafana_version"`
KubeStateMetricsVersion string `json:"kube_state_metrics_version" yaml:"kube_state_metrics_version"`
MetricsServerVersion string `json:"metrics_server_version" yaml:"metrics_server_version"`
MimirVersion string `json:"mimir_version" yaml:"mimir_version"`
SecretStoreCsiAwsProviderVersion string `json:"secret_store_csi_aws_provider_version" yaml:"secret_store_csi_aws_provider_version"`
SecretStoreCsiVersion string `json:"secret_store_csi_version" yaml:"secret_store_csi_version"`
TailscaleEnabled bool `json:"tailscale_enabled" yaml:"tailscale_enabled"`
TigeraOperatorVersion string `json:"tigera_operator_version" yaml:"tigera_operator_version"`
TraefikForwardAuthVersion string `json:"traefik_forward_auth_version" yaml:"traefik_forward_auth_version"`
TraefikVersion string `json:"traefik_version" yaml:"traefik_version"`
}
4 changes: 4 additions & 0 deletions lib/types/workload.go
@@ -138,6 +138,10 @@ type AzureWorkloadClusterConfig struct {

// Optional: Root disk size for system node pool in GB (defaults to 128)
SystemNodePoolRootDiskSize *int `yaml:"system_node_pool_root_disk_size,omitempty"`

// Optional: When true, enables AKS ForceUpgrade which bypasses PDB constraints during cluster upgrades.
// Use during maintenance windows when you accept disruption to workloads protected by PDBs.
ForceMaintenance bool `yaml:"force_maintenance,omitempty"`
}

type AzureWorkloadClusterComponentConfig struct {