Add a new ClusterClass topology variable `karpenter-node-role-additional-inline-policies` (sister to the existing `karpenter-node-role-additional-policy-arns`) so cluster operators can declare IAM inline policies on the karpenter node Role end-to-end through the ACK Role CR's `spec.inlinePolicies` field. ACK + kro v0.8.4 then sees the policies as declarative state instead of treating externally-attached inline policies as drift to delete.

Concretely, this eliminates the ~6.5h race observed on pmc-local where the IRSA-give-up bundle (cert-manager-route53-dns01-txt-only + external-dns-route53-record-management) attached by demo-ops/scripts/05-pmc-node-role-policies.sh was repeatedly removed by ACK reconcile, causing cert-manager DNS01 challenges to permanently stall.

Changes:
- constants.go: add `topologyNodeRoleAdditionalInlinePoliciesVariableName` next to its sister `*PolicyARNs*` const so future readers see the symmetric pair.
- karpenter_bootstrapper_controller.go: read the topology variable as a map[string]string (new `readTopologyStringMap` helper, modelled on `readTopologyStringSlice`) and forward it to a widened `ensureIAMRoleForEC2(..., inlinePolicies map[string]string)`. The writer materializes `spec.inlinePolicies` only when the map is non-empty so existing clusters that do not opt in remain byte-identical at the ACK CR level.
- karpenter_bootstrapper_controller_test.go: cover the three documented cases (no inline = absent on spec, inline declared = both entries preserved verbatim, managed and inline coexist independently) plus a round-trip test on the new map reader.
- hack/capi/metadata.yaml: prepend a v0.4 series entry for the upcoming release cut (clusterctl requires descending order by major,minor).

Production cluster specs that omit the new variable see no behavior change (the writer skips spec.inlinePolicies entirely when the map is empty), so this is a backward-compatible feature suitable for a v0.4.0 minor release.

See design/apth-1704-... and impl/apth-1704-... in github.com/reoring/demo0521 for the full design + implementation hand-off.

Linear: APTH-1704
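The reader and writer behavior described in this commit message can be illustrated with a minimal, standard-library-only sketch. It is not the code in `karpenter_bootstrapper_controller.go`. The `topologyVariable` type, `readStringMap`, and `buildRoleSpec` are hypothetical stand-ins for the real Cluster API types, `readTopologyStringMap`, and the writer inside `ensureIAMRoleForEC2`. The `name` and `policies` keys and the policy document are placeholders chosen for illustration.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// topologyVariable is a hypothetical stand-in for a ClusterClass topology
// variable: a name plus its raw JSON value.
type topologyVariable struct {
	Name string
	Raw  []byte
}

// readStringMap decodes the named variable as map[string]string.
// A variable that is not set yields (nil, nil), so callers can treat
// "absent" and "empty" the same way.
func readStringMap(vars []topologyVariable, name string) (map[string]string, error) {
	for _, v := range vars {
		if v.Name != name {
			continue
		}
		var out map[string]string
		if err := json.Unmarshal(v.Raw, &out); err != nil {
			return nil, fmt.Errorf("variable %q is not a string map: %w", name, err)
		}
		return out, nil
	}
	return nil, nil
}

// buildRoleSpec assembles an illustrative Role spec as a generic map.
// The inlinePolicies key is written only when the map is non-empty, so
// a cluster that does not opt in serializes exactly as before.
func buildRoleSpec(roleName string, policyARNs []string, inlinePolicies map[string]string) map[string]any {
	spec := map[string]any{"name": roleName}
	if len(policyARNs) > 0 {
		spec["policies"] = policyARNs
	}
	if len(inlinePolicies) > 0 {
		spec["inlinePolicies"] = inlinePolicies
	}
	return spec
}

func main() {
	const varName = "karpenter-node-role-additional-inline-policies"

	// Placeholder policy document; a real one would carry statements.
	raw := []byte(`{"cert-manager-route53-dns01-txt-only":"{\"Version\":\"2012-10-17\",\"Statement\":[]}"}`)
	vars := []topologyVariable{{Name: varName, Raw: raw}}

	inline, err := readStringMap(vars, varName)
	if err != nil {
		panic(err)
	}

	// Opted in: the spec carries the inline policies verbatim.
	withInline, _ := json.Marshal(buildRoleSpec("pmc-local-node", nil, inline))
	// Not opted in: no inlinePolicies key is written at all.
	without, _ := json.Marshal(buildRoleSpec("pmc-local-node", nil, nil))

	fmt.Println(string(withInline))
	fmt.Println(string(without))
}
```

Checking `len(inlinePolicies) > 0` rather than comparing against `nil` treats an explicitly empty map the same as an unset variable, which is what keeps non-opted-in clusters unchanged in this sketch.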
Summary
- Add a new ClusterClass topology variable `karpenter-node-role-additional-inline-policies` (sister to `karpenter-node-role-additional-policy-arns`) so cluster operators can declare IAM inline policies on the karpenter node Role end-to-end through the ACK Role CR's `spec.inlinePolicies`
- Eliminate the ~6.5h race observed on `pmc-local`, where ACK + kro v0.8.4 reconcile interpreted externally-attached inline policies as drift and issued `DeleteRolePolicy`, breaking cert-manager DNS01 challenges
- Backward compatible (the writer skips `spec.inlinePolicies` entirely when the map is empty): production clusters that omit the variable are byte-identical at the ACK CR level

Background
The 5/6 evening cert chain dive on `pmc-local` revealed that `demo-ops/scripts/05-pmc-node-role-policies.sh` attaches the IRSA-give-up bundle (`cert-manager-route53-dns01-txt-only` + `external-dns-route53-record-management`) directly via `aws iam put-role-policy`, but ACK reconcile silently removes the policies with `DeleteRolePolicy` because the ACK Role CR's `spec.inlinePolicies` is empty. CloudTrail `pmc-local-node` events confirm the cycle:

userAgent = `aws-controllers-k8s/iam.services.k8s.aws-1.6.2 ManagedBy/kro KROVersion/v0.8.4`

The root fix is to make the inline policies declarative on the ACK Role CR through a new ClusterClass topology variable. See `design/apth-1704-...` in demo0521 for the full design.

Changes
- `constants.go`: add `topologyNodeRoleAdditionalInlinePoliciesVariableName` next to its sister `*PolicyARNs*` const so future readers see the symmetric pair
- `karpenter_bootstrapper_controller.go`: read the topology variable as `map[string]string` (new `readTopologyStringMap` helper, modelled on `readTopologyStringSlice`) and forward it to a widened `ensureIAMRoleForEC2(..., inlinePolicies map[string]string)`. The writer materializes `spec.inlinePolicies` only when the map is non-empty.
- `karpenter_bootstrapper_controller_test.go`: cover the documented cases: (1) no inline = absent on spec, (2) inline declared = both entries preserved verbatim, (3) managed and inline coexist independently; plus a round-trip test on the new map reader
- `hack/capi/metadata.yaml`: prepend a v0.4 series entry for the upcoming release cut (clusterctl requires descending order)

Test plan
- `go build ./...` (passing)
- `go vet ./...` (passing)
- `go test ./...` all package tests passing, including envtest `internal/controller/infrastructure`
- `TestEKSKarpenterBootstrapperReconciler_EnsureIAMRoleForEC2_InlinePoliciesDeclaredOnSpec` + `TestReadTopologyStringMap_RoundTripsClusterClassVariable`
- `v0.4.0` release tag after merge → release pipeline auto-publishes the 5 clusterctl-provider artifacts
- `appthrust/platform` PR atomically lands: `kany8s-eks-byo-clusterclass/templates/clusterclass.yaml` (new variable schema), `packages-0/.../capi-kany8s-installation.yaml` (v0.4.0 pin x4), `hack/local/clusters/boot-single.sh` (vars.yaml render with inline policies sourced from `lib/policies.sh`)
- `reoring/demo0521` PR for `demo-ops/scripts/02-route53-zones.sh` (zone ID file cache) + `05-pmc-node-role-policies.sh` deprecation header
- `DeleteRolePolicy` events on `pmc-local-node`

Related