Rebase to Kubernetes 1.35.1 #3842

Merged
kcp-ci-bot merged 13 commits into kcp-dev:main from xmudrii:1.35.1-prep
Mar 12, 2026

Conversation

@xmudrii
Member

@xmudrii xmudrii commented Feb 18, 2026

Summary

This PR rebases kcp to Kubernetes 1.35.1. The kcp-dev/kubernetes fork has already been updated to 1.35.1. Go has been updated to 1.25 with this rebase.

The rebase applied mostly cleanly. Some tests that were already very close to their timeout started hitting it with this rebase and the Go update, so I had to increase the timeouts for those tests. TestAPIExportAPIBindingsAccess had to be disabled because it became very flaky with this PR, although it was already flaky before (#3844). This test will be handled in a follow-up.

#3897 should be merged either before or after this PR; it includes some additional fixes.

What Type of PR Is This?

/kind feature

Related Issue(s)

xref #3813

Release Notes

- Update kcp to Kubernetes 1.35.1
- Update Go to 1.25.7

@kcp-ci-bot kcp-ci-bot added the do-not-merge/work-in-progress, kind/feature, dco-signoff: yes, do-not-merge/release-note-label-needed, kind/api-change, and size/XXL labels Feb 18, 2026
@xmudrii
Member Author

xmudrii commented Feb 19, 2026

/test pull-kcp-verify

@kcp-dev kcp-dev deleted a comment from kcp-ci-bot Feb 19, 2026
@xmudrii
Member Author

xmudrii commented Feb 19, 2026

/retest

@xmudrii xmudrii force-pushed the 1.35.1-prep branch 2 times, most recently from a7e6911 to bdee9e5 Compare February 23, 2026 13:58
@kcp-ci-bot kcp-ci-bot added the needs-rebase label Feb 27, 2026
@kcp-ci-bot kcp-ci-bot removed the needs-rebase label Mar 4, 2026
@xmudrii xmudrii force-pushed the 1.35.1-prep branch 2 times, most recently from 032aad7 to e40997d Compare March 4, 2026 15:13
@xmudrii
Member Author

xmudrii commented Mar 4, 2026

/retest

@xmudrii
Member Author

xmudrii commented Mar 4, 2026

/test pull-kcp-test-e2e-sharded

@xmudrii
Member Author

xmudrii commented Mar 4, 2026

/test pull-kcp-test-e2e-sharded

@xmudrii
Member Author

xmudrii commented Mar 4, 2026

/test pull-kcp-test-e2e-multiple-runs

@xmudrii
Member Author

xmudrii commented Mar 4, 2026

/test pull-kcp-test-e2e-sharded

@xmudrii
Member Author

xmudrii commented Mar 4, 2026

/test pull-kcp-test-e2e-sharded

@xmudrii xmudrii force-pushed the 1.35.1-prep branch 3 times, most recently from 427cde2 to 4c4f54b Compare March 6, 2026 13:52
@xmudrii
Member Author

xmudrii commented Mar 6, 2026

/test pull-kcp-test-e2e

@xmudrii
Member Author

xmudrii commented Mar 9, 2026

/retest

@xmudrii
Member Author

xmudrii commented Mar 9, 2026

/test pull-kcp-test-integration

@xmudrii
Member Author

xmudrii commented Mar 9, 2026

/test pull-kcp-test-integration

xmudrii added 3 commits March 11, 2026 08:45
Signed-off-by: Marko Mudrinić <mudrinic.mare@gmail.com>
Signed-off-by: Marko Mudrinić <mudrinic.mare@gmail.com>
Signed-off-by: Marko Mudrinić <mudrinic.mare@gmail.com>
@xmudrii xmudrii marked this pull request as draft March 11, 2026 08:02
@kcp-ci-bot kcp-ci-bot removed the needs-rebase label Mar 11, 2026
xmudrii added 4 commits March 12, 2026 12:08
Signed-off-by: Marko Mudrinić <mudrinic.mare@gmail.com>
Signed-off-by: Marko Mudrinić <mudrinic.mare@gmail.com>
Signed-off-by: Marko Mudrinić <mudrinic.mare@gmail.com>
Signed-off-by: Marko Mudrinić <mudrinic.mare@gmail.com>
@xmudrii xmudrii changed the title from "[WIP] Rebase to Kubernetes 1.35.1" to "Rebase to Kubernetes 1.35.1" Mar 12, 2026
@xmudrii xmudrii marked this pull request as ready for review March 12, 2026 11:28
@kcp-ci-bot kcp-ci-bot added the release-note label and removed the do-not-merge/release-note-label-needed and do-not-merge/work-in-progress labels Mar 12, 2026
@mjudeikis-bot mjudeikis-bot left a comment

Review: Kubernetes 1.35.1 Rebase

Overall the rebase looks clean. Most changes are mechanical API adaptations. A few things worth calling out:


🔴 controller.go — Silent error swallowing in processLoop

The old code:

utilruntime.HandleErrorWithContext(ctx, err, "Failed to process object from queue")

is removed. Non-ErrFIFOClosed errors are now silently swallowed. If the queue returns a processing error (e.g. a user-provided ProcessFunc errors out), it will be completely invisible.

The ShouldResync branch had no actual behavior (just a comment), so losing that is fine. But losing the error log is a regression — at minimum errors should still be logged.

Also: the obj == nil exit condition from the old Pop return is gone. Need to confirm upstream guarantees the new c.config.Pop path never needs the nil check (likely fine if API changed, but worth a comment).
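
The error-handling concern in this section can be illustrated with a minimal, self-contained sketch. It does not use client-go: errQueueClosed is a hypothetical sentinel standing in for cache.ErrFIFOClosed, and drain and report are illustrative names, not the kcp or upstream implementation. It shows a loop that exits quietly on the "closed" sentinel but still surfaces every other error.

```go
package main

import (
	"errors"
	"fmt"
)

// errQueueClosed is a hypothetical sentinel standing in for cache.ErrFIFOClosed.
var errQueueClosed = errors.New("queue closed")

// drain calls pop until the queue reports that it is closed. Any other error
// is handed to report instead of being dropped, and the loop keeps going.
func drain(pop func() error, report func(error)) {
	for {
		err := pop()
		if err == nil {
			continue
		}
		// errors.Is also matches a wrapped sentinel, unlike a plain == check.
		if errors.Is(err, errQueueClosed) {
			return
		}
		report(err)
	}
}

// demo feeds drain a success, a processing error, and a wrapped close error,
// and returns the messages that were reported.
func demo() []string {
	results := []error{
		nil,
		errors.New("process func failed"),
		fmt.Errorf("pop: %w", errQueueClosed),
	}
	i := 0
	pop := func() error {
		err := results[i]
		i++
		return err
	}

	var reported []string
	drain(pop, func(err error) { reported = append(reported, err.Error()) })
	return reported
}

func main() {
	fmt.Println(demo())
}
```

Only the processing error reaches report; the wrapped close error ends the loop without being reported.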


🟡 nodedeclaredfeatures admission plugin — new default-on

nodedeclaredfeatures.PluginName added to defaultOnKubePluginsInKube. This validates node-declared feature gates. In kcp, nodes are virtual — need to confirm this plugin is a no-op when no Node objects exist, otherwise it could fail admission for node-touching requests. If kcp does not expose Node resources in virtual workspaces, this is probably fine.


🟡 processDeltasInBatch — partial state on TransactionError

err := txnStore.Transaction(txns...)
if err != nil {
    for _, i := range err.SuccessfulIndices {
        callbacks[i]()  // only fires for successful txns
    }
    return fmt.Errorf(...)
}

If a batch partially fails, the store has partial state but only the successful callbacks fire. Event handlers will see some adds/updates but not others. Depending on whether TransactionStore is truly atomic or partially applied, this could lead to inconsistency between store state and what handlers observed. Worth a comment or test.
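
To make the partial-failure scenario concrete, here is a small sketch under stated assumptions. The types txn, txnError, and memStore are hypothetical and are not the client-go cache.Transaction or TransactionStore API; the store here is deliberately not all-or-nothing. In this sketch the handler view and the store stay consistent only because SuccessfulIndices accurately reflects what was written, which is the contract the comment above asks to have confirmed.

```go
package main

import "fmt"

// txn, txnError and memStore are hypothetical types for illustration only.
type txn struct {
	key, value string
}

// txnError reports which transactions in a batch were applied.
type txnError struct {
	SuccessfulIndices []int
}

func (e *txnError) Error() string {
	return fmt.Sprintf("batch partially applied: %d succeeded", len(e.SuccessfulIndices))
}

type memStore struct {
	items map[string]string
}

// Transaction applies each txn in order and skips those with an empty key,
// so a failing batch leaves the store partially updated.
func (s *memStore) Transaction(txns ...txn) *txnError {
	var ok []int
	for i, t := range txns {
		if t.key == "" {
			continue
		}
		s.items[t.key] = t.value
		ok = append(ok, i)
	}
	if len(ok) != len(txns) {
		return &txnError{SuccessfulIndices: ok}
	}
	return nil
}

// applyBatch runs the batch and notifies handlers only for applied txns.
// It returns the keys the handlers were told about.
func applyBatch(s *memStore, txns []txn) (notified []string, err error) {
	callbacks := make([]func(), len(txns))
	for i, t := range txns {
		key := t.key // copy so each callback captures its own key
		callbacks[i] = func() { notified = append(notified, key) }
	}
	if txErr := s.Transaction(txns...); txErr != nil {
		for _, i := range txErr.SuccessfulIndices {
			callbacks[i]()
		}
		return notified, txErr
	}
	for _, cb := range callbacks {
		cb()
	}
	return notified, nil
}

// demo applies a batch whose middle entry fails and returns the number of
// stored items together with the keys the handlers saw.
func demo() (int, []string) {
	s := &memStore{items: map[string]string{}}
	notified, _ := applyBatch(s, []txn{{"a", "1"}, {"", "2"}, {"c", "3"}})
	return len(s.items), notified
}

func main() {
	n, notified := demo()
	fmt.Println(n, notified)
}
```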


✅ Things that look good

  • etcd separate context (etcdCtx in both cmd/kcp/kcp.go and cmd/cache-server/main.go): Correct fix — etcd now outlives the apiserver during graceful shutdown, preventing CRD watch cache lazy init from blocking on a dead etcd connection.
  • NewRealFIFOWithOptions: Mechanical API change, correct.
  • WatchList semantics check: Good defensive check — disables WatchList if the ListerWatcher does not support the semantics rather than silently misbehaving.
  • responsewriters.Forbidden ctx removal: Mechanical upstream API change, applied consistently in all call sites.
  • MustBaseEnvSet boolean param removal: Upstream API change, correct.
  • min-compatibility-version flag allowlisted: New upstream flag, benign.
  • NewServiceAccountsController logger param: Upstream API change, correct.

Questions

  1. The processLoop now uses a non-blocking select { case <-ctx.Done(): return; default: ... } — context cancellation is only checked between pops, not mid-block. Is this intentional upstream behavior, or should the queue implement context-aware blocking?
  2. Is nodedeclaredfeatures confirmed to be a no-op in kcp's virtual workspace model before merging?

/cc @mjudeikis

@kcp-ci-bot kcp-ci-bot requested a review from mjudeikis March 12, 2026 11:44
@mjudeikis-bot mjudeikis-bot left a comment

Inline review comments on key changes

return
}
}
utilruntime.HandleErrorWithContext(ctx, err, "Failed to process object from queue")

🔴 Silent error swallowing regression

The previous code called utilruntime.HandleErrorWithContext(ctx, err, "Failed to process object from queue") here for any non-ErrFIFOClosed error. That line has been removed.

Now all non-FIFO errors (e.g. a ProcessFunc returning an error) are silently dropped with no log, no metric, no panic. This hides real problems and makes debugging very difficult.

Suggestion: add the error logging back:

if err != nil {
    if errors.Is(err, cache.ErrFIFOClosed) {
        return
    }
    utilruntime.HandleErrorWithContext(ctx, err, "Failed to process object from queue")
}

Member Author

Thanks for the link! Confirmed — aligning with upstream here is the right call. The error handling in upstream controller.go matches this pattern, so the concern is resolved. ✅

if err != nil {
if err == cache.ErrFIFOClosed {
return
select {

🟡 Context cancellation is non-blocking between pops only

The select { case <-ctx.Done(): return; default: ... } check is non-blocking — it only fires at the top of the loop, not while blocked inside Pop/PopBatch. If the queue is empty and Pop is blocking, a ctx cancellation will not be noticed until the next item is popped.

This changes the shutdown behaviour compared to the old code where cancellation could theoretically be handled differently. Worth a comment explaining this is intentional and the queue is expected to unblock on close.
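
The behaviour described here can be reproduced in a deterministic, single-goroutine sketch. The pop function below is a hypothetical stand-in for the queue's Pop and processLoop is an illustrative name, not the upstream code. Cancellation is triggered from inside pop to simulate it arriving while the loop is blocked waiting for an item.

```go
package main

import (
	"context"
	"fmt"
)

// processLoop checks ctx only at the top of each iteration. Once pop has been
// called, a cancellation is not noticed until pop returns and the loop comes
// back around to the select.
func processLoop(ctx context.Context, pop func() (int, bool), handle func(int)) string {
	for {
		select {
		case <-ctx.Done():
			return "context cancelled"
		default:
			v, ok := pop()
			if !ok {
				return "queue closed"
			}
			handle(v)
		}
	}
}

// demo cancels the context while the loop is inside pop and returns the items
// that were still handled plus the reason the loop exited.
func demo() ([]int, string) {
	ctx, cancel := context.WithCancel(context.Background())
	defer cancel()

	pop := func() (int, bool) {
		// Simulate cancellation arriving while Pop is blocked.
		cancel()
		return 42, true
	}

	var handled []int
	reason := processLoop(ctx, pop, func(v int) { handled = append(handled, v) })
	return handled, reason
}

func main() {
	handled, reason := demo()
	fmt.Println(handled, reason)
}
```

The item popped after cancellation is still handled, and the loop exits only on the following iteration. A queue that returns from Pop when it is closed is what bounds this delay in practice.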

Contributor

This is copied code; we need to check how it's aligned with upstream.

Member Author

Again, this is copied code, I wouldn't touch it as of now.

Fair enough — if it is a verbatim copy of upstream code, keeping divergence low is the right trade-off. Worth opening a tracking issue upstream if the non-blocking ctx cancellation is ever a problem in practice. ✅

})
} else {
txn := cache.Transaction{
Type: cache.TransactionTypeAdd,

🟡 Partial state on TransactionError — potential store/handler inconsistency

When txnStore.Transaction(txns...) returns a TransactionError, only callbacks for SuccessfulIndices are fired. This is correct if the TransactionStore is truly atomic (all-or-nothing). But if it's partially applied (some items written, some not), event handlers will be notified about some items but the store has a different set — leading to inconsistency between store state and handler view.

Please add a comment clarifying the atomicity contract expected of TransactionStore.Transaction(), or add a test covering the partial-failure path.

Contributor

Need to verify with the upstream lib whether the same issue exists there.

Member Author

This code has been completely copied from the upstream.

Acknowledged — if this is a straight upstream copy, then the partial-state concern applies equally to upstream. Worth verifying via upstream issue tracker rather than diverging here. ✅

// This prevents CRD watch cache lazy initialization from blocking on a dead etcd.
etcdCtx, etcdCancel := context.WithCancel(context.Background())
defer etcdCancel()
if completedConfig.EmbeddedEtcd.Config != nil {

✅ Good fix — etcd now outlives the apiserver during graceful shutdown via a separate context.Background()-rooted context. This prevents CRD watch cache lazy initialization from blocking on a dead etcd connection during shutdown. The defer etcdCancel() correctly cleans up when the command handler returns.
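
As a rough illustration of the separate-context pattern (not the kcp code itself), the sketch below uses two contexts that are both rooted at context.Background(). The names serverCtx and storeCtx are hypothetical stand-ins for the apiserver and embedded etcd contexts. Because storeCtx is not derived from serverCtx, cancelling the server leaves the store running until it is stopped explicitly.

```go
package main

import (
	"context"
	"fmt"
)

// demo cancels the server context first and records whether the store context
// is still alive at that point.
func demo() []string {
	var events []string

	// Server context: cancelled when shutdown is requested.
	serverCtx, stopServer := context.WithCancel(context.Background())
	// Store context: rooted at Background, not derived from serverCtx,
	// so cancelling the server does not cancel the store.
	storeCtx, stopStore := context.WithCancel(context.Background())
	defer stopStore()

	stopServer()
	<-serverCtx.Done()
	events = append(events, "server stopping")

	// During graceful shutdown the store is still usable.
	if storeCtx.Err() == nil {
		events = append(events, "store still running")
	}

	// The store is stopped only after the server has finished.
	stopStore()
	if storeCtx.Err() != nil {
		events = append(events, "store stopped")
	}
	return events
}

func main() {
	fmt.Println(demo())
}
```

Had storeCtx been created with context.WithCancel(serverCtx), the "store still running" step would not be reached, which corresponds to the dead-etcd situation the fix avoids.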

@mjudeikis mjudeikis requested a review from mjudeikis-bot March 12, 2026 11:50
Signed-off-by: Marko Mudrinić <mudrinic.mare@gmail.com>
@mjudeikis
Contributor

/lgtm
/approve

let's see if it now blows up in our faces :D :D thanks @xmudrii

@kcp-ci-bot kcp-ci-bot added the lgtm label Mar 12, 2026
@kcp-ci-bot
Contributor

LGTM label has been added.

Git tree hash: 23e6892c6df728d5c3cb8418eacdfacf66c699d1

@kcp-ci-bot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: mjudeikis

@kcp-ci-bot kcp-ci-bot added the approved label Mar 12, 2026
@kcp-ci-bot kcp-ci-bot merged commit 758c435 into kcp-dev:main Mar 12, 2026
15 checks passed
@xmudrii xmudrii deleted the 1.35.1-prep branch March 12, 2026 14:10
Labels

approved, dco-signoff: yes, kind/api-change, kind/feature, lgtm, release-note, size/XXL

4 participants