WIP: OCPBUGS-57474: Authorization Cache V2#530
jacobsee wants to merge 6 commits into openshift:main
Conversation
…orizationCacheV2 implementation and switch all internal usage to an interface. Move the original resource-version-based cache to cachev1.go, update wiring and tests, and ensure all consumers use the new interface type.
Skipping CI for Draft Pull Request.

/test all

[APPROVALNOTIFIER] This PR is NOT APPROVED. This pull-request has been approved by: jacobsee. The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files. Approvers can indicate their approval by writing `/approve` in a comment.

/retest
…eV2 for event-driven cache updates
/retest

/retest

/retest
…uthorizationCacheV2 to use it. Switch to Workqueue instead of custom queueing logic. Use LastSyncResourceVersion in global rbac cache to short circuit hashing.
/retest

/retest
benluddy
left a comment
Thanks for all your work on this, it looks really promising!
```
@@ -0,0 +1,645 @@
package auth
```

Putting the cache.go/cache_test.go renames in a clean commit would make it obvious that there are no little changes mixed in.
```
	informers.RoleBindings().Informer(),
}

globalRBACCache := NewGlobalRBACCache(scrLister, scrbLister)
```

Does it make sense to pass this in rather than constructing it here? You'd be able to write tests that simulate varying the global RBAC without setting up phony informers.
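A minimal sketch of the constructor-injection idea, assuming a hypothetical `globalRBACHasher` interface and `newAuthCacheV2` constructor (the real types and method sets in the PR may differ): the cache receives its global RBAC dependency instead of building it from listers, so a test can substitute a fake and vary global RBAC state directly.

```go
package main

import "fmt"

// globalRBACHasher is a hypothetical interface extracted from GlobalRBACCache.
type globalRBACHasher interface {
	GetCurrentHash() string
}

// authCacheV2 sketches an AuthorizationCacheV2 that receives its global RBAC
// cache through the constructor instead of constructing it internally.
type authCacheV2 struct {
	globalRBAC globalRBACHasher
}

func newAuthCacheV2(globalRBAC globalRBACHasher) *authCacheV2 {
	return &authCacheV2{globalRBAC: globalRBAC}
}

func (c *authCacheV2) globalChangedSince(last string) bool {
	return c.globalRBAC.GetCurrentHash() != last
}

// fakeGlobalRBAC lets a test vary global RBAC state without phony informers.
type fakeGlobalRBAC struct{ hash string }

func (f *fakeGlobalRBAC) GetCurrentHash() string { return f.hash }

func main() {
	fake := &fakeGlobalRBAC{hash: "h1"}
	c := newAuthCacheV2(fake)
	fmt.Println(c.globalChangedSince("h1")) // same hash: no change
	fake.hash = "h2"                        // simulate a global RBAC change
	fmt.Println(c.globalChangedSince("h1"))
}
```

With the dependency injected, a test never needs to wire up informers just to exercise the "global RBAC changed" path.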
```
}

// Run begins watching and synchronizing the cache
func (ac *AuthorizationCacheV2) Run(period time.Duration) {
```

Can the periodic "syncs" be entirely removed in favor of the informer event handlers with periodic resync intervals?
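The suggestion above can be illustrated with a stdlib-only stub; `fakeInformer`, `addEventHandlerWithResync`, and `deliver` are hypothetical stand-ins for client-go's SharedInformer machinery, not its real API. A handler registered with a resync period is re-invoked for every cached object on each resync, so the same handler covers both real updates and periodic re-synchronization, and a separate ticker goroutine calling `synchronize()` becomes unnecessary.

```go
package main

import "fmt"

type handler func(key string)

// fakeInformer simulates a shared informer with a resync period.
type fakeInformer struct {
	cacheKeys []string
	handlers  []handler
}

func (i *fakeInformer) addEventHandlerWithResync(h handler) {
	i.handlers = append(i.handlers, h)
}

// deliver simulates a real update event for one key.
func (i *fakeInformer) deliver(key string) {
	for _, h := range i.handlers {
		h(key)
	}
}

// resync simulates the periodic resync: every cached key is re-delivered
// to the registered handlers, with no extra ticker needed.
func (i *fakeInformer) resync() {
	for _, k := range i.cacheKeys {
		i.deliver(k)
	}
}

func main() {
	synced := map[string]int{}
	inf := &fakeInformer{cacheKeys: []string{"ns-a", "ns-b"}}
	inf.addEventHandlerWithResync(func(key string) { synced[key]++ })

	inf.deliver("ns-a") // real change
	inf.resync()        // periodic resync re-syncs everything
	fmt.Println(synced["ns-a"], synced["ns-b"])
}
```

In real client-go this corresponds to registering handlers with a resync interval on the shared informers, so correctness no longer depends on a hand-rolled periodic loop.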
```
// Start periodic synchronization
go func() {
	// Initial sync
	rac.synchronize()

	ticker := time.NewTicker(period)
	defer ticker.Stop()

	for {
		select {
		case <-ticker.C:
			rac.synchronize()
		case <-rac.stopCh:
			return
		}
	}
}()
```

Why is this still needed?
```
const (
	globalSyncKey      = "global"
	namespaceKeyPrefix = "namespace:"
)
```

IMO, it would be at least as clear to use "" as a sentinel key for all namespaces and get rid of the string prefixing.
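The two keying schemes side by side, as a sketch (the `decodePrefixed`/`decodeSentinel` helpers are hypothetical, written only to contrast the approaches): with "" as the all-namespaces sentinel, decoding a queue key needs no string surgery at all.

```go
package main

import (
	"fmt"
	"strings"
)

const (
	globalSyncKey      = "global"
	namespaceKeyPrefix = "namespace:"
)

// decodePrefixed decodes a key under the prefix scheme from the PR.
func decodePrefixed(key string) (namespace string, global bool) {
	if key == globalSyncKey {
		return "", true
	}
	return strings.TrimPrefix(key, namespaceKeyPrefix), false
}

// decodeSentinel decodes a key under the reviewer's suggestion:
// "" already means "all namespaces", so any non-empty key is a namespace.
func decodeSentinel(key string) (namespace string, global bool) {
	return key, key == ""
}

func main() {
	ns, global := decodePrefixed("namespace:kube-system")
	fmt.Println(ns, global)
	ns, global = decodeSentinel("kube-system")
	fmt.Println(ns, global)
	_, global = decodeSentinel("")
	fmt.Println(global)
}
```

The sentinel scheme also removes a failure mode: a malformed key that lacks the expected prefix cannot silently decode to the wrong namespace.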
```
refs = append(refs, rbacResourceRef{
	uid:             string(cr.UID),
	resourceVersion: cr.ResourceVersion,
	kind:            "ClusterRole",
```

Is kind necessary? We shouldn't have any two objects with the same UID.
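Because a Kubernetes UID is unique across all objects in a cluster regardless of kind, a ref keyed on uid plus resourceVersion already identifies one specific object state. A small sketch (this `rbacResourceRef` and `refKey` are reconstructions for illustration, not the PR's code):

```go
package main

import "fmt"

// rbacResourceRef without kind: uid+resourceVersion is enough, since no two
// objects in a cluster share a UID, even across kinds.
type rbacResourceRef struct {
	uid             string
	resourceVersion string
}

// refKey produces the string that would feed the hash.
func refKey(r rbacResourceRef) string {
	return r.uid + "/" + r.resourceVersion
}

func main() {
	role := rbacResourceRef{uid: "uid-role", resourceVersion: "101"}
	binding := rbacResourceRef{uid: "uid-binding", resourceVersion: "101"}
	// Distinct objects never collide even at the same resourceVersion,
	// because their UIDs differ.
	fmt.Println(refKey(role) != refKey(binding))
}
```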
```
// If we don't have previous versions cached, consider it changed
if g.lastClusterRoleVersion == "" || g.lastClusterRoleBindingVersion == "" {
	return true
}
```

This looks redundant. When would we return true from this path but not the path that immediately follows?
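The reviewer's point, sketched with a hypothetical `hasChanged` helper (a reconstruction, not the PR's code): an empty cached version can never equal a real resourceVersion from the cluster, so the plain inequality comparison already reports "changed" on first sync and the explicit empty-string guard adds nothing, assuming current versions are never empty themselves.

```go
package main

import "fmt"

// hasChanged compares cached cluster role / binding versions against the
// current ones. An empty cached version ("" on first sync) never equals a
// real resourceVersion, so the inequality alone covers the first-sync case.
func hasChanged(lastCR, lastCRB, curCR, curCRB string) bool {
	return lastCR != curCR || lastCRB != curCRB
}

func main() {
	// First sync: no cached versions yet, reported as changed.
	fmt.Println(hasChanged("", "", "10", "7"))
	// Steady state: versions match, reported as unchanged.
	fmt.Println(hasChanged("10", "7", "10", "7"))
}
```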
```
// GetQueueStatus returns information about the queue
func (rac *ReactiveAuthorizationCacheV2) GetQueueStatus() (queued int) {
	return rac.queue.Len()
}
```

It seems brittle to have tests that directly depend on this... can't we test the observable results of the workqueue processor without knowing about it?
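One way to phrase that, as a stdlib-only sketch (this `processor` is a toy stand-in for the workqueue worker, not the PR's implementation): the test records which namespaces the worker actually synchronized and asserts on that observable outcome, never touching queue internals or `queue.Len()`.

```go
package main

import (
	"fmt"
	"sync"
)

// processor stands in for the workqueue-driven sync loop: a worker goroutine
// drains a channel of namespace keys and records the effect of each sync.
type processor struct {
	mu     sync.Mutex
	synced []string
	work   chan string
	done   chan struct{}
}

func newProcessor() *processor {
	p := &processor{work: make(chan string), done: make(chan struct{})}
	go func() {
		defer close(p.done)
		for ns := range p.work {
			p.mu.Lock()
			p.synced = append(p.synced, ns) // the observable effect of a sync
			p.mu.Unlock()
		}
	}()
	return p
}

func (p *processor) enqueue(ns string) { p.work <- ns }

// stop shuts the worker down and returns everything it synchronized,
// which is what a test should assert on.
func (p *processor) stop() []string {
	close(p.work)
	<-p.done // wait until the worker has drained everything
	return p.synced
}

func main() {
	p := newProcessor()
	p.enqueue("ns-a")
	p.enqueue("ns-b")
	fmt.Println(p.stop()) // assert on what was synced, not on queue depth
}
```

A test written this way keeps working even if the queueing strategy changes underneath, which is exactly the brittleness the comment is pointing at.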
```
}

// GetCurrentHash returns the current global RBAC hash
func (g *GlobalRBACCache) GetCurrentHash() string {
```

Do callers need to be responsible for detecting that this hash might be stale and for updating it? I think correctness depends on always getting the latest hash. Caching based on the last sync RV makes sense as an optimization, but I think it's an implementation detail and not something callers care about.
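A sketch of the shape being suggested (field names and function-valued dependencies here are hypothetical, chosen to keep the example self-contained): the last-sync-RV short circuit lives inside `GetCurrentHash`, so callers always receive a current hash and the caching is purely an internal optimization.

```go
package main

import "fmt"

// globalRBACCache hides the RV-based short circuit behind GetCurrentHash.
type globalRBACCache struct {
	// syncRV stands in for the informer's LastSyncResourceVersion.
	syncRV func() string
	// computeHash stands in for the expensive hash over all roles/bindings.
	computeHash func() string

	lastSyncRV string
	cachedHash string
}

// GetCurrentHash always returns a hash that is current as of the last sync:
// if the resource version moved (or is unknown), it recomputes before
// answering, so callers never need to reason about staleness.
func (g *globalRBACCache) GetCurrentHash() string {
	if rv := g.syncRV(); rv == "" || rv != g.lastSyncRV {
		g.cachedHash = g.computeHash()
		g.lastSyncRV = rv
	}
	return g.cachedHash
}

func main() {
	rv, computations := "1", 0
	g := &globalRBACCache{
		syncRV:      func() string { return rv },
		computeHash: func() string { computations++; return "hash-" + rv },
	}
	g.GetCurrentHash()
	g.GetCurrentHash() // same RV: served from cache, no recompute
	rv = "2"
	g.GetCurrentHash() // new RV: recomputed transparently
	fmt.Println(computations)
}
```

Callers simply ask for the hash; whether it came from the cache or a fresh computation is invisible, which is the encapsulation the comment is asking for.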
```
@@ -129,7 +54,7 @@ func TestSyncNamespace(t *testing.T) {
	nsIndexer := cache.NewIndexer(cache.MetaNamespaceKeyFunc, cache.Indexers{})
	nsLister := corev1listers.NewNamespaceLister(nsIndexer)

	authorizationCache := NewAuthorizationCache(
```

Are there tests that can be run against both implementations? Or a shim that runs both implementations for real and detects behavior differences?
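The shared-suite idea can be sketched like this (the `authCache` interface and the two toy implementations are hypothetical placeholders for V1 and V2): one set of behavioral checks runs against anything that satisfies the interface, so a divergence between implementations surfaces as a failure in the same suite.

```go
package main

import "fmt"

// authCache is a hypothetical common interface that both cache
// implementations would satisfy.
type authCache interface {
	Allowed(user, namespace string) bool
}

// Two toy implementations standing in for V1 and V2.
type mapCache struct{ grants map[string]bool }

func (m mapCache) Allowed(user, ns string) bool { return m.grants[user+"/"+ns] }

type listCache struct{ grants []string }

func (l listCache) Allowed(user, ns string) bool {
	for _, g := range l.grants {
		if g == user+"/"+ns {
			return true
		}
	}
	return false
}

// conformance runs identical behavioral checks against any implementation
// and reports the first divergence from expected behavior.
func conformance(c authCache) error {
	if !c.Allowed("alice", "dev") {
		return fmt.Errorf("alice should see dev")
	}
	if c.Allowed("bob", "dev") {
		return fmt.Errorf("bob should not see dev")
	}
	return nil
}

func main() {
	v1 := mapCache{grants: map[string]bool{"alice/dev": true}}
	v2 := listCache{grants: []string{"alice/dev"}}
	fmt.Println(conformance(v1), conformance(v2))
}
```

In a real test file this would be a helper taking `*testing.T` plus the interface, invoked once per implementation, which is the usual Go pattern for cross-implementation conformance tests.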
@jacobsee: This pull request references Jira Issue OCPBUGS-57474, which is invalid.

The bug has been updated to refer to the pull request using the external bug tracker.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.
Issues go stale after 90d of inactivity. Mark the issue as fresh by commenting `/remove-lifecycle stale`. If this issue is safe to close now, please do so with `/close`.

/lifecycle stale
PR needs rebase.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
Stale issues rot after 30d of inactivity. Mark the issue as fresh by commenting `/remove-lifecycle rotten`. If this issue is safe to close now, please do so with `/close`.

/lifecycle rotten
Review skipped: auto reviews are limited based on label configuration. Please check the settings in the CodeRabbit UI.
@jacobsee: The following tests failed.

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.
Rotten issues close after 30d of inactivity. Reopen the issue by commenting `/reopen`.

/close
@openshift-bot: Closed this PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
- Renames `AuthorizationCache` from `cache.go` to `AuthorizationCacheV1` in `cachev1.go` (no longer instantiated anywhere currently)
- Adds a new `AuthorizationCacheV2` to `cache.go`, which uses periodic cluster and namespace-specific hashing to track changes and invalidate the cache/notify watchers
- Adds `ReactiveAuthorizationCacheV2`, which wraps `AuthorizationCacheV2` in event handling so that heavy sync checks can happen less frequently while changes are still processed in a timely manner
- Switches internal usage to `ReactiveAuthorizationCacheV2`

Motivated by the fact that the current auth cache does not appear to notify properly when permissions have been removed during incremental synchronizations (https://issues.redhat.com/browse/OCPBUGS-57474), and it looks to be an old issue in a system that is rather difficult to reason around.