-
Notifications
You must be signed in to change notification settings - Fork 21
feat: add status back-reporting supprt (2/, new controller) #329
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
feat: add status back-reporting supprt (2/, new controller) #329
Conversation
Signed-off-by: michaelawyu <chenyu1@microsoft.com>
|
Note: as this controller requires other components to fully function; it is NOT currently enabled. |
Signed-off-by: michaelawyu <chenyu1@microsoft.com>
Codecov Report❌ Patch coverage is
📢 Thoughts on this report? Let us know! |
| } | ||
|
|
||
| // Validate the scheduling policy of the placement object. | ||
| schedulingPolicy := placementObj.GetPlacementSpec().Policy |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will you do these validation in the CRP validator when someone specifies the report back strategy? If so does it make sense to move some of these validations into the common validator package?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi Wei! We are switching to CEL-based validation as I understand (a separate PR will be prepared to add the CEL rule for this part). The CRP validator methods are currently only used for the webhooks as I understand.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Though honestly speaking I do not have a very strong opinion on this -> adding the logic to the webhook part as well definitely helps, though because of the reason that webhooks are not 100% bullet-proof we still need to do some sanity check here.
| // formatResourceIdentifier formats a ResourceIdentifier object to a string for keying purposes. | ||
| // | ||
| // The format in use is `[API-GROUP]/[API-VERSION]/[API-KIND]/[NAMESPACE]/[NAME]`, e.g., `/v1/Namespace//work`. | ||
| func formatResourceIdentifier(resourceIdentifier *placementv1beta1.ResourceIdentifier) string { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this might be useful beyond this package? I am wondering if this belongs to the pkg/utils
similarly formatWorkResourceIdentifier might also belong there?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I guess... though at this moment I think only this controller and the work applier need the formatter (the work applier only uses the work resource identifier one); and for the work applier the logic is bit different (it needs to handle cases where manifests cannot be decoded).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will add a TO-DO comment to note the possible factoring opportunity should the situation materializes.
| // improves, the statuses back-reporter should read the label instead for CRP/RP tracking here. | ||
| placementObj, err := r.validatePlacementObjectForOriginalResourceStatusBackReporting(ctx, work) | ||
| if err != nil { | ||
| klog.ErrorS(err, "Failed to validate the placement object associated with the Work object for back-reporting statuses to original resources", "work", workRef) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this a user facing error that we should report using EventRecorder?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi Wei! For this method there are some errors (e.g., API GET errors) that are not user-facing; and for the sanity check part the plan is to have CEL expressions in place so users are going to see errors when they create the CRP object.
| } | ||
| ) | ||
|
|
||
| // createWorkObject creates a new Work object with the given name, manifests, and apply strategy. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Copy/paste error from workapplier/controller_integration_test.go? Should we consolidate the 2 functions?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, let me fix the comment. Thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For the consolidation part, we do not have a test utilities pkg yet and due to the usage of Gomega/Ginkgo deps such pkgs are going to look a bit tricky -> allow me a bit more time to think this through if it is OK
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I made some edits that do some wrapping; it's a bit overkilling but I think it is better than introducing Gomega pkg into regular code.
| return statusBackReportingWrapperData, nil | ||
| } | ||
|
|
||
| func workObjectRemovedActual(workName string) func() error { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
also a duplicated utility from workapplier/controller_integration_test.go
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I moved them to a separate pkg though I have to say I have some minor reservation about this. It's nothing major, obviously, but this feels like adding another layer of redirection just for the sake of adding redirection. It just brings so little benefits (not really making the code more maintainable/readable) and at this moment does not seem to be a well-used pattern.
|
|
||
| // Prepare a child context. | ||
| // Cancel the child context anyway to avoid leaks. | ||
| childCtx, cancel := context.WithCancel(ctx) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why create a childCtx instead of just passing ctx into ParallelizeUntil?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi Wei! The child context is created to allow this parallelization to be cancelled without impacting the main context (the reconciliation loop context) (and the parent context's cancellation will still propagate to the child context). Though I totally understand your concern that currently the cancel method is not invoked explicitly...
ryanzhang-oss
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i think we need an e2e to try something like a job
| // Note that if the applied resource has a status sub-resource, it is usually safe for us to assume that | ||
| // the original resource should also have a status sub-resource of the same format. | ||
| unstructured.Object["status"] = statusWrapper["status"] | ||
| _, err = r.hubDynamicClient.Resource(gvr).Namespace(nsName).UpdateStatus(ctx, unstructured, metav1.UpdateOptions{}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder this will trigger resource change dector to trigger another placement reconcile?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it will trigger a change detector action but will be ignored by placement controller when it finds out that there is no meaningful diff
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, it will trigger the placement controller for once. We could update the onResourceUpdated method, but it would need a deep equal (current the method uses resource versions), which might be a bit too resource-consuming considering that this method is on the critical path? I could add a TO-DO here just in case we need to evaluate this possibility.
| // TO-DO (chenyu1): check for cases where the API definition is inconsistent between the member cluster | ||
| // side and the hub cluster side, and single out the errors as user errors instead. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good point, it seems that we probably should start small? like only start with job/deployment i.e. something with stable API.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi Ryan! Yeah, there will be a whitelist based mechanism that controls which APIs can have their status back-reported; as mentioned to Wei earlier this will be submitted as a separate PR.
|
|
||
| // Note that if the applied resource has a status sub-resource, it is usually safe for us to assume that | ||
| // the original resource should also have a status sub-resource of the same format. | ||
| unstructured.Object["status"] = statusWrapper["status"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do we not check the GKV that we put in the statusWrapper?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi Ryan! The GVK/GVR we put in the wrapper are there only for completeness reasons (otherwise the serialization/deserialization would fail) -> the info are directly copied from the work resource identifier, which we already use in this controller.
Hi Ryan! Yeah; this controller has not been enabled yet -> will add E2Es in a separate PR. |
Signed-off-by: michaelawyu <chenyu1@microsoft.com>
…o feat/status-backreporting-controller
Description of your changes
This PR adds a new controller that enables mirroring of status between the hub cluster side and the member cluster side.
It is a part of a series of PRs to enable status back-reporting in KubeFleet.
I have:
make reviewableto ensure this PR is ready for review.How has this code been tested
Special notes for your reviewer
To control the PR size, additional integration tests will be submitted in separate PRs.