Skip to content

Conversation

@michaelawyu
Copy link
Collaborator

Description of your changes

This PR adds a new controller that enables mirroring of status between the hub cluster side and the member cluster side.

It is a part of a series of PRs to enable status back-reporting in KubeFleet.

I have:

  • Run make reviewable to ensure this PR is ready for review.

How has this code been tested

  • Unit tests
  • Integration tests (limited)

Special notes for your reviewer

To control the PR size, additional integration tests will be submitted in separate PRs.

Signed-off-by: michaelawyu <chenyu1@microsoft.com>
@michaelawyu
Copy link
Collaborator Author

Note: as this controller requires other components to fully function; it is NOT currently enabled.

Signed-off-by: michaelawyu <chenyu1@microsoft.com>
@codecov
Copy link

codecov bot commented Nov 13, 2025

Codecov Report

❌ Patch coverage is 72.91667% with 39 lines in your changes missing coverage. Please review.

Files with missing lines Patch % Lines
pkg/controllers/statusbackreporter/controller.go 72.91% 32 Missing and 7 partials ⚠️

📢 Thoughts on this report? Let us know!

}

// Validate the scheduling policy of the placement object.
schedulingPolicy := placementObj.GetPlacementSpec().Policy
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will you do these validation in the CRP validator when someone specifies the report back strategy? If so does it make sense to move some of these validations into the common validator package?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi Wei! We are switching to CEL-based validation as I understand (a separate PR will be prepared to add the CEL rule for this part). The CRP validator methods are currently only used for the webhooks as I understand.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Though honestly speaking I do not have a very strong opinion on this -> adding the logic to the webhook part as well definitely helps, though because of the reason that webhooks are not 100% bullet-proof we still need to do some sanity check here.

// formatResourceIdentifier formats a ResourceIdentifier object to a string for keying purposes.
//
// The format in use is `[API-GROUP]/[API-VERSION]/[API-KIND]/[NAMESPACE]/[NAME]`, e.g., `/v1/Namespace//work`.
func formatResourceIdentifier(resourceIdentifier *placementv1beta1.ResourceIdentifier) string {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this might be useful beyond this package? I am wondering if this belongs to the pkg/utils

similarly formatWorkResourceIdentifier might also belong there?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I guess... though at this moment I think only this controller and the work applier need the formatter (the work applier only uses the work resource identifier one); and for the work applier the logic is bit different (it needs to handle cases where manifests cannot be decoded).

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will add a TO-DO comment to note the possible factoring opportunity should the situation materializes.

// improves, the statuses back-reporter should read the label instead for CRP/RP tracking here.
placementObj, err := r.validatePlacementObjectForOriginalResourceStatusBackReporting(ctx, work)
if err != nil {
klog.ErrorS(err, "Failed to validate the placement object associated with the Work object for back-reporting statuses to original resources", "work", workRef)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this a user facing error that we should report using EventRecorder?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi Wei! For this method there are some errors (e.g., API GET errors) that are not user-facing; and for the sanity check part the plan is to have CEL expressions in place so users are going to see errors when they create the CRP object.

}
)

// createWorkObject creates a new Work object with the given name, manifests, and apply strategy.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy/paste error from workapplier/controller_integration_test.go? Should we consolidate the 2 functions?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, let me fix the comment. Thanks!

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the consolidation part, we do not have a test utilities pkg yet and due to the usage of Gomega/Ginkgo deps such pkgs are going to look a bit tricky -> allow me a bit more time to think this through if it is OK

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I made some edits that do some wrapping; it's a bit overkilling but I think it is better than introducing Gomega pkg into regular code.

return statusBackReportingWrapperData, nil
}

func workObjectRemovedActual(workName string) func() error {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also a duplicated utility from workapplier/controller_integration_test.go

Copy link
Collaborator Author

@michaelawyu michaelawyu Dec 18, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I moved them to a separate pkg though I have to say I have some minor reservation about this. It's nothing major, obviously, but this feels like adding another layer of redirection just for the sake of adding redirection. It just brings so little benefits (not really making the code more maintainable/readable) and at this moment does not seem to be a well-used pattern.


// Prepare a child context.
// Cancel the child context anyway to avoid leaks.
childCtx, cancel := context.WithCancel(ctx)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why create a childCtx instead of just passing ctx into ParallelizeUntil?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi Wei! The child context is created to allow this parallelization to be cancelled without impacting the main context (the reconciliation loop context) (and the parent context's cancellation will still propagate to the child context). Though I totally understand your concern that currently the cancel method is not invoked explicitly...

Copy link
Contributor

@ryanzhang-oss ryanzhang-oss left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i think we need an e2e to try something like a job

// Note that if the applied resource has a status sub-resource, it is usually safe for us to assume that
// the original resource should also have a status sub-resource of the same format.
unstructured.Object["status"] = statusWrapper["status"]
_, err = r.hubDynamicClient.Resource(gvr).Namespace(nsName).UpdateStatus(ctx, unstructured, metav1.UpdateOptions{})
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder this will trigger resource change dector to trigger another placement reconcile?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it will trigger a change detector action but will be ignored by placement controller when it finds out that there is no meaningful diff

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, it will trigger the placement controller for once. We could update the onResourceUpdated method, but it would need a deep equal (current the method uses resource versions), which might be a bit too resource-consuming considering that this method is on the critical path? I could add a TO-DO here just in case we need to evaluate this possibility.

Comment on lines +190 to +191
// TO-DO (chenyu1): check for cases where the API definition is inconsistent between the member cluster
// side and the hub cluster side, and single out the errors as user errors instead.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good point, it seems that we probably should start small? like only start with job/deployment i.e. something with stable API.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi Ryan! Yeah, there will be a whitelist based mechanism that controls which APIs can have their status back-reported; as mentioned to Wei earlier this will be submitted as a separate PR.


// Note that if the applied resource has a status sub-resource, it is usually safe for us to assume that
// the original resource should also have a status sub-resource of the same format.
unstructured.Object["status"] = statusWrapper["status"]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we not check the GKV that we put in the statusWrapper?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi Ryan! The GVK/GVR we put in the wrapper are there only for completeness reasons (otherwise the serialization/deserialization would fail) -> the info are directly copied from the work resource identifier, which we already use in this controller.

@michaelawyu
Copy link
Collaborator Author

i think we need an e2e to try something like a job

Hi Ryan! Yeah; this controller has not been enabled yet -> will add E2Es in a separate PR.

Signed-off-by: michaelawyu <chenyu1@microsoft.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants