feat: skip work applier status updates if possible #375
base: main
Conversation
Signed-off-by: michaelawyu <chenyu1@microsoft.com>
Note: as anticipated, after this PR the status update logic in the work applier has reached the linter's code complexity limit. To unblock progress, that specific method is now exempted from the linter; separate PRs will follow to refactor it.

This PR precedes #118.
ryanzhang-oss left a comment
How can we make sure that a work's status in the cache is actually the same as in etcd? If we just compare against the cache and stop updating, we run the risk of the status never getting updated to the correct state (although the chance is small).

Hi Ryan! The work applier already requeues periodically (with back-off), so unless the client-side cache becomes significantly out of sync with the API server (in which case all writes would be rejected by optimistic locking anyway), any inconsistent status update will be overwritten within at most 15 minutes.
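The safety argument above rests on the API server's optimistic concurrency control. The following is a minimal, self-contained illustration of that guard, using an in-memory stand-in for the API server (the real work applier writes through a Kubernetes client to etcd; `store` and its integer `resourceVersion` are simplifications, not the real types):

```go
package main

import (
	"errors"
	"fmt"
)

// store is an in-memory stand-in for the API server; everything here is
// illustrative only.
type store struct {
	status          string
	resourceVersion int
}

var errConflict = errors.New("conflict: stale resourceVersion")

// updateStatus mimics an optimistically locked write: a request carrying a
// stale resourceVersion is rejected, so a badly out-of-sync cache cannot
// silently overwrite newer state; the periodic requeue then retries after a
// fresh read.
func (s *store) updateStatus(status string, rv int) error {
	if rv != s.resourceVersion {
		return errConflict
	}
	s.status = status
	s.resourceVersion++
	return nil
}

func main() {
	s := &store{status: "Applied", resourceVersion: 7}

	// A write based on a fresh read succeeds and bumps the version.
	fmt.Println(s.updateStatus("Available", 7))

	// A write based on a stale cached read is rejected, not applied.
	fmt.Println(s.updateStatus("Applied", 7))
	fmt.Println(s.status)
}
```

The point of the sketch: a significantly stale cache cannot cause a wrong status to stick, because its write is rejected outright, and the next periodic requeue repairs any skipped-but-needed update.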
```go
	return controller.NewAPIServerError(false, err)

// Skip the status update if no change found.
if equality.Semantic.DeepEqual(originalStatus, &appliedWork.Status) {
```
do we have an idea of how many calls this generates?
Hi Ryan! Do you mean the number of status updates this setup will skip, or the time complexity of the DeepEqual calls?
For the former: in the target perf test environment, each member agent was generating roughly 950K status updates per 24 hours (though this number might be a bit inflated by the agents being restarted occasionally).
For the latter: deep-equality checks are usually expensive, especially when the object is complex. I don't have specifics on deep-equaling Work object status data right now, but I could run a quick benchmark if you are interested.
```go
}

func shouldSkipStatusUpdate(isDriftedOrDiffed, isStatusBackReportingOn bool, originalStatus, currentStatus *fleetv1beta1.WorkStatus) bool {
	if isDriftedOrDiffed || isStatusBackReportingOn {
```
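Only the signature and the first guard of `shouldSkipStatusUpdate` appear in this excerpt. A plausible, runnable completion of the predicate, using a simplified stand-in `workStatus` type and `reflect.DeepEqual` in place of apimachinery's `equality.Semantic.DeepEqual` (the body past the guard is a guess, not the PR's actual code):

```go
package main

import (
	"fmt"
	"reflect"
)

// workStatus is a hypothetical stand-in for fleetv1beta1.WorkStatus.
type workStatus struct {
	Conditions []string
}

// shouldSkipStatusUpdate sketches the predicate quoted above: drifted/diffed
// placements and placements with status back-reporting always write (their
// statuses carry fresh observation data), otherwise the write is skipped
// when the status is semantically unchanged.
func shouldSkipStatusUpdate(isDriftedOrDiffed, isStatusBackReportingOn bool, originalStatus, currentStatus *workStatus) bool {
	if isDriftedOrDiffed || isStatusBackReportingOn {
		return false
	}
	return reflect.DeepEqual(originalStatus, currentStatus)
}

func main() {
	unchanged := &workStatus{Conditions: []string{"Applied", "Available"}}
	fmt.Println(shouldSkipStatusUpdate(false, false, unchanged, unchanged)) // unchanged: skip
	fmt.Println(shouldSkipStatusUpdate(true, false, unchanged, unchanged))  // drifted: must write
}
```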
This will reduce the effectiveness of this PR by a lot; is there any way to soften the blow a bit further?
Hi Ryan! We might need a flag to omit observation timestamps if needed. Though with exponential backoff, as long as the number of drifted/diffed placements plus status back-reporting placements is low, and the system does not restart often, the number of calls should stay relatively limited (~96 writes per placement) once everything stabilizes.
Description of your changes
As a performance improvement, set the work applier to skip status updates if possible.
I have:
- Run `make reviewable` to ensure this PR is ready for review.

How has this code been tested
Special notes for your reviewer