Evict pod based on GRPC response instead of force check on every call #1055
Merged
Pull request overview
This PR refactors the pod evaluator to stop proactively health-checking cached pods on every request, and instead evict pods reactively when gRPC calls fail with transient errors—improving request-path latency by avoiding repeated Kubernetes API calls.
Changes:
- Removed removeUnhealthyPods from the pod cache manager’s per-request path.
- Added an eviction channel so callers can request cache eviction when gRPC calls fail transiently, and updated EvaluateFunction to retry until success or context deadline.
- Updated a pod cache manager test expectation to match the new request-path logging/behavior.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| func/internal/podevaluator.go | Adds retry+eviction logic based on gRPC status codes; adjusts default waitlist length fallback. |
| func/internal/podcachemanager.go | Introduces eviction channel handling in the cache manager event loop to remove dead pods from cache. |
| func/internal/podevaluator_podcachemanager_test.go | Updates expected log output to align with new request-path behavior. |
rendre-greyling
previously approved these changes
Jun 22, 2026
Signed-off-by: Kushal Harish Naidu <kushal.harish.naidu@ericsson.com>
…from function runner deployment Signed-off-by: Kushal Harish Naidu <kushal.harish.naidu@ericsson.com>
rendre-greyling
approved these changes
Jun 22, 2026
efiacor
approved these changes
Jun 22, 2026



Title
Evict pod based on GRPC response instead of force check on every call
Description
What changed: Removed removeUnhealthyPods from the pod cache manager event loop request path. Added an eviction channel so dead pods are removed from the cache reactively when gRPC calls fail with status.Code(err) == codes.Unavailable.
EvaluateFunction now loops, evicting dead pods and retrying through a configurable retry mechanism.
Why it’s needed: removeUnhealthyPods made 3 Kubernetes API calls per cached pod on every incoming request, blocking the single-goroutine event loop for ~700ms with 11 pods. This serialized all function evaluations and caused increased latency per render.
How it works: The event loop dispatches instantly, with no health checks. When a gRPC call fails with status.Code(err) == codes.Unavailable, the caller sends a podEvictionRequest to the event loop, which removes that pod from the cache and clears it from the cluster. The caller then requests a new pod and retries. Real function errors return immediately. The periodic GC still catches any remaining stale pods.
Related Issue(s)
Type of Change
Checklist
Testing Instructions (Optional)
Additional Notes (Optional)
AI Disclosure
If so, please describe how:
Microsoft Copilot to analyse the code.
Kiro to generate eviction channel code.