fix: sync agent's kernel-registry to actual container periodically #2179
Closed
fregataa wants to merge 2 commits into topic/05-23-fix_enhanced_kernel_termination_handling
Closing this PR since the idea behind it has not been confirmed.

Intro
Container status information in the Backend.AI system is kept in three places: the DB on the manager side, the agent's kernel registry, and the actual container. This PR concerns the agent's kernel registry and the actual container.
Problem
In the current implementation, kernel data is inserted into and removed from the agent's kernel registry by the tasks that create and destroy containers. During container creation, if an unhandled error occurs, the kernel data already inserted into the kernel registry is removed. This removal is not reliable, and other unpredictable errors can leave the kernel registry out of sync with the actual container state.
So, let's sync the kernel registry to the actual container state in a periodic loop.
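To illustrate the periodic-sync idea, here is a minimal sketch. It is not the code in this PR: `reconcile`, `sync_loop`, `list_live_kernel_ids`, and the 10-second interval are hypothetical names and values chosen for the example, and the registry is modeled as a plain dict keyed by kernel ID.

```python
import asyncio
from typing import Awaitable, Callable, Dict, Set


def reconcile(registry: Dict[str, dict], live_kernel_ids: Set[str]) -> Set[str]:
    """Drop registry entries that have no matching container.

    Returns the set of kernel IDs that were removed.
    """
    stale = set(registry) - live_kernel_ids
    for kernel_id in stale:
        del registry[kernel_id]
    return stale


async def sync_loop(
    registry: Dict[str, dict],
    # Hypothetical callback: asks the container backend which kernels exist.
    list_live_kernel_ids: Callable[[], Awaitable[Set[str]]],
    interval: float = 10.0,  # assumed value, not taken from the PR
) -> None:
    """Periodically align the registry with the actual container state."""
    while True:
        try:
            live = await list_live_kernel_ids()
            reconcile(registry, live)
        except asyncio.CancelledError:
            # Let task cancellation stop the loop.
            raise
        except Exception as exc:
            # A failed pass must not kill the loop; retry on the next tick.
            print(f"kernel registry sync failed: {exc!r}")
        await asyncio.sleep(interval)
```

This sketch only handles registry entries whose container is gone. A real implementation would also need to decide what to do with containers that have no registry entry, and to skip kernels that are still being created so that a sync pass does not remove an entry before its container appears.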