feat: agent resource sync API #2180
Closed
fregataa wants to merge 6 commits into topic/07-22-feat_schedule_function_returns_kernel-agent_binding from
This was referenced Jul 22, 2024
Closed this PR because we will implement a kernel heartbeat and revamp the agent heartbeat, which will resolve this issue.

resolves #2142 https://github.com/lablup/giftbox/issues/262
Agent's `sync-and-get-kernels()` API

This API synchronizes the agent's kernels with the kernel information specified by the API parameters (`preparing_kernels`, `pulling_kernels`, `running_kernels`, `terminating_kernels`). It assumes that the kernel information given by the parameters is the "truth".

If any kernel information mismatches between `kernel_registry` and `running_kernels` (or `terminating_kernels`), the agent injects a termination event to terminate the kernel.

The `sync-and-get-kernels()` API returns the actual { running, terminating, terminated } kernels (the return value is not used for now). `actual_terminated_kernels` contains the kernels that were specified as `running_kernels` by the API parameter but have already terminated.

How to use
- Call the `POST /session/_/sync-agent-resource` manager API.
- Choose when resource sync is triggered from [`after-scheduling`, `before-kernel-creation` and `on-creation-failure`].
  - `on-creation-failure`: Set by default. Calls resource sync when kernel creation fails with an `InsufficientResource` exception.
  - `after-scheduling`: Calls resource sync right after scheduling on a scaling group.
  - `before-kernel-creation`: Calls resource sync before calling the create-kernels agent API.

Note

The `on-creation-failure` option cannot handle an `ExceptionGroup` that includes multiple `InsufficientResource` exceptions, which is raised when creation of a multi-kernel session fails. It covers only creation failures of single-kernel sessions. This will be resolved after lablup/callosum#30 is merged.
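To make the synchronization rule described for `sync-and-get-kernels()` more concrete, here is a minimal sketch of the reconciliation step. It is not the code in this PR: the function name `reconcile_kernels`, the `SyncResult` type, and the modeling of `kernel_registry` as a dict from kernel ID to a `"running"`/`"terminating"` state string are all assumptions made for illustration. `preparing_kernels` and `pulling_kernels` are left out for brevity.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class SyncResult:
    """Hypothetical container mirroring the { running, terminating, terminated } result."""
    actual_running: set[str] = field(default_factory=set)
    actual_terminating: set[str] = field(default_factory=set)
    actual_terminated: set[str] = field(default_factory=set)
    # Kernels for which a termination event would be injected (illustration only).
    injected_terminations: set[str] = field(default_factory=set)


def reconcile_kernels(
    kernel_registry: dict[str, str],  # assumed shape: kernel ID -> "running" | "terminating"
    running_kernels: set[str],        # the "truth" supplied by the caller
    terminating_kernels: set[str],    # the "truth" supplied by the caller
) -> SyncResult:
    result = SyncResult()
    for kernel_id, agent_state in kernel_registry.items():
        if kernel_id in running_kernels and agent_state == "running":
            # Agent and caller agree: nothing to do.
            result.actual_running.add(kernel_id)
            continue
        if agent_state == "running":
            # The agent runs a kernel that the caller considers terminating
            # or does not know at all: treat as a mismatch and terminate it.
            result.injected_terminations.add(kernel_id)
        result.actual_terminating.add(kernel_id)
    # Kernels the caller believes are running but the agent no longer has.
    result.actual_terminated = running_kernels - kernel_registry.keys()
    return result


if __name__ == "__main__":
    registry = {"k1": "running", "k2": "running", "k3": "running"}
    res = reconcile_kernels(registry, running_kernels={"k1", "k4"}, terminating_kernels={"k2"})
    print(sorted(res.actual_running), sorted(res.actual_terminating), sorted(res.actual_terminated))
```

In this sketch the caller's view always wins: a kernel the agent is running but the caller does not list as running gets a termination event, and a kernel the caller lists as running but the agent has lost is reported back as terminated.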
Checklist: (if applicable)
docs directory

📚 Documentation preview 📚: https://sorna--2180.org.readthedocs.build/en/2180/
📚 Documentation preview 📚: https://sorna-ko--2180.org.readthedocs.build/ko/2180/