fix: Omit cleaning containerless kernels which are still creating its container #2317
    def __setstate__(self, props) -> None:
        # Used when a `Kernel` object is loaded from pickle data.
        if "state" not in props:
            props["state"] = KernelLifecycleStatus.RUNNING
When we shut down and restart an agent to update its version, `kernel_registry` is dumped as a pickle file, and the agent loads that file when it restarts. Old `Kernel` objects that were dumped before the version update do not have a `state` field when the agent restarts.
We need to insert a `state` value into these old `Kernel` objects.
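To illustrate the backfill described above, here is a minimal, self-contained sketch of how a `__setstate__` hook can supply a default for a field that is missing from older pickle data. The `Kernel` class here is a stripped-down stand-in, not the agent's actual class; the `PREPARING` member, the `kernel_id` attribute, and the `self.__dict__.update(props)` line are assumptions added for the demo. Only `RUNNING` and the `"state" not in props` check come from the diff.

```python
import enum
import pickle


class KernelLifecycleStatus(enum.Enum):
    # Only RUNNING appears in the diff; PREPARING is a hypothetical name.
    PREPARING = "preparing"
    RUNNING = "running"


class Kernel:
    """Stand-in for the agent's Kernel class (illustration only)."""

    def __init__(self, kernel_id: str) -> None:
        self.kernel_id = kernel_id
        self.state = KernelLifecycleStatus.PREPARING

    def __setstate__(self, props: dict) -> None:
        # Called by pickle when the object is loaded.
        # Pickles written before the `state` field existed lack the key,
        # so fall back to RUNNING for them.
        if "state" not in props:
            props["state"] = KernelLifecycleStatus.RUNNING
        # Assumed for this sketch: restore the attributes from the dict.
        self.__dict__.update(props)


# Simulate a pickle produced by an older agent version: no `state` attribute.
legacy = Kernel("k-old")
del legacy.__dict__["state"]
blob = pickle.dumps(legacy)

restored = pickle.loads(blob)
print(restored.kernel_id, restored.state.name)
```

An object pickled with the new field keeps whatever state it had, since the default only applies when the key is absent.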
… container (#2317)

Co-authored-by: Kyujin Cho <kyujin.cho@lablup.com>
Backported-from: main (24.09)
Backported-to: 24.03
Backport-of: 2317

When we create a kernel, the agent registers the kernel in `kernel_registry` before creating an actual container. The `sync_container_lifecycles()` task scans `kernel_registry` and deregisters kernels that do not have an actual container.

If the task scans after the kernel is registered but before its container is created, the kernel being created gets removed, which is a malfunction.

Let's add a `state` field to the `Kernel` object so that the task can skip cleaning kernels that are still being created.
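The idea can be sketched roughly as follows. This is not the agent's actual `sync_container_lifecycles()` code; `find_orphaned_kernels`, the `PREPARING` member, and the dataclass-based `Kernel` are hypothetical names used only to show how a `state` check keeps a kernel that is still being created out of the cleanup list.

```python
import enum
from dataclasses import dataclass


class KernelLifecycleStatus(enum.Enum):
    # PREPARING is a hypothetical name for "container not created yet".
    PREPARING = "preparing"
    RUNNING = "running"


@dataclass
class Kernel:
    """Stand-in for the agent's Kernel class (illustration only)."""

    kernel_id: str
    state: KernelLifecycleStatus = KernelLifecycleStatus.PREPARING


def find_orphaned_kernels(kernel_registry, alive_container_kernel_ids):
    """Return IDs of registered kernels that should be deregistered."""
    orphaned = []
    for kernel_id, kernel in kernel_registry.items():
        if kernel_id in alive_container_kernel_ids:
            continue  # A container exists, nothing to clean.
        if kernel.state is KernelLifecycleStatus.PREPARING:
            continue  # Still being created: skip instead of removing.
        orphaned.append(kernel_id)
    return orphaned


kernel_registry = {
    "k-creating": Kernel("k-creating"),  # registered, container pending
    "k-running": Kernel("k-running", KernelLifecycleStatus.RUNNING),
    "k-lost": Kernel("k-lost", KernelLifecycleStatus.RUNNING),
}
alive = {"k-running"}  # kernel IDs that currently have a container

orphaned = find_orphaned_kernels(kernel_registry, alive)
print(orphaned)  # ['k-lost']
```

Without the state check, `k-creating` would also be returned and removed, which is the race described above.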