Resolve hosted Kubernetes agent slugs to pod targets#140
Conversation
There was a problem hiding this comment.
Code Review
This pull request enhances the Kubernetes runtime by improving pod target resolution and user handling for exec and attach operations. It introduces logic to dynamically determine the target execution user based on pod annotations and security contexts, and adds a helper to select the appropriate container for logs. Feedback focuses on optimizing performance by reducing redundant API calls when probing the current user, improving error handling and logic simplification in pod resolution, and removing redundant annotation checks in the attach flow.
| currentUser, err := r.currentExecUser(ctx, namespace, podName) | ||
| if err == nil && currentUser != "" { | ||
| return buildExecCommandForUser(currentUser, targetUser, cmd), nil | ||
| } |
There was a problem hiding this comment.
Probing the current user via currentExecUser (which performs an exec into the pod) on every Exec call introduces significant latency and doubles the number of API interactions for every command. Consider optimizing this by relying on the SecurityContext from the pod spec as the primary source of truth, or caching the effective user after the first probe.
| agents, listErr := r.List(ctx, map[string]string{"scion.name": id}) | ||
| if listErr == nil { | ||
| for i := range agents { | ||
| candidate := agents[i] | ||
| if candidate.ContainerID != id && candidate.Name != id && len(agents) > 1 { | ||
| continue | ||
| } | ||
| namespace, podName, ok := podTargetFromAgent(candidate) | ||
| if ok { | ||
| return namespace, podName, &candidate, nil | ||
| } | ||
| } | ||
| } |
There was a problem hiding this comment.
The error returned by r.List is ignored, and the named return parameter err is never assigned a value, which makes error checks in callers ineffective. Additionally, the conditional block on lines 218-220 is redundant because r.List already filters by the scion.name label. Propagating the error and simplifying the loop improves both robustness and clarity.
agents, err := r.List(ctx, map[string]string{"scion.name": id})
if err != nil {
return "", "", nil, fmt.Errorf("failed to list agents for resolution: %w", err)
}
for i := range agents {
candidate := agents[i]
namespace, podName, ok := podTargetFromAgent(candidate)
if ok {
return namespace, podName, &candidate, nil
}
}| if agent != nil && agent.Annotations != nil { | ||
| if u := strings.TrimSpace(agent.Annotations["scion.username"]); u != "" && validExecUsername.MatchString(u) { | ||
| username = u | ||
| } | ||
| } |
There was a problem hiding this comment.
This block is redundant. execTargetUsername(pod) (called on line 1849) already retrieves and validates the scion.username annotation from the pod metadata. Since the agent object's annotations are identical to the pod's annotations, this manual check duplicates logic. Removing this block would also allow for the removal of the unused agent variable returned by resolvePodTarget on line 1825.
de24339 to
45dcfd8
Compare
|
Addressed the latest review feedback on this branch. Changes in the new head (
Validation:
|
Summary
This fixes hosted Kubernetes runtime operations that still assumed the incoming agent identifier was the pod name.
Changes
resolvePodTarget(...)inpkg/runtime/k8s_runtime.goList()metadata when availableGetLogs,Attach, andExecnamespace/podinput working as-ispkg/runtime/k8s_runtime_test.goWhy
Hosted agents use a slug like
k8s-reverify-otelenv-233355, while the actual pod name is prefixed, for exampleglobal--k8s-reverify-otelenv-233355. Before this change, broker-backed logs and related runtime operations could fail with pod-not-found errors because the runtime tried to use the slug as the pod name.Validation
go test ./pkg/runtime -run 'TestKubernetesRuntime_(List|List_TerminalPhases|ResolvePodTargetFromHostedSlug)$|TestSelectLogContainer$' -count=1go test ./pkg/runtimebroker -run 'Test(MethodNotAllowed|AgentLogsAllowsGet)$' -count=1scion-intby rerunningscion logs <slug> --global --hub http://127.0.0.1:18080against an existing hosted agent that previously failed withpods "<slug>" not found