Resolve hosted Kubernetes agent slugs to pod targets by mfreeman451 · Pull Request #140 · GoogleCloudPlatform/scion

mfreeman451 · 2026-04-12T05:38:27Z

Summary

This fixes hosted Kubernetes runtime operations that still assumed the incoming agent identifier was the pod name.

Changes

add resolvePodTarget(...) in pkg/runtime/k8s_runtime.go
resolve agent slug to the actual Kubernetes pod name/namespace using List() metadata when available
use that resolver in GetLogs, Attach, and Exec
keep namespace/pod input working as-is
add focused test coverage for hosted slug -> pod target resolution in pkg/runtime/k8s_runtime_test.go

Why

Hosted agents use a slug like k8s-reverify-otelenv-233355, while the actual pod name is prefixed, for example global--k8s-reverify-otelenv-233355. Before this change, broker-backed logs and related runtime operations could fail with pod-not-found errors because the runtime tried to use the slug as the pod name.
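To make the slug-versus-pod-name mismatch concrete, here is a minimal sketch of the resolution step. It is not the code from this PR: agentInfo, lister, resolveTarget, the "scion" namespace, and the final fallback to the literal id are all hypothetical stand-ins, and the real resolvePodTarget(...) works against the runtime's own List() metadata.

```go
package main

import (
	"fmt"
	"strings"
)

// agentInfo is a hypothetical, trimmed-down stand-in for the runtime's agent record.
type agentInfo struct {
	Name      string // agent slug, e.g. "k8s-reverify-otelenv-233355"
	Namespace string
	PodName   string // actual pod name, e.g. "global--k8s-reverify-otelenv-233355"
}

// lister is a hypothetical abstraction over the runtime's List call.
type lister func(slug string) ([]agentInfo, error)

// resolveTarget maps an incoming identifier to (namespace, pod).
// "namespace/pod" input is passed through unchanged; anything else is
// treated as a slug and looked up through list.
func resolveTarget(id, defaultNS string, list lister) (string, string, error) {
	if ns, pod, ok := strings.Cut(id, "/"); ok && ns != "" && pod != "" {
		return ns, pod, nil
	}
	agents, err := list(id)
	if err != nil {
		return "", "", fmt.Errorf("failed to list agents for resolution: %w", err)
	}
	for _, a := range agents {
		if a.PodName == "" {
			continue
		}
		ns := a.Namespace
		if ns == "" {
			ns = defaultNS
		}
		return ns, a.PodName, nil
	}
	// Assumption for this sketch only: fall back to treating id as a pod name.
	return defaultNS, id, nil
}

func main() {
	list := func(slug string) ([]agentInfo, error) {
		return []agentInfo{{Name: slug, Namespace: "scion", PodName: "global--" + slug}}, nil
	}
	ns, pod, err := resolveTarget("k8s-reverify-otelenv-233355", "default", list)
	fmt.Println(ns, pod, err)
}
```

With the fake lister above, the slug resolves to the prefixed pod name, while an input such as other/my-pod skips the lookup entirely.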

Validation

go test ./pkg/runtime -run 'TestKubernetesRuntime_(List|List_TerminalPhases|ResolvePodTargetFromHostedSlug)$|TestSelectLogContainer$' -count=1
go test ./pkg/runtimebroker -run 'Test(MethodNotAllowed|AgentLogsAllowsGet)$' -count=1
verified live on scion-int by rerunning scion logs <slug> --global --hub http://127.0.0.1:18080 against an existing hosted agent that previously failed with pods "<slug>" not found

gemini-code-assist

Code Review

This pull request enhances the Kubernetes runtime by improving pod target resolution and user handling for exec and attach operations. It introduces logic to dynamically determine the target execution user based on pod annotations and security contexts, and adds a helper to select the appropriate container for logs. Feedback focuses on optimizing performance by reducing redundant API calls when probing the current user, improving error handling and logic simplification in pod resolution, and removing redundant annotation checks in the attach flow.

gemini-code-assist · 2026-04-12T05:40:44Z

+	currentUser, err := r.currentExecUser(ctx, namespace, podName)
+	if err == nil && currentUser != "" {
+		return buildExecCommandForUser(currentUser, targetUser, cmd), nil
+	}


Probing the current user via currentExecUser (which performs an exec into the pod) on every Exec call introduces significant latency and doubles the number of API interactions for every command. Consider optimizing this by relying on the SecurityContext from the pod spec as the primary source of truth, or caching the effective user after the first probe.
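One way the spec-first check could look is sketched below. The securityContext type is a hypothetical mirror of just the RunAsUser and RunAsNonRoot fields found on Kubernetes pod and container security contexts, and specSaysNonRoot is an illustrative name, not a function from this PR. The sketch assumes container-level settings override pod-level ones, which matches how Kubernetes applies these two fields.

```go
package main

import "fmt"

// securityContext mirrors only the two fields this sketch needs
// (hypothetical type; the real code would read the pod spec).
type securityContext struct {
	RunAsUser    *int64
	RunAsNonRoot *bool
}

// specSaysNonRoot reports whether the spec alone proves that the container
// does not run as root. Container-level values override pod-level values.
// A false result means "unknown or root", so the caller would still probe.
func specSaysNonRoot(podSC, containerSC *securityContext) bool {
	var runAsUser *int64
	var nonRoot *bool
	for _, sc := range []*securityContext{podSC, containerSC} {
		if sc == nil {
			continue
		}
		if sc.RunAsUser != nil {
			runAsUser = sc.RunAsUser
		}
		if sc.RunAsNonRoot != nil {
			nonRoot = sc.RunAsNonRoot
		}
	}
	if runAsUser != nil {
		return *runAsUser != 0
	}
	return nonRoot != nil && *nonRoot
}

func main() {
	uid := int64(1000)
	fmt.Println(specSaysNonRoot(&securityContext{RunAsUser: &uid}, nil))
	fmt.Println(specSaysNonRoot(nil, nil))
}
```

Only when this check returns false would the more expensive exec-based probe be needed, which keeps the common non-root case at a single API interaction per command.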

gemini-code-assist · 2026-04-12T05:40:44Z

+	agents, listErr := r.List(ctx, map[string]string{"scion.name": id})
+	if listErr == nil {
+		for i := range agents {
+			candidate := agents[i]
+			if candidate.ContainerID != id && candidate.Name != id && len(agents) > 1 {
+				continue
+			}
+			namespace, podName, ok := podTargetFromAgent(candidate)
+			if ok {
+				return namespace, podName, &candidate, nil
+			}
+		}
+	}


The error returned by r.List is ignored, and the named return parameter err is never assigned a value, which makes error checks in callers ineffective. Additionally, the conditional block on lines 218-220 is redundant because r.List already filters by the scion.name label. Propagating the error and simplifying the loop improves both robustness and clarity.

	agents, err := r.List(ctx, map[string]string{"scion.name": id})
	if err != nil {
		return "", "", nil, fmt.Errorf("failed to list agents for resolution: %w", err)
	}
	for i := range agents {
		candidate := agents[i]
		namespace, podName, ok := podTargetFromAgent(candidate)
		if ok {
			return namespace, podName, &candidate, nil
		}
	}

gemini-code-assist · 2026-04-12T05:40:45Z

+	if agent != nil && agent.Annotations != nil {
+		if u := strings.TrimSpace(agent.Annotations["scion.username"]); u != "" && validExecUsername.MatchString(u) {
+			username = u
+		}
 	}


This block is redundant. execTargetUsername(pod) (called on line 1849) already retrieves and validates the scion.username annotation from the pod metadata. Since the agent object's annotations are identical to the pod's annotations, this manual check duplicates logic. Removing this block would also allow for the removal of the unused agent variable returned by resolvePodTarget on line 1825.
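For readers unfamiliar with the annotation check being discussed, a single helper of roughly this shape is enough to cover both call sites. This is a sketch only: the pattern assigned to validExecUsername here is an assumed POSIX-style login-name pattern, not the one used in the repository, and usernameFromAnnotations is a hypothetical name.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// validExecUsername is an assumed pattern for this sketch; the actual
// pattern in the repository may differ.
var validExecUsername = regexp.MustCompile(`^[a-z_][a-z0-9_-]{0,31}$`)

// usernameFromAnnotations returns the trimmed scion.username annotation when
// it is present and valid, and fallback otherwise. Indexing a nil map is safe
// in Go, so no separate nil check on annotations is required.
func usernameFromAnnotations(annotations map[string]string, fallback string) string {
	u := strings.TrimSpace(annotations["scion.username"])
	if u != "" && validExecUsername.MatchString(u) {
		return u
	}
	return fallback
}

func main() {
	fmt.Println(usernameFromAnnotations(map[string]string{"scion.username": " dev "}, "root"))
	fmt.Println(usernameFromAnnotations(nil, "root"))
}
```

Keeping the validation in one helper means a rejected value (for example one containing shell metacharacters) is rejected consistently, whichever object the annotations were read from.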

mfreeman451 · 2026-04-12T05:50:34Z

Addressed the latest review feedback on this branch.

Changes in the new head (45dcfd8b):

commandForExec(...) now uses the pod security context as the cheap primary path and only probes the live pod user when the spec does not already tell us the container runs as non-root
resolvePodTarget(...) now propagates List(...) failures instead of silently discarding them, and the redundant post-filtering is gone
Attach(...) now relies on execTargetUsername(pod) directly and no longer re-validates the same scion.username annotation from the resolved agent metadata
added focused tests for the non-root exec fast path and the new resolvePodTarget(...) error propagation behavior
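The error-propagation behavior described above can be checked with a failing fake lister. The sketch below is illustrative and not the test added in this PR: resolve, errListFailed, and the []string result are hypothetical simplifications; only the wrapping message and the use of %w follow the reviewer's suggestion.

```go
package main

import (
	"errors"
	"fmt"
)

// errListFailed is a hypothetical sentinel used only by this sketch.
var errListFailed = errors.New("list failed")

// resolve is a simplified stand-in for slug resolution: it wraps any
// list error with %w so callers can still match the underlying cause.
func resolve(id string, list func(string) ([]string, error)) (string, error) {
	pods, err := list(id)
	if err != nil {
		return "", fmt.Errorf("failed to list agents for resolution: %w", err)
	}
	if len(pods) == 0 {
		return "", fmt.Errorf("no pod found for agent %q", id)
	}
	return pods[0], nil
}

func main() {
	failing := func(string) ([]string, error) { return nil, errListFailed }
	_, err := resolve("k8s-reverify-otelenv-233355", failing)
	// errors.Is sees through the %w wrapper to the original error.
	fmt.Println(err, errors.Is(err, errListFailed))
}
```

Because the error is wrapped rather than discarded, a caller such as GetLogs can distinguish a failed listing from a genuinely missing pod.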

Validation:

go test ./pkg/runtime -run 'TestKubernetesRuntime_(ResolvePodTargetFromHostedSlug|ResolvePodTarget_PropagatesListError|CommandForExec_SkipsProbeForNonRootPod|List|List_TerminalPhases)$|TestSelectLogContainer$'
go test ./pkg/runtimebroker -run 'Test(MethodNotAllowed|AgentLogsAllowsGet)$'

gemini-code-assist Bot reviewed Apr 12, 2026


mfreeman451 force-pushed the fix/k8s-runtime-pod-target-resolution branch from de24339 to 45dcfd8 on April 12, 2026, 05:50

mfreeman451 wants to merge 1 commit into GoogleCloudPlatform:main from mfreeman451:fix/k8s-runtime-pod-target-resolution
