Resolve hosted Kubernetes agent slugs to pod targets #140

Open
mfreeman451 wants to merge 1 commit into GoogleCloudPlatform:main from mfreeman451:fix/k8s-runtime-pod-target-resolution

@mfreeman451
Contributor

Summary

This fixes hosted Kubernetes runtime operations that still assumed the incoming agent identifier was the pod name.

Changes

  • add resolvePodTarget(...) in pkg/runtime/k8s_runtime.go
  • resolve agent slug to the actual Kubernetes pod name/namespace using List() metadata when available
  • use that resolver in GetLogs, Attach, and Exec
  • keep namespace/pod input working as-is
  • add focused test coverage for hosted slug -> pod target resolution in pkg/runtime/k8s_runtime_test.go

Why

Hosted agents use a slug like k8s-reverify-otelenv-233355, while the actual pod name is prefixed, for example global--k8s-reverify-otelenv-233355. Before this change, broker-backed logs and related runtime operations could fail with pod-not-found errors because the runtime tried to use the slug as the pod name.

Validation

  • go test ./pkg/runtime -run 'TestKubernetesRuntime_(List|List_TerminalPhases|ResolvePodTargetFromHostedSlug)$|TestSelectLogContainer$' -count=1
  • go test ./pkg/runtimebroker -run 'Test(MethodNotAllowed|AgentLogsAllowsGet)$' -count=1
  • verified live on scion-int by rerunning scion logs <slug> --global --hub http://127.0.0.1:18080 against an existing hosted agent that previously failed with pods "<slug>" not found

Contributor

@gemini-code-assist (bot) left a comment

Code Review

This pull request enhances the Kubernetes runtime by improving pod target resolution and user handling for exec and attach operations. It introduces logic to dynamically determine the target execution user based on pod annotations and security contexts, and adds a helper to select the appropriate container for logs. Feedback focuses on optimizing performance by reducing redundant API calls when probing the current user, improving error handling and logic simplification in pod resolution, and removing redundant annotation checks in the attach flow.

Comment on lines +157 to +160
	currentUser, err := r.currentExecUser(ctx, namespace, podName)
	if err == nil && currentUser != "" {
		return buildExecCommandForUser(currentUser, targetUser, cmd), nil
	}

medium

Probing the current user via currentExecUser (which performs an exec into the pod) on every Exec call introduces significant latency and doubles the number of API interactions for every command. Consider optimizing this by relying on the SecurityContext from the pod spec as the primary source of truth, or caching the effective user after the first probe.
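
Both suggestions (prefer the pod spec, cache the probed user) can be combined in one small helper. The sketch below is illustrative only: `userResolver`, `prober`, and `effectiveUser` are hypothetical names, and `runAsUser` merely mirrors the optional `RunAsUser` field of a Kubernetes security context rather than reading a real pod object.

```go
package main

import (
	"fmt"
	"strconv"
	"sync"
)

// prober is a hypothetical stand-in for the expensive exec-based user probe.
type prober func(namespace, pod string) (string, error)

// userResolver prefers the UID declared in the pod spec and caches the
// result of the first successful probe per pod.
type userResolver struct {
	mu    sync.Mutex
	cache map[string]string
	probe prober
}

func newUserResolver(p prober) *userResolver {
	return &userResolver{cache: make(map[string]string), probe: p}
}

// effectiveUser returns the user a command would run as in the pod.
// runAsUser is nil when the spec does not set a user.
func (r *userResolver) effectiveUser(namespace, pod string, runAsUser *int64) (string, error) {
	// Cheap path: the spec already states the user, so no exec is needed.
	if runAsUser != nil {
		return strconv.FormatInt(*runAsUser, 10), nil
	}
	key := namespace + "/" + pod
	// The lock is held across the probe, which serializes probes but
	// guarantees at most one probe per pod.
	r.mu.Lock()
	defer r.mu.Unlock()
	if u, ok := r.cache[key]; ok {
		return u, nil
	}
	u, err := r.probe(namespace, pod)
	if err != nil {
		return "", fmt.Errorf("probe exec user for %s: %w", key, err)
	}
	r.cache[key] = u
	return u, nil
}

func main() {
	probes := 0
	r := newUserResolver(func(ns, pod string) (string, error) {
		probes++
		return "root", nil
	})
	uid := int64(1000)
	u, _ := r.effectiveUser("agents", "pod-a", &uid)
	fmt.Println(u, probes) // spec path, no probe
	r.effectiveUser("agents", "pod-b", nil)
	r.effectiveUser("agents", "pod-b", nil)
	fmt.Println(probes) // second call is served from the cache
}
```

A cache keyed only by namespace and pod name would go stale if a pod were recreated under the same name with a different user, so a real implementation would likely key on the pod UID or bound the entry lifetime.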

Comment thread pkg/runtime/k8s_runtime.go Outdated
Comment on lines +214 to +226
	agents, listErr := r.List(ctx, map[string]string{"scion.name": id})
	if listErr == nil {
		for i := range agents {
			candidate := agents[i]
			if candidate.ContainerID != id && candidate.Name != id && len(agents) > 1 {
				continue
			}
			namespace, podName, ok := podTargetFromAgent(candidate)
			if ok {
				return namespace, podName, &candidate, nil
			}
		}
	}

medium

The error returned by r.List is ignored, and the named return parameter err is never assigned a value, which makes error checks in callers ineffective. Additionally, the conditional block on lines 218-220 is redundant because r.List already filters by the scion.name label. Propagating the error and simplifying the loop improves both robustness and clarity.

	agents, err := r.List(ctx, map[string]string{"scion.name": id})
	if err != nil {
		return "", "", nil, fmt.Errorf("failed to list agents for resolution: %w", err)
	}
	for i := range agents {
		candidate := agents[i]
		namespace, podName, ok := podTargetFromAgent(candidate)
		if ok {
			return namespace, podName, &candidate, nil
		}
	}

Comment thread pkg/runtime/k8s_runtime.go Outdated
Comment on lines 1850 to 1854
	if agent != nil && agent.Annotations != nil {
		if u := strings.TrimSpace(agent.Annotations["scion.username"]); u != "" && validExecUsername.MatchString(u) {
			username = u
		}
	}

medium

This block is redundant. execTargetUsername(pod) (called on line 1849) already retrieves and validates the scion.username annotation from the pod metadata. Since the agent object's annotations are identical to the pod's annotations, this manual check duplicates logic. Removing this block would also allow for the removal of the unused agent variable returned by resolvePodTarget on line 1825.

@mfreeman451 force-pushed the fix/k8s-runtime-pod-target-resolution branch from de24339 to 45dcfd8 on April 12, 2026 05:50
@mfreeman451
Contributor Author

Addressed the latest review feedback on this branch.

Changes in the new head (45dcfd8b):

  • commandForExec(...) now uses the pod security context as the cheap primary path and only probes the live pod user when the spec does not already tell us the container runs as non-root
  • resolvePodTarget(...) now propagates List(...) failures instead of silently discarding them, and the redundant post-filtering is gone
  • Attach(...) now relies on execTargetUsername(pod) directly and no longer re-validates the same scion.username annotation from the resolved agent metadata
  • added focused tests for the non-root exec fast path and the new resolvePodTarget(...) error propagation behavior

Validation:

  • go test ./pkg/runtime -run 'TestKubernetesRuntime_(ResolvePodTargetFromHostedSlug|ResolvePodTarget_PropagatesListError|CommandForExec_SkipsProbeForNonRootPod|List|List_TerminalPhases)$|TestSelectLogContainer$'
  • go test ./pkg/runtimebroker -run 'Test(MethodNotAllowed|AgentLogsAllowsGet)$'
