feat: add imagePullSecrets support for container-based skills#1725

Open
ppeau wants to merge 3 commits into kagent-dev:main from ppeau:feat/skills-with-imagepullsecrets

@ppeau ppeau commented Apr 21, 2026

Closes #1222

Problem

Container-based skills using krane to pull OCI images had no way to authenticate against private registries (Artifactory, ACR, ECR, etc.). The imagePullSecrets defined on the agent deployment were not passed to the skills-init init container, causing authentication failures like:
No matching credentials were found for "docker.artifactory.dev.example.com"
Error: pulling ...: Authentication is required

Solution

Follows the approach discussed in #1222 by @s10gopal:

  1. Added an imagePullSecrets field under spec.skills accepting a list of kubernetes.io/dockerconfigjson secrets
  2. When imagePullSecrets is set, a new docker-auth-init init container is prepended — it merges all referenced secrets into a single config.json using jq
  3. The skills-init container reads that merged config via the DOCKER_CONFIG env var, which krane picks up automatically when pulling skill images

Changes

  • go/api/v1alpha2/agent_types.go: add ImagePullSecrets []corev1.LocalObjectReference to SkillForAgent struct
  • go/api/v1alpha2/zz_generated.deepcopy.go: regenerated DeepCopy for new field
  • go/core/internal/controller/translator/agent/adk_api_translator.go: buildSkillsInitContainer now returns []Container, prepends docker-auth-init when imagePullSecrets are present
  • docker/skills-init/Dockerfile: add jq to the Alpine base image
  • .gitattributes: enforce LF line endings on *.sh.tmpl files (prevents shell script breakage for contributors on Windows)

Usage

apiVersion: kagent.dev/v1alpha2
kind: Agent
spec:
  skills:
    refs:
      - private-registry.example.com/my-org/my-skill:v1
    imagePullSecrets:
      - name: my-registry-secret  # kubernetes.io/dockerconfigjson secret

Testing

Validated end-to-end on a local Kubernetes cluster with a private registry protected by htpasswd authentication:

  • Skill image hosted on the private registry, inaccessible without credentials
  • Agent configured with imagePullSecrets referencing a dockerconfigjson secret
  • docker-auth-init merged the credentials, skills-init pulled the image successfully via krane
  • Skill was correctly loaded and executed by the agent

Contributor

Copilot AI left a comment

Pull request overview

Adds first-class support for authenticating OCI pulls for container-based skills by allowing agents to reference kubernetes.io/dockerconfigjson secrets and wiring those credentials into the skills-init workflow.

Changes:

  • Extend the Agent/SandboxAgent spec.skills schema and Go types with imagePullSecrets.
  • Update the agent manifest translation to optionally prepend a docker-auth-init initContainer that merges multiple dockerconfigjson secrets into a single Docker config.json, and set DOCKER_CONFIG for skills-init.
  • Update the skills-init image to include jq, and add unit/e2e coverage for the new behavior.

Reviewed changes

Copilot reviewed 12 out of 12 changed files in this pull request and generated 4 comments.

Summary per file:

  • helm/kagent-crds/templates/kagent.dev_sandboxagents.yaml: Exposes spec.skills.imagePullSecrets in the Helm-rendered SandboxAgent CRD schema.
  • helm/kagent-crds/templates/kagent.dev_agents.yaml: Exposes spec.skills.imagePullSecrets in the Helm-rendered Agent CRD schema.
  • go/core/test/e2e/invoke_api_test.go: Adds an e2e test verifying docker-auth-init is injected and the agent still functions end-to-end.
  • go/core/internal/controller/translator/agent/manifest_builder.go: Passes ImagePullSecrets through and adapts to buildSkillsInitContainer returning multiple init containers.
  • go/core/internal/controller/translator/agent/git_skills_test.go: Adds translator unit tests validating volumes/mounts/env for imagePullSecrets.
  • go/core/internal/controller/translator/agent/adk_api_translator.go: Implements docker-auth-init, merge script generation, volume/mount wiring, and DOCKER_CONFIG env injection.
  • go/api/v1alpha2/zz_generated.deepcopy.go: Regenerates DeepCopy to include the new ImagePullSecrets field.
  • go/api/v1alpha2/agent_types.go: Adds ImagePullSecrets []LocalObjectReference to SkillForAgent.
  • go/api/config/crd/bases/kagent.dev_sandboxagents.yaml: Updates the base CRD schema for SandboxAgent to include imagePullSecrets.
  • go/api/config/crd/bases/kagent.dev_agents.yaml: Updates the base CRD schema for Agent to include imagePullSecrets.
  • docker/skills-init/Dockerfile: Installs jq so the merge init container can build a combined Docker config.
  • .gitattributes: Forces LF endings for *.sh.tmpl to avoid cross-platform template/script breakage.

Comment on lines 381 to +396
  container, skillsVolumes, err := buildSkillsInitContainer(
      gitRefs,
      spec.Skills.GitAuthSecretRef,
      skills,
      spec.Skills.InsecureSkipVerify,
      manifestCtx.deployment.SecurityContext,
      initEnv,
      getDefaultResources(initResources),
+     spec.Skills.ImagePullSecrets,
  )
  if err != nil {
      return nil, fmt.Errorf("failed to build skills init container: %w", err)
  }

  *volumes = append(*volumes, skillsVolumes...)
- return []corev1.Container{container}, nil
+ return container, nil
Copilot AI Apr 21, 2026

buildSkillsInitContainer now returns a slice of containers, but the receiving variable is still named container, which makes the call site harder to read. Consider renaming it to containers (or similar) to reflect the type and avoid confusion.

Comment on lines +1325 to +1334
for _, secret := range imagePullSecrets {
	volName := "pull-secret-" + secret.Name
	volumes = append(volumes, corev1.Volume{
		Name: volName,
		VolumeSource: corev1.VolumeSource{
			Secret: &corev1.SecretVolumeSource{
				SecretName: secret.Name,
			},
		},
	})
Copilot AI Apr 21, 2026

Volume names are derived from the Secret name ("pull-secret-" + secret.Name). Kubernetes Secret names may contain characters (notably '.') or reach lengths that are invalid for Pod volume names, which can cause the generated Deployment to fail admission. Consider generating a safe volume name (e.g., index-based or hashed) while keeping the SecretName field pointed at the original secret.

Comment on lines +1313 to +1356
// Shared EmptyDir volume for the merged Docker config.
volumes = append(volumes, corev1.Volume{
	Name: "kagent-docker-config",
	VolumeSource: corev1.VolumeSource{
		EmptyDir: &corev1.EmptyDirVolumeSource{},
	},
})

// Mount each imagePullSecret as a read-only directory under /docker-secrets/<name>.
authInitVolumeMounts := []corev1.VolumeMount{
	{Name: "kagent-docker-config", MountPath: "/docker-config-out"},
}
for _, secret := range imagePullSecrets {
	volName := "pull-secret-" + secret.Name
	volumes = append(volumes, corev1.Volume{
		Name: volName,
		VolumeSource: corev1.VolumeSource{
			Secret: &corev1.SecretVolumeSource{
				SecretName: secret.Name,
			},
		},
	})
	authInitVolumeMounts = append(authInitVolumeMounts, corev1.VolumeMount{
		Name:      volName,
		MountPath: "/docker-secrets/" + secret.Name,
		ReadOnly:  true,
	})
}

mergeScript := buildDockerAuthMergeScript(imagePullSecrets)
dockerAuthInitContainer := corev1.Container{
	Name:         "docker-auth-init",
	Image:        DefaultSkillsInitImageConfig.Image(),
	Command:      []string{"/bin/sh", "-c", mergeScript},
	VolumeMounts: authInitVolumeMounts,
}
containers = append(containers, dockerAuthInitContainer)

// Mount the merged config into skills-init so krane picks it up via DOCKER_CONFIG.
volumeMounts = append(volumeMounts, corev1.VolumeMount{
	Name:      "kagent-docker-config",
	MountPath: "/.kagent/.docker",
	ReadOnly:  true,
})
Copilot AI Apr 21, 2026

imagePullSecrets entries with an empty Name (LocalObjectReference allows this) or repeated names will currently produce invalid SecretVolumeSource.SecretName values and/or duplicate volume names. It would be safer to validate that every reference has a non-empty name and to de-duplicate (or error) before creating volumes/mounts.

Suggested change
validImagePullSecrets := make([]corev1.LocalObjectReference, 0, len(imagePullSecrets))
seenImagePullSecrets := make(map[string]struct{}, len(imagePullSecrets))
for _, secret := range imagePullSecrets {
	if secret.Name == "" {
		continue
	}
	if _, seen := seenImagePullSecrets[secret.Name]; seen {
		continue
	}
	seenImagePullSecrets[secret.Name] = struct{}{}
	validImagePullSecrets = append(validImagePullSecrets, secret)
}
if len(validImagePullSecrets) > 0 {
	// Shared EmptyDir volume for the merged Docker config.
	volumes = append(volumes, corev1.Volume{
		Name: "kagent-docker-config",
		VolumeSource: corev1.VolumeSource{
			EmptyDir: &corev1.EmptyDirVolumeSource{},
		},
	})
	// Mount each imagePullSecret as a read-only directory under /docker-secrets/<name>.
	authInitVolumeMounts := []corev1.VolumeMount{
		{Name: "kagent-docker-config", MountPath: "/docker-config-out"},
	}
	for _, secret := range validImagePullSecrets {
		volName := "pull-secret-" + secret.Name
		volumes = append(volumes, corev1.Volume{
			Name: volName,
			VolumeSource: corev1.VolumeSource{
				Secret: &corev1.SecretVolumeSource{
					SecretName: secret.Name,
				},
			},
		})
		authInitVolumeMounts = append(authInitVolumeMounts, corev1.VolumeMount{
			Name:      volName,
			MountPath: "/docker-secrets/" + secret.Name,
			ReadOnly:  true,
		})
	}
	mergeScript := buildDockerAuthMergeScript(validImagePullSecrets)
	dockerAuthInitContainer := corev1.Container{
		Name:         "docker-auth-init",
		Image:        DefaultSkillsInitImageConfig.Image(),
		Command:      []string{"/bin/sh", "-c", mergeScript},
		VolumeMounts: authInitVolumeMounts,
	}
	containers = append(containers, dockerAuthInitContainer)
	// Mount the merged config into skills-init so krane picks it up via DOCKER_CONFIG.
	volumeMounts = append(volumeMounts, corev1.VolumeMount{
		Name:      "kagent-docker-config",
		MountPath: "/.kagent/.docker",
		ReadOnly:  true,
	})
}

Comment on lines +1344 to +1347
	Name:         "docker-auth-init",
	Image:        DefaultSkillsInitImageConfig.Image(),
	Command:      []string{"/bin/sh", "-c", mergeScript},
	VolumeMounts: authInitVolumeMounts,
Copilot AI Apr 21, 2026

docker-auth-init is created without SecurityContext or resource requirements, while skills-init uses the pod/deployment securityContext and configured resources. This can cause PodSecurity admission failures or unexpected resource usage differences. Consider applying the same initSecCtx and resources (or a deliberate minimal set) to docker-auth-init as well.

Suggested change
	Name:            "docker-auth-init",
	Image:           DefaultSkillsInitImageConfig.Image(),
	Command:         []string{"/bin/sh", "-c", mergeScript},
	VolumeMounts:    authInitVolumeMounts,
	SecurityContext: initSecCtx,
	Resources:       resources,

@ppeau ppeau force-pushed the feat/skills-with-imagepullsecrets branch from 2129a46 to 58d8a73 Compare April 21, 2026 19:33
Contributor

@EItanya EItanya left a comment

Overall this makes sense, but before we go down the road of adding a new API, is there any way we can reuse the ImagePullSecrets that are already used for image pulling? Or do those remain on the node and never get mounted into the pod?

Author

ppeau commented Apr 29, 2026

Hi @EItanya, great question. We actually explored reusing the existing deployment imagePullSecrets first before adding a new field.

On the technical side, imagePullSecrets defined on a pod spec are consumed exclusively by the kubelet to pull container images. They are never mounted into the pod or made accessible to running containers. This means krane, executing inside the skills-init init container, has no way to read those credentials. We hit this wall directly during testing.

Even if it were technically possible, there is a design reason why it would not be the right approach in enterprise environments. The registry used to deploy the kagent system and the registry hosting skill images are typically owned by completely different teams with different security boundaries. The platform/ops team manages the kagent deployment and its registry credentials, while the line-of-business team produces and owns the skill images, hosted in their own private registry (Artifactory, ACR, ECR, etc.). Reusing the deployment imagePullSecrets would couple these two security contexts together, violate the principle of least privilege, and make skills effectively unusable for any team whose registry differs from the one used to deploy kagent. That is the common case at scale.

The new imagePullSecrets field under spec.skills directly mirrors the standard Kubernetes pattern where a pod can reference multiple imagePullSecrets for different registries. It introduces no new concept, just applies the same model at the skill level.

Happy to discuss further if needed!

Contributor

EItanya commented Apr 30, 2026

Ok I buy that logic. What do you think about renaming the field to PullSecrets instead of ImagePullSecrets, since these aren't really images?

Author

ppeau commented Apr 30, 2026

Good point, I can see both sides here.

For keeping imagePullSecrets: Skills are stored and pulled as OCI artifacts using the exact same kubernetes.io/dockerconfigjson secrets as normal container images. The name imagePullSecrets is the standard Kubernetes convention, so it feels familiar right away. Anyone who’s used Kubernetes already knows what it means and how to set it up.

For renaming to pullSecrets: Skills aren’t executed as containers, they’re more like content or configuration that gets pulled. The original imagePullSecrets name is specifically tied to pulling runnable container images at the pod level, so using it in this context could feel a bit overloaded. pullSecrets is more neutral and probably more accurate here.

Both options are valid.
I’m happy to go with pullSecrets if you prefer it.

Just say the word and I’ll rename the field right away. 👍

Contributor

EItanya commented May 1, 2026

Ok, I buy that logic; let's stick with it for now. Just resolve the merge conflicts and we'll get this merged.

@ppeau ppeau force-pushed the feat/skills-with-imagepullsecrets branch from acde26f to 2689d6c Compare May 1, 2026 14:40
Author

ppeau commented May 1, 2026

Done! ✅
I’ve resolved the merge conflicts.

Comment on lines +1449 to +1463
func buildDockerAuthMergeScript(imagePullSecrets []corev1.LocalObjectReference) string {
	var sb strings.Builder
	sb.WriteString(`set -e
mkdir -p /docker-config-out
merged='{"auths":{}}'
`)
	for _, secret := range imagePullSecrets {
		sb.WriteString(`if [ -f /docker-secrets/` + secret.Name + `/.dockerconfigjson ]; then
merged="$(printf '%s\n%s\n' "$merged" "$(cat /docker-secrets/` + secret.Name + `/.dockerconfigjson)" | jq -s '.[0].auths * .[1].auths | {"auths": .}')"
fi
`)
	}
	sb.WriteString(`printf '%s' "$merged" > /docker-config-out/config.json
`)
	return sb.String()
Contributor

Sorry for the continued reviews. Is there any way you could put this into a tmpl file, similar to this existing one? In the future I want to move away from these scripts altogether, but I think they're a bit simpler to understand for now.

Author

@ppeau ppeau May 4, 2026

Perfect, I'll check that out right away. I'll get back to you as soon as it's ready.

Author

@ppeau ppeau May 4, 2026

Hi @EItanya, done! The docker-auth-init script has been moved to a dedicated docker-auth-init.sh.tmpl file, following the same pattern as skills-init.sh.tmpl (//go:embed + template.Must + typed data struct). All existing tests pass, including the imagePullSecrets ones.

I'll let you resolve the conversation if it looks good to you 👍

@ppeau ppeau force-pushed the feat/skills-with-imagepullsecrets branch from daaa447 to 9a9ce6e Compare May 5, 2026 15:10
Contributor

EItanya commented May 7, 2026

Hey there, I'm sorry for going back and forth about this PR, but I have some more questions. It's not clear to me why we need a new container for this. Why can't we run this logic inside the existing skills-init container?

Author

ppeau commented May 7, 2026

Hi @EItanya, completely fair question. Technically yes, we could add the jq merge at the start of the skills-init script.

We went with a separate docker-auth-init container because the two-container approach was the design that came out of the discussion in #1222, and it felt like a clean separation between auth setup and skill pulling. skills-init.sh.tmpl is already handling git auth, SSH keys, and krane pulls, so adding credential merging on top would blur its responsibility. It also gives better failure isolation: if credentials fail to merge, Kubernetes reports the failed init container by name immediately, without having to parse skills-init logs.

Worth noting too that docker-auth-init only runs when imagePullSecrets is set, so existing deployments without private registries are completely unaffected.

That said, if you strongly prefer consolidating into skills-init, we are happy to do that. Just let us know!

Contributor

EItanya commented May 7, 2026

Although I agree with these, I also think that adding a new container comes with its own difficulties that I'd rather avoid. For example, it means adding new SecurityContext and resource requirement options, which clog up the Agent object. As I've mentioned before, we're getting to the point where we really need to turn this skills-init container into a golang program so it's clearer what it's actually doing, and I think it should be responsible for all pieces related to that.

Author

ppeau commented May 7, 2026

Totally makes sense, thanks for taking the time to explain your reasoning!
I'll consolidate the credential merge logic directly into skills-init and drop the separate docker-auth-init container.

I'll get back to you once it's ready! 👍

ppeau added 3 commits May 7, 2026 19:10
Add authentication support for pulling skill images from private
registries (Artifactory, ACR, ECR, etc.) by introducing a new
imagePullSecrets field under spec.skills.

When imagePullSecrets is set, a docker-auth-init init container is
prepended that merges all kubernetes.io/dockerconfigjson secrets into
a single config.json using jq. The skills-init container then reads
that config via the DOCKER_CONFIG env var, which krane picks up
automatically when pulling skill images.

Closes kagent-dev#1222

Signed-off-by: ppeau <patrice.peau@gmail.com>
Signed-off-by: ppeau <patrice.peau@gmail.com>
Previously, when imagePullSecrets were specified, the controller created
two init containers: docker-auth-init (to merge dockerconfigjson secrets
into a shared EmptyDir volume) and skills-init (to pull OCI/git skills).

This commit eliminates the separate docker-auth-init container by
embedding the credential merge logic directly into the skills-init shell
script template. Each imagePullSecret is now mounted directly on
skills-init under /docker-secrets/<name>; the script merges them with jq
into /tmp/kagent-docker-config/config.json and exports DOCKER_CONFIG
before invoking krane.

Changes:
- Remove docker-auth-init.sh.tmpl and all associated Go code
- Add ImagePullSecrets []string field to skillsInitData
- Render credential merge block at top of skills-init.sh.tmpl
- Mount pull-secret volumes directly on skills-init container
- Update unit tests: assert exactly one init container, no
  docker-auth-init, no kagent-docker-config EmptyDir volume
- Update E2E test: verify single init container and script content

Signed-off-by: ppeau <patrice.peau@gmail.com>
@ppeau ppeau force-pushed the feat/skills-with-imagepullsecrets branch from b1a02db to 1d6b918 Compare May 7, 2026 23:59
Development

Successfully merging this pull request may close these issues.

[FEATURE] Container-based skills image download authentication support

3 participants