fix(images): change agent user UID from 1000 to 1001 #466

Merged
jonwiggins merged 1 commit into jonwiggins:main from nethi:fix/agent-uid-1001 on Apr 20, 2026

@nethi (Contributor) commented on Apr 20, 2026

Problem

StatefulSet pods were crashing with git permission errors:

error: could not lock config file /.gitconfig: Permission denied

Root Cause

Mismatch between Dockerfile and K8s securityContext:

  • Dockerfile (images/base.Dockerfile): Created agent user with UID 1000
  • K8s pod (k8s-workload-service.ts): Runs as UID 1001

When the pod runs as UID 1001 with no matching user in /etc/passwd:

  • No HOME directory is set
  • Git falls back to / for .gitconfig
  • Permission denied on /.gitconfig
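
The failure chain above can be sketched in TypeScript. This is only an illustration, not code from the repository: `homeForUid` and the sample passwd text are hypothetical, and the fallback to `/` mirrors the behavior described in this PR rather than a documented git guarantee.

```typescript
// Hypothetical sketch: resolve a home directory for a UID from passwd-format text.
// Each line has the fields name:password:uid:gid:gecos:home:shell

function homeForUid(passwdText: string, uid: number): string | undefined {
  for (const line of passwdText.split("\n")) {
    if (line.trim() === "" || line.startsWith("#")) continue;
    const fields = line.split(":");
    if (fields.length < 7) continue; // skip malformed entries
    if (Number(fields[2]) === uid) return fields[5];
  }
  return undefined; // no passwd entry for this UID
}

// Assumed image state before this PR: agent was created with UID 1000.
const passwdBefore = [
  "root:x:0:0:root:/root:/bin/bash",
  "agent:x:1000:1000::/home/agent:/bin/bash",
].join("\n");

const podUid = 1001; // runAsUser from the K8s securityContext
const home = homeForUid(passwdBefore, podUid);

// With no entry for 1001 there is no home directory, so the config path
// collapses to the filesystem root, which the pod user cannot write to.
const gitconfigPath = `${home ?? ""}/.gitconfig`;
console.log(gitconfigPath); // "/.gitconfig"
```

With the agent user at UID 1001 instead, the same lookup would return `/home/agent` and the config file would land in a writable location.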

History

  • Commit 0a32e34 (Apr 15): Set both Dockerfile and K8s to UID 1000 ✅
  • Commit eeaa4ba (later): Changed K8s spec to UID 1001, but incorrectly assumed the Dockerfile was already 1001 ❌

The commit message in eeaa4ba stated:

"runAsUser: 1000 doesn't match image's agent UID 1001... Changed all three fields to 1001."

But the image was still at UID 1000, creating the mismatch.

Solution

Changed agent user from UID 1000 to UID 1001 in base.Dockerfile:

  • Removed ubuntu user deletion (no longer needed - agent naturally gets 1001 as next available UID)
  • Simplified Dockerfile by removing unnecessary workaround steps
  • Now matches K8s securityContext at UID 1001
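
Why the agent user "naturally" gets 1001 can be sketched as follows. This is a simplified model of the default `useradd` behavior (smallest ID at or above `UID_MIN` that is greater than every existing user in range), not the real implementation. It assumes the base image ships an `ubuntu` user at UID 1000, which the removed deletion step suggests but this PR does not state outright; `nextAvailableUid` and the sample UID lists are hypothetical.

```typescript
// Hypothetical model of useradd's default UID selection.
// uidMin/uidMax mirror the common UID_MIN/UID_MAX defaults in /etc/login.defs.
function nextAvailableUid(
  existingUids: number[],
  uidMin = 1000,
  uidMax = 60000,
): number {
  // Only regular-user UIDs count; system accounts and "nobody" are ignored.
  const inRange = existingUids.filter((u) => u >= uidMin && u <= uidMax);
  const next = inRange.length === 0 ? uidMin : Math.max(...inRange) + 1;
  if (next > uidMax) throw new Error("no free UID in range");
  return next;
}

// Assumption: root (0), daemon (1), nobody (65534), and ubuntu (1000) exist.
const withUbuntu = [0, 1, 65534, 1000];
console.log(nextAvailableUid(withUbuntu)); // 1001

// If the ubuntu user were deleted first, the agent user would get 1000 again.
const withoutUbuntu = [0, 1, 65534];
console.log(nextAvailableUid(withoutUbuntu)); // 1000
```

Under this model the resulting UID depends on which users the base image already contains, so the match with the pod's UID 1001 holds only while that assumption does.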

Testing

  • ✅ Rebuilt all agent images (base, node, python, go, rust, full)
  • ✅ Created StatefulSet pod - started successfully
  • ✅ Git clone completed without permission errors
  • ✅ All pre-commit hooks passed (lint, format, typecheck)
  • ✅ All 2005 API tests passed

Files Changed

  • images/base.Dockerfile - Simplified agent user creation to use UID 1001

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

@jonwiggins (Owner) commented

The UID fix itself is correct and worth merging — apps/api/src/services/k8s-workload-service.ts:531-538 runs agent pods as UID/GID 1001 with a comment claiming it matches images/base.Dockerfile, but the Dockerfile currently creates the agent user at UID 1000. Right now agents only work because the home-perm-fix init container (running as root) chowns /home/agent every startup. Aligning the Dockerfile to 1001 is the right call.
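
A check for this kind of drift could look like the sketch below. It is illustrative only: the field names follow the Kubernetes pod security context, but which three fields the service sets is an assumption (`runAsUser`, `runAsGroup`, `fsGroup`), and `findIdMismatches` is a hypothetical helper, not something in `k8s-workload-service.ts`.

```typescript
// Assumed shape of the securityContext set on agent pods.
interface PodSecurityContext {
  runAsUser: number;
  runAsGroup: number;
  fsGroup: number;
}

// Values described in this thread: the pod spec uses 1001 for UID and GID.
const agentSecurityContext: PodSecurityContext = {
  runAsUser: 1001,
  runAsGroup: 1001,
  fsGroup: 1001,
};

// Hypothetical helper: list every field that disagrees with the image's agent user.
function findIdMismatches(
  ctx: PodSecurityContext,
  imageUid: number,
  imageGid: number,
): string[] {
  const problems: string[] = [];
  if (ctx.runAsUser !== imageUid) {
    problems.push(`runAsUser ${ctx.runAsUser} != image UID ${imageUid}`);
  }
  if (ctx.runAsGroup !== imageGid) {
    problems.push(`runAsGroup ${ctx.runAsGroup} != image GID ${imageGid}`);
  }
  if (ctx.fsGroup !== imageGid) {
    problems.push(`fsGroup ${ctx.fsGroup} != image GID ${imageGid}`);
  }
  return problems;
}

// Before this PR: image agent user at 1000, so all three fields disagree.
console.log(findIdMismatches(agentSecurityContext, 1000, 1000));
// After this PR: image agent user at 1001, so nothing is reported.
console.log(findIdMismatches(agentSecurityContext, 1001, 1001));
```

A check like this in a test or a build step would surface the mismatch directly, rather than leaving it masked by the init container's chown.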

However, the branch can't be reviewed in this state. It has 13 commits including copies of #462, #463, #464, three Merge upstream/main commits, and six drive-by changes that never got their own PRs:

  • apps/api/src/services/repo-service.ts — workspace fallback in getRepoByUrl
  • apps/api/src/services/ticket-sync-service.ts — nullable GitHub API name field
  • apps/api/src/routes/github-token.ts, apps/api/src/routes/setup.ts
  • apps/api/vitest.config.ts + apps/api/src/services/review-service.test.ts - OPTIO_AUTH_DISABLED=true in test env
  • scripts/setup-local.sh - --no-build flag + K8s version regex fix

Could you rebase this onto fresh main so the PR contains only the images/base.Dockerfile UID change? #462, #463, #461 have now merged, so those will drop out on rebase automatically.

The drive-bys look potentially useful — please split them into separate PRs so each can be reviewed on its own merits. Happy to fast-track the ones that are obviously correct once they're isolated.

The K8s securityContext in k8s-workload-service.ts sets runAsUser=1001,
but the base image was creating the agent user with UID 1000. This
mismatch caused git to fail with permission errors because:

- Pod runs as UID 1001 (no matching user in /etc/passwd)
- No HOME directory set for UID 1001
- Git falls back to / for .gitconfig
- Permission denied: /.gitconfig

Fix:
- Changed agent user from UID 1000 to UID 1001
- Removed ubuntu user deletion (no longer needed - agent naturally gets 1001)
- Simplified Dockerfile by removing unnecessary steps

This aligns with commit eeaa4ba which changed K8s to UID 1001.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
@nethi force-pushed the fix/agent-uid-1001 branch from 73eea9f to 4bb3d4c on April 20, 2026 18:04
@jonwiggins merged commit 8a7db32 into jonwiggins:main on Apr 20, 2026
7 checks passed
jplorier pushed a commit to jplorier/optio that referenced this pull request May 5, 2026
Co-authored-by: Ramesh Nethi <r.nethi@gogatewayai.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>