fix: replace worker password with per-task JWT tokens#904
Conversation
Replace the shared-password model for the internal worker user with per-task JWT tokens minted at dispatch time, eliminating the well-known default credential that grants superuser access on unpatched deployments. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Rely on fastapi-users' built-in "user already exists" rejection for registration — a custom guard is YAGNI. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
8-task plan covering config changes, executor simplification, JWT minting via WorkerTokenDep, login guard, broker cleanup, Docker/docs sweep, and integration tests. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…ntials Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add a custom /jwt/login route that shadows the fastapi-users login to reject login attempts with the internal worker email (403), while preserving normal login behavior for all other users. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Removes ZNDRAW_SERVER_WORKER_PASSWORD from docker/templates/.env and the corresponding migration reference from the pydantic-settings plan. Worker authentication now uses JWT tokens, not passwords. Verification: grep -r 'worker_password', 'WORKER_PASSWORD', 'WORKER_EMAIL' all return zero hits (excluding worker-auth-jwt docs). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
for more information, see https://pre-commit.ci
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 WalkthroughWalkthroughThis PR replaces static worker-password authentication with per-task JWT authentication: removes the worker password env var, makes the internal worker user created/updated with a random password, adds a FastAPI dependency to mint short-lived worker JWTs at dispatch time, and updates executors and dispatch paths to accept and forward per-task tokens. Changes
Sequence Diagram(s)sequenceDiagram
actor Client
participant Router as Task Router
participant TokenDep as WorkerTokenDep
participant DB as Database
participant Queue as TaskIQ Queue
participant Executor as Internal Executor
Client->>Router: POST /submit_task (job)
Router->>TokenDep: get_worker_token()
TokenDep->>DB: lookup_worker_user(internal_worker_email)
DB-->>TokenDep: user
TokenDep->>TokenDep: mint short-lived JWT
TokenDep-->>Router: token
Router->>Queue: kiq(task_payload, token=token)
Queue-->>Executor: invoke task with token
Executor->>Executor: ZnDraw(base_url, token=token)
Executor-->>Client: execute extension (authenticated)
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Possibly related PRs
Poem
🚥 Pre-merge checks | ✅ 3✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 3
🧹 Nitpick comments (1)
src/zndraw/database.py (1)
377-386: Consider adding error handling for the session acquisition.If
lookup_worker_userraisesRuntimeError(db not initialized), the error will propagate to the request handler. This is reasonable behavior, but you may want to ensure the error message is user-friendly in the HTTP response.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/zndraw/database.py` around lines 377 - 386, The _mint_worker_token coroutine should catch runtime errors from session acquisition or lookup_worker_user and convert them into a user-friendly HTTP error; wrap the session/lookup call in a try/except for RuntimeError (and optionally Exception), log the original exception, and raise a fastapi.HTTPException with a clear message (e.g., "internal worker not available" or "database not initialized") so callers get a friendly response; keep the rest of the flow (JWTStrategy and strategy.write_token) unchanged and reference _mint_worker_token, lookup_worker_user, app.state.session_maker, and strategy.write_token when implementing the try/except and rethrowing the HTTPException.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@docs/superpowers/plans/2026-04-01-worker-auth-jwt.md`:
- Around line 664-666: Add explicit language identifiers to the unlabeled fenced
code blocks containing the environment variable and the inline diff: change the
block with "ZNDRAW_SERVER_WORKER_PASSWORD=zndraw-worker" to use ```env and
change the block showing the `ZNDRAW_WORKER_PASSWORD` →
`ZNDRAW_SERVER_WORKER_PASSWORD` line to use ```text (or ```diff if you prefer
highlighting), ensuring the fenced backticks around those snippets include the
language identifier so markdownlint MD040 warnings are resolved; locate the
blocks by searching for the strings
"ZNDRAW_SERVER_WORKER_PASSWORD=zndraw-worker" and "ZNDRAW_WORKER_PASSWORD" to
update them.
In `@docs/superpowers/specs/2026-04-01-worker-auth-jwt-design.md`:
- Around line 43-44: The spec contains contradictory statements about
registration (one place says “no custom guard needed” in the Register section
while other places state “register should return 403 / register guards added”);
pick one canonical behavior (either rely on fastapi-users to reject duplicates
with no custom guard, or explicitly require/register custom guard that returns
403) and make all references consistent: update the Register section text, the
later statements that mention “register should return 403 / register guards
added”, any examples, and any expected HTTP status lists to reflect the chosen
behavior; ensure any references to fastapi-users behavior (e.g., duplicate-user
handling) are accurate and that the spec clearly documents the expected HTTP
status codes and guard presence/absence for functions/methods named in the spec
(Register, register guard entries, and any register-related examples).
In `@tests/zndraw/test_worker_auth.py`:
- Around line 83-101: The test mutates app.dependency_overrides by setting
app.dependency_overrides[get_worker_token] = _mint_worker_token and never
restores it, leaking a closure that captures session; wrap the override in a
try/finally (or use a context manager) to save the original override =
app.dependency_overrides.get(get_worker_token) before you set it, run the test
logic (including calling override() and the /v1/auth/users/me request), and in
finally restore app.dependency_overrides[get_worker_token] = original (or delete
the key if original is None) so _mint_worker_token/session aren’t kept across
tests.
---
Nitpick comments:
In `@src/zndraw/database.py`:
- Around line 377-386: The _mint_worker_token coroutine should catch runtime
errors from session acquisition or lookup_worker_user and convert them into a
user-friendly HTTP error; wrap the session/lookup call in a try/except for
RuntimeError (and optionally Exception), log the original exception, and raise a
fastapi.HTTPException with a clear message (e.g., "internal worker not
available" or "database not initialized") so callers get a friendly response;
keep the rest of the flow (JWTStrategy and strategy.write_token) unchanged and
reference _mint_worker_token, lookup_worker_user, app.state.session_maker, and
strategy.write_token when implementing the try/except and rethrowing the
HTTPException.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 5cc38110-943f-4139-92fd-5f6e2d7efe52
📒 Files selected for processing (13)
docker/templates/.envdocs/superpowers/plans/2026-03-25-pydantic-settings-phase1.mddocs/superpowers/plans/2026-04-01-worker-auth-jwt.mddocs/superpowers/specs/2026-04-01-worker-auth-jwt-design.mdsrc/zndraw/broker.pysrc/zndraw/config.pysrc/zndraw/database.pysrc/zndraw/executor.pysrc/zndraw/routes/auth.pysrc/zndraw_joblib/dependencies.pysrc/zndraw_joblib/registry.pysrc/zndraw_joblib/router.pytests/zndraw/test_worker_auth.py
💤 Files with no reviewable changes (2)
- docker/templates/.env
- docs/superpowers/plans/2026-03-25-pydantic-settings-phase1.md
| ``` | ||
| ZNDRAW_SERVER_WORKER_PASSWORD=zndraw-worker | ||
| ``` |
There was a problem hiding this comment.
Add language identifiers to fenced code blocks (MD040).
Line 664 and Line 672 use unlabeled fenced blocks, which triggers markdownlint warnings.
Suggested fix
-```
+```env
ZNDRAW_SERVER_WORKER_PASSWORD=zndraw-worker@@
- +text
ZNDRAW_WORKER_PASSWORD→ZNDRAW_SERVER_WORKER_PASSWORD
Also applies to: 672-674
🧰 Tools
🪛 markdownlint-cli2 (0.22.0)
[warning] 664-664: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@docs/superpowers/plans/2026-04-01-worker-auth-jwt.md` around lines 664 - 666,
Add explicit language identifiers to the unlabeled fenced code blocks containing
the environment variable and the inline diff: change the block with
"ZNDRAW_SERVER_WORKER_PASSWORD=zndraw-worker" to use ```env and change the block
showing the `ZNDRAW_WORKER_PASSWORD` → `ZNDRAW_SERVER_WORKER_PASSWORD` line to
use ```text (or ```diff if you prefer highlighting), ensuring the fenced
backticks around those snippets include the language identifier so markdownlint
MD040 warnings are resolved; locate the blocks by searching for the strings
"ZNDRAW_SERVER_WORKER_PASSWORD=zndraw-worker" and "ZNDRAW_WORKER_PASSWORD" to
update them.
_mint_worker_token now accepts SessionDep so FastAPI reuses the request-scoped session instead of opening a second one (which re-entered the non-reentrant SQLite asyncio.Lock and deadlocked every @internal task submission). JWTStrategy moved to closure scope. Test conftest now wires get_worker_token override (mirrors lifespan), and the test verifies the conftest-wired override — not a hand-rolled one. Also fixes FAST002 lint (Annotated), removes dead TYPE_CHECKING block, corrects stale error message, and updates spec wording. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (1)
src/zndraw/routes/auth.py (1)
78-81: Note: Redundant fastapi-users login route is still included.The
get_auth_router(auth_backend)creates a/loginendpoint that will never be reached since the custom/jwt/loginroute shadows it. Consider whether to:
- Keep it as-is (harmless, provides other auth endpoints like
/logout)- Document that the login route is intentionally shadowed
This is a minor observation - the current approach works correctly and preserves other auth router endpoints.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/zndraw/routes/auth.py` around lines 78 - 81, The included fastapi-users auth router via router.include_router(fastapi_users.get_auth_router(auth_backend), prefix="/jwt") exposes a /jwt/login endpoint that duplicates and is shadowed by your custom /jwt/login, so either remove or document the redundancy; to fix, either remove the get_auth_router(...) include if you only want your custom login and re-add only needed endpoints from fastapi-users (e.g., mount specific routers or use get_register_router/get_reset_password_router), or keep the include and add a code comment in auth.py explaining that get_auth_router(auth_backend) is intentionally included to preserve other endpoints (like /jwt/logout) while the custom /jwt/login shadows the built-in login.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@docs/superpowers/specs/2026-04-01-worker-auth-jwt-design.md`:
- Around line 43-45: The register behavior in auth.py currently mounts
fastapi_users.get_register_router() without the spec-required guard; implement a
custom POST /v1/auth/register handler that checks the internal worker email
constant (e.g., WORKER_EMAIL or the existing internal worker email value used
elsewhere) and returns HTTP 403 when the incoming registration email equals that
value, otherwise delegate to the fastapi-users registration flow (or call the
underlying create user logic); replace or shadow the existing
fastapi_users.get_register_router() registration route so the 403 check runs
first and ensure the new handler uses the same input schema and response codes
as fastapi-users.
---
Nitpick comments:
In `@src/zndraw/routes/auth.py`:
- Around line 78-81: The included fastapi-users auth router via
router.include_router(fastapi_users.get_auth_router(auth_backend),
prefix="/jwt") exposes a /jwt/login endpoint that duplicates and is shadowed by
your custom /jwt/login, so either remove or document the redundancy; to fix,
either remove the get_auth_router(...) include if you only want your custom
login and re-add only needed endpoints from fastapi-users (e.g., mount specific
routers or use get_register_router/get_reset_password_router), or keep the
include and add a code comment in auth.py explaining that
get_auth_router(auth_backend) is intentionally included to preserve other
endpoints (like /jwt/logout) while the custom /jwt/login shadows the built-in
login.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: b60ba8e7-3ca2-4729-9e56-b6ffafb7bdd8
📒 Files selected for processing (7)
Dockerfiledocs/superpowers/specs/2026-04-01-worker-auth-jwt-design.mdsrc/zndraw/database.pysrc/zndraw/routes/auth.pysrc/zndraw_joblib/dependencies.pytests/zndraw/conftest.pytests/zndraw/test_worker_auth.py
🚧 Files skipped from review as they are similar to previous changes (3)
- src/zndraw_joblib/dependencies.py
- src/zndraw/database.py
- tests/zndraw/test_worker_auth.py
| - **Register** (`POST /v1/auth/register`): reject with 403 if email matches. | ||
| Registration would already fail with "user exists", but an explicit check | ||
| avoids leaking that the email is taken and makes the intent clear. |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Search for any register guard implementation in the auth routes
rg -n "register" src/zndraw/routes/auth.py
# Check if there's a custom /register route that shadows the fastapi-users one
ast-grep --pattern 'router.post("/register")'
# Search for any 403 handling related to registration
rg -n -C3 "register.*403|403.*register" src/Repository: zincware/ZnDraw
Length of output: 122
Spec describes register guard, but implementation does not include it.
Lines 43–45 state the register endpoint should "reject with 403 if email matches" the internal worker email, and lines 152, 162, and 177 reference this behavior. However, auth.py line 83 includes fastapi_users.get_register_router() without a custom shadowing route that implements the 403 check.
The current implementation allows registration attempts with the internal worker email to fail with "user exists" rather than returning a clean 403, leaking that the email is taken.
Either implement the register guard in auth.py or update the spec to reflect that this validation is not included.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@docs/superpowers/specs/2026-04-01-worker-auth-jwt-design.md` around lines 43
- 45, The register behavior in auth.py currently mounts
fastapi_users.get_register_router() without the spec-required guard; implement a
custom POST /v1/auth/register handler that checks the internal worker email
constant (e.g., WORKER_EMAIL or the existing internal worker email value used
elsewhere) and returns HTTP 403 when the incoming registration email equals that
value, otherwise delegate to the fastapi-users registration flow (or call the
underlying create user logic); replace or shadow the existing
fastapi_users.get_register_router() registration route so the 403 check runs
first and ensure the new handler uses the same input schema and response codes
as fastapi-users.
submit_task always resolves WorkerTokenDep, even for @global jobs. The joblib test conftest was missing the override, causing 18 test failures with NotImplementedError. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Make get_worker_token a proper FastAPI dependency that reads from request.app.state instead of a stub requiring override in lifespan and test conftest. Remove the YAGNI custom /jwt/login route — the random UUID password is sufficient protection. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@docs/superpowers/specs/2026-04-01-worker-auth-jwt-design.md`:
- Around line 25-28: The spec example and implementation diverge: the spec uses
SecretStr for the generated password (password = SecretStr(str(uuid.uuid4())))
but database.py's ensure_internal_worker implementation calls
password_helper.hash(str(uuid.uuid4())) directly; update the spec example to
match the implementation by removing SecretStr (or alternatively update the
implementation to wrap the generated UUID in SecretStr before hashing if you
prefer that API), and reference the ensure_internal_worker routine and the
password_helper.hash call so the example and actual behavior are consistent.
In `@src/zndraw/database.py`:
- Around line 137-162: The helper lookup_worker_user currently duplicates the
select logic used in get_worker_token; refactor get_worker_token to call
lookup_worker_user(session, settings.internal_worker_email) instead of
re-running the select, removing the duplicated query block; ensure
get_worker_token imports lookup_worker_user, handle/translate the RuntimeError
raised by lookup_worker_user into the same behavior get_worker_token previously
had (e.g., raise HTTPException or return None as appropriate), and delete
lookup_worker_user only if you confirm it's not part of the public API or used
elsewhere.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 35c48c00-6abb-41fc-995f-e32bf538e65f
📒 Files selected for processing (5)
docs/superpowers/specs/2026-04-01-worker-auth-jwt-design.mdsrc/zndraw/database.pysrc/zndraw_joblib/dependencies.pytests/zndraw/conftest.pytests/zndraw/test_worker_auth.py
🚧 Files skipped from review as they are similar to previous changes (3)
- tests/zndraw/test_worker_auth.py
- src/zndraw_joblib/dependencies.py
- tests/zndraw/conftest.py
| ```python | ||
| # database.py — ensure_internal_worker | ||
| password = SecretStr(str(uuid.uuid4())) | ||
| ``` |
There was a problem hiding this comment.
Minor spec/code mismatch in the code example.
The spec shows password = SecretStr(str(uuid.uuid4())), but the actual implementation in database.py directly hashes the UUID string without wrapping in SecretStr:
# Actual code (database.py line 118)
hashed = password_helper.hash(str(uuid.uuid4()))Consider updating the example to match the implementation:
# database.py — ensure_internal_worker
-password = SecretStr(str(uuid.uuid4()))
+hashed = password_helper.hash(str(uuid.uuid4()))🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@docs/superpowers/specs/2026-04-01-worker-auth-jwt-design.md` around lines 25
- 28, The spec example and implementation diverge: the spec uses SecretStr for
the generated password (password = SecretStr(str(uuid.uuid4()))) but
database.py's ensure_internal_worker implementation calls
password_helper.hash(str(uuid.uuid4())) directly; update the spec example to
match the implementation by removing SecretStr (or alternatively update the
implementation to wrap the generated UUID in SecretStr before hashing if you
prefer that API), and reference the ensure_internal_worker routine and the
password_helper.hash call so the example and actual behavior are consistent.
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #904 +/- ##
==========================================
+ Coverage 91.71% 91.78% +0.07%
==========================================
Files 245 246 +1
Lines 22839 22881 +42
==========================================
+ Hits 20946 21001 +55
+ Misses 1893 1880 -13 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
Query logic now lives solely in get_worker_token (zndraw_joblib). Calling back into zndraw from zndraw_joblib would create a circular import, so the helper had no callers. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/zndraw/database.py`:
- Around line 120-135: When updating an existing internal worker (variable
existing of class User) ensure you set all the same flags as in the create path:
set existing.hashed_password = hashed, existing.is_active = True,
existing.is_verified = True, and existing.is_superuser = True before calling
await session.commit(); update the log message calls (log.debug/log.info) remain
the same but ensure the update branch mirrors the create branch's flags so a
previously disabled or unverified user is repaired on startup.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
| email: str, | ||
| ) -> None: | ||
| """Create or update the internal worker superuser. | ||
|
|
||
| Idempotent — safe to call on every startup. | ||
| Idempotent — safe to call on every startup. The password is a random | ||
| UUID generated each time — it is never used for login. | ||
|
|
||
| Parameters | ||
| ---------- | ||
| session | ||
| Async database session. | ||
| password | ||
| Worker password from ``Settings.worker_password``. | ||
| Internal worker email from ``Settings.internal_worker_email``. | ||
| """ | ||
| password_helper = PasswordHelper() | ||
|
|
||
| result = await session.execute( | ||
| select(User).where(User.email == WORKER_EMAIL) # type: ignore[arg-type] | ||
| select(User).where(User.email == email) # type: ignore[arg-type] | ||
| ) | ||
| existing = result.scalar_one_or_none() | ||
|
|
||
| hashed = password_helper.hash(password.get_secret_value()) | ||
| hashed = password_helper.hash(str(uuid.uuid4())) | ||
|
|
||
| if existing is None: | ||
| worker = User( | ||
| email=WORKER_EMAIL, | ||
| email=email, | ||
| hashed_password=hashed, | ||
| is_active=True, | ||
| is_superuser=True, |
There was a problem hiding this comment.
Don't identify the internal worker by arbitrary email match.
ensure_internal_worker() now trusts settings.internal_worker_email as the worker's identity. If that setting changes during an upgrade, the legacy worker@internal.user row is never rotated or disabled, which undermines the shipped-credential remediation. If it collides with a real user, startup will reset that user's password and force is_superuser=True. This needs a dedicated internal-worker marker or a fixed non-overridable identity, not an email-only rewrite.
Also applies to: 132-135, 176-176
Set is_active and is_verified in the update branch so a previously disabled or unverified worker user is repaired on startup. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Summary
worker@internal.user) was created with a well-known default password (zndraw-worker) that granted full superuser access on any deployment that hadn't changed it. Confirmed exploitable onzndraw.icp.uni-stuttgart.de.ZnDraw(token=...)directly.Changes
config.pyworker_password, addinternal_worker_emaildatabase.pylookup_worker_user, wireWorkerTokenDepoverrideexecutor.pytokenper-call instead of holding credentialsregistry.pytokento protocol, task fn signature, and closuredependencies.pyWorkerTokenDep/get_worker_tokenrouter.pyWorkerTokenDep, pass token tokiq()broker.pybase_urlroutes/auth.pydocker/templates/.envZNDRAW_SERVER_WORKER_PASSWORDTest plan
test_worker_login_blocked— worker email returns 403test_regular_login_still_works— normal users unaffectedtest_worker_token_dep_mints_valid_jwt— JWT authenticates as superuser workergrep -r worker_passwordreturns zero hits (excluding spec/plan docs)ZNDRAW_SERVER_WORKER_PASSWORDenv var🤖 Generated with Claude Code
Summary by CodeRabbit
New Features
Security
Configuration
Tests & Docs