Skip to content

fix(control-plane): authorize execution note writes(#420)#575

Open
Luffy2208 wants to merge 2 commits into
Agent-Field:mainfrom
Luffy2208:fix/420-execution-notes-authorization
Open

fix(control-plane): authorize execution note writes(#420)#575
Luffy2208 wants to merge 2 commits into
Agent-Field:mainfrom
Luffy2208:fix/420-execution-notes-authorization

Conversation

@Luffy2208
Copy link
Copy Markdown
Contributor

Summary

Fixes an IDOR in the execution notes write endpoint by enforcing execution ownership before appending a note.

File-specific changes:

  • control-plane/internal/handlers/execution_notes.go

    • Resolves the caller agent identity before writing a note.
    • Compares the caller agent ID with the execution owner, execution.AgentNodeID.
    • Returns 403 execution_ownership_mismatch when the caller does not own the execution.
    • Supports DID-authenticated callers by resolving the verified caller DID to an agent ID.
  • control-plane/internal/server/middleware/auth.go

    • Stores API-key caller identity in Gin context using CallerAgentIDKey.
    • Uses X-Caller-Agent-ID first, with X-Agent-Node-ID as fallback.
  • control-plane/internal/handlers/execution_notes_test.go

    • Adds coverage for owner write success.
    • Adds coverage for non-owner API-key write returning 403.
    • Adds coverage for DID-authenticated owner and non-owner behavior.
  • control-plane/internal/server/middleware/auth_test.go

    • Adds coverage that API-key auth populates caller identity in Gin context.
  • control-plane/internal/handlers/coverage_handlers_90_test.go

    • Updates the existing successful note-write coverage test to include matching caller identity.

Testing

  • ./scripts/test-all.sh
  • Additional verification:
    • cd control-plane && go test ./internal/handlers ./internal/server/middleware
    • cd control-plane && go test ./internal/handlers ./internal/server/middleware -coverprofile=/tmp/issue-420.coverprofile
    • cd control-plane && golangci-lint run --new-from-rev=upstream/main ./internal/handlers ./internal/server/middleware
    • Manual E2E curl verification:
      • Agent A adding a note to Agent A’s execution returns 200 OK
      • Agent A adding a note to Agent B’s execution returns 403 Forbidden
      • Agent B’s fresh execution remains with notes: [] after the blocked write

Note: full repo lint currently reports pre-existing unrelated Go lint issues outside this PR’s changed files. Changed-line lint for the touched packages reports 0 issues.

Checklist

  • I updated documentation where applicable.
  • I added or updated tests (or none were needed).
  • I updated CHANGELOG.md (or this change does not warrant a changelog entry).

Screenshots (if UI-related)

Not UI-related.

Related issues

Fixes #420

@Luffy2208 Luffy2208 requested review from a team and AbirAbbas as code owners May 19, 2026 03:44
@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 19, 2026

📊 Coverage gate

Thresholds from .coverage-gate.toml: per-surface ≥ 86%, aggregate ≥ 88%, max per-surface regression ≤ 1.0 pp, max aggregate regression ≤ 0.50 pp.

Surface Current Baseline Δ
control-plane 87.50% 87.30% ↑ +0.20 pp 🟡
sdk-go 91.80% 90.70% ↑ +1.10 pp 🟢
sdk-python 93.73% 93.63% ↑ +0.10 pp 🟢
sdk-typescript 92.80% 92.56% ↑ +0.24 pp 🟢
web-ui 89.91% 90.01% ↓ -0.10 pp 🟡
aggregate 89.02% 89.01% ↑ +0.01 pp 🟡

✅ Gate passed

No surface regressed past the allowed threshold and the aggregate stayed above the floor.

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 19, 2026

📐 Patch coverage gate

Threshold: 80% on lines this PR touches vs origin/main (from .coverage-gate.toml:thresholds.min_patch).

Surface Touched lines Patch coverage Status
control-plane 76 100.00%
sdk-go 0 ➖ no changes
sdk-python 0 ➖ no changes
sdk-typescript 0 ➖ no changes
web-ui 0 ➖ no changes

✅ Patch gate passed

Every surface whose lines were touched by this PR has patch coverage at or above the threshold.

@Luffy2208 Luffy2208 force-pushed the fix/420-execution-notes-authorization branch from b9fb5f5 to 72f59ee Compare May 19, 2026 04:01
@Luffy2208
Copy link
Copy Markdown
Contributor Author

I fixed the patch coverage failure by adding targeted tests in control-plane/internal/handlers/execution_notes_test.go.

What was added:

  • Tests for missing execution owner returning 403
  • Tests for missing caller identity returning 403
  • Tests for DID resolver failure returning 500
  • Tests for unresolved DID fail-closed behavior returning 403
  • Tests for caller identity resolution from:
    • Gin context CallerAgentIDKey
    • X-Caller-Agent-ID
    • X-Agent-Node-ID
    • DID list fallback
    • DID lookup error fallback
  • Direct coverage for executionNoteAuthorizationError.Error()

Copy link
Copy Markdown
Member

@santoshkumarradha santoshkumarradha left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔴 PR-AF Review — Needs Major Rework

Automated multi-agent code review · PR-AF built with AgentField

10 findings · 🔴 3 critical · 🟠 4 important · 🔵 0 suggestions · ⚪ 0 nitpicks

PR Overview

Summary

Fixes an IDOR in the execution notes write endpoint by enforcing execution ownership before appending a note.

File-specific changes:

  • control-plane/internal/handlers/execution_notes.go

    • Resolves the caller agent identity before writing a note.
    • Compares the caller agent ID with the execution owner, execution.AgentNodeID.
    • Returns 403 execution_ownership_mismatch when the caller does not own the execution.
    • Supports DID-authenticated callers by resolving the verified caller DID to an agent ID.
  • control-plane/internal/server/middleware/auth.go

    • Stores API-key caller identity in Gin context using CallerAgentIDKey.
    • Uses X-Caller-Agent-ID first, with X-Agent-Node-ID as fallback.
  • control-plane/internal/handlers/execution_notes_test.go

    • Adds coverage for owner write success.
    • Adds coverage for non-owner API-key write returning 403.
    • Adds coverage for DID-authenticated owner and non-owner behavior.
  • control-plane/internal/server/middleware/auth_test.go

    • Adds coverage that API-key auth populates caller identity in Gin context.
  • control-plane/internal/handlers/coverage_handlers_90_test.go

    • Updates the existing successful note-write coverage test to include matching caller identity.

Testing

  • ./scripts/test-all.sh
  • Additional verification:
    • cd control-plane && go test ./internal/handlers ./internal/server/middleware
    • cd control-plane && go test ./internal/handlers ./internal/server/middleware -coverprofile=/tmp/issue-420.coverprofile
    • cd control-plane && golangci-lint run --new-from-rev=upstream/main ./internal/handlers ./internal/server/middleware
    • Manual E2E curl verification:
      • Agent A adding a note to Agent A’s execution returns 200 OK
      • Agent A adding a note to Agent B’s execution returns 403 Forbidden
      • Agent B’s fresh execution remains with notes: [] after the blocked write

Note: full repo lint currently reports pre-existing unrelated Go lint issues outside this PR’s changed files. Changed-line lint for the touched packages reports 0 issues.

Checklist

  • I updated documentation where applicable.
  • I added or updated tests (or none were needed).
  • I updated CHANGELOG.md (or this change does not warrant a changelog entry).

Screenshots (if UI-related)

Not UI-related.

Related issues

Fixes #420

Key Findings

7 issue(s) should be addressed before merge:

  • 🔴 Complete ownership enforcement bypass when APIKey is empty and DID auth is off — the default configuration (control-plane/internal/server/middleware/auth.go:24) — When AuthConfig.APIKey is empty (the default in ALL deployment configurations: config/agentfield.yaml, Docker Compose at deployments/docker/docker-compose.yml, and Helm at `deployments/helm/agen…
  • 🔴 Three-tier identity fallback in execution notes handler has no fail-closed mechanism: raw-header tier silently becomes primary identity source when all upstream auth middleware is configuration-disabled (control-plane/internal/handlers/execution_notes.go:184) — The execution notes handler's executionNoteCallerAgentID implements a three-tier identity resolution cascade: (1) verified DID from DIDAuthMiddleware, (2) CallerAgentIDKey context value from APIKeyAut…
  • 🔴 Fixing the default authentication bypass by enabling DID auth silently activates diverging DID resolution paths — a 'fix F4, expose F1' trap (control-plane/internal/handlers/execution_notes.go:184) — F4 establishes that under the default configuration (APIKey empty AND did_auth_enabled false), the execution notes handler accepts unvalidated raw headers as caller identity — a complete ownership enf…
  • 🟠 GetExecutionNotesHandler leaks execution notes to any authenticated caller with no ownership enforcement (control-plane/internal/handlers/execution_notes.go:235) — The PR fixes an IDOR on the write path (AddExecutionNoteHandler) by enforcing execution ownership, but the read path (GetExecutionNotesHandler) remains completely open: any API-key-authenticated calle…
  • 🟠 DID auth middleware provides no defense against no-auth bypass — attacker can simply omit DID headers (control-plane/internal/server/middleware/did_auth.go:177) — The assignment question asks: "Does the DIDAuthMiddleware at routes_middleware.go:77-88 provide any defense when API key auth is off?" No, not in any meaningful way. If `DID.Enabled && DIDAuthEna…
  • 🟠 Two-tier DID resolution reads semantically-different field names (AgentID vs AgentNodeID) from independent tables with no structural equivalence guarantee (control-plane/internal/handlers/execution_notes.go:206) — The function resolveExecutionNoteAgentIDByDID at line 206 resolves a caller DID to an agent identifier through two independent code paths that read *differently-named columns from different tables
  • 🟠 Type-unsafe CallerAgentIDKey enables silent reversion to attacker-controlled raw-header identity when any middleware writes a non-string value — turning a compile-time type error into a runtime authentication bypass (control-plane/internal/handlers/execution_notes.go:189) — The combination of F2 (CallerAgentIDKey accepts any value type via Gin's c.Set(key string, value any)) and F4 (executionNoteCallerAgentID falls through to raw X-Caller-Agent-ID / X-Agent-Node-ID hea…

Files with findings: control-plane/internal/handlers/execution_notes.go, control-plane/internal/server/middleware/auth.go, control-plane/internal/server/middleware/did_auth.go

All Findings by Severity

🔴 Critical (3)

  • Complete ownership enforcement bypass when APIKey is empty and DID auth is off — the default configuration control-plane/internal/server/middleware/auth.go:24
  • Three-tier identity fallback in execution notes handler has no fail-closed mechanism: raw-header tier silently becomes primary identity source when all upstream auth middleware is configuration-disabled control-plane/internal/handlers/execution_notes.go:184
  • Fixing the default authentication bypass by enabling DID auth silently activates diverging DID resolution paths — a 'fix F4, expose F1' trap control-plane/internal/handlers/execution_notes.go:184

🟠 Important (4)

  • GetExecutionNotesHandler leaks execution notes to any authenticated caller with no ownership enforcement control-plane/internal/handlers/execution_notes.go:235
  • DID auth middleware provides no defense against no-auth bypass — attacker can simply omit DID headers control-plane/internal/server/middleware/did_auth.go:177
  • Two-tier DID resolution reads semantically-different field names (AgentID vs AgentNodeID) from independent tables with no structural equivalence guarantee control-plane/internal/handlers/execution_notes.go:206
  • Type-unsafe CallerAgentIDKey enables silent reversion to attacker-controlled raw-header identity when any middleware writes a non-string value — turning a compile-time type error into a runtime authentication bypass control-plane/internal/handlers/execution_notes.go:189
Review Process Details

Dimensions Analyzed (15):

  • No-Auth Mode Identity Spoofing Bypass — 4 file(s)
  • CallerAgentIDKey Context Semantics Collision — 3 file(s)
  • GET/POST Execution Notes Authorization Asymmetry — 2 file(s)
  • DID Resolution Silent Degradation — Error vs Not-Found Conflation — 3 file(s)
  • Context.Background() in GET Handler Bypasses Request Timeout — 2 file(s)
  • DID resolution field-name divergence: AgentID vs AgentNodeID — 3 file(s)
  • CallerAgentIDKey context type contract: non-string write → silent fallback — 3 file(s)
  • No-auth middleware bypass: mechanical trace from empty APIKey to unauthenticated header reads — 4 file(s)
  • Storage error propagation contract: errors.As dependency on UpdateExecutionRecord fidelity — 2 file(s)
  • Context.Background() drift in GET handler: deadline propagation gap vs POST handler — 2 file(s)
  • No-auth bypass of execution ownership enforcement via raw header fallback — 4 file(s)
  • Triplicated caller resolution logic with diverged priority chains sharing one context namespace — 3 file(s)
  • Auth error classification breaks if production storage provider wraps or replaces closure errors — 2 file(s)
  • APIKeyAuth global broadcast of CallerAgentIDKey changes semantics for ALL authenticated routes — 2 file(s)
  • DID-to-agent ID resolution produces different identifiers depending on resolution path (AgentID vs AgentNodeID field mismatch) — 3 file(s)

Meta-Dimension Lenses (3):

  • Semantic — 5 dimension(s), 92% coverage confidence
  • Mechanical — 5 dimension(s), 92% coverage confidence
  • Systemic — 5 dimension(s), 92% coverage confidence

Cross-Reference & Adversary Analysis:

  • 6 compound finding(s) synthesized
  • 12 finding(s) adversarially tested: 12 confirmed, 0 challenged
Pipeline Stats
Metric Value
Duration 4643.7s
Agent invocations 65
Coverage iterations 0
Estimated cost N/A (provider does not report cost)
Budget exhausted No
PR type bugfix
Complexity standard

Review ID: rev_b6e41625c18a


return nil
}

func executionNoteCallerAgentID(ctx context.Context, c *gin.Context, storageProvider ExecutionNoteStorage) (string, error) {
Copy link
Copy Markdown
Member

@santoshkumarradha santoshkumarradha May 24, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔴 [CRITICAL] Raw-header fallback becomes sole identity source under default config

executionNoteCallerAgentID has 3 tiers: (1) verified DID, (2) CallerAgentIDKey context value, (3) raw X-Caller-Agent-ID/X-Agent-Node-ID headers. Tiers 1 & 2 are config-gated. Under defaults (APIKey="", did_auth_enabled=false), both are skipped — tier 3 accepts attacker-controlled headers with zero validation, flowing directly to ensureExecutionNoteOwnership.

Evidence:

  • routes_middleware.go:77 — DID middleware not installed when disabled
  • auth.go:26-28 — APIKeyAuth no-ops when APIKey=="", never sets context key
  • execution_notes.go:196-201 — raw header read with no validation
  • When APIKeyAuth does run, it reads the same headers (auth.go:118-124), so the fallback is either dead code or active bypass — never legitimate

Fix: Delete the raw-header fallback. Add a startup assertion in routes_middleware.go that refuses to register write routes when both auth methods are disabled.


Compound Analysis · confidence 95%

🤖 Reviewed by AgentField PR-AF


// GetExecutionNotesHandler handles GET /api/v1/executions/:execution_id/notes
// Retrieves notes for a specific execution with optional tag filtering
func GetExecutionNotesHandler(storageProvider ExecutionNoteStorage) gin.HandlerFunc {
Copy link
Copy Markdown
Member

@santoshkumarradha santoshkumarradha May 24, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟠 [IMPORTANT] Read path leaks execution notes — no ownership check

PR fixes IDOR on write path but GetExecutionNotesHandler (line 235) remains open: any API-key-authenticated caller can read any execution's notes by ID. Notes carry workflow state (phase progress, intermediate results, confidence reasoning — see examples/python_agent_nodes/agentic_rag/main.py:912-1097).

storageProvider.GetExecutionRecord() is called with no caller identity resolution or comparison against execution.AgentNodeID — same IDOR pattern just fixed on write.

PR mentions this is "deliberately NOT modified" but provides no rationale, no tests, no code comment. Likely oversight.

Fix: Mirror write path — resolve caller via executionNoteCallerAgentID, then ensureExecutionNoteOwnership, return 403 on mismatch. If intentional, document with code comment + test confirming open-read is the contract.


Authorization Asymmetry · confidence 95%

🤖 Reviewed by AgentField PR-AF

return nil
}

func executionNoteCallerAgentID(ctx context.Context, c *gin.Context, storageProvider ExecutionNoteStorage) (string, error) {
Copy link
Copy Markdown
Member

@santoshkumarradha santoshkumarradha May 24, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔴 [CRITICAL] Fix-F4-expose-F1 trap: enabling DID auth activates divergent resolution paths

Operator fixing F4 (default-config bypass) by setting did_auth_enabled=true unknowingly activates two DID→AgentNodeID resolution chains that read different stores:

  • resolveExecutionNoteAgentIDByDID (execution_notes.go:206-230) → SQL via GetDIDDocument + ListAgentDIDs from agent_dids table
  • DIDService.ResolveAgentIDByDID (did_service.go:500-518) → in-memory registry.AgentNodes (used by MemoryPermissionMiddleware, memory_permission.go:140)

No transactional sync between SQL and in-memory registry. During register/unregister race windows, the two stores can return different AgentNodeID values for the same DID — causing false 403s or cross-agent writes.

Fix: (1) Unify DID resolution into a single source of truth. (2) Add startup assertion when write routes registered but no auth enabled. (3) Annotate resolved identity with provenance for audit logging.


Compound Analysis · confidence 82%

🤖 Reviewed by AgentField PR-AF

return "", nil
}

func resolveExecutionNoteAgentIDByDID(ctx context.Context, storageProvider ExecutionNoteStorage, callerDID string) (string, error) {
Copy link
Copy Markdown
Member

@santoshkumarradha santoshkumarradha May 24, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟠 [IMPORTANT] DID resolution reads differently-named columns from independent tables

resolveExecutionNoteAgentIDByDID (line 206) has two paths returning what should be the same identifier:

  • Path 1 (line 209): record.AgentID from did_documents.agent_id (migration 019)
  • Path 2 (line 226): info.AgentNodeID from agent_dids.agent_node_id (migration 002)

Both compared against execution.AgentNodeID. They converge today only because nodes_register.go:1027,1044 populates both from newNode.ID — pure code coincidence. No FK, unique constraint, or comment ties them. A future migration touching one but not the other silently breaks resolution, causing false 403s.

The naming divergence (AgentID vs AgentNodeID) obscures the invariant.

Fix: (1) Document the equivalence on DIDDocumentRecord.AgentID. (2) Rename to AgentNodeID for consistency. (3) Add invariant test that creates entries in both tables and verifies both paths return the same result.


Field-name divergence · confidence 85%

🤖 Reviewed by AgentField PR-AF

return resolveExecutionNoteAgentIDByDID(ctx, storageProvider, callerDID)
}

if callerID, exists := c.Get(string(middleware.CallerAgentIDKey)); exists {
Copy link
Copy Markdown
Member

@santoshkumarradha santoshkumarradha May 24, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟠 [IMPORTANT] Type-unsafe CallerAgentIDKey enables silent fallback to attacker headers

c.Set(string, any) accepts any value type — only convention enforces string. The handler at line 190 does id, ok := callerID.(string); on a non-string value, ok=false, id="", the inner block is skipped, and control falls through to raw X-Caller-Agent-ID / X-Agent-Node-ID headers (lines 196-201) with no log or error.

Attack path: an authenticated request hits a future middleware bug that writes a non-string to the key → identity silently discarded → handler reads attacker-controlled headers → ownership check passes if attacker knows the execution ID.

Proof: context_helpers_additional_test.go:19-20 shows c.Set(string(CallerAgentIDKey), 42) compiles cleanly and GetCallerAgentID returns "" — the language permits the bug.

Fix: (1) Typed setter SetCallerAgentID(c, agentID string) + forbidigo lint rule banning raw c.Set on this key. (2) When exists==true && ok==false, log WARN and return error — never fall to raw headers. (3) Remove the raw-header fallback entirely (see related comment).


Compound Analysis · confidence 70%

🤖 Reviewed by AgentField PR-AF


func resolveExecutionNoteAgentIDByDID(ctx context.Context, storageProvider ExecutionNoteStorage, callerDID string) (string, error) {
if lookup, ok := storageProvider.(executionNoteDIDDocumentLookup); ok {
if record, err := lookup.GetDIDDocument(ctx, callerDID); err == nil && record != nil {
Copy link
Copy Markdown
Member

@santoshkumarradha santoshkumarradha May 24, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[HIGH] Revoked DIDs still pass ownership check

At line 208, GetDIDDocument result is accepted on err == nil && record != nil with no record.IsRevoked() check. LocalStorage.GetDIDDocument (local.go:8310-8328) returns revoked records with RevokedAt populated but err==nil. Compare GetDIDDocumentByAgentID (local.go:8345) which adds AND revoked_at IS NULL.

A caller holding a revoked DID can still resolve to their old agent ID and pass ownership checks on former executions.

Fix: After successful retrieval, if record.IsRevoked() { /* fall through or error */ }. Or push the filter into the SQL query as GetDIDDocumentByAgentID does.


Security · confidence 95%

🤖 Reviewed by AgentField PR-AF

}

func resolveExecutionNoteAgentIDByDID(ctx context.Context, storageProvider ExecutionNoteStorage, callerDID string) (string, error) {
if lookup, ok := storageProvider.(executionNoteDIDDocumentLookup); ok {
Copy link
Copy Markdown
Member

@santoshkumarradha santoshkumarradha May 24, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[MEDIUM] GetDIDDocument errors swallowed; transient failures masked as not-found

resolveExecutionNoteAgentIDByDID (lines 207-211) checks err == nil && record != nil. GetDIDDocument (local.go:8323-8328) returns the same error type for both not-found and DB failures with no sentinel. Transient DB failure → falls through to ListAgentDIDs → if that also returns empty, client gets 403 "caller agent identity is required" while the real cause is infrastructure. Error info is lost.

Fix: (a) sentinel ErrDIDNotFound so only genuine not-found triggers fallback; or (b) log the GetDIDDocument error before falling through so operators can detect degradation.


Reliability · confidence 85%

🤖 Reviewed by AgentField PR-AF

@santoshkumarradha
Copy link
Copy Markdown
Member

@Luffy2208, just a heads-up that we're currently evaluating the review quality of https://github.com/Agent-Field/pr-af. If you notice any of these automated findings are off-base, noisy, or unhelpful, please let us know! Your feedback would be super helpful.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Security] Execution notes write endpoint has no authorization (IDOR)

2 participants