fix(fleet-mcp): security hardening for the MCP server#47908
Conversation
- Startup token-posture check (WhoAmI -> GET /me) logs the principal and warns when FLEET_API_KEY is not API-only or is write-capable (global or per-fleet role). - Replace the per-IP/X-Forwarded-For rate limiter with a global token-bucket (local flood backstop, no client-IP attribution) and enforce MCP_AUTH_TOKEN >= 32 chars (the real brute-force defense). - Refuse a link-local/cloud-metadata FLEET_BASE_URL at startup; schema fetch refuses cross-host redirects. - Trim raw Fleet error bodies returned to the client/LLM. - Remove dead GetFleetConfig; fix README tool counts (19/17); retain auth+rate-limit tests and add TestValidateFleetBaseURL + TestFleetIdentityPrivilege.
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## main #47908 +/- ##
==========================================
+ Coverage 67.23% 67.31% +0.07%
==========================================
Files 3637 3640 +3
Lines 229947 230653 +706
Branches 11968 11968
==========================================
+ Hits 154608 155259 +651
+ Misses 61454 61432 -22
- Partials 13885 13962 +77
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Harness. 🚀 New features to boost your workflow:
|
Generic make targets (build/tools/posture/call/sse) to drive the MCP from any shell (jq optional, no language runtime needed). Add .env to .gitignore so a real FLEET_API_KEY/MCP_AUTH_TOKEN can't be committed via the documented 'cp .env.example .env' workflow.
…onfirmation - Refuse startup unless FLEET_API_KEY is an API-only Fleet user (fails closed if /me can't confirm); still warn-only on write-capability. - MCP_RATE_LIMIT_MODE selects "global" (default, shared bucket) or "ip" (one bucket per TCP peer / RemoteAddr — never reads the spoofable X-Forwarded-For), which resolves the Finding C bypass while keeping per-IP for internal networks. - fleetMCPInstructions now directs clients to surface the SQL + target scope and confirm before create_saved_query / run_live_query (advisory client guidance).
| // GetFleetConfig retrieves the Fleet server configuration. | ||
| func (fc *FleetClient) GetFleetConfig(ctx context.Context) (map[string]interface{}, error) { | ||
| resp, err := fc.makeFleetRequest(ctx, "GET", "/api/v1/fleet/config", nil) | ||
| if err != nil { | ||
| return nil, fmt.Errorf("failed to get fleet config: %w", err) | ||
| } | ||
| defer resp.Body.Close() | ||
|
|
||
| if resp.StatusCode != http.StatusOK { | ||
| return nil, fmt.Errorf("failed to get fleet config: status code %d", resp.StatusCode) | ||
| } | ||
|
|
||
| var result map[string]interface{} | ||
| if err := json.NewDecoder(resp.Body).Decode(&result); err != nil { | ||
| return nil, fmt.Errorf("failed to decode fleet config: %w", err) | ||
| } | ||
| return result, nil | ||
| } |
There was a problem hiding this comment.
This piece of code was unused
…Makefile polish - Drop the read/write rights inference (privilege(), role/teams parsing) and its warning. The startup /me check now ONLY enforces API-only (fails closed) and logs a factual confirmation; who-can-do-what is documented (Fleet RBAC), not guessed. Removes the obsolete TestFleetIdentityPrivilege. - README: link Fleet's API-only-user docs; add an MCP_RATE_LIMIT_MODE row and Render guidance (use the default `global`; `ip` would collapse to the proxy IP); fix a stale per-IP rate-limit note. - Makefile: note the API-only requirement + MCP_RATE_LIMIT_MODE; `make tools` now surfaces a startup error instead of printing nothing.
- Rename checkTokenPosture → requireAPIOnlyUser: it now only enforces the API-only requirement (the read/write rights inference was already removed). - Add a signal-aware root context (SIGINT/SIGTERM) and thread it through the startup /me check, the temp-query sweep, and the stdio/SSE serving loops. SSE now drains in-flight requests via http.Server.Shutdown (10s cap) before exiting. Before: SIGTERM = immediate exit 143, in-flight connections reset. After: "shutting down" → drain → exit 0. Matters for Render redeploys.
There was a problem hiding this comment.
Claude Code Review
This repository is configured for manual code reviews. Comment @claude review to trigger a review and subscribe this PR to future pushes, or @claude review once for a one-time review.
Tip: disable this comment in your organization's Code Review settings.
There was a problem hiding this comment.
Pull request overview
Hardens the experimental tools/fleet-mcp server against several security findings by tightening token posture checks, mitigating SSRF/token egress risks, reducing error-body leakage, improving SSE rate limiting semantics, and adding graceful shutdown behavior.
Changes:
- Enforces stronger startup security posture: API-only Fleet token required,
MCP_AUTH_TOKENminimum length, and link-local/metadata base URL rejection. - Reworks SSE rate limiting to support
global(default) vsipmodes, removing reliance onX-Forwarded-For. - Adds redirect hardening for schema fetches, improves shutdown handling on SIGTERM/SIGINT, updates docs, and expands tests.
Reviewed changes
Copilot reviewed 12 out of 12 changed files in this pull request and generated 3 comments.
Show a summary per file
| File | Description |
|---|---|
| tools/fleet-mcp/schema.go | Refuses cross-host redirects during canonical schema fetch to reduce SSRF/token egress risk. |
| tools/fleet-mcp/README.md | Documents new security model decisions (API-only requirement, rate-limit modes, token strength) and tool count updates. |
| tools/fleet-mcp/rate_limit.go | Introduces global vs ip rate-limiting modes and removes XFF-based client identification. |
| tools/fleet-mcp/rate_limit_test.go | Removes tests tied to prior XFF/trusted-proxy behavior. |
| tools/fleet-mcp/mcp_server.go | Expands client instructions to require explicit operator confirmation before write tools. |
| tools/fleet-mcp/Makefile | Adds local dev/test helpers for stdio tool calls, listing tools, and running SSE. |
| tools/fleet-mcp/main.go | Adds token strength check, API-only startup verification, ctx-aware shutdown, and graceful HTTP server shutdown. |
| tools/fleet-mcp/fleet_integration.go | Adds base URL validation, /me identity check, and trims Fleet error bodies. Removes dead GetFleetConfig. |
| tools/fleet-mcp/fleet_integration_test.go | Updates/extends tests for new rate limiter modes and base URL validation. |
| tools/fleet-mcp/config.go | Adds MCP_RATE_LIMIT_MODE config plumbing. |
| tools/fleet-mcp/.gitignore | Ignores .env for safer local config handling. |
| tools/fleet-mcp/.env.example | Documents new env vars and least-privilege guidance for Fleet tokens. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Path: .coderabbit.yaml Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (1)
✅ Files skipped from review due to trivial changes (1)
WalkthroughThis PR hardens the Fleet MCP server across several dimensions. The rate limiter in Possibly related issues
Possibly related PRs
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
🧹 Nitpick comments (1)
tools/fleet-mcp/rate_limit.go (1)
107-156: 💤 Low valueBackground sweep goroutine runs indefinitely without a shutdown mechanism.
The
sweepLoopgoroutine started innewPerIPRateLimiterhas no way to be stopped. While this is acceptable for the current server lifecycle (the limiter lives until process exit), it prevents clean shutdown in tests and makes the type unsuitable for reuse in contexts where the limiter might be replaced or stopped.Consider accepting a context for cancellation if you anticipate needing cleaner teardown in future or for testing.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tools/fleet-mcp/rate_limit.go` around lines 107 - 156, The sweepLoop goroutine launched in newPerIPRateLimiter runs indefinitely without a way to stop it, which prevents clean shutdown in tests and makes the perIPRateLimiter unsuitable for reuse where it might need to be replaced or stopped. Modify newPerIPRateLimiter to accept a context parameter and pass it to sweepLoop, then update the sweepLoop method to listen for context cancellation alongside the ticker—when the context is cancelled (via ctx.Done()), the loop should break and the goroutine should exit cleanly instead of running forever.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Nitpick comments:
In `@tools/fleet-mcp/rate_limit.go`:
- Around line 107-156: The sweepLoop goroutine launched in newPerIPRateLimiter
runs indefinitely without a way to stop it, which prevents clean shutdown in
tests and makes the perIPRateLimiter unsuitable for reuse where it might need to
be replaced or stopped. Modify newPerIPRateLimiter to accept a context parameter
and pass it to sweepLoop, then update the sweepLoop method to listen for context
cancellation alongside the ticker—when the context is cancelled (via
ctx.Done()), the loop should break and the goroutine should exit cleanly instead
of running forever.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 06b57476-d1f8-4c33-bc8e-c1eff8f8452c
⛔ Files ignored due to path filters (1)
tools/fleet-mcp/README.mdis excluded by!**/*.md
📒 Files selected for processing (11)
tools/fleet-mcp/.env.exampletools/fleet-mcp/.gitignoretools/fleet-mcp/Makefiletools/fleet-mcp/config.gotools/fleet-mcp/fleet_integration.gotools/fleet-mcp/fleet_integration_test.gotools/fleet-mcp/main.gotools/fleet-mcp/mcp_server.gotools/fleet-mcp/rate_limit.gotools/fleet-mcp/rate_limit_test.gotools/fleet-mcp/schema.go
💤 Files with no reviewable changes (1)
- tools/fleet-mcp/rate_limit_test.go
- /healthz: unauthenticated liveness probe (healthzHandler), carved out ahead of auth and the rate limiter via a ServeMux so orchestrators (Render, k8s) can always confirm the process is up — even under a flood. /sse and /message still go through the full middleware chain unchanged. - README: note that the SSE listener is plain HTTP, so TLS must be terminated in front (Render edge / reverse proxy); document the /healthz path for probes.
Render defaults to a TCP/port-open check; healthCheckPath makes it probe the new /healthz endpoint so deploys gate on a real 200, not just an open port.
CI Feedback 🧐A test triggered by this PR failed. Here is an AI-generated analysis of the failure:
|
Related issue: Relates to #43544
Security review + hardening of the experimental Fleet MCP server. Issues were reproduced against a live dev Fleet; this PR fixes the MCP-layer ones.
/mecheck refuses any non-API-only token (fails closed if unreachable). API-only users can be scoped to specific endpoints/teams and their token revoked. Who-can-do-what is documented (Fleet RBAC), not inferred at runtime.main.gorequireAPIOnlyUser;fleet_integration.goWhoAmIX-Forwarded-For→ rotating it bypassed (0×429). NowMCP_RATE_LIMIT_MODE=global(default) |ip(keyed on TCP peerRemoteAddr, never XFF) → rotating XFF throttles (207×429/250). +MCP_AUTH_TOKEN≥ 32.rate_limit.go,config.go,main.goFLEET_BASE_URLunchecked → refuse link-local/metadata at startup; schema fetch blocks cross-host redirects.fleet_integration.govalidateFleetBaseURL;schema.gouuid) → trimmed, e.g.Fleet API returned HTTP 409: Resource Already Exists.fleet_integration.gofleetErrMsgGetFleetConfig; README 18→19 / 16→17; added validator/role/rate-limit tests.fleet_integration.go,README.md,*_test.godestructive=trueannotation →fleetMCPInstructionsnow tells the client to show the SQL + targets and confirm before writes. Advisory, not a server control.mcp_server.goListenAndServeblocked with no signal handling → on SIGTERM the process was killed immediately (exit 143), resetting any in-flight connection at once. Now a signal-aware root context (signal.NotifyContext) drains in-flight requests viahttp.Server.Shutdown(10s cap), logsshutting down, and exits 0 — and the startup/mecheck + temp-query sweep + stdio loop all honor it. Matters for Render redeploys (SIGTERM).main.goKey decisions
ipkeys on the unspoofable TCP peer (for direct-connect internal nets);global(default) suits anything behind a proxy/WAF, where per-client DoS limiting belongs.create_saved_queryandrun_live_query) stay enabled. Multi-host live query needsPOST /reports→ run → delete (no read-only equivalent), so read-only-by-default would break it. Bounded by the token's role (observer ⇒ Fleet403s writes), the advisory confirm (F), and agent--disable_tables.fleetMCPInstructions/destructive=trueare advisory — a prompt-injected or raw client ignores them. The enforceable controls are the token role and--disable_tables.curl/carvesexfil is a Fleet/osquery capability, equally reachable via the UI/fleetctl/REST — not an MCP bug. Comprehensive fix is agent--disable_tables(separate Fleet-server/agent issue).Retry-Afterfor the limiter.Residual attack surface (re: "no new attack vectors")
None of these is a vector the MCP invents beyond what Fleet already exposes:
curl/carves/filedevice SSRF + exfil--disable_tables(separate issue) — not an MCP-layer fix429s operators (shared bucket — fails safe)ipmode (direct nets), WAF (proxied), or stdioFLEET_BASE_URLBottom line: with an API-only observer token (now enforced) over stdio,
ipmode, or SSE-behind-an-edge, the MCP adds no new attack vector beyond Fleet's existing live-query capability; the residualcurl/carvesrisk is a Fleet-layer concern tracked separately.Checklist for submitter
If some of the following don't apply, delete the relevant line.
changes/,orbit/changes/oree/fleetd-chrome/changes.See Changes files for more information.
Testing
Added/updated automated tests
QA'd all new/changed functionality manually
Summary by CodeRabbit
MCP_RATE_LIMIT_MODEto select global or per-IP throttling/healthzendpointRetry-After: 1; per-IP uses the direct client address host)MCP_AUTH_TOKENlength checks.envdocs/ignores, added developer Makefile helpers, and refreshed rate-limiter/Fleet base URL tests