Commit dfe3cbb

Dumbris and claude authored
feat(security): add sensitive data detection for tool calls (Spec 026) (#289)
* feat(security): add sensitive data detection for tool calls (Spec 026)

Implement automatic scanning of tool call arguments and responses for secrets, credentials, and sensitive data patterns including:
- Cloud credentials (AWS, GCP, Azure)
- Private keys (RSA, EC, DSA, OpenSSH, PGP)
- API tokens (GitHub, GitLab, Stripe, Slack, OpenAI)
- Database connection strings (MySQL, PostgreSQL, MongoDB)
- Credit card numbers (with Luhn validation)
- Sensitive file paths (.ssh/, .aws/, .env files)
- High-entropy strings (potential secrets)

Key features:
- Async detection integrated with ActivityService
- REST API filtering (sensitive_data, detection_type, severity params)
- CLI flags: --sensitive-data, --detection-type, --severity
- Web UI: detection badges, severity indicators, detail drawer
- Configurable categories and custom patterns support
- Event bus integration for real-time notifications

Also fixes CLI socket path detection bug where os.Stat was called with unix:// prefix, causing fallback to HTTP with wrong port.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* feat(security): add LLM provider API keys and doctor status

- Add sensitive data detection status to `mcpproxy doctor` output
- Include SensitiveDataDetection in DefaultConfig() for new installs
- Add detection patterns for 14 LLM/AI providers:
  - Google AI/Gemini (AIzaSy prefix)
  - xAI/Grok (xai- prefix)
  - Groq (gsk_ prefix)
  - Hugging Face (hf_, api_org_ prefixes)
  - Replicate (r8_ prefix)
  - Perplexity (pplx- prefix)
  - Fireworks AI (fw_ prefix)
  - Anyscale (esecret_ prefix)
  - Mistral AI (keyword context)
  - Cohere (keyword context)
  - DeepSeek (sk- with keyword)
  - Together AI (keyword context)
- Improve OpenAI pattern (sk-proj-, sk-svcacct-, sk-admin-)
- Improve Anthropic pattern (sk-ant-api03-, sk-ant-admin01-)
- Add comprehensive tests with dynamic key construction
- Update documentation with new provider patterns

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* test(security): add comprehensive tests for LLM API key detection

Add extensive test coverage for all 15 LLM provider API key patterns:
- TestLLMKeysInJSONContext: Keys in JSON configuration files
- TestLLMKeysInYAMLContext: Keys in YAML configuration files
- TestLLMKeysInCodeSnippets: Keys in Python/JS/Shell code examples
- TestLLMKeysFalsePositivePrevention: Ensures patterns don't over-match
- TestLLMKeysWithMixedAlphanumeric: Realistic mixed-case key patterns
- TestLLMKeysInLogOutput: Keys exposed in error messages and logs
- TestOpenAIAnthropicImprovedPatterns: All OpenAI/Anthropic variants
- TestAllLLMPatternsExist: Validates all expected patterns are registered

Tests cover:
- OpenAI (sk-, sk-proj-, sk-svcacct-, sk-admin-)
- Anthropic (sk-ant-api03-, sk-ant-admin01-)
- Google AI/Gemini (AIzaSy)
- xAI/Grok (xai-)
- Groq (gsk_)
- HuggingFace (hf_, api_org_)
- Replicate (r8_)
- Perplexity (pplx-)
- Fireworks AI (fw_)
- Anyscale (esecret_)
- Mistral, Cohere, DeepSeek, Together AI (keyword context)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* fix(security): reduce false positives and deduplicate detections

- Add deduplication to AddDetection() to prevent duplicate type+location
- AWS secret key pattern now requires keyword context (aws_secret_access_key=, AWS_SECRET_KEY:, secretAccessKey:) to avoid matching random base64 in RSA keys
- Azure client secret pattern now requires keyword context (AZURE_CLIENT_SECRET=, client_secret:, clientSecret:) to avoid false positives
- Update tests to reflect context-required behavior
- Add TestResult_AddDetection_Deduplication test

Before: id_rsa showed 9 detections (including aws_secret_key false positives)
After: id_rsa shows 3 detections (rsa_private_key, private_key, high_entropy)

Before: .env showed 29 detections (many duplicates)
After: .env shows 9 unique detections

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* docs: add sensitive data detection to activity log documentation

- Add sensitive data detection section to activity-log.md
- Document detection metadata structure and filtering options
- Add cross-reference to sensitive-data-detection.md
- Update sidebars.js with sensitive data detection page
- Update intro.md and AGENTS.md references

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* docs(spec): add specification for sensitive data detection (Spec 026)

- spec.md: Feature specification and requirements
- plan.md: Implementation plan
- tasks.md: Task breakdown
- data-model.md: Data model design
- research.md: Research notes
- quickstart.md: Quick start guide
- contracts/: API contracts
- checklists/: Implementation checklists
- MANUAL_TESTING_PLAN.md: Manual testing guide

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
1 parent 55b0861 commit dfe3cbb

67 files changed

Lines changed: 17567 additions & 45 deletions
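
The socket path fix mentioned in the commit message comes down to stripping the `unix://` scheme before the value reaches `os.Stat`, which expects a plain filesystem path. The following is a minimal Go sketch of that idea only. `socketPath`, `socketExists`, and the example path are hypothetical names chosen for illustration and are not taken from this commit.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// socketPath removes an optional "unix://" scheme so the result can be
// passed to filesystem calls. Hypothetical helper, not the project's code.
func socketPath(endpoint string) string {
	return strings.TrimPrefix(endpoint, "unix://")
}

// socketExists reports whether the endpoint refers to an existing path.
// Calling os.Stat on the raw "unix://..." string would always fail,
// because no file has that literal name.
func socketExists(endpoint string) bool {
	_, err := os.Stat(socketPath(endpoint))
	return err == nil
}

func main() {
	endpoint := "unix:///tmp/mcpproxy-example.sock" // hypothetical path
	fmt.Println("path passed to os.Stat:", socketPath(endpoint))
	fmt.Println("socket present:", socketExists(endpoint))
}
```

With a check like this, a caller only falls back to another transport when the socket file is really absent, not because the scheme prefix made the lookup fail.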

AGENTS.md

Lines changed: 7 additions & 0 deletions
@@ -32,3 +32,10 @@
## Security & Configuration Tips
- Never hardcode secrets; load them via the tray secure store or environment lookups in `internal/secret`.
- When editing configs, prefer `runtime.SaveConfiguration()` flows so disk state and in-memory state stay aligned; regenerated files land in `~/.mcpproxy/`.

## Active Technologies
- Go 1.24 (toolchain go1.24.10) + BBolt (storage), Chi router (HTTP), Zap (logging), regexp (stdlib), existing ActivityService (026-pii-detection)
- BBolt database (`~/.mcpproxy/config.db`) - ActivityRecord.Metadata extension (026-pii-detection)

## Recent Changes
- 026-pii-detection: Added Go 1.24 (toolchain go1.24.10) + BBolt (storage), Chi router (HTTP), Zap (logging), regexp (stdlib), existing ActivityService

CLAUDE.md

Lines changed: 77 additions & 0 deletions
@@ -304,6 +304,81 @@ See `docs/code_execution/` for complete guides:

See [docs/features/security-quarantine.md](docs/features/security-quarantine.md) for details.

## Sensitive Data Detection

Automatic scanning of tool call arguments and responses for secrets, credentials, and sensitive data. Enabled by default and integrates with the activity log for security auditing.

### Detection Categories

| Category | Examples | Severity |
|----------|----------|----------|
| `cloud_credentials` | AWS keys, GCP API keys, Azure storage keys | critical |
| `private_key` | RSA, EC, DSA, OpenSSH, PGP private keys | critical |
| `api_token` | GitHub, GitLab, Stripe, Slack, OpenAI, Anthropic, Google AI, xAI, Groq, HuggingFace, Replicate, Perplexity, Fireworks, Anyscale, Mistral, Cohere, DeepSeek, Together AI tokens | critical |
| `database_credential` | MySQL, PostgreSQL, MongoDB connection strings | critical/high |
| `credit_card` | Visa, Mastercard, Amex (Luhn validated) | high |
| `sensitive_file` | Paths to `.ssh/`, `.aws/`, `.env` files | high/medium |
| `high_entropy` | Base64/hex strings with high Shannon entropy | medium |
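
The `credit_card` row relies on Luhn validation to separate plausible card numbers from arbitrary digit runs. The sketch below shows the standard Luhn checksum in Go. The function name `luhnValid` and the 13 to 19 digit length window are assumptions made for this example; the project's `creditcard.go` may be structured differently.

```go
package main

import "fmt"

// luhnValid reports whether the digits in s pass the Luhn checksum.
// Spaces and hyphens are skipped; any other non-digit rejects the input.
// The 13..19 digit window is an assumption for this sketch.
func luhnValid(s string) bool {
	sum, digits := 0, 0
	double := false // every second digit, counted from the right, is doubled
	for i := len(s) - 1; i >= 0; i-- {
		c := s[i]
		if c == ' ' || c == '-' {
			continue
		}
		if c < '0' || c > '9' {
			return false
		}
		d := int(c - '0')
		if double {
			d *= 2
			if d > 9 {
				d -= 9 // same as adding the two digits of the product
			}
		}
		sum += d
		double = !double
		digits++
	}
	return digits >= 13 && digits <= 19 && sum%10 == 0
}

func main() {
	// Widely published test numbers, not real cards.
	for _, n := range []string{"4111 1111 1111 1111", "4111 1111 1111 1112"} {
		fmt.Printf("%s valid=%v\n", n, luhnValid(n))
	}
}
```

A regex match alone would flag any 16-digit sequence, so a checksum pass like this is a cheap way to cut false positives before a detection is recorded.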

### Key Files

| File | Purpose |
|------|---------|
| `internal/security/detector.go` | Main detector with `Scan()` method |
| `internal/security/types.go` | Detection, Result, Severity, Category types |
| `internal/security/patterns/` | Pattern definitions by category |
| `internal/security/patterns/cloud.go` | AWS, GCP, Azure credential patterns |
| `internal/security/patterns/keys.go` | Private key detection patterns |
| `internal/security/patterns/tokens.go` | API token patterns |
| `internal/security/patterns/database.go` | Database connection string patterns |
| `internal/security/patterns/creditcard.go` | Credit card patterns with Luhn validation |
| `internal/security/entropy.go` | High-entropy string detection |
| `internal/security/paths.go` | Sensitive file path patterns |
| `internal/runtime/activity_service.go` | Integration point via `SetDetector()` |
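
The commit message notes that `AddDetection()` now deduplicates on type plus location. The following Go sketch illustrates one way such deduplication can work. The field names, the boolean return value, and the internal `seen` map are all hypothetical; the actual definitions in `internal/security/types.go` are not shown in this diff and may differ.

```go
package main

import "fmt"

// Detection is a simplified, hypothetical record of one finding.
type Detection struct {
	Type     string // e.g. "aws_access_key"
	Category string // e.g. "cloud_credentials"
	Severity string // e.g. "critical"
	Location string // e.g. "arguments" or "response"
}

// Result collects the detections found in one tool call.
type Result struct {
	Detections []Detection
	seen       map[string]struct{} // keys already recorded (type + location)
}

// AddDetection appends d unless a detection with the same type and
// location was already recorded. It returns true when d was added.
func (r *Result) AddDetection(d Detection) bool {
	if r.seen == nil {
		r.seen = make(map[string]struct{})
	}
	key := d.Type + "\x00" + d.Location // NUL separator avoids ambiguous joins
	if _, dup := r.seen[key]; dup {
		return false
	}
	r.seen[key] = struct{}{}
	r.Detections = append(r.Detections, d)
	return true
}

func main() {
	var r Result
	d := Detection{Type: "aws_access_key", Category: "cloud_credentials",
		Severity: "critical", Location: "arguments"}
	fmt.Println(r.AddDetection(d)) // first occurrence
	fmt.Println(r.AddDetection(d)) // same type+location again
	fmt.Println(len(r.Detections))
}
```

Keying on type and location rather than on the matched value means overlapping patterns that fire on the same spot collapse into one entry, while the same kind of secret in arguments and in the response is still reported twice.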

### CLI Commands

```bash
mcpproxy activity list --sensitive-data # Show only activities with detections
mcpproxy activity list --severity critical # Filter by severity level
mcpproxy activity list --detection-type aws_access_key # Filter by detection type
mcpproxy activity show <id> # View detection details
mcpproxy activity export --sensitive-data --output audit.jsonl # Export for compliance
```

### Configuration

```json
{
  "sensitive_data_detection": {
    "enabled": true,
    "scan_requests": true,
    "scan_responses": true,
    "max_payload_size_kb": 1024,
    "entropy_threshold": 4.5,
    "categories": {
      "cloud_credentials": true,
      "private_key": true,
      "api_token": true,
      "database_credential": true,
      "credit_card": true,
      "high_entropy": true
    },
    "custom_patterns": [
      {
        "name": "internal_api_key",
        "regex": "INTERNAL-[A-Z0-9]{32}",
        "severity": "high",
        "category": "custom"
      }
    ],
    "sensitive_keywords": ["password", "secret"]
  }
}
```
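
To make the `custom_patterns` entry concrete, the sketch below decodes one such entry and applies its regex with Go's standard `regexp` package, which the commit lists among its technologies. The `CustomPattern` struct mirrors the JSON keys shown in the configuration, but the type name and the `findCustom` helper are assumptions for illustration rather than the project's actual loader.

```go
package main

import (
	"encoding/json"
	"fmt"
	"regexp"
)

// CustomPattern mirrors one entry of "custom_patterns" in the config.
// The struct itself is hypothetical; only the JSON keys come from the docs.
type CustomPattern struct {
	Name     string `json:"name"`
	Regex    string `json:"regex"`
	Severity string `json:"severity"`
	Category string `json:"category"`
}

// findCustom compiles the pattern and returns every match in text.
// An invalid regex is reported as an error instead of panicking.
func findCustom(p CustomPattern, text string) ([]string, error) {
	re, err := regexp.Compile(p.Regex)
	if err != nil {
		return nil, fmt.Errorf("custom pattern %q: %w", p.Name, err)
	}
	return re.FindAllString(text, -1), nil
}

func main() {
	raw := `{"name":"internal_api_key","regex":"INTERNAL-[A-Z0-9]{32}",` +
		`"severity":"high","category":"custom"}`
	var p CustomPattern
	if err := json.Unmarshal([]byte(raw), &p); err != nil {
		fmt.Println("bad pattern config:", err)
		return
	}
	text := "token=INTERNAL-ABCDEFGHIJKLMNOPQRSTUVWXYZ012345" // made-up value
	matches, err := findCustom(p, text)
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Printf("%s (%s): %d match(es)\n", p.Name, p.Severity, len(matches))
}
```

Note that the example regex is unanchored, so a longer run of uppercase letters and digits after `INTERNAL-` still matches on its first 32 characters. Adding word boundaries is one option if that is too permissive for a given key format.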

See [docs/features/sensitive-data-detection.md](docs/features/sensitive-data-detection.md) for complete reference.
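
The `high_entropy` category and the `entropy_threshold` setting refer to Shannon entropy. The Go sketch below computes the textbook per-byte entropy of a string and compares it with the documented threshold of 4.5. How `internal/security/entropy.go` tokenizes input, or whether it normalizes by character set, is not shown in this diff, so `shannonEntropy` and the sample strings are illustrative assumptions only.

```go
package main

import (
	"fmt"
	"math"
)

// shannonEntropy returns the entropy of s in bits per byte:
// H = -sum(p_i * log2(p_i)) over the byte values that occur in s.
func shannonEntropy(s string) float64 {
	if len(s) == 0 {
		return 0
	}
	var counts [256]int
	for i := 0; i < len(s); i++ {
		counts[s[i]]++
	}
	n := float64(len(s))
	h := 0.0
	for _, c := range counts {
		if c == 0 {
			continue
		}
		p := float64(c) / n
		h -= p * math.Log2(p)
	}
	return h
}

func main() {
	const threshold = 4.5 // value of "entropy_threshold" in the config example
	samples := []string{
		"aaaaaaaaaaaaaaaaaaaaaaaa",         // single repeated byte
		"password-password-password",       // repetitive text
		"q7Zp2LxV9cRt4NbW8sKd1HgY5mJf3uEa", // made-up random-looking token
	}
	for _, s := range samples {
		h := shannonEntropy(s)
		fmt.Printf("%-34s H=%.2f flagged=%v\n", s, h, h > threshold)
	}
}
```

Under this definition a string needs at least 23 distinct byte values to exceed 4.5 bits, since the maximum for k equally likely symbols is log2(k). A pure hexadecimal string tops out at 4.0 bits per byte, which this sketch does not attempt to compensate for.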

### Exit Codes

| Code | Meaning |
@@ -394,6 +469,8 @@ See `docs/prerelease-builds.md` for download instructions.
- BBolt database (`~/.mcpproxy/config.db`) - `oauth_tokens` bucket with `OAuthTokenRecord` model (023-oauth-state-persistence)
- Go 1.24 (toolchain go1.24.10) + TypeScript 5.x / Vue 3.5 + Cobra CLI, Chi router, BBolt storage, Zap logging, mark3labs/mcp-go, Vue 3, Tailwind CSS, DaisyUI (024-expand-activity-log)
- BBolt database (`~/.mcpproxy/config.db`) - ActivityRecord model (024-expand-activity-log)
- Go 1.24 (toolchain go1.24.10) + BBolt (storage), Chi router (HTTP), Zap (logging), regexp (stdlib), existing ActivityService (026-pii-detection)
- BBolt database (`~/.mcpproxy/config.db`) - ActivityRecord.Metadata extension (026-pii-detection)

## Recent Changes
- 001-update-version-display: Added Go 1.24 (toolchain go1.24.10)
