Automatic detection and redaction of personally identifiable information (PII) before sending data to LLMs.
InferShield protects sensitive data by detecting and redacting PII before it reaches AI models. This is critical for:
- HIPAA compliance - Healthcare organizations
- GDPR compliance - European users
- PCI-DSS compliance - Payment card data
- SOC 2 compliance - Enterprise security audits
Detected PII types:
- SSN - US Social Security Numbers
- Credit Cards - Validated with Luhn algorithm
- Passport Numbers - Government-issued passports
- Medical Records - MRN, patient IDs
- Bank Accounts - Account numbers
- API Keys - Generic + AWS-specific
- AWS Keys - Access keys (AKIA...)
- Email Addresses - Full email detection
- Driver's Licenses - State-issued IDs
- Phone Numbers - US format
- IP Addresses - IPv4 (validated)
- Dates of Birth - MM/DD/YYYY format
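The credit card entry above mentions Luhn validation. As a rough illustration of what that check involves, here is a minimal standalone sketch. The function name `passesLuhn` and the 13 to 19 digit length bounds are assumptions for this example, not InferShield's actual implementation.

```javascript
// Sketch only: a standalone Luhn checksum, not the library's validator.
function passesLuhn(candidate) {
  // Keep digits only, so separators such as "-" or " " are tolerated.
  const digits = candidate.replace(/\D/g, '');
  if (digits.length < 13 || digits.length > 19) return false; // assumed bounds

  let sum = 0;
  let doubleNext = false; // the rightmost digit is never doubled
  for (let i = digits.length - 1; i >= 0; i--) {
    let d = digits.charCodeAt(i) - 48; // '0' is char code 48
    if (doubleNext) {
      d *= 2;
      if (d > 9) d -= 9; // same as summing the two digits of the product
    }
    sum += d;
    doubleNext = !doubleNext;
  }
  return sum % 10 === 0;
}

console.log(passesLuhn('4111-1111-1111-1111')); // true (well-known test number)
console.log(passesLuhn('4111-1111-1111-1112')); // false
```

A checksum like this filters out most random 16-digit strings, which is presumably why it helps reduce false positives for card numbers.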
Replace entire value with label:
```
Input:  "My SSN is 123-45-6789"
Output: "My SSN is [SSN_REDACTED]"
```

Show last 4 digits:
```
Input:  "Card: 4532-1488-0343-6467"
Output: "Card: XXXX-XXXX-XXXX-6467"
```

One-way hash:
```
Input:  "SSN: 123-45-6789"
Output: "SSN: [a3f5c2d1]"
```

Reversible tokenization:
```
Input:  "SSN: 123-45-6789"
Output: "SSN: [TOKEN_e3b0c442]" (can be detokenized with key)
```

Complete removal:
```
Input:  "My SSN is 123-45-6789 and email is test@example.com"
Output: "My SSN is and email is "
```
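To make the difference between these strategies concrete, the following sketch applies four of them to a single matched value. Everything here is hypothetical: `applyStrategy`, the `'remove'` strategy name, and the choice of a truncated SHA-256 digest are assumptions for illustration. Reversible tokenization is left out because it depends on key management that this document does not describe.

```javascript
const crypto = require('crypto');

// Hypothetical helper, not the library's API: transforms one matched value.
function applyStrategy(value, label, strategy) {
  switch (strategy) {
    case 'mask':
      // Replace the entire value with a label.
      return `[${label}_REDACTED]`;
    case 'partial': {
      // Replace every digit except the last four with "X"; keep separators.
      const totalDigits = (value.match(/\d/g) || []).length;
      let seen = 0;
      return value.replace(/\d/g, (d) => (++seen <= totalDigits - 4 ? 'X' : d));
    }
    case 'hash': {
      // One-way: first 8 hex characters of a SHA-256 digest (assumed format).
      const hex = crypto.createHash('sha256').update(value).digest('hex');
      return `[${hex.slice(0, 8)}]`;
    }
    case 'remove': // assumed name for "complete removal"
      return '';
    default:
      throw new Error(`Unknown strategy: ${strategy}`);
  }
}

console.log(applyStrategy('123-45-6789', 'SSN', 'mask'));            // [SSN_REDACTED]
console.log(applyStrategy('123-45-6789', 'SSN', 'partial'));         // XXX-XX-6789
console.log(applyStrategy('4532-1488-0343-6467', 'CARD', 'partial')); // XXXX-XXXX-XXXX-6467
```

One design caveat: an unsalted hash of a low-entropy value such as an SSN can be brute-forced, so a keyed HMAC may be a safer basis for a hash strategy.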
```javascript
const { piiRedactionMiddleware, RedactionStrategy } = require('./services/pii-redactor');

app.use(piiRedactionMiddleware({
  enabled: true,
  strategy: RedactionStrategy.PARTIAL,
  patterns: ['ssn', 'credit_card', 'email', 'phone']
}));
```
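For readers curious what a middleware of this kind does with a request, here is a simplified sketch of the general pattern. It is not the real `piiRedactionMiddleware`: `sketchPiiMiddleware` and `redactText` are hypothetical names, only SSN-shaped values are handled, and only top-level string fields of the body are inspected.

```javascript
// Hypothetical stand-in for the real redactor: masks SSN-shaped values only.
function redactText(text) {
  const redacted = text.replace(/\b\d{3}-\d{2}-\d{4}\b/g, '[SSN_REDACTED]');
  return { redacted, changed: redacted !== text };
}

// Express-style middleware factory with the usual (req, res, next) signature.
function sketchPiiMiddleware({ enabled = true } = {}) {
  return function piiMiddleware(req, res, next) {
    if (!enabled || !req.body || typeof req.body !== 'object') return next();

    let changed = false;
    for (const [key, value] of Object.entries(req.body)) {
      if (typeof value !== 'string') continue; // nested values are skipped in this sketch
      const result = redactText(value);
      req.body[key] = result.redacted;
      changed = changed || result.changed;
    }
    if (changed) res.setHeader('X-PII-Redacted', 'true');
    next();
  };
}

// Exercise the middleware with plain objects instead of a running server.
const req = { body: { prompt: 'My SSN is 123-45-6789' } };
const headers = {};
const res = { setHeader: (name, value) => { headers[name] = value; } };
sketchPiiMiddleware()(req, res, () => {});

console.log(req.body.prompt); // My SSN is [SSN_REDACTED]
console.log(headers);         // { 'X-PII-Redacted': 'true' }
```

The key point is ordering: the body is rewritten before `next()` is called, so downstream handlers that forward the prompt to a model only ever see the redacted text.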
```javascript
const { detectPII, redactPII } = require('./services/pii-redactor');

// Detect PII
const text = 'My SSN is 123-45-6789';
const detected = detectPII(text);

console.log(detected);
// [{
//   type: 'ssn',
//   name: 'SSN',
//   value: '123-45-6789',
//   position: 10,
//   length: 11,
//   severity: 'critical',
//   category: 'government_id'
// }]

// Redact PII
const result = redactPII(text, { strategy: 'partial' });

console.log(result);
// {
//   redacted: 'My SSN is XXX-XX-6789',
//   original: 'My SSN is 123-45-6789',
//   detections: [...],
//   changed: true
// }
```
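The detection objects shown above can be produced by a fairly small regex scan. The sketch below is an assumption about how such a detector might be structured, not the actual `detectPII` source: `SKETCH_PATTERNS`, `sketchDetect`, the two regular expressions, and the email severity and category values are all made up for the example.

```javascript
// Hypothetical pattern table; the real PII_PATTERNS may look different.
const SKETCH_PATTERNS = {
  ssn: {
    name: 'SSN',
    regex: /\b\d{3}-\d{2}-\d{4}\b/g,
    severity: 'critical',
    category: 'government_id',
  },
  email: {
    name: 'Email',
    regex: /\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b/g,
    severity: 'medium',   // assumed value
    category: 'contact',  // assumed value
  },
};

// Scan the text with each requested pattern and report every match.
function sketchDetect(text, patterns = Object.keys(SKETCH_PATTERNS)) {
  const detections = [];
  for (const type of patterns) {
    const { name, regex, severity, category } = SKETCH_PATTERNS[type];
    for (const match of text.matchAll(regex)) {
      detections.push({
        type,
        name,
        value: match[0],
        position: match.index, // zero-based offset into the text
        length: match[0].length,
        severity,
        category,
      });
    }
  }
  // Report detections in the order they appear in the text.
  return detections.sort((a, b) => a.position - b.position);
}

console.log(sketchDetect('My SSN is 123-45-6789'));
```

Reporting `position` and `length` instead of only the value lets a later redaction step replace each span in place without searching the text again.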
```javascript
// Assumes PII_PATTERNS is exported alongside redactPII
const { redactPII, PII_PATTERNS } = require('./services/pii-redactor');

// Only check for critical PII
const criticalOnly = redactPII(text, {
  patterns: ['ssn', 'credit_card', 'passport'],
  strategy: 'mask'
});

// Check everything except emails
const allPatterns = Object.keys(PII_PATTERNS);
const patternsExceptEmail = allPatterns.filter(p => p !== 'email');

const withoutEmail = redactPII(text, { patterns: patternsExceptEmail });
```
```bash
PII_REDACTION_ENABLED=true
PII_REDACTION_STRATEGY=partial
PII_TOKEN_KEY=your-256-bit-encryption-key
PII_PATTERNS=ssn,credit_card,email,phone
```
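As an illustration of how these variables could be turned into middleware options, here is a small parsing sketch. The function `loadPiiConfig` and the fallback defaults (disabled when unset, `'mask'` strategy, empty pattern list) are assumptions, since the document does not state the real defaults.

```javascript
// Hypothetical loader: maps the environment variables above to an options object.
function loadPiiConfig(env = process.env) {
  return {
    // Only the literal string "true" (any case) enables redaction in this sketch.
    enabled: (env.PII_REDACTION_ENABLED || 'false').toLowerCase() === 'true',
    strategy: env.PII_REDACTION_STRATEGY || 'mask', // assumed default
    tokenKey: env.PII_TOKEN_KEY || null,
    // "ssn, credit_card" and "ssn,credit_card" both yield ['ssn', 'credit_card'].
    patterns: (env.PII_PATTERNS || '')
      .split(',')
      .map((p) => p.trim())
      .filter(Boolean),
  };
}

console.log(loadPiiConfig({
  PII_REDACTION_ENABLED: 'true',
  PII_REDACTION_STRATEGY: 'partial',
  PII_PATTERNS: 'ssn,credit_card,email,phone',
}));
```

Passing the environment as a parameter keeps the function easy to test without mutating `process.env`.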
Pass options in request headers:
```bash
curl -X POST https://api.infershield.io/v1/chat/completions \
  -H "X-PII-Strategy: partial" \
  -H "X-PII-Patterns: ssn,credit_card" \
  -d '{"prompt": "My SSN is 123-45-6789"}'
```
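On the server side, per-request headers like these have to be parsed and merged with the configured defaults. The sketch below shows one way that could work. `optionsFromHeaders` and the set of accepted strategy strings are assumptions, not documented behavior.

```javascript
// Assumed set of strategy names; only 'mask' and 'partial' appear as strings in this document.
const ALLOWED_STRATEGIES = new Set(['mask', 'partial', 'hash', 'token', 'remove']);

// Hypothetical helper: derive per-request options from headers, falling back to defaults.
function optionsFromHeaders(headers, defaults) {
  // Node's http module exposes incoming header names in lower case.
  const strategy = (headers['x-pii-strategy'] || '').toLowerCase();
  const patterns = (headers['x-pii-patterns'] || '')
    .split(',')
    .map((p) => p.trim())
    .filter(Boolean);

  return {
    // Unknown or missing strategies fall back instead of failing the request.
    strategy: ALLOWED_STRATEGIES.has(strategy) ? strategy : defaults.strategy,
    patterns: patterns.length > 0 ? patterns : defaults.patterns,
  };
}

const defaults = { strategy: 'mask', patterns: ['ssn', 'credit_card', 'email', 'phone'] };
console.log(optionsFromHeaders(
  { 'x-pii-strategy': 'partial', 'x-pii-patterns': 'ssn,credit_card' },
  defaults
));
// { strategy: 'partial', patterns: [ 'ssn', 'credit_card' ] }
```

Because a client-supplied pattern list could narrow what gets redacted, a deployment may prefer to accept header overrides only from trusted callers.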
Redacted requests include these headers:
```
X-PII-Redacted: true
X-PII-Detections: 2
X-PII-Types: ssn,email
```
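These headers can be derived directly from a list of detections. The following sketch shows the idea. `buildPiiHeaders` is a hypothetical helper, and returning no headers at all when nothing was detected is an assumption based on the wording "redacted requests".

```javascript
// Hypothetical helper: summarize detections as the three headers shown above.
function buildPiiHeaders(detections) {
  if (detections.length === 0) return {}; // assumed: no headers when nothing was redacted

  // De-duplicate types while keeping first-seen order.
  const types = [...new Set(detections.map((d) => d.type))];
  return {
    'X-PII-Redacted': 'true',
    'X-PII-Detections': String(detections.length),
    'X-PII-Types': types.join(','),
  };
}

console.log(buildPiiHeaders([{ type: 'ssn' }, { type: 'email' }]));
// {
//   'X-PII-Redacted': 'true',
//   'X-PII-Detections': '2',
//   'X-PII-Types': 'ssn,email'
// }
```

Note that the headers carry only counts and type names, never the detected values themselves.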
All PII detections are logged:
```json
{
  "timestamp": "2024-02-21T23:00:00Z",
  "action": "pii.detected",
  "userId": "user-123",
  "requestId": "req-abc",
  "detections": [
    {
      "type": "ssn",
      "severity": "critical",
      "redacted": true,
      "strategy": "partial"
    }
  ],
  "totalDetected": 1
}
```
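The log entry above records the type and severity of each detection but not the sensitive value. A sketch of a builder that preserves that property is shown below. `buildAuditEntry` is a hypothetical name and the field mapping is inferred from the sample entry, so treat it as an illustration only.

```javascript
// Hypothetical builder for an audit entry shaped like the sample above.
function buildAuditEntry({ userId, requestId, detections, strategy, now = new Date() }) {
  return {
    timestamp: now.toISOString(),
    action: 'pii.detected',
    userId,
    requestId,
    // Copy only non-sensitive fields; the raw `value` is deliberately dropped.
    detections: detections.map(({ type, severity }) => ({
      type,
      severity,
      redacted: true,
      strategy,
    })),
    totalDetected: detections.length,
  };
}

const entry = buildAuditEntry({
  userId: 'user-123',
  requestId: 'req-abc',
  strategy: 'partial',
  detections: [{ type: 'ssn', severity: 'critical', value: '123-45-6789' }],
  now: new Date('2024-02-21T23:00:00Z'),
});
console.log(JSON.stringify(entry, null, 2));
```

`Date.prototype.toISOString` includes milliseconds (`2024-02-21T23:00:00.000Z`), so the timestamp format differs slightly from the sample entry.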
- Detection speed: <5ms per 1KB of text
- Memory: ~2MB per process
- Latency: <1ms added to request
- Throughput: 10,000 req/s (single process)
Optimizations:
- Compiled regex patterns (cached)
- Streaming detection for large payloads
- Zero-copy redaction where possible
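The first optimization, cached compiled patterns, can be sketched as follows. This is an assumed design rather than the library's code: `SOURCES`, `getPattern`, and `countMatches` are hypothetical names, and the two pattern sources are simplified examples.

```javascript
// Hypothetical pattern sources, kept as strings so they compile lazily.
const SOURCES = {
  ssn: String.raw`\b\d{3}-\d{2}-\d{4}\b`,
  ipv4: String.raw`\b(?:\d{1,3}\.){3}\d{1,3}\b`,
};

const compiled = new Map();

// Compile each pattern once and reuse the same RegExp object afterwards.
function getPattern(name) {
  if (!compiled.has(name)) {
    const source = SOURCES[name];
    if (source === undefined) throw new Error(`Unknown pattern: ${name}`);
    compiled.set(name, new RegExp(source, 'g'));
  }
  return compiled.get(name);
}

function countMatches(name, text) {
  // matchAll iterates over an internal copy of the regex, so the cached
  // object's lastIndex is not advanced between calls.
  return [...text.matchAll(getPattern(name))].length;
}

console.log(getPattern('ssn') === getPattern('ssn'));           // true
console.log(countMatches('ssn', '123-45-6789 and 987-65-4321')); // 2
console.log(countMatches('ssn', '123-45-6789 and 987-65-4321')); // 2 (stable on reuse)
```

The `lastIndex` detail matters when sharing global regexes: calling `regex.test()` or `regex.exec()` on a cached `/g` pattern carries state between calls and can silently skip matches.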
```javascript
app.use('/api/medical', piiRedactionMiddleware({
  patterns: ['ssn', 'medical_record', 'date_of_birth', 'phone', 'email'],
  strategy: RedactionStrategy.MASK
}));
```

```javascript
app.use('/api/payments', piiRedactionMiddleware({
  patterns: ['credit_card', 'bank_account', 'ssn'],
  strategy: RedactionStrategy.HASH
}));
```

```javascript
// Keep last 4 digits for user reference
app.use(piiRedactionMiddleware({
  strategy: RedactionStrategy.PARTIAL,
  patterns: ['credit_card', 'ssn', 'phone']
}));
```
Run PII detection tests:
```bash
npm test services/pii-redactor.test.js
```
Coverage:
- 14 PII pattern types
- 5 redaction strategies
- Edge cases (empty, long text, special chars)
- Validation (Luhn algorithm for credit cards)
HIPAA - Protected Health Information (PHI):
- ✅ Names (partial via email)
- ✅ Dates of birth
- ✅ Phone numbers
- ✅ Email addresses
- ✅ Medical record numbers
- ✅ SSN
GDPR - Personal Data:
- ✅ Email addresses
- ✅ Phone numbers
- ✅ IP addresses
- ✅ Government IDs
PCI-DSS - Cardholder Data:
- ✅ Credit card numbers (Luhn validated)
- ✅ CVV detection (pattern: `\b\d{3,4}\b`)
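The CVV pattern `\b\d{3,4}\b` matches any standalone 3 or 4 digit number, so on its own it will flag years, order numbers, and similar values. One common way to narrow it is to require a nearby keyword. The sketch below illustrates that idea. The keyword list, the 10-character gap, and the name `findCvvCandidates` are assumptions, not part of InferShield.

```javascript
// The bare pattern from the list above.
const BARE_CVV = /\b\d{3,4}\b/g;

// Assumed refinement: a CVV-like keyword, up to 10 non-digit characters, then the code.
const CVV_WITH_CONTEXT = /\b(?:cvv2?|cvc2?|cid|security code)\b\D{0,10}(\d{3,4})\b/gi;

function findCvvCandidates(text) {
  // Capture group 1 holds the digits that follow the keyword.
  return [...text.matchAll(CVV_WITH_CONTEXT)].map((m) => m[1]);
}

const sample = 'Order 1234 shipped in 2024. CVV: 321';
console.log(sample.match(BARE_CVV));    // [ '1234', '2024', '321' ]
console.log(findCvvCandidates(sample)); // [ '321' ]
```

The trade-off is recall: a CVV written without any keyword would be missed by the context-aware pattern, so the right choice depends on how the data usually appears.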
SOC 2 - Sensitive Data Protection:
- ✅ PII detection
- ✅ Audit logging
- ✅ Encryption (TOKEN strategy)
- ✅ Access controls
Not detected:
- Names (too many false positives)
- Addresses (complex patterns)
- Non-US formats (international phone, etc.)
- Biometric data
- Photos/images (only text)
Possible false positives:
- Phone numbers: May match invoice numbers
- Bank accounts: May match order IDs
- API keys: May match hashes
Mitigation:
- Use `validateMatches: true` option
- Customize patterns for your domain
- Whitelist known false positives
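The allowlisting idea in the mitigation list can be sketched as a post-filter over detections. This is an illustrative assumption, not how `validateMatches` works internally: `filterFalsePositives`, the `contextWords` list, and the 20-character look-behind window are all hypothetical.

```javascript
// Hypothetical post-filter: drop detections that are allowlisted or that
// follow a word suggesting a business identifier instead of PII.
function filterFalsePositives(detections, text, options = {}) {
  const { allowlist = [], contextWords = ['invoice', 'order'], window = 20 } = options;
  const allowed = new Set(allowlist);

  return detections.filter((d) => {
    if (allowed.has(d.value)) return false; // known false positive

    // Look at the text immediately before the match for a context word.
    const before = text
      .slice(Math.max(0, d.position - window), d.position)
      .toLowerCase();
    return !contextWords.some((word) => before.includes(word));
  });
}

const message = 'Invoice #555-123-4567 was paid. Please call me at 555-987-6543';
const found = ['555-123-4567', '555-987-6543'].map((value) => ({
  type: 'phone',
  value,
  position: message.indexOf(value),
}));

console.log(filterFalsePositives(found, message).map((d) => d.value));
// [ '555-987-6543' ]
```

A filter like this trades a small risk of missed PII for fewer false alarms, so it is worth monitoring what it drops before relying on it.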
Planned:
- International phone formats
- Address detection (US + EU)
- Name detection (with ML)
- Custom regex patterns (user-defined)
- OCR for image-based PII
- Audio transcription redaction
- Real-time streaming redaction
- Multi-language support
Best practices:
- Use PARTIAL for user-facing - Users can verify last 4 digits
- Use HASH for audit logs - One-way, searchable
- Use TOKEN for reversibility - Enterprise customers only
- Always log detections - Compliance requirement
- Test with real data - Validate patterns work for your use case
- Customize patterns - Add industry-specific patterns
- Monitor false positives - Adjust patterns as needed
- Docs: https://docs.infershield.io/pii-redaction
- Issues: https://github.com/InferShield/infershield/issues
- Discussions: https://github.com/InferShield/infershield/discussions
License: MIT - See LICENSE file for details