Document Masking/Unmasking REST API - Implementation Plan

Overview

A REST API built with Node.js and TypeScript for masking and unmasking documents. The API accepts file uploads, replaces the specified keywords with a mask, and restores them later from a recovery key.

Core Requirements

File-based processing: Upload/download files instead of string content
Multiple format support: Plain text (TXT), Markdown, JSON, XML files
Flexible keyword input: Space/comma separated with quoted phrases
Secure recovery: UUID v4 keys without embedded keyword information
Minimal storage: Redis database for keyword mappings only
Case handling: Case-insensitive masking, uppercase restoration

API Endpoints

1. POST /api/v1/mask

Purpose: Upload document, mask keywords, return masked file with recovery key

Request:

Method: POST
Content-Type: multipart/form-data
File Field: document (file to mask)
Form Field: keywords (keyword string)

Example Keywords Input:

Hello world "Boston Red Sox", 'Pepperoni Pizza', 'Cheese Pizza', beer

Parsed Keywords:

Hello
world
beer
Boston Red Sox
Pepperoni Pizza
Cheese Pizza

Response:

Content-Type: application/octet-stream (download file)
Headers:
- X-Recovery-Key: [UUID v4]
- Content-Disposition: attachment; filename="masked_[original_filename]"
Body: Masked document file (keywords → XXXXX)

2. POST /api/v1/unmask

Purpose: Upload masked document, restore original content using recovery key

Request:

Method: POST
Content-Type: multipart/form-data
File Field: maskedDocument (masked file)
Form Field: recoveryKey (UUID v4 from masking)

Response:

Content-Type: application/octet-stream (download file)
Headers:
- Content-Disposition: attachment; filename="original_[original_filename]"
Body: Restored original document (XXXXX → original keywords in UPPERCASE)

Database Schema (Redis)

Key Structure: keyword_map:{recovery_key}

Value Format: JSON array of mappings

[
  [1, "Hello"],
  [1, "world"],
  [3, "Boston Red Sox"],
  [5, "Pepperoni Pizza"],
  [7, "Cheese Pizza"],
  [10, "beer"]
]

Array Format: [lineNumber, originalText]

Index 0: Line number (integer)
Index 1: Original keyword text (string)

TTL: Optional expiration (after 1 year, the Redis instance deletes the key automatically)
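
The key structure, value format, and TTL above could be wrapped in a small storage helper. This is a minimal sketch, not the project's actual code: `MappingStore`, `saveMappings`, `loadMappings`, and `ONE_YEAR_SECONDS` are hypothetical names, and the `MappingStore` shape is an assumption meant to be satisfiable by a Redis client exposing `setEx` and `get`, which should be verified against the client version in use.

```typescript
/** One entry of the stored array: [lineNumber, originalText]. */
export type KeywordMapping = [lineNumber: number, originalText: string];

/** Hypothetical minimal store shape (assumption, not a library type). */
export interface MappingStore {
  setEx(key: string, seconds: number, value: string): Promise<unknown>;
  get(key: string): Promise<string | null>;
}

/** 1-year TTL from the plan, expressed in seconds. */
export const ONE_YEAR_SECONDS = 365 * 24 * 60 * 60;

/** Builds the key following the keyword_map:{recovery_key} structure. */
export function redisKey(recoveryKey: string): string {
  return `keyword_map:${recoveryKey}`;
}

/** Checks that a parsed JSON value has the [number, string] shape. */
function isMapping(value: unknown): value is KeywordMapping {
  return (
    Array.isArray(value) &&
    value.length === 2 &&
    Number.isInteger(value[0]) &&
    typeof value[1] === "string"
  );
}

/** Stores the mappings as a JSON array with the 1-year expiration. */
export async function saveMappings(
  store: MappingStore,
  recoveryKey: string,
  mappings: KeywordMapping[],
): Promise<void> {
  await store.setEx(redisKey(recoveryKey), ONE_YEAR_SECONDS, JSON.stringify(mappings));
}

/** Returns the mappings, or null when the key is unknown or expired. */
export async function loadMappings(
  store: MappingStore,
  recoveryKey: string,
): Promise<KeywordMapping[] | null> {
  const raw = await store.get(redisKey(recoveryKey));
  if (raw === null) return null;
  const parsed: unknown = JSON.parse(raw);
  if (!Array.isArray(parsed) || !parsed.every(isMapping)) {
    throw new Error("Stored keyword map has an unexpected shape");
  }
  return parsed;
}
```

Keeping the store behind a narrow interface also lets the unit tests use an in-memory fake instead of a live Redis instance.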

Processing Logic

Keyword Parsing

Split by spaces and commas
Extract quoted phrases (single and double quotes)
Case-insensitive matching
Preserve quoted phrases as single keywords
Nested single or double quotes are not allowed
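
The parsing rules above can be sketched as a single tokenizer pass. This is one possible reading of the rules, not a definitive implementation: `parseKeywords` is a hypothetical name, the case-insensitive de-duplication is an added assumption, and the output order (bare words first, then quoted phrases) is chosen only to match the "Parsed Keywords" example earlier in this plan.

```typescript
/**
 * Parses a keyword string such as:
 *   Hello world "Boston Red Sox", 'Pepperoni Pizza', beer
 * Bare words are split on spaces and commas; quoted phrases stay whole.
 */
export function parseKeywords(input: string): string[] {
  const bare: string[] = [];
  const quoted: string[] = [];
  // Alternatives: "double quoted", 'single quoted', or a bare run of
  // characters containing no whitespace, comma, or quote.
  // An unmatched quote character matches nothing and is skipped.
  const token = /"([^"]*)"|'([^']*)'|([^\s,"']+)/g;
  let match: RegExpExecArray | null;
  while ((match = token.exec(input)) !== null) {
    const phrase = match[1] ?? match[2];
    if (phrase !== undefined) {
      // Nested quotes are not allowed by the plan, so reject them.
      if (/["']/.test(phrase)) {
        throw new Error(`Nested quotes are not allowed: ${match[0]}`);
      }
      const trimmed = phrase.trim();
      if (trimmed.length > 0) quoted.push(trimmed);
    } else if (match[3] !== undefined) {
      bare.push(match[3]);
    }
  }
  // Assumption: keywords that differ only by case are duplicates,
  // because matching is case-insensitive anyway.
  const seen = new Set<string>();
  return [...bare, ...quoted].filter((keyword) => {
    const key = keyword.toLowerCase();
    if (seen.has(key)) return false;
    seen.add(key);
    return true;
  });
}
```

Rejecting nested quotes with an error, rather than silently accepting them, would let the endpoint answer with a 400 instead of masking an unintended phrase.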

Masking Process

Read uploaded file line by line
For each line, find keyword matches (case-insensitive)
Replace matches with "XXXXX"
Store mapping: [lineNumber, originalText] in Redis
Write each masked line to the output file as soon as it is produced, so the output file is written line by line while the input file is read line by line
Return file with recovery key in headers
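
The per-line step of the masking process might look like the sketch below. It rests on assumptions the plan does not settle: matching is a plain case-insensitive substring match (no word boundaries), longer keywords win over shorter ones that share a prefix, and the stored text is the match as it appeared in the line. `maskLine` and `escapeRegExp` are hypothetical helper names.

```typescript
/** One mapping entry: [lineNumber, originalText]. */
export type KeywordMapping = [lineNumber: number, originalText: string];

const MASK = "XXXXX";

/** Escapes regex metacharacters so a keyword is matched literally. */
function escapeRegExp(text: string): string {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

/**
 * Masks every keyword occurrence in one line and records what was replaced,
 * in left-to-right order, so unmasking can restore the same positions.
 */
export function maskLine(
  line: string,
  lineNumber: number,
  keywords: string[],
): { masked: string; mappings: KeywordMapping[] } {
  const mappings: KeywordMapping[] = [];
  const usable = keywords.filter((keyword) => keyword.length > 0);
  if (usable.length === 0) return { masked: line, mappings };

  // Longest first, so "Red Sox" is preferred over "Red" at the same position.
  const alternatives = [...usable]
    .sort((a, b) => b.length - a.length)
    .map(escapeRegExp)
    .join("|");
  const pattern = new RegExp(alternatives, "gi");

  const masked = line.replace(pattern, (match) => {
    mappings.push([lineNumber, match]);
    return MASK;
  });
  return { masked, mappings };
}
```

One limitation worth noting: a document that already contains the literal text XXXXX would confuse the unmasking count, so the real implementation may need to reject or escape such input.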

Unmasking Process

Validate recovery key exists in Redis
Read masked file line by line
Count XXXXX occurrences per line
Replace each XXXXX with original text from Redis (UPPERCASE)
Write restored line to output file
Return restored file
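
The counting and replacement steps above could be sketched as follows. This assumes the mappings for a line are stored in the same left-to-right order in which the masks appear, and that a count mismatch should be treated as an error; `groupByLine` and `unmaskLine` are hypothetical names.

```typescript
/** One mapping entry: [lineNumber, originalText]. */
export type KeywordMapping = [lineNumber: number, originalText: string];

const MASK = "XXXXX";

/** Groups the stored mappings by line number, preserving their order. */
export function groupByLine(mappings: KeywordMapping[]): Map<number, string[]> {
  const byLine = new Map<number, string[]>();
  for (const [lineNumber, text] of mappings) {
    const list = byLine.get(lineNumber) ?? [];
    list.push(text);
    byLine.set(lineNumber, list);
  }
  return byLine;
}

/**
 * Replaces each XXXXX in the line with the next original text, uppercased.
 * Throws when the number of masks differs from the number of stored texts,
 * which would indicate a modified file or a wrong recovery key.
 */
export function unmaskLine(line: string, originals: string[]): string {
  const parts = line.split(MASK); // n masks produce n + 1 parts
  if (parts.length - 1 !== originals.length) {
    throw new Error(
      `Expected ${originals.length} masked values, found ${parts.length - 1}`,
    );
  }
  const upper = originals.map((text) => text.toUpperCase());
  // Interleave the parts with the restored texts; the last part has no partner.
  return parts.map((part, index) => part + (upper[index] ?? "")).join("");
}
```

When reading the masked file, line `n` would be restored with `unmaskLine(line, byLine.get(n) ?? [])`, so lines without stored mappings pass through unchanged.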

Security Considerations

Recovery Key Generation

UUID v4 (128-bit identifier with 122 random bits)
No keyword information embedded
Secure random generation
Format: xxxxxxxx-xxxx-4xxx-yxxx-xxxxxxxxxxxx
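
Key generation and format validation could be sketched as below. To stay self-contained, this example swaps in `randomUUID` from Node's built-in `node:crypto` module instead of the `uuid` package named in the technical stack; `generateRecoveryKey` and `isValidRecoveryKey` are hypothetical names.

```typescript
import { randomUUID } from "node:crypto";

// Matches the format xxxxxxxx-xxxx-4xxx-yxxx-xxxxxxxxxxxx,
// where y is one of 8, 9, a, or b (the UUID variant bits).
const UUID_V4 = /^[0-9a-f]{8}-[0-9a-f]{4}-4[0-9a-f]{3}-[89ab][0-9a-f]{3}-[0-9a-f]{12}$/i;

/** Generates a random version 4 UUID; it carries no keyword information. */
export function generateRecoveryKey(): string {
  return randomUUID();
}

/** Checks the key format before any Redis lookup is attempted. */
export function isValidRecoveryKey(value: string): boolean {
  return UUID_V4.test(value);
}
```

Validating the format first lets the unmask endpoint distinguish a malformed key (400) from a well-formed key that is simply not present in Redis (404).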

File Handling

Temporary file storage during processing (the /tmp folder is writable in the AWS Lambda environment)
Read files line by line instead of loading them whole into memory
Automatic cleanup of temp files
Input validation and sanitization
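
The streaming and cleanup points above might be combined as in this sketch, which uses only Node built-ins. `transformFileByLine` and `withTempDir` are hypothetical helpers, `os.tmpdir()` stands in for the TEMP_DIR setting, and the output always uses "\n" line endings with a trailing newline, which is a simplification the real implementation may not want.

```typescript
import { createReadStream, createWriteStream } from "node:fs";
import { mkdtemp, rm } from "node:fs/promises";
import { tmpdir } from "node:os";
import { join } from "node:path";
import { createInterface } from "node:readline";
import { Readable } from "node:stream";
import { pipeline } from "node:stream/promises";

/**
 * Streams inputPath to outputPath one line at a time, applying `transform`
 * to each line. Returns the number of lines processed.
 */
export async function transformFileByLine(
  inputPath: string,
  outputPath: string,
  transform: (line: string, lineNumber: number) => string,
): Promise<number> {
  const reader = createInterface({
    input: createReadStream(inputPath, { encoding: "utf8" }),
    crlfDelay: Infinity, // treat \r\n as a single line break
  });
  let lineNumber = 0;

  async function* transformedLines(): AsyncGenerator<string> {
    for await (const line of reader) {
      lineNumber += 1;
      yield transform(line, lineNumber) + "\n";
    }
  }

  // pipeline handles backpressure and propagates read or write errors.
  await pipeline(Readable.from(transformedLines()), createWriteStream(outputPath));
  return lineNumber;
}

/** Runs `work` in a fresh temp directory and always removes it afterwards. */
export async function withTempDir<T>(work: (dir: string) => Promise<T>): Promise<T> {
  const dir = await mkdtemp(join(tmpdir(), "mask-"));
  try {
    return await work(dir);
  } finally {
    await rm(dir, { recursive: true, force: true });
  }
}
```

Because only one line is held at a time, memory use stays roughly constant regardless of file size, and the `finally` block removes the temporary files even when processing fails.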

Database Security

Redis with authentication
Key with 1 year expiration
No storage of original documents
Minimal data footprint

Technical Stack

Backend

Runtime: Node.js version 24
Language: TypeScript version 5
Framework: Express.js version 5
Cloud Provider: AWS Lambda (zip deployment package)
File Upload: Multer version 2
Database: Redis, hosted by Aiven or redis.io
UUID Generation: uuidv4()

Dependencies

{
  "express": "^5.x",
  "multer": "^2.x",
  "redis": "^5.x",
  "uuid": "^13.x",
  "typescript": "^5.x"
}

File Format Support

Processing Approach

Line-by-line processing for all formats
Keyword matching preserves format structure
Masking maintains original file formatting
Restoration maintains original file structure

Error Handling

Validation Errors

400 Bad Request: Invalid file format
400 Bad Request: Missing keywords
400 Bad Request: Invalid recovery key format

Processing Errors

422 Unprocessable Entity: Unsupported file format
500 Internal Server Error: Processing failures
404 Not Found: Recovery key not found (unknown or expired)
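
The status codes listed above could be centralized in one error type so that handlers only throw and a single place decides the HTTP response. This is a sketch under assumptions: the error code names, `ApiError`, and `toErrorResponse` are all hypothetical and not part of the plan.

```typescript
/** Hypothetical error codes, one per error case listed in the plan. */
export type ApiErrorCode =
  | "INVALID_FILE_FORMAT"
  | "MISSING_KEYWORDS"
  | "INVALID_RECOVERY_KEY_FORMAT"
  | "UNSUPPORTED_FILE_FORMAT"
  | "RECOVERY_KEY_NOT_FOUND"
  | "PROCESSING_FAILED";

const STATUS_BY_CODE: Record<ApiErrorCode, number> = {
  INVALID_FILE_FORMAT: 400,
  MISSING_KEYWORDS: 400,
  INVALID_RECOVERY_KEY_FORMAT: 400,
  UNSUPPORTED_FILE_FORMAT: 422,
  RECOVERY_KEY_NOT_FOUND: 404,
  PROCESSING_FAILED: 500,
};

/** Error carrying a machine-readable code and the matching HTTP status. */
export class ApiError extends Error {
  readonly code: ApiErrorCode;
  readonly status: number;

  constructor(code: ApiErrorCode, message: string) {
    super(message);
    this.name = "ApiError";
    this.code = code;
    this.status = STATUS_BY_CODE[code];
  }
}

/** Converts any thrown value into a status code and a JSON-friendly body. */
export function toErrorResponse(error: unknown): {
  status: number;
  body: { error: string; message: string };
} {
  if (error instanceof ApiError) {
    return { status: error.status, body: { error: error.code, message: error.message } };
  }
  // Unknown failures are reported as 500 without leaking internal details.
  return { status: 500, body: { error: "PROCESSING_FAILED", message: "Internal server error" } };
}
```

In an Express application, a function like `toErrorResponse` would typically be called from the error-handling middleware.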

Rate Limiting

Optional: Basic rate limiting for abuse prevention
Configurable limits per IP address
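
A basic per-IP limit could be sketched with a fixed-window counter, as below. The class name and the injectable `now` parameter are hypothetical. The counters live in process memory, so on AWS Lambda each execution environment would keep its own counts; a shared limit would need shared storage such as Redis.

```typescript
/** Fixed-window rate limiter keyed by client IP address (in-memory sketch). */
export class FixedWindowRateLimiter {
  private readonly windows = new Map<string, { start: number; count: number }>();
  private readonly limit: number;
  private readonly windowMs: number;

  constructor(limit: number, windowMs: number) {
    this.limit = limit;
    this.windowMs = windowMs;
  }

  /** Returns true when the request is allowed, false when the limit is hit. */
  allow(ip: string, now: number = Date.now()): boolean {
    const current = this.windows.get(ip);
    // Start a new window on first contact or once the old window has elapsed.
    if (current === undefined || now - current.start >= this.windowMs) {
      this.windows.set(ip, { start: now, count: 1 });
      return true;
    }
    if (current.count >= this.limit) return false;
    current.count += 1;
    return true;
  }
}
```

Passing the clock value as a parameter keeps the limiter easy to unit test without waiting for real time to pass.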

Implementation Phases

Phase 1: Core API

Set up project structure
Implement file upload/download endpoints
Basic keyword parsing logic
File processing and masking
Recovery key generation

Phase 2: Database Integration

Redis connection and configuration
Keyword mapping storage
Recovery key validation
Data cleanup and TTL

Phase 3: Advanced Features

Multiple file format support
Enhanced error handling
Logging and monitoring
Performance optimization

Phase 4: Production Ready

Security hardening
Documentation
Testing suite
Deployment configuration

Testing Strategy

Unit Tests

Keyword parsing logic
File processing functions
Recovery key generation
Database operations

Integration Tests

End-to-end masking workflow
End-to-end unmasking workflow
Error scenarios
File format compatibility (binary files and any other non-plain-text formats are rejected)

Performance Tests

Large file processing
Concurrent requests
Memory usage optimization

Deployment Considerations

Environment Variables

REDIS_URL="redis://default:$REDIS_PARIS_PASSWORD@redis-11686.crce282.eu-west-3-1.ec2.cloud.redislabs.com:11686"
PORT=3000
TEMP_DIR=/tmp
FILE_SIZE_LIMIT=5MB # kept within the AWS Lambda payload limit, since this API will be deployed on Lambda
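
Reading these variables at startup could look like the sketch below. `AppConfig`, `parseSize`, and `loadConfig` are hypothetical names, the defaults mirror the values listed above, and treating 1 MB as 1024 * 1024 bytes is an assumption. The environment is passed in as a plain object so the function can be tested without touching the real process environment.

```typescript
/** Hypothetical typed view of the environment variables listed above. */
export interface AppConfig {
  redisUrl: string;
  port: number;
  tempDir: string;
  fileSizeLimitBytes: number;
}

/** Converts a size such as "5MB" or "512KB" to bytes (1 KB = 1024 bytes). */
export function parseSize(value: string): number {
  const match = /^(\d+(?:\.\d+)?)\s*(B|KB|MB|GB)?$/i.exec(value.trim());
  if (match === null) throw new Error(`Invalid size: ${value}`);
  const units = { B: 1, KB: 1024, MB: 1024 ** 2, GB: 1024 ** 3 };
  const unit = (match[2] ?? "B").toUpperCase() as keyof typeof units;
  return Math.floor(Number(match[1]) * units[unit]);
}

/** Builds the configuration, failing fast on missing or invalid values. */
export function loadConfig(env: Record<string, string | undefined>): AppConfig {
  const redisUrl = env["REDIS_URL"];
  if (!redisUrl) throw new Error("REDIS_URL is required");

  const port = Number(env["PORT"] ?? "3000");
  if (!Number.isInteger(port) || port <= 0) {
    throw new Error(`Invalid PORT: ${env["PORT"]}`);
  }

  return {
    redisUrl,
    port,
    tempDir: env["TEMP_DIR"] ?? "/tmp",
    fileSizeLimitBytes: parseSize(env["FILE_SIZE_LIMIT"] ?? "5MB"),
  };
}
```

In the application this would be called once as `loadConfig(process.env)`, and the resulting byte limit could be handed to the upload middleware.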

Docker Configuration

Multi-stage build for production
Redis service dependency
Health check endpoints
Environment-based configuration

Monitoring

Request/response logging
Error tracking
Performance metrics
Redis connection monitoring

Usage Examples

Masking a Document in AWS Paris region

curl -X POST \
  https://some-id.lambda-url.eu-west-3.on.aws/api/v1/mask \
  -F "document=@document.txt" \
  -F "keywords=Hello world \"Boston Red Sox\", 'Pepperoni Pizza', beer"

Unmasking a Document

curl -X POST \
  https://some-id.lambda-url.eu-west-3.on.aws/api/v1/unmask \
  -F "maskedDocument=@masked_document.txt" \
  -F "recoveryKey=550e8400-e29b-41d4-a716-446655440000"

Success Criteria

✅ File-based upload/download functionality
✅ Support for plain text based document formats (xml, txt, md, json, csv, etc)
✅ Flexible keyword input parsing
✅ Secure UUID v4 recovery keys
✅ Case-insensitive masking with uppercase restoration
✅ Minimal Redis database storage
✅ Proper error handling and validation
✅ Performance with large files
✅ Security best practices
✅ Comprehensive testing coverage
