
MarkItDown MCP Server - Testing Strategy

Overview

This document defines the testing strategy for the MarkItDown MCP server, covering the reliability, security, and compatibility checks required before release.

Testing Levels

1. Unit Tests

A. MCP Protocol Layer

  • MCPRequest/MCPResponse serialization/deserialization

    • Valid JSON-RPC 2.0 format
    • Invalid JSON handling
    • Missing required fields
    • Type validation
  • Request routing

    • initialize method handling
    • tools/list method handling
    • tools/call method handling
    • Unknown method handling
    • Invalid method names
  • Error handling

    • Internal server errors
    • Request validation errors
    • Tool execution errors
    • Timeout handling
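The protocol-layer cases above can be sketched as plain asserts around a small validator. The `validate_request` helper and `KNOWN_METHODS` set are hypothetical stand-ins for the server's real request handling; the error codes are the standard JSON-RPC 2.0 values.

```python
import json

# JSON-RPC 2.0 error codes: -32700 parse error, -32600 invalid request,
# -32601 method not found.
KNOWN_METHODS = {"initialize", "tools/list", "tools/call"}

def validate_request(raw: str):
    """Return None if the request is valid, else a JSON-RPC error code.
    Hypothetical helper mirroring the unit-test cases above."""
    try:
        req = json.loads(raw)
    except json.JSONDecodeError:
        return -32700                      # invalid JSON
    if not isinstance(req, dict) or req.get("jsonrpc") != "2.0":
        return -32600                      # missing/wrong required fields
    if not isinstance(req.get("method"), str):
        return -32600                      # type validation
    if req["method"] not in KNOWN_METHODS:
        return -32601                      # unknown method
    return None

# pytest collects these as plain asserts:
assert validate_request('{"jsonrpc": "2.0", "id": 1, "method": "tools/list"}') is None
assert validate_request('not json') == -32700
assert validate_request('{"id": 1, "method": "tools/list"}') == -32600
assert validate_request('{"jsonrpc": "2.0", "id": 1, "method": "nope"}') == -32601
```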

B. Tool Implementation

  • convert_file tool

    • File path validation
    • Base64 content decoding
    • File existence checks
    • Permission validation
    • Return format validation
  • list_supported_formats tool

    • Format list accuracy
    • Categorization correctness
    • Response structure
  • convert_directory tool

    • Directory traversal logic
    • File filtering
    • Progress tracking
    • Error aggregation
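The convert_file pre-flight checks (path validation, existence, format) can be unit-tested against a throwaway temp directory. `check_convert_args` and the extension subset are illustrative, not the tool's actual implementation.

```python
import tempfile
from pathlib import Path

# Illustrative subset of the supported extensions.
SUPPORTED_EXTENSIONS = {".pdf", ".docx", ".xlsx", ".pptx", ".html", ".csv"}

def check_convert_args(path_str: str):
    """Return an error string, or None if the file looks convertible.
    Hypothetical helper mirroring convert_file's pre-flight checks."""
    path = Path(path_str)
    if not path.exists():
        return "file_not_found"
    if not path.is_file():
        return "not_a_file"
    if path.suffix.lower() not in SUPPORTED_EXTENSIONS:
        return "unsupported_format"
    return None

with tempfile.TemporaryDirectory() as d:
    ok = Path(d) / "report.pdf"
    ok.write_bytes(b"%PDF-1.4")            # minimal valid-looking content
    assert check_convert_args(str(ok)) is None
    assert check_convert_args(str(Path(d) / "missing.pdf")) == "file_not_found"
    assert check_convert_args(d) == "not_a_file"
```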

C. MarkItDown Integration

  • Document conversion
    • Success path testing
    • Error handling
    • Result formatting
    • Memory management

2. Integration Tests

A. MCP Protocol Integration

  • Server lifecycle

    • Initialization sequence
    • Clean shutdown
    • Graceful error recovery
    • Connection state management
  • Tool execution flow

    • Request parsing → Tool execution → Response formatting
    • Concurrent request handling
    • Request timeout behavior
    • Resource cleanup
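Concurrent request handling can be exercised with a thread pool and a stub converter; the test only needs to verify that simultaneous calls all complete and each result maps back to its input. `fake_convert` is a stand-in, not the real tool call.

```python
from concurrent.futures import ThreadPoolExecutor

def fake_convert(path: str) -> str:
    # Stand-in for the real conversion; returns a recognizable result
    # so the test can check input/output pairing under concurrency.
    return f"# converted {path}"

paths = [f"doc_{i}.pdf" for i in range(16)]
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(fake_convert, paths))

assert len(results) == len(paths)
assert all(r == f"# converted {p}" for r, p in zip(results, paths))
```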

B. File System Integration

  • File operations

    • Read permissions
    • Path traversal security
    • Symbolic link handling
    • Network drive compatibility
    • Large file handling
  • Directory operations

    • Recursive traversal
    • Mixed file types
    • Empty directories
    • Nested structures

3. File Format Testing

A. Supported Formats (29+)

For each supported format, test:

  • Valid files: Typical use cases
  • Edge cases: Empty files, minimal content
  • Large files: Memory and performance impact
  • Corrupted files: Graceful error handling
  • Special cases: Password-protected, encrypted
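The format-by-scenario cross product lends itself to a data-driven matrix that can feed `pytest.mark.parametrize`. The format subset and fixture layout below are illustrative only.

```python
from itertools import product

# Illustrative subset of the 29+ formats and the five scenarios above;
# in the real suite this matrix feeds @pytest.mark.parametrize.
FORMATS = ("pdf", "docx", "xlsx", "pptx", "jpg", "mp3", "html", "csv", "epub", "zip")
SCENARIOS = ("valid", "empty", "large", "corrupted", "protected")

MATRIX = [f"tests/data/{scenario}/sample.{fmt}"
          for fmt, scenario in product(FORMATS, SCENARIOS)]

assert len(MATRIX) == len(FORMATS) * len(SCENARIOS)
assert "tests/data/corrupted/sample.pdf" in MATRIX
```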

B. Format-Specific Tests

PDF Files
  • Simple text PDFs
  • Complex layouts with tables/images
  • Scanned PDFs (image-based)
  • Password-protected PDFs
  • Corrupted PDF files
  • Multi-page documents
  • Large PDFs (100+ pages)
Office Documents
  • Excel (.xlsx, .xls)
    • Multiple worksheets
    • Formulas and calculations
    • Charts and graphs
    • Large spreadsheets
    • Password-protected files
  • Word (.docx)
    • Simple text documents
    • Complex formatting
    • Images and tables
    • Track changes/comments
  • PowerPoint (.pptx)
    • Text-heavy slides
    • Image-heavy presentations
    • Animations and transitions
Images
  • EXIF metadata extraction
    • Photos with full EXIF data
    • Images without metadata
    • Corrupted EXIF data
  • Format variety
    • JPG, PNG, GIF, BMP, TIFF, WebP
    • Different resolutions
    • Color vs. grayscale
Audio Files
  • Speech recognition
    • Clear speech recordings
    • Multiple speakers
    • Background noise
    • Different audio qualities
  • Format support
    • MP3, WAV, FLAC, M4A, OGG, WMA
    • Different bitrates
    • Mono vs. stereo
Other Formats
  • Web formats: HTML, XML, JSON, CSV
  • Text formats: TXT, MD, RST
  • Archives: ZIP files with mixed content
  • E-books: EPUB files

4. Performance Testing

A. Scalability Tests

  • Concurrent requests

    • Multiple simultaneous conversions
    • Resource contention
    • Memory usage patterns
    • CPU utilization
  • Large file handling

    • Files > 100MB
    • Memory efficiency
    • Streaming vs. loading
    • Timeout behavior
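Timeout behaviour for large files can be tested with a deadline wrapper around the conversion call. `convert_with_timeout` is a hypothetical sketch: it bounds how long the caller waits, but the worker thread itself is not killed, so a production version also needs cancellation or a subprocess.

```python
import time
from concurrent.futures import ThreadPoolExecutor, TimeoutError as FuturesTimeout

def convert_with_timeout(fn, path, timeout_s):
    """Run a conversion with a hard deadline; returns None on timeout.
    Hypothetical wrapper for the large-file timeout behaviour above."""
    with ThreadPoolExecutor(max_workers=1) as pool:
        future = pool.submit(fn, path)
        try:
            return future.result(timeout=timeout_s)
        except FuturesTimeout:
            return None

# Fast conversion completes; a slow one hits the deadline.
assert convert_with_timeout(str.upper, "a.pdf", 1.0) == "A.PDF"
assert convert_with_timeout(lambda p: time.sleep(0.3) or p, "b.pdf", 0.05) is None
```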

B. Stress Testing

  • Resource limits

    • Maximum concurrent requests
    • Memory exhaustion scenarios
    • CPU-bound vs. I/O-bound operations
    • Recovery from resource exhaustion
  • Load testing

    • Sustained high request rates
    • Gradual load increase
    • Peak load handling
    • Performance degradation patterns

5. Security Testing

A. Input Validation

  • Path traversal attacks

    • ../../../etc/passwd attempts
    • Absolute path handling
    • Symbolic link exploitation
    • Network path attempts
  • Malicious content

    • Files with embedded scripts
    • Zip bombs
    • Files with excessive metadata
    • Binary files disguised as text
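The path traversal cases above reduce to one invariant: after resolving `..` segments and symlinks, the requested path must still live under the allowed root. `is_within_root` is a sketch of that guard (POSIX paths assumed; Windows resolution differs).

```python
from pathlib import Path

def is_within_root(candidate: str, root: str) -> bool:
    """Reject paths that escape the allowed root after resolving '..'
    segments and symlinks. Hypothetical guard for convert_file input."""
    resolved = Path(candidate).resolve()
    root_resolved = Path(root).resolve()
    return root_resolved == resolved or root_resolved in resolved.parents

# A legitimate file passes; classic traversal payloads do not.
assert is_within_root("/srv/docs/report.pdf", "/srv/docs")
assert not is_within_root("/srv/docs/../../etc/passwd", "/srv/docs")
assert not is_within_root("/etc/passwd", "/srv/docs")
```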

B. Resource Protection

  • Denial of Service (DoS)

    • Large file uploads
    • Infinite loop scenarios
    • Memory exhaustion attempts
    • CPU exhaustion attacks
  • Information disclosure

    • Error message content
    • File path leakage
    • System information exposure

6. Compatibility Testing

A. Environment Matrix

  • Operating Systems

    • macOS (Intel/Apple Silicon)
    • Windows 10/11
    • Ubuntu/Debian Linux
    • CentOS/RHEL
  • Python Versions

    • Python 3.10, 3.11, 3.12, 3.13
    • Virtual environments
    • System Python vs. user installations

B. Dependency Testing

  • Optional dependencies

    • Missing dependencies behavior
    • Partial dependency installation
    • Version compatibility ranges
    • Dependency conflict resolution
  • Claude Desktop Integration

    • Different Claude Desktop versions
    • Configuration variations
    • Network conditions
    • Error recovery scenarios
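Missing-dependency behaviour can be probed by checking importability before enabling each converter, so the server degrades gracefully instead of crashing. The feature-to-module mapping below is illustrative, not MarkItDown's actual extras.

```python
import importlib.util

def missing_optional_deps(feature_modules: dict) -> list:
    """Return the features whose backing module is not importable.
    Module names passed in are illustrative assumptions."""
    return [feature for feature, module in feature_modules.items()
            if importlib.util.find_spec(module) is None]

# A stdlib module is always present; a made-up one never is.
assert missing_optional_deps({"builtin-json": "json"}) == []
assert missing_optional_deps({"pdf": "no_such_module_xyz"}) == ["pdf"]
```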

7. Error Handling Testing

A. Expected Errors

  • File not found

    • Non-existent paths
    • Deleted files during processing
    • Network disconnections
  • Permission errors

    • Read-only files
    • Protected directories
    • Insufficient privileges
  • Format errors

    • Unsupported file types
    • Corrupted files
    • Incomplete files
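Expected-error testing is easier when low-level exceptions are mapped to a small set of stable error codes that the MCP response carries. `classify_error` and its code strings are hypothetical names for that mapping.

```python
def classify_error(exc: Exception) -> str:
    """Map low-level exceptions to stable, user-facing error codes.
    Both the function and the code strings are illustrative."""
    if isinstance(exc, FileNotFoundError):
        return "file_not_found"
    if isinstance(exc, PermissionError):
        return "permission_denied"
    if isinstance(exc, ValueError):
        return "unsupported_or_corrupt"
    return "internal_error"            # anything unexpected

assert classify_error(FileNotFoundError()) == "file_not_found"
assert classify_error(PermissionError()) == "permission_denied"
assert classify_error(ValueError("bad header")) == "unsupported_or_corrupt"
assert classify_error(RuntimeError()) == "internal_error"
```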

B. Unexpected Errors

  • System failures

    • Out of memory
    • Disk full
    • Network timeouts
    • Process kills
  • Dependency failures

    • Missing libraries
    • Version conflicts
    • Runtime errors

8. User Experience Testing

A. Claude Desktop Integration

  • Tool discovery

    • Tools appear in interface
    • Descriptions are clear
    • Parameter hints work
  • Conversion workflows

    • Single file conversion
    • Batch directory conversion
    • Error reporting clarity
    • Progress indication

B. Error Messages

  • User-friendly errors
    • Clear problem descriptions
    • Actionable solutions
    • No technical jargon
    • Helpful suggestions

Test Data Requirements

A. Sample Files

Create a comprehensive test dataset including:

  • Small files (< 1KB) of each format
  • Medium files (1KB - 10MB) representing typical use
  • Large files (> 10MB) for performance testing
  • Edge cases: Empty files, single character, maximum size
  • Corrupted files: Intentionally broken formats
  • Special characters: Unicode filenames, spaces, symbols
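A fixture builder can generate the small/edge-case slice of this dataset on the fly in a temp directory. The layout and file names below are a hypothetical sketch; the real dataset would cover every format and size tier.

```python
import tempfile
from pathlib import Path

def build_test_dataset(root: Path) -> list:
    """Create a tiny sample dataset covering size and edge cases.
    Hypothetical layout for illustration only."""
    cases = {
        "small/tiny.txt": b"x",                      # single character
        "small/empty.txt": b"",                      # empty file
        "medium/typical.csv": b"a,b\n1,2\n" * 500,   # a few KB, typical use
        "edge/unicode \u00e9 name.md": b"# h\u00e9llo\n",  # unicode filename
        "corrupted/broken.pdf": b"not a real pdf",   # intentionally invalid
    }
    paths = []
    for rel, data in cases.items():
        p = root / rel
        p.parent.mkdir(parents=True, exist_ok=True)
        p.write_bytes(data)
        paths.append(p)
    return paths

with tempfile.TemporaryDirectory() as d:
    files = build_test_dataset(Path(d))
    assert len(files) == 5
    assert (Path(d) / "small/empty.txt").stat().st_size == 0
```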

B. Test Scenarios

  • Happy path: Ideal conditions, all dependencies available
  • Error paths: Missing dependencies, invalid inputs
  • Edge cases: Boundary conditions, unusual inputs
  • Real-world: Typical user files and workflows

Test Infrastructure

A. Automated Testing

  • Unit tests: pytest framework
  • Integration tests: Full MCP protocol simulation
  • Performance tests: Load generation and metrics
  • CI/CD: GitHub Actions for multiple environments

B. Manual Testing

  • Claude Desktop integration: Real environment testing
  • User workflow validation: End-to-end scenarios
  • Exploratory testing: Edge cases and creative usage

Success Criteria

A. Functionality

  • ✅ All 29+ file formats convert successfully
  • ✅ All MCP tools work as documented
  • ✅ Error handling is graceful and informative
  • ✅ Performance meets acceptable thresholds

B. Reliability

  • ✅ No crashes under normal usage
  • ✅ Graceful degradation under stress
  • ✅ Memory leaks eliminated
  • ✅ Resource cleanup on errors

C. Security

  • ✅ No path traversal vulnerabilities
  • ✅ No information disclosure
  • ✅ DoS protection mechanisms
  • ✅ Safe handling of malicious files

D. Compatibility

  • ✅ Works on all target platforms
  • ✅ Compatible with all supported Python versions
  • ✅ Handles missing dependencies gracefully
  • ✅ Integrates properly with Claude Desktop

Test Execution Plan

Phase 1: Foundation (Week 1)

  1. Set up test framework and infrastructure
  2. Implement unit tests for core functionality
  3. Create basic test data set

Phase 2: Core Testing (Week 2)

  1. Complete unit test coverage
  2. Implement integration tests
  3. File format testing for major formats

Phase 3: Comprehensive Testing (Week 3)

  1. Complete file format coverage
  2. Performance and stress testing
  3. Security testing

Phase 4: Validation (Week 4)

  1. End-to-end testing with Claude Desktop
  2. Multi-platform compatibility testing
  3. User experience validation
  4. Bug fixes and retesting

Risk Assessment

High Risk Areas

  1. Large file handling - Memory issues, timeouts
  2. Concurrent requests - Resource contention, race conditions
  3. Dependency management - Missing/incompatible packages
  4. Security vulnerabilities - Path traversal, DoS attacks

Mitigation Strategies

  1. Comprehensive performance testing with realistic data
  2. Security review of all input handling
  3. Dependency testing across multiple environments
  4. Staged rollout with monitoring and rollback capability

Deliverables

  1. Test suite - Comprehensive automated tests
  2. Test data - Representative file collection
  3. Performance benchmarks - Baseline metrics
  4. Security assessment - Vulnerability analysis
  5. Compatibility matrix - Platform/version support
  6. User testing report - Real-world validation results