A blazingly fast CLI tool for validating XML files against XML Schemas, built in Rust with a focus on concurrent processing, intelligent caching, and low memory overhead.
Validate 20,000 files in seconds with automatic schema caching, concurrent validation, and comprehensive error reporting.
✨ Core Capabilities
- Concurrent Validation: Uses all available CPU cores for parallel XML/XSD validation
- Schema Caching: Two-tier caching (L1 memory, L2 disk) prevents redundant downloads
- Batch Processing: Validate entire directory trees (100,000+ files) without memory exhaustion
- Output: Text (human-readable) or Compact Summary
- Smart Error Reporting: Line/column numbers, clear error messages, detailed diagnostics
⚡ Performance
- Pure Rust: Uses xmloxide for XML/XSD validation — no system dependencies or unsafe code
- Async I/O: Tokio-based async operations for files and HTTP downloads
- Two-Tier Caching: Schemas are downloaded on first use, kept in memory for the current run, and stored on disk for reuse across runs
- Bounded Memory: Concurrent validation with configurable limits
🏗️ Architecture
- Hybrid Async/Sync: Async I/O (files, HTTP, caching) + sync CPU-bound validation (xmloxide)
- True Parallel Validation: No global locks - 10x throughput on multi-core CPUs
- Parse Once, Validate Many: Schemas are parsed once and shared safely across threads
- Modular Design: Clean separation of concerns (discovery, loading, validation, reporting)
- Non-Blocking: CPU-intensive tasks are offloaded to `spawn_blocking` to keep the async runtime responsive.
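The "Parse Once, Validate Many" and bounded-parallelism ideas above can be sketched with the standard library alone. This is an illustration under stated assumptions, not the project's code: `ParsedSchema`, `validate`, and `validate_all` are hypothetical names, the "validation" is a toy substring check standing in for xmloxide, and plain scoped threads replace the Tokio plus `spawn_blocking` setup the tool actually describes.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;
use std::thread;

/// Hypothetical stand-in for a parsed schema (the real tool holds an
/// xmloxide schema type whose API is not shown in this README).
pub struct ParsedSchema {
    pub required_element: String,
}

/// Toy check: a document passes if it contains the required element's
/// opening tag. A placeholder for real XSD validation.
pub fn validate(schema: &ParsedSchema, document: &str) -> bool {
    document.contains(&format!("<{}", schema.required_element))
}

/// Validates `documents` on at most `max_threads` worker threads.
/// The schema is built once by the caller and shared through `Arc`.
pub fn validate_all(
    schema: Arc<ParsedSchema>,
    documents: &[String],
    max_threads: usize,
) -> Vec<bool> {
    // Index of the next unclaimed document; workers pull from it.
    let next = AtomicUsize::new(0);
    let workers = max_threads.max(1).min(documents.len().max(1));
    let mut results = vec![false; documents.len()];

    let partials: Vec<Vec<(usize, bool)>> = thread::scope(|scope| {
        let handles: Vec<_> = (0..workers)
            .map(|_| {
                let schema = Arc::clone(&schema);
                let next = &next;
                scope.spawn(move || {
                    let mut local = Vec::new();
                    loop {
                        let i = next.fetch_add(1, Ordering::Relaxed);
                        if i >= documents.len() {
                            break;
                        }
                        local.push((i, validate(&schema, &documents[i])));
                    }
                    local
                })
            })
            .collect();
        handles
            .into_iter()
            .map(|h| h.join().expect("worker panicked"))
            .collect()
    });

    // Put each result back at its document's original position.
    for (i, ok) in partials.into_iter().flatten() {
        results[i] = ok;
    }
    results
}
```

No lock is taken around the schema in this sketch: every worker only reads it, which is the property the list above attributes to the real validator.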
- Rust: 1.81+ (stable toolchain) with Cargo
git clone https://github.com/franklinchen/validate-xml-rust.git
cd validate-xml-rust
cargo install --path .

This installs the `validate-xml` binary to `~/.cargo/bin`. Add `~/.cargo/bin` to your `$PATH` if not already present.
Validate all XML files in a directory:
# Validate all .xml files (recursive)
validate-xml /path/to/xml/files
# Validate files with custom extensions
validate-xml --extensions xml,xsd /path/to/files
# Validate with verbose output and progress bar
validate-xml --verbose /path/to/files

Validate XML files against a specific XSD schema, even if the XML files don't contain `xsi:schemaLocation` attributes:
# Validate using an explicit schema file
validate-xml --schema /path/to/schema.xsd /path/to/xml/files

Standard output includes validation status per file (in verbose mode) and a final summary.
Validation Summary:
Total files: 20000
Valid: 19950
Invalid: 50
Errors: 0
Skipped: 0
Success rate: 99.8%
Duration: 4.20s
Validation errors are reported with precise location information for easy IDE integration:
path/to/file.xml:42:15: Missing required element 'id'
path/to/file.xml:87:3: Element 'invalid' not allowed here
path/to/file.xml:120:1: Schema error: Could not locate schema resource
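For editor or CI integration, lines in the `path:line:column: message` shape shown above can be split back into fields. The following is a minimal sketch, not part of the tool: `Diagnostic` and `parse_diagnostic` are hypothetical names, and it assumes the path itself contains no colon (so Windows drive letters would need extra handling).

```rust
/// One parsed error line (hypothetical type, for illustration only).
#[derive(Debug, PartialEq)]
pub struct Diagnostic {
    pub path: String,
    pub line: u32,
    pub column: u32,
    pub message: String,
}

/// Parses `path:line:column: message`. Returns `None` when the line does
/// not have that shape. Assumes the path contains no ':' character.
pub fn parse_diagnostic(text: &str) -> Option<Diagnostic> {
    // Split into at most four parts so colons inside the message survive.
    let mut parts = text.splitn(4, ':');
    let path = parts.next()?;
    let line = parts.next()?.trim().parse().ok()?;
    let column = parts.next()?.trim().parse().ok()?;
    let message = parts.next()?.trim();
    Some(Diagnostic {
        path: path.to_string(),
        line,
        column,
        message: message.to_string(),
    })
}
```

Limiting the split to four parts keeps messages such as `Schema error: Could not locate schema resource` intact even though they contain a colon.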
A robust way to test remote schema validation is using an Apache Maven POM file. A sample is provided in samples/pom.xml:
# Validate the provided sample which uses a remote Apache Maven schema
validate-xml samples/pom.xml

validate-xml [OPTIONS] <DIRECTORY>

| Option | Default | Description |
|---|---|---|
| `--extensions <EXT>` | `xml` | File extensions to match (comma-separated) |
| `--threads <N>` | CPU cores | Max concurrent validation threads |
| `--cache-dir <PATH>` | Platform specific | Schema cache directory |
| `--cache-ttl <HOURS>` | `24` | Schema cache TTL in hours |
| `--verbose` | - | Show detailed output |
| `--quiet` | - | Suppress non-error output |
| `--progress` | Auto | Show progress bar |
| `--schema <PATH>` | - | Validate against a specific XSD (overrides schema references in XML) |
| `--fail-fast` | - | Stop validation on first error |
| `--help` | - | Show help message |
| `--version` | - | Show version information |
| Code | Meaning |
|---|---|
| 0 | All files valid |
| 1 | Configuration or CLI error |
| 2 | Errors occurred during validation (system/network) |
| 3 | Invalid files found (schema violations) |
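The mapping from a run's counts to the codes in this table might look like the sketch below. It rests on assumptions: `Summary` and `exit_code` are hypothetical names mirroring the sample summary fields, and the README does not say which code wins when a run has both system errors and invalid files, so the precedence chosen here (errors before invalid files) is a guess. Code 1 is not covered because configuration and CLI errors occur before any file is validated.

```rust
/// Counts from one validation run (hypothetical type mirroring the
/// fields of the sample "Validation Summary" output).
#[derive(Debug, Default, Clone, Copy)]
pub struct Summary {
    pub valid: u64,
    pub invalid: u64,
    pub errors: u64,
    pub skipped: u64,
}

/// Maps a summary to a process exit code.
/// ASSUMPTION: system/network errors (2) take precedence over schema
/// violations (3); the README does not specify the ordering.
pub fn exit_code(summary: &Summary) -> i32 {
    if summary.errors > 0 {
        2
    } else if summary.invalid > 0 {
        3
    } else {
        0
    }
}
```

A CI script can then distinguish "the files are wrong" (3) from "the run itself was unreliable" (2) and retry only in the second case.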
The validator consists of four main components:
1. File Discovery
- Recursively traverses directory tree and filters by extension.
2. Schema Loading
- Extracts schema URLs (xsi:schemaLocation, xsi:noNamespaceSchemaLocation).
- Downloads remote schemas (HTTP/HTTPS) and caches raw bytes to memory and disk.
- Parse Once: Parsed schema structures are cached in memory and shared safely across threads.
3. Concurrent Validation
- Spawns async tasks bounded by `--threads`.
- Heavy CPU tasks (parsing, validation) are offloaded to `spawn_blocking`.
- Thread Safety: xmloxide is fully thread-safe; no global locks needed. Full parallel execution for XML validation.
4. Error Reporting
- Aggregates and formats errors with line/column information.
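The schema-loading step depends on reading `xsi:schemaLocation`, whose value is a whitespace-separated list of namespace and location pairs (`xsi:noNamespaceSchemaLocation` holds a single location instead). A minimal sketch of that split follows; `schema_location_pairs` and `is_remote` are hypothetical helper names, not functions from this project.

```rust
/// Splits an `xsi:schemaLocation` attribute value into
/// (namespace URI, schema location) pairs. A trailing unpaired token,
/// which would be malformed input, is ignored.
pub fn schema_location_pairs(value: &str) -> Vec<(String, String)> {
    let tokens: Vec<&str> = value.split_whitespace().collect();
    tokens
        .chunks_exact(2)
        .map(|pair| (pair[0].to_string(), pair[1].to_string()))
        .collect()
}

/// Returns true when the location must be fetched over HTTP or HTTPS
/// rather than read from the local filesystem.
pub fn is_remote(location: &str) -> bool {
    location.starts_with("http://") || location.starts_with("https://")
}
```

For a Maven POM, the single pair produced is the POM namespace and the URL of the Maven XSD, and only the second element is something to download and cache.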
- L1 Parsed Cache: In-memory `moka` cache storing compiled `XsdSchema`. Ensures we parse any XSD exactly once.
- L2 Raw Cache: Disk-backed `cacache` for persistent cross-run storage of schema bytes.
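The lookup order across the two tiers can be sketched as follows. This is not the project's implementation: plain `HashMap`s behind a `Mutex` stand in for both `moka` and `cacache`, "parsing" only records the byte length, and `SchemaCache`, `CompiledSchema`, and `get_or_load` are hypothetical names.

```rust
use std::collections::HashMap;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::{Arc, Mutex};

/// Hypothetical stand-in for a compiled schema.
pub struct CompiledSchema {
    pub byte_len: usize,
}

/// Two-tier cache sketch: L1 holds parsed schemas, L2 holds raw bytes.
/// ASSUMPTION: in-memory maps replace moka (L1) and cacache (L2).
#[derive(Default)]
pub struct SchemaCache {
    l1_parsed: Mutex<HashMap<String, Arc<CompiledSchema>>>,
    l2_raw: Mutex<HashMap<String, Vec<u8>>>,
    pub parses: AtomicUsize,
    pub downloads: AtomicUsize,
}

impl SchemaCache {
    /// Returns the schema for `url`, calling `download` only when neither
    /// tier has it and parsing only when L1 misses.
    pub fn get_or_load(
        &self,
        url: &str,
        download: impl FnOnce() -> Vec<u8>,
    ) -> Arc<CompiledSchema> {
        // Holding the L1 lock for the whole call keeps the sketch simple
        // and guarantees a single parse per URL, at the cost of
        // serializing concurrent misses.
        let mut l1 = self.l1_parsed.lock().unwrap();
        if let Some(schema) = l1.get(url) {
            return Arc::clone(schema);
        }

        let bytes = {
            let mut l2 = self.l2_raw.lock().unwrap();
            let cached = l2.get(url).cloned();
            match cached {
                Some(bytes) => bytes,
                None => {
                    self.downloads.fetch_add(1, Ordering::Relaxed);
                    let bytes = download();
                    l2.insert(url.to_string(), bytes.clone());
                    bytes
                }
            }
        };

        // Placeholder "parse": a real cache would compile the XSD here.
        self.parses.fetch_add(1, Ordering::Relaxed);
        let schema = Arc::new(CompiledSchema { byte_len: bytes.len() });
        l1.insert(url.to_string(), Arc::clone(&schema));
        schema
    }
}
```

Returning `Arc<CompiledSchema>` is what lets many validation tasks share one parsed schema without copying it.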
Micro-benchmarks measuring the core validation engine (on Apple M1 Max):
| Operation | Median Time | Throughput |
|---|---|---|
| Schema Parsing | ~6.0 µs | 166,000/sec |
| Valid XML Validation | ~17.2 µs | 58,000/sec |
| Invalid XML Validation | ~17.6 µs | 56,000/sec |
Note: Validation includes reading the XML file and checking against the cached schema.
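The throughput column appears to be the reciprocal of the median time, rounded down to the nearest thousand. A small sketch of that arithmetic, assuming a single thread running operations back to back (the function name is hypothetical):

```rust
/// Operations per second implied by a median latency in microseconds,
/// assuming one thread executing operations sequentially.
pub fn throughput_per_sec(median_micros: f64) -> f64 {
    1_000_000.0 / median_micros
}
```

With the table's medians this gives roughly 166,667/sec for 6.0 µs, 58,140/sec for 17.2 µs, and 56,818/sec for 17.6 µs, consistent with the rounded figures listed.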
# Clone repository
git clone https://github.com/franklinchen/validate-xml-rust.git
cd validate-xml-rust
# Build (release, optimized)
cargo build --release
# Run tests
cargo test
# Run benchmarks
cargo bench

Before submitting changes, ensure you run:
cargo fmt
cargo clippy
cargo test
MIT License - See LICENSE file for details
Google Gemini was used as an aid in improving this project, particularly in streamlining the architecture and test suite.