Merged
4 changes: 4 additions & 0 deletions .gitignore
@@ -79,3 +79,7 @@ site/

# Cursor
.cursor/

# Claude Code
CLAUDE.md
.claude/
37 changes: 37 additions & 0 deletions CHANGELOG.md
@@ -19,6 +19,43 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
### Removed
- None

## [0.4.3] - 2025-09-06

### Added
- feat(schema): Enhanced SCHEMA rule with metadata validation capabilities
- feat(schema): String length validation via `max_length` parameter for precise VARCHAR constraints
- feat(schema): Float precision and scale validation via `precision`/`scale` parameters for DECIMAL constraints
- feat(cli): Extended JSON schema format support with metadata fields (max_length, precision, scale)
- feat(core): Database-agnostic metadata extraction across MySQL, PostgreSQL, and SQLite
- feat(core): Vendor-specific type parsing with regex-based metadata extraction
- feat(core): Performance-optimized validation using database catalog queries (no data scanning)
- feat(validation): Comprehensive metadata comparison logic with detailed failure reporting
- feat(cli): Enhanced rule parameter validation for metadata fields with logical constraints
- feat(tests): Comprehensive metadata validation test suite (87% coverage on SchemaExecutor)
- feat(tests): Unit, integration, and CLI tests for metadata validation scenarios
- feat(docs): Enhanced documentation with metadata validation examples and troubleshooting guide
- feat(docs): Migration guide for legacy schema formats and performance characteristics

### Changed
- refactor(schema): Enhanced SchemaExecutor with metadata validation capabilities
- refactor(cli): Extended CLI schema parsing to support metadata fields with validation
- refactor(core): Improved database metadata extraction and type mapping
- improve(performance): Metadata validation uses single database query per table (no data scans)
- improve(validation): Enhanced error messages with specific metadata mismatch descriptions
- improve(architecture): Clear separation between structure validation (SCHEMA) and content validation (RANGE/ENUM)

### Fixed
- None

### Removed
- None

### Migration Guide
- **Backward Compatibility**: Existing schema files without metadata continue to work unchanged
- **Enhanced Validation**: Add `max_length`, `precision`, and `scale` fields incrementally to existing schemas
- **Performance**: Metadata validation reads column definitions from the database catalog instead of scanning table rows
- **Architecture**: Enhanced SCHEMA rule eliminates need for separate LENGTH rule type

## [0.4.2] - 2025-08-27

### Added
160 changes: 160 additions & 0 deletions README.md
@@ -72,6 +72,166 @@ Set up validation checkpoints at various stages of your data pipelines to guaran
vlite schema --conn "mysql://user:pass@host:3306/sales" --rules customers_schema.json
```

### Advanced Schema Examples

**Multi-Table Validation:**
```json
{
  "customers": {
    "rules": [
      { "field": "id", "type": "integer", "required": true },
      { "field": "name", "type": "string", "required": true },
      { "field": "email", "type": "string", "required": true },
      { "field": "age", "type": "integer", "min": 18, "max": 100 }
    ],
    "strict_mode": true
  },
  "orders": {
    "rules": [
      { "field": "id", "type": "integer", "required": true },
      { "field": "customer_id", "type": "integer", "required": true },
      { "field": "total", "type": "float", "min": 0 },
      { "field": "status", "enum": ["pending", "completed", "cancelled"] }
    ]
  }
}
```

**CSV File Validation:**
```bash
# Validate CSV file structure
vlite schema --conn "sales_data.csv" --rules csv_schema.json --output json
```

**Complex Data Types:**
```json
{
  "events": {
    "rules": [
      { "field": "timestamp", "type": "datetime", "required": true },
      { "field": "event_type", "enum": ["login", "logout", "purchase"] },
      { "field": "user_id", "type": "string", "required": true },
      { "field": "metadata", "type": "string" }
    ],
    "case_insensitive": true
  }
}
```

**Available Data Types:**
- `string` - Text data (VARCHAR, TEXT, CHAR)
- `integer` - Whole numbers (INT, BIGINT, SMALLINT)
- `float` - Decimal numbers (FLOAT, DOUBLE, DECIMAL)
- `boolean` - True/false values (BOOLEAN, BOOL, BIT)
- `date` - Date only (DATE)
- `datetime` - Date and time (DATETIME, TIMESTAMP)

### Enhanced Schema Validation with Metadata

ValidateLite now supports **metadata validation**: it checks declared column constraints, such as string length and decimal precision and scale, against the database catalog. Because only column definitions are read, no table data is scanned.

**Metadata Validation Features:**
- **String Length Validation**: Validate `max_length` for string columns
- **Float Precision Validation**: Validate `precision` and `scale` for decimal columns
- **Database-Agnostic**: Works across MySQL, PostgreSQL, and SQLite
- **Performance Optimized**: Uses database catalog queries, not data scans
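
**Catalog Query Sketch:**

To illustrate what "catalog queries, not data scans" can look like, here is a minimal sketch using SQLite from the Python standard library. It is an assumption-laden illustration rather than ValidateLite's code: `parse_declared_type` and `column_metadata` are hypothetical helpers, and the regular expression only covers types written as `NAME`, `NAME(n)`, or `NAME(n,m)`.

```python
import re
import sqlite3

# Matches "NAME", "NAME(n)" or "NAME(n,m)" at the start of a declared type.
TYPE_PATTERN = re.compile(r"^\s*(\w+)\s*(?:\(\s*(\d+)\s*(?:,\s*(\d+)\s*)?\))?")


def parse_declared_type(declared: str) -> dict:
    """Split a declared type such as 'DECIMAL(10,2)' into name, size and scale."""
    match = TYPE_PATTERN.match(declared)
    if match is None:
        return {"type": declared.upper(), "size": None, "scale": None}
    name, first, second = match.groups()
    return {
        "type": name.upper(),
        "size": int(first) if first else None,
        "scale": int(second) if second else None,
    }


def column_metadata(conn: sqlite3.Connection, table: str) -> dict:
    """Read column definitions for one table from the SQLite catalog."""
    # PRAGMA table_info reads the table definition only; no rows are scanned.
    # The table name is interpolated, so it must come from a trusted source.
    rows = conn.execute(f"PRAGMA table_info({table})").fetchall()
    # Each row is (cid, name, type, notnull, dflt_value, pk).
    return {row[1]: parse_declared_type(row[2]) for row in rows}


# Demo on an empty in-memory table: the metadata is available without any data.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE products (id INTEGER, name VARCHAR(50), price DECIMAL(10,2))"
)
meta = column_metadata(conn, "products")
```

MySQL and PostgreSQL expose the same kind of information through their own catalogs, so a real implementation would need one such reader per database.
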

**Enhanced Schema Examples:**

**String Metadata Validation:**
```json
{
  "users": {
    "rules": [
      {
        "field": "username",
        "type": "string",
        "max_length": 50,
        "required": true
      },
      {
        "field": "email",
        "type": "string",
        "max_length": 255,
        "required": true
      },
      {
        "field": "biography",
        "type": "string",
        "max_length": 1000
      }
    ]
  }
}
```

**Float Precision Validation:**
```json
{
  "products": {
    "rules": [
      {
        "field": "price",
        "type": "float",
        "precision": 10,
        "scale": 2,
        "required": true
      },
      {
        "field": "weight",
        "type": "float",
        "precision": 8,
        "scale": 3
      }
    ]
  }
}
```

**Mixed Metadata Schema:**
```json
{
  "orders": {
    "rules": [
      { "field": "id", "type": "integer", "required": true },
      {
        "field": "customer_name",
        "type": "string",
        "max_length": 100,
        "required": true
      },
      {
        "field": "total_amount",
        "type": "float",
        "precision": 12,
        "scale": 2,
        "required": true
      },
      { "field": "order_date", "type": "datetime", "required": true },
      { "field": "notes", "type": "string", "max_length": 500 }
    ],
    "strict_mode": true
  }
}
```
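
**Comparison Sketch:**

The rules above carry expected metadata; validation then comes down to comparing each expected value with what the database reports. A minimal sketch of that comparison follows. `compare_metadata` is a hypothetical helper, and the sketch assumes an exact-match policy, which may differ from the rule ValidateLite actually applies.

```python
def compare_metadata(rule: dict, actual: dict) -> list:
    """Return mismatch descriptions for one column; an empty list means it passes."""
    failures = []
    field = rule["field"]
    for key in ("max_length", "precision", "scale"):
        expected = rule.get(key)
        if expected is None:
            # Metadata keys are optional; a rule without them is not checked.
            continue
        found = actual.get(key)
        if found is None:
            failures.append(
                f"{field}: {key} expected {expected}, but the database reports none"
            )
        elif found != expected:
            failures.append(f"{field}: {key} expected {expected}, found {found}")
    return failures


# Rule taken from the mixed schema above; the "actual" values are made up.
rule = {"field": "total_amount", "type": "float", "precision": 12, "scale": 2}
mismatches = compare_metadata(rule, {"precision": 10, "scale": 2})
```

Returning a list of messages instead of a boolean keeps every mismatch visible in one pass, which suits the detailed failure reporting described in the changelog.
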

**Backward Compatibility**: Existing schema files without metadata continue to work unchanged. Metadata validation is optional and can be added incrementally to enhance validation precision.
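
**Rule Parameter Check Sketch:**

Because the metadata fields are optional, a rule checker has to accept legacy rules untouched while still rejecting contradictory values. The sketch below shows one way to do that. The specific constraints (positive `max_length` and `precision`, `scale` between 0 and `precision`, and each field tied to its rule type) are assumptions chosen for illustration, and `validate_rule_parameters` is a hypothetical name.

```python
def _is_int(value) -> bool:
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(value, int) and not isinstance(value, bool)


def validate_rule_parameters(rule: dict) -> list:
    """Return problems found in a rule's metadata fields (empty list if none)."""
    errors = []
    field = rule.get("field", "<unknown>")
    rule_type = rule.get("type")
    max_length = rule.get("max_length")
    precision = rule.get("precision")
    scale = rule.get("scale")

    if max_length is not None:
        if rule_type != "string":
            errors.append(f"{field}: max_length only applies to type 'string'")
        if not _is_int(max_length) or max_length <= 0:
            errors.append(f"{field}: max_length must be a positive integer")

    if precision is not None:
        if rule_type != "float":
            errors.append(f"{field}: precision only applies to type 'float'")
        if not _is_int(precision) or precision <= 0:
            errors.append(f"{field}: precision must be a positive integer")

    if scale is not None:
        if rule_type != "float":
            errors.append(f"{field}: scale only applies to type 'float'")
        if not _is_int(scale) or scale < 0:
            errors.append(f"{field}: scale must be a non-negative integer")
        elif _is_int(precision) and scale > precision:
            errors.append(
                f"{field}: scale ({scale}) cannot exceed precision ({precision})"
            )

    return errors
```

A legacy rule such as `{ "field": "id", "type": "integer", "required": true }` has none of the three fields, so it passes this check unchanged.
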

**Command Options:**
```bash
# Basic validation
vlite schema --conn <connection> --rules <rules_file>

# JSON output for automation
vlite schema --conn <connection> --rules <rules_file> --output json

# Exit with error code on any failure
vlite schema --conn <connection> --rules <rules_file> --fail-on-error

# Verbose logging
vlite schema --conn <connection> --rules <rules_file> --verbose
```
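
**Scripted Invocation Sketch:**

For pipelines that call the CLI from Python, the options above can be assembled programmatically. This is a minimal sketch that uses only the flags listed in this section; `build_schema_command` and `run_schema_check` are hypothetical helpers, and the sketch assumes `vlite` is on the `PATH`. It does not assume any particular shape for the JSON output.

```python
import subprocess


def build_schema_command(
    conn: str,
    rules_file: str,
    json_output: bool = False,
    fail_on_error: bool = False,
    verbose: bool = False,
) -> list:
    """Build the argument list for `vlite schema` from the options shown above."""
    cmd = ["vlite", "schema", "--conn", conn, "--rules", rules_file]
    if json_output:
        cmd += ["--output", "json"]
    if fail_on_error:
        cmd.append("--fail-on-error")
    if verbose:
        cmd.append("--verbose")
    return cmd


def run_schema_check(conn: str, rules_file: str) -> subprocess.CompletedProcess:
    """Run the check and return the completed process (exit code, stdout, stderr)."""
    cmd = build_schema_command(
        conn, rules_file, json_output=True, fail_on_error=True
    )
    # An argument list (no shell) avoids quoting problems in connection strings.
    return subprocess.run(cmd, capture_output=True, text=True)
```

A caller would typically inspect `returncode` on the result and pass `stdout` on to whatever consumes the report.
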

---

## Quick Start: Ad-Hoc Checks with `check`
2 changes: 1 addition & 1 deletion cli/__init__.py
@@ -5,7 +5,7 @@
Provides a unified `vlite check` command for data quality checking.
"""

-__version__ = "0.4.2"
+__version__ = "0.4.3"

from .app import cli_app

2 changes: 1 addition & 1 deletion cli/app.py
@@ -68,7 +68,7 @@ def _setup_logging() -> None:


@click.group(name="vlite", invoke_without_command=True)
-@click.version_option(version="0.4.2", prog_name="vlite")
+@click.version_option(version="0.4.3", prog_name="vlite")
@click.pass_context
def cli_app(ctx: click.Context) -> None:
"""