Project: DataPact
Version: 2.0.0
Status: ✅ PRODUCTION RELEASED (DataPact2.0 tag)
Created: February 8, 2026 | Updated: February 13, 2026
📁 Total Files: 139
💻 Code Files: 21 (Python modules)
📚 Docs Files: 18 (markdown + guides)
⚙️ Config Files: 4 (pyproject.toml, setup.py, .gitignore, workflow YAML)
🧪 Test Files: 15 test modules + 70 fixtures
📊 Code Statistics:
Lines of Code: ~1100+
Type Hint Coverage: 100%
Docstring Coverage: Comprehensive
External API Calls: 0 (local only)
Test Cases: 118 (21 core + 17 versioning + 19 banking/finance + 2 concurrency + 2 profiling + 4 reporting + 2 policy packs + 42 exhaustive + 4 db source + 5 odcs)
Code Coverage: 66%+
🔧 Supported Python: 3.9, 3.10, 3.11, 3.12
📦 Dependencies: pandas, pyyaml, pyarrow, pact-python (+ optional psycopg2-binary, pymysql)
✅ CI/CD: GitHub Actions configured
✨ Features: Schema drift + Quality + SLA + Custom rules + Distribution validation + Chunked validation + Profiling + Rule Severity + Versioning with auto-migration + Report sinks + Policy packs + Database sources + Contract providers + Normalization scaffold + ODCS compatibility
| Audience | Start Here | Then Read |
|---|---|---|
| End Users | README.md | QUICKSTART.md |
| Developers | CONTRIBUTING.md | ARCHITECTURE.md |
| AI Agents | .github/copilot-instructions.md | FILE_REFERENCE.md |
| Project Managers | DELIVERY_SUMMARY.md | SETUP_SUMMARY.md |
Core Documentation (14 files):
├─ README.md User guide & feature overview
├─ FEATURES.md Functional feature list with examples
├─ QUICKSTART.md Setup & quick start (5 min)
├─ CONTRIBUTING.md Developer guide
├─ docs/ARCHITECTURE.md Design decisions & data flow
├─ docs/VERSIONING.md Version history & migration guide
├─ FILE_REFERENCE.md File-by-file responsibilities
├─ PROJECT_STRUCTURE.md Visual directory tree
├─ SETUP_SUMMARY.md What was created
├─ COMPLETION_CHECKLIST.md Feature & QA checklist
├─ DELIVERY_SUMMARY.md Project overview & summary
├─ INDEX.md Navigation guide (this file)
├─ SEQUENCE_DIAGRAM_GUIDE.md Sequence diagram guide
└─ VERSIONING_IMPLEMENTATION.md Versioning implementation notes
AI & Advanced Documentation (2 files):
├─ .github/copilot-instructions.md AI coding guide (5.6 KB)
└─ docs/AI_INSTRUCTIONS_GUIDE.md Template for AI instructions
Configuration (4 files):
├─ pyproject.toml Python packaging
├─ setup.py Setuptools compatibility
├─ .gitignore Git exclusions
└─ .github/workflows/tests.yml GitHub Actions CI/CD
┌─────────────────────────────────────────────────────────┐
│ User / CI Pipeline │
│ (CLI: datapact validate) │
└────────────────────┬────────────────────────────────────┘
│
┌────────────┴────────────┐
↓ ↓
Contract YAML Data File / DB
│ │
(validation) (CSV/Parquet/JSON/DB)
│ │
└────────────┬────────────┘
↓
┌────────────────────────────┐
│ Contract Parser (YAML → │
│ Python Models) │
└────────────┬───────────────┘
↓
┌────────────────────────────┐
│ Data Source Loader │
│ (DataFrame + schema) │
└────────────┬───────────────┘
↓
┌────────────────────────────────────────┐
│ VALIDATION PIPELINE │
├────────────────────────────────────────┤
│ 1. Schema Validator │
│ (columns, types, required) │
│ → Blocking if critical issues │
├────────────────────────────────────────┤
│ 2. Quality Validator │
│ (nulls, unique, ranges, regex, enum)│
│ → Non-blocking │
├────────────────────────────────────────┤
│ 3. SLA Validator │
│ (min/max rows, freshness rules) │
│ → Non-blocking │
├────────────────────────────────────────┤
│ 4. Custom Rule Validator │
│ (plugin-defined rules) │
│ → Non-blocking │
├────────────────────────────────────────┤
│ 5. Distribution Validator │
│ (mean, std, drift detection) │
│ → Warnings only │
└────────────┬───────────────────────────┘
↓
┌────────────────────────────┐
│ Error Aggregation & Report │
│ Generation │
└────────────┬───────────────┘
↓
┌────────────────────────────────┐
│ Output: JSON + Console + Sinks │
│ Exit Code: 0 or 1              │
└────────────────────────────────┘
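The ordering rule in the pipeline above can be sketched in plain Python. This is a minimal illustration, not DataPact's implementation: `Issue`, `validate_schema`, `run_pipeline` and the `extra_columns` drift setting are hypothetical names, and rows are plain dicts instead of a pandas DataFrame so the sketch runs without dependencies. It shows the behavior the diagram describes: schema errors block the later stages, while the remaining validators only add findings.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Row = Dict[str, object]  # hypothetical stand-in for one DataFrame row


@dataclass
class Issue:
    code: str      # e.g. "SCHEMA", "QUALITY", "SLA"
    field: str
    message: str
    severity: str  # "ERROR" or "WARN"


def validate_schema(rows: List[Row], required: List[str], known: List[str],
                    extra_columns: str = "warn") -> List[Issue]:
    """Missing required columns are errors; unknown columns follow a drift setting."""
    present = set(rows[0]) if rows else set()
    issues = [Issue("SCHEMA", col, "missing required column", "ERROR")
              for col in required if col not in present]
    for col in sorted(present - set(known)):
        severity = "ERROR" if extra_columns == "error" else "WARN"
        issues.append(Issue("SCHEMA", col, "column not in contract", severity))
    return issues


Stage = Callable[[List[Row]], List[Issue]]


def run_pipeline(rows: List[Row], required: List[str], known: List[str],
                 stages: List[Stage]) -> List[Issue]:
    """Schema first; any schema ERROR stops the run before the other stages."""
    issues = validate_schema(rows, required, known)
    if any(i.severity == "ERROR" for i in issues):
        return issues  # later checks would run against the wrong column set
    for stage in stages:  # e.g. quality, SLA, custom rules, distribution
        issues.extend(stage(rows))
    return issues
```

Passing the non-schema validators as plain callables also shows where plugin-defined custom rules could slot in without changing the orchestration code.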
sequenceDiagram
autonumber
actor User as User/CLI
participant CLI as CLI Interface
participant Provider as Contract Provider
participant Parser as Contract Parser
participant Loader as Data Loader
participant Normalizer as Normalizer
participant Schema as Schema Validator
participant Quality as Quality Validator
participant Distribution as Distribution Validator
participant Reporter as Report Generator
participant Output as JSON/Console/Sinks
User->>+CLI: datapact validate --contract contract.yaml --data data.csv (or --db-*)
CLI->>+Provider: Resolve format and load contract
Provider->>Parser: Parse contract YAML
Parser->>Parser: Apply policy packs
Provider-->>-CLI: Contract object
CLI->>+Loader: Load data (file or DB)
Loader-->>-CLI: DataFrame
CLI->>+Normalizer: Normalize dataframe (noop by default)
Normalizer-->>-CLI: DataFrame
rect rgb(200, 220, 255)
Note over Schema,Distribution: VALIDATION PIPELINE
CLI->>+Schema: Validate schema
Schema-->>-CLI: Errors/OK
CLI->>+Quality: Validate quality rules
Quality-->>-CLI: Errors & warnings
CLI->>+Distribution: Check distributions
Distribution-->>-CLI: Warnings only
end
CLI->>+Reporter: Aggregate results
Reporter-->>-CLI: ValidationReport
CLI->>+Output: Generate output
Output->>Output: Save JSON report
Output->>Output: Send to report sinks
Output->>Output: Print summary
Output-->>-CLI: Done
CLI->>User: Exit 0 or 1
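The "Apply policy packs" step can be pictured as merging a named bundle of default rules into the contract's fields before validation. A minimal sketch, assuming a pack maps field names to rule dicts and that rules written explicitly in the contract take precedence over pack defaults; `POLICY_PACKS`, the `pii_basic` pack and `apply_policy_packs` are hypothetical and are not taken from `policies.py`.

```python
from copy import deepcopy
from typing import Dict, List

Rules = Dict[str, object]

# Hypothetical registry: pack name -> {field name -> default rules}
POLICY_PACKS: Dict[str, Dict[str, Rules]] = {
    "pii_basic": {
        "email": {"not_null": True, "regex": r"[^@\s]+@[^@\s]+\.[^@\s]+"},
        "customer_id": {"not_null": True, "unique": True},
    },
}


def apply_policy_packs(fields: Dict[str, Rules], packs: List[str]) -> Dict[str, Rules]:
    """Return a copy of `fields` with pack defaults filled in; contract rules win."""
    merged = deepcopy(fields)  # never mutate the parsed contract in place
    for name in packs:
        if name not in POLICY_PACKS:
            raise KeyError(f"unknown policy pack: {name}")
        for field, defaults in POLICY_PACKS[name].items():
            if field in merged:  # only touch fields the contract declares
                merged[field] = {**defaults, **merged[field]}
    return merged
```

Whether a pack should also add fields the contract does not declare is a design choice; this sketch deliberately leaves undeclared fields alone.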
src/datapact/
├── __init__.py Package entry point
├── contracts.py Contract parsing & models
│ ├─ Contract
│ ├─ Field
│ ├─ FieldRule
│ ├─ DistributionRule
│ ├─ Dataset
│ └─ FlattenConfig
├── providers/ Contract providers (format dispatch)
│ ├─ base.py Abstract provider interface
│ ├─ datapact_provider.py YAML contract loader
│ ├─ odcs_provider.py ODCS v3.1.0 schema mapper
│ └─ pact_provider.py Pact API contract type inference
├── policies.py Policy pack registry
├── datasource.py Data loading & inference
│ └─ DataSource
├── normalization/ Normalization scaffold
│ ├─ config.py
│ └─ normalizer.py
├── cli.py CLI interface
│ ├─ main()
│ ├─ validate_command()
│ ├─ init_command()
│ └─ profile_command()
├── profiling.py Contract profiling helpers
├── reporting.py Report generation
│ ├─ ErrorRecord
│ └─ ValidationReport
└── validators/ Validation pipeline
    ├── schema_validator.py
    ├── quality_validator.py
    ├── sla_validator.py
    ├── custom_rule_validator.py
    └── distribution_validator.py
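The `providers/` package dispatches on contract format. A minimal sketch of that pattern, assuming an abstract base class with a single `load` method and a registry keyed by a format string; `ContractProvider`, `register`, `PROVIDERS`, `resolve_provider`, the provider class names and the format keys are all hypothetical, not the actual interface in `base.py`.

```python
from abc import ABC, abstractmethod
from typing import Any, Callable, Dict, Type


class ContractProvider(ABC):
    """Turns raw contract text of one format into a common dict shape."""

    @abstractmethod
    def load(self, text: str) -> Dict[str, Any]:
        ...


PROVIDERS: Dict[str, Type[ContractProvider]] = {}


def register(fmt: str) -> Callable[[Type[ContractProvider]], Type[ContractProvider]]:
    """Class decorator that records a provider class under a format key."""
    def wrap(cls: Type[ContractProvider]) -> Type[ContractProvider]:
        PROVIDERS[fmt] = cls
        return cls
    return wrap


@register("datapact")
class YamlContractProvider(ContractProvider):
    def load(self, text: str) -> Dict[str, Any]:
        # A real provider would parse YAML here (e.g. with yaml.safe_load).
        return {"format": "datapact", "raw": text}


@register("odcs")
class OdcsContractProvider(ContractProvider):
    def load(self, text: str) -> Dict[str, Any]:
        # A real provider would map ODCS schema objects onto DataPact fields.
        return {"format": "odcs", "raw": text}


def resolve_provider(fmt: str) -> ContractProvider:
    """Instantiate the provider registered for `fmt`."""
    try:
        return PROVIDERS[fmt]()
    except KeyError:
        raise ValueError(f"unsupported contract format: {fmt!r}") from None
```

A Pact provider would register the same way; the CLI only needs the format string to pick the loader.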
| Feature | Status | File |
|---|---|---|
| YAML Contract Parsing | ✅ | contracts.py |
| CSV Loading | ✅ | datasource.py |
| Parquet Loading | ✅ | datasource.py |
| JSON Lines Loading | ✅ | datasource.py |
| Schema Validation | ✅ | validators/schema_validator.py |
| Schema Drift Policy | ✅ | contracts.py |
| Quality Rules | ✅ | validators/quality_validator.py |
| SLA Checks | ✅ | validators/sla_validator.py |
| Distribution Monitoring | ✅ | validators/distribution_validator.py |
| Chunked Validation | ✅ | datasource.py |
| Profiling | ✅ | profiling.py |
| Rule Severity | ✅ | contracts.py |
| Policy Packs | ✅ | policies.py |
| Custom Rule Plugins | ✅ | validators/custom_rule_validator.py |
| Contract Providers | ✅ | providers/ |
| Pact API Contracts | ✅ | providers/pact_provider.py |
| ODCS Compatibility | ✅ | providers/odcs_provider.py |
| Normalization Scaffold | ✅ | normalization/ |
| JSON Report Output | ✅ | reporting.py |
| Console Output | ✅ | reporting.py |
| CLI: validate | ✅ | cli.py |
| CLI: init | ✅ | cli.py |
| Type Hints | ✅ | All files |
| Pytest Tests | ✅ | tests/ (15 test modules, incl. test_validator.py) |
| Test Fixtures | ✅ | tests/fixtures/ |
| GitHub Actions CI/CD | ✅ | .github/workflows/tests.yml |
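Chunked validation means some rules must carry state across chunks: a column can be unique within every chunk and still contain duplicates overall, and a null ratio has to be computed over the total row count. A minimal standard-library sketch, assuming CSV input where an empty cell counts as null; `iter_chunks`, `validate_in_chunks` and their parameters are hypothetical. DataPact itself uses pandas, whose `read_csv(..., chunksize=n)` yields DataFrame chunks in a similar way.

```python
import csv
import io
from itertools import islice
from typing import Dict, Iterator, List, Optional, Set, TextIO


def iter_chunks(stream: TextIO, chunk_size: int) -> Iterator[List[Dict[str, Optional[str]]]]:
    """Yield lists of at most `chunk_size` CSV rows."""
    reader = csv.DictReader(stream)
    while True:
        chunk = list(islice(reader, chunk_size))
        if not chunk:
            return
        yield chunk


def validate_in_chunks(stream: TextIO, column: str, chunk_size: int,
                       max_null_ratio: float) -> List[str]:
    """Check uniqueness and null ratio for one column without loading the whole file."""
    seen: Set[str] = set()  # grows with distinct values, not with total rows
    duplicates = nulls = total = 0
    for chunk in iter_chunks(stream, chunk_size):
        for row in chunk:
            value = row.get(column)
            total += 1
            if not value:
                nulls += 1
            elif value in seen:
                duplicates += 1  # caught even if the first copy was in an earlier chunk
            else:
                seen.add(value)
    errors: List[str] = []
    if duplicates:
        errors.append(f"{column}: {duplicates} duplicate values")
    if total and nulls / total > max_null_ratio:
        errors.append(f"{column}: null ratio {nulls / total:.2f} exceeds {max_null_ratio}")
    return errors


if __name__ == "__main__":
    sample = io.StringIO("id,name\n1,a\n2,b\n,c\n1,d\n")
    print(validate_in_chunks(sample, "id", chunk_size=2, max_null_ratio=0.1))
    # ['id: 1 duplicate values', 'id: null ratio 0.25 exceeds 0.1']
```

The duplicate of `1` sits in the second chunk while its first copy is in the first chunk, which a per-chunk uniqueness check would miss.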
1. SETUP
├─ pip install -e ".[dev]"
└─ export PYTHONPATH=./src
2. DEVELOPMENT
├─ Edit code in src/
├─ Run tests: pytest tests/
├─ Format: black src/ tests/
├─ Lint: ruff check src/ tests/
└─ Type check: mypy src/
3. VALIDATION
├─ python3 src/datapact/cli.py validate \
│ --contract tests/fixtures/customer_contract.yaml \
│ --data tests/fixtures/valid_customers.csv
└─ Check reports/
4. COMMIT
├─ git add .
├─ git commit -m "Feature: ..."
└─ GitHub Actions runs tests
Unit Tests:
├─ TestSchemaValidator (schema checks)
├─ TestQualityValidator (quality rules)
├─ TestSLAValidator (row count checks)
├─ TestChunkedValidation (chunked validation)
├─ TestCustomRules (plugin rules)
├─ TestDataSource (loading & inference)
└─ TestDistributionValidator (distribution checks)
Test Data:
├─ customer_contract.yaml (comprehensive contract)
├─ customer_contract_v1.yaml (legacy contract)
├─ customer_contract_v2.yaml (current contract)
├─ valid_customers.csv (passes all checks)
├─ invalid_customers.csv (intentional violations)
├─ deposits_contract.yaml (deposits contract)
├─ lending_contract.yaml (lending contract)
├─ deposits_data.csv (deposits accounts)
├─ lending_data.csv (lending loans)
├─ deposits_accounts_agg_contract.yaml (aggregate deposits contract)
├─ lending_loans_agg_contract.yaml (aggregate lending contract)
├─ deposits_transactions_contract.yaml (deposits transactions contract)
├─ lending_payments_contract.yaml (lending payments contract)
├─ deposits_transactions.csv (deposits transactions)
├─ lending_payments.csv (lending payments)
├─ deposits_accounts_agg.csv (aggregate deposits)
└─ lending_loans_agg.csv (aggregate lending)
Test Coverage Areas:
├─ All validators ✅
├─ All data formats ✅
├─ All rule types ✅
├─ Error conditions ✅
└─ Report generation ✅
Quality rules:
- not_null - Require no null values
- unique - All values must be distinct
- min/max - Numeric range constraints
- regex - Pattern matching
- enum - Allowed-value list
- max_null_ratio - Maximum tolerated share of nulls
SLA rules:
- freshness_max_age_hours - Maximum allowed timestamp age, in hours
Distribution rules (warnings only):
- mean - Expected average
- std - Expected standard deviation
- max_drift_pct - Alert threshold for drift, in percent
- max_z_score - Outlier detection threshold
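The rule semantics listed above can be made concrete with a small checker over one column's values. A minimal sketch, assuming values arrive as a Python list with `None` for nulls and rules arrive as a dict keyed by the rule names above; `check_quality` and `check_drift` are hypothetical helpers. Two readings here are assumptions rather than confirmed DataPact behavior: `regex` must match the whole value, and `max_drift_pct` is the percentage change of the observed mean from the expected `mean`, with z-scores computed against the contract's `mean` and `std`.

```python
import re
import statistics
from typing import Any, Dict, List, Optional


def check_quality(values: List[Optional[Any]], rules: Dict[str, Any]) -> List[str]:
    """Return one error message per violated quality rule."""
    present = [v for v in values if v is not None]
    nulls = len(values) - len(present)
    errors: List[str] = []
    if rules.get("not_null") and nulls:
        errors.append(f"{nulls} null values")
    if "max_null_ratio" in rules and values and nulls / len(values) > rules["max_null_ratio"]:
        errors.append(f"null ratio {nulls / len(values):.2f} above {rules['max_null_ratio']}")
    if rules.get("unique") and len(set(present)) != len(present):
        errors.append("duplicate values")
    if "min" in rules and any(v < rules["min"] for v in present):  # assumes numeric values
        errors.append(f"values below {rules['min']}")
    if "max" in rules and any(v > rules["max"] for v in present):
        errors.append(f"values above {rules['max']}")
    if "regex" in rules:
        pattern = re.compile(rules["regex"])
        misses = sum(1 for v in present if not pattern.fullmatch(str(v)))  # whole-value match
        if misses:
            errors.append(f"{misses} values not matching regex")
    if "enum" in rules:
        allowed = set(rules["enum"])
        if any(v not in allowed for v in present):
            errors.append("values outside enum")
    return errors


def check_drift(values: List[float], rules: Dict[str, float]) -> List[str]:
    """Distribution checks return warnings only; assumes a non-empty list and a 'mean' rule."""
    warnings: List[str] = []
    expected = rules["mean"]
    observed = statistics.fmean(values)
    if "max_drift_pct" in rules and expected != 0:
        drift = abs(observed - expected) / abs(expected) * 100
        if drift > rules["max_drift_pct"]:
            warnings.append(f"mean drifted {drift:.1f}%")
    if "max_z_score" in rules and rules.get("std"):
        outliers = sum(1 for v in values
                       if abs(v - expected) / rules["std"] > rules["max_z_score"])
        if outliers:
            warnings.append(f"{outliers} values beyond z={rules['max_z_score']}")
    return warnings
```

Keeping quality checks and distribution checks in separate functions mirrors the split between blocking errors and warning-only drift signals.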
ERROR (validation fails; exit code 1)
├─ Missing required columns
├─ Type mismatches
├─ Constraint violations
└─ Validation failures
WARN (informational)
├─ Extra columns not in contract (configurable)
├─ Distribution drift detected
├─ Statistical anomalies
└─ Soft constraint violations
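The severity split decides the overall result: a run fails when at least one ERROR is recorded, while WARN entries are counted but do not change the outcome. A minimal sketch of that rule, building a dict shaped like the JSON report example in this guide and deriving a CI exit code from it; `Finding` and `build_report` are hypothetical stand-ins for `ErrorRecord` and `ValidationReport` in `reporting.py`, whose real attributes may differ.

```python
from dataclasses import asdict, dataclass
from datetime import datetime
from typing import Any, Dict, List, Tuple


@dataclass
class Finding:
    code: str
    field: str
    message: str
    severity: str  # "ERROR" or "WARN"


def build_report(contract: Dict[str, str], dataset: str,
                 findings: List[Finding]) -> Tuple[Dict[str, Any], int]:
    """Summarize findings; only ERROR entries affect `passed` and the exit code."""
    errors = [f for f in findings if f.severity == "ERROR"]
    warning_count = sum(1 for f in findings if f.severity == "WARN")
    report: Dict[str, Any] = {
        "passed": not errors,
        "contract": contract,
        "dataset": {"name": dataset},
        "metadata": {"timestamp": datetime.now().isoformat(timespec="seconds")},
        "summary": {"error_count": len(errors), "warning_count": warning_count},
        "errors": [asdict(f) for f in errors],
    }
    exit_code = 0 if report["passed"] else 1  # what a CI job would check
    return report, exit_code
```

Passing `report` to `json.dumps(report, indent=2)` produces output in the same general shape as the example report.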
{
"passed": false,
"contract": {
"name": "customer_data",
"version": "2.0.0"
},
"dataset": {"name": "customers"},
"metadata": {
"timestamp": "2026-02-08T10:30:45",
"tool_version": "0.2.0"
},
"summary": {
"error_count": 2,
"warning_count": 1
},
"errors": [
{
"code": "QUALITY",
"field": "email",
"message": "has 1 values not matching regex",
"severity": "ERROR"
}
]
}
# Validate data
export PYTHONPATH=./src
python3 src/datapact/cli.py validate \
--contract contract.yaml \
--data data.csv
# Infer contract from data
python3 src/datapact/cli.py init \
--contract new_contract.yaml \
--data data.csv
# Run tests
pytest tests/test_validator.py -v
# Code quality
black src/ tests/
ruff check src/ tests/
mypy src/
Automated performance and non-functional requirement (NFR) tests check that DataPact stays fast and memory-efficient on large inputs. These tests cover:
- Large CSV validation time
- Contract parsing speed
- CLI startup time
- Memory usage for large files
- Batch validation throughput
- Concurrent validation throughput
- Performance degradation (scaling)
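A performance test of the kind listed above usually pairs a wall-clock budget with a peak-memory budget. A minimal pytest-style sketch using only the standard library (`time.perf_counter` and `tracemalloc`, which reports peak Python-heap allocations); the workload `validate_rows`, the row count and both budgets are hypothetical placeholders, not DataPact's actual thresholds.

```python
import time
import tracemalloc
from typing import Dict, List


def validate_rows(rows: List[Dict[str, int]]) -> int:
    """Hypothetical stand-in workload: count rows whose amount is negative."""
    return sum(1 for row in rows if row["amount"] < 0)


def measure(rows: List[Dict[str, int]]) -> Dict[str, float]:
    """Time one validation pass and record its peak traced allocation."""
    tracemalloc.start()
    start = time.perf_counter()
    validate_rows(rows)
    elapsed = time.perf_counter() - start
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return {"seconds": elapsed, "peak_mb": peak / 1_000_000,
            "rows_per_s": len(rows) / elapsed}


def test_validation_budget() -> None:
    # Input is built before tracing starts, so only the workload is measured.
    rows = [{"id": i, "amount": i % 7 - 3} for i in range(100_000)]
    stats = measure(rows)
    assert stats["seconds"] < 2.0  # hypothetical time budget
    assert stats["peak_mb"] < 50   # hypothetical memory budget
```

A scaling test could call `measure` at two input sizes and assert that the time ratio stays roughly proportional to the size ratio.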
See PERFORMANCE_NFR_SUMMARY.md for the latest results, coverage, and CI integration instructions.
Performance/NFR tests are run automatically in CI (see .github/workflows/ci.yml). Reports are uploaded as artifacts for every push and pull request.
To run locally:
PYTHONPATH=src python3 -m pytest tests/test_performance.py tests/test_performance_extra.py --durations=10 --tb=short --junitxml=performance_report.xml
- ✅ Setup - Follow QUICKSTART.md
- 📖 Learn - Read .github/copilot-instructions.md
- 🧪 Test - Run pytest tests/test_validator.py
- 🚀 Build - Create your own contracts
- 📊 Deploy - Integrate into CI/CD pipelines
Repository: GitHub - meetnishant/DataPact
Status: ✅ Production Ready (v2.0.0)
Release Tag: DataPact2.0
Last Updated: February 13, 2026