This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
# Create virtual environment
python3.11 -m venv venv
source venv/bin/activate
# Install Requirements
pip install --upgrade pip setuptools wheel
pip install -e '.[dev]'
# Setup pre-commit hooks
pre-commit install

Alternatively, using uv (recommended):
# Pin Python version
uv python pin 3.11
uv pip install -e '.[dev]'

Tests that use PySpark (e.g., test_test_kafka.py, test_test_delta.py, test_test_dataframe.py, test_import_spark.py) require Java 21. Set JAVA_HOME to a Java 21 installation before running these tests.
# Using SDKMAN
source ~/.sdkman/bin/sdkman-init.sh
sdk use java 21-open
# Or set JAVA_HOME directly
export JAVA_HOME=$(/usr/libexec/java_home -v 21)

# Run all tests
pytest
# Run tests in parallel
pytest -n 8
# Run a specific test file
pytest tests/test_specific.py
# Run a specific test function
pytest tests/test_specific.py::test_function_name

# Check code with ruff
ruff check .
# Fix linting issues automatically
ruff check --fix .
# Format code
ruff format .
# Run all pre-commit hooks
pre-commit run --all-files

# Initialize a new data contract
datacontract init
# Lint a data contract file
datacontract lint datacontract.yaml
# Test a data contract against actual data
datacontract test datacontract.yaml
# Export a data contract to a different format
datacontract export --format html datacontract.yaml --output datacontract.html
# Import from a different format
datacontract import --format sql --source my-ddl.sql --dialect postgres --output datacontract.yaml
# Show a changelog between two data contracts
datacontract changelog datacontract-v1.yaml datacontract-v2.yaml

The Data Contract CLI is an open-source command-line tool for working with data contracts:

- CLI Interface (datacontract/cli.py): Entry point for the command-line interface using Typer.
- Data Contract Core (datacontract/data_contract.py): Central class for working with data contracts, handling operations like testing, validation, and export/import.
- Engines: Modules for connecting to different data stores and executing tests:
  - datacontract/engines/: Contains implementations for testing against various data sources
  - Supports multiple backend types: S3, BigQuery, Postgres, Snowflake, Kafka, etc.
- Export/Import: Modules for converting data contracts to/from different formats:
  - datacontract/export/: Converters for formats like Avro, SQL, dbt, HTML, etc.
  - datacontract/imports/: Importers from formats like SQL, Avro, JSON Schema, etc.
- Linting (datacontract/lint/): Tools for validating data contract files against schema and best practices.
- Changelog (datacontract/changelog/): Semantic comparison of ODCS data contracts.

The project uses factory patterns for extensibility:

- exporter_factory and importer_factory allow registering custom exporters/importers
- You can create custom exporters for new output formats or importers for new input formats
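
The registration idea behind these factories can be sketched in plain Python. This is a standalone illustration of the pattern, not the project's code: `Exporter`, `ExporterFactory`, `register`, `create`, `FieldListExporter`, and the "field-list" format are all hypothetical names, and a contract is modeled as a plain dict. Check the project's own exporter_factory and importer_factory for the real class names and method signatures before writing a custom exporter or importer.

```python
from abc import ABC, abstractmethod


class Exporter(ABC):
    """Hypothetical base class: converts a contract (a plain dict here) to text."""

    @abstractmethod
    def export(self, contract: dict) -> str: ...


class ExporterFactory:
    """Hypothetical registry that maps a format name to an exporter class."""

    def __init__(self) -> None:
        self._registry: dict[str, type[Exporter]] = {}

    def register(self, name: str, exporter_cls: type[Exporter]) -> None:
        # Later registrations under the same name replace earlier ones.
        self._registry[name] = exporter_cls

    def create(self, name: str) -> Exporter:
        if name not in self._registry:
            raise ValueError(f"Unknown export format: {name}")
        return self._registry[name]()


class FieldListExporter(Exporter):
    """Custom exporter for a made-up 'field-list' format: one field name per line."""

    def export(self, contract: dict) -> str:
        return "\n".join(sorted(contract.get("fields", [])))


# Stand-in for a module-level factory instance (assumed, not the project's object).
demo_factory = ExporterFactory()
demo_factory.register("field-list", FieldListExporter)
```

With this shape, supporting a new output format means adding one class and one registration call; the code that looks up an exporter by name does not change. An importer registry would mirror the same structure.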
- Tests are organized in the tests/ directory
- Many tests use fixtures in tests/fixtures/, which provide sample data contracts and test data
- Supports integration testing with various databases and data stores
- Tests describe expected behavior, not actual behavior. Write the test for what the code should do. If the test fails, fix the code under test, not the test (unless there is a justified reason for simplification).
- Python 3.10+ syntax and features
- Uses Pydantic for data validation and schema definition
- Type hints throughout the codebase
- Follows PEP 8 style guidelines with some adjustments (120 character line length)