Releases: getyourguide/dataframe-expectations

v0.6.0

18 Mar 14:19
0fb3d21

0.6.0 (2026-03-18)

Features

  • PySpark is now an optional dependency (27e864b)

    Users running PySpark in managed environments (Databricks, EMR, etc.) typically have PySpark
    pre-installed and cannot or do not want the library to reinstall it. PySpark is now optional
    and must be explicitly requested:

    pip install dataframe-expectations           # pandas only
    pip install dataframe-expectations[pyspark]  # includes pyspark
    

    pandas, pydantic, and tabulate remain hard dependencies. Importing dataframe_expectations
    no longer touches PySpark at all when it isn't installed — all PySpark imports are deferred
    behind @lru_cache helpers that return a proxy raising a clear ImportError only when a
    PySpark code path is actually executed.

  • PySpark tests isolated by marker (ef847ed)

    All PySpark test cases are decorated with @pytest.mark.pyspark and separated into their own
    parametrize blocks. --strict-markers is now enforced so unregistered markers cause an
    immediate failure. Tests can be run without PySpark present:

    pytest -m "not pyspark"   # no PySpark required
    pytest -m pyspark          # requires PySpark
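
    Because --strict-markers rejects unregistered markers, the pyspark marker must be declared in the pytest configuration. A typical registration (an assumed snippet, not necessarily this repository's exact config) looks like:

```toml
[tool.pytest.ini_options]
addopts = "--strict-markers"
markers = [
    "pyspark: tests that require a PySpark installation",
]
```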
    
  • CI updated to cover three install scenarios

    Job                           How PySpark is present     Tests run
    tests-without-pyspark         Not installed              -m "not pyspark"
    tests-with-pyspark-extra      pip install .[pyspark]     All
    tests-with-external-pyspark   Pre-installed externally   All

    The tests-with-external-pyspark job specifically validates that the library works correctly
    when PySpark is already present in the environment and was not installed by this package.

v0.5.2

16 Mar 15:05
33a969e

What's Changed

Improvements & Fixes

  • PySpark is now treated as an optional dependency at runtime. Users who only need pandas DataFrame validation will no longer see import errors if PySpark is not installed. (15d38ca)

Other Changes

  • Added workflow_dispatch trigger to the release-please workflow for manual triggering via GitHub Actions UI.
  • Dependency updates: ruff bumped to 0.15.6, tabulate bumped to 0.10.0.

Full Changelog: v0.5.1...v0.5.2

v0.5.1

28 Jan 11:38
639061b

0.5.1 (2026-01-28)

Features

  • adding new numeric expectations (2b07c7a)
  • adding new numeric expectations (90f8cb7)

Documentation

  • improved the API docs website (df9f7b1)
  • improved the API docs website (966ea5a)
  • minor corrections to readme (15fa72d)
  • minor corrections to readme (3358d23)
  • partitioned readme (8cf59b1)
  • partitioned readme (30500e7)

v0.5.0

22 Nov 14:11
d62ce0b

Features

Tag-based Filtering

Add support for selective expectation execution using custom tags.

Key Features:

  • New TagMatchMode enum with ANY (OR logic) and ALL (AND logic) options
  • Tag expectations with "key:value" format (e.g., "priority:high", "env:prod")
  • Filter expectations at build time

Example:

# Tag expectations
suite = (
    DataFrameExpectationsSuite()
    .expect_value_greater_than(column_name="age", value=18, tags=["priority:high", "env:prod"])
    .expect_value_not_null(column_name="name", tags=["priority:high"])
    .expect_min_rows(min_rows=1, tags=["priority:low", "env:test"])
)

# Run only high-priority checks (OR logic)
runner = suite.build(tags=["priority:high"], tag_match_mode=TagMatchMode.ANY)

# Run production-critical checks (AND logic)
runner = suite.build(tags=["priority:high", "env:prod"], tag_match_mode=TagMatchMode.ALL)
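
The ANY/ALL semantics can be illustrated with a small standalone sketch (assumed behavior based on the notes above, not the library's internals):

```python
from enum import Enum

class TagMatchMode(Enum):
    ANY = "any"   # keep an expectation if it carries at least one requested tag
    ALL = "all"   # keep an expectation only if it carries every requested tag

def matches(expectation_tags, requested, mode):
    """Decide whether an expectation survives the tag filter."""
    tags = set(expectation_tags)
    requested = set(requested)
    if mode is TagMatchMode.ANY:
        return bool(tags & requested)
    return requested <= tags

print(matches(["priority:high", "env:prod"], ["priority:high"], TagMatchMode.ANY))   # True
print(matches(["priority:high"], ["priority:high", "env:prod"], TagMatchMode.ALL))   # False
```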

Programmatic Result Inspection

Enhanced SuiteExecutionResult for detailed validation analysis.

Key Features:

  • Use raise_on_failure=False to inspect results without raising exceptions
  • Access comprehensive metrics: total_expectations, total_passed, total_failed, pass_rate, total_duration_seconds
  • Inspect individual expectation results with status, violation counts, descriptions, and timing
  • View applied tag filters in execution results

Example:

# Get results without raising exceptions
result = runner.run(df, raise_on_failure=False)

# Inspect the results programmatically
print(f"Total expectations: {result.total_expectations}")
print(f"Passed: {result.total_passed}, Failed: {result.total_failed}")
print(f"Pass rate: {result.pass_rate:.2%}")
print(f"Applied filters: {result.applied_filters}")
print(f"Tag match mode: {result.tag_match_mode}")

# Access individual expectation results
for exp_result in result.results:
    if exp_result.status == "failed":
        print(f"Failed: {exp_result.description}")
        print(f"Violation count: {exp_result.violation_count}")

Documentation

  • Added tag-based filtering examples to README.md and getting_started.rst
  • Updated adding_expectations.rst with proper tag handling patterns for custom expectations
  • Documented programmatic result inspection with comprehensive examples
  • Reorganized documentation structure: user guide in getting_started.rst, developer notes in adding_expectations.rst

Full Changelog: v0.4.0...v0.5.0

v0.4.0

10 Nov 16:19
a110faf

0.4.0 (2025-11-10)

⚠ BREAKING CHANGES

  • ‼️ BREAKING CHANGE: Major codebase restructuring with new module organization. Most changes are confined to internal modules; the main user-facing change is the import path (see the migration guide below).

What changed:

  • All internal modules have been reorganized into a core/ package
  • Expectation registry simplified from three-dictionary to two-dictionary structure with O(1) lookups
  • Main imports updated from expectations_suite to suite

Migration guide:
Update your imports to use the new module structure:

# Before
from dataframe_expectations.expectations_suite import DataFrameExpectationsSuite

# After
from dataframe_expectations.suite import DataFrameExpectationsSuite

Features

  • restructure codebase with core/ module and explicit imports (42a233a)
  • restructure codebase, and registry refactoring (111bca1)
  • simplified registry (c182858)

Bug Fixes

  • consolidate imports (9a76467)
  • deleted duplicate dataclass and enums from registry (82bec0c)
  • deleted duplicate DataFrameExpectation code from expectations package (d47eb8b)
  • import enums from types (fa84764)
  • manually trigger CI for release-please PRs (49419e6)
  • manually trigger CI for release-please PRs (9585cf5)
  • return correct version when package is built (82ff343)

Full Changelog: v0.3.0...v0.4.0

v0.3.0

09 Nov 12:43
5567760

🎯 DataFrame Expectations v0.3.0

⚠️ Breaking Changes

This release introduces a builder pattern for the DataFrameExpectationsSuite that changes how you create and run expectation suites.

Migration Guide:

# Before (v0.2.0)
suite = DataFrameExpectationsSuite()
suite.expect_min_rows(min_rows=3)
suite.run(df)

# After (v0.3.0)
suite = DataFrameExpectationsSuite()
suite.expect_min_rows(min_rows=3)
runner = suite.build()  # New: Build a runner
runner.run(df)          # Run on the runner

✨ New Features

🏗️ Builder Pattern & Immutable Runners

  • Introduces DataFrameExpectationsSuiteRunner - an immutable runner created via .build()
  • Allows reusing the same validation logic across multiple DataFrames
  • Enables building multiple independent runners from the same suite at different stages
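
The pattern the bullets describe can be sketched in isolation (illustrative only; class and method names here are placeholders, not the library's API):

```python
from dataclasses import dataclass

class Suite:
    """Mutable builder: collects checks, then snapshots them into a runner."""
    def __init__(self):
        self._checks = []

    def expect(self, check):
        self._checks.append(check)
        return self

    def build(self):
        # Snapshot into an immutable runner; later suite edits don't affect it.
        return Runner(tuple(self._checks))

@dataclass(frozen=True)
class Runner:
    checks: tuple

    def run(self, data):
        return all(check(data) for check in self.checks)

suite = Suite().expect(lambda rows: len(rows) >= 1)
runner_a = suite.build()          # first stage: one check
suite.expect(lambda rows: all(r > 0 for r in rows))
runner_b = suite.build()          # second stage: two checks

print(runner_a.run([-1]))  # True: only the min-rows check applies
print(runner_b.run([-1]))  # False: the positivity check fails
```

Because `build()` copies the checks, each runner is an independent, reusable snapshot of the suite at the moment it was built.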

🎨 Decorator Pattern for Automatic Validation

Validate DataFrames returned by functions automatically using the @runner.validate decorator:

@runner.validate
def load_data():
    return pd.DataFrame({"col": [1, 2, 3]})

# Supports optional DataFrame returns
@runner.validate(allow_none=True)
def maybe_load_data():
    if condition:
        return pd.DataFrame(...)
    return None
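
For intuition, a decorator supporting this dual calling convention (bare and with keyword arguments) can be sketched generically; this is an illustration of the mechanism, not the library's implementation:

```python
import functools

def make_validator(run_checks):
    """Build a decorator that validates a function's return value."""
    def validate(func=None, *, allow_none=False):
        def wrap(f):
            @functools.wraps(f)
            def wrapper(*args, **kwargs):
                result = f(*args, **kwargs)
                if result is None and allow_none:
                    return None
                run_checks(result)  # raises if any expectation fails
                return result
            return wrapper
        # Support both @validate and @validate(allow_none=True)
        return wrap(func) if func is not None else wrap
    return validate

def run_checks(data):
    if not data:
        raise ValueError("validation failed: empty result")

validate = make_validator(run_checks)

@validate
def load_data():
    return [1, 2, 3]

@validate(allow_none=True)
def maybe_load_data():
    return None

print(load_data())        # [1, 2, 3]
print(maybe_load_data())  # None
```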

🔍 Expectation Inspection

  • Added expectation_count property to check the number of expectations
  • Added list_expectations() method to view all expectations in a runner

📚 Documentation

  • Added Spark session initialization to PySpark examples in README and documentation
  • Improved example code to be immediately runnable

🔧 Maintenance

  • Updated release configuration for simpler tag generation
  • Dependency updates: pytest 9.0.0, ruff 0.14.4, pre-commit 4.4.0

📦 What's Changed

  • fix: update release please config to generate simple tags by @ryanseq-gyg in #13
  • feat!: implement builder pattern for expectation suite runner by @ryanseq-gyg in #18
  • build(deps): bump pre-commit from 4.3.0 to 4.4.0 by @dependabot in #17
  • build(deps): bump ruff from 0.14.3 to 0.14.4 by @dependabot in #16
  • build(deps): bump pytest from 8.4.2 to 9.0.0 in the 01_major-updates group by @dependabot in #15

Full Changelog: v0.2.0...v0.3.0

v0.2.0

08 Nov 20:16
0170ac5

This release introduces a major refactoring of the expectation registration system, replacing 800+ lines of boilerplate with dynamic method generation from a central registry. The refactoring maintains full IDE type-ahead support through auto-generated stub files while significantly improving maintainability.

Features

  • Dynamic Expectation Registration: Implement dynamic method generation with centralized registry system
    • Replaces manual method definitions in DataFrameExpectationsSuite
    • Maintains IDE type hints through auto-generated .pyi stub files
    • Reduces boilerplate and improves maintainability
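
As a rough illustration of registry-driven method generation (the registry contents and helper names here are invented for the sketch; the real registry differs):

```python
# Each registry entry becomes a suite method at import time,
# replacing hand-written boilerplate definitions.
REGISTRY = {
    "expect_min_rows": lambda df, min_rows: len(df) >= min_rows,
    "expect_not_empty": lambda df: len(df) > 0,
}

class Suite:
    def __init__(self):
        self.expectations = []

def _make_method(name, check):
    def method(self, **params):
        self.expectations.append((name, check, params))
        return self
    method.__name__ = name
    return method

for name, check in REGISTRY.items():
    setattr(Suite, name, _make_method(name, check))

suite = Suite().expect_min_rows(min_rows=3).expect_not_empty()
print([n for n, _, _ in suite.expectations])
# ['expect_min_rows', 'expect_not_empty']
```

Since the generated methods are invisible to static analysis, a `.pyi` stub file with the concrete signatures is what preserves IDE type-ahead.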

Bug Fixes

  • Handle pandas DataFrame.map() compatibility for older versions
  • Convert expectation category to str while generating stubs

Documentation

  • Update documentation for new registration system
  • Remove API reference button on expectation cards
  • Update README with additional badges

Chores

  • Add publishing and release workflows
  • Pin action commit hashes and update PR template
  • Update sanity checks script for dynamic expectation calls
  • Update release-please to approved version

Full Changelog: v0.1.1...dataframe-expectations-v0.2.0

v0.1.1

31 Oct 15:25
3f89e95

Full Changelog: https://github.com/getyourguide/dataframe-expectations/commits/v0.1.1