Handle VTL Number type correctly with tolerance-based comparisons. Docs updates #460
Open
javihern98 wants to merge 15 commits into main from cr-457
…tput formatting
Implements tolerance-based comparison for Number values in equality operators and configurable output formatting with significant digits.
Changes:
- Add _number_config.py utility module for reading environment variables
- Modify comparison operators (=, >=, <=, between) to use significant digits tolerance for Number comparisons
- Update CSV output to use float_format with configurable significant digits
- Add comprehensive tests for all new functionality
Environment variables:
- COMPARISON_ABSOLUTE_THRESHOLD: Controls comparison tolerance (default: 10)
- OUTPUT_NUMBER_SIGNIFICANT_DIGITS: Controls output formatting (default: 10)
Values:
- None/not defined: Uses default value of 10 significant digits
- 6 to 14: Uses specified number of significant digits
- -1: Disables the feature (uses Python's default behavior)
Closes #457
- Add tolerance-based equality checks to HREqual, HRGreaterEqual, HRLessEqual
- Update test expected output for DEMO1 to reflect new tolerance behavior (filtering out floating-point precision errors in check_hierarchy results)

- More conservative tolerance (5e-14 instead of 5e-10)
- DEMO1 test now expects 4 real imbalance rows (filters 35 floating-point artifacts)
- Updated test for numbers_are_equal to use smaller difference

- Add --unsafe-fixes flag to ruff check
- Add mandatory step 3 with all quality checks before creating PR
- Require: ruff format, ruff check --fix --unsafe-fixes, mypy, pytest
IEEE 754 float64 guarantees 15 significant decimal digits (DBL_DIG=15). Updated DEFAULT_SIGNIFICANT_DIGITS and MAX_SIGNIFICANT_DIGITS from 14 to 15 to use the full guaranteed precision of double-precision floating point. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
The S3 mock tests now expect float_format="%.15g" in to_csv calls, matching the output formatting behavior added for Number type handling. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
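To illustrate what a float_format such as "%.15g" does to a value, here is a minimal sketch using Python's printf-style formatting, the same format-string syntax that pandas accepts for float_format in to_csv. The helper name format_number is hypothetical and not part of the PR.

```python
def format_number(value: float, significant_digits: int = 15) -> str:
    """Format a float with at most `significant_digits` significant digits.

    Hypothetical helper: it builds a "%.{N}g" format string and applies it
    to a single value, which is what a CSV writer would do per float cell.
    """
    float_format = f"%.{significant_digits}g"  # e.g. "%.15g"
    return float_format % value


# 0.1 + 0.2 is stored as 0.30000000000000004; 15 significant digits hide the artifact.
artifact = format_number(0.1 + 0.2)

# Fewer digits truncate more aggressively.
one_third = format_number(1 / 3, significant_digits=6)
```

The "g" conversion also strips trailing zeros, so whole-valued floats such as 100.0 are written as 100.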
New docs/environment_variables.rst documenting: - COMPARISON_ABSOLUTE_THRESHOLD (Number comparison tolerance) - OUTPUT_NUMBER_SIGNIFICANT_DIGITS (CSV output formatting) - AWS/S3 environment variables - Usage examples for each scenario Includes float64 precision rationale (DBL_DIG=15) explaining the valid range of 6-15 significant digits. Closes #458 Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
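The value handling described for these variables (unset means the default, 6 to 15 is used as given, -1 disables the feature) could be sketched as follows. This is an illustration under assumptions, not the code in _number_config.py: the function name read_significant_digits and the constant MIN_SIGNIFICANT_DIGITS are hypothetical, and raising ValueError for out-of-range values is an assumed behavior the PR text does not specify.

```python
import os
from typing import Mapping, Optional

DEFAULT_SIGNIFICANT_DIGITS = 15
MIN_SIGNIFICANT_DIGITS = 6   # hypothetical name for the lower bound of the valid range
MAX_SIGNIFICANT_DIGITS = 15
DISABLED = -1


def read_significant_digits(
    name: str, env: Mapping[str, str] = os.environ
) -> Optional[int]:
    """Return the configured digit count, or None when the feature is disabled."""
    raw = env.get(name)
    if raw is None or raw.strip() == "":
        # Assumption: an empty value is treated the same as an unset variable.
        return DEFAULT_SIGNIFICANT_DIGITS
    value = int(raw)  # a non-integer string raises ValueError here
    if value == DISABLED:
        return None
    if not MIN_SIGNIFICANT_DIGITS <= value <= MAX_SIGNIFICANT_DIGITS:
        # Assumption: out-of-range values are rejected rather than clamped.
        raise ValueError(
            f"{name} must be -1 or between {MIN_SIGNIFICANT_DIGITS} "
            f"and {MAX_SIGNIFICANT_DIGITS}, got {value}"
        )
    return value
```

Passing the environment as a mapping keeps the function easy to test without touching the real process environment.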
Do not merge this branch automatically; I will merge it once the Suite team has also checked the changes.
Ensure tolerance-based equality is evaluated before strict < or > comparison in _numbers_less_equal and _numbers_greater_equal. Also tighten parameter types from Any to Union[int, float]. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Inline isinstance checks so mypy can narrow types in the Between operator. Function signatures were already formatted correctly. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Convert TestCase classes to plain pytest functions with @pytest.mark.parametrize for cleaner, more concise test definitions. Add Claude Code instructions based on copilot-instructions.md. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Summary
This PR implements tolerance-based comparisons for floating-point numbers in VTL, addressing issue #457, and adds environment variable documentation for issue #458.
Float64 Precision Rationale
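IEEE 754 float64 (double precision) has 52 mantissa bits (53 effective with the implicit leading 1), which guarantees 15 significant decimal digits (DBL_DIG = 15).
The valid range for significant digits is 6–15; the upper bound of 15 is DBL_DIG, the full guaranteed precision of float64.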
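As a quick check of these figures, the sketch below reads the float64 properties from Python's standard sys.float_info and round-trips a 15-digit decimal string. The variable names are illustrative only and do not come from the PR.

```python
import sys

# Properties of the platform's double-precision float, as reported by CPython.
mantissa_bits = sys.float_info.mant_dig    # 53 on IEEE 754 binary64 (52 stored + 1 implicit)
guaranteed_digits = sys.float_info.dig     # 15, the C constant DBL_DIG

# Any decimal string with up to 15 significant digits survives a round trip
# through float64 when formatted back with 15 significant digits.
text = "0.123456789012345"
round_tripped = "%.15g" % float(text)
```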
Changes
New Environment Variables
Two new environment variables control the behavior:
- COMPARISON_ABSOLUTE_THRESHOLD: Controls tolerance for comparison operators
- OUTPUT_NUMBER_SIGNIFICANT_DIGITS: Controls CSV output formatting
Tolerance Algorithm
Relative tolerance is calculated as: 0.5 * 10^(-(N-1)), where N = significant digits.
For the default of 15 significant digits: 5e-15.
This is the most conservative setting that still filters floating-point precision artifacts, using the full guaranteed precision of float64.
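The formula can be sketched in a few lines of Python. This is a minimal illustration, not the PR's implementation: the names relative_tolerance, approx_equal and approx_less_equal are hypothetical, and scaling the tolerance by the larger magnitude of the two operands is an assumption about how "relative" is applied.

```python
from typing import Union

Number = Union[int, float]


def relative_tolerance(significant_digits: int) -> float:
    """Return 0.5 * 10^(-(N-1)) for N significant digits."""
    return 0.5 * 10.0 ** (-(significant_digits - 1))


def approx_equal(a: Number, b: Number, significant_digits: int = 15) -> bool:
    """Tolerance-based equality (hypothetical helper)."""
    if a == b:  # exact match, also covers equal infinities
        return True
    tol = relative_tolerance(significant_digits)
    # Assumption: the tolerance is scaled by the larger operand magnitude.
    return abs(a - b) <= tol * max(abs(a), abs(b))


def approx_less_equal(a: Number, b: Number, significant_digits: int = 15) -> bool:
    """Check tolerance-based equality first, then the strict comparison."""
    return approx_equal(a, b, significant_digits) or a < b


# 0.1 + 0.2 is 0.30000000000000004, so the strict comparison fails,
# while the tolerance-based one treats the two values as equal.
strict_result = (0.1 + 0.2) <= 0.3
tolerant_result = approx_less_equal(0.1 + 0.2, 0.3)
```

With N = 10 the same function yields a tolerance of about 5e-10, and differences well above the tolerance (for example 1.0 versus 1.0000001 at N = 15) are still reported as unequal.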
Modified Operators
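Standard Comparison Operators (Comparison.py):
- Equal (=)
- Not equal (<>)
- Greater or equal (>=): equality checked before strict >
- Less or equal (<=): equality checked before strict <
Hierarchical Ruleset Operators (HROperators.py):
- Tolerance-based equality checks in HREqual, HRGreaterEqual and HRLessEqual
Output Formatting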
CSV output now uses float_format="%.{N}g" to limit floating-point precision in output files.
Documentation (Issue #458)
New docs/environment_variables.rst page documenting:
- Number handling variables (COMPARISON_ABSOLUTE_THRESHOLD, OUTPUT_NUMBER_SIGNIFICANT_DIGITS)
- AWS/S3 variables (AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, AWS_SESSION_TOKEN, AWS_DEFAULT_REGION, AWS_ENDPOINT_URL)
Added to the docs toctree in docs/index.rst.
Claude Code Instructions
New CLAUDE.md file with project-specific instructions for Claude Code, derived from .github/copilot-instructions.md.
Test Coverage
- Tests converted to plain pytest functions using @pytest.mark.parametrize
- S3 mock tests expect the float_format parameter in to_csv calls
- test_DEMO1 expected output: now returns 4 rows with real imbalances (filters 35 floating-point artifacts)
- Tightened parameter types (Union[int, float]) on _numbers_less_equal and _numbers_greater_equal
Breaking Changes
This change affects comparison results for floating-point numbers. Users can:
- Disable the feature: COMPARISON_ABSOLUTE_THRESHOLD=-1
- Use 10 significant digits: COMPARISON_ABSOLUTE_THRESHOLD=10 (tolerance ~5e-10)
- Keep the default: COMPARISON_ABSOLUTE_THRESHOLD=15 (~5e-15, full float64 precision)
Closes #457
Closes #458