Handle VTL Number type correctly with tolerance-based comparisons. Docs updates #460

javihern98 · 2026-01-23T16:27:31Z

Summary

This PR implements tolerance-based comparisons for floating-point numbers in VTL, addressing issue #457, and adds environment variable documentation for issue #458.

Float64 Precision Rationale

IEEE 754 float64 (double precision) has 52 mantissa bits (53 effective with implicit leading 1):

| Property | Value | Meaning |
|---|---|---|
| log10(2^53) | ≈ 15.95 | Maximum significant decimal digits |
| DBL_DIG | 15 | Guaranteed decimal digits for round-trip (decimal → float64 → decimal) |
| DBL_DECIMAL_DIG | 17 | Digits needed for exact float64 → decimal → float64 round-trip |

The valid range for significant digits is 6–15, where:

6 is the minimum practical precision (coarse tolerance)
15 is the maximum guaranteed precision for float64 — beyond this, representation noise appears
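
The figures in the table can be checked from Python's standard library. This is only an illustrative check, assuming a CPython build whose float is IEEE 754 binary64 (true on mainstream platforms); the variable names are made up for the example.

```python
import math
import sys

# Properties of the platform float type (IEEE 754 binary64 is assumed here).
mantissa_bits = sys.float_info.mant_dig              # 53, counting the implicit leading bit
guaranteed_digits = sys.float_info.dig               # 15, the C constant DBL_DIG
max_decimal_digits = math.log10(2 ** mantissa_bits)  # roughly 15.95

# 0.1 + 0.2 is not exactly 0.3 in binary floating point.
x = 0.1 + 0.2
fifteen = f"{x:.15g}"    # 15 significant digits hide the representation noise
seventeen = f"{x:.17g}"  # 17 significant digits expose it

print(mantissa_bits, guaranteed_digits, round(max_decimal_digits, 2))
print(fifteen, seventeen)
print(float(seventeen) == x)  # 17 digits are enough to recover the exact float
```

Formatting with 15 digits drops the trailing noise, while 17 digits are what an exact float64 → decimal → float64 round-trip needs.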

Changes

New Environment Variables

Two new environment variables control the behavior:

COMPARISON_ABSOLUTE_THRESHOLD - Controls tolerance for comparison operators
- Default: 15 (significant digits, the maximum guaranteed precision for float64)
- Range: 6-15
- Set to -1 to disable tolerance (exact comparison, which may report spurious errors or show unneeded extra decimals caused by floating-point artifacts)
OUTPUT_NUMBER_SIGNIFICANT_DIGITS - Controls CSV output formatting
- Accepts the same values as above (default 15, range 6-15, -1 to disable)
- Controls the float_format parameter passed to pandas to_csv
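
A minimal sketch of how such a variable could be read and validated. The function name `read_significant_digits`, the `MIN_SIGNIFICANT_DIGITS` and `DISABLED` constants, the treatment of an empty value as "not set", and raising `ValueError` on bad input are all assumptions made for illustration; the PR's `_number_config.py` may behave differently.

```python
import os
from typing import Mapping, Optional

DEFAULT_SIGNIFICANT_DIGITS = 15
MIN_SIGNIFICANT_DIGITS = 6   # hypothetical constant name
MAX_SIGNIFICANT_DIGITS = 15
DISABLED = -1                # hypothetical constant name


def read_significant_digits(
    name: str, env: Optional[Mapping[str, str]] = None
) -> Optional[int]:
    """Return the configured significant digits, or None when the feature is disabled."""
    source = os.environ if env is None else env
    raw = source.get(name)
    if raw is None or raw.strip() == "":
        # Variable not defined (or empty, by assumption): fall back to the default.
        return DEFAULT_SIGNIFICANT_DIGITS
    try:
        value = int(raw)
    except ValueError:
        raise ValueError(f"{name} must be an integer, got {raw!r}") from None
    if value == DISABLED:
        return None
    if not MIN_SIGNIFICANT_DIGITS <= value <= MAX_SIGNIFICANT_DIGITS:
        raise ValueError(
            f"{name} must be {DISABLED} or between "
            f"{MIN_SIGNIFICANT_DIGITS} and {MAX_SIGNIFICANT_DIGITS}, got {value}"
        )
    return value


# Passing a mapping instead of touching os.environ keeps the example side-effect free.
print(read_significant_digits("COMPARISON_ABSOLUTE_THRESHOLD", {}))
print(read_significant_digits("COMPARISON_ABSOLUTE_THRESHOLD", {"COMPARISON_ABSOLUTE_THRESHOLD": "-1"}))
```

Returning `None` for the disabled case lets callers branch on "no tolerance" without comparing against a magic number.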

Tolerance Algorithm

Relative tolerance is calculated as: 0.5 * 10^(-(N-1)) where N = significant digits

For the default of 15 significant digits:

Relative tolerance = 5e-15
Absolute tolerance = relative_tolerance × max(|a|, |b|)

This is the most conservative setting that still filters floating-point precision artifacts, using the full guaranteed precision of float64.
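
The formula above can be sketched as follows. This is an illustration of the stated arithmetic, not the PR's code: the function names are placeholders, and the choice of `<=` rather than `<` at the boundary is an assumption.

```python
def relative_tolerance(significant_digits: int) -> float:
    """0.5 * 10^(-(N-1)), where N is the number of significant digits."""
    return 0.5 * 10.0 ** (-(significant_digits - 1))


def numbers_are_equal(a: float, b: float, significant_digits: int = 15) -> bool:
    """Compare two numbers using a tolerance scaled by the larger magnitude."""
    if a == b:
        # Exact matches (including equal infinities) need no tolerance.
        return True
    absolute_tolerance = relative_tolerance(significant_digits) * max(abs(a), abs(b))
    # Boundary handling (<= versus <) is an assumption of this sketch.
    return abs(a - b) <= absolute_tolerance


print(relative_tolerance(15))                      # about 5e-15
print(numbers_are_equal(0.1 + 0.2, 0.3))           # artifact is filtered out
print(numbers_are_equal(1.0, 1.0 + 1e-10))         # a real difference at 15 digits
print(numbers_are_equal(1.0, 1.0 + 1e-10, 10))     # absorbed at 10 digits
```

Because the absolute tolerance is proportional to max(|a|, |b|), a formula of this shape never treats a nonzero value as equal to exactly zero, however small the value is.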

Modified Operators

Standard Comparison Operators (Comparison.py):

Equal (=)
NotEqual (<>)
GreaterEqual (>=) — equality checked before strict >
LessEqual (<=) — equality checked before strict <
Between

Hierarchical Ruleset Operators (HROperators.py):

HREqual
HRGreaterEqual — equality checked before strict >
HRLessEqual — equality checked before strict <
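
The ordering "equality checked before strict > or <" matters because a floating-point artifact can land on either side of the intended value. A hedged sketch, with hypothetical function names and the same tolerance formula as described in the PR:

```python
from typing import Union

Number = Union[int, float]


def equal_within_tolerance(a: Number, b: Number, significant_digits: int = 15) -> bool:
    """Tolerance-based equality: relative tolerance 0.5 * 10^(-(N-1))."""
    if a == b:
        return True
    tolerance = 0.5 * 10.0 ** (-(significant_digits - 1)) * max(abs(a), abs(b))
    return abs(a - b) <= tolerance


def tolerant_greater_equal(a: Number, b: Number, significant_digits: int = 15) -> bool:
    # Equality first: values that differ only by noise count as equal.
    if equal_within_tolerance(a, b, significant_digits):
        return True
    return a > b


def tolerant_less_equal(a: Number, b: Number, significant_digits: int = 15) -> bool:
    if equal_within_tolerance(a, b, significant_digits):
        return True
    return a < b


total = 0.1 + 0.2  # stored as 0.30000000000000004
print(0.3 >= total)                        # plain comparison is tripped up by the artifact
print(tolerant_greater_equal(0.3, total))  # tolerance-based comparison is not
```

If the strict comparison ran first and its result were returned directly, `0.3 >= 0.1 + 0.2` would still fail even though the two values agree to 15 significant digits.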

Output Formatting

CSV output now uses float_format="%.{N}g" to limit floating-point precision in output files.
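
The effect of a `%.{N}g` format can be seen with plain printf-style formatting, which is what a `float_format` string of this form applies to each float. The helper name `format_number` is hypothetical and pandas is deliberately left out so the snippet runs with the standard library alone.

```python
def format_number(value: float, significant_digits: int = 15) -> str:
    """Format a float with at most N significant digits, dropping trailing zeros."""
    float_format = f"%.{significant_digits}g"  # e.g. "%.15g" for the default
    return float_format % value


print(repr(0.1 + 0.2))             # full repr shows the artifact
print(format_number(0.1 + 0.2))    # 15 significant digits hide it
print(format_number(1 / 3))
print(format_number(1 / 3, 6))
print(format_number(123456789.0))  # integral values lose the trailing ".0"
```

Note that the `g` conversion switches to exponent notation for very large or very small magnitudes, so a value such as 1e20 is written as `1e+20`.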

Documentation (Issue #458)

New docs/environment_variables.rst page documenting:

Number handling variables (COMPARISON_ABSOLUTE_THRESHOLD, OUTPUT_NUMBER_SIGNIFICANT_DIGITS)
S3/AWS configuration variables (AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, AWS_SESSION_TOKEN, AWS_DEFAULT_REGION, AWS_ENDPOINT_URL)
Usage examples for each scenario
Float64 precision rationale

Added to the docs toctree in docs/index.rst.

Claude Code Instructions

New CLAUDE.md file with project-specific instructions for Claude Code, derived from .github/copilot-instructions.md.

Test Coverage

Number handling tests refactored to pytest with @pytest.mark.parametrize
S3 mock tests updated to expect float_format parameter in to_csv calls
All 3463 tests passing
Updated test_DEMO1 expected output: now returns 4 rows with real imbalances (filters 35 floating-point artifacts)
Strict typing (Union[int, float]) on _numbers_less_equal and _numbers_greater_equal

Breaking Changes

This change affects comparison results for floating-point numbers. Users can restore or adjust the behavior with the following settings:

| Goal | Setting |
|---|---|
| Disable tolerance (exact comparisons) | `COMPARISON_ABSOLUTE_THRESHOLD=-1` |
| More lenient tolerance | `COMPARISON_ABSOLUTE_THRESHOLD=10` (tolerance ~5e-10) |
| Strictest tolerance (default) | `COMPARISON_ABSOLUTE_THRESHOLD=15` (~5e-15, full float64 precision) |

Closes #457
Closes #458

…tput formatting

Implements tolerance-based comparison for Number values in equality operators and configurable output formatting with significant digits.

Changes:
- Add _number_config.py utility module for reading environment variables
- Modify comparison operators (=, >=, <=, between) to use significant digits tolerance for Number comparisons
- Update CSV output to use float_format with configurable significant digits
- Add comprehensive tests for all new functionality

Environment variables:
- COMPARISON_ABSOLUTE_THRESHOLD: Controls comparison tolerance (default: 10)
- OUTPUT_NUMBER_SIGNIFICANT_DIGITS: Controls output formatting (default: 10)

Values:
- None/not defined: Uses default value of 10 significant digits
- 6 to 14: Uses specified number of significant digits
- -1: Disables the feature (uses Python's default behavior)

Closes #457

IEEE 754 float64 guarantees 15 significant decimal digits (DBL_DIG=15). Updated DEFAULT_SIGNIFICANT_DIGITS and MAX_SIGNIFICANT_DIGITS from 14 to 15 to use the full guaranteed precision of double-precision floating point.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

New docs/environment_variables.rst documenting:
- COMPARISON_ABSOLUTE_THRESHOLD (Number comparison tolerance)
- OUTPUT_NUMBER_SIGNIFICANT_DIGITS (CSV output formatting)
- AWS/S3 environment variables
- Usage examples for each scenario

Includes float64 precision rationale (DBL_DIG=15) explaining the valid range of 6-15 significant digits.

Closes #458

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

javihern98 · 2026-01-23T18:12:18Z

Do not merge this branch automatically; I will merge it once the Suite team has also checked the changes.

Ensure tolerance-based equality is evaluated before strict < or > comparison in _numbers_less_equal and _numbers_greater_equal. Also tighten parameter types from Any to Union[int, float].

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

Convert TestCase classes to plain pytest functions with @pytest.mark.parametrize for cleaner, more concise test definitions. Add Claude Code instructions based on copilot-instructions.md.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

javihern98 added 8 commits January 20, 2026 12:45

Bump version to 1.5.0rc4

6220d4c

Add tolerance-based comparison to HR operators

128ba42

- Add tolerance-based equality checks to HREqual, HRGreaterEqual, HRLessEqual
- Update test expected output for DEMO1 to reflect new tolerance behavior (filtering out floating-point precision errors in check_hierarchy results)

Fix ruff issues in tests: combine with statements and add match param…

e748e96

…eter

Change default threshold from 10 to 14 significant digits

b1da384

- More conservative tolerance (5e-14 instead of 5e-10)
- DEMO1 test now expects 4 real imbalance rows (filters 35 floating-point artifacts)
- Updated test for numbers_are_equal to use smaller difference

Add Git workflow and branch naming convention (cr-{issue}) to instruc…

5a16c05

…tions

Enforce mandatory quality checks before PR creation in instructions

e2ce92b

- Add --unsafe-fixes flag to ruff check
- Add mandatory step 3 with all quality checks before creating PR
- Require: ruff format, ruff check --fix --unsafe-fixes, mypy, pytest

Remove folder specs from quality check commands (use pyproject.toml c…

e544657

…onfig)

javihern98 mentioned this pull request Jan 23, 2026

Handle VTL Number type correctly in comparison operators and output formatting #457

Open

javihern98 and others added 3 commits January 23, 2026 17:47

Merge branch 'main' into cr-457

14335de

Fix S3 tests to expect float_format parameter in to_csv calls

963fe84

The S3 mock tests now expect float_format="%.15g" in to_csv calls, matching the output formatting behavior added for Number type handling.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

javihern98 marked this pull request as ready for review January 23, 2026 18:06

javihern98 requested review from a team and albertohernandez1995 January 23, 2026 18:06

javihern98 changed the title ~~Fix #457: Handle VTL Number type correctly with tolerance-based comparisons~~ Handle VTL Number type correctly with tolerance-based comparisons Jan 23, 2026

javihern98 changed the title ~~Handle VTL Number type correctly with tolerance-based comparisons~~ Handle VTL Number type correctly with tolerance-based comparisons. Docs updates Jan 23, 2026

javihern98 and others added 3 commits January 23, 2026 21:16

Fix ruff and mypy issues in comparison operators

4339d57

Inline isinstance checks so mypy can narrow types in the Between operator. Function signatures were already formatted correctly.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
