fix: enforce nullable types for null literals in test cases#989
fix: enforce nullable types for null literals in test cases#989benbellick merged 10 commits intomainfrom
Conversation
a7c8cad to
3a1ba99
Compare
| @staticmethod | ||
| def is_same_type(func_arg_type, arg_type): | ||
| arg_type_base = arg_type.split("<")[0] | ||
| arg_type_base = arg_type.split("<")[0].rstrip("?") |
There was a problem hiding this comment.
I'm not familiar with this part of repo... but this looks like comparing only the base type. is_same_type is the proper name for this function? also, shouldn't it consider DISCRETE nullability https://substrait.io/expressions/scalar_functions/#nullability-handling?
There was a problem hiding this comment.
I'm not so familiar with this part of it either. At one point in history I was looking into rewriting quite a bit of it to use a more typed representation, but ended up focusing on other things.
Yes, I think it is only comparing the base type. Though that was the behavior before this PR 🤷. I agree it should consider DISCRETE nullability. Is it possible to leave those sort of validations for another PR?
I think there is a lot of room for improvement in these test cases, but wanted to keep changes manageable.
There was a problem hiding this comment.
I don't mean to hold back this PR, just want to understand whether I am missing something. 😄
But overall, it appears that this needs to go under some careful review and clean up. Yes, we should do it incrementally -- otherwise, we are going nowhere. 😆
There was a problem hiding this comment.
+1 to this being something we can improve, but that which we should improve in the future.
| and(null::bool, null::bool) = null::bool | ||
| and(true::bool, null::bool?) = null::bool? | ||
| and(null::bool?, true::bool) = null::bool? | ||
| and(false::bool, null::bool?) = false::bool |
There was a problem hiding this comment.
Actually maybe this output should be nullable, too?
|
Added a potential 3rd option for this in the original issue #988 (comment) It's a tiny bit obnoxious to have to add * specifically the proto |
…and discrete functions Ensure test case output types respect nullability handling rules: - MIRROR: output is nullable when any input is nullable - DECLARED_OUTPUT: output nullability matches declared return type - DISCRETE: same as DECLARED_OUTPUT for output nullability
Add automated validation that test case return type nullability is consistent with the function's declared nullability handling (MIRROR, DECLARED_OUTPUT, DISCRETE). Fix visitor to preserve nullable markers on bool, string, timestamp, and other types that were previously hardcoded. Revert 3 incorrect list test case changes from prior commit where inner element nullability was confused with outer type nullability.
f8f1b95 to
6534543
Compare
| print(json.dumps(asdict(actual_baseline), indent=2)) | ||
|
|
||
|
|
||
| def test_substrait_nullability_consistency(): |
There was a problem hiding this comment.
This is really the core driver of this PR. I realize there are a lot of cases changed, but all of the changes were driven by making this test pass. This test is just checking that the test cases have nullability which is consistent with the nullability handling of the functions as they are defined in the YAML files.
|
ACTION NEEDED Substrait follows the Conventional Commits The PR title and description are used as the merge commit message. Please update your PR title and description to match the specification. |
| from typing import List | ||
|
|
||
|
|
||
| def type_str_is_outer_nullable(type_str): |
There was a problem hiding this comment.
In the interest of keeping this diff small, I used this hack. I would love to have a better solution, but I think that is worth its own dedicated PR.
This was a weakly held opinion, and I have been easily convinced the explicit is better than implicit in this case. |
vbarua
left a comment
There was a problem hiding this comment.
Partial review, got nerdsniped into doing cursed ANTLR things.
| if not type_str_is_outer_nullable(datatype): | ||
| raise ParseError( | ||
| f"Null literal must have a nullable type (add '?'), " | ||
| f"got: null::{datatype}" |
There was a problem hiding this comment.
Nice, this will make it easy for other folks to fix this issue when adding new tests.
| value=column, | ||
| type=column_type, | ||
| nullable=type_str_is_outer_nullable(column_type), | ||
| ) |
There was a problem hiding this comment.
You were asking if you could do something better*. You can technically just reach down through the nodes and look for the QMark() directly.
column = self.visitColumnValues(ctx.columnValues())
column_type = ctx.dataType()
is_nullable = column_type.children[0].children[0].QMark() == '?'* not sure if this is better tbh, but it was fun to figure out
There was a problem hiding this comment.
Talking with @/benbellick about this, in some ways this is more fragile than his approach of producing the type string and parsing that. I'm happy to go with his approach, and while there's always room for improvement, I don't think it needs to happen in this PR.
| bracket_pos = type_str.find("<") | ||
| if bracket_pos == -1: | ||
| return type_str.endswith("?") | ||
| return "?" in type_str[:bracket_pos] |
There was a problem hiding this comment.
This is reasonable enough, and the type string shape can be considered stable.
vbarua
left a comment
There was a problem hiding this comment.
Changes look good overall. Left some minor comments, but nothing I consider blocking.
Address PR review feedback: - Rename return_nullable to is_return_nullable for better readability - Add is_return_nullable to FunctionOverload.__str__ method
…sults The coverage checker's `is_same_type` compared only the base type name and stripped all parameters, so decimal precision/scale, varchar length, and list/map element types were never actually validated against the extension YAML return formulas. Two wrong test cases had been sitting in-tree because of this: - `sum((2.5, 0, 5.0, -2.5, -7.5)::dec<2, 1>) = -2.5::dec<38, 2>` in `tests/cases/arithmetic_decimal/sum_decimal.test`. `sum` returns `DECIMAL?<38,S>`, so with input scale 1 the output must be `dec?<38, 1>`, not `dec?<38, 2>`. - `nullif(null::dec?<38, 0>, null::dec?<38, 0>) = null::bool?` in `tests/cases/comparison/nullif.test`. `nullif` is `any1, any1 -> any1?`, so with both args `dec<38, 0>` the result must be `dec?<38, 0>`, not `bool?`. Looks like a copy-paste from the bool cases above. Both wrong results date back to the original BFT port (substrait-io#738). They are not undoing prior work: substrait-io#913 only touched the `basic` i8/i16 block at the top of `nullif.test`, and substrait-io#989 rewrote both lines only to add `?` nullability markers, preserving the underlying wrong `38,2` and wrong `bool` base type. This change keeps every `?` marker from substrait-io#989 and only fixes the parameter substrait-io#989 wasn't looking at. To prevent regressions, this adds `tests/coverage/type_checker.py` — a symbolic unifier plus evaluator for the YAML return-formula mini-language (assignments, `min`/`max`, `cond ? a : b` ternary). The new module: - parses type strings like `decimal<P1,S1>`, `list<any2>`, `STRUCT<...>`, `func<any1 -> boolean?>` into tagged tuples; - unifies an impl-declared type against a concrete test type, binding variables like `P1`, `S1`, and polymorphic `any1`/`any2`; - evaluates multi-line return formulas (add/sub/mul/div/mod, min/max, sum, `any1?`, etc.) with the bindings and compares structurally to the test's declared result type. `FunctionOverload`/`FunctionVariant` now carry the raw YAML arg types and return formula alongside the existing short-form fingerprint. `FunctionRegistry.get_function` runs the strict check after the legacy loose match, so the wider test suite's base-type fallback still applies when the caller hasn't supplied full parameterized types. When the formula cannot be evaluated (unbound variable, unusual syntax), the strict check falls back to success, preserving compatibility with the current extensions and leaving room to tighten further. `test_type_checker.py` covers parsing, unification, formula evaluation for add/divide, the two specific bugs above, and the tolerant behavior for tests that omit optional decimal parameters (e.g. `power(dec, dec<38,0>)`).
Test cases now respect the nullability rules defined by each function's
nullabilityhandling in the extension YAML. MIRROR functions require the output to be marked nullable (?) iff any argument is nullable, and DECLARED_OUTPUT functions require the output nullability to match the declared return type.The Python test infrastructure now enforces these rules automatically via
test_substrait_nullability_consistency. This drove the ~105 test case line changes to add missing?markers on return types. The visitor was also fixed to stop silently dropping nullable markers for several types, and to distinguish outer nullability (list?<i32>) from inner element nullability (list<i32?>).DISCRETE nullability handling only validates output type for now — argument nullability matching is not yet implemented. This is fine since no extensions currently use DISCRETE.
Note: These changes were made with Claude's assistance. All code has been reviewed by me.