feat: add lambda expression support #710
benbellick
left a comment
Only just started looking at this but flushing a few comments just to get the ball rolling. On the whole this is looking good though, nice work!
core/src/main/java/io/substrait/expression/proto/ProtoExpressionConverter.java
So I was trying to test this by reading in the test plans from https://github.com/substrait-io/substrait-go/tree/main/expr/testdata/lambda, but then I realized that we were not loading the functions_list.yaml extensions, and then I realized that we cannot load the extensions because the ANTLR parser hasn't been updated. I made a PR to do just that (#728), because the parser is very out of date and required a little extra care. Once that's in, I can use the test vectors to help verify this work. We need that PR anyway, because we can't load any of the functions that use lambdas without it 😓
Note that these had to be tweaked slightly because they were not entirely valid. This will be fixed in substrait-go.
I made a PR into this PR as part of understanding how it works, and to fix a small gap when loading Substrait plans with lambdas: limameml#1
test: add lambda plan roundtrip tests
vbarua
left a comment
Started taking a more thorough look and found a couple of other things. I'll continue reviewing this tomorrow.
examples/substrait-spark/src/main/java/io/substrait/examples/util/ExpressionStringify.java
core/src/main/java/io/substrait/expression/proto/ProtoExpressionConverter.java
vbarua
left a comment
I did leave a couple of very minor suggestions, and I think that there is one check worth enforcing in #710 (comment).
This is a solid pass at integrating lambdas end-to-end, thanks for working on it @limameml. If you can take a look at the above, I think we can get this merged this week.
cc: @bestbeforetoday is there anything else you'd like to see in this PR?
core/src/test/java/io/substrait/type/proto/LambdaExpressionRoundtripTest.java
benbellick
left a comment
Sorry to block this, I know that you have already gotten an approval. I just want to make sure that this is addressed before merging. Thanks!
core/src/test/java/io/substrait/type/proto/LambdaExpressionRoundtripTest.java
…ameter references Introduces LambdaBuilder, a context-aware builder that maintains a lambda parameter stack (lambdaContext) to validate parameter references at build time. Nested lambdas use the same builder, ensuring stepsOut is computed automatically. Mirrors the lambdaContext pattern from substrait-go.
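For readers following along, the idea of a builder that tracks a lambda parameter stack and derives stepsOut automatically can be sketched roughly as below. This is a minimal illustration only: the class `LambdaContextSketch`, its `lambda` and `ref` methods, and the `int[]` return shape are hypothetical names chosen for this sketch and are not the actual `LambdaBuilder` API from this PR. It assumes that stepsOut of 0 refers to the innermost enclosing lambda.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

/** Hypothetical sketch of a lambda context; not the substrait-java LambdaBuilder. */
public class LambdaContextSketch {
  // One entry per currently open lambda, outermost first: its parameter count.
  private final List<Integer> paramCounts = new ArrayList<>();

  /** Handle for one lambda's parameters; remembers the nesting depth it was opened at. */
  public final class Scope {
    private final int depth; // 1-based depth of the lambda that owns this scope

    private Scope(int depth) {
      this.depth = depth;
    }

    /** Returns {stepsOut, paramIndex}; stepsOut is derived from the current nesting. */
    public int[] ref(int paramIndex) {
      if (depth > paramCounts.size()) {
        throw new IllegalStateException("lambda scope is no longer open");
      }
      int paramCount = paramCounts.get(depth - 1);
      if (paramIndex < 0 || paramIndex >= paramCount) {
        throw new IllegalArgumentException(
            "parameter index " + paramIndex + " out of bounds for " + paramCount + " parameters");
      }
      // 0 when referenced from the owning lambda's body, 1 from a directly nested lambda, ...
      int stepsOut = paramCounts.size() - depth;
      return new int[] {stepsOut, paramIndex};
    }
  }

  /** Opens a lambda with paramCount parameters, builds its body, then closes it. */
  public <T> T lambda(int paramCount, Function<Scope, T> body) {
    paramCounts.add(paramCount);
    try {
      return body.apply(new Scope(paramCounts.size()));
    } finally {
      paramCounts.remove(paramCounts.size() - 1);
    }
  }
}
```

In this sketch, calling `outer.ref(1)` from inside a lambda nested one level below `outer` yields a stepsOut of 1 without the caller computing it, and an out-of-range parameter index is rejected at build time.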
@limameml and I discussed offline some of the difficulties of validating lambdas during construction. One of the difficulties discussed is the inability to thread shared context through the immutables interface to track lambda parameters. I have a draft PR here to try and introduce a new builder method to handle tracking this shared context.
Moves ProtoExpressionConverter to use LambdaBuilder for lambda parameter validation, removing the private LambdaParameterStack. Replaces builder-based roundtrip tests with parameterized JSON fixtures in expressions/lambda/. Adds arithmetic body test to isthmus LambdaExpressionTest.
…ment depth-capture mechanism
…lidation-impl feat(core): add LambdaBuilder for build-time lambda validation
…d any_match in function mappings
vbarua
left a comment
Thanks for your additions @benbellick, they make sense to me.
I did identify one minor API improvement based on the tests you added, but otherwise this is looking good to me.
benbellick
left a comment
LGTM! I left some comments in some places but after those are resolved, we can merge.
Also, I will say that I really just skimmed the isthmus stuff. I'm really quite shaky on my calcite knowledge, so I will defer to @vbarua's approval for that one. Thanks for seeing this one through!
core/src/test/java/io/substrait/expression/LambdaBuilderTest.java
isthmus/src/main/java/io/substrait/isthmus/expression/FunctionMappings.java
Change parameter order from (paramIndex, lambdaParamsType, stepsOut) to (stepsOut, paramIndex, lambdaParamsType) so the long type parameter is at the end, improving readability at call sites.
Clarify that this method does not validate stepsOut and that callers should use LambdaBuilder.Scope.ref() for validated lambda construction.
Add LambdaBuilder.newParameterReference() as the public API for creating validated lambda parameter references. This ensures stepsOut is always validated against the current lambda nesting context. ProtoExpressionConverter now uses this method instead of calling FieldReference.newLambdaParameterReference() directly.
Instead of passing the full Type.Struct and doing the field lookup internally, pass the already-extracted Type. This simplifies the method and moves the bounds checking to the caller. Also rename parameter to 'knownType' for consistency with other FieldReference factory methods.
I added one more small change, which was to make The justification is that making it public means people can construct invalid lambda references. I could foresee cases where people want to be able to do this, but it is easy enough to add in the future, whereas taking it away later would be a breaking change. So I figure it is okay to just remove it and add it in the future if needed.
I added two more tests just to assure myself that we can see an easy example of building lambdas with function calls in the body, and that we reject protos which contain out-of-bounds param references. Since I didn't introduce any code changes, and only added more tests, I will consider the previous PR still valid. Sorry that I am being a bit detailed with these tests 😮‍💨
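As a rough illustration of what rejecting an out-of-bounds param reference means when reading a proto, the check can be sketched as follows. Everything here is a hypothetical stand-in: `ParamRefCheckSketch` and `check` are not part of substrait-java, and the list of enclosing parameter counts is an assumed representation of the lambda nesting seen so far. It assumes stepsOut of 0 refers to the innermost enclosing lambda.

```java
import java.util.List;

/** Hypothetical sketch of bounds-checking a deserialized lambda parameter reference. */
public class ParamRefCheckSketch {
  /**
   * Validates an explicit (stepsOut, paramIndex) pair.
   *
   * @param enclosingParamCounts parameter counts of the enclosing lambdas, outermost first
   * @throws IllegalArgumentException if the reference points outside the enclosing lambdas
   */
  public static void check(List<Integer> enclosingParamCounts, int stepsOut, int paramIndex) {
    int depth = enclosingParamCounts.size();
    if (stepsOut < 0 || stepsOut >= depth) {
      throw new IllegalArgumentException(
          "stepsOut " + stepsOut + " exceeds lambda nesting depth " + depth);
    }
    // stepsOut counts outward from the innermost lambda, which is the last list entry.
    int paramCount = enclosingParamCounts.get(depth - 1 - stepsOut);
    if (paramIndex < 0 || paramIndex >= paramCount) {
      throw new IllegalArgumentException(
          "parameter index " + paramIndex + " out of bounds for " + paramCount + " parameters");
    }
  }
}
```

With an outer lambda of two parameters and an inner lambda of one, `check(List.of(2, 1), 1, 1)` passes, while `check(List.of(2, 1), 0, 1)` throws because the inner lambda has only one parameter.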
This PR adds support for lambda expressions.
Summary
Test plan
closes #687