Skip to content

feat: add lambda expression support#710

Open
limameml wants to merge 15 commits intosubstrait-io:mainfrom
limameml:limame.malainine/add-lambda-support
Open

feat: add lambda expression support#710
limameml wants to merge 15 commits intosubstrait-io:mainfrom
limameml:limame.malainine/add-lambda-support

Conversation

@limameml
Copy link

This PR is to add support for the Lambda Expression

Summary

  • Add lambda expression support to substrait-java core and isthmus
  • Implement Type.Func for function types
  • Add Expression.Lambda and lambda parameter references
  • Full proto round-trip support
  • Calcite (isthmus) integration (single-level lambdas only)

Test plan

  • Added LambdaExpressionRoundtripTest in core
  • Added LambdaExpressionTest in isthmus

closes #687

@CLAassistant
Copy link

CLAassistant commented Feb 24, 2026

CLA assistant check
All committers have signed the CLA.

@benbellick benbellick self-requested a review February 24, 2026 17:00
Copy link
Member

@benbellick benbellick left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only just started looking at this but flushing a few comments just to get the ball rolling. On the whole this is looking good though, nice work!

private final Type.Struct rootType;
private final ProtoTypeConverter protoTypeConverter;
private final ProtoRelConverter protoRelConverter;
private final List<Type.Struct> lambdaParameterStack = new ArrayList<>();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we use a stack here?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I need to access lambdaIndex = lambdaParameterStack.size() - 1 - stepsOut here: lambdaParameters = lambdaParameterStack.get(lambdaIndex); when building the Lambda parametre ref that's why I didn't use a stack

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's fair. Java's stack does extend Vector though so you still get the get method. I'm not so particular on this one.

The only thing I will say is I do think that the whole stack parameter stuff can be confusing for people without a little bit of PL experience, which is why I pushed @Slimsammylim to encapsulate some of that logic and put it in docstring comments. I could see a case for encapsulating this logic in a local private class, though it probably makes more sense to do this only if this logic comes up again.

I expected to see something like this logic elsewhere, as you need to do it to do validation when building lambdas. Though I didn't find anything like this. Does the builder for lambdas in this PR actually validate that the lambda is semantically correct?

Copy link
Member

@bestbeforetoday bestbeforetoday Feb 26, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that it would be nice for clarity to encapsulate this quite specific behaviour in a class (single responsibility principle). It doesn't need to extend one of the full collection interfaces, just provide the methods you need:

  • void push(Type.Struct value)
  • Type.Struct pop() (throw runtime NoSuchElementException if empty)
  • Type.Struct get(int stepsOut) (throw runtime IndexOutOfBoundsException on invalid stepsOut)

Or make it generic if you will reuse it for other types. The caller doesn't need to worry about clever indexing logic or what collection implementation is used internally; just make simple method calls that clearly describe the intent.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @benbellick, for your last point I don't quite understand what validation I should add, doesn't the validation happen when building the LAMBDA_PARAMETER_REFERENCE?


Type.Struct lambdaParameters = lambdaParameterStack.get(lambdaIndex);
return FieldReference.newLambdaParameterReference(
reference.getDirectReference().getStructField().getField(),
Copy link

@Slimsammylim Slimsammylim Feb 26, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Something I'm curious about is why we use reference.getDirectReference().getStructField().getField() here instead of getDirectReferenceSegments(reference.getDirectReference()) like the ROOT_REFERENCE case does.

I'm unfamiliar with substrait-java, but does this current implementation support nested field access through lambda parameters?

*This may be an unimportant issue especially if it's not common to access nested fields in lambdas

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

because the ofRoot method takes the struct itself while the newLambdaParameterReference takes the index, like the newRootStructOuterReference methode

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I see. I am still confused as to whether this supports nested field access through lambda parameters. I messed around w some tests through claude, and I believe it is unsupported. However, this is a nit and I'm not sure it will come up often in practice, so we can come back to this later if it ends up being an issue!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am still confused as to whether this supports nested field access through lambda parameters.

If by nested access you mean accessing nested fields of a lambda parameter inside the body, for example
($0) -> $0.a.[0].hello then I think that is broken here actually. That might be fine for now and we can treat it is a follow-up item, but we should detect if we have a nested field lookup happening by checking to to content of reference.getDirectReference().getStructField().getChild(); If it's empty, there is only a direct reference to the lambda param. If it's not empty we can blow up.

To @Slimsammylim's we point, to handle this correctly we would need to use getDirectReferenceSegments(reference.getDirectReference()) to create the list of reference segments and then re-use some of the other constructors.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok I will add a check and go back to it in a follow up pr.

@vbarua
Copy link
Member

vbarua commented Mar 6, 2026

So I was trying to test this by reading in the test plans from https://github.com/substrait-io/substrait-go/tree/main/expr/testdata/lambda, but then I realized that we were not loading in the functions_list.yaml extensions, and then I realize that we cannot load the extensions because the ANTLR parser hasn't been updated.

I made a PR to do just that, because it's very out of date and required a little extra care #728. Once that's in, I can use the test vectors to help verify this work. We also need that PR anyways, because we can't load any of the functions that use lambdas without it 😓

@vbarua
Copy link
Member

vbarua commented Mar 10, 2026

I made a PR into this PR as part of understanding how it works, and fixing a small gap when loading Substrait plans with lambdas: limameml#1

Copy link
Member

@vbarua vbarua left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Started taking a more thorough look and found a couple of others things. I'll continue reviwing this tomorrow.

Comment on lines +772 to +773
// TO DO: fix Lambda return type once this issue
// https://github.com/substrait-io/substrait/issues/976 is resolved
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// TO DO: fix Lambda return type once this issue
// https://github.com/substrait-io/substrait/issues/976 is resolved
// TODO: Type.Func nullability is hardcoded to false here because the spec does not allow for
// declaring otherwise.
// See: https://github.com/substrait-io/substrait/issues/976

Being a bit more precise about what the issue is, and what the implications are in the code.

Comment on lines +248 to +249
return type.parameterTypes().stream().mapToInt(p -> p.accept(this)).sum()
+ type.returnType().accept(this);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This visitor is used to determine how many names are needed for a ReadRel based on the schema.

For example, a table with row type struct<i8, i16, i32> needs 3 names. A table like struct<i8, struct<i16, i32>, i64> needs 5 names, because in Substrait we name every struct field. For full details see https://substrait.io/types/named_structs/#determining-names

Now, given something like struct<i8, func<...>, i32> we don't need to look inside the func for names. The type of the column is just func, and none of the parameters to the func can be reference by name.

All of this is to say that this can just be 0.

Suggested change
return type.parameterTypes().stream().mapToInt(p -> p.accept(this)).sum()
+ type.returnType().accept(this);
return 0;

@Override
public String visit(Expression.Lambda expr, EmptyVisitationContext context)
throws RuntimeException {
return "<Lambda>";
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minor: extract for consistency with the other methods

Suggested change
return "<Lambda>";
return "<Lambda >";


Type.Struct lambdaParameters = lambdaParameterStack.get(lambdaIndex);
return FieldReference.newLambdaParameterReference(
reference.getDirectReference().getStructField().getField(),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am still confused as to whether this supports nested field access through lambda parameters.

If by nested access you mean accessing nested fields of a lambda parameter inside the body, for example
($0) -> $0.a.[0].hello then I think that is broken here actually. That might be fine for now and we can treat it is a follow-up item, but we should detect if we have a nested field lookup happening by checking to to content of reference.getDirectReference().getStructField().getChild(); If it's empty, there is only a direct reference to the lambda param. If it's not empty we can blow up.

To @Slimsammylim's we point, to handle this correctly we would need to use getDirectReferenceSegments(reference.getDirectReference()) to create the list of reference segments and then re-use some of the other constructors.

Copy link
Member

@vbarua vbarua left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did leave a couple of very minor suggestions, and I think that there is one check worth enforcing in #710 (comment).

This is a solid pass at integrating lambdas end-to-end, thanks for working on in @limameml. If you can take a look at the above I think we can get this merged this week.

cc: @bestbeforetoday is there anything else you'd like to see in this PR

*/
class LambdaExpressionRoundtripTest extends TestBase {

/** Test that lambdas with no parameters are valid. Building: () -> i32(42) : func<() -> i32> */
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: The parenthesis make it look like a cast to me 🤷

Suggested change
/** Test that lambdas with no parameters are valid. Building: () -> i32(42) : func<() -> i32> */
/** Test that lambdas with no parameters are valid. Building: () -> 42 : func<() -> i32> */

FieldReference invalidRef = FieldReference.newLambdaParameterReference(0, params, 1);

Expression.Lambda lambda =
Expression.Lambda.builder().parameters(params).body(invalidRef).build();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't this throw an exception? Perhaps I am misunderstanding what the builder does, but don't we expect the builder to validate that we built a correct lambda? Why are we able to build it, and convert it to a proto, but not convert it back?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO this issue will need to be resolved before merging. Sorry to be a blocker! It doesn't make sense to me that we would be able to make an invalid proto and only validate on the way back, unless I am misunderstanding something.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

https://github.com/substrait-io/substrait-go/blob/48bbdb43eeccc61c7adfecc3549a1b004fd52fd8/expr/lambda_test.go#L148-L165

^ You can see an example here of how in substrait-go the builder throws an error

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point Ben. It would make sense for these to fail, especially given that we're already claiming them as validation errors that should fail.

Expression.Lambda.builder().parameters(outerParams).body(innerLambda).build();

// Convert to proto
io.substrait.proto.Expression protoExpression = expressionProtoConverter.toProto(outerLambda);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same issue here as above I think.

Copy link
Member

@benbellick benbellick left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry to block this, I know that you have already gotten an approval. I just want to make sure that this is addressed before merging. Thanks!

FieldReference invalidRef = FieldReference.newLambdaParameterReference(0, params, 1);

Expression.Lambda lambda =
Expression.Lambda.builder().parameters(params).body(invalidRef).build();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO this issue will need to be resolved before merging. Sorry to be a blocker! It doesn't make sense to me that we would be able to make an invalid proto and only validate on the way back, unless I am misunderstanding something.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Support lambdas

6 participants