add mypy-based validation, drop syntactic check on return type #536
eb8680
left a comment
This is nice, but I'm worried about soundness because bugs here will silently cause code generation to fail for no good reason.
It would be ideal if we could define one singledispatch-extensible function quote_type replacing/subsuming _type_to_annotation_str and _collect and an equation it satisfies together with eval/exec (and with the typechecker API), and have it live in effectful.internals.unification with tons of parameterized tests covering corner cases, similar to the other generic type-munging functions there.
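A minimal sketch of the kind of function described above, assuming only two cases (plain classes and parameterized builtin generics). The name quote_type comes from the comment; the registered handlers and the roundtrips helper are hypothetical and are not the code in this PR. The equation it aims to satisfy is that evaluating the quoted text in a namespace with the right modules gives back the original type.

```python
import collections.abc
import functools
import types
import typing


@functools.singledispatch
def quote_type(tp: object) -> str:
    """Render a type as source text that evaluates back to the same type."""
    raise TypeError(f"don't know how to quote {tp!r}")


@quote_type.register(types.GenericAlias)
def _quote_generic_alias(tp: types.GenericAlias) -> str:
    # Quote the origin (e.g. dict) and each argument recursively.
    origin = quote_type(typing.get_origin(tp))
    args = ", ".join(quote_type(arg) for arg in typing.get_args(tp))
    return f"{origin}[{args}]"


@quote_type.register(type)
def _quote_class(tp: type) -> str:
    # On some Python versions a parameterized alias such as list[int]
    # reports its __class__ as `type` and is dispatched here, so redirect it.
    if typing.get_origin(tp) is not None:
        return _quote_generic_alias(tp)
    if tp.__module__ == "builtins":
        return tp.__qualname__
    return f"{tp.__module__}.{tp.__qualname__}"


def roundtrips(tp: object, namespace: dict[str, object]) -> bool:
    """Check the equation eval(quote_type(tp)) == tp in a given namespace."""
    return eval(quote_type(tp), dict(namespace)) == tp
```

Unions, typing.Annotated, Callable, and Ellipsis arguments are deliberately left out here; each would need its own registered case and its own parameterized tests of the round-trip property.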
|
Does this also address #437? |
|
Taking a pass through the comments now and working on this! |
|
Addressed the simple issues. As suggested, I think it'd make sense to return an ast.FunctionDef (or an AST) instead of constructing strings, and to test a bit more systematically, so I'll do that now. |
|
The notebook tests are still failing with what appears to be a bug in the type generation here. This suggests that our testing strategy for this PR is still flawed - why is this failure mode not showing up in any of the dozens of new unit tests? What can we do to be more confident that this machinery is sound and will not generate false negatives for arbitrary LLM generated code? |
That's a fair point. The current implementation works by constructing a typing context prelude for the file from a lexical context by: The reason for the bug above is that The fix I think is to restructure how the imports are generated - not just from the lexical context, but also from the union of all types of the values in the context (and recursively for parameters). As for why the current unit tests aren't catching them, I need to think of a good way to exhibit this behaviour in tests. |
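The proposed fix (deriving imports from the types of the values in the context, not only from the names in it) could be sketched roughly as follows. This is an illustration under simplifying assumptions, not the PR's implementation: import_lines_for is a hypothetical helper, and it does not recurse into generic parameters or function signatures.

```python
import collections
import fractions
from collections.abc import Mapping
from typing import Any


def import_lines_for(ctx: Mapping[str, Any]) -> list[str]:
    """Collect import statements for the modules that define the types in ctx."""
    modules: set[str] = set()
    for value in ctx.values():
        # A class in the context contributes itself; any other value
        # contributes the class it is an instance of.
        tp = value if isinstance(value, type) else type(value)
        if tp.__module__ != "builtins":
            modules.add(tp.__module__)
    return [f"import {name}" for name in sorted(modules)]


# Hypothetical context mixing an instance, a builtin value, and a class.
example_ctx = {
    "half": fractions.Fraction(1, 2),
    "count": 3,
    "OrderedDict": collections.OrderedDict,
}
```

A unit test that exercises this failure mode would put a value in the context whose type lives in a module that is never named in the context itself, such as the Fraction instance above.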
This sounds pretty complicated. Maybe instead of trying to generate imports from usage we can just look at |
|
I think including |
|
ah but I see your point @eb8680 including sys.modules as well should ensure that all types of values would be present. Though sys.modules is really big. |
|
Updated to use |
|
Can we prune the |
|
Hmm, I can't think of any way of pruning better than just only keeping the modules used in types of runtime values, which I guess would be equivalent to the union plan I mentioned initially. |
|
What about pruning automatically with |
|
@eb8680 oh yes! good point, we could totally do that! |
…ngs with un-representable types
df8bd01 to
2e86880
|
The notebook is still broken with |
|
Tests pass!!!!!!!! |
|
@eb8680 tests pass! (hopefully test/build too) re-reading your comments, I was also thinking about making
|
eb8680
left a comment
I don't see any tests covering typing.Annotated, which I suspect is not handled correctly
|
ah, will add! |
eb8680
left a comment
I'm sure we'll find more edge cases as we use this, but I think it's good to merge for now.
|
Awesome, sounds good! Will merge once the last tests pass! |
This PR updates the internals of EncodableCallable such that mypy-based type checking is done on the source code generated by the LLM. The code uses the ctx: Mapping[str, Any] to inject appropriate imports and stubs such that the code should type check.
This replaces and removes the syntactic check on the AST of the return type. Previously we only checked the annotations on the returned function syntactically, which gave no guarantee of correctness.
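As a rough illustration of the general idea (not the function this PR adds), the sketch below prepends declarations derived from a context to a piece of generated source and runs mypy on the result. build_checked_source and typecheck are hypothetical names. The prelude logic is an assumption that only handles plain instances of importable classes; functions, None, and generics would need real stubs. mypy is a third-party package, so it is imported lazily.

```python
from collections.abc import Mapping
from typing import Any


def build_checked_source(ctx: Mapping[str, Any], generated: str) -> str:
    """Prefix generated code with declarations for the names in ctx."""
    prelude: list[str] = []
    for name, value in ctx.items():
        tp = type(value)
        if tp.__module__ == "builtins":
            prelude.append(f"{name}: {tp.__qualname__}")
        else:
            prelude.append(f"import {tp.__module__}")
            prelude.append(f"{name}: {tp.__module__}.{tp.__qualname__}")
    return "\n".join(prelude) + "\n\n" + generated


def typecheck(ctx: Mapping[str, Any], generated: str) -> tuple[bool, str]:
    """Run mypy on the combined source; return (ok, report)."""
    from mypy import api  # requires mypy to be installed

    source = build_checked_source(ctx, generated)
    # `-c` asks mypy to check the program text passed as a string.
    stdout, _stderr, exit_status = api.run(["-c", source])
    return exit_status == 0, stdout
```

For example, typecheck({"n": 3}, "def f() -> str:\n    return n\n") would be expected to report a return-type mismatch, which a purely syntactic check of the annotation cannot detect.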
The core interface to the type checking is through this function in effectful.handlers.llm.type_checking:
Closes #535