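Motivation: Why do you think this is important?
We pass around most of our configuration in Flyte via Python @dataclass, as this makes it very easy for us to manage configuration. Unfortunately, this also means that we have to write a ton of wrappers for the inputs and outputs of our Flyte @task's and @workflow's: we often dereference attributes to pass along to other inputs, but need to wrap them first (in lists and other dataclasses). Doing that directly in a workflow doesn't end up working, as we can't use the attributes of a promise dataclass nested in containers as inputs. Instead, we're forced to write wrapper tasks.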
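The wrapper pattern described here might look roughly like the sketch below. It is an illustration only, not the example from the original report: the names `DataConfig`, `TrainConfig`, `get_data_config`, and `build_train_config` are hypothetical, and plain functions stand in for @task-decorated functions so that it runs without flytekit.

```python
from dataclasses import dataclass


# Hypothetical configuration dataclasses, for illustration only.
@dataclass
class DataConfig:
    path: str
    shard_count: int


@dataclass
class TrainConfig:
    input_paths: list[str]
    shard_count: int


def get_data_config() -> DataConfig:
    """Stand-in for a task that produces a configuration dataclass."""
    return DataConfig(path="s3://bucket/data", shard_count=8)


def build_train_config(data_config: DataConfig) -> TrainConfig:
    """Wrapper that exists only to dereference attributes and re-wrap them.

    In a workflow, data_config would be a promise, so (per the report above)
    this re-wrapping ends up in its own task instead of the workflow body.
    """
    return TrainConfig(
        input_paths=[data_config.path],
        shard_count=data_config.shard_count,
    )


train_config = build_train_config(get_data_config())
print(train_config)
```

Every new field or nesting level in `DataConfig` that a downstream task needs tends to mean another function like `build_train_config`.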
While the wrappers don't add much in a toy example, when the dataclasses have many more fields and deeper nesting, this gets messy fast. Any change to a dataclass requires us to update every wrapper for it, the visualized graph becomes bloated with wrapper nodes, and ease of use goes down.
It's also an issue when trying to wrap a dataclass promise's attribute in a container, such as:
```python
from dataclasses import dataclass

from flytekit import task, workflow
from mashumaro.mixins.json import DataClassJSONMixin


@task
def get_int() -> int:
    return 3


@dataclass
class IntWrapper(DataClassJSONMixin):
    x: int


@task
def get_wrapped_int() -> IntWrapper:
    return IntWrapper(x=3)


@task
def sum_list(input_list: list[int]) -> int:
    return sum(input_list)


@workflow
def convert_list_workflow1() -> int:
    # This workflow is fine
    promised_int = get_int()
    joined_list = [4, promised_int]
    return sum_list(input_list=joined_list)


@workflow
def convert_list_workflow2() -> int:
    # But this one is not
    wrapped_int = get_wrapped_int()
    joined_list = [4, wrapped_int.x]
    return sum_list(input_list=joined_list)
```
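One way to picture the difference between the two workflows is a toy model of promises. The sketch below is not flytekit's implementation; `ToyPromise` and `resolve` are hypothetical names. It only illustrates that `wrapped_int.x` is not an int when the workflow is defined, but a reference plus an attribute path, and that this reference has to be found and resolved even when it sits inside a list.

```python
from dataclasses import dataclass
from typing import Any


class ToyPromise:
    """Hypothetical placeholder for a task output that is not known yet."""

    def __init__(self, node: str, attr_path: tuple = ()):
        self.node = node
        self.attr_path = attr_path

    def __getattr__(self, name: str) -> "ToyPromise":
        # Only called for attributes that do not exist on the instance.
        if name.startswith("_"):
            raise AttributeError(name)
        # Record the dereference instead of performing it.
        return ToyPromise(self.node, self.attr_path + (name,))


def resolve(value: Any, outputs: dict) -> Any:
    """Recursively replace promises, including ones nested in containers."""
    if isinstance(value, ToyPromise):
        result = outputs[value.node]
        for name in value.attr_path:
            result = getattr(result, name)
        return result
    if isinstance(value, list):
        return [resolve(v, outputs) for v in value]
    if isinstance(value, tuple):
        return tuple(resolve(v, outputs) for v in value)
    if isinstance(value, dict):
        return {k: resolve(v, outputs) for k, v in value.items()}
    return value


@dataclass
class IntWrapper:
    x: int


# "Workflow definition": nothing has run yet, so .x only extends the path.
wrapped_int = ToyPromise("get_wrapped_int")
joined_list = [4, wrapped_int.x]

# "Execution": the upstream output is now available.
outputs = {"get_wrapped_int": IntWrapper(x=3)}
resolved = resolve(joined_list, outputs)
print(resolved, sum(resolved))
```

In this model, a resolver that only checks top-level inputs for promises would pass the unresolved `ToyPromise` inside `joined_list` straight through to the consumer.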
Goal: What should the final outcome look like, ideally?
Overall I imagine the approach is to increase the level of support for dataclass attributes across flytekit, doing more to:
1. Allow wrapping a dataclass promise attribute in a standard collection, such as some_task([task_output.x])
   a. Importantly, retain the typing as defined by the dataclass, even when only receiving an attribute (an issue for complex types and for ints)
2. Create tooling that allows Flyte to use promises in the construction of a dataclass, treating dataclasses like other collection types which can be passed around as input to workflows and functions.
I think this would also resolve issues like #5427.
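For point 1a, one conceivable building block is to look up the declared type of an attribute path from the dataclass definition itself, so that the type of something like task_output.x does not have to be inferred from a serialized value. The helper below, `attribute_type`, is a hypothetical name and uses only the standard library; it is a sketch of the idea, not how flytekit resolves types.

```python
from dataclasses import dataclass, is_dataclass
from typing import get_type_hints


# Hypothetical nested dataclasses, for illustration only.
@dataclass
class Inner:
    count: int
    tags: list[str]


@dataclass
class Outer:
    inner: Inner
    name: str


def attribute_type(root_type: type, attr_path: list[str]) -> object:
    """Follow attr_path through dataclass annotations and return the declared type."""
    current = root_type
    for name in attr_path:
        if not is_dataclass(current):
            raise TypeError(f"{current!r} is not a dataclass; cannot look up {name!r}")
        hints = get_type_hints(current)
        if name not in hints:
            raise AttributeError(f"{current.__name__} has no field {name!r}")
        current = hints[name]
    return current


print(attribute_type(Outer, ["inner", "count"]))
print(attribute_type(Outer, ["inner", "tags"]))
```

Because the lookup reads the annotations rather than a value, a field declared as `int` is reported as `int` regardless of how the value was encoded in transit.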
Describe alternatives you've considered
We could try @eager execution mode for this, but that seems to change a lot of the semantics of Flyte and does not have widespread support just yet. Plus, it would require significant refactors throughout our codebase.
Propose: Link/Inline OR Additional context
I've taken first stabs at pieces of this:
"[wip] Updating flytekit to handle dereferencing lists of promises (local)" (datologyai/flytekit#5) makes it possible locally to wrap dataclass promise attributes in simple containers like lists, dicts, and tuples, but I don't know where to start on getting similar functionality working for remote execution. Given that it only touches promise.translate_inputs_to_literals and base_task.local_execute, I think it probably has a significant way to go, but I don't know what I don't know. It's also possible that JSON IDL flytekit#2600 completely resolves this part of my issue, but I'd need to dig more there.
"First pass at promise logic in dataclasses" (datologyai/flytekit#2) attempts to make it possible to treat dataclasses like Python-native collections, updating binding_data_from_python_std to iterate over dataclasses that contain promises and resolve them using a BindingDataMap rather than a BindingData scalar. This almost works, but if a Promise's type is not primitive, then when to_literal runs during serialization we may not have access to the non-primitive type (a similar issue to the one the PR above attempts to resolve, and likely the cause of [BUG] Accessing attributes fails on complex types #5427).
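The second idea, treating a dataclass like a native collection, can be pictured as a recursive walk over its fields that produces a map of per-field bindings instead of a single scalar binding. The following is a toy model under that assumption; `ToyPromise`, `Config`, and `to_binding` are hypothetical names, and none of this is flytekit's actual binding_data_from_python_std logic.

```python
from dataclasses import dataclass, fields, is_dataclass
from typing import Any


@dataclass(frozen=True)
class ToyPromise:
    """Hypothetical reference to the output `var` of an upstream node."""
    node: str
    var: str


@dataclass
class Config:
    name: str
    # While the workflow is being built this may hold a ToyPromise.
    threshold: Any


def to_binding(value: Any) -> dict:
    """Describe how a value would be bound, descending into dataclasses."""
    # Check for promises first: ToyPromise is itself a dataclass.
    if isinstance(value, ToyPromise):
        return {"promise": f"{value.node}.{value.var}"}
    if is_dataclass(value) and not isinstance(value, type):
        # A map of field bindings rather than one opaque scalar.
        return {"map": {f.name: to_binding(getattr(value, f.name)) for f in fields(value)}}
    if isinstance(value, (list, tuple)):
        return {"collection": [to_binding(v) for v in value]}
    if isinstance(value, dict):
        return {"map": {k: to_binding(v) for k, v in value.items()}}
    return {"scalar": value}


config = Config(name="run", threshold=ToyPromise("n0", "o0"))
binding = to_binding(config)
print(binding)
```

The toy skips the hard part noted above: it carries no type information, so it says nothing about how a non-primitive promise type would be recovered at serialization time.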
Overall I think this is a huge quality-of-life win for using Flyte with dataclasses, and I'm happy to actually work out the implementation, but I feel like I'm missing some pieces and context on the approach.
Are you sure this issue hasn't been raised already?
Have you read the Code of Conduct?