Skip to content

Conversation

@brian-arnold
Copy link
Collaborator

@brian-arnold brian-arnold commented Nov 10, 2025

Happy if this PR is just a conversation starter, and if the desired behavior should be implemented in the source code another way.

Prior to these changes in this PR, passing resource options to Ray did not work, as in the following example:

pipeline.sorting_FP.run(execution_engine = ray_engine, 
                        execution_engine_opts = {"num_cpus" : 15, "num_gpus" : 1})

I would receive an error message saying that whatever I'm passing to Ray is not a valid argument, and it would list valid arguments like num_cpus and num_gpus. I think the entire dict was likely getting passed instead of as kwargs, so I used ** to pass the contents of the dict as arguments. Works like a charm now.

@brian-arnold
Copy link
Collaborator Author

brian-arnold commented Nov 11, 2025

I made a second commit to fix a separate issue and forgot that these changes would be reflected here! But it is another bugfix. AI summary here:

Summary
Issue: PodNodeStream.run_async() fails to reload cached results on subsequent runs, throwing ValueError: Specified tag columns {...} are not present in the table.

Root Cause: In PodNodeStream.run_async(), when processing existing cached entries, the code incorrectly drops ALL columns from target_entries:

python# In run_async() - BUGGY:
existing = (
all_results.filter(pc.is_valid(pc.field("_exists")))
.drop_columns(target_entries.column_names) # ❌ Drops tag columns too!
.drop_columns(["_exists"])
)

This removes the tag columns (e.g., subject, session_date, probe_id), which are then expected when creating TableStream(existing, tag_columns=tag_keys).

The Fix: PodNodeStream.run() has the correct implementation that only drops specific system columns:

Copy link
Contributor

@eywalker eywalker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great catch! Thanks for the fix.

@eywalker eywalker merged commit 54e6287 into walkerlab:main Nov 11, 2025
0 of 3 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants