feat(pandas-search): plumb search term to JS as highlight_phrase via SDResult #764
Pins the pandas equivalent of #758 polars Search → SDResult, but using `highlight_phrase` (list) rather than `highlight_regex` (string). Pandas `search_df_str` uses literal `Series.str.find`, so a phrase match on the JS side matches the actual filter semantics.

- tests/unit/dataflow/autocleaning_pd_test.py: unit-level. Search contributes `highlight_phrase` keyed by the renamed (a/b) column under PandasAutocleaning, with the rekey running over `cleaning_sd` so the orig-named entry merges into the internal letter key.
- tests/unit/dataflow/customizable_dataflow_test.py: end-to-end through BuckarooInfiniteWidget with NoCleaningConf. A `search` op should land `highlight_phrase` in `displayer_args` for each string column and skip the numeric column.

Both fail today: pandas `Search.transform` still returns a bare df.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
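To illustrate why a literal phrase is the safer highlight payload here, the following is a minimal stdlib-only sketch comparing literal matching with regex matching for the same search term. The helper names `literal_spans` and `regex_spans` are hypothetical and not part of buckaroo; the JS-side highlighter is not shown in this PR, so this only models the semantic difference.

```python
import re


def literal_spans(text: str, needle: str) -> list[tuple[int, int]]:
    """Return (start, end) spans of non-overlapping literal occurrences of needle."""
    spans = []
    start = 0
    while needle:  # an empty needle yields no spans
        idx = text.find(needle, start)
        if idx == -1:
            break
        spans.append((idx, idx + len(needle)))
        start = idx + len(needle)
    return spans


def regex_spans(text: str, pattern: str) -> list[tuple[int, int]]:
    """Return (start, end) spans when the same term is interpreted as a regex."""
    return [m.span() for m in re.finditer(pattern, text)]


cell = "a.b costs $5 (axb is free)"
term = "a.b"

# Literal matching only marks the text the filter actually matched on.
literal = literal_spans(cell, term)

# Regex matching treats "." as a wildcard, so "axb" is marked as well.
as_regex = regex_spans(cell, term)

# Some terms are not even valid regexes, although they are fine as literals.
try:
    re.compile("(")
    paren_is_valid_regex = True
except re.error:
    paren_is_valid_regex = False
```

With a literal filter, the regex interpretation would highlight `axb` in a row that was kept only because it contains `a.b`, and a term such as `(` could not be compiled at all.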
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: a35323cd6f
    _cleaned, cleaning_sd, _gen, _ops = ac.handle_ops_and_clean(
        df, cleaning_method='', quick_command_args={}, existing_operations=[search_op])

    assert cleaning_sd.get('a', {}).get('highlight_phrase') == ['pizza']
Include the pandas Search SDResult implementation
This assertion cannot pass with the code in this commit because only tests changed: `Search.transform` in `buckaroo/customizations/pandas_commands.py` still returns the bare `search_df_str(df, val)` result, so no SDResult metadata is merged and `cleaning_sd` remains empty. I ran the new tests with `.venv/bin/python -m pytest tests/unit/dataflow/autocleaning_pd_test.py::test_search_threads_highlight_phrase_into_cleaning_sd_under_rename tests/unit/dataflow/customizable_dataflow_test.py::test_search_op_delivers_highlight_phrase_into_displayer_args -q`; both fail, blocking CI until the pandas Search implementation is added.
📦 TestPyPI package published

`pip install --index-strategy unsafe-best-match --index-url https://test.pypi.org/simple/ --extra-index-url https://pypi.org/simple/ buckaroo==0.14.2.dev25999757528`

or with uv:

`uv pip install --index-strategy unsafe-best-match --index-url https://test.pypi.org/simple/ --extra-index-url https://pypi.org/simple/ buckaroo==0.14.2.dev25999757528`

MCP server for Claude Code

`claude mcp add buckaroo-table -- uvx --from "buckaroo[mcp]==0.14.2.dev25999757528" --index-strategy unsafe-best-match --index-url https://test.pypi.org/simple/ --extra-index-url https://pypi.org/simple/ buckaroo-table`

📖 Docs preview | 🎨 Storybook preview
…SDResult

Mirrors #758 for the pandas backend. Pandas `Search.transform` now returns `SDResult(filtered_df, sd_updates)` so the search term flows into `cleaning_sd` as `highlight_phrase` on every string/object column. Together with the existing `style_column` reader (added in #758) the phrase lands in the string `displayer_args`, where the JS-side displayer already renders matches as `<mark>`.

Uses `highlight_phrase` (list of literal needles) rather than the `highlight_regex` (single regex string) variant polars emits because `search_df_str` uses `Series.str.find`, a literal substring match. Matching the filter semantics on the highlight side avoids the case where a search term containing regex metacharacters would filter on literal text but try to highlight as a regex.

The string-column detection mirrors `search_df_str`: union of `select_dtypes("string")` and `select_dtypes("object")` columns.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
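The shape of the change described in this commit message can be sketched roughly as follows. This is not the code from `pandas_commands.py`: `SketchSDResult` is a hypothetical stand-in for buckaroo's `SDResult`, `search_with_highlight` is an invented name, and the filter is a simplified approximation of `search_df_str` that assumes every object column actually holds strings.

```python
from typing import Any, NamedTuple

import pandas as pd


class SketchSDResult(NamedTuple):
    """Hypothetical stand-in for buckaroo's SDResult: a df plus per-column metadata."""
    df: pd.DataFrame
    sd_updates: dict[str, dict[str, Any]]


def string_like_columns(df: pd.DataFrame) -> list[str]:
    """Union of "string" and "object" dtype columns, in original column order."""
    string_cols = set(df.select_dtypes("string").columns)
    object_cols = set(df.select_dtypes("object").columns)
    return [c for c in df.columns if c in string_cols | object_cols]


def search_with_highlight(df: pd.DataFrame, term: str) -> SketchSDResult:
    """Filter rows by literal substring match and report the term per string column."""
    cols = string_like_columns(df)
    mask = pd.Series(False, index=df.index)
    for col in cols:
        # str.find returns -1 (or a missing value) when there is no literal match.
        found = df[col].str.find(term).fillna(-1) >= 0
        mask = mask | found.astype(bool)
    # Every string-like column gets the literal needle; numeric columns get nothing.
    sd_updates = {col: {"highlight_phrase": [term]} for col in cols}
    return SketchSDResult(df[mask], sd_updates)


df = pd.DataFrame({
    "name": ["pizza hut", "taco bell"],
    "city": ["pizza town", "austin"],
    "rating": [4, 5],
})
result = search_with_highlight(df, "pizza")
```

In this sketch `result.df` keeps only the first row, and `result.sd_updates` carries `highlight_phrase` for `name` and `city` but has no entry for `rating`.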
ruff F401 on CI. SDResult was imported speculatively for the failing test but never used (only Search is referenced directly). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Summary
Mirrors #758 for the pandas backend. Pandas `Search.transform` now returns `SDResult(filtered_df, sd_updates)` so the search term flows into `cleaning_sd` as `highlight_phrase` on every string/object column. The existing `DefaultMainStyling.style_column` reader (landed in #758) threads it into the string `displayer_args`, where the JS side already renders matches as `<mark>`.

Uses `highlight_phrase` (list of literal needles) rather than the `highlight_regex` variant polars emits because `search_df_str` uses `Series.str.find`, a literal substring match. Matching the filter semantics on the highlight side avoids the case where a search term containing regex metacharacters would filter on literal text but try to highlight as a regex.

String-column detection mirrors `search_df_str` exactly: union of `select_dtypes("string")` and `select_dtypes("object")` columns.

Test plan

- `pytest tests/unit/dataflow/autocleaning_pd_test.py tests/unit/dataflow/customizable_dataflow_test.py tests/unit/commands/pandas_commands_test.py`: passes locally

Tests added

- `test_search_threads_highlight_phrase_into_cleaning_sd_under_rename` (`autocleaning_pd_test.py`): unit-level wiring of pandas Search → SDResult → `_rekey_op_sd_to_internal`. Asserts the orig-named entry (`businessname`) is rekeyed onto its internal letter (`a`) and the integer column (`b`/`rating`) gets no highlight.
- `test_search_op_delivers_highlight_phrase_into_displayer_args` (`customizable_dataflow_test.py`): end-to-end through `BuckarooInfiniteWidget` with `NoCleaningConf`. Sets an operation on the dataflow and asserts `highlight_phrase == ['area']` lands in `displayer_args` for both string columns, skipping the numeric column.

TDD: failing-tests commit was pushed first; CI run on that commit will be visible failing before the implementation commit lands.
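For readers unfamiliar with the rekey step that the first test exercises, here is a minimal sketch of the idea. The function `rekey_to_internal` and the mapping argument are hypothetical; the real `_rekey_op_sd_to_internal` is not shown in this PR and may differ in signature and merge behavior.

```python
from typing import Any

ColumnSD = dict[str, dict[str, Any]]


def rekey_to_internal(op_sd: ColumnSD, orig_to_internal: dict[str, str]) -> ColumnSD:
    """Move entries keyed by original column names onto internal letter keys.

    Entries that land on the same internal key are merged; keys without a
    mapping are kept unchanged. This is an assumed behavior for illustration.
    """
    rekeyed: ColumnSD = {}
    for col, entry in op_sd.items():
        internal = orig_to_internal.get(col, col)
        rekeyed.setdefault(internal, {}).update(entry)
    return rekeyed


# Search reports metadata under the original column name...
op_sd = {"businessname": {"highlight_phrase": ["pizza"]}}
# ...while the dataflow works with renamed letter columns.
orig_to_internal = {"businessname": "a", "rating": "b"}

cleaning_sd = rekey_to_internal(op_sd, orig_to_internal)
```

Under these assumptions `cleaning_sd.get('a', {}).get('highlight_phrase')` is `['pizza']`, and there is no `b` entry because the integer column never received a highlight.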
🤖 Generated with Claude Code