
feat(xorq): autocleaning + interpreter version with 5 ported commands#767

Open
paddymul wants to merge 1 commit into main from feat/xorq-autoclean-interpreter


@paddymul
Collaborator

Summary

  • XorqAutocleaning now flows through the same configure_buckaroo lisp interpreter that pandas/polars use, instead of a dict-dispatched _xorq_search shortcut. The override on handle_ops_and_clean is gone; the parent's full pipeline (quick-ops → merge → interpret → make_origs → code gen) handles ibis exprs unchanged.
  • Five commands ported into buckaroo/customizations/xorq_commands.py (NoOp, DropCol, FillNA, DropDuplicates, Search), wired up via a new NoCleaningConfXorq in xorq_autoclean_conf.py (matching the pandas/polars conf modules).
  • One upstream touch: buckaroo_transform in jlisp/configure_utils.py grew a third copy branch — pandas → .copy(), polars → .clone(), ibis exprs → pass-through (immutable, transforms return new exprs).
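
The three-way copy branch described in the last bullet can be sketched roughly as follows. This is an illustration only, not the code in jlisp/configure_utils.py: the `Fake*` classes are hypothetical stand-ins for a pandas DataFrame, a polars DataFrame, and an ibis expression, and `defensive_copy` is an assumed name for the dispatch step.

```python
class FakePandasFrame:
    """Hypothetical stand-in for a mutable pandas DataFrame (exposes .copy())."""

    def __init__(self, rows):
        self.rows = list(rows)

    def copy(self):
        return FakePandasFrame(self.rows)


class FakePolarsFrame:
    """Hypothetical stand-in for a polars DataFrame (exposes .clone())."""

    def __init__(self, rows):
        self.rows = list(rows)

    def clone(self):
        return FakePolarsFrame(self.rows)


class FakeIbisExpr:
    """Hypothetical stand-in for an immutable ibis expression."""

    def __init__(self, ops=()):
        self.ops = tuple(ops)

    def drop(self, col):
        # Every transform returns a new expression; self is never mutated.
        return FakeIbisExpr(self.ops + (("drop", col),))


def defensive_copy(df):
    """Assumed shape of the copy branch: copy mutable frames, pass exprs through."""
    if isinstance(df, FakePandasFrame):
        return df.copy()
    if isinstance(df, FakePolarsFrame):
        return df.clone()
    # Immutable expression: nothing to protect, so no copy is made.
    return df


pandas_like = FakePandasFrame([1, 2])
polars_like = FakePolarsFrame([1, 2])
expr = FakeIbisExpr()

pandas_copy = defensive_copy(pandas_like)
polars_copy = defensive_copy(polars_like)
expr_passthrough = defensive_copy(expr)
dropped = expr_passthrough.drop("a")
```

The pass-through is safe in this sketch because `drop` builds a new object, so the caller's `expr` is unchanged even though no copy was taken.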

Test plan

  • uv run --extra xorq pytest tests/unit/test_xorq_commands.py — 9 new tests covering each command via the interpreter, a two-op pipeline, and conf registration
  • uv run --extra xorq pytest tests/unit/test_xorq_buckaroo_widget.py — 37 existing widget tests still green (Search regression via the new path)
  • uv run --extra xorq pytest tests/unit/ --ignore=tests/unit/contrib --ignore=tests/unit/file_cache — full unit suite (935 pass), confirming pandas/polars interpreter paths unaffected
  • uv run ruff check on touched files — clean

🤖 Generated with Claude Code

XorqAutocleaning previously sidestepped the lisp interpreter and ran a
single dict-dispatched _xorq_search handler. Route it through the same
configure_buckaroo interpreter that pandas/polars use, and port four
more commands (NoOp, DropCol, FillNA, DropDuplicates) into
customizations/xorq_commands.py.

The interpreter's df_copy fork in jlisp/configure_utils.py grew a third
branch: pandas → .copy(), polars → .clone(), ibis exprs → passthrough.
Ibis expressions are immutable, so transforms must return a new expr
anyway and a defensive copy is both unavailable and unnecessary.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@github-actions
Contributor

📦 TestPyPI package published

pip install --index-strategy unsafe-best-match --index-url https://test.pypi.org/simple/ --extra-index-url https://pypi.org/simple/ buckaroo==0.14.2.dev26000095273

or with uv:

uv pip install --index-strategy unsafe-best-match --index-url https://test.pypi.org/simple/ --extra-index-url https://pypi.org/simple/ buckaroo==0.14.2.dev26000095273

MCP server for Claude Code

claude mcp add buckaroo-table -- uvx --from "buckaroo[mcp]==0.14.2.dev26000095273" --index-strategy unsafe-best-match --index-url https://test.pypi.org/simple/ --extra-index-url https://pypi.org/simple/ buckaroo-table

📖 Docs preview

🎨 Storybook preview

@chatgpt-codex-connector (Bot) left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 92b652db6e


Comment on lines +51 to +52
    def transform_to_py(expr, col):
        return f" expr = expr.drop('{col}')"

P2: Generate runnable code for xorq transforms

When a user applies this command, the inherited code generator wraps these snippets as def clean(df): ... return df in configure_utils.buckaroo_to_py, but the new xorq snippets assign to expr without ever defining it. For example, a drop-column op generates a function that raises UnboundLocalError at expr = expr.drop(...) instead of providing usable generated code in operation_results['generated_py_code']; the same pattern appears in the other non-noop xorq commands.
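
A minimal reproduction of the failure described above, plus one possible fix. The wrapper shape (`def clean(df): ... return df`) is taken from the review text, but `wrap_as_clean`, `compile_clean`, and `FakeExpr` are hypothetical helpers written for this sketch, and the four-space indent inside the snippet strings is an assumption. The fix shown (emitting `df = df.drop(...)` so the snippet uses the wrapper's parameter name) is only one option and is not necessarily what the PR should adopt.

```python
class FakeExpr:
    """Hypothetical stand-in for an ibis table expression with a .drop() method."""

    def __init__(self, columns):
        self.columns = list(columns)

    def drop(self, col):
        return FakeExpr([c for c in self.columns if c != col])


def wrap_as_clean(snippets):
    """Assumed wrapper shape from the review: def clean(df): <snippets> return df."""
    body = "\n".join(snippets)
    return f"def clean(df):\n{body}\n    return df\n"


def compile_clean(source):
    """Compile generated source and return the clean() function it defines."""
    namespace = {}
    exec(source, namespace)
    return namespace["clean"]


def broken_transform_to_py(col):
    # Mirrors the reviewed snippet: assigns to `expr`, which clean() never defines.
    return f"    expr = expr.drop('{col}')"


def fixed_transform_to_py(col):
    # One possible fix: reuse the wrapper's parameter name.
    return f"    df = df.drop('{col}')"


table = FakeExpr(["a", "b"])

broken_clean = compile_clean(wrap_as_clean([broken_transform_to_py("a")]))
try:
    broken_clean(table)
    broken_error = None
except UnboundLocalError as exc:
    # `expr` is local to clean() because it is assigned there, yet read first.
    broken_error = exc

fixed_clean = compile_clean(wrap_as_clean([fixed_transform_to_py("a")]))
fixed_result = fixed_clean(table)
```

Renaming the wrapper's parameter to `expr` for the xorq dialect would work equally well; the point is that the snippet and the wrapper have to agree on one name.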


paddymul added a commit that referenced this pull request May 17, 2026
Polars commands previously ran eager: each lisp op materialised a new
DataFrame, so an N-op cleaning pipeline paid N times for what polars's
query optimiser can fuse into one plan. Switch the polars autoclean conf
to thread a LazyFrame through the interpreter and collect once at exit.

- `AutocleaningConfig` grows two staticmethod hooks, `lazy_enter` and
  `lazy_exit`, defaulting to identity. Pandas inherits unchanged; xorq
  (when #767 lands) inherits the no-op default — ibis exprs are already
  lazy, so the unified pattern fits both dialects.
- `NoCleaningConfPl` overrides with `df.lazy()` on entry and
  `df.collect() if isinstance(df, pl.LazyFrame) else df` on exit. The
  isinstance guard handles `GroupBy.transform`, which materialises
  mid-pipeline; anything downstream of a groupby runs eager and the
  exit becomes a no-op.
- `_run_df_interpreter` wraps the interpreter call with the hooks. The
  no-op short-circuit fires *before* lazy_enter, preserving the
  by-reference identity contract the traitlets/anywidget init path
  depends on.
- `Search.transform` switches to `df.collect_schema()` to avoid polars's
  PerformanceWarning when handed a LazyFrame.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
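
The hook pattern from the commit message above can be sketched without polars installed. Everything here is a stand-in: `FakeEagerFrame` and `FakeLazyFrame` only mimic the `lazy()`/`collect()` relationship, `BaseConf` and `LazyConf` are hypothetical analogues of `AutocleaningConfig` and `NoCleaningConfPl`, and `run_interpreter` is an assumed simplification of `_run_df_interpreter`.

```python
class FakeEagerFrame:
    """Hypothetical stand-in for an eager DataFrame."""

    def __init__(self, rows):
        self.rows = list(rows)

    def lazy(self):
        return FakeLazyFrame(self.rows, [])


class FakeLazyFrame:
    """Hypothetical stand-in for a lazy frame: records steps, runs them on collect()."""

    def __init__(self, rows, plan):
        self._rows = rows
        self._plan = plan

    def with_step(self, fn):
        return FakeLazyFrame(self._rows, self._plan + [fn])

    def collect(self):
        rows = list(self._rows)
        for fn in self._plan:
            rows = fn(rows)
        return FakeEagerFrame(rows)


class BaseConf:
    """Identity hooks: the default for dialects that need no lazy wrapping."""

    @staticmethod
    def lazy_enter(df):
        return df

    @staticmethod
    def lazy_exit(df):
        return df


class LazyConf(BaseConf):
    """Go lazy on entry; collect on exit only if the frame is still lazy."""

    @staticmethod
    def lazy_enter(df):
        return df.lazy()

    @staticmethod
    def lazy_exit(df):
        return df.collect() if isinstance(df, FakeLazyFrame) else df


def run_interpreter(conf, df, ops):
    if not ops:
        # Short-circuit before lazy_enter so the caller gets the same object back.
        return df
    current = conf.lazy_enter(df)
    for op in ops:
        current = op(current)
    return conf.lazy_exit(current)


def drop_nulls(lf):
    return lf.with_step(lambda rows: [r for r in rows if r is not None])


def double(lf):
    return lf.with_step(lambda rows: [r * 2 for r in rows])


def materialising_op(lf):
    # Mimics an op that returns an eager result mid-pipeline.
    return lf.collect()


source = FakeEagerFrame([1, None, 3])
fused = run_interpreter(LazyConf, source, [drop_nulls, double])
untouched = run_interpreter(LazyConf, source, [])
eager_mid = run_interpreter(LazyConf, source, [drop_nulls, materialising_op])
identity = BaseConf.lazy_exit(BaseConf.lazy_enter(source))
```

In the third call the pipeline is already eager when it reaches `lazy_exit`, so the isinstance guard turns the exit into a no-op instead of calling `collect()` on an eager frame.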