feat: expose variety of features from DF54 update (#1554)
* refactor: migrate FFI example table function to call_with_args
DataFusion 53 deprecated `TableFunctionImpl::call(args: &[Expr])` in
favor of `call_with_args(args: TableFunctionArgs)`. `PyTableFunction`
was migrated in 5a64b0d; this brings the FFI example along so it no
longer relies on the deprecated entry point.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* feat: type SessionContext codec setters with exportable Protocols
PR #1541 introduced `with_logical_extension_codec` /
`with_physical_extension_codec` setters typed as `codec: Any`. The Rust
extractors accept either a raw `PyCapsule` or any object exposing
`__datafusion_logical_extension_codec__` /
`__datafusion_physical_extension_codec__`.
Add `LogicalExtensionCodecExportable` / `PhysicalExtensionCodecExportable`
Protocols in `python/datafusion/user_defined.py` (matching the existing
`ScalarUDFExportable` pattern) and tighten both setter signatures to
`Protocol | _PyCapsule`. Pure typing change; no runtime behavior diff.
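A minimal sketch of the structural-typing idea behind these Protocols, using only the standard library. The dunder name comes from the commit message; the method signature, the `runtime_checkable` decorator, and the `MyCodec` / `accepts_codec` names are assumptions made for illustration and may not match `python/datafusion/user_defined.py`.

```python
from typing import Any, Protocol, runtime_checkable


@runtime_checkable  # assumption: added here only so isinstance() works in the demo
class LogicalExtensionCodecExportable(Protocol):
    """Structural type: any object exposing the logical codec dunder."""

    def __datafusion_logical_extension_codec__(self) -> Any: ...


class MyCodec:
    """Hypothetical user type; it never subclasses the Protocol."""

    def __datafusion_logical_extension_codec__(self) -> Any:
        # A real exporter would return a PyCapsule; a plain object stands in.
        return object()


def accepts_codec(codec: object) -> bool:
    # Hypothetical helper: True when the object satisfies the Protocol.
    return isinstance(codec, LogicalExtensionCodecExportable)
```

A static type checker would accept `MyCodec()` wherever the Protocol is expected, without any inheritance, which is the point of typing the setters this way instead of with `Any`.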
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* feat: accept variadic field path in get_field
Upstream exposes both `get_field(expr, name)` and
`get_field_path(expr, [names...])`, but both ultimately call the same
scalar UDF with a base expression plus one or more name args. Collapse
the Python surface into a single variadic `get_field(expr, *names)`
that accepts either a one-step lookup or a path of names, dispatching
through a single Rust binding.
Note in `.ai/skills/check-upstream/SKILL.md` that `get_field_path` is
covered by the variadic form so future audits do not flag it as a gap.
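The calling convention can be sketched without DataFusion at all. The function below is a stand-in that walks nested dicts rather than building expressions; it only illustrates how one variadic signature serves both the one-step lookup and the path lookup. The `row` data is invented for the example.

```python
from typing import Any


def get_field(value: dict, *names: str) -> Any:
    """Walk a nested mapping one name at a time (stand-in for struct access)."""
    if not names:
        raise ValueError("get_field needs at least one field name")
    current: Any = value
    for name in names:
        current = current[name]
    return current


row = {"user": {"address": {"city": "Oslo"}}}

# One-step lookup and path lookup share the same entry point.
one_step = get_field(row, "user")
full_path = get_field(row, "user", "address", "city")
```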
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* feat: SessionContext.read_batches / read_batch
Wrap upstream `SessionContext::read_batches`, which materializes a
DataFrame directly from a sequence of `RecordBatch`es without
registering a named table. The single-batch convenience
`SessionContext.read_batch` is implemented in pure Python by calling
`read_batches([batch])`, so the Rust side only needs the one binding.
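A rough sketch of the delegation described above, with a hypothetical `ContextSketch` class standing in for `SessionContext` and a plain list standing in for the resulting DataFrame.

```python
class ContextSketch:
    """Hypothetical stand-in for a session context."""

    def read_batches(self, batches: list) -> list:
        # Stand-in for the single native binding; here it just copies the list.
        return list(batches)

    def read_batch(self, batch):
        # Convenience wrapper: wrap the single batch and reuse read_batches.
        return self.read_batches([batch])


ctx = ContextSketch()
result = ctx.read_batch("batch-0")
```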
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* feat: SessionContext UDF lookup helpers
Expose `udf(name)` / `udaf(name)` / `udwf(name)` lookups symmetric with
the existing `register_udf` / `register_udaf` / `register_udwf` setters,
plus `udfs()` / `udafs()` / `udwfs()` for enumerating registered
function names. Looked-up functions come back as the same
`ScalarUDF` / `AggregateUDF` / `WindowUDF` wrappers users already get
from registration, so they can be called as expressions or re-registered
into a different session.
The list helpers return a sorted `Vec<String>` rather than the raw
`HashSet` that upstream returns, so calling code gets a stable ordering.
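The shape of this API can be sketched with a plain-Python registry. `RegistrySketch` is hypothetical and stores ordinary callables; it is meant only to show lookup symmetric with registration, re-registration into another registry, and sorted name listing.

```python
class RegistrySketch:
    """Hypothetical stand-in for a session's scalar-function registry."""

    def __init__(self) -> None:
        self._functions = {}

    def register_udf(self, name, func) -> None:
        self._functions[name] = func

    def udf(self, name):
        # Lookup returns the same object that was registered.
        return self._functions[name]

    def udfs(self) -> list:
        # A set has no guaranteed iteration order, so sort before returning.
        names = set(self._functions)
        return sorted(names)


first = RegistrySketch()
first.register_udf("negate", lambda x: -x)
first.register_udf("double", lambda x: 2 * x)

# Re-register a looked-up function into a second registry.
second = RegistrySketch()
second.register_udf("double", first.udf("double"))
```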
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* bump pre-commit so it stops failing CI checks
* test: drop xfail on timestamp[s] parquet roundtrip
pyarrow.parquet promotes timestamp[s] to timestamp[ms] on write (apache/arrow#41382),
so the read array never matched the input. Cast the expected array to timestamp[ms]
in test_simple_select to assert DataFusion reads what Arrow actually stored.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* test: capture deprecation warning in repr_rows conflict case
DataFrameHtmlFormatter(repr_rows=..., max_rows=...) fires the deprecation
warning before raising ValueError, but pytest.raises does not catch warnings.
The escaping warning surfaced in every pytest run. Wrap the call in both
pytest.raises and pytest.warns so the warning is asserted, not leaked.
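The ordering problem can be reproduced with the standard library alone. In the sketch below, `make_formatter` is a hypothetical stand-in for the formatter constructor and its messages are invented; `warnings.catch_warnings` plays the role that `pytest.warns` plays in the actual test, and the `try`/`except` plays the role of `pytest.raises`.

```python
import warnings


def make_formatter(repr_rows=None, max_rows=None):
    """Hypothetical stand-in: warn about the old argument, then reject the conflict."""
    if repr_rows is not None:
        warnings.warn("repr_rows is deprecated", DeprecationWarning, stacklevel=2)
    if repr_rows is not None and max_rows is not None:
        raise ValueError("pass only one of repr_rows and max_rows")
    return max_rows if max_rows is not None else repr_rows


# Capture the warning and the exception together, so neither escapes.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    try:
        make_formatter(repr_rows=5, max_rows=10)
        raised = False
    except ValueError:
        raised = True
```

Catching only the `ValueError` would leave the `DeprecationWarning` unasserted, which is how it ended up in every test run's output.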
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* docs(udf): document SessionContext UDF lookup with worked examples
Add Examples docstrings (doctest) for `udf` / `udaf` / `udwf` / `udfs` /
`udafs` / `udwfs` that demonstrate the lookup pattern, including a
late-binding example where the function name comes from configuration.
Add tests covering config-driven dispatch and built-in UDAF / UDWF
lookup so the documented patterns are exercised end-to-end.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* refactor(udf): raise KeyError on UDF/UDAF/UDWF lookup miss
`SessionContext.udf` / `udaf` / `udwf` previously surfaced upstream
`DataFusionError::Plan` as a generic exception whose message ("There
is no UDF named ...") is set by DataFusion and can drift between
releases. Pre-check membership via `udfs()` / `udafs()` / `udwfs()`
and raise `PyKeyError` on miss so callers get the Pythonic
dict-style lookup behavior and tests are no longer coupled to the
upstream wording.
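A sketch of the pre-check pattern under simplified assumptions: a dict stands in for the session registry, and `UpstreamPlanError` with its message is a hypothetical stand-in for an error whose wording the caller does not control.

```python
class UpstreamPlanError(Exception):
    """Hypothetical stand-in for an upstream error with unstable wording."""


def upstream_lookup(registry: dict, name: str):
    # Stand-in for the upstream call; the message text here is arbitrary.
    if name not in registry:
        raise UpstreamPlanError(f"no function called {name!r}")
    return registry[name]


def udf(registry: dict, name: str):
    # Pre-check membership so a miss raises KeyError, whatever upstream says.
    if name not in registry:
        raise KeyError(name)
    return upstream_lookup(registry, name)


registry = {"double": lambda x: 2 * x}
try:
    udf(registry, "missing")
    miss_error = None
except KeyError as exc:
    miss_error = exc
```

Tests can then assert on the exception type and the missing key instead of matching a message string.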
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* refactor(udf): add _from_internal classmethod to UDF wrappers
`SessionContext.udf` / `udaf` / `udwf` previously constructed wrapper
objects by calling `__new__` directly and writing the private `_udf`
/ `_udaf` / `_udwf` attribute from outside the owning module. Three
near-identical blocks coupled `context.py` to wrapper internals.
Add a `_from_internal` classmethod on each wrapper that takes an
already-constructed `df_internal` handle and returns a wrapper
without re-running `__init__`. The lookup methods now collapse to a
single call, the `__new__` bypass is documented on the wrapper class
itself, and renaming the private field is a one-spot edit.
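The pattern can be sketched as follows. `InternalHandle` and `ScalarUDFSketch` are hypothetical stand-ins, and the `init_calls` counter exists only to show that `__init__` is skipped on the `_from_internal` path.

```python
class InternalHandle:
    """Hypothetical stand-in for an already-constructed internal handle."""

    def __init__(self, name: str) -> None:
        self.name = name


class ScalarUDFSketch:
    """Hypothetical wrapper; only the construction pattern matters here."""

    init_calls = 0  # counts how often __init__ runs, for demonstration

    def __init__(self, name: str) -> None:
        type(self).init_calls += 1
        self._udf = InternalHandle(name)

    @classmethod
    def _from_internal(cls, internal: InternalHandle) -> "ScalarUDFSketch":
        # Bypass __init__ on purpose: the handle already exists.
        wrapper = cls.__new__(cls)
        wrapper._udf = internal
        return wrapper


built = ScalarUDFSketch("double")
looked_up = ScalarUDFSketch._from_internal(InternalHandle("negate"))
```

Keeping the `__new__` call and the private attribute write inside the class means external callers never touch `_udf` directly.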
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* refactor: widen SessionContext.read_batches to accept any iterable
The underlying PyArrow FFI extractor for `Vec<RecordBatch>` requires a
Python `list`, so the previous `list[pa.RecordBatch]` annotation was
accurate but unnecessarily strict. Accept any
`Iterable[pa.RecordBatch]` on the Python side and materialize to a
list before crossing the FFI boundary so callers can pass generators,
tuples, or other iterables without manual conversion.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* docs(context): trim codec docstrings, reference Exportable protocols
Drop prose restatement of the type union for `with_logical_extension_codec`
and `with_physical_extension_codec`. Keep the dunder name (not visible from
the type hint) and cross-link the `LogicalExtensionCodecExportable` /
`PhysicalExtensionCodecExportable` protocols so Sphinx resolves them.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* docs(udf): drop return-type cross-refs in udf/udaf/udwf docstrings
The `:py:class:` link back to the wrapper class shadowed the return type
annotation and risked drifting if the class were moved. Replace with a
plain backtick literal; surrounding contract prose is unchanged.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* docs(functions): use F alias in get_field doctest
The doctest namespace already imports `datafusion.functions as F`,
making `F.named_struct` / `F.get_field` shorter than the
`dfn.functions.*` form.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
.ai/skills/check-upstream/SKILL.md (7 additions, 1 deletion)
@@ -66,11 +66,17 @@ The user may specify an area via `$ARGUMENTS`. If no area is specified or "all"
 - Python API: `python/datafusion/functions.py` — each function wraps a call to `datafusion._internal.functions`
 - Rust bindings: `crates/core/src/functions.rs` — `#[pyfunction]` definitions registered via `init_module()`
 
+**Evaluated and not requiring separate Python exposure:**
+- `get_field_path` — already covered by `get_field(expr, *names)`, which takes a
+  variadic field path and dispatches to the same underlying
+  `functions::core::get_field` UDF as the upstream `get_field_path` helper.
+
 **How to check:**
 1. Fetch the upstream scalar function documentation page
 2. Compare against functions listed in `python/datafusion/functions.py` (check the `__all__` list and function definitions)
 3. A function is covered if it exists in the Python API — it does NOT need a dedicated Rust `#[pyfunction]`. Many functions are aliases that reuse another function's Rust binding.
-4. Only report functions that are missing from the Python `__all__` list / function definitions
+4. Check against the "evaluated and not requiring exposure" list before flagging as a gap
+5. Only report functions that are missing from the Python `__all__` list / function definitions