From a657f7d3e9d4114ba333241e19cfc5e9cc2d3b0f Mon Sep 17 00:00:00 2001 From: Tim Saucer Date: Sun, 29 Mar 2026 11:56:52 -0400 Subject: [PATCH 01/14] Initial commit for skill to check upstream repo --- .claude/skills/check-upstream/SKILL.md | 205 +++++++++++++++++++++++++ 1 file changed, 205 insertions(+) create mode 100644 .claude/skills/check-upstream/SKILL.md diff --git a/.claude/skills/check-upstream/SKILL.md b/.claude/skills/check-upstream/SKILL.md new file mode 100644 index 000000000..0a20df64b --- /dev/null +++ b/.claude/skills/check-upstream/SKILL.md @@ -0,0 +1,205 @@ +--- +name: check-upstream +description: Check if upstream Apache DataFusion features (functions, DataFrame ops, SessionContext methods) are exposed in this Python project. Use when adding missing functions, auditing API coverage, or ensuring parity with upstream. +argument-hint: [area] (e.g., "scalar functions", "aggregate functions", "window functions", "dataframe", "session context", "all") +--- + +# Check Upstream DataFusion Feature Coverage + +You are auditing the datafusion-python project to find features from the upstream Apache DataFusion Rust library that are **not yet exposed** in this Python binding project. Your goal is to identify gaps and, if asked, implement the missing bindings. + +## Areas to Check + +The user may specify an area via `$ARGUMENTS`. If no area is specified or "all" is given, check all areas. + +### 1. Scalar Functions + +**Upstream source of truth:** +- Rust docs: https://docs.rs/datafusion/latest/datafusion/functions/index.html +- User docs: https://datafusion.apache.org/user-guide/sql/scalar_functions.html + +**Where they are exposed in this project:** +- Python API: `python/datafusion/functions.py` — each function wraps a call to `datafusion._internal.functions` +- Rust bindings: `crates/core/src/functions.rs` — `#[pyfunction]` definitions registered via `init_module()` + +**How to check:** +1. Fetch the upstream scalar function documentation page +2. 
Compare against functions listed in `python/datafusion/functions.py` (check the `__all__` list) +3. Also check `crates/core/src/functions.rs` for what's registered in `init_module()` +4. Report functions that exist upstream but are missing from this project + +### 2. Aggregate Functions + +**Upstream source of truth:** +- Rust docs: https://docs.rs/datafusion/latest/datafusion/functions_aggregate/index.html +- User docs: https://datafusion.apache.org/user-guide/sql/aggregate_functions.html + +**Where they are exposed in this project:** +- Python API: `python/datafusion/functions.py` (aggregate functions are mixed in with scalar functions) +- Rust bindings: `crates/core/src/functions.rs` + +**How to check:** +1. Fetch the upstream aggregate function documentation page +2. Compare against aggregate functions in `python/datafusion/functions.py` +3. Report missing aggregate functions + +### 3. Window Functions + +**Upstream source of truth:** +- Rust docs: https://docs.rs/datafusion/latest/datafusion/functions_window/index.html +- User docs: https://datafusion.apache.org/user-guide/sql/window_functions.html + +**Where they are exposed in this project:** +- Python API: `python/datafusion/functions.py` (window functions like `rank`, `dense_rank`, `lag`, `lead`, etc.) +- Rust bindings: `crates/core/src/functions.rs` + +**How to check:** +1. Fetch the upstream window function documentation page +2. Compare against window functions in `python/datafusion/functions.py` +3. Report missing window functions + +### 4. 
Table Functions + +**Upstream source of truth:** +- Rust docs: https://docs.rs/datafusion/latest/datafusion/functions_table/index.html +- User docs: https://datafusion.apache.org/user-guide/sql/table_functions.html (if available) + +**Where they are exposed in this project:** +- Python API: `python/datafusion/functions.py` and `python/datafusion/user_defined.py` (TableFunction/udtf) +- Rust bindings: `crates/core/src/functions.rs` and `crates/core/src/udtf.rs` + +**How to check:** +1. Fetch the upstream table function documentation +2. Compare against what's available in this project +3. Report missing table functions + +### 5. DataFrame Operations + +**Upstream source of truth:** +- Rust docs: https://docs.rs/datafusion/latest/datafusion/dataframe/struct.DataFrame.html + +**Where they are exposed in this project:** +- Python API: `python/datafusion/dataframe.py` — the `DataFrame` class +- Rust bindings: `crates/core/src/dataframe.rs` — `PyDataFrame` with `#[pymethods]` + +**How to check:** +1. Fetch the upstream DataFrame documentation page listing all methods +2. Compare against methods in `python/datafusion/dataframe.py` +3. Also check `crates/core/src/dataframe.rs` for what's implemented +4. Report DataFrame methods that exist upstream but are missing + +### 6. SessionContext Methods + +**Upstream source of truth:** +- Rust docs: https://docs.rs/datafusion/latest/datafusion/execution/context/struct.SessionContext.html + +**Where they are exposed in this project:** +- Python API: `python/datafusion/context.py` — the `SessionContext` class +- Rust bindings: `crates/core/src/context.rs` — `PySessionContext` with `#[pymethods]` + +**How to check:** +1. Fetch the upstream SessionContext documentation page listing all methods +2. Compare against methods in `python/datafusion/context.py` +3. Also check `crates/core/src/context.rs` for what's implemented +4. 
Report SessionContext methods that exist upstream but are missing + +## Output Format + +For each area checked, produce a report like: + +``` +## [Area Name] Coverage Report + +### Currently Exposed (X functions/methods) +- list of what's already available + +### Missing from Upstream (Y functions/methods) +- function_name — brief description of what it does +- function_name — brief description of what it does + +### Notes +- Any relevant observations about partial implementations, naming differences, etc. +``` + +## Implementation Pattern + +If the user asks you to implement missing features, follow these patterns: + +### Adding a New Function (Scalar/Aggregate/Window) + +**Step 1: Rust binding** in `crates/core/src/functions.rs`: +```rust +#[pyfunction] +#[pyo3(signature = (arg1, arg2))] +fn new_function_name(arg1: PyExpr, arg2: PyExpr) -> PyResult<PyExpr> { +    Ok(datafusion::functions::module::expr_fn::new_function_name(arg1.expr, arg2.expr).into()) +} +``` +Then register in `init_module()`: +```rust +m.add_wrapped(wrap_pyfunction!(new_function_name))?; +``` + +**Step 2: Python wrapper** in `python/datafusion/functions.py`: +```python +def new_function_name(arg1: Expr, arg2: Expr) -> Expr: +    """Description of what the function does. + +    Args: +        arg1: Description of first argument. +        arg2: Description of second argument. + +    Returns: +        Description of return value. +    """ +    return Expr(f.new_function_name(arg1.expr, arg2.expr)) +``` +Add to `__all__` list.
+ +### Adding a New DataFrame Method + +**Step 1: Rust binding** in `crates/core/src/dataframe.rs`: +```rust +#[pymethods] +impl PyDataFrame { +    fn new_method(&self, py: Python, param: PyExpr) -> PyDataFusionResult<Self> { +        let df = self.df.as_ref().clone().new_method(param.into())?; +        Ok(Self::new(df)) +    } +} +``` + +**Step 2: Python wrapper** in `python/datafusion/dataframe.py`: +```python +def new_method(self, param: Expr) -> DataFrame: +    """Description of the method.""" +    return DataFrame(self.df.new_method(param.expr)) +``` + +### Adding a New SessionContext Method + +**Step 1: Rust binding** in `crates/core/src/context.rs`: +```rust +#[pymethods] +impl PySessionContext { +    pub fn new_method(&self, py: Python, param: String) -> PyDataFusionResult<PyDataFrame> { +        let df = wait_for_future(py, self.ctx.new_method(&param))?; +        Ok(PyDataFrame::new(df)) +    } +} +``` + +**Step 2: Python wrapper** in `python/datafusion/context.py`: +```python +def new_method(self, param: str) -> DataFrame: +    """Description of the method.""" +    return DataFrame(self.ctx.new_method(param)) +``` + +## Important Notes + +- The upstream DataFusion version used by this project is specified in `crates/core/Cargo.toml` — check the `datafusion` dependency version to ensure you're comparing against the right upstream version. +- Some upstream features may intentionally not be exposed (e.g., internal-only APIs). Use judgment about what's user-facing. +- When fetching upstream docs, prefer the published docs.rs documentation as it matches the crate version. +- Function aliases (e.g., `array_append` / `list_append`) should both be exposed if upstream supports them. +- Check the `__all__` list in `functions.py` to see what's publicly exported vs just defined.
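The "How to check" comparison steps can be mechanized. A minimal sketch that extracts `__all__` from module source via the AST and diffs it against an upstream name list — both inputs below are made-up samples, not the real `functions.py` contents or real upstream data:

```python
import ast


def exported_names(functions_py_source: str) -> set[str]:
    """Collect the string entries of a module's __all__ list via the AST."""
    tree = ast.parse(functions_py_source)
    for node in ast.walk(tree):
        if isinstance(node, ast.Assign):
            for target in node.targets:
                if isinstance(target, ast.Name) and target.id == "__all__":
                    return {elt.value for elt in node.value.elts}
    return set()


# Toy stand-ins: the real inputs would be python/datafusion/functions.py and
# the function names scraped from the upstream docs pages listed above.
source = '__all__ = ["abs", "array_append", "rank"]'
upstream = {"abs", "array_append", "list_append", "rank", "lead"}

missing = sorted(upstream - exported_names(source))
print(missing)  # ['lead', 'list_append']
```

Diffing against `__all__` rather than the Rust bindings matches the skill's rule that the Python layer is what counts as "exposed".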
From 1d27e8ffaa0f15a6d002886c35a7096cf808d2ba Mon Sep 17 00:00:00 2001 From: Tim Saucer Date: Sun, 29 Mar 2026 12:18:23 -0400 Subject: [PATCH 02/14] Add instructions on using the check-upstream skill --- README.md | 27 +++++++++++++++++++++++++++ 1 file changed, 27 insertions(+) diff --git a/README.md b/README.md index c24257876..b9f17495e 100644 --- a/README.md +++ b/README.md @@ -312,6 +312,33 @@ There are scripts in `ci/scripts` for running Rust and Python linters. ./ci/scripts/rust_toml_fmt.sh ``` +## Checking Upstream DataFusion Coverage + +This project includes a [Claude Code](https://claude.com/claude-code) skill for auditing which +features from the upstream Apache DataFusion Rust library are not yet exposed in these Python +bindings. This is useful when adding missing functions, auditing API coverage, or ensuring parity +with upstream. + +To use it, run the `/check-upstream` slash command inside Claude Code with an optional area argument: + +``` +/check-upstream scalar functions +/check-upstream aggregate functions +/check-upstream window functions +/check-upstream dataframe +/check-upstream session context +/check-upstream all +``` + +If no argument is provided, it defaults to checking all areas. The skill will fetch the upstream +DataFusion documentation, compare it against the functions and methods exposed in this project, and +produce a coverage report listing what is currently exposed and what is missing. + +The skill definition lives in `.claude/skills/check-upstream/SKILL.md`. Note that the `/check-upstream` +slash command is a [Claude Code](https://claude.com/claude-code) feature, but the underlying +methodology described in the skill file can be followed manually or by another AI coding agent if +directed to read and follow the instructions in that file. 
+  ## How to update dependencies   To change test dependencies, change the `pyproject.toml` and run  From 0bc626a657f8863dc2a0933c17a519b12b6cf72f Mon Sep 17 00:00:00 2001 From: Tim Saucer Date: Sun, 29 Mar 2026 20:19:57 -0400 Subject: [PATCH 03/14] Add FFI type coverage and implementation pattern to check-upstream skill MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit  Document the full FFI type pipeline (Rust PyO3 wrapper → Protocol type → Python wrapper → ABC base class → exports → example) and catalog which upstream datafusion-ffi types are supported, which have been evaluated as not needing direct exposure, and how to check for new gaps.  Co-Authored-By: Claude Opus 4.6 (1M context) --- .claude/skills/check-upstream/SKILL.md | 144 +++++++++++++++++++++ 1 file changed, 144 insertions(+) diff --git a/.claude/skills/check-upstream/SKILL.md b/.claude/skills/check-upstream/SKILL.md index 0a20df64b..d28a2bd96 100644 --- a/.claude/skills/check-upstream/SKILL.md +++ b/.claude/skills/check-upstream/SKILL.md @@ -196,6 +196,150 @@ def new_method(self, param: str) -> DataFrame: return DataFrame(self.ctx.new_method(param)) ``` +### Adding a New FFI Type + +FFI types require a full pipeline from C struct through to a typed Python wrapper. Each layer must be present. + +**Step 1: Rust PyO3 wrapper class** in a new or existing file under `crates/core/src/`: +```rust +use datafusion_ffi::new_type::FFI_NewType; + +#[pyclass(from_py_object, frozen, name = "RawNewType", module = "datafusion.module_name", subclass)] +pub struct PyNewType { +    pub inner: Arc<dyn NewType>, +} + +#[pymethods] +impl PyNewType { +    #[staticmethod] +    fn from_pycapsule(obj: &Bound<'_, PyAny>) -> PyDataFusionResult<Self> { +        let capsule = obj +            .getattr("__datafusion_new_type__")? +            .call0()?
+            .downcast::<PyCapsule>()?; +        let ffi_ptr = unsafe { capsule.reference::<FFI_NewType>() }; +        let provider: Arc<dyn NewType> = ffi_ptr.into(); +        Ok(Self { inner: provider }) +    } + +    fn some_method(&self) -> PyResult<...> { +        // wrap inner trait method +    } +} +``` +Register in the appropriate `init_module()`: +```rust +m.add_class::<PyNewType>()?; +``` + +**Step 2: Python Protocol type** in the appropriate Python module (e.g., `python/datafusion/catalog.py`): +```python +class NewTypeExportable(Protocol): +    """Type hint for objects providing a __datafusion_new_type__ PyCapsule.""" + +    def __datafusion_new_type__(self) -> object: ... +``` + +**Step 3: Python wrapper class** in the same module: +```python +class NewType: +    """Description of the type. + +    This class wraps a DataFusion NewType, which can be created from a native +    Python implementation or imported from an FFI-compatible library. +    """ + +    def __init__( +        self, +        new_type: df_internal.module_name.RawNewType | NewTypeExportable, +    ) -> None: +        if isinstance(new_type, df_internal.module_name.RawNewType): +            self._raw = new_type +        else: +            self._raw = df_internal.module_name.RawNewType.from_pycapsule(new_type) + +    def some_method(self) -> ReturnType: +        """Description of the method.""" +        return self._raw.some_method() +``` + +**Step 4: ABC base class** (if users should be able to subclass and provide custom implementations in Python): +```python +from abc import ABC, abstractmethod + +class NewTypeProvider(ABC): +    """Abstract base class for implementing a custom NewType in Python.""" + +    @abstractmethod +    def some_method(self) -> ReturnType: +        """Description of the method.""" +        ...
+``` + +**Step 5: Module exports** — add to the appropriate `__init__.py`: +- Add the wrapper class (`NewType`) to `python/datafusion/__init__.py` +- Add the ABC (`NewTypeProvider`) if applicable +- Add the Protocol type (`NewTypeExportable`) if it should be public + +**Step 6: FFI example** — add an example implementation under `examples/datafusion-ffi-example/src/`: +```rust +// examples/datafusion-ffi-example/src/new_type.rs +use datafusion_ffi::new_type::FFI_NewType; +// ... example showing how an external Rust library exposes this type via PyCapsule +``` + +**Checklist for each FFI type:** +- [ ] Rust PyO3 wrapper with `from_pycapsule()` method +- [ ] Python Protocol type (e.g., `NewTypeExportable`) for FFI objects +- [ ] Python wrapper class with full type hints on all public methods +- [ ] ABC base class (if the type can be user-implemented) +- [ ] Registered in Rust `init_module()` and Python `__init__.py` +- [ ] FFI example in `examples/datafusion-ffi-example/` +- [ ] Type appears in union type hints where accepted (e.g., `Table | TableProviderExportable`) + +### 7. 
FFI Types (datafusion-ffi) + +**Upstream source of truth:** +- Crate source: https://github.com/apache/datafusion/tree/main/datafusion/ffi/src +- Rust docs: https://docs.rs/datafusion-ffi/latest/datafusion_ffi/ + +**Where they are exposed in this project:** +- Rust bindings: various files under `crates/core/src/` and `crates/util/src/` +- FFI example: `examples/datafusion-ffi-example/src/` +- Dependency declared in root `Cargo.toml` and `crates/core/Cargo.toml` + +**Currently supported FFI types:** +- `FFI_ScalarUDF` — `crates/core/src/udf.rs` +- `FFI_AggregateUDF` — `crates/core/src/udaf.rs` +- `FFI_WindowUDF` — `crates/core/src/udwf.rs` +- `FFI_TableFunction` — `crates/core/src/udtf.rs` +- `FFI_TableProvider` — `crates/core/src/table.rs`, `crates/util/src/lib.rs` +- `FFI_TableProviderFactory` — `crates/core/src/context.rs` +- `FFI_CatalogProvider` — `crates/core/src/catalog.rs`, `crates/core/src/context.rs` +- `FFI_CatalogProviderList` — `crates/core/src/context.rs` +- `FFI_SchemaProvider` — `crates/core/src/catalog.rs` +- `FFI_LogicalExtensionCodec` — multiple files +- `FFI_ExtensionOptions` — `crates/core/src/context.rs` +- `FFI_TaskContextProvider` — `crates/core/src/context.rs` + +**Evaluated and not requiring direct Python exposure:** +These upstream FFI types have been reviewed and do not need to be independently exposed to end users: +- `FFI_ExecutionPlan` — already used indirectly through table providers; no need for direct exposure +- `FFI_PhysicalExpr` / `FFI_PhysicalSortExpr` — internal physical planning types not expected to be needed by end users +- `FFI_RecordBatchStream` — one level deeper than FFI_ExecutionPlan, used internally when execution plans stream results +- `FFI_SessionRef` / `ForeignSession` — session sharing across FFI; Python manages sessions natively via SessionContext +- `FFI_SessionConfig` — Python can configure sessions natively without FFI +- `FFI_ConfigOptions` / `FFI_TableOptions` — internal configuration plumbing +- 
`FFI_PlanProperties` / `FFI_Boundedness` / `FFI_EmissionType` — read from existing plans, not user-facing +- `FFI_Partitioning` — supporting type for physical planning +- Supporting/utility types (`FFI_Option`, `FFI_Result`, `WrappedSchema`, `WrappedArray`, `FFI_ColumnarValue`, `FFI_Volatility`, `FFI_InsertOp`, `FFI_AccumulatorArgs`, `FFI_Accumulator`, `FFI_GroupsAccumulator`, `FFI_EmitTo`, `FFI_AggregateOrderSensitivity`, `FFI_PartitionEvaluator`, `FFI_PartitionEvaluatorArgs`, `FFI_Range`, `FFI_SortOptions`, `FFI_Distribution`, `FFI_ExprProperties`, `FFI_SortProperties`, `FFI_Interval`, `FFI_TableProviderFilterPushDown`, `FFI_TableType`) — used as building blocks within the types above, not independently exposed + +**How to check:** +1. Compare the upstream `datafusion-ffi` crate's `lib.rs` exports against the lists above +2. If new FFI types appear upstream, evaluate whether they represent a user-facing capability +3. Check against the "evaluated and not requiring exposure" list before flagging as a gap +4. Report any genuinely new types that enable user-facing functionality + ## Important Notes - The upstream DataFusion version used by this project is specified in `crates/core/Cargo.toml` — check the `datafusion` dependency version to ensure you're comparing against the right upstream version. From 31fa6d7d16d2910432904900cfb6b1eb30a99660 Mon Sep 17 00:00:00 2001 From: Tim Saucer Date: Sun, 29 Mar 2026 20:23:16 -0400 Subject: [PATCH 04/14] Update check-upstream skill to include FFI types as a checkable area Add "ffi types" to the argument-hint and description so users can invoke the skill with `/check-upstream ffi types`. Also add pipeline verification step to ensure each supported FFI type has the full end-to-end chain (PyO3 wrapper, Protocol, Python wrapper with type hints, ABC, exports). 
Co-Authored-By: Claude Opus 4.6 (1M context) --- .claude/skills/check-upstream/SKILL.md | 12 ++++++++++-- 1 file changed, 10 insertions(+), 2 deletions(-) diff --git a/.claude/skills/check-upstream/SKILL.md b/.claude/skills/check-upstream/SKILL.md index d28a2bd96..aff3cd2f4 100644 --- a/.claude/skills/check-upstream/SKILL.md +++ b/.claude/skills/check-upstream/SKILL.md @@ -1,7 +1,7 @@ --- name: check-upstream -description: Check if upstream Apache DataFusion features (functions, DataFrame ops, SessionContext methods) are exposed in this Python project. Use when adding missing functions, auditing API coverage, or ensuring parity with upstream. -argument-hint: [area] (e.g., "scalar functions", "aggregate functions", "window functions", "dataframe", "session context", "all") +description: Check if upstream Apache DataFusion features (functions, DataFrame ops, SessionContext methods, FFI types) are exposed in this Python project. Use when adding missing functions, auditing API coverage, or ensuring parity with upstream. +argument-hint: [area] (e.g., "scalar functions", "aggregate functions", "window functions", "dataframe", "session context", "ffi types", "all") --- # Check Upstream DataFusion Feature Coverage @@ -339,6 +339,14 @@ These upstream FFI types have been reviewed and do not need to be independently 2. If new FFI types appear upstream, evaluate whether they represent a user-facing capability 3. Check against the "evaluated and not requiring exposure" list before flagging as a gap 4. Report any genuinely new types that enable user-facing functionality +5. 
For each currently supported FFI type, verify the full pipeline is present using the checklist from "Adding a New FFI Type": + - Rust PyO3 wrapper with `from_pycapsule()` method + - Python Protocol type (e.g., `ScalarUDFExportable`) for FFI objects + - Python wrapper class with full type hints on all public methods + - ABC base class (if the type can be user-implemented) + - Registered in Rust `init_module()` and Python `__init__.py` + - FFI example in `examples/datafusion-ffi-example/` + - Type appears in union type hints where accepted ## Important Notes From c9539301bae47e24e8355872dc2b58dbeee21a74 Mon Sep 17 00:00:00 2001 From: Tim Saucer Date: Sun, 29 Mar 2026 20:29:53 -0400 Subject: [PATCH 05/14] Move FFI Types section alongside other areas to check Section 7 (FFI Types) was incorrectly placed after the Output Format and Implementation Pattern sections. Move it to sit after Section 6 (SessionContext Methods), consistent with the other checkable areas. Co-Authored-By: Claude Opus 4.6 (1M context) --- .claude/skills/check-upstream/SKILL.md | 102 ++++++++++++------------- 1 file changed, 51 insertions(+), 51 deletions(-) diff --git a/.claude/skills/check-upstream/SKILL.md b/.claude/skills/check-upstream/SKILL.md index aff3cd2f4..f1203b502 100644 --- a/.claude/skills/check-upstream/SKILL.md +++ b/.claude/skills/check-upstream/SKILL.md @@ -103,6 +103,57 @@ The user may specify an area via `$ARGUMENTS`. If no area is specified or "all" 3. Also check `crates/core/src/context.rs` for what's implemented 4. Report SessionContext methods that exist upstream but are missing +### 7. 
FFI Types (datafusion-ffi) + +**Upstream source of truth:** +- Crate source: https://github.com/apache/datafusion/tree/main/datafusion/ffi/src +- Rust docs: https://docs.rs/datafusion-ffi/latest/datafusion_ffi/ + +**Where they are exposed in this project:** +- Rust bindings: various files under `crates/core/src/` and `crates/util/src/` +- FFI example: `examples/datafusion-ffi-example/src/` +- Dependency declared in root `Cargo.toml` and `crates/core/Cargo.toml` + +**Currently supported FFI types:** +- `FFI_ScalarUDF` — `crates/core/src/udf.rs` +- `FFI_AggregateUDF` — `crates/core/src/udaf.rs` +- `FFI_WindowUDF` — `crates/core/src/udwf.rs` +- `FFI_TableFunction` — `crates/core/src/udtf.rs` +- `FFI_TableProvider` — `crates/core/src/table.rs`, `crates/util/src/lib.rs` +- `FFI_TableProviderFactory` — `crates/core/src/context.rs` +- `FFI_CatalogProvider` — `crates/core/src/catalog.rs`, `crates/core/src/context.rs` +- `FFI_CatalogProviderList` — `crates/core/src/context.rs` +- `FFI_SchemaProvider` — `crates/core/src/catalog.rs` +- `FFI_LogicalExtensionCodec` — multiple files +- `FFI_ExtensionOptions` — `crates/core/src/context.rs` +- `FFI_TaskContextProvider` — `crates/core/src/context.rs` + +**Evaluated and not requiring direct Python exposure:** +These upstream FFI types have been reviewed and do not need to be independently exposed to end users: +- `FFI_ExecutionPlan` — already used indirectly through table providers; no need for direct exposure +- `FFI_PhysicalExpr` / `FFI_PhysicalSortExpr` — internal physical planning types not expected to be needed by end users +- `FFI_RecordBatchStream` — one level deeper than FFI_ExecutionPlan, used internally when execution plans stream results +- `FFI_SessionRef` / `ForeignSession` — session sharing across FFI; Python manages sessions natively via SessionContext +- `FFI_SessionConfig` — Python can configure sessions natively without FFI +- `FFI_ConfigOptions` / `FFI_TableOptions` — internal configuration plumbing +- 
`FFI_PlanProperties` / `FFI_Boundedness` / `FFI_EmissionType` — read from existing plans, not user-facing +- `FFI_Partitioning` — supporting type for physical planning +- Supporting/utility types (`FFI_Option`, `FFI_Result`, `WrappedSchema`, `WrappedArray`, `FFI_ColumnarValue`, `FFI_Volatility`, `FFI_InsertOp`, `FFI_AccumulatorArgs`, `FFI_Accumulator`, `FFI_GroupsAccumulator`, `FFI_EmitTo`, `FFI_AggregateOrderSensitivity`, `FFI_PartitionEvaluator`, `FFI_PartitionEvaluatorArgs`, `FFI_Range`, `FFI_SortOptions`, `FFI_Distribution`, `FFI_ExprProperties`, `FFI_SortProperties`, `FFI_Interval`, `FFI_TableProviderFilterPushDown`, `FFI_TableType`) — used as building blocks within the types above, not independently exposed + +**How to check:** +1. Compare the upstream `datafusion-ffi` crate's `lib.rs` exports against the lists above +2. If new FFI types appear upstream, evaluate whether they represent a user-facing capability +3. Check against the "evaluated and not requiring exposure" list before flagging as a gap +4. Report any genuinely new types that enable user-facing functionality +5. For each currently supported FFI type, verify the full pipeline is present using the checklist from "Adding a New FFI Type": + - Rust PyO3 wrapper with `from_pycapsule()` method + - Python Protocol type (e.g., `ScalarUDFExportable`) for FFI objects + - Python wrapper class with full type hints on all public methods + - ABC base class (if the type can be user-implemented) + - Registered in Rust `init_module()` and Python `__init__.py` + - FFI example in `examples/datafusion-ffi-example/` + - Type appears in union type hints where accepted + ## Output Format For each area checked, produce a report like: @@ -297,57 +348,6 @@ use datafusion_ffi::new_type::FFI_NewType; - [ ] FFI example in `examples/datafusion-ffi-example/` - [ ] Type appears in union type hints where accepted (e.g., `Table | TableProviderExportable`) -### 7. 
FFI Types (datafusion-ffi) - -**Upstream source of truth:** -- Crate source: https://github.com/apache/datafusion/tree/main/datafusion/ffi/src -- Rust docs: https://docs.rs/datafusion-ffi/latest/datafusion_ffi/ - -**Where they are exposed in this project:** -- Rust bindings: various files under `crates/core/src/` and `crates/util/src/` -- FFI example: `examples/datafusion-ffi-example/src/` -- Dependency declared in root `Cargo.toml` and `crates/core/Cargo.toml` - -**Currently supported FFI types:** -- `FFI_ScalarUDF` — `crates/core/src/udf.rs` -- `FFI_AggregateUDF` — `crates/core/src/udaf.rs` -- `FFI_WindowUDF` — `crates/core/src/udwf.rs` -- `FFI_TableFunction` — `crates/core/src/udtf.rs` -- `FFI_TableProvider` — `crates/core/src/table.rs`, `crates/util/src/lib.rs` -- `FFI_TableProviderFactory` — `crates/core/src/context.rs` -- `FFI_CatalogProvider` — `crates/core/src/catalog.rs`, `crates/core/src/context.rs` -- `FFI_CatalogProviderList` — `crates/core/src/context.rs` -- `FFI_SchemaProvider` — `crates/core/src/catalog.rs` -- `FFI_LogicalExtensionCodec` — multiple files -- `FFI_ExtensionOptions` — `crates/core/src/context.rs` -- `FFI_TaskContextProvider` — `crates/core/src/context.rs` - -**Evaluated and not requiring direct Python exposure:** -These upstream FFI types have been reviewed and do not need to be independently exposed to end users: -- `FFI_ExecutionPlan` — already used indirectly through table providers; no need for direct exposure -- `FFI_PhysicalExpr` / `FFI_PhysicalSortExpr` — internal physical planning types not expected to be needed by end users -- `FFI_RecordBatchStream` — one level deeper than FFI_ExecutionPlan, used internally when execution plans stream results -- `FFI_SessionRef` / `ForeignSession` — session sharing across FFI; Python manages sessions natively via SessionContext -- `FFI_SessionConfig` — Python can configure sessions natively without FFI -- `FFI_ConfigOptions` / `FFI_TableOptions` — internal configuration plumbing -- 
`FFI_PlanProperties` / `FFI_Boundedness` / `FFI_EmissionType` — read from existing plans, not user-facing -- `FFI_Partitioning` — supporting type for physical planning -- Supporting/utility types (`FFI_Option`, `FFI_Result`, `WrappedSchema`, `WrappedArray`, `FFI_ColumnarValue`, `FFI_Volatility`, `FFI_InsertOp`, `FFI_AccumulatorArgs`, `FFI_Accumulator`, `FFI_GroupsAccumulator`, `FFI_EmitTo`, `FFI_AggregateOrderSensitivity`, `FFI_PartitionEvaluator`, `FFI_PartitionEvaluatorArgs`, `FFI_Range`, `FFI_SortOptions`, `FFI_Distribution`, `FFI_ExprProperties`, `FFI_SortProperties`, `FFI_Interval`, `FFI_TableProviderFilterPushDown`, `FFI_TableType`) — used as building blocks within the types above, not independently exposed - -**How to check:** -1. Compare the upstream `datafusion-ffi` crate's `lib.rs` exports against the lists above -2. If new FFI types appear upstream, evaluate whether they represent a user-facing capability -3. Check against the "evaluated and not requiring exposure" list before flagging as a gap -4. Report any genuinely new types that enable user-facing functionality -5. For each currently supported FFI type, verify the full pipeline is present using the checklist from "Adding a New FFI Type": - - Rust PyO3 wrapper with `from_pycapsule()` method - - Python Protocol type (e.g., `ScalarUDFExportable`) for FFI objects - - Python wrapper class with full type hints on all public methods - - ABC base class (if the type can be user-implemented) - - Registered in Rust `init_module()` and Python `__init__.py` - - FFI example in `examples/datafusion-ffi-example/` - - Type appears in union type hints where accepted - ## Important Notes - The upstream DataFusion version used by this project is specified in `crates/core/Cargo.toml` — check the `datafusion` dependency version to ensure you're comparing against the right upstream version. 
From 2d7ca471d681374b326b7771cf2849423e742ed6 Mon Sep 17 00:00:00 2001 From: Tim Saucer Date: Sun, 29 Mar 2026 20:39:52 -0400 Subject: [PATCH 06/14] Replace static FFI type list with dynamic discovery instruction The supported FFI types list would go stale as new types are added. Replace it with a grep instruction to discover them at check time, keeping only the "evaluated and not requiring exposure" list which captures rationale not derivable from code. Co-Authored-By: Claude Opus 4.6 (1M context) --- .claude/skills/check-upstream/SKILL.md | 17 +++-------------- 1 file changed, 3 insertions(+), 14 deletions(-) diff --git a/.claude/skills/check-upstream/SKILL.md b/.claude/skills/check-upstream/SKILL.md index f1203b502..cb56c0ce5 100644 --- a/.claude/skills/check-upstream/SKILL.md +++ b/.claude/skills/check-upstream/SKILL.md @@ -114,19 +114,8 @@ The user may specify an area via `$ARGUMENTS`. If no area is specified or "all" - FFI example: `examples/datafusion-ffi-example/src/` - Dependency declared in root `Cargo.toml` and `crates/core/Cargo.toml` -**Currently supported FFI types:** -- `FFI_ScalarUDF` — `crates/core/src/udf.rs` -- `FFI_AggregateUDF` — `crates/core/src/udaf.rs` -- `FFI_WindowUDF` — `crates/core/src/udwf.rs` -- `FFI_TableFunction` — `crates/core/src/udtf.rs` -- `FFI_TableProvider` — `crates/core/src/table.rs`, `crates/util/src/lib.rs` -- `FFI_TableProviderFactory` — `crates/core/src/context.rs` -- `FFI_CatalogProvider` — `crates/core/src/catalog.rs`, `crates/core/src/context.rs` -- `FFI_CatalogProviderList` — `crates/core/src/context.rs` -- `FFI_SchemaProvider` — `crates/core/src/catalog.rs` -- `FFI_LogicalExtensionCodec` — multiple files -- `FFI_ExtensionOptions` — `crates/core/src/context.rs` -- `FFI_TaskContextProvider` — `crates/core/src/context.rs` +**Discovering currently supported FFI types:** +Grep for `use datafusion_ffi::` in `crates/core/src/` and `crates/util/src/` to find all FFI types currently imported and used. 
**Evaluated and not requiring direct Python exposure:** These upstream FFI types have been reviewed and do not need to be independently exposed to end users: @@ -141,7 +130,7 @@ These upstream FFI types have been reviewed and do not need to be independently - Supporting/utility types (`FFI_Option`, `FFI_Result`, `WrappedSchema`, `WrappedArray`, `FFI_ColumnarValue`, `FFI_Volatility`, `FFI_InsertOp`, `FFI_AccumulatorArgs`, `FFI_Accumulator`, `FFI_GroupsAccumulator`, `FFI_EmitTo`, `FFI_AggregateOrderSensitivity`, `FFI_PartitionEvaluator`, `FFI_PartitionEvaluatorArgs`, `FFI_Range`, `FFI_SortOptions`, `FFI_Distribution`, `FFI_ExprProperties`, `FFI_SortProperties`, `FFI_Interval`, `FFI_TableProviderFilterPushDown`, `FFI_TableType`) — used as building blocks within the types above, not independently exposed **How to check:** -1. Compare the upstream `datafusion-ffi` crate's `lib.rs` exports against the lists above +1. Discover currently supported types by grepping for `use datafusion_ffi::` in `crates/core/src/` and `crates/util/src/`, then compare against the upstream `datafusion-ffi` crate's `lib.rs` exports 2. If new FFI types appear upstream, evaluate whether they represent a user-facing capability 3. Check against the "evaluated and not requiring exposure" list before flagging as a gap 4. Report any genuinely new types that enable user-facing functionality From 88ed86ca9717c4c310a09e4c9922778a0854a3d2 Mon Sep 17 00:00:00 2001 From: Tim Saucer Date: Mon, 30 Mar 2026 07:54:33 -0400 Subject: [PATCH 07/14] Make Python API the source of truth for upstream coverage checks Functions exposed in Python (e.g., as aliases of other Rust bindings) were being falsely reported as missing because they lacked a dedicated #[pyfunction] in Rust. The user-facing API is the Python layer, so coverage should be measured there. 

Co-Authored-By: Claude Opus 4.6 (1M context)
---
 .claude/skills/check-upstream/SKILL.md | 35 +++++++++++++++-----------
 1 file changed, 20 insertions(+), 15 deletions(-)

diff --git a/.claude/skills/check-upstream/SKILL.md b/.claude/skills/check-upstream/SKILL.md
index cb56c0ce5..777120e8b 100644
--- a/.claude/skills/check-upstream/SKILL.md
+++ b/.claude/skills/check-upstream/SKILL.md
@@ -8,6 +8,8 @@ argument-hint: [area] (e.g., "scalar functions", "aggregate functions", "window

 You are auditing the datafusion-python project to find features from the upstream Apache DataFusion Rust library that are **not yet exposed** in this Python binding project. Your goal is to identify gaps and, if asked, implement the missing bindings.

+**IMPORTANT: The Python API is the source of truth for coverage.** A function or method is considered "exposed" if it exists in the Python API (e.g., `python/datafusion/functions.py`), even if there is no corresponding entry in the Rust bindings. Many upstream functions are aliases of other functions — the Python layer can expose these aliases by calling a different underlying Rust binding. Do NOT report a function as missing if it appears in the Python `__all__` list and has a working implementation, regardless of whether a matching `#[pyfunction]` exists in Rust.
+
 ## Areas to Check

 The user may specify an area via `$ARGUMENTS`. If no area is specified or "all" is given, check all areas.
@@ -24,9 +26,9 @@

 **How to check:**
 1. Fetch the upstream scalar function documentation page
-2. Compare against functions listed in `python/datafusion/functions.py` (check the `__all__` list)
-3. Also check `crates/core/src/functions.rs` for what's registered in `init_module()`
-4. Report functions that exist upstream but are missing from this project
+2. Compare against functions listed in `python/datafusion/functions.py` (check the `__all__` list and function definitions)
+3. A function is covered if it exists in the Python API — it does NOT need a dedicated Rust `#[pyfunction]`. Many functions are aliases that reuse another function's Rust binding.
+4. Only report functions that are missing from the Python `__all__` list / function definitions

 ### 2. Aggregate Functions

@@ -40,8 +42,9 @@

 **How to check:**
 1. Fetch the upstream aggregate function documentation page
-2. Compare against aggregate functions in `python/datafusion/functions.py`
-3. Report missing aggregate functions
+2. Compare against aggregate functions in `python/datafusion/functions.py` (check `__all__` list and function definitions)
+3. A function is covered if it exists in the Python API, even if it aliases another function's Rust binding
+4. Report only functions missing from the Python API

 ### 3. Window Functions

@@ -55,8 +58,9 @@

 **How to check:**
 1. Fetch the upstream window function documentation page
-2. Compare against window functions in `python/datafusion/functions.py`
-3. Report missing window functions
+2. Compare against window functions in `python/datafusion/functions.py` (check `__all__` list and function definitions)
+3. A function is covered if it exists in the Python API, even if it aliases another function's Rust binding
+4. Report only functions missing from the Python API

 ### 4. Table Functions

@@ -70,8 +74,9 @@

 **How to check:**
 1. Fetch the upstream table function documentation
-2. Compare against what's available in this project
-3. Report missing table functions
+2. Compare against what's available in the Python API
+3. A function is covered if it exists in the Python API, even if it aliases another function's Rust binding
+4. Report only functions missing from the Python API

 ### 5. DataFrame Operations

@@ -84,9 +89,9 @@

 **How to check:**
 1. Fetch the upstream DataFrame documentation page listing all methods
-2. Compare against methods in `python/datafusion/dataframe.py`
-3. Also check `crates/core/src/dataframe.rs` for what's implemented
-4. Report DataFrame methods that exist upstream but are missing
+2. Compare against methods in `python/datafusion/dataframe.py` — this is the source of truth for coverage
+3. The Rust bindings (`crates/core/src/dataframe.rs`) may be consulted for context, but a method is covered if it exists in the Python API
+4. Report only methods missing from the Python API

 ### 6. SessionContext Methods

@@ -99,9 +104,9 @@

 **How to check:**
 1. Fetch the upstream SessionContext documentation page listing all methods
-2. Compare against methods in `python/datafusion/context.py`
-3. Also check `crates/core/src/context.rs` for what's implemented
-4. Report SessionContext methods that exist upstream but are missing
+2. Compare against methods in `python/datafusion/context.py` — this is the source of truth for coverage
+3. The Rust bindings (`crates/core/src/context.rs`) may be consulted for context, but a method is covered if it exists in the Python API
+4. Report only methods missing from the Python API

 ### 7. FFI Types (datafusion-ffi)

From 0a82d67a2532f6634216810450e52bb6319b6502 Mon Sep 17 00:00:00 2001
From: Tim Saucer
Date: Tue, 31 Mar 2026 14:18:56 -0400
Subject: [PATCH 08/14] Add exclusion list for DataFrame methods already covered by Python API

show_limit is covered by DataFrame.show() and with_param_values is covered by SessionContext.sql(param_values=...), so neither needs separate exposure.

Co-Authored-By: Claude Opus 4.6 (1M context)
---
 .claude/skills/check-upstream/SKILL.md | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/.claude/skills/check-upstream/SKILL.md b/.claude/skills/check-upstream/SKILL.md
index 777120e8b..95fa87663 100644
--- a/.claude/skills/check-upstream/SKILL.md
+++ b/.claude/skills/check-upstream/SKILL.md
@@ -87,11 +87,16 @@ The user may specify an area via `$ARGUMENTS`. If no area is specified or "all"
 - Python API: `python/datafusion/dataframe.py` — the `DataFrame` class
 - Rust bindings: `crates/core/src/dataframe.rs` — `PyDataFrame` with `#[pymethods]`

+**Evaluated and not requiring separate Python exposure:**
+- `show_limit` — already covered by `DataFrame.show()`, which provides the same functionality with a simpler API
+- `with_param_values` — already covered by the `param_values` argument on `SessionContext.sql()`, which accomplishes the same thing more robustly
+
 **How to check:**
 1. Fetch the upstream DataFrame documentation page listing all methods
 2. Compare against methods in `python/datafusion/dataframe.py` — this is the source of truth for coverage
 3. The Rust bindings (`crates/core/src/dataframe.rs`) may be consulted for context, but a method is covered if it exists in the Python API
-4. Report only methods missing from the Python API
+4. Check against the "evaluated and not requiring exposure" list before flagging as a gap
+5. Report only methods missing from the Python API

 ### 6. SessionContext Methods

From c9d6d98c06c288d08d95f61cef885d835cb3b517 Mon Sep 17 00:00:00 2001
From: Tim Saucer
Date: Wed, 1 Apr 2026 12:30:40 -0400
Subject: [PATCH 09/14] Move skills to .ai/skills/ for tool-agnostic discoverability

Moves the canonical skill definitions from .claude/skills/ to .ai/skills/ and replaces .claude/skills with a symlink, so Claude Code still discovers them while other AI agents can find them in a tool-neutral location.

Co-Authored-By: Claude Opus 4.6 (1M context)
---
 {.claude => .ai}/skills/check-upstream/SKILL.md | 0
 .claude/skills                                  | 1 +
 2 files changed, 1 insertion(+)
 rename {.claude => .ai}/skills/check-upstream/SKILL.md (100%)
 create mode 120000 .claude/skills

diff --git a/.claude/skills/check-upstream/SKILL.md b/.ai/skills/check-upstream/SKILL.md
similarity index 100%
rename from .claude/skills/check-upstream/SKILL.md
rename to .ai/skills/check-upstream/SKILL.md
diff --git a/.claude/skills b/.claude/skills
new file mode 120000
index 000000000..6838a1160
--- /dev/null
+++ b/.claude/skills
@@ -0,0 +1 @@
+../.ai/skills
\ No newline at end of file

From b237a647bcc6ae65527139dcf5fad287ed48e20a Mon Sep 17 00:00:00 2001
From: Tim Saucer
Date: Wed, 1 Apr 2026 12:33:16 -0400
Subject: [PATCH 10/14] Add AGENTS.md for tool-agnostic agent instructions with CLAUDE.md symlink

AGENTS.md points agents to .ai/skills/ for skill discovery. CLAUDE.md symlinks to it so Claude Code picks it up as project instructions.

Co-Authored-By: Claude Opus 4.6 (1M context)
---
 AGENTS.md | 8 ++++++++
 CLAUDE.md | 1 +
 2 files changed, 9 insertions(+)
 create mode 100644 AGENTS.md
 create mode 120000 CLAUDE.md

diff --git a/AGENTS.md b/AGENTS.md
new file mode 100644
index 000000000..0c4f707ae
--- /dev/null
+++ b/AGENTS.md
@@ -0,0 +1,8 @@
+# Agent Instructions
+
+This project uses AI agent skills stored in `.ai/skills/`. Each skill is a directory containing a `SKILL.md` file with instructions for performing a specific task.
+
+Skills follow the [Agent Skills](https://agentskills.io) open standard. Each skill directory contains:
+
+- `SKILL.md` — The skill definition with YAML frontmatter (name, description, argument-hint) and detailed instructions.
+- Additional supporting files as needed.
diff --git a/CLAUDE.md b/CLAUDE.md
new file mode 120000
index 000000000..47dc3e3d8
--- /dev/null
+++ b/CLAUDE.md
@@ -0,0 +1 @@
+AGENTS.md
\ No newline at end of file

From eeb23d76e3b638986e49d9616005e6789f23c508 Mon Sep 17 00:00:00 2001
From: Tim Saucer
Date: Wed, 1 Apr 2026 12:35:43 -0400
Subject: [PATCH 11/14] Make README upstream coverage section tool-agnostic

Remove Claude Code references and update skill path from .claude/skills/ to .ai/skills/ to match the new tool-neutral directory structure.

Co-Authored-By: Claude Opus 4.6 (1M context)
---
 README.md | 24 ++++++++++++------------
 1 file changed, 12 insertions(+), 12 deletions(-)

diff --git a/README.md b/README.md
index b9f17495e..7c1c71281 100644
--- a/README.md
+++ b/README.md
@@ -314,30 +314,30 @@ There are scripts in `ci/scripts` for running Rust and Python linters.

 ## Checking Upstream DataFusion Coverage

-This project includes a [Claude Code](https://claude.com/claude-code) skill for auditing which
+This project includes an [AI agent skill](.ai/skills/check-upstream/SKILL.md) for auditing which
 features from the upstream Apache DataFusion Rust library are not yet exposed in these Python
 bindings. This is useful when adding missing functions, auditing API coverage, or ensuring parity
 with upstream.

-To use it, run the `/check-upstream` slash command inside Claude Code with an optional area argument:
+The skill accepts an optional area argument:

 ```
-/check-upstream scalar functions
-/check-upstream aggregate functions
-/check-upstream window functions
-/check-upstream dataframe
-/check-upstream session context
-/check-upstream all
+scalar functions
+aggregate functions
+window functions
+dataframe
+session context
+ffi types
+all
 ```

 If no argument is provided, it defaults to checking all areas.

 The skill will fetch the upstream DataFusion documentation, compare it against the functions and methods
 exposed in this project, and produce a coverage report listing what is currently exposed and what is missing.

-The skill definition lives in `.claude/skills/check-upstream/SKILL.md`. Note that the `/check-upstream`
-slash command is a [Claude Code](https://claude.com/claude-code) feature, but the underlying
-methodology described in the skill file can be followed manually or by another AI coding agent if
-directed to read and follow the instructions in that file.
+The skill definition lives in `.ai/skills/check-upstream/SKILL.md` and follows the
+[Agent Skills](https://agentskills.io) open standard. It can be used by any AI coding agent that
+supports skill discovery, or followed manually.

 ## How to update dependencies

From fd1b6f475a597e6508e8ef7befac5697b9ec160c Mon Sep 17 00:00:00 2001
From: Tim Saucer
Date: Wed, 1 Apr 2026 12:37:54 -0400
Subject: [PATCH 12/14] Add GitHub issue lookup step to check-upstream skill

When gaps are identified, search open issues at apache/datafusion-python before reporting. Existing issues are linked in the report rather than duplicated.

Co-Authored-By: Claude Opus 4.6 (1M context)
---
 .ai/skills/check-upstream/SKILL.md | 11 +++++++++--
 1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/.ai/skills/check-upstream/SKILL.md b/.ai/skills/check-upstream/SKILL.md
index 95fa87663..c2b6d8f2b 100644
--- a/.ai/skills/check-upstream/SKILL.md
+++ b/.ai/skills/check-upstream/SKILL.md
@@ -153,6 +153,13 @@ These upstream FFI types have been reviewed and do not need to be independently
 - FFI example in `examples/datafusion-ffi-example/`
 - Type appears in union type hints where accepted

+## Checking for Existing GitHub Issues
+
+After identifying missing APIs, search the open issues at https://github.com/apache/datafusion-python/issues for each gap to see if an issue already exists requesting that API be exposed. Search using the function or method name as the query.
+
+- If an existing issue is found, include a link to it in the report. Do NOT create a new issue.
+- If no existing issue is found, note that no issue exists yet.
+
 ## Output Format

 For each area checked, produce a report like:
@@ -164,8 +171,8 @@
 - list of what's already available

 ### Missing from Upstream (Y functions/methods)
-- function_name — brief description of what it does
-- function_name — brief description of what it does
+- function_name — brief description of what it does (existing issue: #123)
+- function_name — brief description of what it does (no existing issue)

 ### Notes
 - Any relevant observations about partial implementations, naming differences, etc.

From d37b3888af52e948c7ba5cbff486fe30fb0c9791 Mon Sep 17 00:00:00 2001
From: Tim Saucer
Date: Wed, 1 Apr 2026 12:38:51 -0400
Subject: [PATCH 13/14] Require Python test coverage in issues created by check-upstream skill

Co-Authored-By: Claude Opus 4.6 (1M context)
---
 .ai/skills/check-upstream/SKILL.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/.ai/skills/check-upstream/SKILL.md b/.ai/skills/check-upstream/SKILL.md
index c2b6d8f2b..cbfbf659b 100644
--- a/.ai/skills/check-upstream/SKILL.md
+++ b/.ai/skills/check-upstream/SKILL.md
@@ -158,7 +158,7 @@ These upstream FFI types have been reviewed and do not need to be independently
 After identifying missing APIs, search the open issues at https://github.com/apache/datafusion-python/issues for each gap to see if an issue already exists requesting that API be exposed. Search using the function or method name as the query.

 - If an existing issue is found, include a link to it in the report. Do NOT create a new issue.
-- If no existing issue is found, note that no issue exists yet.
+- If no existing issue is found, note that no issue exists yet. If the user asks to create issues for missing APIs, each issue should specify that Python test coverage is required as part of the implementation.

 ## Output Format

From 31a9a1b95e23edbeac0eacdf21cba16888beb7be Mon Sep 17 00:00:00 2001
From: Tim Saucer
Date: Wed, 1 Apr 2026 14:25:27 -0400
Subject: [PATCH 14/14] Add license text

---
 .ai/skills/check-upstream/SKILL.md | 19 +++++++++++++++++++
 AGENTS.md                          | 19 +++++++++++++++++++
 2 files changed, 38 insertions(+)

diff --git a/.ai/skills/check-upstream/SKILL.md b/.ai/skills/check-upstream/SKILL.md
index cbfbf659b..f77210371 100644
--- a/.ai/skills/check-upstream/SKILL.md
+++ b/.ai/skills/check-upstream/SKILL.md
@@ -1,3 +1,22 @@
+<!---
+  Licensed to the Apache Software Foundation (ASF) under one
+  or more contributor license agreements.  See the NOTICE file
+  distributed with this work for additional information
+  regarding copyright ownership.  The ASF licenses this file
+  to you under the Apache License, Version 2.0 (the
+  "License"); you may not use this file except in compliance
+  with the License.  You may obtain a copy of the License at
+
+    http://www.apache.org/licenses/LICENSE-2.0
+
+  Unless required by applicable law or agreed to in writing,
+  software distributed under the License is distributed on an
+  "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+  KIND, either express or implied.  See the License for the
+  specific language governing permissions and limitations
+  under the License.
+-->
+
 ---
 name: check-upstream
 description: Check if upstream Apache DataFusion features (functions, DataFrame ops, SessionContext methods, FFI types) are exposed in this Python project. Use when adding missing functions, auditing API coverage, or ensuring parity with upstream.
diff --git a/AGENTS.md b/AGENTS.md
index 0c4f707ae..1853a84cd 100644
--- a/AGENTS.md
+++ b/AGENTS.md
@@ -1,3 +1,22 @@
+<!---
+  Licensed to the Apache Software Foundation (ASF) under one
+  or more contributor license agreements.  See the NOTICE file
+  distributed with this work for additional information
+  regarding copyright ownership.  The ASF licenses this file
+  to you under the Apache License, Version 2.0 (the
+  "License"); you may not use this file except in compliance
+  with the License.  You may obtain a copy of the License at
+
+    http://www.apache.org/licenses/LICENSE-2.0
+
+  Unless required by applicable law or agreed to in writing,
+  software distributed under the License is distributed on an
+  "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+  KIND, either express or implied.  See the License for the
+  specific language governing permissions and limitations
+  under the License.
+-->
+
 # Agent Instructions

 This project uses AI agent skills stored in `.ai/skills/`. Each skill is a directory containing a `SKILL.md` file with instructions for performing a specific task.