Skip to content

Commit 7515ebc

Browse files
timsaucerclaude
andcommitted
fix: address review feedback on pythonic skill and function signatures
Update SKILL.md to prevent three classes of issues: clarify that float already accepts int per PEP 484 (avoiding redundant int | float that fails ruff PYI041), add backward-compat rule for Category B so existing Expr params aren't removed, and add guidance for inline coercion with many optional nullable params instead of local helpers. Replace regexp_instr's _to_raw() helper with inline coercion matching the pattern used throughout the rest of the file. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
1 parent 25e5ce3 commit 7515ebc

File tree

2 files changed

+44
-27
lines changed

2 files changed

+44
-27
lines changed

.ai/skills/make-pythonic/SKILL.md

Lines changed: 32 additions & 16 deletions
Original file line numberDiff line numberDiff line change
@@ -104,6 +104,8 @@ Common mappings:
104104
| `logical_string()` / `TypeSignatureClass::String` | `str` |
105105
| `LogicalType::Boolean` | `bool` |
106106

107+
**Important:** In Python's type system (PEP 484), `float` already accepts `int` values, so `int | float` is redundant and will fail the `ruff` linter (rule PYI041). Use `float` alone when the Rust side accepts a float/numeric type — Python users can still pass integer literals like `log(10, col("a"))` or `power(col("a"), 3)` without issue. Only use `int` when the Rust side strictly requires an integer (e.g., `logical_int64()`).
108+
107109
#### Technique 3: Check `return_field_from_args()` for `scalar_arguments` usage
108110

109111
Functions that inspect literal values at query planning time use `args.scalar_arguments.get(n)` in their `return_field_from_args()` method. This indicates the argument is **expected to be a literal** for optimal behavior (e.g., to determine output type precision), but may still work as a column.
@@ -166,31 +168,30 @@ These are arguments where an `Expr` never makes sense because the value must be
166168

167169
**Type hint pattern:** `str`, `int`, `list[str]`, etc. (no `Expr` in the union)
168170

169-
**When to use:** When the argument is from a fixed enumeration or is always a compile-time constant:
170-
- Date/time part names (`"year"`, `"month"`, `"day"`, etc.)
171-
- Regex flags (`"g"`, `"i"`, etc.)
171+
**When to use:** When the argument is from a fixed enumeration or is always a compile-time constant, **AND** the parameter was not previously typed as `Expr`:
172+
- Separator in `concat_ws` (already typed as `str` in the Rust binding)
173+
- Index in `array_position` (already typed as `int` in the Rust binding)
172174
- Values that the Rust implementation already accepts as native types
173175

176+
**Backward compatibility rule:** If a parameter was previously typed as `Expr`, you **must** keep `Expr` in the union even if the Rust side requires a literal. Removing `Expr` would break existing user code like `date_part(lit("year"), col("a"))`. Use **Category A** instead — accept `Expr | str` — and let users who pass column expressions discover the runtime error from the Rust side. Never silently break backward compatibility.
177+
174178
```python
175-
def date_part(part: str, date: Expr) -> Expr:
176-
"""Extracts a subfield from the date.
179+
def concat_ws(separator: str, *args: Expr) -> Expr:
180+
"""Concatenates the list ``args`` with the separator.
177181
178-
Args:
179-
part: The part of the date to extract. Must be one of "year", "month",
180-
"day", "hour", "minute", "second", etc.
181-
date: The date expression to extract from.
182+
``separator`` is already typed as ``str`` in the Rust binding, so
183+
there is no backward-compatibility concern.
182184
183185
Examples:
184186
>>> ctx = dfn.SessionContext()
185-
>>> df = ctx.from_pydict({"a": ["2021-07-15T00:00:00"]})
186-
>>> df = df.select(dfn.functions.to_timestamp(dfn.col("a")).alias("a"))
187+
>>> df = ctx.from_pydict({"a": ["hello"], "b": ["world"]})
187188
>>> result = df.select(
188-
... dfn.functions.date_part("year", dfn.col("a")).alias("y"))
189-
>>> result.collect_column("y")[0].as_py()
190-
2021
189+
... dfn.functions.concat_ws("-", dfn.col("a"), dfn.col("b")).alias("c"))
190+
>>> result.collect_column("c")[0].as_py()
191+
'hello-world'
191192
"""
192-
part = Expr.literal(part)
193-
return Expr(f.date_part(part.expr, date.expr))
193+
args = [arg.expr for arg in args]
194+
return Expr(f.concat_ws(separator, args))
194195
```
195196

196197
### Category C: Arguments That Should Accept str as Column Name
@@ -269,6 +270,21 @@ if not isinstance(arg, Expr):
269270

270271
Do NOT create a new helper function for this — the inline check is clear and explicit. The project intentionally uses `ensure_expr()` to reject non-Expr values in contexts where coercion is not wanted; the pythonic coercion is the opposite pattern and should be visually distinct.
271272

273+
**Functions with many optional nullable parameters:** For parameters typed as `Expr | <type> | None`, combine the `None` check with the `isinstance` check inline. Repeat this for each parameter — do not factor it into a local helper function, even if the repetition feels verbose. Consistency across the file is more important than DRY within a single function.
274+
275+
```python
276+
# For each optional parameter:
277+
if start is not None and not isinstance(start, Expr):
278+
start = Expr.literal(start)
279+
280+
# When passing to the Rust binding, extract .expr or pass None:
281+
f.some_func(
282+
values.expr,
283+
start.expr if start is not None else None,
284+
n.expr if n is not None else None,
285+
)
286+
```
287+
272288
## What NOT to Change
273289

274290
- **Do not change arguments that represent data columns.** If an argument is the primary data being operated on (e.g., the `string` in `left(string, n)` or the `array` in `array_sort(array)`), it should remain `Expr` only. Users should use `col()` for column references.

python/datafusion/functions.py

Lines changed: 12 additions & 11 deletions
Original file line numberDiff line numberDiff line change
@@ -1686,22 +1686,23 @@ def regexp_instr(
16861686
"""
16871687
if not isinstance(regex, Expr):
16881688
regex = Expr.literal(regex)
1689-
1690-
def _to_raw(val: Any) -> Any:
1691-
if val is None:
1692-
return None
1693-
if not isinstance(val, Expr):
1694-
val = Expr.literal(val)
1695-
return val.expr
1689+
if start is not None and not isinstance(start, Expr):
1690+
start = Expr.literal(start)
1691+
if n is not None and not isinstance(n, Expr):
1692+
n = Expr.literal(n)
1693+
if flags is not None and not isinstance(flags, Expr):
1694+
flags = Expr.literal(flags)
1695+
if sub_expr is not None and not isinstance(sub_expr, Expr):
1696+
sub_expr = Expr.literal(sub_expr)
16961697

16971698
return Expr(
16981699
f.regexp_instr(
16991700
values.expr,
17001701
regex.expr,
1701-
_to_raw(start),
1702-
_to_raw(n),
1703-
_to_raw(flags),
1704-
_to_raw(sub_expr),
1702+
start.expr if start is not None else None,
1703+
n.expr if n is not None else None,
1704+
flags.expr if flags is not None else None,
1705+
sub_expr.expr if sub_expr is not None else None,
17051706
)
17061707
)
17071708

0 commit comments

Comments
 (0)