Deprecate positional integer indexing on user-facing name parameters? #650

ecomodeller · 2026-05-14T15:36:15Z

ecomodeller
May 14, 2026
Maintainer

TL;DR — Many ModelSkill APIs (item=, model=, observation=, x_item=, y_item=, z_item=, …) accept int | str | None, where the int is a positional index into an underlying list of valid names. This shorthand is fragile (silent miscorrection on reorder, hidden positional defaults like x_item=0, collisions with numeric station IDs) and the historical ergonomic justification has weakened in the AI-assisted-coding era. This post proposes deprecating int positional access on public name parameters in favour of strings only, in a staged migration. Open for discussion before promoting to an ADR.

Context

The shared resolver lives at modelskill._names.get_name / get_idx and handles int | str | None. The affected public parameters include:

PointObservation(data, item=), TrackObservation(item=, x_item=0, y_item=1), VerticalObservation(item=, z_item=0)
PointModelResult(item=), TrackModelResult(item=, x_item=, y_item=), ...
Comparer.from_matched(obs_item=, mod_items=, aux_items=, x_item=, y_item=, z_item=)
ComparerCollection.sel(model=, observation=, variable=)
Comparer.skill(model=), ComparerCollection.__getitem__(x)
Several Iterable[str | int] variants on the same APIs

The original justification was REPL ergonomics: obs = PointObservation("file.dfs0", item=0) is shorter than item="WaterLevel", and cc.skill(model=0) is shorter than cc.skill(model="MIKE21-2024-run-A"). The shorthand has carried real cost:

Silent miscorrection. Reordering the underlying list (a sort, a new model inserted at the front, a dict-iteration change) makes model=0 point at a different model with no error. String keys raise KeyError with the available names — the failure is loud and self-explaining.
Negative indexing. get_idx supports x < 0 wrap-around. There is no scenario where item=-1 is clearer than naming the column.
Positional defaults that encode file conventions. TrackObservation(x_item=0, y_item=1) assumes "x is column 0, y is column 1." This works for many dfs0/csv files produced in-house but breaks silently on any file that violates the convention. The default is a hidden contract.
Numeric station IDs. Observation names like "42" collide with the int branch — users sometimes pass 42 meaning the station and silently get the 43rd entry.
Type-signature noise. Every public method that accepts a name has int | str | None, three branches of validation, and tests for each. The cost is paid forever in the API contract.
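The first and fourth failure modes above can be reproduced with a toy resolver. This is a minimal sketch: resolve_name is a hypothetical stand-in written for this post, not the actual code in modelskill._names, and the model and station names are made up.

```python
from __future__ import annotations

from typing import Sequence


def resolve_name(x: int | str, names: Sequence[str]) -> str:
    """Hypothetical int-or-str resolver (illustration only, not ModelSkill's code)."""
    if isinstance(x, int):
        # Positional branch: negative values wrap around, like list indexing.
        return names[x]
    if x not in names:
        raise KeyError(f"{x!r} not found. Available: {list(names)}")
    return x


models = ["MIKE21-run-A", "MIKE21-run-B"]
before = resolve_name(0, models)

# A new model is inserted at the front; the same call now selects something else.
models_reordered = ["baseline"] + models
after = resolve_name(0, models_reordered)

# The string key is unaffected by the reorder.
by_name = resolve_name("MIKE21-run-A", models_reordered)

# Numeric station IDs: the int 2 is read as a position, not as station "2".
stations = ["10", "2", "0", "1"]
by_int = resolve_name(2, stations)    # third entry in the list
by_str = resolve_name("2", stations)  # the station actually named "2"
```

Neither the reordered lookup nor the integer station lookup raises, which is the point: the caller gets a valid but unintended name back.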

The AI-coding angle

In 2026 most new ModelSkill usage comes from code written with AI assistance — Copilot/Cursor in editors, Claude Code / Cursor agents in terminals, AI extensions in Jupyter. This reframes the trade-off:

Arguments that AI tooling makes positional indexing more attractive:

Autocomplete suggests item=0 instantly; users don't need to know the name.
AI agents can inspect runtime state via the Jupyter kernel namespace (cc.mod_names) or by running code, so they can resolve names on demand.

Arguments that AI tooling makes positional indexing more dangerous — and which I find more weighty:

AI inherits training-prior fragility. Models trained on pandas/numpy/xarray idioms emit model=0 confidently because the pattern is everywhere in their training data. They will write model=0 even when the surrounding code makes it brittle. A human typing model=0 usually just printed cc.mod_names in the cell above; an AI agent often has not.
Auto-generated code escapes manual review at scale. When a human types model=0, they made one deliberate keystroke. When an agent generates a 20-line script, no one scrutinises every integer. Fragile defaults compound.
Silent wrong-answer failures are worst-case in an AI loop. When AI writes wrong code, the next thing it sees is the error. KeyError: "ModlA". Available: ['ModelA', 'ModelB'] is recoverable on the next iteration — the agent can pick the right name and retry. model=0 silently picking the wrong model produces incorrect skill numbers with no error, so neither the agent nor the user has a signal to fix on. String-only APIs convert a class of silent bugs into noisy ones.
AI agents are good at inspecting runtime state — for strings. In a Jupyter kernel an agent can call cc.mod_names and copy the right string in. That capability removes the historical motivation for positional access ("user doesn't know the name yet"). The agent now does know.
Static-type benefits are larger. AI assistants prioritise suggestions consistent with declared types. A model: str parameter narrows their suggestion space; a model: int | str | None widens it.

Net read: the ergonomic win of positional indexing is smaller in 2026 (autocomplete and AI suggestions remove the typing cost of long string names) while the fragility cost is larger (AI generates more code than humans review, and silent miscorrection is worst-case in an agentic feedback loop).
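To make the "loud failure" half of that argument concrete, here is a small sketch of a string-only lookup whose error message carries the valid names. get_name_strict is a hypothetical helper, and the message format is an assumption rather than ModelSkill's actual wording.

```python
from __future__ import annotations

from typing import Sequence


def get_name_strict(name: str, names: Sequence[str]) -> str:
    """Hypothetical string-only lookup that fails loudly with the valid names."""
    if name not in names:
        raise KeyError(f"{name!r} not found. Available: {list(names)}")
    return name


mod_names = ["ModelA", "ModelB"]

try:
    get_name_strict("ModlA", mod_names)
    message = ""
except KeyError as err:
    # The message contains everything needed to retry with a valid name.
    message = err.args[0]

# A human or an agent can read the valid names out of the message and retry.
retry = get_name_strict("ModelA", mod_names)
```

With an integer index there is no equivalent of message: the call succeeds and the only evidence of the mistake is in the skill numbers.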

Proposal

Deprecate int in user-facing name parameters; keep str-only on the public API. The internal name → index lookup (string in, int out, used to index into xarray/numpy arrays) is unaffected.

Two API categories, two slightly different stories:

Category A: collection-side parameters (ComparerCollection.sel, Comparer.skill, ComparerCollection.__getitem__, etc.). Names are programmatically constructed inside ModelSkill from filenames or explicit name= kwargs. The user knows or can trivially list the names. Deprecating int here is a pure win, with no real ergonomic loss.

Category B: item-on-source parameters (PointObservation(item=), TrackModelResult(x_item=, y_item=), etc.). These pick a column out of an external file. The historical default x_item=0, y_item=1 encodes a file convention. Deprecating int here is more invasive, but the defaults are the worst offenders and the deprecation should target them first.

Scope

Deprecate int on Category A immediately (one minor release of warnings, then removal in the next).
Deprecate the int-typed defaults (x_item=0, y_item=1, z_item=0) on Category B in the same window; require explicit named columns.
Deprecate user-passed int on Category B item parameters on a longer timeline (two minor releases of warnings) because more existing notebooks rely on it.
None semantics ("auto-select when only one exists") is out of scope — separate fragility, different decision.
Internal callers that already pass str (most of them) need no change. Internal callers that genuinely need index output (get_idx(name, names) used to index into a numpy array) keep working; the function becomes str → int after the deprecation.
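Warning only when the positional defaults were not passed explicitly needs a way to tell "caller passed 0" apart from "caller passed nothing". One common approach is a sentinel default. The sketch below assumes that approach; resolve_track_items and _UNSET are hypothetical names, not existing ModelSkill code.

```python
from __future__ import annotations

import warnings

_UNSET = object()  # hypothetical sentinel meaning "caller did not pass a value"


def resolve_track_items(columns: list[str], x_item=_UNSET, y_item=_UNSET) -> tuple[str, str]:
    """Sketch: pick x/y columns, warning when the legacy positional defaults apply."""
    if x_item is _UNSET or y_item is _UNSET:
        warnings.warn(
            "Implicit x_item=0, y_item=1 is deprecated; "
            f"pass column names explicitly. Available: {columns}",
            DeprecationWarning,
            stacklevel=2,
        )
    # Fall back to the legacy positions only when nothing was passed.
    x = 0 if x_item is _UNSET else x_item
    y = 1 if y_item is _UNSET else y_item
    x_name = columns[x] if isinstance(x, int) else x
    y_name = columns[y] if isinstance(y, int) else y
    return x_name, y_name


cols = ["lon", "lat", "swh"]
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    implicit = resolve_track_items(cols)                              # warns
    explicit = resolve_track_items(cols, x_item="lon", y_item="lat")  # silent
```

Both calls return the same columns during the warning period; only the implicit one is flagged, so existing notebooks keep running while users are pointed at the explicit form.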

Migration plan

v1.next: emit DeprecationWarning when an int is passed to any user-facing name parameter. Warning text names the string alternative and lists the valid names. Defaults x_item=0, y_item=1, z_item=0 warn if the user did not pass them explicitly.
v1.next+1 (Category A): drop int from public signatures. _names.get_name stops accepting int from Category A call sites; signatures become str | None.
v1.next+2 (Category B): drop int from item parameters. Constructors require named columns. The positional-default behaviour is removed.
Internal cleanup: _names.get_name / get_idx collapse to str-only. The two functions become a thin existence check, possibly a single function.

Each step is independent and can be staged.
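As a rough illustration of the first and last steps, the transition-period resolver and the post-deprecation lookup might look like the following. This is a sketch under assumptions: the function names mirror get_name / get_idx, but the bodies and the warning wording are invented here and are not the current implementation.

```python
from __future__ import annotations

import warnings
from typing import Sequence


def get_name(x: int | str, names: Sequence[str]) -> str:
    """Hypothetical transition-period resolver: ints still work but warn."""
    if isinstance(x, int):
        name = names[x]
        warnings.warn(
            f"Positional index {x} is deprecated; use {name!r} instead. "
            f"Valid names: {list(names)}",
            DeprecationWarning,
            stacklevel=2,
        )
        return name
    if x not in names:
        raise KeyError(f"{x!r} not found. Valid names: {list(names)}")
    return x


def get_idx(name: str, names: Sequence[str]) -> int:
    """Hypothetical post-deprecation shape: string in, int out, for array indexing."""
    if name not in names:
        raise KeyError(f"{name!r} not found. Valid names: {list(names)}")
    return list(names).index(name)


names = ["ModelA", "ModelB"]
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    resolved = get_name(0, names)  # still resolves, but emits a DeprecationWarning

idx = get_idx("ModelB", names)     # internal string -> position lookup is unchanged
```

Because the warning text already contains the replacement string, the user-side fix is a copy and paste from the warning into the call.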

Alternatives considered

Keep current behaviour and improve error messages. Lower-cost but does not fix the silent-miscorrection failure mode, which is the largest concrete hazard in an AI-assisted workflow.

Split into .iloc-style methods (e.g., cc.iget(0) alongside cc["name"]). Pandas does this. Rejected as adding API surface without removing the underlying fragility — users would still reach for .iget for the same fragile reasons, and the conflation in a single parameter is the actual smell.

Runtime warnings only, no deprecation horizon. Half-measure. Either int positional is supported (in which case it should be a first-class option, documented, with stable semantics) or it is not (in which case warnings should have a removal date). A permanent warning is the worst of both.

Restrict deprecation to Category A only. Tempting because Category B is where most notebooks live. Rejected because the worst offenders — the implicit positional defaults in TrackObservation/VerticalObservation — are in Category B, and leaving them in place preserves the failure mode this proposal is trying to address.

Trade-offs

Public signatures shrink. Every int | str | None becomes str | None. Type checkers, AI suggestions, and human readers all benefit from the tighter contract.
Silent miscorrection becomes loud. KeyError with the list of valid names instead of an off-by-one skill computation.
REPL/notebook ergonomic hit, mostly absorbed by tooling. Users lose item=0 shorthand. In a kernel, AI assistants and tab-completion fill the gap by reading the actual names. In a static script, the names are typically available in nearby code anyway.
Real migration cost for downstream notebooks. One minor release of warnings for Category A and two for Category B item parameters give users a window to migrate. The warning message names the fix verbatim, so the migration is mechanical.
The _names.py module simplifies. After the deprecation, get_name and get_idx become string-only operations and can be merged or further simplified. Internal use cases that need a string → int conversion remain.
One source of asymmetry with pandas. Pandas keeps both .loc and .iloc. ModelSkill will not. The asymmetry is intentional: pandas operates on opaque user data where positional access is a primary modality; ModelSkill operates on ModelSkill-constructed collections where names are always known to the producer.

Open questions for discussion

Is the two-category split the right cut, or should Category B item parameters keep int support indefinitely because they're closer in spirit to pandas.read_csv(usecols=[0, 1])?
For Category B, should the deprecation window be longer than two minor releases given the volume of existing notebooks?
Should None-as-default behaviour ("auto-select if only one") be revisited at the same time, or kept separate?
Anyone reading this in an AI-assisted workflow today — does the agent-introspection argument match your experience, or do you see agents reliably calling cc.mod_names before writing the index?
