Deprecate positional integer indexing on user-facing name parameters? #650
ecomodeller
started this conversation in
Ideas
Replies: 0 comments
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Uh oh!
There was an error while loading. Please reload this page.
-
TL;DR — Many ModelSkill APIs (
item=,model=,observation=,x_item=,y_item=,z_item=, …) acceptint | str | None, where the int is a positional index into an underlying list of valid names. This shorthand is fragile (silent miscorrection on reorder, hidden positional defaults likex_item=0, collisions with numeric station IDs) and the historical ergonomic justification has weakened in the AI-assisted-coding era. This post proposes deprecating int positional access on public name parameters in favour of strings only, in a staged migration. Open for discussion before promoting to an ADR.Context
The shared resolver lives at
modelskill._names.get_name/get_idxand handlesint | str | None:PointObservation(data, item=),TrackObservation(item=, x_item=0, y_item=1),VerticalObservation(item=, z_item=0)PointModelResult(item=),TrackModelResult(item=, x_item=, y_item=), ...Comparer.from_matched(obs_item=, mod_items=, aux_items=, x_item=, y_item=, z_item=)ComparerCollection.sel(model=, observation=, variable=)Comparer.skill(model=),ComparerCollection.__getitem__(x)Iterable[str | int]variants on the same APIsThe original justification was REPL ergonomics:
obs = PointObservation("file.dfs0", item=0)is shorter thanitem="WaterLevel", andcc.skill(model=0)is shorter thancc.skill(model="MIKE21-2024-run-A"). The shorthand has carried real cost:model=0point at a different model with no error. String keys raiseKeyErrorwith the available names — the failure is loud and self-explaining.get_idxsupportsx < 0wrap-around. There is no scenario whereitem=-1is clearer than naming the column.TrackObservation(x_item=0, y_item=1)assumes "x is column 0, y is column 1." Works for many dfs0/csv files produced in-house, breaks silently on any file that violates the convention. The default is a hidden contract."42"collide with the int branch — users sometimes pass42meaning the station and silently get the 43rd entry.int | str | None, three branches of validation, and tests for each. The cost is paid forever in the API contract.The AI-coding angle
In 2026 most new ModelSkill usage comes from code written with AI assistance — Copilot/Cursor in editors, Claude Code / Cursor agents in terminals, AI extensions in Jupyter. This reframes the trade-off:
Arguments that AI tooling makes positional indexing more attractive:
item=0instantly; users don't need to know the name.cc.mod_names) or by running code, so they can resolve names on demand.Arguments that AI tooling makes positional indexing more dangerous — and which I find more weighty:
model=0confidently because the pattern is everywhere in their training data. They will writemodel=0even when the surrounding code makes it brittle. A human typingmodel=0usually just printedcc.mod_namesin the cell above; an AI agent often has not.model=0, they made one deliberate keystroke. When an agent generates a 20-line script, no one scrutinises every integer. Fragile defaults compound.KeyError: "ModlA". Available: ['ModelA', 'ModelB']is recoverable on the next iteration — the agent can pick the right name and retry.model=0silently picking the wrong model produces incorrect skill numbers with no error, so neither the agent nor the user has a signal to fix on. String-only APIs convert a class of silent bugs into noisy ones.cc.mod_namesand copy the right string in. That capability removes the historical motivation for positional access ("user doesn't know the name yet"). The agent now does know.model: strparameter narrows their suggestion space; amodel: int | str | Nonewidens it.Net read: the ergonomic win of positional indexing is smaller in 2026 (autocomplete and AI suggestions remove the typing cost of long string names) while the fragility cost is larger (AI generates more code than humans review, and silent miscorrection is worst-case in an agentic feedback loop).
Proposal
Deprecate
intin user-facing name parameters; keepstr-only on the public API. The internal name → index lookup (string in, int out, used to index into xarray/numpy arrays) is unaffected.Two API categories, two slightly different stories:
Category A — collection-side parameters (
ComparerCollection.sel,Comparer.skill,ComparerCollection.__getitem__, etc.). Names are programmatically constructed inside ModelSkill from filenames or explicitname=kwargs. The user knows or can trivially list the names. Deprecating int here is pure win — no real ergonomic loss.Category B — item-on-source parameters (
PointObservation(item=),TrackModelResult(x_item=, y_item=), etc.). These pick a column out of an external file. The historical defaultx_item=0, y_item=1encodes a file convention. Deprecating int here is more invasive, but the defaults are the worst offenders and the deprecation should target them first.Scope
inton Category A immediately (one minor release of warnings, then removal in the next).x_item=0,y_item=1,z_item=0) on Category B in the same window; require explicit named columns.inton Category B item parameters on a longer timeline (two minor releases of warnings) because more existing notebooks rely on it.Nonesemantics ("auto-select when only one exists") is out of scope — separate fragility, different decision.str(most of them) need no change. Internal callers that genuinely need index output (get_idx(name, names)used to index into a numpy array) keep working; the function becomesstr → intafter the deprecation.Migration plan
DeprecationWarningwhen an int is passed to any user-facing name parameter. Warning text names the string alternative and lists the valid names. Defaultsx_item=0, y_item=1, z_item=0warn if the user did not pass them explicitly.intfrom public signatures._names.get_namestops acceptingintfrom Category A call sites; signatures becomestr | None.intfromitemparameters. Constructors require named columns. The positional-default behaviour is removed._names.get_name/get_idxcollapse tostr-only. The two functions become a thin existence check, possibly a single function.Each step is independent and can be staged.
Alternatives considered
Keep current behaviour and improve error messages. Lower-cost but does not fix the silent-miscorrection failure mode, which is the largest concrete hazard in an AI-assisted workflow.
Split into
.iloc-style methods (e.g.,cc.iget(0)alongsidecc["name"]). Pandas does this. Rejected as adding API surface without removing the underlying fragility — users would still reach for.igetfor the same fragile reasons, and the conflation in a single parameter is the actual smell.Runtime warnings only, no deprecation horizon. Half-measure. Either int positional is supported (in which case it should be a first-class option, documented, with stable semantics) or it is not (in which case warnings should have a removal date). A permanent warning is the worst of both.
Restrict deprecation to Category A only. Tempting because Category B is where most notebooks live. Rejected because the worst offenders — the implicit positional defaults in
TrackObservation/VerticalObservation— are in Category B, and leaving them in place preserves the failure mode this proposal is trying to address.Trade-offs
int | str | Nonebecomesstr | None. Type checkers, AI suggestions, and human readers all benefit from the tighter contract.KeyErrorwith the list of valid names instead of an off-by-one skill computation.item=0shorthand. In a kernel, AI assistants and tab-completion fill the gap by reading the actual names. In a static script, the names are typically available in nearby code anyway._names.pymodule simplifies. After the deprecation,get_nameandget_idxbecome string-only operations and can be merged or further simplified. Internal use cases that need a string → int conversion remain..locand.iloc. ModelSkill will not. The asymmetry is intentional: pandas operates on opaque user data where positional access is a primary modality; ModelSkill operates on ModelSkill-constructed collections where names are always known to the producer.Open questions for discussion
pandas.read_csv(usecols=[0, 1])?None-as-default behaviour ("auto-select if only one") be revisited at the same time, or kept separate?cc.mod_namesbefore writing the index?Beta Was this translation helpful? Give feedback.
All reactions