Draft
17 commits
cd70929
feat(waterdata): Add multi-value GET-parameter chunker for OGC API
thodson-usgs May 17, 2026
46335b6
fix(waterdata): Reject smuggled lists for scalar-contract chunker inputs
thodson-usgs May 18, 2026
9bc342b
refactor(waterdata): Unify list and filter chunkers into one joint pl…
thodson-usgs May 18, 2026
10858e9
refactor(waterdata): Share a single URL-byte sizing primitive across …
thodson-usgs May 18, 2026
4e82722
refactor(waterdata): Tighten the joint chunker
thodson-usgs May 18, 2026
ee550be
refactor(waterdata): Polish — extract _resolve_max_chunks, tidy iter_…
thodson-usgs May 18, 2026
f1588ae
docs(waterdata): Frame _NEVER_CHUNK as exceptions to a default-chunk …
thodson-usgs May 18, 2026
493e4eb
test(waterdata): Add offline stress test for the joint chunker
thodson-usgs May 18, 2026
1348304
perf(test): Cut stress test wall-clock 55% — capture URL bytes inline…
thodson-usgs May 18, 2026
f16555d
refactor(waterdata): Address PR #283 review — relocate chunker helper…
thodson-usgs May 18, 2026
eeba277
docs(tests): Drop stale "two-decorator design" references in test prose
thodson-usgs May 18, 2026
01e579e
refactor(waterdata): Replace static max_chunks/safety_floor with dyna…
thodson-usgs May 19, 2026
592c207
test(waterdata): Split chunker tests into tests/waterdata_chunking_te…
thodson-usgs May 19, 2026
f615db8
test(waterdata): Drop tests/stress_chunker.py — invariants now covere…
thodson-usgs May 19, 2026
c475452
refactor(waterdata): /simplify pass — typed RateLimited exception, dr…
thodson-usgs May 19, 2026
24fd158
refactor(waterdata): Extract ChunkPlan + _ChunkExecution; unify passt…
thodson-usgs May 19, 2026
5d931fa
refactor(waterdata): /simplify pass on ChunkPlan — skip work on the p…
thodson-usgs May 19, 2026
2 changes: 2 additions & 0 deletions NEWS.md
@@ -1,3 +1,5 @@
**05/17/2026:** The OGC `waterdata` getters (`get_daily`, `get_continuous`, `get_field_measurements`, and the rest of the multi-value-capable functions) now transparently chunk requests whose URLs would otherwise exceed the server's ~8 KB byte limit. A common chained-query pattern — pull a long site list from `get_monitoring_locations`, then feed it into `get_daily` — previously failed with HTTP 414 once the resulting URL grew past the limit; it now fans out across multiple sub-requests under the hood and returns one combined DataFrame. The chunker coordinates with the existing CQL `filter` chunker (long top-level-`OR` filters still split correctly when used alongside long multi-value lists), caps cartesian-product plans at 1000 sub-requests (the default USGS hourly quota), and aborts mid-call with a structured `QuotaExhausted` exception — carrying the partial result and a resume offset — if `x-ratelimit-remaining` drops below a safety floor. Mirrors R `dataRetrieval`'s [#870](https://github.com/DOI-USGS/dataRetrieval/pull/870), generalized to N dimensions. Note one metadata-behavior change for paginated/chunked calls: `BaseMetadata.url` still reflects the user's original query (unchanged), but `BaseMetadata.header` now carries the *last* page's / sub-request's headers (so `x-ratelimit-remaining` is current) rather than the first, and `BaseMetadata.query_time` is now the cumulative wall-clock across pages instead of the first page's elapsed.
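
As a rough illustration of the URL-size chunking this entry describes, the sketch below greedily splits one multi-value parameter so that each request URL stays under a byte budget. It is not the library's implementation: `build_url`, `chunk_values`, the `example.invalid` endpoint, and the exact 8000-byte budget are all assumptions made for the example (the entry only says "~8 KB"), and it covers a single dimension rather than the joint list-plus-filter plan.

```python
from urllib.parse import urlencode

URL_BYTE_LIMIT = 8000  # assumed budget; the entry only states "~8 KB"


def build_url(base_url, params):
    """Return the full GET URL for `params` (hypothetical helper)."""
    return f"{base_url}?{urlencode(params)}"


def chunk_values(base_url, param, values, fixed_params=None, limit=URL_BYTE_LIMIT):
    """Greedily split `values` so every comma-joined URL fits in `limit` bytes.

    A single value that is too long on its own still gets its own chunk,
    because it cannot be split further.
    """
    fixed = dict(fixed_params or {})
    chunks, current = [], []
    for value in values:
        candidate = current + [value]
        url = build_url(base_url, {**fixed, param: ",".join(candidate)})
        if current and len(url.encode("utf-8")) > limit:
            # Adding this value would overflow the budget: close the chunk.
            chunks.append(current)
            current = [value]
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


# Placeholder endpoint and made-up site IDs, for illustration only.
BASE = "https://example.invalid/collections/daily/items"
sites = [f"USGS-{i:08d}" for i in range(2000)]
chunks = chunk_values(BASE, "monitoring_location_id", sites, {"parameter_code": "00060"})
print(f"{len(sites)} sites split into {len(chunks)} sub-requests")
```

Each chunk would then be sent as its own sub-request and the resulting frames concatenated. Measuring the encoded URL, rather than counting values, matters because the comma separator grows to `%2C` once percent-encoded.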

**05/16/2026:** Fixed silent truncation in the paginated `waterdata` request loops (`_walk_pages` and `get_stats_data`). Mid-pagination failures (HTTP 429, 5xx, network error) were previously swallowed — pagination would quietly stop and the function would return whatever rows it had collected, leaving callers with truncated DataFrames they had no way to detect. The loops now status-check every page like the initial request and raise `RuntimeError` on any failure, with the upstream exception chained as `__cause__` and a short menu of recovery actions (wait and retry, reduce the request, or obtain an API token) in the message. **Behavior change**: callers that previously consumed partial DataFrames on transient upstream blips will now see an exception; retry the call (possibly with a smaller `limit` or narrower query).

**05/07/2026:** Bumped the declared minimum Python version from **3.8** to **3.9** (`pyproject.toml`'s `requires-python` and the ruff target). This brings the manifest in line with what was already being tested — CI's matrix has long covered only 3.9, 3.13, and 3.14, the `waterdata` test module already skipped itself on Python < 3.10, and several modules already use 3.9-only stdlib (e.g. `zoneinfo`). Users on 3.8 will no longer be able to install the package; please upgrade.
15 changes: 15 additions & 0 deletions dataretrieval/waterdata/api.py
@@ -230,6 +230,21 @@ def get_daily(
... parameter_code="00060",
... last_modified="P7D",
... )

>>> # Chain queries: pull all stream sites in a state, then their
>>> # daily discharge for the last week. The site list can be hundreds
>>> # of values long — the request is transparently chunked across
>>> # multiple sub-requests so the URL stays under the server's byte
>>> # limit. Combined output looks like a single query.
>>> sites_df, _ = dataretrieval.waterdata.get_monitoring_locations(
... state_name="Ohio",
... site_type="Stream",
... )
>>> df, md = dataretrieval.waterdata.get_daily(
... monitoring_location_id=sites_df["monitoring_location_id"].tolist(),
... parameter_code="00060",
... time="P7D",
... )
"""
service = "daily"
output_id = "daily_id"