Skip to content

Comments

Add expandForValueSet to CodeSystemProvider for SQL-backed expansion#131

Open
jmandel wants to merge 2 commits intoHealthIntersections:mainfrom
jmandel:incremental-provider-enhancements
Open

Add expandForValueSet to CodeSystemProvider for SQL-backed expansion#131
jmandel wants to merge 2 commits intoHealthIntersections:mainfrom
jmandel:incremental-provider-enhancements

Conversation

@jmandel
Copy link
Contributor

@jmandel jmandel commented Feb 21, 2026

Summary

Adds one optional method to the CodeSystemProvider interface — expandForValueSet(spec) — that lets SQL-backed providers handle ValueSet expansion in a single query instead of the per-code iterator loop.

The worker groups compose includes/excludes by code system and passes the full hull (includes, excludes, activeOnly, paging) to the provider. Providers that implement this method return an async iterable of fully-resolved entries; the worker feeds them through includeCode() for dedup, import filtering, and FHIR construction. Providers that don't implement it (e.g., SNOMED) are completely unaffected — the existing iterator-oracle path runs unchanged.

What changed

Interface (tx/cs/cs-api.js)

  • New optional expandForValueSet(spec) method with JSDoc contract
  • Returns AsyncIterable<ExpandedEntry> or null (fall back to existing path)

Worker (tx/workers/expand.js)

  • Groups includes/excludes by code system before processing
  • Calls expandForValueSet first; marks handled systems to skip framework's manual include/exclude loops
  • Paging offset/count only passed for single-system composes (multi-system gets null — framework applies paging in finalization)

RxNorm (tx/cs/cs-rxnorm.js)

  • expandForValueSet using better-sqlite3 lazy cursors
  • TTY filter → WHERE IN, STY filter → JOIN rxnsty, concepts → WHERE RXCUI IN
  • GROUP BY RXCUI when JOINs cause row multiplication
  • Excludes, activeOnly, searchText pushed into SQL

LOINC (tx/cs/cs-loinc.js)

  • expandForValueSet using better-sqlite3 lazy cursors
  • Relationship filters → JOIN on Relationships table
  • STATUS → direct WHERE, CLASSTYPE → JOIN through Properties+PropertyValues
  • LIST/answers-for → reversed relationship JOIN
  • Multiple includes → UNION ALL with GROUP BY dedup

Other

  • Perf counters (tx/perf-counters.js) for timing expandForValueSet paths
  • locateMany / filterPage stubs on SNOMED, LOINC (may be useful for future IPC)
  • Bug fixes: searchFilter arg order, validate.js messages, TxParameters.assign paging fields

Benchmark results

RxNorm (13 tests, all pass, median 37x speedup):

Test Opt Base Speedup
filter-tty-sbd 6.8ms 249ms 37x
filter-tty-in-multi 1.4ms 476ms 342x
paged-offset-100 1.3ms 228ms 177x
text-aspirin 1.8ms TIMEOUT
filter-sty-t200 217ms 1522ms 7x

LOINC (14 tests, all pass, median 25x speedup):

Test Opt Base Speedup
filter-status-active 50ms 5325ms 106x
filter-class-chem 14ms 941ms 68x
filter-scale-qn 51ms 2611ms 51x
filter-classtype-lab 82ms 2021ms 25x

LOINC jest: 37/37 pass. Replay: no regressions on captured production queries.

Design doc

See docs/PROVIDER_INTERFACE_EVOLUTION.md for full rationale, SQL strategies, paging contract, and index decisions.

…nsion

Add one optional method to the CodeSystemProvider interface that lets
SQL-backed providers handle ValueSet expansion in a single query with
LIMIT/OFFSET instead of the per-code iterator loop.

The worker groups compose includes/excludes by code system and passes
the full hull to the provider. Providers returning an iterable skip the
framework's manual include/exclude loops entirely. Providers returning
null (e.g., SNOMED) fall back to the existing path unchanged.

RxNorm implementation: TTY/STY filter mapping to SQL, GROUP BY for JOIN
dedup, exclude/activeOnly/searchText push-down via better-sqlite3 cursors.

LOINC implementation: Relationship/Property/Status filter mapping, UNION
per include with GROUP BY dedup, all using existing indexes.

Paging offset/count are only passed for single-system composes where they
are exact. Multi-system composes still get filter/exclude push-down but
the framework handles paging in finalization.

Also includes: eager context loading in RxNorm (eliminates 3x redundant
SQL per code), searchFilter arg order fixes, TxParameters.assign fix,
perf counters, and props-only-when-requested optimization.

Benchmarks: RxNorm 13/13 pass (median 37x), LOINC 14/14 pass (median 25x).
No regressions on captured production query replay.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- scripts/test-expand-for-valueset.js: RxNorm benchmark (13 tests)
- scripts/test-loinc-expand.js: LOINC benchmark (14 tests)
- scripts/replay-sampled-terminology.js: replay test harness
- test-expand-results.txt: RxNorm results (7-341x speedup)
- test-loinc-expand-results.txt: LOINC results (2-106x speedup)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant