You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Give records real record_id + version columns (or a BIGSERIAL surrogate for the keyset, plus record_id for lookups), backfilled from the SRN. Today the bare internal id exists only as a substring of the srn PK, which forces three workarounds in the read path and keeps SRNs inside /data/ engine internals against the SRN-sidelining direction.
Motivation
get_record_by_id is a full-table scan. It matches srn LIKE 'urn:osa:%:rec:<id>@%' — the leading wildcard defeats any index — then filters versions in Python. With a real column it becomes an indexed equality + ORDER BY version DESC LIMIT 1. At the 1M-records scale SC-002 targets, this is the dominant cost of every single-record fetch.
Keyset tiebreaks and cursors lean on the SRN.sort=id aliases to the srn column, the route layer has to mirror that aliasing when encoding cursors (the bug fixed in 3f6b434), and cursors carry full SRNs. With real columns the aliasing disappears and cursors carry internal ids only.
Composite (record_id, version) unique pair vs a single BIGSERIAL surrogate for the keyset tiebreak (moot if design: record SRN id format — UUID vs sequential vs short-random #134 picks sequential). The composite is self-describing but makes the keyset 3 columns when the primary sort is something else; the surrogate keeps 2-column keysets and an int cursor component, but adds a column whose only job is ordering. (record_id is needed either way for the by-id lookup.)
Places to look during implementation
This is a non-exhaustive orientation map, not a plan. It reflects a sweep of today's tree (symbol-level references — line numbers drift). Step one of implementation is to redo the sweep (grep -rn 'RecordSRN.parse\|records_table\|:rec:') and diff against this list; trust the grep, not the list.
Schema / migration
server/osa/infrastructure/persistence/tables.py — records_table: srn TEXT PK, no id/version columns today.
server/migrations/ — needs a new Alembic revision with a backfill parsing existing SRNs (':rec:{id}@{version}').
server/osa/domain/ingest/handler/publish_batch.py — bulk-publish caller (likely unaffected if the mapper owns the columns; check).
Read path (unified /data/)
server/osa/infrastructure/data/postgres_data_read_store.py — get_record_by_id (the LIKE scan), _records_sort (sort=id → t.c.srn aliasing, srn tiebreak), _records_row_to_mapping (derives id/version by parsing the srn).
server/osa/application/api/v1/routes/data/_streaming.py — _next_cursor (the srn/tiebreak aliasing added in 3f6b434).
server/osa/domain/data/model/record_summary.py — RecordSummary.flatten (id derived from srn).
Tests that pin current srn-based behavior
server/tests/integration/conftest.py — seed_record (raw INSERT into records).
records.srn remains the PK and the citation/federation identity; /events changefeed and FR-015 response bodies keep carrying SRNs.
record_srn FKs from metadata.* / features.* tables stay on srn (metadata_table.py, feature_table.py).
/data/records/{id}[@{version}] route contract is unchanged — only its execution plan improves.
Sequencing
After #139 merges, and after (or jointly with) the #134 format decision. Roughly one focused PR: migration + write path + read path + tests. If #142 (infrastructure reorganization) lands first, the file paths above shift with its renames (persistence/ → postgres/, infrastructure/data/ → postgres/query/) — another reason to re-grep rather than follow the list.
Summary
Give
recordsrealrecord_id+versioncolumns (or aBIGSERIALsurrogate for the keyset, plusrecord_idfor lookups), backfilled from the SRN. Today the bare internal id exists only as a substring of thesrnPK, which forces three workarounds in the read path and keeps SRNs inside/data/engine internals against the SRN-sidelining direction.Motivation
get_record_by_idis a full-table scan. It matchessrn LIKE 'urn:osa:%:rec:<id>@%'— the leading wildcard defeats any index — then filters versions in Python. With a real column it becomes an indexed equality +ORDER BY version DESC LIMIT 1. At the 1M-records scale SC-002 targets, this is the dominant cost of every single-record fetch.sort=idaliases to thesrncolumn, the route layer has to mirror that aliasing when encoding cursors (the bug fixed in3f6b434), and cursors carry full SRNs. With real columns the aliasing disappears and cursors carry internal ids only.sort=idmay become meaningful — contingent on design: record SRN id format — UUID vs sequential vs short-random #134. Record ids are currently minted as UUID v4 (see design: record SRN id format — UUID vs sequential vs short-random #134; CLAUDE.md's "UUIDv7/ULID" line has drifted from the code), so id-order is random today. If design: record SRN id format — UUID vs sequential vs short-random #134 lands on sequential, v7, or ULID, id-order ≈ creation order; if v4 stays, this bullet evaporates but the other two motivations stand on their own.Design decisions to make first
record_id BIGINTis itself a perfect keyset key; the surrogate-vs-composite question below dissolves.record_idto be TEXT holding mixed shapes — unless the format decision includes a one-time backfill re-mint, which design: record SRN id format — UUID vs sequential vs short-random #134 leans against. Decide these together.(record_id, version)unique pair vs a singleBIGSERIALsurrogate for the keyset tiebreak (moot if design: record SRN id format — UUID vs sequential vs short-random #134 picks sequential). The composite is self-describing but makes the keyset 3 columns when the primary sort is something else; the surrogate keeps 2-column keysets and an int cursor component, but adds a column whose only job is ordering. (record_idis needed either way for the by-id lookup.)Places to look during implementation
This is a non-exhaustive orientation map, not a plan. It reflects a sweep of today's tree (symbol-level references — line numbers drift). Step one of implementation is to redo the sweep (
grep -rn 'RecordSRN.parse\|records_table\|:rec:') and diff against this list; trust the grep, not the list.Schema / migration
server/osa/infrastructure/persistence/tables.py—records_table:srnTEXT PK, no id/version columns today.server/migrations/— needs a new Alembic revision with a backfill parsing existing SRNs (':rec:{id}@{version}').Write path
server/osa/domain/record/service/record.py—RecordService.publish_record/bulk_publish(where Record identity is minted; also the mint sites design: record SRN id format — UUID vs sequential vs short-random #134 discusses).server/osa/infrastructure/persistence/mappers/record.py—record_to_dict/row_to_record(identity currently round-trips exclusively throughRecordSRN.parse).server/osa/infrastructure/persistence/repository/record.py—PostgresRecordRepository.save/save_many.server/osa/domain/ingest/handler/publish_batch.py— bulk-publish caller (likely unaffected if the mapper owns the columns; check).Read path (unified
/data/)server/osa/infrastructure/data/postgres_data_read_store.py—get_record_by_id(theLIKEscan),_records_sort(sort=id→t.c.srnaliasing, srn tiebreak),_records_row_to_mapping(derivesid/versionby parsing the srn).server/osa/application/api/v1/routes/data/_streaming.py—_next_cursor(the srn/tiebreak aliasing added in3f6b434).server/osa/domain/data/model/record_summary.py—RecordSummary.flatten(idderived fromsrn).Tests that pin current srn-based behavior
server/tests/integration/conftest.py—seed_record(raw INSERT intorecords).server/tests/integration/test_data_read_store_postgres.py—TestStreamPaginationOrdercursor tests encode srn values.server/tests/unit/application/api/v1/routes/data/test_streaming.py—test_paginated_records_id_sort_encodes_srn.server/tests/unit/infrastructure/data/test_cursor_validation.py— records cursor coercion pins.server/tests/integration/test_bulk_publish_dual_write.py— bulk-publish write coverage.Explicitly unchanged (by design)
records.srnremains the PK and the citation/federation identity;/eventschangefeed and FR-015 response bodies keep carrying SRNs.record_srnFKs frommetadata.*/features.*tables stay onsrn(metadata_table.py,feature_table.py)./data/records/{id}[@{version}]route contract is unchanged — only its execution plan improves.Sequencing
After #139 merges, and after (or jointly with) the #134 format decision. Roughly one focused PR: migration + write path + read path + tests. If #142 (infrastructure reorganization) lands first, the file paths above shift with its renames (
persistence/→postgres/,infrastructure/data/→postgres/query/) — another reason to re-grep rather than follow the list.Related