refactor: add record_id/version columns to records; key /data/ reads and cursors on internal ids

## Summary

Give `records` real `record_id` + `version` columns (or a `BIGSERIAL` surrogate for the keyset, plus `record_id` for lookups), backfilled from the SRN. Today the bare internal id exists only as a substring of the `srn` PK. That forces three workarounds in the read path and keeps SRNs inside the `/data/` engine internals, which runs against the SRN-sidelining direction.

## Motivation

- **`get_record_by_id` is a full-table scan.** It matches `srn LIKE 'urn:osa:%:rec:<id>@%'`, then filters versions in Python. The wildcard sits right after the `urn:osa:` prefix that every row shares, so the `srn` PK index cannot narrow the match. With a real column the lookup becomes an indexed equality plus `ORDER BY version DESC LIMIT 1`. At the 1M-records scale SC-002 targets, the scan is the dominant cost of every single-record fetch.
- **Keyset tiebreaks and cursors lean on the SRN.** `sort=id` aliases to the `srn` column, the route layer has to mirror that aliasing when encoding cursors (the bug fixed in `3f6b434`), and cursors carry full SRNs. With real columns the aliasing disappears and cursors carry internal ids only.
- **`sort=id` may become meaningful — contingent on #134.** Record ids are currently minted as **UUID v4** (see #134; CLAUDE.md's "UUIDv7/ULID" line has drifted from the code), so id-order is random today. If #134 lands on sequential, v7, or ULID, id-order ≈ creation order; if v4 stays, this bullet evaporates but the other two motivations stand on their own.
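
To make the first bullet concrete, here is a minimal sketch of the by-id lookup once `record_id` and `version` are real columns. It uses SQLite from the standard library as a stand-in for Postgres. The integer `version` type, the `UNIQUE (record_id, version)` constraint, the `node-a` SRN segment, and the `latest_version` helper are assumptions made for illustration, not the project's actual schema or API.

```python
import sqlite3


def latest_version(conn, record_id):
    """Return (srn, version) for the newest version of record_id, or None."""
    # Equality on record_id can use the (record_id, version) index;
    # ORDER BY version DESC LIMIT 1 replaces the Python-side version filter.
    return conn.execute(
        "SELECT srn, version FROM records "
        "WHERE record_id = ? ORDER BY version DESC LIMIT 1",
        (record_id,),
    ).fetchone()


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE records ("
    " srn TEXT PRIMARY KEY,"        # unchanged: srn stays the PK
    " record_id TEXT NOT NULL,"     # assumed new column
    " version INTEGER NOT NULL,"    # assumed new column, integer by assumption
    " UNIQUE (record_id, version))"
)
conn.executemany(
    "INSERT INTO records VALUES (?, ?, ?)",
    [
        # SRN values are illustrative; only the ':rec:<id>@<version>' tail
        # comes from the issue text.
        ("urn:osa:node-a:rec:abc@1", "abc", 1),
        ("urn:osa:node-a:rec:abc@2", "abc", 2),
        ("urn:osa:node-a:rec:xyz@1", "xyz", 1),
    ],
)
```

Calling `latest_version(conn, "abc")` returns the `@2` row without touching any pattern match on `srn`.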

## Design decisions to make first

1. **#134 (record SRN id format) is effectively a prerequisite.** The format choice shapes this issue's schema:
   - **Sequential integer** → `record_id BIGINT` is itself a perfect keyset key; the surrogate-vs-composite question below dissolves.
   - **UUID v7 / ULID** → sortable text/uuid column; either tiebreak design works.
   - **UUID v4 (status quo)** → works, but id-order stays meaningless.
   - #134's migration stance (old records keep UUID-shaped SRNs from a cutoff, no re-mint) forces `record_id` to be TEXT holding mixed shapes — unless the format decision includes a one-time backfill re-mint, which #134 leans against. Decide these together.
2. **Composite `(record_id, version)` unique pair vs a single `BIGSERIAL` surrogate** for the keyset tiebreak (moot if #134 picks sequential). The composite is self-describing but makes the keyset 3 columns when the primary sort is something else; the surrogate keeps 2-column keysets and an int cursor component, but adds a column whose only job is ordering. (`record_id` is needed either way for the by-id lookup.)
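
The trade-off in decision 2 can be sketched in plain Python, with tuple comparison standing in for the SQL keyset predicate. Everything here is hypothetical: the `Row` shape, the `seq` surrogate, the `created_at` primary sort, and the `page`/`walk` helpers exist only to show that both tiebreaks paginate correctly while differing in cursor width.

```python
from typing import NamedTuple


class Row(NamedTuple):
    seq: int          # hypothetical BIGSERIAL surrogate
    record_id: str
    version: int
    created_at: str   # hypothetical primary sort column


ROWS = [
    Row(1, "b7", 1, "2024-01-01"),
    Row(2, "a3", 1, "2024-01-01"),
    Row(3, "a3", 2, "2024-01-02"),
    Row(4, "c9", 1, "2024-01-02"),
]


def composite_key(r):
    # Primary sort plus the (record_id, version) pair: a 3-column keyset.
    return (r.created_at, r.record_id, r.version)


def surrogate_key(r):
    # Primary sort plus the surrogate: a 2-column keyset.
    return (r.created_at, r.seq)


def page(rows, key, after, limit):
    """Return up to `limit` rows strictly after the cursor, plus the next cursor."""
    ordered = sorted(rows, key=key)
    if after is not None:
        ordered = [r for r in ordered if key(r) > after]
    chunk = ordered[:limit]
    return chunk, (key(chunk[-1]) if chunk else None)


def walk(rows, key, limit):
    """Follow cursors until an empty page and return every row seen."""
    seen, cursor = [], None
    while True:
        chunk, cursor = page(rows, key, cursor, limit)
        if not chunk:
            return seen
        seen.extend(chunk)
```

Both keys yield a total order, so `walk` visits each row exactly once either way. The composite cursor carries three components and orders ties by id; the surrogate cursor carries two and orders ties by insertion sequence.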

## Places to look during implementation

**This is a non-exhaustive orientation map, not a plan.** It reflects a sweep of today's tree (symbol-level references — line numbers drift). Step one of implementation is to redo the sweep (`grep -rn 'RecordSRN.parse\|records_table\|:rec:'`) and diff against this list; trust the grep, not the list.

### Schema / migration
- `server/osa/infrastructure/persistence/tables.py` — `records_table`: `srn` TEXT PK, no id/version columns today.
- `server/migrations/` — needs a new Alembic revision with a backfill parsing existing SRNs (`':rec:{id}@{version}'`).
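
The parsing step of that backfill might look like the sketch below. The SRN shape is inferred from the `LIKE` pattern and the `':rec:{id}@{version}'` fragment quoted in this issue; the regex, the `split_record_srn` name, and returning `version` as an uncast string are assumptions. A real migration should defer to whatever `RecordSRN.parse` accepts.

```python
import re

# Assumed shape: urn:osa:<anything>:rec:<id>@<version>
_SRN_RE = re.compile(
    r"^urn:osa:(?P<scope>.+):rec:(?P<id>[^@:]+)@(?P<version>.+)$"
)


def split_record_srn(srn: str) -> tuple[str, str]:
    """Split a record SRN into (record_id, version), both as strings.

    Raises ValueError for anything that does not match the assumed shape,
    so a backfill fails loudly instead of writing partial values.
    """
    match = _SRN_RE.match(srn)
    if match is None:
        raise ValueError(f"not a record SRN: {srn!r}")
    return match["id"], match["version"]
```

Keeping the version as a string leaves the column type (and any cast) to the migration, which depends on the #134 outcome.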

### Write path
- `server/osa/domain/record/service/record.py` — `RecordService.publish_record` / `bulk_publish` (where Record identity is minted; also the mint sites #134 discusses).
- `server/osa/infrastructure/persistence/mappers/record.py` — `record_to_dict` / `row_to_record` (identity currently round-trips exclusively through `RecordSRN.parse`).
- `server/osa/infrastructure/persistence/repository/record.py` — `PostgresRecordRepository.save` / `save_many`.
- `server/osa/domain/ingest/handler/publish_batch.py` — bulk-publish caller (likely unaffected if the mapper owns the columns; check).

### Read path (unified `/data/`)
- `server/osa/infrastructure/data/postgres_data_read_store.py` — `get_record_by_id` (the `LIKE` scan), `_records_sort` (`sort=id` → `t.c.srn` aliasing, srn tiebreak), `_records_row_to_mapping` (derives `id`/`version` by parsing the srn).
- `server/osa/application/api/v1/routes/data/_streaming.py` — `_next_cursor` (the srn/tiebreak aliasing added in `3f6b434`).
- `server/osa/domain/data/model/record_summary.py` — `RecordSummary.flatten` (`id` derived from `srn`).
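
For orientation, a cursor that carries internal ids only could be as small as the sketch below. The wire format (base64url-encoded JSON), the function names, and the integer `version` are all hypothetical; the existing encoding in `_next_cursor` is not reproduced here.

```python
import base64
import json


def encode_cursor(sort_value, record_id, version):
    """Pack the last row's sort value and internal id into an opaque token."""
    payload = json.dumps([sort_value, record_id, version], separators=(",", ":"))
    return base64.urlsafe_b64encode(payload.encode()).decode().rstrip("=")


def decode_cursor(token):
    """Inverse of encode_cursor; raises ValueError on a malformed token."""
    padded = token + "=" * (-len(token) % 4)  # restore stripped padding
    try:
        sort_value, record_id, version = json.loads(
            base64.urlsafe_b64decode(padded)
        )
    except (ValueError, TypeError) as exc:
        raise ValueError("malformed cursor") from exc
    if not isinstance(record_id, str) or not isinstance(version, int):
        raise ValueError("malformed cursor")
    return sort_value, record_id, version
```

With this shape the token never contains an SRN, so the route layer has no `sort=id` to `srn` aliasing to mirror.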

### Tests that pin current srn-based behavior
- `server/tests/integration/conftest.py` — `seed_record` (raw INSERT into `records`).
- `server/tests/integration/test_data_read_store_postgres.py` — `TestStreamPaginationOrder` cursor tests encode srn values.
- `server/tests/unit/application/api/v1/routes/data/test_streaming.py` — `test_paginated_records_id_sort_encodes_srn`.
- `server/tests/unit/infrastructure/data/test_cursor_validation.py` — records cursor coercion pins.
- `server/tests/integration/test_bulk_publish_dual_write.py` — bulk-publish write coverage.

### Explicitly unchanged (by design)
- `records.srn` remains the PK and the citation/federation identity; `/events` changefeed and FR-015 response bodies keep carrying SRNs.
- `record_srn` FKs from `metadata.*` / `features.*` tables stay on `srn` (`metadata_table.py`, `feature_table.py`).
- `/data/records/{id}[@{version}]` route contract is unchanged — only its execution plan improves.

## Sequencing

After #139 merges, and after (or jointly with) the #134 format decision. Roughly one focused PR: migration + write path + read path + tests. If #142 (infrastructure reorganization) lands first, the file paths above shift with its renames (`persistence/` → `postgres/`, `infrastructure/data/` → `postgres/query/`) — another reason to re-grep rather than follow the list.

## Related

- #134 — record SRN id format (prerequisite design decision; also documents the v4-vs-CLAUDE.md drift)
- #142 — infrastructure reorganization (renames the directories referenced above)


