design: record SRN id format — UUID vs sequential vs short-random

Discussion seed for the one remaining open identifier question after #129. #129 settles the slug-based identity for human-named declarative resources (Convention, Ontology, by extension hooks/ingesters). It also explicitly says Record and Deposition should "keep UUIDs" — and the reasoning given (domain identifiers belong in metadata, not in identity) is sound and not what this issue questions. The narrower open question is: granted that record SRNs should be opaque server-minted IDs, **what should that ID actually look like?**

## What the code does today

Record SRNs are minted at `server/osa/domain/record/service/record.py:84-88` and `:122-126`, inside `RecordService.bulk_publish()` and `publish_record()`:

```python
record_srn = RecordSRN(
    domain=self.node_domain,
    id=LocalId(str(uuid4())),
    version=RecordVersion(1),
)
```

That's UUID v4 (not v7, despite CLAUDE.md's "UUIDv7/ULID" claim). The `LocalId` grammar at `server/osa/domain/shared/model/srn.py:47` is `[a-z0-9-]{3,64}`, so the SRN schema doesn't enforce UUID shape — that's a service-layer choice.
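
As a quick check of what that grammar admits, the sketch below runs a few candidate IDs through the quoted pattern. It assumes the pattern is matched against the whole string; `is_valid_local_id` and the sample values are illustrative and not taken from `srn.py`.

```python
import re
import uuid

# Assumption: the LocalId pattern quoted above is applied to the entire string.
LOCAL_ID_PATTERN = re.compile(r"[a-z0-9-]{3,64}")


def is_valid_local_id(candidate: str) -> bool:
    """Return True if the candidate fits the quoted LocalId grammar."""
    return LOCAL_ID_PATTERN.fullmatch(candidate) is not None


# Illustrative candidates, one per ID style discussed in this issue.
candidates = {
    "uuid4": str(uuid.uuid4()),
    "sequential": "12345",
    "short sequential": "42",
    "short random": "k7n2pq8x",
    "structured": "2606.00042",
}
results = {label: is_valid_local_id(value) for label, value in candidates.items()}
```

Under that assumption, a two-digit sequential ID and a dotted YYMM code would be rejected, while UUIDs, longer integers, and lowercase short-random IDs pass.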

## What the code rules out

Ingesters never see or mint record SRNs. `IngesterOutput` (`server/osa/domain/shared/port/ingester_runner.py:32-38`) and `IngesterRecord` (`server/osa/domain/ingest/model/ingester_record.py:21-54`) carry only `source_id` (the upstream identifier — e.g. an accession from the source system), `metadata`, and `files`. The record SRN's `{id}` plays no role in the ingester contract.

Deduplication is on `(source.type, source.id)` where `source.id` is the composite `"{convention_srn}:{upstream_source}"` (`publish_batch.py:99`). The record SRN's `{id}` plays no role in idempotency either.

→ The format choice is purely a publish-time, server-side concern. There's no client round-trip constraint forcing a particular shape.
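
A minimal sketch of why the ID shape is invisible to idempotency, assuming a plain dict as a stand-in for the records table. `source_key`, `publish_once`, the `"ingester"` source type, and the convention SRN string are all hypothetical names chosen for illustration.

```python
import itertools
import uuid
from typing import Callable, Dict, Tuple

SourceKey = Tuple[str, str]


def source_key(source_type: str, convention_srn: str, upstream_source: str) -> SourceKey:
    """Build the (source.type, source.id) pair, with source.id as the composite string."""
    return (source_type, f"{convention_srn}:{upstream_source}")


def publish_once(store: Dict[SourceKey, str], key: SourceKey, mint: Callable[[], str]) -> str:
    """Insert a freshly minted ID unless the source key already exists."""
    # dict.setdefault stands in for INSERT ... ON CONFLICT DO NOTHING:
    # an ID is minted either way, but an existing row wins.
    return store.setdefault(key, mint())


store: Dict[SourceKey, str] = {}
key = source_key("ingester", "urn:osa:localhost:conv:example", "ACC-0001")
counter = itertools.count(100)

first = publish_once(store, key, lambda: str(uuid.uuid4()))   # UUID-style mint
second = publish_once(store, key, lambda: str(next(counter)))  # sequential-style mint
```

The second publish returns the ID stored by the first, whichever minting function is passed in.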

## Options

| Option | Example SRN | Pros | Cons |
|---|---|---|---|
| **UUID v4 (status quo)** | `urn:osa:localhost:rec:a1b2c3d4-…` | No coordination; works today | Hard to communicate verbally; long; doesn't sort meaningfully |
| **UUID v7** | `urn:osa:localhost:rec:01938b…` | Time-ordered (DB-friendly); still no coordination | Same verbal/length pain as v4 |
| **Sequential integer (Postgres sequence)** | `urn:osa:localhost:rec:12345@1` | Citation-friendly; tiny; sortable; close to the short numeric *public* record IDs used by Zenodo and, with a letter prefix, GenBank | Requires single DB sequence per node; reveals record counts |
| **Short random alphanumeric (nanoid; ULID is a longer, time-sortable variant)** | `urn:osa:localhost:rec:k7n2pq8x@1` | Compact; no coordination; collision-resistant | No real advantage over sequential in a domain-scoped namespace; harder to read aloud |
| **Structured codes (YYMM.NNNNN)** | `urn:osa:localhost:rec:2606.00042@1` | Encodes time; familiar from arXiv | Couples ID to publish time; meaningless for backfill / re-publish of historical data; the `.` is outside the current `LocalId` grammar |
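
To make the "time-ordered" property of the UUID v7 row concrete, here is a minimal sketch of the version 7 bit layout from RFC 9562. `uuid7_sketch` is a hypothetical helper, not code from the repository, and it omits the optional monotonic counter, so two IDs minted in the same millisecond are not ordered relative to each other.

```python
import os
import time
import uuid
from typing import Optional


def uuid7_sketch(unix_ms: Optional[int] = None) -> uuid.UUID:
    """Build a UUID with the version 7 layout: 48-bit timestamp, then random bits."""
    if unix_ms is None:
        unix_ms = time.time_ns() // 1_000_000
    random_bits = int.from_bytes(os.urandom(10), "big")  # 80 random bits
    rand_a = random_bits >> 68                 # top 12 random bits
    rand_b = random_bits & ((1 << 62) - 1)     # low 62 random bits
    value = (
        ((unix_ms & ((1 << 48) - 1)) << 80)    # millisecond timestamp, most significant
        | (0x7 << 76)                          # version nibble
        | (rand_a << 64)
        | (0b10 << 62)                         # RFC variant bits
        | rand_b
    )
    return uuid.UUID(int=value)


# Fixed timestamps one millisecond apart, so the ordering is deterministic.
earlier = str(uuid7_sketch(unix_ms=1_700_000_000_000))
later = str(uuid7_sketch(unix_ms=1_700_000_000_001))
```

Because the timestamp occupies the leading hex digits, the string forms sort by mint time at millisecond granularity, which is what makes them friendlier to B-tree indexes than v4.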

## Code-level constraints on each option

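- **Sequential**: needs a Postgres `SEQUENCE` (or a `bigserial` column). Postgres has no batch form of `nextval()`, so `bulk_publish` would reserve N values in one statement with something like `SELECT nextval('<seq>') FROM generate_series(1, N)`, or use a sequence with a larger `INCREMENT BY`. Because `LocalId` requires at least 3 characters, the sequence would have to start at 100 or the values be zero-padded. Compatible with the current `save_many` + `ON CONFLICT DO NOTHING` pattern (`infrastructure/persistence/repository/record.py:27-49`): collisions are still caught at the source-composite key, not the SRN.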
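- **UUID v7**: near drop-in replacement for v4 at the two mint sites. `uuid.uuid7()` is in the standard library only from Python 3.14; on older versions this needs a small dependency or a hand-rolled generator.
- **ULID / nanoid**: add a dep, single-line change at the mint sites. Both need care to fit the `LocalId` grammar: canonical ULIDs are uppercase (so lowercase them), and nanoid's default alphabet includes uppercase letters and `_` (so pass a custom alphabet).
- **YYMM-style**: would need a per-month sequence and a fresh field; not a small change. The `.` separator is also outside the current `LocalId` grammar, so either the grammar widens or the separator becomes `-`.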
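
For the sequential option, block reservation could look roughly like the sketch below. The sequence name `record_id_seq`, the helper `reserve_record_ids`, and the `%s` parameter style (as used by psycopg) are assumptions; `InMemoryCursor` is only a stand-in so the example runs without a database.

```python
from typing import Any, List, Tuple

# Hypothetical sequence, e.g. CREATE SEQUENCE record_id_seq START WITH 100
# so that every value has at least the 3 characters LocalId requires.
RESERVE_SQL = "SELECT nextval('record_id_seq') FROM generate_series(1, %s)"


def reserve_record_ids(cursor: Any, count: int) -> List[str]:
    """Reserve `count` sequence values in one round trip and return them as strings."""
    if count < 1:
        return []
    cursor.execute(RESERVE_SQL, (count,))
    return [str(row[0]) for row in cursor.fetchall()]


class InMemoryCursor:
    """Stand-in for a DB-API cursor; it ignores the SQL and just counts upward."""

    def __init__(self, start: int = 100) -> None:
        self._next = start
        self._rows: List[Tuple[int]] = []

    def execute(self, sql: str, params: Tuple[int]) -> None:
        (count,) = params
        self._rows = [(self._next + offset,) for offset in range(count)]
        self._next += count

    def fetchall(self) -> List[Tuple[int]]:
        return self._rows


cursor = InMemoryCursor()
first_batch = reserve_record_ids(cursor, 3)
second_batch = reserve_record_ids(cursor, 2)
```

On a real sequence the returned values are unique but not guaranteed contiguous, since concurrent sessions can interleave and rolled-back transactions leave gaps.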

## Domain scoping consideration

Per #129 and the SRN model: identifiers only need to be unique within a node's domain. So the "what if two archives both have record 12345" concern is a non-issue: `urn:osa:archive.uni.edu:rec:12345` and `urn:osa:another.org:rec:12345` are different SRNs.

This is what makes short identifiers viable. It's also a common pattern among established scientific archives: GenBank accessions and PDB IDs are short and scoped to their registry, and DOIs are scoped by a registrant prefix.

## Versioning is unaffected

`RecordVersion` at `srn.py:108-121` enforces `>= 1`, integer-only. Sequential IDs + integer versions compose fine: `rec:12345@1`, `rec:12345@2`, etc. None of the options here change that.

## Migration

Records are immutable and the SRN is the PK at `records_table.srn` (`tables.py:64`). Switching minting strategy:
- For existing rows: leave them alone. Old records keep UUID-shaped SRNs.
- For new rows: mint the new format from a cutoff onward.
- Anything that parses record SRN IDs structurally (citation rendering? URL routing?) needs to tolerate both.
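
A consumer that has to tolerate both shapes could classify the `{id}` part along these lines. `classify_record_id` is a hypothetical helper, and the "sequential" branch assumes the new format is a plain decimal integer.

```python
import uuid


def classify_record_id(local_id: str) -> str:
    """Label the shape of a record SRN's {id} part as 'sequential', 'uuid', or 'other'."""
    # Check digits first: a 32-digit number would otherwise parse as a UUID.
    if local_id.isascii() and local_id.isdigit():
        return "sequential"
    try:
        parsed = uuid.UUID(local_id)
    except ValueError:
        return "other"
    # Accept only the canonical lowercase, hyphenated form that str(uuid4()) emits.
    return "uuid" if str(parsed) == local_id else "other"


shapes = [
    classify_record_id(value)
    for value in (str(uuid.uuid4()), "12345", "k7n2pq8x")
]
```

Keeping the check tolerant ("other" instead of an exception) means old UUID-shaped SRNs and any future format keep resolving even if a renderer only special-cases one shape.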

A separate one-time backfill could rewrite old IDs if desired, but the cost-benefit is probably "not worth it" — same reasoning as #129's note on migration.

## Open questions for discussion

1. Does the team prefer **sequential integers** (citation-friendly, matches archive precedent) or **UUID v7** (no coordination, time-orderable) or **status-quo UUID v4** (zero change)?
2. If sequential: is exposing record counts a real concern, or just a theoretical one for an open archive that publishes count stats anyway?
3. For ULID/nanoid specifically: is there a use case for client-mintable record IDs in the future (offline deposition? federated mirroring?) that would make non-sequential the safer long-term call?
4. Should `Deposition` follow the same answer as `Record` (both system-created, both opaque) or could they diverge?
5. Should CLAUDE.md's "UUIDv7/ULID" line be updated to reflect whatever's decided here? (Currently it's drifted from the v4 reality.)

No recommendation pushed here — surfacing options for discussion. The previous turn's analysis happened in chat without #129's "keep UUIDs" framing in view, and I want this thread to engage with that decision rather than work around it.

## Related

- #129 — slug-based identity for Convention and Ontology (the sibling decision; explicitly out-of-scope for Records)
- #111 — `replace bare str IDs with NewType/algebraic types at domain boundaries`
