Discussion seed for the one remaining open identifier question after #129. #129 settles the slug-based identity for human-named declarative resources (Convention, Ontology, by extension hooks/ingesters). It also explicitly says Record and Deposition should "keep UUIDs" — and the reasoning given (domain identifiers belong in metadata, not in identity) is sound and not what this issue questions. The narrower open question is: granted that record SRNs should be opaque server-minted IDs, what should that ID actually look like?
What the code does today
Record SRNs are minted at server/osa/domain/record/service/record.py:84-88 and :122-126, inside RecordService.bulk_publish() and publish_record():

    record_srn = RecordSRN(
        domain=self.node_domain,
        id=LocalId(str(uuid4())),
        version=RecordVersion(1),
    )

That's UUID v4 (not v7, despite CLAUDE.md's "UUIDv7/ULID" claim). The LocalId grammar at server/osa/domain/shared/model/srn.py:47 is [a-z0-9-]{3,64}, so the SRN schema doesn't enforce UUID shape — that's a service-layer choice.
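As a quick sanity check on what that grammar admits, here is a minimal sketch. It assumes the pattern is applied as a full-string match; LOCAL_ID_PATTERN and is_valid_local_id are hypothetical names for illustration, not the actual code in srn.py.

```python
import re
import uuid

# Assumption: the LocalId grammar quoted above is applied as a full match.
LOCAL_ID_PATTERN = re.compile(r"[a-z0-9-]{3,64}")


def is_valid_local_id(candidate: str) -> bool:
    """Return True if the candidate fits the [a-z0-9-]{3,64} grammar."""
    return LOCAL_ID_PATTERN.fullmatch(candidate) is not None


if __name__ == "__main__":
    # One sample per ID shape discussed in this issue.
    samples = [str(uuid.uuid4()), "12345", "42", "k7n2pq8x", "2606.00042"]
    for sample in samples:
        print(f"{sample!r}: {is_valid_local_id(sample)}")
```

Under this reading, a two-character ID such as 42 and a dotted code such as 2606.00042 would both be rejected, so those shapes would need padding, a different separator, or a grammar change.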
What the code rules out
Ingesters never see or mint record SRNs. IngesterOutput (server/osa/domain/shared/port/ingester_runner.py:32-38) and IngesterRecord (server/osa/domain/ingest/model/ingester_record.py:21-54) carry only source_id (the upstream identifier — e.g. an accession from the source system), metadata, and files. The record SRN's {id} plays no role in the ingester contract.
Deduplication is on (source.type, source.id) where source.id is the composite "{convention_srn}:{upstream_source}" (publish_batch.py:99). The record SRN's {id} plays no role in idempotency either.
→ The format choice is purely a publish-time, server-side concern. There's no client round-trip constraint forcing a particular shape.
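To illustrate why the minted {id} is free to change shape, here is a small sketch of dedup keyed on the source composite. source_key, InMemoryStore, the "ingester" source type, and the example convention SRN are all hypothetical; only the "{convention_srn}:{upstream_source}" composite comes from the code referenced above.

```python
import uuid
from dataclasses import dataclass, field
from typing import Callable


def source_key(convention_srn: str, upstream_source: str) -> str:
    """Build the composite source.id described above."""
    return f"{convention_srn}:{upstream_source}"


@dataclass
class InMemoryStore:
    """Hypothetical stand-in for the records table, keyed on (source.type, source.id)."""

    by_source: dict[tuple[str, str], str] = field(default_factory=dict)

    def publish(self, source_type: str, source_id: str, mint: Callable[[], str]) -> str:
        # Idempotency depends only on the source key; mint() runs only for new sources.
        key = (source_type, source_id)
        if key not in self.by_source:
            self.by_source[key] = mint()
        return self.by_source[key]


if __name__ == "__main__":
    store = InMemoryStore()
    # Hypothetical convention SRN and upstream accession, for illustration only.
    sid = source_key("urn:osa:localhost:conv:example", "ACC0001")
    first = store.publish("ingester", sid, lambda: str(uuid.uuid4()))
    second = store.publish("ingester", sid, lambda: "12345")
    print(first == second)
```

Swapping the mint callable changes what new records look like, but a re-publish of the same source still resolves to the existing record.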
Options
| Option | Example SRN | Pros | Cons |
| --- | --- | --- | --- |
| UUID v4 (status quo) | urn:osa:localhost:rec:a1b2c3d4-… | No coordination; works today | Hard to communicate verbally; long; doesn't sort meaningfully |
| UUID v7 | urn:osa:localhost:rec:01938b… | Time-ordered (DB-friendly); still no coordination | Same verbal/length pain as v4 |
| Sequential integer (Postgres sequence) | urn:osa:localhost:rec:12345@1 | Citation-friendly; tiny; sortable; matches GenBank/Zenodo/Crossref conventions for public record IDs | Requires single DB sequence per node; reveals record counts |
| Short random alphanumeric (nanoid/ulid) | urn:osa:localhost:rec:k7n2pq8x | Compact; no coordination; collision-resistant | No real advantage over sequential in a domain-scoped namespace; harder to read aloud |
| Structured codes (YYMM.NNNNN) | urn:osa:localhost:rec:2606.00042@1 | Encodes time; familiar from arXiv | Couples ID to publish time; meaningless for backfill / re-publish of historical data |
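The "time-ordered" property in the UUID v7 row can be sketched by hand. This is a minimal illustration of the RFC 9562 field layout (48-bit millisecond timestamp, version nibble, variant bits, random fill), not a proposal for production code; uuid7_sketch is a hypothetical name, and Python 3.14+ ships uuid.uuid7() in the standard library.

```python
import os
import time
import uuid
from typing import Optional


def uuid7_sketch(now_ms: Optional[int] = None) -> uuid.UUID:
    """Build a UUID with the v7 layout: timestamp first, then random bits."""
    ts = int(time.time() * 1000) if now_ms is None else now_ms
    ts &= (1 << 48) - 1  # unix_ts_ms occupies the top 48 bits

    rand = int.from_bytes(os.urandom(10), "big")  # 80 random bits, 74 used
    rand_a = (rand >> 62) & 0xFFF  # 12 bits after the version nibble
    rand_b = rand & ((1 << 62) - 1)  # 62 bits after the variant bits

    value = (ts << 80) | (0x7 << 76) | (rand_a << 64) | (0b10 << 62) | rand_b
    return uuid.UUID(int=value)


if __name__ == "__main__":
    earlier = uuid7_sketch(now_ms=1_700_000_000_000)
    later = uuid7_sketch(now_ms=1_700_000_000_001)
    # String comparison follows mint time because the timestamp leads.
    print(earlier, later, str(earlier) < str(later))
```

Because the timestamp occupies the leading hex digits, lexicographic order on the string tracks mint time at millisecond granularity, which is what makes it friendlier to B-tree indexes than v4.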
Code-level constraints on each option
- Sequential: needs a Postgres SEQUENCE (or bigserial). bulk_publish would reserve N values in one statement, e.g. SELECT nextval('seq') FROM generate_series(1, N); note that nextval() takes only the sequence name, so there is no nextval('seq', n) form. Compatible with the current save_many + ON CONFLICT DO NOTHING pattern (infrastructure/persistence/repository/record.py:27-49): collisions are still caught at the source-composite key, not the SRN.
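- UUID v7: drop-in replacement for v4 at the mint sites. Python's stdlib only gained uuid.uuid7() in 3.14, so older runtimes would need a small dependency.
- ULID / nanoid: add a dep, single-line change at the mint sites. The output must be lowercased or generated from a custom alphabet to fit the LocalId grammar, since canonical ULIDs are uppercase and nanoid's default alphabet includes uppercase letters and underscores.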
- YYMM-style: would need a per-month sequence and a fresh field; not a small change.
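For the sequential option, batch reservation could look roughly like the sketch below. InMemorySequence is a hypothetical stand-in for a Postgres SEQUENCE so the example runs without a database; record_id_seq is an assumed sequence name, and starting at 100 is an assumption made only so that every ID meets the three-character minimum of the LocalId grammar.

```python
import threading

# Postgres form of the same reservation (shown for reference, not executed here).
# "record_id_seq" is a hypothetical sequence name.
RESERVE_SQL = "SELECT nextval('record_id_seq') FROM generate_series(1, %(n)s)"


class InMemorySequence:
    """Hypothetical stand-in for a Postgres SEQUENCE."""

    def __init__(self, start: int = 100) -> None:
        # Assumption: start at 100 so str(id) is at least 3 characters long.
        self._next = start
        self._lock = threading.Lock()

    def reserve(self, n: int) -> list[int]:
        """Hand out n unique values; never reuses a value, even across threads."""
        if n < 1:
            raise ValueError("n must be >= 1")
        with self._lock:
            first = self._next
            self._next += n
        return list(range(first, first + n))


def record_srn(domain: str, local_id: int, version: int = 1) -> str:
    """Format an SRN in the shape used by the examples in this issue."""
    return f"urn:osa:{domain}:rec:{local_id}@{version}"


if __name__ == "__main__":
    seq = InMemorySequence()
    for local_id in seq.reserve(3):
        print(record_srn("localhost", local_id))
```

One difference from the stand-in: in Postgres, values drawn by concurrent sessions can interleave, so a real batch is guaranteed unique but not necessarily contiguous, and rolled-back transactions leave gaps.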
Domain scoping consideration
Per #129 and the SRN model: identifiers only need to be unique within a node's domain. So the "what if two archives both have record 42" concern is a non-issue — urn:osa:archive.uni.edu:rec:42 and urn:osa:another.org:rec:42 are different SRNs.
This actively unlocks short identifiers. It also matches common practice among major scientific archives: GenBank accessions and PDB IDs are short and scoped to their registry, and DOIs are scoped by registrant prefix.
Versioning is unaffected
RecordVersion at srn.py:108-121 enforces >= 1, integer-only. Sequential IDs + integer versions compose fine: rec:12345@1, rec:12345@2, etc. None of the options here change that.
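A minimal sketch of how ID and version compose, independent of the ID's shape. RecordRef and parse_record_ref are hypothetical names for illustration, not the RecordSRN model in srn.py; the only rule borrowed from the source is that versions are integers >= 1.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RecordRef:
    """Hypothetical (id, version) pair; the id is treated as opaque."""

    local_id: str
    version: int

    def __post_init__(self) -> None:
        # Mirror the ">= 1, integer-only" rule (bool is rejected explicitly).
        is_int = isinstance(self.version, int) and not isinstance(self.version, bool)
        if not is_int or self.version < 1:
            raise ValueError("version must be an integer >= 1")

    def next_version(self) -> "RecordRef":
        return RecordRef(self.local_id, self.version + 1)

    def __str__(self) -> str:
        return f"rec:{self.local_id}@{self.version}"


def parse_record_ref(text: str) -> RecordRef:
    """Parse 'rec:<id>@<version>' without assuming anything about <id>."""
    prefix, _, rest = text.partition(":")
    if prefix != "rec":
        raise ValueError("expected a 'rec:' prefix")
    local_id, sep, version = rest.rpartition("@")
    if not sep or not (version.isascii() and version.isdigit()):
        raise ValueError("expected '@<integer version>'")
    return RecordRef(local_id, int(version))


if __name__ == "__main__":
    ref = parse_record_ref("rec:12345@1")
    print(ref, ref.next_version())
```

The same parser handles a UUID-shaped ID unchanged, since it splits on the last @ and never inspects the ID itself.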
Migration
Records are immutable and the SRN is the PK at records_table.srn (tables.py:64). Switching minting strategy:
- For existing rows: leave them alone. Old records keep UUID-shaped SRNs.
- For new rows: take the new format from a cutoff.
- Anything that parses record SRN IDs structurally (citation rendering? URL routing?) needs to tolerate both.
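For code that does inspect the ID, tolerating both shapes after a cutoff might look like this sketch. classify_local_id and its return labels are hypothetical; whether any such caller exists today (citation rendering, URL routing) is exactly the open point above.

```python
import uuid


def classify_local_id(local_id: str) -> str:
    """Label an ID as 'sequential', 'uuid', or 'other' without rejecting any."""
    if local_id.isascii() and local_id.isdigit():
        return "sequential"
    try:
        parsed = uuid.UUID(local_id)
    except ValueError:
        return "other"
    # uuid.UUID() also accepts braces, a urn: prefix, and unhyphenated hex,
    # so require the canonical lowercase hyphenated form.
    return "uuid" if str(parsed) == local_id else "other"


if __name__ == "__main__":
    for candidate in [str(uuid.uuid4()), "12345", "k7n2pq8x"]:
        print(candidate, classify_local_id(candidate))
```

Treating unknown shapes as "other" rather than raising keeps old and new records readable by the same code path.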
A separate one-time backfill could rewrite old IDs if desired, but the cost-benefit is probably "not worth it" — same reasoning as #129's note on migration.
Open questions for discussion
- Does the team prefer sequential integers (citation-friendly, matches archive precedent) or UUID v7 (no coordination, time-orderable) or status-quo UUID v4 (zero change)?
- If sequential: is exposing record counts a real concern, or just a theoretical one for an open archive that publishes count stats anyway?
- For ULID/nanoid specifically: is there a use case for client-mintable record IDs in the future (offline deposition? federated mirroring?) that would make non-sequential the safer long-term call?
- Should Deposition follow the same answer as Record (both system-created, both opaque) or could they diverge?
- Should CLAUDE.md's "UUIDv7/ULID" line be updated to reflect whatever's decided here? (Currently it's drifted from the v4 reality.)
No recommendation pushed here — surfacing options for discussion. The previous turn's analysis happened in chat without #129's "keep UUIDs" framing in view, and I want this thread to engage with that decision rather than work around it.
Related
replace bare str IDs with NewType/algebraic types at domain boundaries