feat: searchable entry links — name-enriched links metadata + raw links.json (#143 #389)#390
Open
drernie wants to merge 7 commits into
)
Pure parser over a Benchling entry dict that surfaces the objects an entry
points at, in one place for both upcoming features:
- extract_entity_references(): entity IDs from days[].notes[].links[] (filtered
to entity types, dropping non-entity links like sql_dashboard) and from
entity-link fields; deduped by ID. -> #143 entity packaging.
- extract_results_tables(): results_table notes carrying assayResultSchemaId.
-> #68/#69 assay results.
- extract_note_links(): low-level all-links primitive.
No Benchling API calls and no behavior change -- nothing consumes it yet, so it
lands independently of either feature.
13 unit tests; black/isort/pyright clean.
Refs #143 #68
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
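The extraction described in this commit can be sketched roughly as follows. This is a minimal illustration, not the module's code: `iter_note_links`, `entity_ids`, and the contents of `ENTITY_TYPES` are hypothetical stand-ins, and the only shape assumed is the `days[].notes[].links[]` nesting the commit message names.

```python
from typing import Any, Iterator

# Assumption: illustrative entity type tokens only; the real set lives in the module.
ENTITY_TYPES = {"custom_entity", "dna_sequence", "aa_sequence"}


def iter_note_links(entry: dict[str, Any]) -> Iterator[dict[str, Any]]:
    """Yield every link dict under days[].notes[].links[], tolerating missing keys."""
    for day in entry.get("days") or []:
        for note in day.get("notes") or []:
            for link in note.get("links") or []:
                yield link


def entity_ids(entry: dict[str, Any]) -> list[str]:
    """Return entity IDs in first-seen order, deduplicated, skipping non-entity links."""
    seen: dict[str, None] = {}  # dict preserves insertion order
    for link in iter_note_links(entry):
        link_id = link.get("id")
        if link.get("type") in ENTITY_TYPES and link_id:
            seen.setdefault(link_id, None)
    return list(seen)


demo_entry = {
    "days": [
        {
            "notes": [
                {
                    "links": [
                        {"id": "bfi_1", "type": "custom_entity"},
                        {"id": "dash_1", "type": "sql_dashboard"},
                        {"id": "bfi_1", "type": "custom_entity"},
                    ]
                },
                {"type": "text"},  # note without a links key
            ]
        },
        {},  # day without a notes key
    ]
}
```

With `demo_entry`, the duplicate `bfi_1` collapses to one ID and the `sql_dashboard` link is dropped, while the note and day with missing keys are skipped without raising.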
Pull request overview
Adds a new pure-Python discovery module to extract (a) entity references and (b) assay results table references from a Benchling entry payload, with accompanying unit tests, intended to be reused by upcoming packaging/features (#143, #68/#69).
Changes:
- Introduces `src.entry_references` with helpers and dataclasses for extracting note links, entity references, and results-table references.
- Adds unit tests validating filtering, deduplication, field-shape handling, and defensive behavior on missing keys.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| docker/src/entry_references.py | New extraction utilities for entity links and results-table references from entry dicts. |
| docker/tests/test_entry_references.py | New unit tests covering key extraction behaviors and defensive parsing. |
Generalize the discovery layer from entities-only to the full EntryLink enum
(18 types from test/openapi.yaml), per #389.
- classify_links(entry): surfaces ALL note links, each labeled with a
LinkCategory (entity/inventory/reference/metadata/not_packageable/uncertain/
external/unknown). Consumers filter, e.g. `r.is_packageable`. Unknown/future
types surface as UNKNOWN rather than being silently dropped.
- LINK_TYPE_CATEGORY: type -> category for all 18 tokens; PACKAGEABLE_CATEGORIES.
- Fix entity set: ENTITY_LINK_TYPES now {custom_entity, dna_sequence,
aa_sequence, batch}. Adds `batch` (a real registry entity, was missed); drops
dna_oligo/rna_oligo (NOT EntryLink types -- can't appear as note links).
- spec/entry-link-types.json: human-facing reference map (category, packageable,
id prefix, GET endpoint, webhook events) for all 18 types, plus the
not-inline-linkable resources. A test asserts its categories match the module
so it can't drift.
26 unit tests; black/isort/pyright clean.
Refs #143 #389 #68
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
When packaging an entry, write a references.json alongside entry.json listing
the Benchling objects the entry points at (entities, classified links, results
tables), discovered from the entry's note links and fields. No records are
fetched -- discovery only.
- entry_references.summarize_references(entry): JSON-serializable payload
({schema_version, entities, links, results_tables}); REFERENCES_SCHEMA_VERSION.
- entry_packager._create_metadata_files: emit references.json + document it in
the package README.
Review fixes (Greptile + Copilot):
- Drop empty-string entity IDs in _field_value_ids, matching the note-link guard.
- Narrow RESULTS_TABLE_NOTE_TYPES to {"results_table"} -- the only type carrying
assayResultSchemaId; avoids latently capturing generic/registration tables.
- Modernize typing (dict/list/tuple) on the 3.11+ codebase.
- Remove committed spec/entry-link-types.json (relocated to the project's
scripts/ as a research artifact) and its drift-guard test.
Full suite green (437).
Refs #143 #389 #68
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
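The narrowed results-table rule above might look something like this. It is a sketch under stated assumptions: `results_table_schema_ids` is a hypothetical name, and the only facts taken from the commit are the `RESULTS_TABLE_NOTE_TYPES` value and the `assayResultSchemaId` key on notes.

```python
from typing import Any

RESULTS_TABLE_NOTE_TYPES = {"results_table"}  # value stated in the commit message


def results_table_schema_ids(entry: dict[str, Any]) -> list[str]:
    """Hypothetical helper: collect assayResultSchemaId from results_table notes only."""
    schema_ids: list[str] = []
    for day in entry.get("days") or []:
        for note in day.get("notes") or []:
            schema_id = note.get("assayResultSchemaId")
            if note.get("type") in RESULTS_TABLE_NOTE_TYPES and schema_id:
                schema_ids.append(schema_id)
    return schema_ids


demo_entry = {
    "days": [
        {
            "notes": [
                {"type": "results_table", "assayResultSchemaId": "assaysch_1"},
                # A generic table note is ignored even if it carried a schema-like key.
                {"type": "table", "assayResultSchemaId": "assaysch_2"},
                {"type": "results_table"},  # no schema id: skipped
            ]
        }
    ]
}
```

The IDs in `demo_entry` are made up for the example; only the first note satisfies both conditions.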
41916bd to
26bbfb1
Confirms _field_value_ids drops empty strings (single + isMulti list), matching
the note-link guard. Requested in review.
Refs #143
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
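The empty-string guard could be sketched as below. The function name mirrors the commit but the body is an assumption, as is the field shape: a dict whose `value` is either a single string or, for `isMulti` fields, a list of strings.

```python
from typing import Any


def field_value_ids(field: dict[str, Any]) -> list[str]:
    """Sketch: normalize a field's value to a list of non-empty string IDs."""
    value = field.get("value")
    candidates = value if isinstance(value, list) else [value]
    # Drop None, non-strings, and empty strings in one pass.
    return [v for v in candidates if isinstance(v, str) and v]
```

Treating the single-value case as a one-element list keeps a single filter for both shapes, so the empty-string rule cannot diverge between them.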
A Benchling entry points at other Benchling objects in three places:

1. Note links -- ``days[].notes[].links[]``, each ``{id, type, webURL}``. ``type``

## Metadata Files
- `entry.json`: Key entry metadata (display_id, name, creator, authors, timestamps)
- `entry_data.json`: Complete entry data from Benchling API
- `references.json`: Benchling objects this entry links to (entities, inventory, tables)
… + disposition
`packageable` conflated two questions: can a record be fetched, vs. should it
live in its own package or nested in the entry. Split into orthogonal axes and a
derived disposition.
- FETCHABLE_CATEGORIES (GET-by-id exists) + EVENTABLE_CATEGORIES (own webhooks,
can arrive independent of an entry) + CATEGORY_DISPOSITION.
- LinkRef.is_fetchable / is_eventable / disposition (replaces is_packageable).
- references.json links now carry {category, fetchable, eventable, disposition}.
disposition makes explicit that nest-vs-standalone is a genuine product decision
ONLY for entities (fetchable AND eventable -> nest_or_standalone). Non-entities
are forced: inventory -> nest (no events); entry/request/workflow -> link (own
package); metadata -> pointer; dashboards/external -> skip.
Project artifact scripts/entry-link-types.json updated to match (not in repo).
Refs #143 #389
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
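One way to picture the two axes and the derived disposition is the sketch below. The constant and property names follow the commit message, but the set memberships are partial and assumed, the `"skip"` default for unmapped categories is an assumption, and `LinkRef` here is a simplified stand-in for the real dataclass.

```python
from dataclasses import dataclass

# Assumption: partial memberships, limited to what the commit message states.
FETCHABLE_CATEGORIES = {"entity", "inventory"}  # GET-by-id exists
EVENTABLE_CATEGORIES = {"entity"}               # inventory has no events
CATEGORY_DISPOSITION = {
    "entity": "nest_or_standalone",  # the only genuine product decision
    "inventory": "nest",
    "metadata": "pointer",
    "external": "skip",
}


@dataclass(frozen=True)
class LinkRef:
    id: str
    type: str
    category: str

    @property
    def is_fetchable(self) -> bool:
        return self.category in FETCHABLE_CATEGORIES

    @property
    def is_eventable(self) -> bool:
        return self.category in EVENTABLE_CATEGORIES

    @property
    def disposition(self) -> str:
        # Assumption: categories missing from the map fall back to "skip".
        return CATEGORY_DISPOSITION.get(self.category, "skip")


entity_ref = LinkRef(id="bfi_1", type="custom_entity", category="entity")
inventory_ref = LinkRef(id="inv_1", type="hypothetical_inventory_type", category="inventory")
```

Keeping fetchability and eventability as separate properties lets a consumer ask each question on its own, with `disposition` as a lookup rather than a third independent flag.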
…ent)
Promote a curated `links` array into entry.json (the package's metadata_uri)
so packages are searchable by the human-readable name of the entities/objects
an entry references — the top-line use case from the 2026-06-15 call ("show me
all experiments where QB-2743.1 was used").
Each curated link carries four fields, each with one job:
- type, id — free; id supports downstream linking
- name — authoritative Benchling display name via best-effort GET-by-id,
or null when the lookup fails/isn't supported (never a slug)
- slug — lossy token parsed from the webURL, for eyeballing/debugging only
Verified the human name is NOT recoverable from the webURL: the trailing
segment is a lowercased, punctuation-flattened slug (sBN000 -> sbn000), so the
exact name must come from the API. Name resolution is best-effort and never
raises — it requires the app to be a registry/project collaborator.
Also rename references.json -> links.json and reduce it to raw facts only
(id/type/web_url + entities + results_tables). Derived classifications
(category/fetchable/eventable/disposition) are no longer persisted; they are
recomputed from type in code, so the raw archive stays reprocessable and a
future classification change needs no re-fetch. Schema bumped to v2.
Refs #143 #389
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
What

When packaging a Benchling entry, the packager now makes the objects an entry references searchable by their human-readable name, and records the raw discovery separately. This is the top-line use case from the 2026-06-15 customer call: "show me all the experiments where QB-2743.1 was used."

Two artifacts, with a hard split between curated/searchable and raw facts:

1. `links`: curated, searchable (promoted into `entry.json`)

`entry.json` is the package's `metadata_uri`, so anything in it is queryable in Elasticsearch. We add a flat `links` array, one entry per referenced object, four fields each with one job:
- `type`, `id`: free; `id` supports downstream linking.
- `name`: authoritative Benchling display name via best-effort `get_by_id`, or `null` when the lookup fails/isn't supported. The only search target, so every non-null `links.name` is a real name.
- `slug`: lossy token parsed from the `webURL`, for eyeballing/debugging only. Never searched as a name, never folded into `name`.

Search scopes to `links.name`; a bare keyword still matches the metadata.

2. `links.json`: raw facts (renamed from `references.json`)

Raw discovery of what the entry points at (`id`/`type`/`web_url` per link, plus `entities` and `results_tables` rows), so a re-parse or a changed classification never needs a re-fetch. Drops the derived classifications (`category`/`fetchable`/`eventable`/`disposition`): those are recomputed from `type` in code at runtime, not frozen to disk. `schema_version` bumped to 2.

Why `name` needs an API call (verified)

A note link is only `{id, type, webURL}`, with no name. The `webURL` trailing segment is a lossy slug, confirmed against `test/openapi.yaml`:

| webURL | slug | name |
|---|---|---|
| `…/bfi-xCUXNVyG-sbn000/edit` | `sbn000` | `sBN000` |
| `…/seq_bhuDUw9D-test-oligo-abc/edit` | `test-oligo-abc` | `Example DNA Oligo` |

`QB-2743.1` would appear as `qb-2743-1` (case + `.` gone), so a search would miss. The exact name must come from the API. Name resolution is best-effort and never raises, and requires the app to be a registry/project collaborator (setup requirement, not code).

Discovery layer (`docker/src/entry_references.py`, pure, no API calls)

- `summarize_references(entry)` → the raw `links.json` payload.
- `link_metadata(entry)` → the curated `{type, id, name: None, slug}` skeleton (caller fills `name`).
- `slug_from_web_url(web_url, id)` → strict slug parse (returns `None` unless an `{id}-` prefix is matched).
- `classify_links` / `extract_entity_references` / `extract_results_tables` cover the full 18-token `EntryLink` enum.

Name enrichment lives in the caller (`EntryPackager._enrich_link_names`) via a `type → SDK service` map (entities first, then inventory and entries, the types flagged on the call); unmapped types keep `name=null`.

Tests

Parser + packager suites updated for the v2 shape; added coverage for `slug_from_web_url`, `link_metadata`, and the `entry.json.links` / `links.json` split. Full suite green (443). `black` + `isort` + `pyright` clean.

Refs #143 #389
🤖 Generated with Claude Code