This guide explains how to tailor CellScope for a specific domain or VRE without rewriting the core. It follows the actual code paths and data contracts in this repository.
If you only do one thing: use the review dialog (roles + file metadata) and
optionally a metadata config file (CELLSCOPE_METADATA_CONFIG). That covers
most domain adaptation needs.
UI review dialog / CLI hints
↓
RO-Crate entities (rocrate_io)
↓
SPARQL triples (indexer)
↓
Analyzer list + graph (labextension / sparql_summary)
Personalization is additive: you can add metadata without breaking existing parsers or consumers as long as you map it consistently across the pipeline.
In the JupyterLab UI, Analyze opens a review dialog before export.
Users can edit:
- Variable roles (free text): e.g., parameter, feature, dataset.
- File metadata:
  - encodingFormat (MIME type)
  - keywords (comma-separated)
  - accessURL (source URL for remote data)
  - etag (version tag)
  - retrievedAt (ISO 8601 timestamp)
These fields are stored under hints and then embedded in the RO-Crate and
SPARQL projection.
Settings are stored in localStorage under cellscope:config:
- SPARQL endpoint and auth (token or basic auth)
- retry/backoff
- data source (local or sparql)
- environment/config files to package
Config files (e.g., requirements.txt, pyproject.toml, environment.yml)
are copied into env/ and parsed into softwareRequirements entries.
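As a rough illustration of the parsing step, the sketch below turns the text of a requirements.txt file into name/version records. The helper name parse_requirements and the record shape are assumptions made for this example; they are not taken from CellScope's code, and the real softwareRequirements entries may look different.

```python
import re

# Hypothetical helper: not CellScope's parser; the record shape is assumed.
_REQ_LINE = re.compile(r"^\s*([A-Za-z0-9][A-Za-z0-9._-]*)\s*(?:==\s*([^\s;#]+))?")


def parse_requirements(text):
    """Return a list of {"name", "version"} dicts from requirements.txt text."""
    entries = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop trailing comments
        if not line or line.startswith("-"):  # skip blanks and pip options (-r, -e)
            continue
        match = _REQ_LINE.match(line)
        if match:
            # version is None unless the line pins an exact version with ==
            entries.append({"name": match.group(1), "version": match.group(2)})
    return entries
```

Only exact pins (==) yield a version here; range specifiers such as >= are kept as a name with no version.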
The CLI build command accepts a YAML/JSON hints file:
roles:
  threshold: parameter
  df: dataset
domains:
  climate_readings.csv:
    encodingFormat: text/csv
    keywords: [climate, sensor]
    accessURL: https://example.org/data/climate_readings.csv
    etag: "W/\"abc123\""
    retrievedAt: "2025-01-20T10:00:00Z"
This structure matches what the UI generates.
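A hints file of this shape can be sanity-checked before it is passed to the CLI. The sketch below assumes the JSON variant of the file; the function names load_hints, validate_hints, and file_hints are hypothetical and are not part of CellScope. The basename lookup reflects the fact that file metadata hints are keyed by basename, not full path.

```python
import json
import os


def validate_hints(hints):
    """Check the two top-level sections of a hints mapping (illustrative only)."""
    roles = hints.get("roles", {})
    domains = hints.get("domains", {})
    if not isinstance(roles, dict) or not all(isinstance(v, str) for v in roles.values()):
        raise ValueError("roles must map variable names to role strings")
    if not isinstance(domains, dict) or not all(isinstance(v, dict) for v in domains.values()):
        raise ValueError("domains must map file names to metadata objects")


def load_hints(path):
    """Load a JSON hints file and validate its structure."""
    with open(path, encoding="utf-8") as handle:
        hints = json.load(handle)
    validate_hints(hints)
    return hints


def file_hints(hints, file_path):
    """Look up file metadata by basename, returning {} when nothing matches."""
    return hints.get("domains", {}).get(os.path.basename(file_path), {})
```

Because the lookup uses only the basename, two files with the same name in different directories share one set of hints.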
cellscope/personalization.py loads a JSON file if the env var is set:
export CELLSCOPE_METADATA_CONFIG=/path/to/metadata_config.json
Current behavior:
- file_fields are mapped to RDF predicates in cellscope/indexer.py.
- variable_fields is parsed but not used by default (see below).
Example config:
{
  "file_fields": [
    {"key": "encodingFormat", "predicate": "schema:encodingFormat"},
    {"key": "accessURL", "predicate": "dcat:accessURL"},
    {"key": "localPath", "predicate": "https://cellscope.dev/terms/localPath"},
    {"key": "sensitivity", "predicate": "https://example.org/vocab#sensitivity"}
  ],
  "variable_fields": [
    {"key": "unit", "predicate": "https://qudt.org/schema/qudt/unit"}
  ]
}
Important:
- File fields are read directly from file entities (ro-crate-metadata.json).
- Variable fields are reserved; to project them you must extend cellscope/indexer.py to read those fields from #var-* entities.
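The key-to-predicate mapping can be pictured as a small projection step. The sketch below is not the code in cellscope/indexer.py; load_field_map and project_file_entity are hypothetical names, and the output is a plain list of tuples rather than SPARQL text.

```python
import json


def load_field_map(config_text):
    """Return {key: predicate} for the file_fields section of a config."""
    config = json.loads(config_text)
    return {field["key"]: field["predicate"] for field in config.get("file_fields", [])}


def project_file_entity(entity, field_map):
    """Yield (subject, predicate, value) for each mapped field present on the entity."""
    subject = entity["@id"]
    for key, predicate in field_map.items():
        if key not in entity:
            continue  # unmapped or absent fields produce no triple
        raw = entity[key]
        values = raw if isinstance(raw, list) else [raw]
        for value in values:
            yield (subject, predicate, value)
```

Fields that are present on the entity but missing from file_fields are silently ignored in this sketch, which mirrors the idea that only configured fields reach the index.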
To add a new field that appears everywhere (RO-Crate, SPARQL, UI):
- Add the field to the UI review dialog (optional): labextension/src/index.ts in _showReviewDialog().
- Store it in hints (roles/domains or a new hints section).
- Attach it to RO-Crate entities in cellscope/rocrate_io.py.
- Project it in cellscope/indexer.py (or via CELLSCOPE_METADATA_CONFIG).
- Surface it in the UI list/filters in labextension/src/index.ts.
This ensures parity between local mode and SPARQL mode.
Use sidecar JSON entities for domain objects that are not code cells (e.g., instruments, protocols, external registry entries).
Example sidecar:
{
  "id": "https://example.org/instrument/CTD-42",
  "type": "Instrument",
  "name": "CTD-42",
  "producer": 3,
  "consumers": [5],
  "role": "instrument"
}
How it flows:
- Added by rocrate_io.build_rocrate().
- Stored as a ContextEntity.
- Linked via prov:wasGeneratedBy or prov:used.
- Indexed by the SPARQL generator like any other entity.
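One way to picture the sidecar flow is the sketch below. It rests on two assumptions that this guide does not confirm: that producer and consumers are cell indices, and that cells have identifiers of the form #cell-N. The function sidecar_to_entity is hypothetical and does not reproduce rocrate_io.build_rocrate().

```python
def sidecar_to_entity(sidecar, cell_id=lambda index: f"#cell-{index}"):
    """Return (entity, links) for one sidecar record.

    Assumes producer/consumers are cell indices and that cell ids look like
    "#cell-N"; both are illustrative guesses, not CellScope's documented format.
    """
    entity = {
        "@id": sidecar["id"],
        "@type": sidecar["type"],
        "name": sidecar.get("name", sidecar["id"]),
    }
    links = []
    if "producer" in sidecar:
        # PROV direction: the entity was generated by the producing cell
        links.append((sidecar["id"], "prov:wasGeneratedBy", cell_id(sidecar["producer"])))
    for consumer in sidecar.get("consumers", []):
        # PROV direction: the consuming cell used the entity
        links.append((cell_id(consumer), "prov:used", sidecar["id"]))
    return entity, links
```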
Add new patterns in cellscope/ast_capture.py:
- Extend _collect_file_io() with new read/write APIs.
- Extend _collect_python_defs() for new definition patterns.
- Update label extraction in _extract_cell_label() if your notebooks use a different convention.
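To give a sense of what adding a read/write API to an AST-based collector involves, here is a minimal standalone sketch using Python's ast module. It is not the body of _collect_file_io(); the READ_FUNCS and WRITE_FUNCS sets and the collect_file_io name are assumptions for illustration.

```python
import ast

# Assumed call names for illustration; the real patterns live in ast_capture.py.
READ_FUNCS = {"read_csv", "read_parquet", "read_json"}
WRITE_FUNCS = {"to_csv", "to_parquet", "to_json"}


def collect_file_io(source):
    """Return (reads, writes): literal string paths passed as the first argument."""
    reads, writes = [], []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Call) or not node.args:
            continue
        func = node.func
        # pd.read_csv(...) is an Attribute; a bare read_csv(...) is a Name
        name = func.attr if isinstance(func, ast.Attribute) else getattr(func, "id", None)
        first = node.args[0]
        if not (isinstance(first, ast.Constant) and isinstance(first.value, str)):
            continue  # dynamically built paths are not resolved
        if name in READ_FUNCS:
            reads.append(first.value)
        elif name in WRITE_FUNCS:
            writes.append(first.value)
    return reads, writes
```

Supporting a new library in this style amounts to adding its function names to the two sets, which is why paths built at runtime remain invisible to static capture.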
cellscope/containerizer_adapter.py is regex-based:
- Add read/write functions to READ_CALLS and WRITE_CALLS.
- Add path argument names in FILE_ARG_NAMES.
- Update KEYWORDS if you see false positives.
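The sketch below shows how call lists and argument names could feed a regex-based scanner. The constant names match those mentioned above, but their contents here are guesses, and build_pattern and scan are hypothetical helpers rather than the adapter's actual functions.

```python
import re

# Contents are assumed for illustration; see containerizer_adapter.py for the real lists.
READ_CALLS = ("read_csv", "read.csv", "readRDS")
WRITE_CALLS = ("to_csv", "to_parquet")
FILE_ARG_NAMES = ("file", "path")


def build_pattern(calls, arg_names):
    """Match `call("path")` or `call(arg = "path")` and capture the path."""
    call_alt = "|".join(re.escape(c) for c in calls)
    arg_alt = "|".join(re.escape(a) for a in arg_names)
    return re.compile(
        r"\b(?:%s)\s*\(\s*(?:(?:%s)\s*=\s*)?[\"']([^\"']+)[\"']" % (call_alt, arg_alt)
    )


READ_RE = build_pattern(READ_CALLS, FILE_ARG_NAMES)
WRITE_RE = build_pattern(WRITE_CALLS, FILE_ARG_NAMES)


def scan(source):
    """Return quoted paths found in read and write calls."""
    return {"reads": READ_RE.findall(source), "writes": WRITE_RE.findall(source)}
```

This pattern only sees a path that appears directly after the opening parenthesis, so calls that pass the path as a later argument would need a broader expression.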
cellscope/visualize.py controls the offline graph style:
- Node shapes and sizes
- Physics layout
- Popup panel HTML
The SPARQL graph handler (/cellscope/sparql_graph) injects the same
hover/click panel so local and SPARQL graphs stay consistent.
labextension/src/index.ts controls:
- Search highlighting and pinned exact matches
- Filter facets (kernel, roles, file metadata, edge via)
- Grouping by notebook label
Filter persistence is global:
- Key: cellscope:filters:global
Hints persistence is per notebook:
- Key: cellscope:hints:<encoded notebook path>
If you want per-notebook filters, modify _filterStorageKey().
Files referenced by the notebook are handled in rocrate_io.py:
- Local file exists -> copied into files/ and hashed.
- Local file missing -> entity still created with cellscope:localPath.
- Remote file URL -> accessURL stored; optional metadata retrieval.
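The three cases can be sketched as a small classifier. This is an illustration, not the logic in rocrate_io.py: classify_reference and sha256_of are hypothetical names, SHA-256 is an assumed hash algorithm, and the copy into files/ is omitted.

```python
import hashlib
import os
from urllib.parse import urlparse


def sha256_of(path, chunk_size=65536):
    """Hash a file in chunks so large files are not read into memory at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def classify_reference(ref):
    """Classify a file reference as remote, local (present), or missing."""
    if urlparse(ref).scheme in ("http", "https"):
        return {"kind": "remote", "accessURL": ref}
    if os.path.isfile(ref):
        # SHA-256 is an assumption; the guide only says the file is hashed
        return {"kind": "local", "localPath": ref, "sha256": sha256_of(ref)}
    return {"kind": "missing", "localPath": ref}
```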
These are opt-in:
- CELLSCOPE_FETCH_REMOTE_METADATA=1 (HEAD request for etag + dateModified)
- CELLSCOPE_FETCH_REMOTE_ARTIFACTS=1 (download into crate)
- CELLSCOPE_REMOTE_MAX_BYTES (size cap)
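A minimal sketch of how such opt-in flags might be read and how a HEAD request can collect the two values follows. It assumes that only the literal value 1 enables a flag and that dateModified comes from the Last-Modified header; remote_settings and head_metadata are hypothetical names, not CellScope functions.

```python
import os
import urllib.request


def remote_settings(env=None):
    """Read the opt-in flags; only the value "1" enables a flag in this sketch."""
    env = os.environ if env is None else env
    max_bytes = env.get("CELLSCOPE_REMOTE_MAX_BYTES")
    return {
        "fetch_metadata": env.get("CELLSCOPE_FETCH_REMOTE_METADATA") == "1",
        "fetch_artifacts": env.get("CELLSCOPE_FETCH_REMOTE_ARTIFACTS") == "1",
        "max_bytes": int(max_bytes) if max_bytes else None,
    }


def head_metadata(url, timeout=10):
    """Issue a HEAD request and return the headers relevant to versioning."""
    request = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return {
            "etag": response.headers.get("ETag"),
            # Assumption: dateModified is taken from Last-Modified
            "dateModified": response.headers.get("Last-Modified"),
        }
```

Keeping both behaviours off by default means a build never touches the network unless the environment explicitly asks for it.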
Scenario: a virtual lab wants to track dataset sensitivity and instrument IDs.
- Add fields to the review dialog or hints file:
  domains:
    readings.csv:
      sensitivity: restricted
- Map that field into SPARQL:
  {
    "file_fields": [
      {"key": "sensitivity", "predicate": "https://example.org/vocab#sensitivity"}
    ]
  }
- Add an instrument sidecar:
  {
    "id": "https://example.org/instrument/CTD-42",
    "type": "Instrument",
    "name": "CTD-42",
    "producer": 2
  }
This approach remains VRE-agnostic: you do not need any NaaVRE-specific APIs or schema to integrate.
Recommended checks:
- Run CLI export: python -m cellscope_cli build <notebook> --out out-lab
- Inspect ro-crate-metadata.json for the new fields.
- Run cellscope_cli validate on the crate.
- If SPARQL indexing is enabled, verify the new predicate appears in the index/last_update.sparql output.
- Confirm UI filters and graph panels show the new metadata.
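The inspection step can be scripted. The sketch below relies only on the RO-Crate convention that ro-crate-metadata.json holds its entities in a top-level @graph array; entities_with_field is a hypothetical helper, not part of cellscope_cli.

```python
import json
import os


def entities_with_field(crate_dir, field):
    """Return the @id of every entity in the crate that carries the given field."""
    path = os.path.join(crate_dir, "ro-crate-metadata.json")
    with open(path, encoding="utf-8") as handle:
        graph = json.load(handle).get("@graph", [])
    return [entity["@id"] for entity in graph if field in entity]
```

For example, calling entities_with_field("out-lab", "sensitivity") after the export above should list the file entities that received the new field; an empty list suggests the hint was not attached.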
- Static analysis only; dynamic path construction may be missed.
- R parser is heuristic and may miss advanced constructs.
- variable_fields in CELLSCOPE_METADATA_CONFIG are not projected by default; add code in cellscope/indexer.py if needed.
- File metadata hints are keyed by basename, not full path.
- Capture logic: cellscope/ast_capture.py, cellscope/containerizer_adapter.py
- RO-Crate mapping: cellscope/rocrate_io.py
- SPARQL mapping: cellscope/indexer.py, cellscope/personalization.py
- UI review dialog: labextension/src/index.ts
- Graph rendering: cellscope/visualize.py