Open
25 commits
6bbaa97
config, write, validate, read planning prompts
reneyagmur Jun 1, 2026
d9a9e32
edited plan
reneyagmur Jun 1, 2026
ccd5b26
editing architecture and plans, maximalist version
reneyagmur Jun 2, 2026
d2fe925
final iteration -1
reneyagmur Jun 9, 2026
dd2e940
final plans
reneyagmur Jun 10, 2026
e1f8d3a
config for global outdir and open to more config settings
reneyagmur Jun 10, 2026
822df72
sync local co
reneyagmur Jun 10, 2026
eff8f99
write plan edits
reneyagmur Jun 10, 2026
ac5e9b0
write spec and registry and functions
reneyagmur Jun 11, 2026
7842182
io init and validation based on write registry
reneyagmur Jun 11, 2026
954d589
notebook migration refinement
reneyagmur Jun 12, 2026
f573b99
notebook migration and patches to writespec
reneyagmur Jun 15, 2026
046df98
minnie save v 1300 migration notebooks
reneyagmur Jun 15, 2026
82aaad8
rewired output root in write functions, v1dd explore
reneyagmur Jun 16, 2026
76c904f
added calcium imaging to correlative connectivity
reneyagmur Jun 17, 2026
20b77f8
v1dd skeleton and initial push
reneyagmur Jun 17, 2026
0122e2d
notebook migration clean up
reneyagmur Jun 23, 2026
29c9852
tighten tests before pr, plans
reneyagmur Jun 23, 2026
aa7aaf4
tests tightening done
reneyagmur Jun 23, 2026
468cf69
start to clean up documentation and pr message
reneyagmur Jun 23, 2026
d2bec9b
etl example prompt and readme updated
reneyagmur Jun 23, 2026
306d263
edited readme
reneyagmur Jun 23, 2026
77c32a8
docstring and header cleanup, pr message drafting cleanup
reneyagmur Jun 23, 2026
49c821a
arrange past prompts planning as docu, changelog structure
reneyagmur Jun 23, 2026
2dab4e9
read union merge notebook migration regression for patchseq notebooks
reneyagmur Jun 23, 2026
4 changes: 4 additions & 0 deletions .codeocean/datasets.json
@@ -17,6 +17,10 @@
"id": "78a80081-c645-4e38-beb7-b9d9308a35d9",
"mount": "microns1412"
},
+{
+"id": "aafc99cc-92ee-4d04-b152-92f1063a3268",
+"mount": "v1dd_1196"
+},
{
"id": "aff09b9b-5cdc-49ef-8e39-358a8ead98d8",
"mount": "visp-patchseq-taxonomy-info"
54 changes: 54 additions & 0 deletions .github/instructions/changelog.instructions.md
@@ -0,0 +1,54 @@
---
description: "Use when editing CHANGELOG.md, drafting release notes, or summarizing user-visible changes. Enforces Keep a Changelog format, SemVer scope, and the user-voice rule."
applyTo: "CHANGELOG.md"
---
# Changelog rules

The changelog is the user-facing log of what changed in
`connects_common_connectivity`. It is **not** an internal work journal.

## Format
- [Keep a Changelog 1.1.0](https://keepachangelog.com/en/1.1.0/) +
[SemVer](https://semver.org/spec/v2.0.0.html).
- All new entries go under `## [Unreleased]` until a release is cut.
- Use only the standard sections: `Added`, `Changed`, `Deprecated`, `Removed`,
`Fixed`, `Security`. Omit empty sections in released versions; keep them as
empty headers under `[Unreleased]` so contributors see the slots.
- Newest version on top. Releases are `## [X.Y.Z] - YYYY-MM-DD`.

## Voice and scope (the rule that actually matters)
- Write in **user voice**: what changed for someone who imports
`connects_common_connectivity`, runs the `ccc` CLI, or follows the README.
- One bullet per change. Past-tense, verb-first phrasing is fine
(`Added …`, `Moved …`, `Fixed …`). No first person, no narrative.
- **Include**: new public names, removed public names, moved import paths,
changed signatures, changed defaults, behavior fixes a user could observe,
new CLI flags, new config keys, dropped Python versions.
- **Exclude**: internal refactors, test-only changes, planning-doc edits,
prompt/agent-customization edits, dev-tooling tweaks, comment-only changes.
If a user couldn't notice it, it doesn't belong here.
- If a change has both an internal and a user-visible side, log only the
user-visible side.

## Linking
- Reference public names in backticks: `` `write_models` ``, `` `io.writers` ``.
- Link to issues/PRs only when they add information a user would want
(`#123`); do not link to internal planning docs.

## Deprecations and removals
- Announce in `Deprecated` first (one release minimum) before moving to
`Removed`, except for genuinely unused or never-released names.
- Name the replacement when there is one: "Deprecated `X`; use `Y` instead."

## Releasing (manual for now)
1. Rename `## [Unreleased]` to `## [X.Y.Z] - YYYY-MM-DD` (today's date).
2. Drop empty subsections from the released block.
3. Add a fresh `## [Unreleased]` at the top with all six empty sub-headers.
4. Bump the version in `pyproject.toml` in the same commit.
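
Steps 1 to 3 above are mechanical enough to script. The following is a minimal sketch, not part of the repository: `cut_release` and `SECTIONS` are hypothetical names, and it assumes the changelog keeps exactly the layout described in this file. Step 4 (the `pyproject.toml` bump) is left to the caller.

```python
import re
from datetime import date

# The six standard Keep a Changelog sections, in order.
SECTIONS = ["Added", "Changed", "Deprecated", "Removed", "Fixed", "Security"]


def cut_release(changelog: str, version: str, today: date) -> str:
    """Return changelog text with [Unreleased] cut into a dated release."""
    head, marker, rest = changelog.partition("## [Unreleased]\n")
    if not marker:
        raise ValueError("no '## [Unreleased]' heading found")

    # Separate the unreleased block from any older releases below it.
    match = re.search(r"^## \[", rest, flags=re.MULTILINE)
    body, older = (rest[:match.start()], rest[match.start():]) if match else (rest, "")

    # Step 2: keep only subsections that have content under their header.
    kept = []
    for chunk in re.split(r"(?m)^(?=### )", body):
        lines = [line for line in chunk.splitlines() if line.strip()]
        if not lines:
            continue
        if lines[0].startswith("### ") and len(lines) == 1:
            continue  # empty subsection: header only
        kept.append(chunk.strip() + "\n")

    # Step 1: the released block gets a version and an ISO date.
    released = f"## [{version}] - {today.isoformat()}\n\n" + "\n".join(kept)
    # Step 3: a fresh [Unreleased] block with all six empty sub-headers.
    fresh = "## [Unreleased]\n\n" + "".join(f"### {name}\n\n" for name in SECTIONS)
    return head + fresh + released + ("\n" + older if older else "")
```

Calling `cut_release(text, "0.2.0", date(2026, 6, 23))` on a changelog in this layout should yield a new `## [0.2.0] - 2026-06-23` block directly below the fresh `## [Unreleased]` block.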

## Anti-patterns
- "Refactored internals." — internal, drop it.
- "Updated planning docs." — internal, drop it.
- "Various fixes." — split into specific bullets or drop.
- "Added new feature." — name the public symbol or describe the behavior.
- Long prose paragraphs — one bullet, one change.
90 changes: 90 additions & 0 deletions CHANGELOG.md
@@ -0,0 +1,90 @@
# Changelog

All notable changes to this project will be documented in this file.

The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/),
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

## [Unreleased]

### Added

### Changed

### Deprecated

### Removed

### Fixed

### Security

## [0.2.0] - 2026-06-23

### Added

- Added `connects_common_connectivity.config` with `Settings`,
`get_settings()`, `find_config_file()`, `output_root()`, and
`table_path()`. Settings are discovered from a `ccc_config.yaml` at (or
above) the cwd; `CCC_OUTPUT_ROOT` overrides `output_root`. Relative
`output_root` values are anchored at the config file's directory so a
notebook in `code/` and a script at the repo root resolve to the same
place.
- Added curated public API at `connects_common_connectivity.io`:
`write_models()` (single dispatch core for all generated pydantic
models), `write_projection_matrix()`, `WriteResult`,
`WRITABLE_CLASSES`, and re-exports of `get_settings`, `Settings`, and
`table_path`. The surface is pinned by `__all__`.
- Added write-time validation: `write_models()` now re-validates each
model through a runtime-derived strict subclass that flips
`WriteSpec.required_for_write` slots to non-optional, raising
`ValueError` before any IO if a write-required slot is missing or
`None`. Public helpers `strict_model_for()` and `validate_for_write()`
live in `connects_common_connectivity.io.write_validation`.
- Added `WriteSpec` registry entries for `AlgorithmRun` and
`HierarchyCategory` (both project-agnostic, scope=`["id"]`,
`overwrite_scoped`). These classes are now writable through
`write_models(...)` and surface in `WRITABLE_CLASSES`.
- Added an `output_root=` keyword to `write_models()` and
`write_projection_matrix()` for per-call overrides of the on-disk root.
Accepts a `str` or `Path` and writes to `<output_root>/<spec.subdir>/`,
bypassing `ccc_config.yaml` for that call. Mutually exclusive with
`settings=` (passing both raises `TypeError`). Lets a single notebook
redirect its writes (e.g. an isolated test dataset) without mutating
process-global config or environment variables.
- Added `populate_region_coverage()` in
`connects_common_connectivity.io.write_utils` for deriving
`ProjectionMeasurementMatrix.region_coverage` from a dense matrix.
- Added `CALCIUM_IMAGING` value to the `Modality` enum for
calcium-imaging-based functional correlations.
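
The write-time validation entry above can be illustrated with a stdlib-only stand-in. This is a sketch of the check itself (fail before any IO when a write-required slot is `None`), not of the package's pydantic strict-subclass mechanism. `DataItemStub`, `REQUIRED_FOR_WRITE`, and `check_write_required` are hypothetical names chosen for illustration and do not come from `connects_common_connectivity`.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DataItemStub:
    """Hypothetical stand-in for a generated model; every slot is optional."""
    id: Optional[str] = None
    project_id: Optional[str] = None
    description: Optional[str] = None


# Hypothetical registry: model class -> slots that must be set before writing.
REQUIRED_FOR_WRITE = {DataItemStub: ("id", "project_id")}


def check_write_required(models):
    """Raise ValueError naming every missing write-required slot; do no IO."""
    problems = []
    for index, model in enumerate(models):
        for slot in REQUIRED_FOR_WRITE[type(model)]:
            if getattr(model, slot) is None:
                problems.append(f"{type(model).__name__}[{index}].{slot}")
    if problems:
        raise ValueError("write-required slots missing: " + ", ".join(problems))
```

Collecting all problems before raising means one failed call reports every offending row, which tends to be more useful in a notebook than stopping at the first one.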

### Changed

- Migrated `code/etl_*.ipynb` notebooks to the curated IO API:
hardcoded `OUTPUT_ROOT = "../scratch/..."` strings are replaced with
`output_root()` from `connects_common_connectivity.config`, and
hand-rolled `write_deltalake(..., mode=..., predicate=..., partition_by=...)`
calls for every registry-backed model are replaced with `write_models(...)`
(and `write_projection_matrix(...)` for projection-matrix metadata rows).
Wide cell-feature / projection-matrix parquets and `CellCellConnectivityLong`
writes remain on raw `write_deltalake` pending registry support.
- Moved `connects_common_connectivity.arrow_utils` and
`connects_common_connectivity.write_utils` under
`connects_common_connectivity.io.*`.

### Removed

- Removed the deprecated re-export shims
`connects_common_connectivity.arrow_utils` and
`connects_common_connectivity.write_utils`. Import from
`connects_common_connectivity.io.arrow_utils` /
`connects_common_connectivity.io.write_utils` instead.

### Fixed

- Fixed `DataSet` writes to scope on `(project_id, id)` instead of
`project_id` alone, so sibling notebooks sharing a `project_id` (e.g.
patchseq exc/inh) no longer overwrite each other's `DataSet` rows.
- Fixed `write_models()` to honor `Settings.dry_run=True`: writes are now
skipped, `rows_written` is reported as `0`, and no Delta table
directories are created.
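
The `DataSet` scoping fix above is easier to see on a toy table. The sketch below models a scoped overwrite over an in-memory list of dicts. It assumes the mode means "delete rows whose scope-key values match an incoming row, then append", which is an interpretation rather than the package's confirmed implementation. `overwrite_scoped` here is a hypothetical function, and the row ids are made up.

```python
def overwrite_scoped(table, new_rows, scope):
    """Drop rows matching any incoming row on the scope keys, then append."""
    incoming = {tuple(row[key] for key in scope) for row in new_rows}
    kept = [row for row in table if tuple(row[key] for key in scope) not in incoming]
    return kept + list(new_rows)


# Two sibling notebooks share a project_id but register different DataSets.
exc = {"project_id": "patchseq", "id": "patchseq_exc"}
inh = {"project_id": "patchseq", "id": "patchseq_inh"}

# Scoped on project_id alone: the second write removes the first row.
narrow = overwrite_scoped(overwrite_scoped([], [exc], ["project_id"]), [inh], ["project_id"])

# Scoped on (project_id, id): both rows survive, and re-runs stay idempotent.
keys = ["project_id", "id"]
wide = overwrite_scoped(overwrite_scoped([], [exc], keys), [inh], keys)
```

With the narrow scope only the `inh` row remains. With the wider scope both rows are present, and re-running either notebook replaces only its own row.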
3 changes: 2 additions & 1 deletion README.md
@@ -16,6 +16,7 @@ The pilot of the Common Connectivity Pilot is focused on developing a framework
 - Packaged with `pyproject.toml` and intended to be managed via `uv`
 - BrainRegion ETL example from Parquet (S3/local) via `examples/etl_brain_regions.py` or CLI `ccc etl-brain-regions`
 - Generic Parquet→LinkML loader utility (`parquet_loader.py`) for any class in the schema
+- Curated IO layer (`connects_common_connectivity.io`) for writing generated pydantic models to a shared Delta lake — `write_models(...)` / `write_projection_matrix(...)` dispatched via a `WriteSpec` registry, with output location resolved from `ccc_config.yaml`

## Getting Started (with uv)

@@ -147,7 +148,7 @@ Pydantic models; this repository currently favors agility for early design.

## ETL Notebooks

-A set of ETL Jupyter notebooks in `code/` registers real datasets into the shared Delta Lake store under `results/em_patchseq_wnm_v1/`. These serve as concrete working examples for every schema class.
+A set of ETL Jupyter notebooks in `code/` registers real datasets into a shared Delta Lake store via the `connects_common_connectivity.io` layer (`write_models`, `write_projection_matrix`). The output location is resolved from `ccc_config.yaml` at the repo root (or the `CCC_OUTPUT_ROOT` environment variable), so notebooks do not hard-code a destination path. These serve as concrete working examples for every schema class.

 - **`code/etl_examples_readme.ipynb`** — markdown-only overview of all registered datasets and feature sets: what each dataset contains, why cell counts differ between sources, and how shared feature sets work across projects. Start here if you're new to the data.

5 changes: 5 additions & 0 deletions ccc_config.yaml
@@ -0,0 +1,5 @@
# Package-wide settings for ConnectsCommonConnectivity.
# Discovered by walking up from cwd (pyproject.toml/ruff/pytest pattern).
# Edit this file (or set CCC_OUTPUT_ROOT) to repoint writers/readers.
output_root: scratch/em_patchseq_wnm_v2/
dry_run: false
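
The comments in this config file describe a walk-up discovery pattern, and the changelog adds two resolution rules: `CCC_OUTPUT_ROOT` overrides the file, and a relative `output_root` is anchored at the config file's directory. A minimal sketch of that logic follows, assuming nothing about the package's internals. `find_config` and `resolve_output_root` are hypothetical names, YAML parsing is skipped (the raw `output_root` string is passed in), and returning the environment value unchanged is an assumption.

```python
import os
from pathlib import Path
from typing import Mapping, Optional

CONFIG_NAME = "ccc_config.yaml"


def find_config(start: Path) -> Optional[Path]:
    """Return the nearest CONFIG_NAME at or above `start`, or None."""
    start = start.resolve()
    for directory in (start, *start.parents):
        candidate = directory / CONFIG_NAME
        if candidate.is_file():
            return candidate
    return None


def resolve_output_root(
    config_file: Path,
    raw_value: str,
    env: Mapping[str, str] = os.environ,
) -> Path:
    """Apply the env override, else anchor a relative value at the config dir."""
    override = env.get("CCC_OUTPUT_ROOT")
    if override:
        return Path(override)  # assumption: used as given, not re-anchored
    root = Path(raw_value)
    return root if root.is_absolute() else config_file.parent / root
```

Under these assumptions, a notebook started in `code/` and a script started at the repo root both find the same `ccc_config.yaml` and therefore resolve `scratch/em_patchseq_wnm_v2/` to the same directory.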
6 changes: 3 additions & 3 deletions code/etl_examples_readme.ipynb
@@ -8,7 +8,7 @@
"\n",
"A quick-reference guide to what was registered and why. Use this notebook to orient yourself before diving into a specific ETL notebook.\n",
"\n",
-"> **All notebooks live in `code/`.** Outputs land in `../scratch/em_patchseq_wnm_v1/`."
+"> **All notebooks live in `code/`.** Outputs land in `../scratch/em_patchseq_wnm_v1/`. Registry-backed model tables are written with `write_models(...)` (projection rows use `write_projection_matrix(...)`)."
]
},
{
@@ -123,7 +123,7 @@
"| `etl_tasic_01_cluster.ipynb` | `tasic_2018_visp_taxonomy` | Tasic 2018 VISp scRNA-seq taxonomy (class → subclass → cluster) |\n",
"| `etl_visp_met_types_01_cluster.ipynb` | `visp_met_types_taxonomy` | VISp MET-types (class → cluster), 45 leaves |\n",
"\n",
-"Both write `algorithmrun/`, `clusterhierarchy/`, `cluster/`, and `hierarchycategory/` rows. No `project_id`; rows are scoped by `hierarchy_id` so multiple taxonomies can coexist in the same Delta tables.\n"
+"Both write `algorithmrun/`, `clusterhierarchy/`, `cluster/`, and `hierarchycategory/` rows. No `project_id`; `Cluster` rows are scoped by `hierarchy_id`, while the others are id-scoped in the write registry.\n"
]
},
{
@@ -289,7 +289,7 @@
"\n",
"Source: `ProjectionMatrix_tip_and_branch_roll_up.csv`. Cell ids are the SWC filename with `.swc` stripped (matches `_01`).\n",
"\n",
-"Adds **+4 new cells** found in the projection CSV but not yet in `dataitem/` — the same late-addition pattern as the `_02` notebooks. Registered via `append_new_dataitems`.\n"
+"Adds **+4 new cells** found in the projection CSV but not yet in `dataitem/` — the same late-addition pattern as the `_02` notebooks. Registered via `write_models(DataItem(...))` (append-new-by-id mode).\n"
]
},
{