AllenInstitute · reneyagmur · Jun 1, 2026 · Jun 1, 2026 · Jun 2, 2026 · Jun 9, 2026
diff --git a/.codeocean/datasets.json b/.codeocean/datasets.json
@@ -17,6 +17,10 @@
 			"id": "78a80081-c645-4e38-beb7-b9d9308a35d9",
 			"mount": "microns1412"
 		},
+		{
+			"id": "aafc99cc-92ee-4d04-b152-92f1063a3268",
+			"mount": "v1dd_1196"
+		},
 		{
 			"id": "aff09b9b-5cdc-49ef-8e39-358a8ead98d8",
 			"mount": "visp-patchseq-taxonomy-info"

diff --git a/.github/instructions/changelog.instructions.md b/.github/instructions/changelog.instructions.md
@@ -0,0 +1,54 @@
+---
+description: "Use when editing CHANGELOG.md, drafting release notes, or summarizing user-visible changes. Enforces Keep a Changelog format, SemVer scope, and the user-voice rule."
+applyTo: "CHANGELOG.md"
+---
+# Changelog rules
+
+The changelog is the user-facing log of what changed in
+`connects_common_connectivity`. It is **not** an internal work journal.
+
+## Format
+- Follow [Keep a Changelog 1.1.0](https://keepachangelog.com/en/1.1.0/) and
+  [SemVer](https://semver.org/spec/v2.0.0.html).
+- All new entries go under `## [Unreleased]` until a release is cut.
+- Use only the standard sections: `Added`, `Changed`, `Deprecated`, `Removed`,
+  `Fixed`, `Security`. Omit empty sections in released versions; keep them as
+  empty headers under `[Unreleased]` so contributors see the slots.
+- Newest version on top. Releases are `## [X.Y.Z] - YYYY-MM-DD`.
+
+## Voice and scope (the rule that actually matters)
+- Write in **user voice**: what changed for someone who imports
+  `connects_common_connectivity`, runs the `ccc` CLI, or follows the README.
+- One bullet per change, written in the past tense and led by the verb
+  (`Added …`, `Moved …`, `Fixed …`). No first person, no narrative.
+- **Include**: new public names, removed public names, moved import paths,
+  changed signatures, changed defaults, behavior fixes a user could observe,
+  new CLI flags, new config keys, dropped Python versions.
+- **Exclude**: internal refactors, test-only changes, planning-doc edits,
+  prompt/agent-customization edits, dev-tooling tweaks, comment-only changes.
+  If a user couldn't notice it, it doesn't belong here.
+- If a change has both an internal and a user-visible side, log only the
+  user-visible side.
+
+## Linking
+- Reference public names in backticks: `` `write_models` ``, `` `io.writers` ``.
+- Link to issues/PRs only when they add information a user would want
+  (`#123`); do not link to internal planning docs.
+
+## Deprecations and removals
+- Announce in `Deprecated` first (one release minimum) before moving to
+  `Removed`, except for genuinely unused or never-released names.
+- Name the replacement when there is one: "Deprecated `X`; use `Y` instead."
+
+## Releasing (manual for now)
+1. Rename `## [Unreleased]` to `## [X.Y.Z] - YYYY-MM-DD` (today's date).
+2. Drop empty subsections from the released block.
+3. Add a fresh `## [Unreleased]` at the top with all six empty sub-headers.
+4. Bump the version in `pyproject.toml` in the same commit.
+
+## Anti-patterns
+- "Refactored internals." — internal, drop it.
+- "Updated planning docs." — internal, drop it.
+- "Various fixes." — split into specific bullets or drop.
+- "Added new feature." — name the public symbol or describe the behavior.
+- Long prose paragraphs — one bullet, one change.
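The manual release steps above could eventually be scripted. Below is a minimal sketch that assumes the CHANGELOG layout in this PR: a `## [Unreleased]` header followed by `### ` sub-headers, with older releases starting at the next `## [` line. `cut_release` is a hypothetical helper and is not part of the package. It covers steps 1 to 3 only and leaves the `pyproject.toml` bump (step 4) to the caller.

```python
import re
from datetime import date

# The six Keep a Changelog sections, in the order this repo lists them.
SECTIONS = ("Added", "Changed", "Deprecated", "Removed", "Fixed", "Security")
# A fresh, empty [Unreleased] block with all six sub-headers (release step 3).
EMPTY_UNRELEASED = (
    "## [Unreleased]\n\n" + "\n\n".join(f"### {s}" for s in SECTIONS) + "\n\n"
)


def cut_release(changelog: str, version: str, today: date) -> str:
    """Hypothetical helper: turn the [Unreleased] block into a dated release."""
    head, sep, rest = changelog.partition("## [Unreleased]\n")
    if not sep:
        raise ValueError("no '## [Unreleased]' header found")

    # The Unreleased block ends at the next "## [" release header (or at EOF).
    nxt = re.search(r"^## \[", rest, flags=re.MULTILINE)
    block, tail = (rest[: nxt.start()], rest[nxt.start():]) if nxt else (rest, "")

    # re.split with a capture group yields [preamble, hdr1, body1, hdr2, body2, ...];
    # the preamble is only whitespace here, so it is discarded.
    parts = re.split(r"^(### .+)$", block, flags=re.MULTILINE)
    kept = [
        (header, body.strip())
        for header, body in zip(parts[1::2], parts[2::2])
        if body.strip()  # release step 2: drop empty subsections
    ]

    # Release step 1: the header becomes "## [X.Y.Z] - YYYY-MM-DD".
    released = f"## [{version}] - {today.isoformat()}\n\n"
    released += "".join(f"{header}\n\n{body}\n\n" for header, body in kept)
    return head + EMPTY_UNRELEASED + released + tail
```

Splitting on `### ` headers keeps each bullet list intact, so multi-line bullets like the ones in this CHANGELOG survive unchanged.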
diff --git a/CHANGELOG.md b/CHANGELOG.md
@@ -0,0 +1,90 @@
+# Changelog
+
+All notable changes to this project will be documented in this file.
+
+The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/),
+and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
+
+## [Unreleased]
+
+### Added
+
+### Changed
+
+### Deprecated
+
+### Removed
+
+### Fixed
+
+### Security
+
+## [0.2.0] - 2026-06-23
+
+### Added
+
+- Added `connects_common_connectivity.config` with `Settings`,
+  `get_settings()`, `find_config_file()`, `output_root()`, and
+  `table_path()`. Settings are discovered from a `ccc_config.yaml` at (or
+  above) the cwd; `CCC_OUTPUT_ROOT` overrides `output_root`. Relative
+  `output_root` values are anchored at the config file's directory so a
+  notebook in `code/` and a script at the repo root resolve to the same
+  place.
+- Added curated public API at `connects_common_connectivity.io`:
+  `write_models()` (single dispatch core for all generated pydantic
+  models), `write_projection_matrix()`, `WriteResult`,
+  `WRITABLE_CLASSES`, and re-exports of `get_settings`, `Settings`, and
+  `table_path`. The surface is pinned by `__all__`.
+- Added write-time validation: `write_models()` now re-validates each
+  model through a runtime-derived strict subclass that flips
+  `WriteSpec.required_for_write` slots to non-optional, raising
+  `ValueError` before any IO if a write-required slot is missing or
+  `None`. Public helpers `strict_model_for()` and `validate_for_write()`
+  live in `connects_common_connectivity.io.write_validation`.
+- Added `WriteSpec` registry entries for `AlgorithmRun` and
+  `HierarchyCategory` (both project-agnostic, `scope=["id"]`,
+  `overwrite_scoped`). These classes are now writable through
+  `write_models(...)` and surface in `WRITABLE_CLASSES`.
+- Added an `output_root=` keyword to `write_models()` and
+  `write_projection_matrix()` for per-call overrides of the on-disk root.
+  Accepts a `str` or `Path` and writes to `<output_root>/<spec.subdir>/`,
+  bypassing `ccc_config.yaml` for that call. Mutually exclusive with
+  `settings=` (passing both raises `TypeError`). Lets a single notebook
+  redirect its writes (e.g. an isolated test dataset) without mutating
+  process-global config or environment variables.
+- Added `populate_region_coverage()` in
+  `connects_common_connectivity.io.write_utils` for deriving
+  `ProjectionMeasurementMatrix.region_coverage` from a dense matrix.
+- Added a `CALCIUM_IMAGING` value to the `Modality` enum for functional
+  correlations derived from calcium imaging.
+
+### Changed
+
+- Migrated `code/etl_*.ipynb` notebooks to the curated IO API:
+  hardcoded `OUTPUT_ROOT = "../scratch/..."` strings are replaced with
+  `output_root()` from `connects_common_connectivity.config`, and
+  hand-rolled `write_deltalake(..., mode=..., predicate=..., partition_by=...)`
+  calls for every registry-backed model are replaced with `write_models(...)`
+  (and `write_projection_matrix(...)` for projection-matrix metadata rows).
+  Wide cell-feature / projection-matrix parquets and `CellCellConnectivityLong`
+  writes remain on raw `write_deltalake` pending registry support.
+- Moved `connects_common_connectivity.arrow_utils` and
+  `connects_common_connectivity.write_utils` under
+  `connects_common_connectivity.io.*`.
+
+### Removed
+
+- Removed the deprecated re-export shims
+  `connects_common_connectivity.arrow_utils` and
+  `connects_common_connectivity.write_utils`. Import from
+  `connects_common_connectivity.io.arrow_utils` /
+  `connects_common_connectivity.io.write_utils` instead.
+
+### Fixed
+
+- Fixed `DataSet` writes to scope on `(project_id, id)` instead of
+  `project_id` alone, so sibling notebooks sharing a `project_id` (e.g.
+  patchseq exc/inh) no longer overwrite each other's `DataSet` rows.
+- Fixed `write_models()` to honor `Settings.dry_run=True`: writes are now
+  skipped, `rows_written` is reported as `0`, and no Delta table
+  directories are created.
diff --git a/README.md b/README.md
@@ -16,6 +16,7 @@ The pilot of the Common Connectivity Pilot is focused on developing a framework
 - Packaged with `pyproject.toml` and intended to be managed via `uv`
 - BrainRegion ETL example from Parquet (S3/local) via `examples/etl_brain_regions.py` or CLI `ccc etl-brain-regions`
 - Generic Parquet→LinkML loader utility (`parquet_loader.py`) for any class in the schema
+- Curated IO layer (`connects_common_connectivity.io`) for writing generated pydantic models to a shared Delta Lake store: `write_models(...)` and `write_projection_matrix(...)`, dispatched via a `WriteSpec` registry, with the output location resolved from `ccc_config.yaml`
 
 ## Getting Started (with uv)
 
@@ -147,7 +148,7 @@ Pydantic models; this repository currently favors agility for early design.
 
 ## ETL Notebooks
 
-A set of ETL Jupyter notebooks in `code/` registers real datasets into the shared Delta Lake store under `results/em_patchseq_wnm_v1/`. These serve as concrete working examples for every schema class.
+A set of ETL Jupyter notebooks in `code/` registers real datasets into a shared Delta Lake store via the `connects_common_connectivity.io` layer (`write_models`, `write_projection_matrix`). The output location is resolved from `ccc_config.yaml` at the repo root, and the `CCC_OUTPUT_ROOT` environment variable takes precedence when set, so notebooks do not hard-code a destination path. These serve as concrete working examples for every schema class.
 
 - **`code/etl_examples_readme.ipynb`** — markdown-only overview of all registered datasets and feature sets: what each dataset contains, why cell counts differ between sources, and how shared feature sets work across projects. Start here if you're new to the data.
 

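The README now says the output location comes from `ccc_config.yaml`. The changelog adds a per-call `output_root=` override that is mutually exclusive with `settings=`. Below is a minimal sketch of that argument handling, assuming `Settings` only needs an `output_root`. `resolve_table_dir`, its `default` parameter (standing in for `get_settings()`), and this `Settings` dataclass are hypothetical stand-ins, not the package's classes.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Optional, Union


@dataclass(frozen=True)
class Settings:  # hypothetical stand-in for connects_common_connectivity.config.Settings
    output_root: Path


def resolve_table_dir(
    subdir: str,
    *,
    default: Settings,
    settings: Optional[Settings] = None,
    output_root: Union[str, Path, None] = None,
) -> Path:
    """Return <root>/<subdir>, where root comes from exactly one source."""
    if settings is not None and output_root is not None:
        raise TypeError("pass settings= or output_root=, not both")
    if output_root is not None:
        # Per-call override: bypasses the config file for this call only.
        return Path(output_root) / subdir
    # Otherwise use explicit settings, falling back to the discovered config.
    return (settings or default).output_root / subdir
```

Raising `TypeError` on ambiguous input avoids silently choosing between two roots. Keeping the override per call means a notebook can redirect its writes without touching process-global state.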
diff --git a/ccc_config.yaml b/ccc_config.yaml
@@ -0,0 +1,5 @@
+# Package-wide settings for ConnectsCommonConnectivity.
+# Discovered by walking up from cwd (pyproject.toml/ruff/pytest pattern).
+# Edit this file (or set CCC_OUTPUT_ROOT) to repoint writers/readers.
+output_root: scratch/em_patchseq_wnm_v2/
+dry_run: false
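The comment in `ccc_config.yaml` says the file is found by walking up from the cwd. The changelog says `CCC_OUTPUT_ROOT` overrides `output_root` and that relative values are anchored at the config file's directory. Below is a stdlib-only sketch of that lookup. `find_config` and `resolve_output_root` are hypothetical names (the package exposes `find_config_file()` and `output_root()`), YAML parsing is omitted, and treating a relative `CCC_OUTPUT_ROOT` as cwd-relative is an assumption.

```python
import os
from pathlib import Path
from typing import Mapping, Optional

CONFIG_NAME = "ccc_config.yaml"
ENV_VAR = "CCC_OUTPUT_ROOT"


def find_config(start: Optional[Path] = None) -> Optional[Path]:
    """Return the nearest ccc_config.yaml in `start` or any parent, else None."""
    start = (start or Path.cwd()).resolve()
    for directory in (start, *start.parents):
        candidate = directory / CONFIG_NAME
        if candidate.is_file():
            return candidate
    return None


def resolve_output_root(
    config_file: Path, configured: str, environ: Mapping[str, str] = os.environ
) -> Path:
    """Apply the env override, then anchor relative config values at the config dir."""
    override = environ.get(ENV_VAR)
    if override:
        # Assumption: a relative env value resolves against the cwd.
        return Path(override).resolve()
    raw = Path(configured)
    # Anchoring at config_file.parent makes code/ and the repo root agree.
    return raw if raw.is_absolute() else (config_file.parent / raw).resolve()
```

From a notebook in `code/`, `find_config()` returns the repo-root file. `scratch/em_patchseq_wnm_v2/` therefore resolves to the same absolute path as it would from a script run at the root.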
diff --git a/code/etl_examples_readme.ipynb b/code/etl_examples_readme.ipynb
@@ -8,7 +8,7 @@
     "\n",
     "A quick-reference guide to what was registered and why. Use this notebook to orient yourself before diving into a specific ETL notebook.\n",
     "\n",
-    "> **All notebooks live in `code/`.** Outputs land in `../scratch/em_patchseq_wnm_v1/`."
+    "> **All notebooks live in `code/`.** Outputs land under the `output_root` set in `ccc_config.yaml` (currently `../scratch/em_patchseq_wnm_v2/`, relative to `code/`). Registry-backed model tables are written with `write_models(...)`; projection-matrix metadata rows use `write_projection_matrix(...)`."
    ]
   },
   {
@@ -123,7 +123,7 @@
     "| `etl_tasic_01_cluster.ipynb` | `tasic_2018_visp_taxonomy` | Tasic 2018 VISp scRNA-seq taxonomy (class → subclass → cluster) |\n",
     "| `etl_visp_met_types_01_cluster.ipynb` | `visp_met_types_taxonomy` | VISp MET-types (class → cluster), 45 leaves |\n",
     "\n",
-    "Both write `algorithmrun/`, `clusterhierarchy/`, `cluster/`, and `hierarchycategory/` rows. No `project_id`; rows are scoped by `hierarchy_id` so multiple taxonomies can coexist in the same Delta tables.\n"
+    "Both write `algorithmrun/`, `clusterhierarchy/`, `cluster/`, and `hierarchycategory/` rows. No `project_id`; `Cluster` rows are scoped by `hierarchy_id`, while the others are id-scoped in the write registry.\n"
    ]
   },
   {
@@ -289,7 +289,7 @@
     "\n",
     "Source: `ProjectionMatrix_tip_and_branch_roll_up.csv`. Cell ids are the SWC filename with `.swc` stripped (matches `_01`).\n",
     "\n",
-    "Adds **+4 new cells** found in the projection CSV but not yet in `dataitem/` — the same late-addition pattern as the `_02` notebooks. Registered via `append_new_dataitems`.\n"
+    "Adds **+4 new cells** found in the projection CSV but not yet in `dataitem/` — the same late-addition pattern as the `_02` notebooks. Registered via `write_models(DataItem(...))` (append-new-by-id mode).\n"
    ]
   },
   {
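Scope decides which existing rows a write replaces. The overview above scopes `Cluster` by `hierarchy_id` and the other taxonomy tables by `id`, and the changelog's `DataSet` fix widens the scope from `project_id` to `(project_id, id)`. Below is an in-memory sketch of why that fix matters. It assumes a scoped overwrite deletes every existing row whose scope key matches an incoming row. `scoped_overwrite` is hypothetical and models rows as dicts rather than a Delta table, and the `patchseq` values are illustrative.

```python
def scoped_overwrite(existing: list[dict], incoming: list[dict], scope: list[str]) -> list[dict]:
    """Drop existing rows whose scope key matches any incoming row, then append incoming."""
    def key(row: dict) -> tuple:
        return tuple(row[col] for col in scope)

    replaced = {key(row) for row in incoming}
    return [row for row in existing if key(row) not in replaced] + list(incoming)


# Two sibling notebooks share a project_id but write different DataSet ids.
table = [
    {"project_id": "patchseq", "id": "exc", "name": "exc v1"},
    {"project_id": "patchseq", "id": "inh", "name": "inh v1"},
]
rerun_exc = [{"project_id": "patchseq", "id": "exc", "name": "exc v2"}]

old = scoped_overwrite(table, rerun_exc, scope=["project_id"])
new = scoped_overwrite(table, rerun_exc, scope=["project_id", "id"])
print([r["id"] for r in old])  # ['exc']: the inh row was clobbered
print([r["id"] for r in new])  # ['inh', 'exc']: the inh row survives
```

A narrower scope means a rerun replaces only the rows it owns, which is what lets multiple taxonomies or sibling notebooks share one table.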