Ingestion v2 - config outpath, write registry and validation, write utils, single write_models #5
Open
reneyagmur wants to merge 25 commits into
IO layer: write path + validation
Ships the curated connects_common_connectivity.io write path end-to-end: package-wide configuration, a registry-driven write API, write-time validation derived from that same registry, ETL notebook migration to the new API, and the test suite to back it.

Design: WriteSpec as the single source of truth
The WriteSpec registered per writable class is one declaration that drives both Delta dispatch (subdir, partitioning, scope columns, write mode) and write-time validation (required_for_write slots are flipped non-optional in auto-derived strict submodels and re-validated before any IO). Generated models.py is never touched.

Configuration
- connects_common_connectivity.config: pydantic Settings, cached get_settings(), walk-up discovery of ccc_config.yaml, plus output_root() / table_path() helpers.
- Relative values anchor at the config file's directory via os.path.abspath (avoids Code Ocean's scratch -> /scratch symlink).
- CCC_OUTPUT_ROOT env > ccc_config.yaml > error.
- ccc_config.yaml seeded.

Write registry and dispatch
- io/write_spec.py: WriteSpec, REGISTRY (14 entries), get_spec().
- io/writers.py: write_models() single-dispatch over the registry (no per-class wrappers), frozen WriteResult dataclass, WRITABLE_CLASSES tuple.
- write_projection_matrix() is the only non-write_models writer so far, justified by its non-uniform signature (dense matrix + model).
- populate_region_coverage() added in io/write_utils.py; derives region_coverage from the dense values before write.
- DataSet scope widened to (project_id, id) so patchseq exc/inh DataSet rows coexist (today's predicate-only-on-project_id behavior would overwrite one with the other).

Write-time validation
- io/write_validation.py: strict_model_for(cls) flips WriteSpec.required_for_write slots to non-optional and strips Optional from those annotations (cached per class, no mutation of generated models.py).
- validate_for_write() re-validates instances and raises ValueError naming the missing slots before any IO. Wired into write_models.
- required_for_write populated for Cluster, ClusterMembership, CellFeatureDefinition.

Public API surface
- io/__init__.py re-exports pinned by __all__: get_settings, Settings, table_path, write_models, write_projection_matrix, WriteResult, WRITABLE_CLASSES.
- output_root= keyword on write_models() / write_projection_matrix() (mutually exclusive with settings=) so a single notebook can redirect its writes without mutating process-global config.
- Modality.CALCIUM_IMAGING added (for functional correlations in microns or v1dd-like datasets with EM + CI experiments).
- connects_common_connectivity.arrow_utils / connects_common_connectivity.write_utils re-export shims; arrow_utils.py and write_utils.py now live exclusively under io/.

ETL notebook migration
- write_models / write_projection_matrix in the ETL notebooks.
- Hand-rolled write_deltalake migrated.
- Per-notebook imports trimmed.
- OUTPUT_ROOT = "../scratch/..." strings replaced with output_root().
- DataSet scope fix above).

Tests
- tests/conftest.py foundations (settings/cache/cwd isolation + shared fixtures); duplicated helpers removed.
- match= checks.
- WRITABLE_CLASSES; registry-drift guard; no-shim regression (test_shim_modules_deleted, _not_importable, _no_source_references_shim_paths).
- output_root= override, strict-validation failures, public-API surface.

Not in this PR
- write_deltalake directly).
- CellCellConnectivityLong: no registry entry yet; the write_cellcellconnectivitylong stub in io/writers.py documents the migration plan.
- etl_v1dd_01 new dataset ingestion prototype ongoing in parallel.
- merge_by_id (read-existing → union → overwrite) write mode for shared scopes like (visp_patchseq, visp_inh_patchseq) where multiple notebooks contribute disjoint subsets. The union is currently inlined in patch-seq / WNM notebooks; see planning/multi_writer_scope_design.md for the draft design discussion.

Verification
uv run pytest -q → 160 passed.
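
Illustrative sketch for the Configuration section: the precedence "CCC_OUTPUT_ROOT env > ccc_config.yaml > error", combined with walk-up discovery and anchoring of relative values at the config file's directory, could look roughly like the following. This is not the code in connects_common_connectivity.config. The helper names (find_config, read_output_root, resolve_output_root), the output_root key name, and the toy one-line parser standing in for real YAML parsing are all assumptions, as is the handling of relative env values.

```python
import os
from pathlib import Path
from typing import Optional

CONFIG_NAME = "ccc_config.yaml"
ENV_VAR = "CCC_OUTPUT_ROOT"


def find_config(start: Path) -> Optional[Path]:
    """Walk up from `start` and return the first ccc_config.yaml found, if any."""
    start = Path(os.path.abspath(start))
    for directory in (start, *start.parents):
        candidate = directory / CONFIG_NAME
        if candidate.is_file():
            return candidate
    return None


def read_output_root(config_path: Path) -> Optional[str]:
    """Toy parser: find a top-level `output_root: <value>` line.

    Stand-in for real YAML parsing; the key name is an assumption.
    """
    for line in config_path.read_text().splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "output_root":
            return value.strip().strip("'\"") or None
    return None


def resolve_output_root(start: Path) -> str:
    """Apply the precedence: env var > ccc_config.yaml > error."""
    env_value = os.environ.get(ENV_VAR)
    if env_value:
        # Assumption: a relative env value is made absolute against the cwd.
        return os.path.abspath(env_value)
    config_path = find_config(start)
    if config_path is not None:
        value = read_output_root(config_path)
        if value:
            # Anchor relative values at the config file's directory.
            # os.path.abspath normalizes the path without resolving symlinks.
            return os.path.abspath(os.path.join(config_path.parent, value))
    raise RuntimeError(f"Set {ENV_VAR} or provide {CONFIG_NAME} with output_root")
```

Using os.path.abspath rather than a symlink-resolving call keeps the path as written, which matches the stated goal of avoiding the scratch -> /scratch symlink.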
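
Illustrative sketch for the "WriteSpec as the single source of truth" design: one registry entry can drive both dispatch and pre-IO validation. To stay dependency-free it uses plain dataclasses and a None check instead of the pydantic-derived strict submodels the PR describes. The WriteSpec field names, the Cluster stand-in, its registry values, and plan_write are hypothetical and are not taken from io/write_spec.py or io/writers.py.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional, Sequence, Tuple, Type


@dataclass(frozen=True)
class WriteSpec:
    """One declaration per writable class (field names are illustrative)."""
    subdir: str
    scope_columns: Tuple[str, ...]
    required_for_write: Tuple[str, ...] = ()


@dataclass
class Cluster:
    """Stand-in for a generated model whose slots are all optional."""
    id: Optional[str] = None
    project_id: Optional[str] = None
    label: Optional[str] = None


# Hypothetical registry entry; the real REGISTRY has 14 entries.
REGISTRY: Dict[Type[Any], WriteSpec] = {
    Cluster: WriteSpec(
        subdir="cluster",
        scope_columns=("project_id",),
        required_for_write=("id", "project_id"),
    ),
}


def get_spec(cls: Type[Any]) -> WriteSpec:
    """Look up the spec for a class, failing loudly for unregistered ones."""
    try:
        return REGISTRY[cls]
    except KeyError:
        raise TypeError(f"{cls.__name__} has no WriteSpec registered") from None


def validate_for_write(instances: Sequence[Any], spec: WriteSpec) -> None:
    """Raise ValueError naming the missing required slots, before any IO."""
    for index, instance in enumerate(instances):
        missing = [
            slot for slot in spec.required_for_write
            if getattr(instance, slot, None) is None
        ]
        if missing:
            name = type(instance).__name__
            raise ValueError(
                f"{name}[{index}] is missing required slots: {', '.join(missing)}"
            )


def plan_write(instances: Sequence[Any], output_root: str) -> Dict[str, Any]:
    """Single dispatch: look up the spec, validate, describe the write (no IO)."""
    if not instances:
        raise ValueError("nothing to write")
    spec = get_spec(type(instances[0]))
    validate_for_write(instances, spec)
    return {
        "path": f"{output_root.rstrip('/')}/{spec.subdir}",
        "scope_columns": spec.scope_columns,
        "rows": len(instances),
    }
```

Because the same spec feeds both steps, adding a required slot to a registry entry tightens validation without touching the model class or adding a per-class wrapper.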
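
Illustrative sketch for the DataSet scope change under "Write registry and dispatch": a small in-memory simulation of a scoped overwrite shows why widening the scope to (project_id, id) lets the exc and inh rows coexist. It does not use Delta Lake; scoped_overwrite and the row values are hypothetical.

```python
from typing import Dict, List, Sequence

Row = Dict[str, str]


def scoped_overwrite(table: List[Row], incoming: List[Row], scope: Sequence[str]) -> List[Row]:
    """Drop existing rows whose scope-column values match any incoming row, then append."""
    incoming_keys = {tuple(row[col] for col in scope) for row in incoming}
    kept = [row for row in table if tuple(row[col] for col in scope) not in incoming_keys]
    return kept + incoming


# Two notebooks write one DataSet row each under the same project (made-up values).
exc = [{"project_id": "patchseq", "id": "exc"}]
inh = [{"project_id": "patchseq", "id": "inh"}]

# Scope on project_id only: the second write replaces the first.
narrow = scoped_overwrite(scoped_overwrite([], exc, ["project_id"]), inh, ["project_id"])

# Scope on (project_id, id): both rows survive.
wide = scoped_overwrite(
    scoped_overwrite([], exc, ["project_id", "id"]), inh, ["project_id", "id"]
)
```

With the narrow scope only the inh row remains. With the widened scope the table holds both rows in write order.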