- Updated `write_image_omezarr` and `write_labels_omezarr` functions to accept pixel sizes as float, tuple, or dictionary, allowing for more flexible input formats.
- Introduced `_parse_pixel_sizes` helper function to standardize pixel size extraction and validation.
- Enhanced metadata extraction in `extract_metadata_tile_nd2` and `extract_metadata_well_nd2` to include pixel size, objective magnification, zoom magnification, and binning information.
- Updated `export_omezarr_image` script to read image data from TIFF or raw formats, improving compatibility with different data sources.
- Added warnings for potential inconsistencies in pixel size calibration.
- Introduced `conftest.py` to ensure the repository root is included in `sys.path` for test imports.
- Updated `test_preprocess.py` to assert required columns in metadata instead of exact counts.
- Modified `test_omezarr_exports.py` to check for any Zarr files in the output directory.
- Enhanced `write_image_omezarr` to accept new parameters: `coarsening_factor`, `max_levels`, and `is_label`, improving flexibility in image writing.
- Added error handling for `max_levels` and `coarsening_factor` to ensure valid values.
- Updated metadata handling in `write_image_omezarr` to accommodate label images and ensure proper storage of pixel sizes.
omezarr_writer.py moved under lib/shared
Resolves an issue where labels failed to import into napari.
…ss_zarr_v4 Cherry-picked from 1161474 on 79f1eea_preprocess_zarr_v4.
- Add dynamic key selection (CONVERT_SBS_KEY/CONVERT_PHENOTYPE_KEY) based on OME_ZARR_ENABLED
- IC fields respect IC_EXT (zarr vs tiff) based on config
- Downstream rules (sbs.smk, phenotype.smk) use dynamic keys for preprocess inputs
- Add image_to_omezarr.py script that uses convert_to_array + write_image_omezarr
- Add convert_sbs_omezarr and convert_phenotype_omezarr rules
- Update CONVERT_*_KEY selection to use _omezarr variants when USE_OME_ZARR=True
- This allows direct ND2→Zarr conversion, bypassing TIFF intermediates entirely
- Added integration tests for Zarr preprocessing functionality, ensuring nd2_to_zarr conversion produces outputs equivalent to TIFF conversion.
- Updated pytest markers to include integration tests.
- Modified existing tests to prioritize Zarr format over TIFF where applicable.
- Introduced new rules for Zarr conversion in the Snakemake workflow, allowing for flexible output formats based on configuration.
- Implemented a script for direct ND2 to standard Zarr conversion, streamlining the preprocessing pipeline.
- Add units to the T axis ("second") and spatial axes in _axes_str_to_dicts
- Separate label axis unit patching from pixel scale patching so units
are always set even without preprocess metadata
- Re-inject downsamplingMethod after iohub dump_meta (which strips it)
- segmentation.method now includes model (e.g. "cellpose.cyto3")
- segmentation.stitching uses string "none" instead of boolean false
- Add statistics.n_cells by counting unique labels in the array
- Validated: 0 errors with ops-schema validator
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
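The `statistics.n_cells` count described above amounts to counting distinct nonzero labels in the segmentation array. A minimal numpy sketch of that idea (function name is illustrative, not the in-repo code):

```python
import numpy as np

def count_cells(label_array):
    """Count unique labels in a segmentation, excluding background (0)."""
    labels = np.unique(label_array)
    return int((labels != 0).sum())

# a tiny label mask with three cells (labels 1, 2, and 5)
mask = np.array([
    [0, 1, 1],
    [2, 2, 0],
    [0, 5, 5],
])
```

Note that `np.unique` handles non-contiguous label IDs (1, 2, 5) correctly, which matters because labels are often filtered after segmentation.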
…with output_to_input() lambdas using new _merge_well_expand helpers. Also add a row to cell_data_metadata_cols.tsv so aggregate steps treat it as metadata, not a feature.
Template CSV with 188 cp_emulator feature patterns. {Compartment} and
{Channel} placeholders are expanded at submission time by the finalize
rule using channel names from config.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
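The `{Compartment}`/`{Channel}` expansion described above can be sketched as a cross-product substitution over the template patterns — names and structure here are illustrative, not the finalize rule's actual code:

```python
from itertools import product

def expand_patterns(patterns, compartments, channels):
    """Expand {Compartment}/{Channel} placeholders into concrete feature names."""
    expanded = []
    for pattern in patterns:
        names = {pattern}  # patterns without placeholders pass through unchanged
        if "{Compartment}" in pattern or "{Channel}" in pattern:
            names = {
                pattern.replace("{Compartment}", comp).replace("{Channel}", chan)
                for comp, chan in product(compartments, channels)
            }
        expanded.extend(sorted(names))
    return expanded
```

With this scheme, 188 patterns expand into the full per-compartment, per-channel feature list at submission time, so the template CSV stays channel-agnostic.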
New add_ensembl_ids() function maps Entrez gene IDs to Ensembl IDs using either a static TSV mapping file or an Ensembl REST API fallback. Wired into standardize_barcode_design() via the ensembl_mapping_path parameter. Non-targeting controls are automatically labeled "non-targeting". Needed for the OPS Data Standard perturbation_library.csv, which requires Ensembl gene IDs (ENSG format) instead of Entrez IDs.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- get_gene_mapping(): downloads gene_symbol/entrez_id/ensembl_gene_id mapping from Ensembl BioMart at runtime (same pattern as the UniProt download)
- resolve_gene_ids(): fills in missing gene identifiers from any starting point (symbol only, Entrez only, Ensembl only, or mixed)
- Wired into standardize_barcode_design() via the gene_mapping_path parameter
- Replaces the earlier add_ensembl_ids(), which only handled Entrez → Ensembl
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
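The fill-from-any-starting-point behavior of resolve_gene_ids() can be sketched against a local mapping table — the field names and matching logic below are assumptions for illustration, not the repo's implementation:

```python
def resolve_gene_ids(records, mapping):
    """Fill missing symbol / entrez_id / ensembl_id fields in each record.

    `mapping` holds complete rows; each record is matched on whichever
    identifier it already carries, and its empty fields are filled in.
    """
    keys = ("symbol", "entrez_id", "ensembl_id")
    resolved = []
    for rec in records:
        filled = {k: rec.get(k) for k in keys}
        match = next(
            (row for row in mapping
             if any(filled[k] and filled[k] == row[k] for k in keys)),
            None,
        )
        if match:
            for k in keys:
                filled[k] = filled[k] or match[k]
        resolved.append(filled)
    return resolved
```

The point of the design is symmetry: a library annotated with only symbols, only Entrez IDs, or a mix resolves through the same code path.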
Replaced BioMart bulk download (unreliable) with MyGene.info querymany() for targeted symbol→Ensembl/Entrez resolution. Only looks up genes present in the user's barcode library — fast and doesn't hit API limits. Requires: uv pip install mygene Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Brings in main fixes: recombination detection, spatial heatmaps, aggregate edge cases, resolve_path, file_manifest, custom cellpose, aggregation/clustering cleanup. Keep version at 1.5.0. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- gene_id now contains Ensembl IDs (replaces Entrez)
- Preserve full sgRNA as protospacer_sequence before prefix truncation
- Derive role (targeting/control) and control_type from nontargeting patterns
- Add protospacer_adjacent_motif ("3' NGG" for Cas9)
- At export time, prep_cellxstate.sh just renames prefix→barcode and
adds perturbation_id=gene_symbol
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
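Deriving role and control_type from non-targeting patterns, as the bullets above describe, might look like the following — the pattern list and return shape are hypothetical, standing in for whatever the config actually defines:

```python
import re

# illustrative non-targeting name patterns; the real config defines its own
NONTARGETING_PATTERNS = [r"^non[-_ ]?targeting", r"^NTC", r"^neg[-_ ]?ctrl"]

def classify_guide(gene_symbol):
    """Return (role, control_type) for a guide based on its gene symbol."""
    for pat in NONTARGETING_PATTERNS:
        if re.match(pat, str(gene_symbol), flags=re.IGNORECASE):
            return "control", "non-targeting"
    return "targeting", None
```

Targeting guides get `control_type=None`, so the column stays empty for them in the exported perturbation_library.csv.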
Main's spatial heatmap changes introduced hardcoded ["well"] expansion values in eval rules. In zarr mode, wildcards use row/col instead of well. Replace with _phen_well_expand/_sbs_well_expand/_sbs_tile_expand which dispatch correctly based on IMG_FMT. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
barcode_col pointed to sgRNA which no longer exists in the new barcode library format. Use prefix_col: prefix instead, which is the truncated barcode used for read matching. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Merge outputs (parquets) always use {well} paths regardless of format.
Only cross-module references (SBS/phenotype/preprocess outputs) need
format-aware expansion. Distinguish between:
- Merge own outputs: always expansion_values=["well"]
- SBS/phenotype data outputs: _merge_well_expand_all (row/col in zarr)
- Preprocess metadata: _merge_well_expand_all (row/col in zarr)
Also adds _combos_with_well() helper for future use.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…es_singlecell.parquet outputs into a single AnnData .h5ad per channel combo, combining all cell classes
In zarr mode, the montage pipeline now writes individual cell crops to
an examples.zarr store ({gene}/{barcode}/0..N/) instead of tiled
PNG + TIFF montages. TIFF mode unchanged.
Changes:
- montage_utils.py: add_filenames() zarr-aware, grid_view() uses read_image()
- generate_montage.py: dispatches on IMG_FMT (zarr crops vs PNG/TIFF grid)
- aggregate.smk/targets: conditional outputs for zarr vs tiff mode
- rule_utils.py: get_montage_inputs() handles None overlay template
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Each cell crop is now written as a proper OME-Zarr with channel names, axes, and coordinate transforms via save_image(). This means each crop carries its own channel metadata rather than relying on the parent store. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Label groups inside a .zarr store should not have .zarr suffix — e.g. labels/nuclei not labels/nuclei.zarr. The suffix caused napari-ome-zarr to silently skip segmentations because the labels index listed ["nuclei"] but the directory was "nuclei.zarr". Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
iohub's channel_display_settings only recognizes standard fluorophore names (DAPI, GFP, etc.) — marker names like COXIV, CENPA, WGA got white/inactive defaults. Now:
- All channels set to active: true
- Colors: config color > iohub color > default palette fallback
- Default palette: blue, green, red, magenta, yellow, cyan, orange, purple
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Same white-color issue as HCS metadata — hardcoded FFFFFF for all channels. Now uses the same default palette (blue, green, red, magenta, etc.) so example zarr crops and all OME-Zarr writes get distinct colors. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
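The color-precedence chain from the two fixes above (config > iohub > palette fallback) reduces to a small lookup. The hex values below are assumptions — the commits name colors, not codes — and the function name is illustrative:

```python
# palette order from the commit message; hex values are assumed equivalents
DEFAULT_CHANNEL_COLORS = [
    "0000FF", "00FF00", "FF0000", "FF00FF",  # blue, green, red, magenta
    "FFFF00", "00FFFF", "FFA500", "800080",  # yellow, cyan, orange, purple
]

def pick_channel_color(index, config_color=None, iohub_color=None):
    """Config color wins, then iohub's fluorophore color, then the palette.

    iohub's white default (FFFFFF) is treated as "no color assigned".
    """
    if config_color:
        return config_color
    if iohub_color and iohub_color.upper() != "FFFFFF":
        return iohub_color
    return DEFAULT_CHANNEL_COLORS[index % len(DEFAULT_CHANNEL_COLORS)]
```

The modulo wrap means stores with more channels than palette entries still get a deterministic color rather than an error.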
- Move DEFAULT_CHANNEL_COLORS to io.py as a shared constant; import it in write_hcs_metadata.py (removes duplicate palette definition)
- Example zarr crops: max_levels=1 (no pyramids for 80px images)
- Remove unused _combos_with_well() helper from merge.smk
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Phenotype: DAPI=blue, COXIV=green, CENPA=red, WGA=magenta
SBS: DAPI=blue, G=green, T=red, A=yellow, C=magenta
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Tile-based workflows don't need pyramids — images are ~2400x2400. Changed default max_levels from 5/4 to 1 in save_image() and write_image_omezarr(). Added zarr_max_levels config option for documentation. Users wanting pyramids pass max_levels explicitly. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
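The pyramid behavior the commit above describes — one full-resolution level by default, more only when requested — can be sketched by computing the level shapes up front (hypothetical helper, not the writer's actual code):

```python
def pyramid_shapes(shape_yx, coarsening_factor=2, max_levels=1):
    """Shapes of each pyramid level, starting from full resolution.

    With the new max_levels=1 default, only the original shape is returned,
    i.e. no downsampled levels are written.
    """
    shapes = [tuple(shape_yx)]
    while len(shapes) < max_levels:
        y, x = shapes[-1]
        shapes.append((max(1, y // coarsening_factor), max(1, x // coarsening_factor)))
    return shapes
```

For a ~2400x2400 tile the default yields a single level; a user wanting pyramids passes `max_levels=3` and gets 2400 → 1200 → 600.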
image-label metadata lives at attributes.ome.image-label in zarr v3, not attributes.image-label. Without this fix, the labels container zarr.json never gets written because no label stores are detected. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
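The nesting difference the fix above targets is easy to show with plain dicts — the detection helper here is hypothetical, but the attribute paths match the commit message:

```python
def is_label_store(zarr_json_attrs):
    """Detect a label store under zarr v3 nesting: attributes.ome.image-label.

    The pre-fix code effectively checked attributes["image-label"] directly,
    so v3 stores were never recognized as labels.
    """
    return "image-label" in zarr_json_attrs.get("ome", {})

# zarr v3: image-label lives under the "ome" key of the attributes dict
v3_attrs = {"ome": {"image-label": {"colors": []}}}
# the flat layout the old code expected
flat_attrs = {"image-label": {"colors": []}}
```

Because detection drove whether the labels container zarr.json was written at all, the wrong path silently dropped the entire labels index.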
Brings in Ege's generate_anndata rule and anndata dependency. Run script updated to include all pipeline stages. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Brings in Ege's param validation and direct param access across all scripts. Resolved 6 conflicts — kept our zarr-aware read_image(), save_image(), uint32 labels, and zarr/tiff montage dispatch while adopting param validation and direct snakemake.params access. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…220)

* fix params for multi mode (#218)

* Rename int → integrated in CP emulator features
  Aligns with OPS Data Standard feature naming convention. Changes:
  - cp_emulator.py: feature key "int" → "integrated", column mappings "int" → "integrated", "int_edge" → "integrated_edge"
  - feature_definitions.csv: updated template column names
  - CP_EMULATOR_FEATURES.md: updated documentation
  This is the only rename needed — all other feature names already match the standardized Vesuvius feature set.
  Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Add compartment/channel columns, update feature types in template
  Per updated OPS spec:
  - morphology → shape
  - Correlation features (K, manders, overlap, etc.) → correlation
  - New compartment and channel columns for metadata
  Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Add format_cluster_anndata rule for spec-compliant aggregated h5ad
  New cluster step that produces aggregated_data.h5ad per the OPS spec:
  - obs = perturbations indexed by perturbation_id
  - var = standardized feature set (shape + intensity + correlation)
  - X = mean aggregated feature values per perturbation
  - obsm = PHATE embedding coordinates
  - uns = schema_version, default_embedding, title
  - Bootstrap p-values wired but optional (TODO: reshape)
  Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Add format_cluster_anndata rule for cluster-level h5ad
  Combines perturbation-level features with PHATE embedding and cluster assignments into cluster.h5ad. Includes all available metadata and features with parsed var annotations (type, compartment, channel).
  Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Extract format_cluster_anndata logic into lib function
  Move core AnnData construction into workflow/lib/cluster/, keep script as thin caller. Follows brieflow lib/scripts pattern.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Update classifier feature names: int → integrated
  Renamed features in the test classifier dill to match the CP emulator rename. Reverted compatibility shim in train.py — fix at source instead.
  Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Retrain dummy classifier with integrated feature names
  XGBoost's feature_names_in_ is read-only — can't patch the dill. Retrained a simple dummy classifier on random data with the correct feature names (int → integrated). Same class structure (Interphase/Mitotic).
  Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Add dummy_classifier.dill with integrated feature names
  Properly trained dummy classifier with:
  - Feature names using _integrated (not _int)
  - Labels 1=Mitotic, 2=Interphase (matching original config mapping)
  - LabelEncoder for XGBoost 0-indexed compatibility
  - Config updated to use dummy_classifier.dill
  Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Remove old classifier with outdated feature names
  Replaced by dummy_classifier.dill, which uses _integrated feature names.
  Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Wire bootstrap p-values into cluster h5ad
  Add _add_bootstrap_layers() that reshapes per-feature p-values and FDR from the combined gene bootstrap TSV into AnnData layers:
  - layers["p_values"]: per-feature p-values per perturbation
  - layers["neg_log10_fdr"]: -log10(FDR) per feature per perturbation
  Bootstrap results wired as input to the format_cluster_anndata rule.
  Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Add percentile_rank layer to cluster h5ad
  Per-feature percentile rank (0-100) across all perturbations. Useful for human-readable interpretation of feature values. Dropped during cellxstate export but retained in pipeline h5ad.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Add neg_log10_fdr to bootstrap, dump all layers into cluster h5ad
  Bootstrap now computes and outputs:
  - {feature}_neg_log10_pval: -log10(p-value)
  - {feature}_fdr: FDR-corrected p-value
  - {feature}_neg_log10_fdr: -log10(FDR)
  Cluster h5ad reads all four bootstrap columns directly as layers: p_values, fdr, neg_log10_pval, neg_log10_fdr. No computation at the cluster step — bootstrap does all the work.
  Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Combine all leiden resolutions into single cluster h5ad
  format_cluster_anndata now accepts a dict of clusterings (one per resolution) and merges cluster assignments as separate obs columns: cluster_group_2, cluster_group_5, etc. Output is one h5ad per cell_class/channel_combo at cluster/{combo}/{class}/h5ad/cluster.h5ad.
  Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Fix bootstrap column reorder to match renamed columns
  The ordered_cols list in apply_multiple_hypothesis_correction still referenced _log10 after we renamed to _neg_log10_pval and added _neg_log10_fdr. Updated to match the actual column names.
  Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Clean obs: drop PHATE duplicates and merge-suffix columns
  PHATE_0/1 belong in obsm, not obs. cell_count_cluster is a merge artifact. The cluster column is replaced by per-resolution cluster_group_N.
  Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Add col to cell_data_metadata_cols.tsv
  The col column (well column index from split_well_to_cols) was missing from the metadata cols list, causing it to leak into feature columns.
  Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Add row and col to DEFAULT_METADATA_COLS
  These columns are added by split_well_to_cols in zarr mode but were missing from the default metadata list, causing them to leak into feature columns.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* merge ege's work, final improvements

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
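The bootstrap-to-layers reshape in the commit list above — per-feature suffixed columns pivoted into a perturbation-by-feature matrix — can be sketched with pandas. The column naming follows the `{feature}_neg_log10_fdr` convention from the commits, but the helper name and exact signature of the real _add_bootstrap_layers() are assumptions:

```python
import pandas as pd

def bootstrap_layer(bootstrap_df, features, suffix="_neg_log10_fdr"):
    """Select one suffixed column per feature and align it as a layer matrix.

    Returns a (perturbation x feature) DataFrame whose columns match the
    plain feature names, ready to store as an AnnData layer.
    """
    cols = [f + suffix for f in features]
    layer = bootstrap_df.set_index("perturbation_id")[cols]
    layer.columns = features  # strip the suffix so columns align with adata.var
    return layer

# toy bootstrap output with two features and two perturbations
df = pd.DataFrame({
    "perturbation_id": ["TP53", "MYC"],
    "area_neg_log10_fdr": [2.3, 0.1],
    "dapi_integrated_neg_log10_fdr": [1.7, 0.4],
})
```

Repeating the call with `suffix="_fdr"`, `"_neg_log10_pval"`, etc. would produce each of the four layers the commits describe.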
…tage_confidence obs placement
Description
Thank you for your contribution to Brieflow!
Please succinctly summarize your proposed change.
What motivated you to make this change?
Please also link to any relevant issues that your code is associated with.
What is the nature of your change?
Checklist
Please ensure that all boxes are checked before indicating that a pull request is ready for review.
- Update `pyproject.toml` to reflect the change as designated by semantic versioning.
- Run `ruff check` and `ruff format`.