
Select wells and FOVs for triplet data #250

Merged
edyoshikun merged 12 commits into dynaclr_v2 from select-well-triplet May 30, 2025


@ziw-liu (Collaborator) commented May 27, 2025

Implement FOV selection for the TripletDataModule during the fit stage:

  • Only include certain wells (cell lines or conditions)
  • Within the included wells, exclude certain FOVs (e.g. where auto-focus failed)
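The two selection rules above can be sketched as a small filter over FOV paths. This is a minimal illustration, not the TripletDataModule implementation: it assumes FOV paths follow an HCS-style "row/col/fov" layout (e.g. "A/1/000000") and wells are written as "row/col" (e.g. "A/1"). The helper select_fovs and its parameters are hypothetical.

```python
from __future__ import annotations

from collections.abc import Iterable


def select_fovs(
    fov_paths: Iterable[str],
    include_wells: Iterable[str] | None = None,
    exclude_fovs: Iterable[str] | None = None,
) -> list[str]:
    """Keep FOVs from the included wells, minus explicitly excluded FOVs.

    Hypothetical helper: paths are assumed to look like "row/col/fov"
    (e.g. "A/1/000000") and wells like "A/1"; leading slashes are ignored.
    """
    wells = None if include_wells is None else set(include_wells)
    excluded = {p.strip("/") for p in (exclude_fovs or [])}
    selected = []
    for path in fov_paths:
        path = path.strip("/")
        row, col, _fov = path.split("/")
        # None means "use every well"
        if wells is not None and f"{row}/{col}" not in wells:
            continue
        # Compare full paths so excluding "A/1/000000" leaves "B/2/000000" alone
        if path in excluded:
            continue
        selected.append(path)
    return selected


if __name__ == "__main__":
    fovs = ["A/1/000000", "A/1/000001", "B/2/000000"]
    print(select_fovs(fovs, include_wells=["A/1"], exclude_fovs=["A/1/000001"]))
    # ['A/1/000000']
```

Applying the well filter first and the exclusion list second follows the order described above, where exclusions apply within the included wells. Matching on the full FOV path rather than the bare FOV index avoids accidentally dropping same-numbered FOVs in other wells.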

@ziw-liu (Collaborator, Author) commented May 27, 2025

CI is failing for Python 3.13 due to ppwwyyxx/cocoapi#27.

@ziw-liu (Collaborator, Author) commented May 27, 2025

@Soorya19Pradeep @edyoshikun do you want this to target dynaclr_v2 or main?

ziw-liu added this to the v0.4.0 milestone May 27, 2025
ziw-liu mentioned this pull request May 27, 2025
@Soorya19Pradeep (Contributor)

> @Soorya19Pradeep @edyoshikun do you want this to target dynaclr_v2 or main?

I only need it on dynaclr_v2. This will be helpful for model training for the organelle box paper.

@Soorya19Pradeep (Contributor)

@ziw-liu, the Python requirement in the toml file is >=3.11.
The DynaCLR environment I have been using is Python 3.10.14. Do you think it's essential to upgrade my existing environment?

@ziw-liu (Collaborator, Author) commented May 28, 2025

> Do you think it's essential to upgrade my existing environment?

Yes.
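A quick way to confirm whether an interpreter meets the >=3.11 floor from the toml file (3.10.14 does not) is sketched below. This is a minimal stdlib-only sketch: the helper name meets_min_python is hypothetical, and it only compares major/minor versions instead of parsing the full requires-python specifier.

```python
import sys

MIN_PYTHON = (3, 11)  # the requires-python floor quoted in the discussion


def meets_min_python(
    version: tuple[int, ...] = tuple(sys.version_info[:3]),
    minimum: tuple[int, int] = MIN_PYTHON,
) -> bool:
    """Return True if `version` is at least `minimum` (compares major, minor)."""
    return tuple(version[:2]) >= minimum


if __name__ == "__main__":
    print(meets_min_python((3, 10, 14)))  # False
    print(meets_min_python())  # True only when running on Python 3.11+
```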

@Soorya19Pradeep (Contributor)

Thanks @ziw-liu! I used the FOV selection in the dataloader and it is working for me. I did not get any errors.
Is there a way to check if the right wells (specified wells in the list) are being used for the training?

@ziw-liu (Collaborator, Author) commented May 28, 2025

> Is there a way to check if the right wells (specified wells in the list) are being used for the training?

TripletDataModule.train_dataset.tracks would be a dataframe with all the cells used in the training split.
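One way to run that check is to derive each cell's well from its FOV path and compare the result with the requested well list. This sketch assumes the tracks dataframe stores FOV paths such as "/A/1/000000" in a column named fov_name; that column name and the helper wells_in_tracks are assumptions, so adjust them to the actual schema.

```python
import pandas as pd


def wells_in_tracks(tracks: pd.DataFrame, fov_column: str = "fov_name") -> set[str]:
    """Return the wells (e.g. "A/1") present in a tracks dataframe.

    Assumes `fov_column` holds FOV paths shaped like "/row/col/fov".
    """
    # Drop the leading "/" and split into [row, col, fov]
    parts = tracks[fov_column].str.strip("/").str.split("/")
    # Rebuild "row/col" for every cell and deduplicate
    return set(parts.str[0] + "/" + parts.str[1])


if __name__ == "__main__":
    toy = pd.DataFrame({"fov_name": ["/A/1/000000", "/A/1/000001", "/B/2/000000"]})
    print(sorted(wells_in_tracks(toy)))  # ['A/1', 'B/2']
```

Applied to TripletDataModule.train_dataset.tracks, a subset check such as `wells_in_tracks(tracks) <= set(requested_wells)` (with requested_wells being whatever list was passed to the data module) would flag any cell that came from an unexpected well.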

ziw-liu added labels May 28, 2025: enhancement (New feature or request), representation (Representation learning (SSL))
ziw-liu marked this pull request as ready for review May 28, 2025 22:56
@ziw-liu (Collaborator, Author) commented May 28, 2025

@edyoshikun this doesn't need to block #240. I can rebase later.

Comment thread: tests/data/test_triplet.py

@edyoshikun (Member) left a comment


Thanks for adding the tests for FOV exclusion and well selection. This LGTM now. Soorya is already training with this.

edyoshikun merged commit 1dbc583 into dynaclr_v2 May 30, 2025
4 checks passed
edyoshikun mentioned this pull request May 30, 2025
ziw-liu deleted the select-well-triplet branch May 30, 2025 17:18
edyoshikun pushed a commit that referenced this pull request May 30, 2025
* refactor select well mixin into its own module

* refactor filter functions

* rename hcs tests

* fix import

* triplet: select fovs for training

* add test

* increase example size

* test exclude fovs

* use full fov path to exclude

* add unit test

* test fov names
ziw-liu added a commit that referenced this pull request Jun 23, 2025
* delete outdated figure making scripts

* delete unused infection classification scripts

* rename infection classification README.md

* adding old visualization code

* deleting old evaluation code

* cherry pick commit adding xyz coordinates to xarray

* cherry-pick flexible number of PC components

* cherry-pick updating distance and ALFI MSF measurements

* Knowledge distillation between channels (#222)

* caching dataloader

* caching data module

* black

* ruff

* Bump torch to 2.4.1 (#174)

* update torch >2.4.1

* black

* ruff

* adding timeout to ram_dataloader

* bandaid to cached dataloader

* fixing the dataloader using torch collate_fn

* replacing dictionary with single array

* loading prior to epoch 0

* Revert "replacing dictionary with single array"

This reverts commit 8c13f49.

* using multiprocessing manager

* add sharded distributed sampler

* add example script for ddp caching

* format and lint

* adding the custom distributed sampler to hcs_ram.py

* adding sampler to val train dataloader

* fix divisibility of the last shard

* hcs_ram format and lint

* data module that only crops and does not collate

* wip: execute transforms on the GPU

* path for if not ddp

* fix randomness in inversion transform

* add option to pop the normalization metadata

* move gpu transform definition back to data module

* add tiled crop transform for validation

* add stack channel transform for gpu augmentation

* fix typing

* collate before sending to gpu

* inherit gpu transforms for livecell dataset

* update fcmae engine to apply per-dataset augmentations

* format and lint hcs_ram

* fix abc type hint

* update docstring style

* disable grad for validation transforms

* improve sample image logging in fcmae

* fix dataset length when batch size is larger than the dataset

* fix docstring

* add option to disable normalization metadata

* inherit gpu transform for ctmc

* remove duplicate method override

* update docstring for ctmc

* allow skipping caching for large datasets

* make the fcmae module compatible with image translation

* remove prototype implementation

* fix import path

* Arbitrary prediction time transforms (#209)

* fix spelling in docstring and comment

* add batched zoom transform for tta

* add standalone lightning module for arbitrary TTA

* fix composition of different zoom factors

* add docstrings

* wip: segmentation module

* avoid casting

* update import path from iohub

* make integer array in fixture

* labels fixture

* test segmentation metrics modules

* less strings

* test non-empty

* select which wells to include in fit #205

* make well selection a mixin

* wip: mmap cache data module

* support exclusion of FOVs

* wip: precompute normalization

* add augmentations benchmark

* fix cpu threads default

* fix probability (affects cpu results)

* disable metadata tracking

* fix non-distributed initialization

* refactor transforms into submodules

* wip: bootstrap and distillation

* wip: balance distillation loss

* re-define cropping transforms

* wip: joint only

* redefine random flip dict transform

* cell classification data module

* supervised cell classifier

* do not import type hints at runtime

* update docstring

* backwards compatible import path

* fix annotations

* fix style

* fix dice score import

* fix dice score parameters

* apply formatting to exercise

* fix labels data type

* fix labels input shape

---------

Co-authored-by: Eduardo Hirata-Miyasaki <edhiratam@gmail.com>

* cherry-pick moving occlusion script to figures

* adding demo prototype for dynaclr

* cleaning up demo imagenet vs dynaclr

* abstracting the writing to xarray format so it doesn't depend on the pl trainer

* fixing patch size mismatch

* benchmark_demos

* making pca compute similar to phate

* imagenet lm module

* update openphenom lm module

* update embedding_writer to accept configurable PHATE,PCA,UMAP

* add plotting utils for dtw

* update imagenet and openpheno to accept multi channel

* readme for demo

* moving examples and simplifying example to load and display

* update README.md with instructions to run inference

* missing plotly dash in visual

* adding a test case with dash

* format

* fix imports

* removing paths and use the ones relative to download

* dynaclr-denv-vs and interactive visualizer update

* Add evaluation script for infection classifier models (#241)

* add evaluation script for infection classifier models

* show data points in bins instead of bar plot

* adding dtw initial simulations

* dtw evaluation with other methods

* add dynaclr_v2 schematic

* updating the readme with new schematic

* removing deprecated CTC demos

* removing visualization file in favor of the demo

* moving the old cli scripts for the dynaclr demo to examples

* Select wells and FOVs for triplet data (#250)

* refactor select well mixin into its own module

* refactor filter functions

* rename hcs tests

* fix import

* triplet: select fovs for training

* add test

* increase example size

* test exclude fovs

* use full fov path to exclude

* add unit test

* test fov names

* Tweak VCP tutorials (#251)

* relax patch version

* tweak visualization

* Readme updates (#246)

* point to stable version

* minor wording edit

* link tutorials on the main branch

* split demo from tutorials

* add vcp links

* fix typo

* add link to hek tutorial

* update descriptions and add neuromast

* wip: try to patch CI

* Revert "wip: try to patch CI"

This reverts commit 2ecf722.

* explain versioning policy

* add todo label

* uploading demo as html file since jupyter cannot render it

* updating demo readmes to point to public

* remove the play pause buttons for the phate demo

* standardizing viscy hosted models to have the description in parentheses

* numpy docstring on openphenom and removing torch.no_grad redundancy

* replacing go.heatmap to go.image for visualization

* reviewed README

* added link to DynaCLR demo

* adding script to generate pseudotracks

* adding global paths and todos to pseudo tracks

* Reiterate cell feature vs PC script (#254)

* restructure and test script

* black formatted

* rename utils.py

* add docstrings

* add numpy style docstrings

* ruff fixed import error

* remove redundant comments

* black formatted

* ruff corrected

* modified docstrings

* fix typing

* fix docstring type hint

* adding tests for general functionality and simplifying the normalization

* adding mahotas to metrics

* removing the classes since we will add independent unit tests.

* adding some unit tests

* cleaning up the pytests functions

* moving eps as class attribute.

* removing self.

---------

Co-authored-by: Ziwen Liu <ziwen.liu@czbiohub.org>
Co-authored-by: Ziwen Liu <67518483+ziw-liu@users.noreply.github.com>
Co-authored-by: Eduardo Hirata-Miyasaki <edhiratam@gmail.com>

* Update examples/DynaCLR/setup.sh

Co-authored-by: Ziwen Liu <67518483+ziw-liu@users.noreply.github.com>

* adding documentation to the visualization class

* visualization app will take z-range as tuple

* removing the redundant 'reduction' attribute in the embedding_writer; setting configurable phate and pca.

* remove dynacell metrics config

* replacing prints with assertions

* ONNX support for DynaCLR (#258)

* adding convenience function to estimate the dataloader settings.

* adding pin_memory, consistent workers and prefetching to the tripletdatamodule

* modify the dynaclr forward method for onnx support

* adding demo config file

* removing the anchor and positive features from the train step

* moving transformers to example dependency

* logic to save the clusters with a name

* attribute to save the output csv file location

* fixing the bug that didn't clear the clusters when pressing the button

* fixing readme links and formatting

* camel to snake

* add output path to save PC Viewer outputs

* adding default parameters for dataloader settings

* removing unnecessary set of unique tracks and wrangling the dataframe

* ruff

* Plot teacher accuracy (#260)

* plot teacher model accuracy

* use a fixed ylimit

* removing percent in dataloader slurm convenience function

* abstracting phate #263

* update format

* format tests

---------

Co-authored-by: Ziwen Liu <67518483+ziw-liu@users.noreply.github.com>
Co-authored-by: Ziwen Liu <ziwen.liu@czbiohub.org>
Co-authored-by: Shalin Mehta <shalin.mehta@czbiohub.org>
Co-authored-by: Soorya19Pradeep <101817974+Soorya19Pradeep@users.noreply.github.com>

Labels

enhancement (New feature or request), representation (Representation learning (SSL))


3 participants