### Description of the bug
Currently, there are more than 20 container images in the pipeline, totaling ~60 GB of disk usage.
I remember that for nf-core modules the recommended principle is "one tool – one container", but here we mostly have local modules, and I think some of the redundancy (caused by incremental development) can be reduced.
| IMAGE | ID | DISK USAGE | CONTENT SIZE |
| --- | --- | --- | --- |
| SEQERA: anndata2ri_bioconductor-singlecellexperiment_anndata_r-seurat:5fae42aabf7a1c5f | 291a48658716 | 3.38GB | 846MB |
| SEQERA: anndata:0.10.9--1eab54e300e1e584 | 3471efcc8b48 | 936MB | 231MB |
| SEQERA: anndata_pyyaml:82c6914e861435f7 | 7946ce8a97db | 1.06GB | 261MB |
| SEQERA: anndata_upsetplot:784e0f450da10178 | 766c2dff4b54 | 1.18GB | 306MB |
| SEQERA: bbknn_pyyaml_scanpy:4cf2984722da607f | 453b5e1f8972 | 1.6GB | 382MB |
| SEQERA: bioconductor-celldex_bioconductor-hdf5array_bioconductor-singlecellexperiment_r-yaml:13bf33457e3e7490 | fdb9aa052292 | 2.33GB | 634MB |
| SEQERA: celltypist_scanpy:44b604b24dd4cf33 | bfe009b0a96c | 1.78GB | 431MB |
| SEQERA: harmonypy_pyyaml_scanpy:f6cc57196369fb1e | 0c62b23a31d6 | 1.63GB | 392MB |
| SEQERA: leidenalg_python-igraph_pyyaml_scanpy:4936fa196b5f4340 | 8644a451da2a | 1.66GB | 401MB |
| SEQERA: liana_pyyaml:776fdd7103df146d | 131e6bd9dccb | 2.24GB | 507MB |
| SEQERA: multiqc:1.33--ee7739d47738383b | abd5751768f8 | 2.01GB | 432MB |
| SEQERA: pandas:2.2.3--9b034ee33172d809 | 50da2ef5f060 | 765MB | 190MB |
| SEQERA: python-igraph_pyyaml_scanpy:cc0304f4731f72f9 | 8f65ff8a2191 | 1.66GB | 401MB |
| SEQERA: python_pyyaml_scanpy:b5509a698e9aae25 | e0dac9eda4d7 | 1.85GB | 461MB |
| SEQERA: python_pyyaml_scanpy_scikit-image:750e7b74b6d036e4 | e2816307a73f | 2.04GB | 509MB |
| SEQERA: pyyaml_scanpy:3c9e9f631f45553d | 7ed2839670f9 | 1.63GB | 392MB |
| SEQERA: pyyaml_scanpy:a3a797e09552fddc | 228c2994c5f4 | 1.86GB | 466MB |
| SEQERA: scanpy_upsetplot:1ce883f3ff369ca8 | a91e0a660553 | 1.67GB | 414MB |
| SEQERA: scvi-tools:1.3.3--df115aabdccb7d6b | 551e3b44c383 | 4.66GB | 1.08GB |
| SEQERA: scvi-tools:1.4.1--47f5b0e6b70fd131 | 0ac460cb48b1 | 3.47GB | 797MB |
| nicotru/celda:1d48a68e9d534b2b | 3a4f38d26238 | 2.95GB | 759MB |
| nicotru/scds:7788dbeb87bc7eec | e6aac618e327 | 2.48GB | 651MB |
| nicotru/seurat:b3b12d17271014d9 | 22f891364efc | 3.35GB | 853MB |
| nicotru/soupx:f6297681695fbfcf | 222d79287a15 | 2.82GB | 700MB |
| saditya88/singler:0.0.1 | cb267ab7d826 | 9.13GB | 2.64GB |
(This issue has been brought to my attention as I rent the server and also pay for the disk space 😃)

I asked Codex to analyze the repo structure and find ways to optimize container usage, without touching nf-core/modules and accounting for the Python version pinning you mentioned. Here's the output:
**Implementation Plan**

- Use nf-core module containers as the canonical baseline for overlapping local tool families.
- Align local `SCVITOOLS_SCVI` and `SCVITOOLS_SCANVI` to the same `scvi-tools=1.3.3` container/env family already used by the vendored `SCVITOOLS_SOLO` and `SCVITOOLS_SCAR`.
- Collapse the local generic `scanpy` 1.11.5 / 1.11.2 split onto one pinned local baseline that is compatible with the nf-core scrublet stack.
- Standardize local `scanpy` modules that only need core scanpy functionality on `python=3.12.11`, `scanpy=1.11.2`, `pyyaml=6.0.2`.
- Apply the same base versions to additive local scanpy envs (`neighbors`, `paga`, `leiden`, `harmony`, `bbknn`) while keeping their extra packages.
- Collapse the local upsetplot fork.
- Change `ADATA_UPSETGENES` to use `anndata` directly instead of `scanpy` for reading `.h5ad`.
- Move `ADATA_UPSETGENES` and `DOUBLET_REMOVAL` onto one shared pinned local env: `python=3.12.11`, `anndata=0.12.7`, `upsetplot=0.9.0`.
- Replace `docker.io/saditya88/singler:0.0.1` with a local Dockerfile built from a minimal R/Wave-compatible base containing the actual R dependencies used by `singleR.R`, including `bioconductor-hdf5array` and `anndataR`.
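The shared pinned baseline in the plan could live in a single local conda env file that all core-scanpy modules point at. A minimal sketch, assuming conda-forge/bioconda channels and a hypothetical file location (only the version pins come from the plan above):

```yaml
# Hypothetical path: modules/local/scanpy/environment.yml
# Name and channels are assumptions; the pins are the ones proposed above.
name: scanpy_baseline
channels:
  - conda-forge
  - bioconda
dependencies:
  - python=3.12.11
  - scanpy=1.11.2
  - pyyaml=6.0.2
```

Additive envs (`bbknn`, `harmonypy`, etc.) would copy these same pins and append only their extra package, so every module that needs nothing beyond core scanpy resolves to one container instead of several near-duplicates.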
**Test Plan**

- Run nf-tests for affected local modules and subworkflows:
  - subworkflows: `integrate`, `quality_control`, `doublet_detection`, `celltype_assignment`
  - affected local modules under `scanpy`, `scvitools`, `adata/upsetgenes`, `doublet_detection/doublet_removal`, and `celltypes/singler`
- Run pipeline tests with `-profile test,docker` and `-profile test_full,docker`.
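For concreteness, the test plan maps to commands along these lines (illustrative only — the exact nf-test paths depend on this repo's layout):

```bash
# Hypothetical test paths; adjust to the actual tests/ layout of the repo
nf-test test subworkflows/local/integrate
nf-test test subworkflows/local/quality_control
nf-test test subworkflows/local/doublet_detection
nf-test test subworkflows/local/celltype_assignment

# Full pipeline smoke tests
nextflow run . -profile test,docker --outdir results_test
nextflow run . -profile test_full,docker --outdir results_test_full
```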
Does this plan make sense?

You also mentioned that the private Docker Hub images can be replaced with Seqera ones, but Codex thought that would be too much for one conservative pass :)
### Command used and terminal output

### Relevant files

_No response_

### System information

_No response_