S3 cache implementation #438

Open
wooferclaw wants to merge 12 commits into main from s3_etl_cache

@wooferclaw
Collaborator

Related to #419

Initial attempt to create an S3 cache alternative and slightly refactor on-demand processing accordingly.

@wooferclaw wooferclaw self-assigned this Jan 20, 2026
Making feature branch up to date with dependency package fixes
@jimmymathews
Collaborator

I made a branch off of this branch: https://github.com/nadeemlab/smprofiler/tree/merge_main_s3_etl_cache

In that branch I merged in main to make it up to date, and I also made a quick-fix patch to the references to cell identifiers from the GNN-related code, with a warning message that it breaks some functionality. This can be fixed but it is unrelated to the present issue, so it can be done later.

@jimmymathews
Collaborator

Since this PR modifies the ETL, when testing locally be sure to run

make force-rebuild-data-loaded-images

before running tests, in case any old images are present in your Docker cache. This forces the ETL to run on the test datasets.
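
The ordering described above (rebuild first, then test) could be scripted roughly as follows. This is only a sketch: `make force-rebuild-data-loaded-images` is the target named above, while `make test` is an assumed placeholder for whatever test command you normally use, and `rebuild_then_test` is a hypothetical helper name.

```python
import subprocess
from typing import Callable, Sequence

# Target named in the comment above.
REBUILD_CMD = ["make", "force-rebuild-data-loaded-images"]
# Assumption: placeholder test command; substitute your usual one.
TEST_CMD = ["make", "test"]


def _run(cmd: Sequence[str]) -> int:
    """Run a command and return its exit code."""
    return subprocess.run(list(cmd)).returncode


def rebuild_then_test(
    run: Callable[[Sequence[str]], int] = _run,
    test_cmd: Sequence[str] = TEST_CMD,
) -> int:
    """Force the ETL to rerun on the test datasets, then run the tests.

    Returns the first non-zero exit code, or 0 if both steps succeed.
    """
    code = run(REBUILD_CMD)
    if code != 0:
        # Do not test against stale or partially built images.
        return code
    return run(test_cmd)
```

Stopping on a failed rebuild avoids running tests against stale images, which is the situation the forced rebuild is meant to prevent.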

@jimmymathews
Collaborator

On that previously mentioned branch, all the data-loaded images build without errors (in my environment).

Grigory Frantsuzov and others added 6 commits February 20, 2026 18:53
…idation (#444)

* start cleaning up unused or unnecessary code in refactored areas, and do some related deprecations, and start getting builds and tests to pass

* Include all counts in initial etl

* start adding context docs

* Start deprecating old parsing loop, add docs, and break down new parsing loop into readable units

* continue refactor, adding docs

* More deprecations and notes

* Deprecate intermediate caching steps and start deprecating old feature matrix extraction implementation

* Implement read operations on cache store

* Start reducing channel metadata passing to just feature order

* Reimplement feature matrix extractor to start from saved payloads

* Deprecate "autocomputed" squidpy metrics, just use normal accessor

* Further deprecations

* More deprecations

* move insert count / cache counts

* typing

* fix circular import

* Restore small bitwise operation

* fix typos

* linting

* some version bumps

* Update expressions matrix test

* Simplify feature matrix extraction output, add cache store exists check

* Version bump for major dependencies, expected versions list

* Update expected record counts after removing big tables

* Update some tests, typos etc

* Simplify headers test

* Fix 0 indexing

* Fix bug that included IDs as channel values in subsampler

* Fix function signatures

* version bumps

* Deprecate obsolete test

* Deprecate obsolete test

* Deprecate obsolete test

* Deprecate obsolete test

* Fix functional call signature

* Update test for continuous intensity channel matrix integrity

* Deprecate portion of test

* Comment

* Linting

* Use new cache store in cells data accessor; put together additional byte-level ops into compressed matrix handling module; partially normalize the ondemand-workers access to cells data to use same system as feature matrix aggregation, less duplication

* Reduce duplication

* Linter

* Deprecate specialized blob type for umap, re-use existing

* Linter

* Linter

* Fix typo

* Version bump and include change log notes.

* Renaming

* Update version

* Deprecate constraint drop/recreate module

* Deprecate constraint drop/recreate module

* Add rough scale-down for continuous channel data in UMAP case, conform to other samples

* Deprecate recording/output of unused channel metadata
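
Several commits in the list above touch a cache store (read operations, an exists check). The actual classes are not shown in this conversation, so the following is only a rough sketch under assumptions: `CacheStore`, `InMemoryCacheStore`, and `read_payload_if_present` are hypothetical names, and the in-memory backend stands in for S3 so the example runs without credentials.

```python
from typing import Dict, Optional, Protocol


class CacheStore(Protocol):
    """Hypothetical interface; not the PR's actual class."""

    def exists(self, key: str) -> bool: ...
    def read(self, key: str) -> bytes: ...
    def write(self, key: str, payload: bytes) -> None: ...


class InMemoryCacheStore:
    """Dictionary-backed stand-in, mainly useful for tests.

    An S3-backed version could map these methods onto the boto3 client
    calls head_object, get_object and put_object.
    """

    def __init__(self) -> None:
        self._blobs: Dict[str, bytes] = {}

    def exists(self, key: str) -> bool:
        return key in self._blobs

    def read(self, key: str) -> bytes:
        if key not in self._blobs:
            raise KeyError(f"no cached payload for {key!r}")
        return self._blobs[key]

    def write(self, key: str, payload: bytes) -> None:
        # Copy into immutable bytes so later caller mutations do not leak in.
        self._blobs[key] = bytes(payload)


def read_payload_if_present(store: CacheStore, key: str) -> Optional[bytes]:
    """Return the saved payload, or None when the key is not cached."""
    if not store.exists(key):
        return None
    return store.read(key)
```

Keeping the interface this small means a local or in-memory backend can be swapped in for S3 during tests.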
@jimmymathews
Collaborator

jimmymathews commented Apr 9, 2026

I just merged a large refactor into this branch, including many deprecations. After conforming this branch to main it will be ready to merge.
