Merged
14 changes: 13 additions & 1 deletion CHANGELOG.md
@@ -1,4 +1,16 @@
# v1.70.0
# v1.0.71
- A new cache store abstraction (`CacheStore`), with optional database/S3 backing, is introduced for the preprocessed cell-level data payloads.
- Many ETL steps are merged into one. This reduces the bandwidth spent saving to and retrieving from the database for operations that are done only once, and it reduces database storage usage and the associated access-performance problems. In particular, the big-data portion of the database tables (`histological_structure_identification`, `shape_file`, etc.) is bypassed, at least for cell-level data. These tables remain, since they can still be used for smaller data (e.g. tissue region annotations). Overall, ETL performance is vastly improved and database management is easier (fast backups, total extraction as an SQLite database, etc.). For details see [#419](https://github.com/nadeemlab/smprofiler/issues/419).
- The deprecations include:
  - `assess-recreate-cache` script
  - `cache_assessment`
  - `cache_pulling`
  - `sparse_matrix_puller`
  - `structure_centroids_puller`
  - `count-cells` script
- A bug is fixed in continuous dataframe parsing, where the first 4 bytes of each row (which contain the cell ID) were accidentally parsed as channel data. The corresponding test data artifact is updated.
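
The parsing fix above depends on offsets. Each row starts with a 4-byte cell ID, so channel values must be read starting after it.

The following minimal Python sketch illustrates offset-correct parsing. It assumes, hypothetically, a little-endian unsigned 32-bit ID and one little-endian float32 per channel, because the actual channel encoding is not shown in this diff. `parse_row` and `parse_rows` are illustrative names, not functions from the project.

```python
import struct
from typing import List, Tuple

# Assumed row layout (hypothetical, for illustration only):
# a 4-byte little-endian unsigned cell ID, followed by one
# little-endian 32-bit float per channel.
CELL_ID_FORMAT = "<I"
CELL_ID_SIZE = struct.calcsize(CELL_ID_FORMAT)  # 4 bytes
CHANNEL_SIZE = struct.calcsize("<f")  # 4 bytes


def row_size(channel_count: int) -> int:
    """Number of bytes in one row: the cell ID plus all channel values."""
    return CELL_ID_SIZE + CHANNEL_SIZE * channel_count


def parse_row(row: bytes, channel_count: int) -> Tuple[int, List[float]]:
    """Split one fixed-width row into (cell_id, channel_values)."""
    if len(row) != row_size(channel_count):
        raise ValueError(f"expected {row_size(channel_count)} bytes, got {len(row)}")
    (cell_id,) = struct.unpack_from(CELL_ID_FORMAT, row, 0)
    # Channel values begin after the ID bytes; reading them from offset 0
    # would misinterpret the cell ID as channel data.
    values = list(struct.unpack_from(f"<{channel_count}f", row, CELL_ID_SIZE))
    return cell_id, values


def parse_rows(buffer: bytes, channel_count: int) -> List[Tuple[int, List[float]]]:
    """Parse a buffer made of consecutive fixed-width rows."""
    size = row_size(channel_count)
    if len(buffer) % size != 0:
        raise ValueError("buffer length is not a multiple of the row size")
    return [parse_row(buffer[i:i + size], channel_count) for i in range(0, len(buffer), size)]
```

For example, `parse_row(struct.pack("<I2f", 7, 1.5, 2.25), 2)` returns `(7, [1.5, 2.25])`. Reading the channels from offset 0 instead would treat the bytes of the ID `7` as a channel value.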

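The changelog names a `CacheStore` abstraction with optional database/S3 backing but does not show its interface. One common way to structure such an abstraction is an abstract base class with interchangeable backends, sketched below in Python.

Only the name `CacheStore` comes from the changelog. The `get`/`put` methods, `get_or_compute`, and both backend classes are hypothetical. The local-directory backend merely stands in for a database or S3 backend.

```python
from abc import ABC, abstractmethod
from pathlib import Path
from typing import Callable, Dict, Optional


class CacheStore(ABC):
    """Hypothetical interface; the real CacheStore methods are not shown in this diff."""

    @abstractmethod
    def get(self, key: str) -> Optional[bytes]:
        """Return the cached payload for key, or None on a cache miss."""

    @abstractmethod
    def put(self, key: str, payload: bytes) -> None:
        """Store (or overwrite) the payload for key."""


class InMemoryCacheStore(CacheStore):
    """Volatile backend: payloads live only as long as the process."""

    def __init__(self) -> None:
        self._data: Dict[str, bytes] = {}

    def get(self, key: str) -> Optional[bytes]:
        return self._data.get(key)

    def put(self, key: str, payload: bytes) -> None:
        self._data[key] = payload


class DirectoryCacheStore(CacheStore):
    """Persistent backend on local disk, standing in for database or S3 backing."""

    def __init__(self, root: Path) -> None:
        self._root = Path(root)
        self._root.mkdir(parents=True, exist_ok=True)

    def _path(self, key: str) -> Path:
        # Keys are assumed to be plain file names (non-empty, no path separators).
        if not key or "/" in key or "\\" in key:
            raise ValueError(f"unsupported cache key: {key!r}")
        return self._root / f"{key}.bin"

    def get(self, key: str) -> Optional[bytes]:
        path = self._path(key)
        return path.read_bytes() if path.exists() else None

    def put(self, key: str, payload: bytes) -> None:
        self._path(key).write_bytes(payload)


def get_or_compute(store: CacheStore, key: str, compute: Callable[[], bytes]) -> bytes:
    """Return the cached payload, computing and storing it on a miss."""
    cached = store.get(key)
    if cached is not None:
        return cached
    payload = compute()
    store.put(key, payload)
    return payload
```

Callers depend only on the abstract interface. A persistent backend can therefore be swapped in without changing the code that requests cell-level payloads.
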
# v1.0.70
Adds support for survival-type data (overall survival, disease progression, etc.). This is achieved with two new metadata tables that annotate the diagnosis (condition, result) pairs representing the types of events recorded. New records in the diagnosis table (referring to the new metadata) can then be added with event time data.

New datasets prepared for the ETL process must now include `permanent_condition_diagnosis.tsv` and `condition_lack.tsv`, as well as any needed `diagnosis.tsv` rows. These additions will be picked up by the normal ETL for the whole dataset, but they can also be uploaded on an isolated one-time basis with:
8 changes: 4 additions & 4 deletions build/build_scripts/expected_counts_1and2.txt
@@ -9,19 +9,19 @@
data_file 11
diagnosis 6
diagnostic_selection_criterion 6
expression_quantification 345791
expression_quantification 0
feature_specification 0
feature_specifier 0
histological_structure 10627
histological_structure_identification 10627
histological_structure 0
histological_structure_identification 0
histology_assessment_process 11
intervention 6
permanent_condition_diagnosis 1
plane_coordinates_reference_system 0
publication 4
quantitative_feature_value 0
research_professional 43
shape_file 10627
shape_file 0
specimen_collection_process 11
specimen_collection_study 2
specimen_data_measurement_process 11
18 changes: 9 additions & 9 deletions build/build_scripts/expected_table_counts.txt
@@ -5,29 +5,29 @@
cell_phenotype_criterion 27
chemical_species 26
condition_lack 1
data_analysis_study 2
data_analysis_study 1
data_file 7
diagnosis 2
diagnostic_selection_criterion 4
expression_quantification 18200
feature_specification 2
feature_specifier 8
histological_structure 700
histological_structure_identification 700
expression_quantification 0
feature_specification 0
feature_specifier 0
histological_structure 0
histological_structure_identification 0
histology_assessment_process 7
intervention 2
permanent_condition_diagnosis 1
plane_coordinates_reference_system 0
publication 2
quantitative_feature_value 1400
quantitative_feature_value 0
research_professional 32
shape_file 700
shape_file 0
specimen_collection_process 7
specimen_collection_study 1
specimen_data_measurement_process 7
specimen_measurement_study 1
study 1
study_component 4
study_component 3
study_contact_person 1
subject 2
two_cohort_feature_association_test 0
18 changes: 9 additions & 9 deletions build/build_scripts/expected_table_counts_1small.txt
@@ -5,29 +5,29 @@
cell_phenotype_criterion 27
chemical_species 26
condition_lack 1
data_analysis_study 2
data_analysis_study 1
data_file 2
diagnosis 2
diagnostic_selection_criterion 4
expression_quantification 5200
feature_specification 1
feature_specifier 4
histological_structure 200
histological_structure_identification 200
expression_quantification 0
feature_specification 0
feature_specifier 0
histological_structure 0
histological_structure_identification 0
histology_assessment_process 7
intervention 2
permanent_condition_diagnosis 1
plane_coordinates_reference_system 0
publication 2
quantitative_feature_value 200
quantitative_feature_value 0
research_professional 32
shape_file 200
shape_file 0
specimen_collection_process 7
specimen_collection_study 1
specimen_data_measurement_process 2
specimen_measurement_study 1
study 1
study_component 4
study_component 3
study_contact_person 1
subject 2
two_cohort_feature_association_test 0
4 changes: 2 additions & 2 deletions build/build_scripts/import_test_dataset1small.sh
@@ -16,8 +16,8 @@ rm file_manifest.tsv.bak
cp study.json.bak $STUDY_JSON
rm study.json.bak

smprofiler graphs upload-importances --config_path=build/build_scripts/.graph.small.config --importances_csv_path test/test_data/gnn_importances/3.csv
smprofiler db count-cells --database-config-file=build/db/.smprofiler_db.config.local
#smprofiler graphs upload-importances --config_path=build/build_scripts/.graph.small.config --importances_csv_path test/test_data/gnn_importances/3.csv
#smprofiler db count-cells --database-config-file=build/db/.smprofiler_db.config.local

cat work/*/*/.command.log
smprofiler db status --database-config-file build/db/.smprofiler_db.config.local > table_counts.txt
@@ -4,7 +4,7 @@ cat build/build_scripts/.workflow.config | sed 's/YYY/3/g' > .workflow.config
smprofiler workflow configure --workflow='tabular import' --config-file=.workflow.config
nextflow run .

smprofiler graphs upload-importances --config_path=build/build_scripts/.graph.config --importances_csv_path=test/test_data/gnn_importances/3.csv
#smprofiler graphs upload-importances --config_path=build/build_scripts/.graph.config --importances_csv_path=test/test_data/gnn_importances/3.csv

cat work/*/*/.command.log
smprofiler db status --database-config-file build/db/.smprofiler_db.config.local > table_counts.txt
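
The scripts above write `smprofiler db status` output to `table_counts.txt`, and the build keeps expected counts in files such as `expected_table_counts.txt`. How the two are compared is not shown in this diff.

The following is a minimal Python sketch of such a comparison. It assumes both files hold whitespace-separated `<table_name> <row_count>` lines, as the expected-count files do. The function names `parse_counts` and `diff_counts` are hypothetical.

```python
from typing import Dict, Optional, Tuple


def parse_counts(text: str) -> Dict[str, int]:
    """Parse '<table_name> <row_count>' lines into {table_name: row_count}.

    Lines that do not end in a non-negative integer (blank lines, headers)
    are skipped; this tolerance is an assumption about the file format.
    """
    counts: Dict[str, int] = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[-1].isdecimal():
            counts[fields[0]] = int(fields[-1])
    return counts


def diff_counts(
    expected: Dict[str, int], actual: Dict[str, int]
) -> Dict[str, Tuple[Optional[int], Optional[int]]]:
    """Map each table whose count differs (or is missing on one side) to (expected, actual)."""
    mismatches: Dict[str, Tuple[Optional[int], Optional[int]]] = {}
    for table in sorted(expected.keys() | actual.keys()):
        want, got = expected.get(table), actual.get(table)
        if want != got:
            mismatches[table] = (want, got)
    return mismatches
```

Under the new expectations, a database still loaded by the previous ETL would be flagged. For example, comparing `histological_structure 0` against `histological_structure 700` yields `{"histological_structure": (0, 700)}`.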
@@ -1,2 +1 @@
Condition Presence result Absence result
Test condition Test presence Test absence
@@ -36,5 +36,5 @@ Subjects file subjects.tsv Manifest of subjects Melanoma CyTOF ICI
Study file study.json Project-level data Melanoma CyTOF ICI
Diagnosis file diagnosis.tsv List of subject diagnoses Melanoma CyTOF ICI
Interventions file interventions.tsv List of subject intervention events Melanoma CyTOF ICI
Permanent condition diagnosis file permanent_condition_diagnosis.tsv List of survival-type-data conditions
Condition lack file condition_lack.tsv List of survival-type-data polarities
Permanent condition diagnosis file permanent_condition_diagnosis.tsv List of survival-data-type events Melanoma CyTOF ICI
Condition lack file condition_lack.tsv List of survival-metadata-type polarities Melanoma CyTOF ICI
@@ -1,2 +1 @@
Condition Result
Test condition Test result
1 change: 0 additions & 1 deletion docs/cells.md
@@ -1,4 +1,3 @@

# Data structure to represent summarized cell-level data for one slide

For memory- and time-efficient manipulation, a simple binary data structure is used to represent the cell data for one slide.
7 changes: 3 additions & 4 deletions pyproject.toml
@@ -1,13 +1,13 @@
[build-system]
requires = [
"setuptools>=63",
"setuptools>=82",
"wheel"
]
build-backend = "setuptools.build_meta"

[project]
name = "smprofiler"
version = "1.0.70"
version = "1.0.71"
authors = [
{ name = "James Mathews", email = "mathewj2@mskcc.org" }
]
@@ -54,6 +54,7 @@ all = [
"cryptography",
"dask[dataframe]",
"dask-expr",
"dask-image>=2025.11.0",
"fastapi",
"h5py",
"jinja2",
@@ -217,7 +218,6 @@ packages = [
]
"smprofiler.ondemand.scripts" = [
"read_expression_dump_file.py",
"assess_recreate_cache.py",
"start.py",
]
"smprofiler.db.scripts" = [
@@ -233,7 +233,6 @@ packages = [
"interactive_uploader.py",
"load_testing.py",
"sync_annotations.py",
"count_cells.py",
"cache_subsample.py",
]
"smprofiler.db.data_model" = [
18 changes: 9 additions & 9 deletions requirements.apiserver.txt
@@ -9,8 +9,8 @@ annsel==0.1.2
anyio==4.13.0
array-api-compat==1.14.0
attrs==26.1.0
boto3==1.42.83
botocore==1.42.83
boto3==1.42.87
botocore==1.42.87
brotli==1.2.0
cattrs==26.1.0
certifi==2026.2.25
@@ -20,7 +20,7 @@ click==8.3.2
cloudpickle==3.1.2
colorcet==3.1.0
contourpy==1.3.3
cryptography==46.0.6
cryptography==46.0.7
cycler==0.12.1
dask==2026.1.1
dask-image==2025.11.0
@@ -34,7 +34,7 @@ frozenlist==1.8.0
fsspec==2026.1.0
geopandas==1.1.3
google-crc32c==1.8.0
greenlet==3.3.2
greenlet==3.4.0
h11==0.16.0
h5py==3.16.0
idna==3.11
@@ -51,7 +51,7 @@ markdown-it-py==4.0.0
markupsafe==3.0.3
matplotlib==3.10.8
mdurl==0.1.2
more-itertools==11.0.1
more-itertools==11.0.2
msgpack==1.1.2
multidict==6.7.1
multipledispatch==1.0.0
@@ -62,7 +62,7 @@ networkx==3.6.1
numba==0.62.1
numcodecs==0.16.5
numpy==2.3.5
ome-zarr==0.14.0
ome-zarr==0.15.0
packaging==26.0
pandas==3.0.2
param==2.3.3
@@ -72,7 +72,7 @@ patsy==1.0.2
pillow==12.2.0
pims==0.7
pip==25.3
platformdirs==4.9.4
platformdirs==4.9.6
pooch==1.9.0
propcache==0.4.1
psutil==7.2.2
@@ -105,7 +105,7 @@ scikit-learn==1.8.0
scipy==1.17.1
seaborn==0.13.2
secure==1.0.1
session-info2==0.4
session-info2==0.4.1
setuptools==82.0.1
shapely==2.1.2
six==1.17.0
@@ -125,7 +125,7 @@ tornado==6.5.5
tqdm==4.67.3
typing-extensions==4.15.0
typing-inspection==0.4.2
umap-learn==0.5.11
umap-learn==0.5.12
universal-pathlib==0.3.10
urllib3==2.6.3
uvicorn==0.44.0
20 changes: 10 additions & 10 deletions requirements.ondemand.txt
@@ -1,4 +1,4 @@
aiobotocore==3.3.0
aiobotocore==3.4.0
aiohappyeyeballs==2.6.1
aiohttp==3.13.5
aioitertools==0.13.0
@@ -8,8 +8,8 @@ annotated-types==0.7.0
annsel==0.1.2
array-api-compat==1.14.0
attrs==26.1.0
boto3==1.42.70
botocore==1.42.70
boto3==1.42.84
botocore==1.42.84
brotli==1.2.0
cattrs==26.1.0
certifi==2026.2.25
@@ -27,7 +27,7 @@ deprecated==1.3.1
distributed==2026.1.1
docrep==0.3.2
donfig==0.8.1.post1
fast-array-utils==1.4
fast-array-utils==1.4.1
fonttools==4.62.1
frozenlist==1.8.0
fsspec==2026.3.0
@@ -51,7 +51,7 @@ markupsafe==3.0.3
matplotlib==3.10.8
matplotlib-scalebar==0.9.0
mdurl==0.1.2
more-itertools==11.0.1
more-itertools==11.0.2
msgpack==1.1.2
multidict==6.7.1
multipledispatch==1.0.0
@@ -62,7 +62,7 @@ networkx==3.6.1
numba==0.62.1
numcodecs==0.15.1
numpy==2.3.5
ome-zarr==0.14.0
ome-zarr==0.15.0
omnipath==1.0.12
packaging==26.0
pandas==3.0.2
@@ -73,7 +73,7 @@ patsy==1.0.2
pillow==12.2.0
pims==0.7
pip==25.3
platformdirs==4.9.4
platformdirs==4.9.6
pooch==1.9.0
propcache==0.4.1
psutil==7.2.2
@@ -102,15 +102,15 @@ scikit-image==0.26.0
scikit-learn==1.8.0
scipy==1.17.1
seaborn==0.13.2
session-info2==0.4
session-info2==0.4.1
setuptools==82.0.1
shapely==2.1.2
six==1.17.0
slicerator==1.1.0
sortedcontainers==2.4.0
spatial-image==1.2.3
spatialdata==0.7.2
spatialdata-plot==0.3.2
spatialdata-plot==0.3.3
squidpy==1.8.1
statsmodels==0.14.6
tabulate==0.10.0
@@ -124,7 +124,7 @@ tqdm==4.67.3
typeguard==4.5.1
typing-extensions==4.15.0
typing-inspection==0.4.2
umap-learn==0.5.11
umap-learn==0.5.12
universal-pathlib==0.3.10
urllib3==2.6.3
validators==0.35.0