-
Notifications
You must be signed in to change notification settings - Fork 33
Support for partial download of zarrs #1596
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
implemented by adding "subpath" within an Asset record and adding support
to assess it for .zarrs.
Note also that it adjusts logic from try/except to explicit checking -- much
better since avoids swallowing unrelated exceptions.
Tried with
dandi -l debug download -f debug --preserve-tree dandi://dandi/000108/sub-MITU01/ses-20210521h17m17s06/micr/sub-MITU01_ses-20210521h17m17s06_sample-178_stain-LEC_run-1_chunk-1_SPIM.ome.zarr/0/0/0/0/0
which seems succeeded downloading, we seems still need to
- [ ] disable checksumming for such partial downloads (since there is no
partial checksums ATM at API level)
- [ ] add tests and verify that works on full file paths
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #1596 +/- ##
===========================================
- Coverage 88.58% 62.82% -25.76%
===========================================
Files 78 78
Lines 10872 10885 +13
===========================================
- Hits 9631 6839 -2792
- Misses 1241 4046 +2805
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
dandi/dandiapi.py
Outdated
| ) | ||
| asset = assets[0] | ||
| elif not assets: | ||
| zarr_suf = ".zarr/" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What about .ngff/?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, should also be supported
dandi/dandiapi.py
Outdated
| if (zarr_suf in path) and (zarr_suffix_idx := path.index(zarr_suf)): | ||
| # If path ends with .zarr/, we might have a zarr asset without | ||
| # a trailing slash in the path. Try again: | ||
| zarr_path_len = zarr_suffix_idx + len(zarr_suf) | ||
| zarr_path = path[: zarr_path_len - 1] # -1 for trailing / | ||
| subpath = path[len(zarr_path) :] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have several issues with this:
- This code accepts paths containing a component that's just the Zarr suffix, e.g.,
foo/.zarr/bar. - The behavior differs from dandidav, which tests each
.zarr/.ngff-suffixed leading portion for Zarr-ness. - Instead of futzing around with strings, I think it'd better to convert
pathto apathlib.PurePosixPathand checkpath.partsto see if any part ends with.zarr(or.ngff, and also that the suffix is not the whole of the part).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok, I agree that dealing with Path instance might be more robust
dandi/dandiapi.py
Outdated
| #: The asset's (forward-slash-separated) path | ||
| path: str | ||
| #: The path within the asset, as e.g. path in a zarr/ | ||
| subpath: str | None = Field(default=None) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't like the idea of a BaseRemoteAsset object pointing to a resource inside an asset; the class is supposed to be for assets only. It'd be better to introduce a separate class for Zarr prefixes (and possibly also add a variant of get_asset_by_path() for returning this separate class).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I also disliked it a little but it is already quite a hierarchy of classes there leading to nearly combinatorial explosion. If you see a sensible way, e.g. through adding a single extra class - please go ahead.
Co-authored-by: John T. Wodder II <jwodder@users.noreply.github.com>
implemented by adding "subpath" within an Asset record and adding support to assess it for .zarrs.
Note also that it adjusts logic from try/except to explicit checking -- much better since avoids swallowing unrelated exceptions.
Tried with
dandi -l debug download -f debug --preserve-tree dandi://dandi/000108/sub-MITU01/ses-20210521h17m17s06/micr/sub-MITU01_ses-20210521h17m17s06_sample-178_stain-LEC_run-1_chunk-1_SPIM.ome.zarr/0/0/0/0/0
which seems succeeded downloading, we seems still need to