WIP: Storage device API cleanup #770
Conversation
…he system level to a different interface number
…eted AddImage and replaced with append.
…ets from different storage devices; added explicit deviceLabel to alternative create and load methods
Using two different integers ("core" and "device" handles) to represent
the same dataset is unnecessary.
Instead, let the Core assign a globally unique dataset handle, and
eliminate handle assignment from devices entirely. This also avoids
copy-pasting the handle assignment logic every time a new storage device is written.
In the Core, instead of mapping handles to device labels
(`std::string`), map to `std::weak_ptr<StorageInstance>`. This is
important, because otherwise there could be ABA problems: the storage
device with the same label may not be the same device, or it may have
been reloaded, invalidating all handles. (That could also be prevented
by making sure to remove handles when devices are unloaded, but it's
better to eliminate even the possibility of such an error.)
I'm thinking the
Each device should implement this function to match how it allocates string buffers. An implicit default is easy to miss and dangerous.
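For illustration only (assuming the function in question is `ReleaseStringBuffer()`; the device class and allocation helper here are hypothetical):

```cpp
#include <cstddef>

// Hypothetical device: it allocates string buffers with new[], so its
// ReleaseStringBuffer() must use delete[] -- an explicit pair, rather than
// relying on an implicit base-class default.
class MyStorageDevice {
public:
   char* AllocateStringBuffer(std::size_t length) {
      return new char[length + 1];  // allocated with new[] ...
   }

   void ReleaseStringBuffer(char* buffer) {
      delete[] buffer;  // ... so it is released with delete[]
   }
};
```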
It is not clear that this is the best API (as opposed to, say, a callback). Let's leave it out until we have at least a proof-of-principle implementation.
It doesn't make sense to put an arbitrary selection of projects in the official repo. Added *.slnf to .gitignore so that people (including myself!) can use their own solution filters locally without adding them to Git.
AcquireZarrStorage and G2SBigTiffStorage are completely independent of each other, so this makes more sense. Should help with discoverability, build configuration, and code comprehension.
Inconsistency can cause problems on Linux. go2scope -> Go2ScopeTmp; Go2Scope -> Go2ScopeTmp. The 'Tmp' will be removed in the next commit -- using two separate commits helps Git get things right on Windows and macOS.
Go2ScopeTmp -> Go2Scope
After further thinking, I've updated the to-do items above and categorized them. There's a lot to fix here, and I notice that limiting an initial version to write-only storage would probably allow us to make the Storage device type available sooner (an idea that I think some of us have floated previously). Also, it will be easier to decouple the API and MMCore implementation from the G2STiff adapter, for the time being, if we do that (rather than awkwardly leaving around unexposed reading capabilities). Conversely, I think it will be good to ensure that (an up-to-date version of) acquire-zarr is a good fit for the device API. So one direction I'm entertaining is to break down the process as follows:
Note that I'm not proposing to eliminate the read capabilities of G2STiff, only to temporarily proceed without them so that we can work on one thing at a time. The minimal API (on the application side) would be something close to this:
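As a rough, illustrative sketch only (placeholder names and signatures -- not a finalized proposal):

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Sketch of a minimal write-only surface on the application (Core API) side:
// create a dataset, append images in order, close. Names are placeholders.
struct DatasetHandle { int h; };

class CoreStorageApiSketch {
public:
   // Shape, pixel format, and dataset-level metadata are fixed at creation.
   DatasetHandle createDataset(const std::string& storageLabel,
                               const std::string& path,
                               const std::vector<long>& shape,
                               const std::string& pixelFormat,
                               const std::string& datasetMetadata);

   // Append the next image, with optional per-image metadata, in the
   // predefined order; random-access writing is out of scope initially.
   void appendImage(DatasetHandle dataset,
                    const unsigned char* pixels, std::size_t numBytes,
                    const std::string& imageMetadata);

   // Finish writing and invalidate the handle.
   void closeDataset(DatasetHandle dataset);
};
```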
I think this maps nicely to acquire-zarr, but is also implementable by most backends. How do people feel about this? I'd especially appreciate feedback on:
Cc: @nclack, @aliddell, @talonchandler, @dpshepherd, @go2scope, @nenada, @tlambert03, @nicost.
I definitely think a step-wise, build-up-from-a-minimal-API approach is the right way to go for something this large. It's nice to have the giant "working principle" branch to work with, but it's simply too large to evaluate. Having you distill the essence of the core from the scope creep would be a great way to move forward on this.
@marktsuchida thanks for all the work here! One of the things I like about the approach with the Storage API is that it promises to give a route to more pluggable file format support. I'm very supportive of splitting out the G2STiff and AcquireZarr writers, and I think an OME-TIFF writer would be great. As you noted, Zarr doesn't have great conventions for per-frame metadata at the moment, but TIFF does. I think @tlambert03 got things right wrt the acquire-zarr API. We have changed it a bit since #553. @aliddell maybe you can weigh in here?
Sorry, I was out of the loop for the last couple of months and missed the API discussion. I am catching up now. @marktsuchida thank you for the effort of integrating the storage API. Let me know what I should do to help. I am also available for meetings. A couple of thoughts: we can remove go2scope and G2STiff entirely from micromanager devices; I will add a separate DLL with the g2stiff when the API becomes stable. I agree with Mark's plan for incremental API development. Regarding all the other issues, I think there is a lot of material and it does require a meeting to discuss and coordinate. I can definitely put some work towards moving things forward, but I'm not sure where to begin.
Talley's assessment about what the aqz API needs is spot on. Several things in the (configuration) API have changed since #535, but probably the most relevant change is that multiple output arrays are now supported within a single Zarr dataset, so the properties of the dataset that pertain specifically to arrays have been consolidated into a `ZarrArraySettings`.
Thanks, all, for your comments and support!
Absolutely happy to discuss on Zoom! Let me hammer at this a bit more and see if I can come up with something concrete that we can use as a basis for discussion -- API design being so much about details, I think that might be more productive (and I won't mind if that results in more code churn on my side).
This is a good topic to think ahead about (thanks also @tlambert03 for helpful discussion that brought this up). @aliddell, is that right? If so, specifying the array name (output key) in the dataset configuration (JSON passed at creation time) would allow the use of multi-array datasets -- so long as their writing doesn't overlap in time (my understanding; please correct me if wrong). Regardless, I think there is a path to fairly cleanly supporting simultaneous writing to multiple arrays, and this could be useful for OME-TIFF (which can potentially be a multi-file dataset) as well. Maybe not necessary in the first version of the API, but it can be added subsequently. Here's the idea. The terminology mapping would be something like this:
In this model, a dataset contains 0 or more streams (1 or more to be useful), and a stream is a portion of the dataset that can only be written sequentially in a predefined order. (Reading, where supported, may be random access -- or not -- but we leave that for later.) The set of valid streams is up to the storage backend/driver, which may expose a single stream, multiple predefined streams, or even arbitrary streams that get created on first access. In the case of acquire-zarr, a stream would map to the output key (array). When we add support for attaching storage to a camera for direct saving, it will be a stream, not a whole dataset, that gets attached, at least by default (though there might be a need for virtual streams that span multiple backend-defined streams). (The "stream" parameter might also be a way to support random-access writers (which can be modeled as having a stream per plane, for example), should the need arise.)
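As a hypothetical sketch only (placeholder names; not a concrete proposal), the dataset/stream split might look something like this on the API:

```cpp
#include <cstddef>
#include <string>

// Placeholder handle types for the sketch.
struct DatasetHandle { int h; };
struct StreamHandle { int h; };

class StorageStreamApiSketch {
public:
   DatasetHandle createDataset(const std::string& storageLabel,
                               const std::string& path,
                               const std::string& configJson);

   // The set of valid keys is up to the backend: a single stream, several
   // predefined streams, or streams created on first access. For
   // acquire-zarr, a key would select one output array of the dataset.
   StreamHandle getStream(DatasetHandle dataset, const std::string& streamKey);

   // Sequential, append-only writing within one stream; different streams
   // of the same dataset could in principle be written concurrently.
   void appendToStream(StreamHandle stream,
                       const unsigned char* pixels, std::size_t numBytes,
                       const std::string& imageMetadata);

   void closeDataset(DatasetHandle dataset);
};
```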
Yes, that's right, though there's a risk that metadata can be overwritten. For example, if we have the structure and we decide we later want to write another array at
Not exactly. The array name (output key) doesn't belong in the dataset configuration; it belongs in an array configuration, of which you can pass several at dataset creation time.
Makes sense, thanks. We can initially limit our support to configs where we only have one ZarrArraySettings in the ZarrStreamSettings, but that will still allow adding arrays via |
This PR builds on top of #535 (MM::StorageDevice). I'm using a new PR to keep things organized, since I expect to make some pretty extensive changes.
(Merging this PR should mark #535 as merged, as long as #535 is not further modified.)
(You will see all the commits of #535 in this PR, too.)
See only the changes relative to #535.
This is a work in progress to refine the API design for Storage devices. The goal is to get to something that we can merge with reasonable confidence that major breaking changes won't be needed for further evolution.
Tasks (includes items in my review comments to #535; may not be exhaustive):
Big issues
- `AddImage()` is not possible
- `configureDatasetDimension()` won't work after creation
- `configureDatasetCoordinate()` won't work after creation (if ever)
- `ZarrStreamSettings`) that won't be possible with the current API
- `createDataset()` and leave up to storage backend to interpret?
- yaozarrs-generated JSON
- JSON auto-transformed on the app side from yaozarrs or ome-zarr-models in the case of acquire-zarr
- `createDataset()` would still take `shape` and `pixelFormat` (possibly redundantly)

API design
- `GetProgress()` - remove for now? (Also check for other unimplemented functions.)
- `GetImage()`? (Implicitly until next call?) Rather copy into caller-provided buffer?
- `getDeviceNameToOpenDataset()`: replace with `isDatasetSupportedByStorage(storageLabel, path)`
- `StorageDataType` should not be specific to storage; let's define a generally useful `PixelFormat` type
- `Create()` says dataset name "may be modified ... to avoid overwriting". This is unhelpful, especially when left to each device -- user has no way to predict the resulting dataset name. We've had lots of bad experience with this in the MMStudio API (and GUI).
- `IsOpen()` is questionable: Core may be able to support this now without device's help, but what is the use case? Remove.

API technicalities
- `size_t`, not `int`
- `GetImage()` to return an error code
- `ReleaseStringBuffer()` implementation should not default to `delete[]`
- `struct DatasetHandle { int h; };`)

API naming
- `appendNextToDataset()`: rename to `popNextImageAndAppendToDataset()` (or similar)

MMCore behavior
- `appendNextToDataset()`; can we prevent misuse?
- `AppendImage()`: Core should throw if read-only, before asking device; similar for all invalid calls on read/write-only datasets

Misc
- `MaxMetadataLength` is unused; remove

Functionality beyond write/read
- `Delete()` requirements: dataset must be open (or user must pass path)
- `List()` API is hard to use correctly (caller must initialize buffers with empty string; if all caller-provided buffers are filled, can't tell how much more is needed) -- and not exposed in Core API
- `Freeze()` should either be required for read-write devices or (temporarily) removed from the API

Known limitations whose fixes are out of scope for this PR (can be fixed later):
These plans are not set in stone, so please feel free to propose alternatives or ask questions!
Cc: @go2scope, @nicost, @tlambert03.