Write batches of GuestBookResponses for performance#12479
Conversation
(cherry picked from commit 2a1a98e)
(cherry picked from commit 5da66f5)
Summary • Updated multi-file download logic to enforce Locally FAIR restrictions. Changes • Modified FileDownloadServiceBean.resolveSelectedDataFilesInDataset to accept a DataverseRequest and perform a isLocallyFAIR check on the first file being resolved. • Updated FileDownloadServiceBean.writeGuestbookAndStartBatchDownload to pass the current DataverseRequest when resolving files. • Updated Access.downloadDatafiles in the API to pass the DataverseRequest to the file resolution service. • Added necessary import for DataverseRequest in FileDownloadServiceBean.java.
|
Let me take a closer look... It is entirely possible that this was just a bug - that it always defaulted to the original, and nobody noticed over the years. Although I do remember users asking questions specifically about the handling of tabular files in the output of the zipper; so it must have been working at least at some point. |
Huh, it really is a bug that does just that - always defaults to "format=original". But ONLY when the zipper is accessed via the API. When bundles are downloaded from the jsp pages, you get either the originals or archival tabs, as requested. |
|
@qqmyers Is the pr still work in progress - or is it "ready for triage"? |
|
~yes - it probably needs a release note and I wanted to make sure all the IT tests passed, along with getting your nod on the orig zip part, but yes, it could probably move into review w/o waiting for any of that. |
FileAccessRequest declares the realtionship to GuestbookReponse as OneToOne. The OneToMany here appears to be a typo (without any real impact) from IQSS#9599
What this PR does / why we need it: This PR combines a new method to write batches of guestbook entries, versus saving each one separately with the changes from #12110 and #12319, and in doing that, extends Locally FAIR support to the external zipper code. It also switches to using a custom query to get the file id list for a dataset version rather than iterating through all filemetadatas to get datafile objects to get ids.
Which issue(s) this PR closes:
Special notes for your reviewer: I've tried to verify that LocallyFAIR constraints are imposed on all the download get calls.
One oddity I saw is that the handleCustomZipper call appears to have forced orig to true while the code to set up the job (e.g. in fileDownloadService.addFileToCustomZipJob seems to handle both. I'm not sure why that should be - perhaps @landreev can comment - this PR changes it to pass the original flag through but I can revert that if needed.
There is also some cleanup of things in #12110 - in the downloadDatafiles method, the String body sent in was parsed as the list of fileIds and then, later in the same method (old line 1095) as the guestbook response. I think the second use would never have happened. I also stopped making a Map that was only used to get the List of values, avoiding repeat conversions of csv strings to Lists and back again, etc.
Suggestions on how to test this: It should pass all the existing tests. Writing guestbook entries for datasets with many files should be faster - seen at QDR with a 10K file dataset.
Does this PR introduce a user interface change? If mockups are available, please link/include them here:
Is there a release notes update needed for this change?:
Additional documentation: