Skip to content

Write batches of GuestBookResponses for performance#12479

Open
qqmyers wants to merge 10 commits into
IQSS:developfrom
GlobalDataverseCommunityConsortium:WriteMultipleGBsAtOnce
Open

Write batches of GuestBookResponses for performance#12479
qqmyers wants to merge 10 commits into
IQSS:developfrom
GlobalDataverseCommunityConsortium:WriteMultipleGBsAtOnce

Conversation

@qqmyers

@qqmyers qqmyers commented Jun 19, 2026

Copy link
Copy Markdown
Member

What this PR does / why we need it: This PR combines a new method to write batches of guestbook entries, versus saving each one separately with the changes from #12110 and #12319, and in doing that, extends Locally FAIR support to the external zipper code. It also switches to using a custom query to get the file id list for a dataset version rather than iterating through all filemetadatas to get datafile objects to get ids.

Which issue(s) this PR closes:

  • Closes #

Special notes for your reviewer: I've tried to verify that LocallyFAIR constraints are imposed on all the download get calls.

One oddity I saw is that the handleCustomZipper call appears to have forced orig to true while the code to set up the job (e.g. in fileDownloadService.addFileToCustomZipJob seems to handle both. I'm not sure why that should be - perhaps @landreev can comment - this PR changes it to pass the original flag through but I can revert that if needed.

There is also some cleanup of things in #12110 - in the downloadDatafiles method, the String body sent in was parsed as the list of fileIds and then, later in the same method (old line 1095) as the guestbook response. I think the second use would never have happened. I also stopped making a Map that was only used to get the List of values, avoiding repeat conversions of csv strings to Lists and back again, etc.

Suggestions on how to test this: It should pass all the existing tests. Writing guestbook entries for datasets with many files should be faster - seen at QDR with a 10K file dataset.

Does this PR introduce a user interface change? If mockups are available, please link/include them here:

Is there a release notes update needed for this change?:

Additional documentation:

qqmyers added 5 commits June 18, 2026 15:28
(cherry picked from commit 5da66f5)
Summary
•
Updated multi-file download logic to enforce Locally FAIR restrictions.
Changes
•
Modified FileDownloadServiceBean.resolveSelectedDataFilesInDataset to accept a DataverseRequest and perform a isLocallyFAIR check on the first file being resolved.
•
Updated FileDownloadServiceBean.writeGuestbookAndStartBatchDownload to pass the current DataverseRequest when resolving files.
•
Updated Access.downloadDatafiles in the API to pass the DataverseRequest to the file resolution service.
•
Added necessary import for DataverseRequest in FileDownloadServiceBean.java.
@landreev

Copy link
Copy Markdown
Contributor

Let me take a closer look... It is entirely possible that this was just a bug - that it always defaulted to the original, and nobody noticed over the years. Although I do remember users asking questions specifically about the handling of tabular files in the output of the zipper; so it must have been working at least at some point.

@landreev

Copy link
Copy Markdown
Contributor

Let me take a closer look... It is entirely possible that this was just a bug - that it always defaulted to the original, and nobody noticed over the years. Although I do remember users asking questions specifically about the handling of tabular files in the output of the zipper; so it must have been working at least at some point.

Huh, it really is a bug that does just that - always defaults to "format=original". But ONLY when the zipper is accessed via the API. When bundles are downloaded from the jsp pages, you get either the originals or archival tabs, as requested.
/api/access/datafiles/<list> appears to be a somewhat rarely used api at HDV.

@landreev landreev assigned landreev and unassigned landreev Jun 22, 2026
@landreev landreev self-requested a review June 22, 2026 20:27
@landreev

Copy link
Copy Markdown
Contributor

@qqmyers Is the pr still work in progress - or is it "ready for triage"?

@qqmyers

qqmyers commented Jun 22, 2026

Copy link
Copy Markdown
Member Author

~yes - it probably needs a release note and I wanted to make sure all the IT tests passed, along with getting your nod on the orig zip part, but yes, it could probably move into review w/o waiting for any of that.

qqmyers added 5 commits June 23, 2026 09:51
FileAccessRequest declares the realtionship to GuestbookReponse as OneToOne. The OneToMany here appears to be a typo (without any real impact) from IQSS#9599
@qqmyers qqmyers marked this pull request as ready for review June 24, 2026 19:18
@qqmyers qqmyers added the GDCC: QDR of interest to QDR label Jun 24, 2026
@qqmyers qqmyers moved this to Ready for Triage in IQSS Dataverse Project Jun 24, 2026
@qqmyers qqmyers added the Size: 10 A percentage of a sprint. 7 hours. label Jun 24, 2026
@coveralls

Copy link
Copy Markdown

Coverage Status

Coverage is 25.01%GlobalDataverseCommunityConsortium:WriteMultipleGBsAtOnce into IQSS:develop. No base build found for IQSS:develop.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

GDCC: QDR of interest to QDR Size: 10 A percentage of a sprint. 7 hours.

Projects

Status: Ready for Triage

Development

Successfully merging this pull request may close these issues.

3 participants