Improve the Efficiency of the /api/index/perms API call #12200
rtreacy
left a comment
via Claude --
PR #12200: Improve the Efficiency of the /api/index/perms API call (by qqmyers)
The PR replaces a synchronous, memory-heavy indexAllPermissions() with an @Asynchronous method that iterates over dataverses individually, delegating to the existing indexPermissionsOnSelfAndChildren() logic.
Key recommendations:
- Add @TransactionAttribute(TransactionAttributeType.NOT_SUPPORTED): without it, the async method holds a single transaction open for the entire run (potentially hours on large installations), risking timeouts.
- Guard against concurrent invocations: the fire-and-forget async nature means nothing prevents multiple overlapping runs. An AtomicBoolean or @Singleton/@Lock(WRITE) pattern would prevent this.
- No authentication on the endpoint (pre-existing): any anonymous user can trigger resource-intensive reindexing via a simple GET request.
- Confirm old method is fully deleted: the old indexAllPermissions() becomes dead code after this change.
The approach itself is sound — reusing indexPermissionsOnSelfAndChildren with its batched sub-transactions and em.clear() every 10 dataverses is correct for memory management.
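
To illustrate the AtomicBoolean option from the recommendations above, here is a minimal sketch in plain Java. The class and method names (`ReindexGuard`, `runExclusively`) are hypothetical and do not come from the PR; the sketch only shows the compare-and-set idea, not how the bean is actually wired.

```java
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical guard: names are illustrative and not taken from the PR. */
final class ReindexGuard {
    private final AtomicBoolean running = new AtomicBoolean(false);

    /**
     * Runs the job only if no other run is currently in progress.
     *
     * @return true if the job ran, false if it was skipped because another run is active
     */
    boolean runExclusively(Runnable job) {
        // compareAndSet flips false -> true atomically, so only one caller can win.
        if (!running.compareAndSet(false, true)) {
            return false;
        }
        try {
            job.run();
            return true;
        } finally {
            // Always release the flag, even if the job throws.
            running.set(false);
        }
    }
}
```

In an asynchronous setup the flag would more likely be claimed by the endpoint before dispatching the job and released by the async method when it finishes; the synchronous form here is only meant to keep the example self-contained.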
Changes made per review. In addition: changed the call to POST as a best practice to avoid cross-site issues; updated the call that reindexes permissions for a single dataset to use POST and require superuser; and added documentation (the calls were previously undocumented) in the release note, Solr guide, and changelog.
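
For reference, a POST to the endpoint could be built from Java roughly as follows. This is a sketch under assumptions: the path is copied from the PR title, the base URL and token are placeholders, the helper class `PermsReindexRequest` is hypothetical, and the token is passed in the `X-Dataverse-key` header. Check the documentation added by this PR for the exact URL.

```java
import java.net.URI;
import java.net.http.HttpRequest;

/** Hypothetical helper: builds (but does not send) the reindex request. */
final class PermsReindexRequest {
    static HttpRequest build(String baseUrl, String apiToken) {
        // Path taken from the PR title; baseUrl and apiToken are caller-supplied placeholders.
        return HttpRequest.newBuilder(URI.create(baseUrl + "/api/index/perms"))
                .header("X-Dataverse-key", apiToken) // superuser API token
                .POST(HttpRequest.BodyPublishers.noBody())
                .build();
    }
}
```

The request can then be sent with `java.net.http.HttpClient`. Since the call is asynchronous on the server side, the response should not be read as a sign that reindexing has finished.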
# Conflicts:
#	doc/sphinx-guides/source/api/changelog.rst
#	src/main/java/edu/harvard/iq/dataverse/search/SolrIndexServiceBean.java
What this PR does / why we need it: The (undocumented?) API call /api/index/perms iterates through all dvobjects in the database in one synchronous transaction, making it unusable. This PR replaces that logic with an asynchronous iteration over the datasets and dataverses in the root dataverse, reusing the index-self-and-children logic used elsewhere. That should make it much faster and less memory intensive.
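
The memory side of this can be pictured with a small, generic sketch: process items one at a time and release accumulated state every N items (the review comment mentions em.clear() every 10 dataverses). The names below (`BatchedIteration`, `forEachWithPeriodicClear`) are hypothetical, and the `clear` callback merely stands in for something like `em.clear()`; this is not the code in SolrIndexServiceBean.

```java
import java.util.List;
import java.util.function.Consumer;

/** Hypothetical illustration of iterate-and-clear; not the PR's implementation. */
final class BatchedIteration {
    /**
     * Applies work to each item and invokes clear after every batchSize items.
     *
     * @return the number of times clear was invoked
     */
    static <T> int forEachWithPeriodicClear(List<T> items, int batchSize,
                                            Consumer<? super T> work, Runnable clear) {
        int processed = 0;
        int clears = 0;
        for (T item : items) {
            work.accept(item);
            processed++;
            // Release accumulated state (e.g. a persistence context) once per full batch.
            if (processed % batchSize == 0) {
                clear.run();
                clears++;
            }
        }
        return clears;
    }
}
```

With 25 items and a batch size of 10, the clear callback runs twice, so at most 10 items' worth of state is held at any point before the final partial batch.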
Which issue(s) this PR closes:
Special notes for your reviewer: FWIW, the old code calls findAll on dvobjects, which, despite the allExceptFiles variable name, includes files. The code then puts all the files in a map and, as far as I can see, never uses them.
Suggestions on how to test this: Run the API call before and after, confirm the speed/memory improvement, and check the db to make sure all last permission index times are updated after the new call.
Does this PR introduce a user interface change? If mockups are available, please link/include them here:
Is there a release notes update needed for this change?:
Additional documentation: