
Indexing race conditions for new dataset versions #12377

@vera

Description

What steps does it take to reproduce the issue?

  • When does this issue occur?

When a new dataset version is created and immediately indexed, a race condition can occur: the indexing process starts before the transaction that created the dataset version has committed to the database. Consequently, the search index may contain entries with a missing datasetVersionId.

This later causes server errors when those drafts are retrieved from the Solr index, e.g. when calling the Search API:

```
[#|2026-04-28T13:40:41.705+0000|SEVERE|Payara 6.2025.10||_ThreadID=321;_ThreadName=http-thread-pool::http-listener-1(12);_TimeMillis=1777383641705;_LevelValue=1000;|
  java.lang.NullPointerException: Cannot invoke "java.lang.Long.longValue()" because "datasetVersionId" is null
	at edu.harvard.iq.dataverse.search.SolrSearchServiceBean.solrDocumentToSolrSearchResult(SolrSearchServiceBean.java:263)
	at edu.harvard.iq.dataverse.search.SolrSearchServiceBean.search(SolrSearchServiceBean.java:717)
```

It has occurred mainly, and regularly, when an automated script creates many new dataset versions at once.
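The exception message in the log is the one the JVM produces when a null `java.lang.Long` is auto-unboxed to a primitive `long`. Independently of fixing the race itself, result conversion could handle a missing `datasetVersionId` defensively and skip the affected document instead of failing the whole request. The following is a minimal sketch, not Dataverse code: Solr documents are modeled as plain `Map<String, Object>` instead of `SolrDocument`, the names `SearchResult`, `readDatasetVersionId`, and `toResults` are hypothetical, and the field key `datasetVersionId` is assumed from the log message.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.logging.Logger;

public class NullSafeSearchResults {

    private static final Logger LOG = Logger.getLogger(NullSafeSearchResults.class.getName());

    // Hypothetical stand-in for a search result; not Dataverse's SolrSearchResult.
    record SearchResult(String entityId, long datasetVersionId) {}

    // Reads the field without unboxing, so a missing value yields Optional.empty()
    // instead of a NullPointerException. The key "datasetVersionId" is an assumption.
    static Optional<Long> readDatasetVersionId(Map<String, Object> doc) {
        Object value = doc.get("datasetVersionId");
        return value instanceof Long id ? Optional.of(id) : Optional.empty();
    }

    // Converts documents to results, skipping (and logging) incomplete entries.
    static List<SearchResult> toResults(List<Map<String, Object>> docs) {
        List<SearchResult> results = new ArrayList<>();
        for (Map<String, Object> doc : docs) {
            Optional<Long> versionId = readDatasetVersionId(doc);
            if (versionId.isEmpty()) {
                LOG.warning("Skipping document without datasetVersionId: " + doc.get("id"));
                continue;
            }
            results.add(new SearchResult((String) doc.get("id"), versionId.get()));
        }
        return results;
    }

    public static void main(String[] args) {
        List<Map<String, Object>> docs = List.of(
                Map.of("id", "dataset_1_draft", "datasetVersionId", 42L),
                Map.of("id", "dataset_2_draft")); // simulated entry indexed before commit
        System.out.println(toResults(docs).size()); // prints 1
    }
}
```

Skipping only masks the symptom: the affected draft stays out of search results until it is reindexed, so this would complement, not replace, a fix for the race itself.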

  • Which page(s) does it occur on?

The issue affects pages that display search results, including the main search page, collection (dataverse) pages, and Search API calls.

  • What happens?

When search results are processed, the code attempts to unbox a null datasetVersionId (a java.lang.Long) into a primitive long, which throws a NullPointerException. For Search API calls, this resulted in 500 (Internal Server Error) responses.

  • To whom does it occur (all users, curators, superusers)?

This can affect any user performing a search or viewing a collection that includes recently created or updated datasets affected by the race condition.

  • What did you expect to happen?

Search results should always display correctly. Ideally, the indexer should only run once the data is fully committed.
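One common way to make indexing run only after the data is committed is to register the indexing work as an after-commit callback on the transaction instead of invoking it directly inside the transaction. The sketch below models this in plain Java; it is not Dataverse's code and not necessarily the approach of the planned PR. `Transaction`, `committedVersions`, `index`, and `indexVersion` are hypothetical stand-ins for the JTA transaction, the database, the Solr index, and the indexer.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class AfterCommitIndexing {

    // Hypothetical stand-in for the database: only committed rows are visible to other readers.
    static final Map<Long, String> committedVersions = new HashMap<>();

    // Hypothetical stand-in for the Solr index.
    static final List<String> index = new ArrayList<>();

    // Hypothetical minimal transaction that buffers writes and after-commit callbacks.
    static final class Transaction {
        private final Map<Long, String> pendingWrites = new HashMap<>();
        private final List<Runnable> afterCommit = new ArrayList<>();

        void persist(long versionId, String title) {
            pendingWrites.put(versionId, title);
        }

        // Defers work until the data it depends on is visible to other readers.
        void runAfterCommit(Runnable action) {
            afterCommit.add(action);
        }

        void commit() {
            committedVersions.putAll(pendingWrites);
            afterCommit.forEach(Runnable::run);
        }

        void rollback() {
            pendingWrites.clear();
            afterCommit.clear(); // a rolled-back version must not be indexed
        }
    }

    // Hypothetical indexer: reads only committed state, as a separate thread or transaction would.
    static void indexVersion(long versionId) {
        String title = committedVersions.get(versionId);
        if (title == null) {
            // Fail loudly instead of writing an incomplete index entry.
            throw new IllegalStateException("Version " + versionId + " is not committed yet");
        }
        index.add(versionId + ":" + title);
    }

    public static void main(String[] args) {
        Transaction tx = new Transaction();
        tx.persist(7L, "Draft v1.1");
        // Calling indexVersion(7L) at this point would throw: the row is not committed yet.
        tx.runAfterCommit(() -> indexVersion(7L));
        tx.commit();
        System.out.println(index); // prints [7:Draft v1.1]
    }
}
```

In a Jakarta EE container such as Payara, the platform offers comparable hooks, e.g. `TransactionSynchronizationRegistry.registerInterposedSynchronization` with a `Synchronization` whose `afterCompletion(int status)` checks for `Status.STATUS_COMMITTED`, or a CDI observer declared with `@Observes(during = TransactionPhase.AFTER_SUCCESS)`. Dropping the callbacks on rollback also avoids indexing versions that were never persisted.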

Which version of Dataverse are you using?

6.9

Any related open or closed issues to this bug report?

Not aware of any.

Screenshots:

None.

Are you thinking about creating a pull request for this issue?
Help is always welcome. Is this bug something you or your organization plan to fix?

Yes, we have implemented a potential fix and will open a PR as soon as possible.
