[BUG] Index dump workflow fails with misleading "dataset not found" error due to silent exception handling in setup.py #367

@adambuttrick

Version
v2

Describe the bug
The dev_index_dump.yml workflow in dev-ror-records fails when attempting to index a data dump file that exists in the ror-data-test repository. The API returns a misleading "not found" error even though the file can be confirmed to be present in the repo.

The root cause appears to be in ror-api/rorapi/management/commands/setup.py. Specifically, the get_ror_dump_sha function calls the GitHub API to verify that the dump file exists, but wraps the response handling in a bare except: clause that silently swallows any error. When the GitHub API call fails (for any reason), the function returns None, and the user sees "ROR dataset for file ... not found" instead of the actual error.

# Current code (setup.py lines ~17-30)
try:
    repo_contents = response.json()
    for file in repo_contents:
        if filename in file['name']:
            sha = file['sha']
    return sha
except:        # <-- silently hides ALL errors
    return None

If, for example, the GITHUB_TOKEN env var in the ECS task is expired or misconfigured, the GitHub API returns a 401 response whose JSON body is a dict (not a list). Iterating over the dict yields its string keys, so file['name'] raises a TypeError, which the bare except: catches. The function then returns None, producing the misleading "not found" message.
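The failure mode can be reproduced without any network access. The sketch below is only an illustration: find_sha_buggy is a hypothetical stand-in that copies the try/except shape quoted above, and error_payload is an assumed approximation of a GitHub 401 body, not a captured response.

```python
def find_sha_buggy(repo_contents, filename):
    """Hypothetical stand-in that mirrors the try/except quoted in this issue."""
    try:
        for file in repo_contents:
            if filename in file['name']:
                sha = file['sha']
        return sha
    except:  # noqa: E722 - deliberately bare, to reproduce the behavior
        return None


# Assumed shape of an auth-failure body: a dict, not a list of file entries.
error_payload = {
    "message": "Bad credentials",
    "documentation_url": "https://docs.github.com/rest",
}

# What the loop actually hits: each item is a string key, so ['name'] fails.
hidden_error = None
try:
    for entry in error_payload:
        entry['name']
except TypeError as exc:
    hidden_error = exc

# An auth failure and a genuinely missing file are indistinguishable to the caller.
result_on_auth_failure = find_sha_buggy(error_payload, "v2.5-2026-03-31-ror-data")
result_on_missing_file = find_sha_buggy([], "v2.5-2026-03-31-ror-data")
print(type(hidden_error).__name__, result_on_auth_failure, result_on_missing_file)
```

Both calls return None, which is why the API can only report "not found" in either case. In the empty-listing case the None also comes from the bare except: sha is never assigned, so return sha raises UnboundLocalError.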

To Reproduce

  1. Trigger the dev_index_dump.yml workflow from the dev branch with inputs:
    • release-dump: v2.5-2026-03-31-ror-data
    • schema-version: v2
    • data-env: test
  2. Workflow calls the ror-api dev endpoint: GET /v2/indexdatadump/v2.5-2026-03-31-ror-data/test
  3. API returns: {"status":"ERROR: ROR dataset for file v2.5-2026-03-31-ror-data not found. Please generate the data dump first."}

Expected behavior
The dump file v2.5-2026-03-31-ror-data.zip (32MB, SHA 9d26bc09) exists in ror-community/ror-data-test and was committed on 2026-04-01. The API should find the file and proceed with indexing. If the GitHub API call fails, the error message should surface the actual failure reason (e.g., "GitHub API returned 401: Bad credentials").
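One possible shape for the expected behavior is sketched below. It is not the project's code: GitHubAPIError and find_dump_sha are hypothetical names, and the function takes the status code and decoded JSON body as plain arguments so the example runs without the requests library or a token. Unlike the quoted loop, which keeps the last matching entry, this sketch returns the first match.

```python
class GitHubAPIError(Exception):
    """Hypothetical exception for a failed GitHub API call."""


def find_dump_sha(status_code, payload, filename):
    """Return the SHA of the first entry whose name contains filename, else None.

    Raises GitHubAPIError when the call itself failed, so that "not found"
    is only reported for a successful listing that has no match.
    """
    if status_code != 200:
        if isinstance(payload, dict):
            message = payload.get("message", "unknown error")
        else:
            message = str(payload)
        raise GitHubAPIError(f"GitHub API returned {status_code}: {message}")
    if not isinstance(payload, list):
        raise GitHubAPIError(
            f"Expected a list of files, got {type(payload).__name__}"
        )
    for entry in payload:
        if filename in entry.get("name", ""):
            return entry["sha"]
    return None


# Illustrative inputs; the name and abbreviated SHA are taken from this issue.
listing = [{"name": "v2.5-2026-03-31-ror-data.zip", "sha": "9d26bc09"}]
found = find_dump_sha(200, listing, "v2.5-2026-03-31-ror-data")

try:
    find_dump_sha(401, {"message": "Bad credentials"}, "v2.5-2026-03-31-ror-data")
    surfaced = None
except GitHubAPIError as exc:
    surfaced = str(exc)
```

With this split, the caller can keep the existing "not found" message for a None result and report the exception text for everything else.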

Device information:

  • GitHub Actions runner: ubuntu-latest
  • ror-api dev: ECS container (rorcommunity/ror-api:dev), last deployed 2026-02-26

Additional context

  • The last successful run of this workflow was 2026-02-02 (indexing v2.2-2026-01-29-ror-data with test env).

Metadata

  • Labels: bug (Something is not working as defined)
  • Status: Planned