Contributor

Copilot AI commented Dec 23, 2025

The workflow fails with 403 Forbidden when calling Figshare's /articles/search endpoint. The Python code supports token auth via FIGSHARE_TOKEN env var, but the workflow doesn't pass it. Additionally, rate limiting issues can occur with authenticated requests.

Changes

Workflow (.github/workflows/figshare-processing.yaml)

  • Pass FIGSHARE_TOKEN secret as environment variable to Python script
  • Add force_refresh input for workflow_dispatch (dropdown: true/false, default: false)
  • Use cache by default for all automated runs (scheduled, push, PR)
  • Only use --force-refresh when manually dispatched with force_refresh=true

Error Handling (figshare.py)

  • Add authentication status logging on init
  • Extract 403 error handling into __handle_403_error() helper
  • Provide actionable error messages referencing setup docs
  • Add configurable rate limiting with 1-second default delay between API requests
  • Add --rate-limit-delay CLI argument for customization
  • Sleep only applies after actual HTTP requests, not cached responses
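The last three bullets can be illustrated with a minimal sketch. It is not the code from figshare.py: the class name `RateLimitedCache`, the injected `fetch` callable, and the in-memory dictionary cache are assumptions made for illustration. It only shows the described behavior, where the delay follows real requests and cached responses return without sleeping.

```python
import time
from typing import Any, Callable, Dict


class RateLimitedCache:
    """Hypothetical helper: caches responses and sleeps only after real fetches."""

    def __init__(
        self,
        fetch: Callable[[str], Any],
        rate_limit_delay: float = 1.0,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        self.fetch = fetch                    # stands in for the real HTTP call
        self.rate_limit_delay = rate_limit_delay
        self.sleep = sleep                    # injectable so tests need not wait
        self.cache: Dict[str, Any] = {}

    def get(self, url: str, force_refresh: bool = False) -> Any:
        # Cache hit: no request is made, so no delay is applied.
        if not force_refresh and url in self.cache:
            return self.cache[url]

        result = self.fetch(url)
        self.cache[url] = result

        # Delay only after an actual request, and only if enabled.
        if self.rate_limit_delay > 0:
            self.sleep(self.rate_limit_delay)
        return result
```

Passing `sleep` in as a parameter is a design choice for testability; a real client could call `time.sleep` directly.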

Documentation

  • README.md: Setup instructions, token configuration, usage guide, rate limiting behavior, workflow behavior
  • FIGSHARE_API_RESEARCH.md: 403 error analysis, API authentication details
  • IMPLEMENTATION_SUMMARY.md: Token setup walkthrough

Required Setup

Repository owner must add FIGSHARE_TOKEN GitHub secret:

  1. Generate personal token at figshare.com → Account Settings → Applications
  2. Add as repository secret: Settings → Secrets → Actions → FIGSHARE_TOKEN

Without the token, the workflow will log clear warnings and continue to fail with 403 errors on API calls.
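As a rough sketch of that behavior, the snippet below reads FIGSHARE_TOKEN from the environment, warns when it is missing, and produces an actionable message for a 403. The function names `build_auth_headers` and `explain_403` and the message wording are hypothetical, not taken from figshare.py; only the `Authorization: token …` header format comes from the example in this description.

```python
import logging
import os
from typing import Dict, Mapping, Optional

logger = logging.getLogger("figshare")


def build_auth_headers(env: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Return request headers, using FIGSHARE_TOKEN if it is set (hypothetical helper)."""
    env = os.environ if env is None else env
    token = env.get("FIGSHARE_TOKEN", "").strip()
    if not token:
        logger.warning("FIGSHARE_TOKEN is not set - requests may fail with 403")
        return {}
    return {"Authorization": f"token {token}"}


def explain_403(has_token: bool) -> str:
    """Return an actionable hint for a 403 response (wording is illustrative)."""
    if has_token:
        return "403 Forbidden although a token was sent: check that the token is still valid."
    return (
        "403 Forbidden without a token: add FIGSHARE_TOKEN as a repository "
        "secret (Settings → Secrets → Actions)."
    )
```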

Rate Limiting

  • Default: 1-second delay between API requests to prevent rate limiting
  • Configurable via --rate-limit-delay parameter
  • Only triggers for actual API calls (cached requests return immediately)
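One way the CLI options could be declared with `argparse` is sketched below. The flag names `--rate-limit-delay` and `--force-refresh` and the 1-second default come from this description; the `float` type, the help text, and the `build_parser` function name are assumptions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser with the two flags discussed here (illustrative sketch)."""
    parser = argparse.ArgumentParser(description="Process Figshare records")
    parser.add_argument(
        "--rate-limit-delay",
        type=float,
        default=1.0,  # default of 1 second, as described above
        help="Seconds to sleep after each API request (0 disables the delay)",
    )
    parser.add_argument(
        "--force-refresh",
        action="store_true",  # off unless passed, so the cache is used by default
        help="Ignore cached responses and fetch fresh data",
    )
    return parser
```

For example, `build_parser().parse_args(["--rate-limit-delay", "0.5"])` would yield a delay of 0.5 seconds with caching still enabled.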

Workflow Behavior

  • Scheduled/Push/PR runs: Use cache (faster, fewer API calls)
  • Manual dispatch (default): Use cache
  • Manual dispatch with force_refresh=true: Fetch fresh data with rate limiting
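The actual decision lives in the workflow YAML, which is not shown here. Purely to make the three cases above concrete, the same logic can be written as a small Python function. The name `refresh_args` and the string-valued `force_refresh_input` parameter are hypothetical; the event names are the standard GitHub Actions trigger names.

```python
from typing import List


def refresh_args(event_name: str, force_refresh_input: str = "false") -> List[str]:
    """Return extra CLI arguments for the Python script (illustrative only).

    --force-refresh is added only for a manual dispatch where the
    force_refresh input was explicitly set to "true".
    """
    manual = event_name == "workflow_dispatch"
    requested = force_refresh_input.strip().lower() == "true"
    return ["--force-refresh"] if manual and requested else []
```

Scheduled, push, and pull-request runs fall through to the empty list, so they always use the cache regardless of the input value.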

Example

# Before: Anonymous requests fail
headers = {}  # No auth, 403 error

# After: Authenticated requests succeed with rate limiting
headers = {"Authorization": f"token {self.token}"} if self.token else {}
if not self.token:
    self.logger.warning("No authentication token - may hit 403 errors")

# Rate limiting after each API request
if self.rate_limit_delay > 0:
    time.sleep(self.rate_limit_delay)

Original prompt

This section details the original issue you should resolve

<issue_title>[ISSUE]: 403 returned for articles/search</issue_title>
<issue_description>### Description

https://github.com/LCAS/eprint_cache/actions/runs/20460655076/job/58792451910 is one example of a run that failed due to a 403 error from the Figshare API.</issue_description>

<agent_instructions>Research what could cause a 403 here, check whether we need an API token for the Figshare API, and find out how to get one.</agent_instructions>

Comments on the Issue (you are @copilot in this section)



Copilot AI and others added 4 commits December 23, 2025 14:02
Co-authored-by: marc-hanheide <1153084+marc-hanheide@users.noreply.github.com>
Copilot AI changed the title from "[WIP] Fix 403 error for articles/search on fig share API" to "Add Figshare API authentication to fix 403 errors" Dec 23, 2025
Copilot AI requested a review from marc-hanheide December 23, 2025 14:10
@marc-hanheide
Member

@copilot I'm still hitting rate limits. Please add a configurable sleep after each request to the Figshare API, with a default of 1 second. In addition, make --force-refresh optional in the workflow: use it only when the workflow is manually dispatched and the option is explicitly enabled. Otherwise, use the cache.

- Add configurable rate_limit_delay parameter (default 1 second) to FigShare class
- Add sleep after each API request to avoid rate limits
- Add --rate-limit-delay CLI argument
- Update workflow to only use --force-refresh when manually dispatched with force_refresh=true
- By default, workflow now uses cache for all automated runs
- Update README with rate limiting documentation and workflow behavior

Co-authored-by: marc-hanheide <1153084+marc-hanheide@users.noreply.github.com>
Contributor Author

Copilot AI commented Dec 23, 2025

@copilot I'm still hitting rate limits, add a configurable sleep after each request to the fig share API, set to a default of 1 second. In addition, in the workflow, make the --force-refresh optional,...

Done in commit e310467.

Changes made:

  1. Added configurable rate limiting with 1-second default delay between API requests (via --rate-limit-delay parameter)
  2. Updated workflow to only use --force-refresh when manually dispatched with force_refresh=true input
  3. All automated runs (scheduled, push, PR) now use cache by default

The workflow will now respect rate limits and minimize API calls. To force a refresh, manually trigger the workflow and set force_refresh to true.

Copilot AI changed the title from "Add Figshare API authentication to fix 403 errors" to "Add Figshare API authentication and rate limiting to fix 403 errors" Dec 23, 2025
marc-hanheide marked this pull request as ready for review December 23, 2025 16:49
marc-hanheide merged commit 54b4fb5 into main Dec 23, 2025
1 check passed
Development

Successfully merging this pull request may close these issues.

[ISSUE]: 403 returned for articles/search
