Copilot AI commented Dec 8, 2025

Task: Implement per-issuer lock for new issuers to prevent thundering herd ✅

Implementation Complete

  • Rebased onto latest master with integration test infrastructure and monitoring API
  • Created proper integration test validating per-issuer locking with concurrent threads
  • Test proves only 1 key lookup occurs despite 10 concurrent threads

Integration Test: ConcurrentNewIssuerLookup

The new test validates the per-issuer locking mechanism:

  • Launches 10 concurrent threads all validating tokens from the same new issuer
  • Uses monitoring API to count successful key lookups before/after
  • Asserts exactly 1 lookup occurs (proving per-issuer lock prevents thundering herd)
  • Without the lock, up to 10 concurrent lookups could occur

Changes Made

  1. scitokens_internal.cpp: Per-issuer mutex map with double-check pattern
  2. test/integration_test.cpp: New concurrent test using monitoring API
  3. test/main.cpp: Removed inadequate unit test
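
A minimal sketch of the double-check pattern named in item 1, assuming an in-memory cache and a simulated web fetch. `KeyCache`, `get_keys`, `fetch_from_web`, and `count_fetches_for_concurrent_lookups` are hypothetical names for illustration and are not taken from scitokens_internal.cpp:

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <map>
#include <memory>
#include <mutex>
#include <string>
#include <thread>
#include <vector>

// Hypothetical stand-in for the key cache; not the library's real class.
class KeyCache {
  public:
    // Returns the keys for `issuer`, fetching them at most once per issuer.
    std::string get_keys(const std::string &issuer) {
        std::string keys;
        if (lookup(issuer, keys)) return keys; // first check, no issuer lock

        std::shared_ptr<std::mutex> issuer_mutex = mutex_for(issuer);
        std::lock_guard<std::mutex> fetch_guard(*issuer_mutex);

        // Second check: another thread may have filled the cache while
        // this one was waiting on the issuer lock.
        if (lookup(issuer, keys)) return keys;

        keys = fetch_from_web(issuer);
        {
            std::lock_guard<std::mutex> cache_guard(m_cache_mutex);
            m_cache[issuer] = keys;
        }
        return keys;
    }

    int fetch_count() const { return m_fetches.load(); }

  private:
    bool lookup(const std::string &issuer, std::string &keys) {
        std::lock_guard<std::mutex> guard(m_cache_mutex);
        auto it = m_cache.find(issuer);
        if (it == m_cache.end()) return false;
        keys = it->second;
        return true;
    }

    // Finds or creates the mutex dedicated to one issuer.
    std::shared_ptr<std::mutex> mutex_for(const std::string &issuer) {
        std::lock_guard<std::mutex> guard(m_map_mutex);
        auto &entry = m_issuer_mutexes[issuer];
        if (!entry) entry = std::make_shared<std::mutex>();
        return entry;
    }

    // Simulated slow network fetch; counts how often it is called.
    std::string fetch_from_web(const std::string &issuer) {
        ++m_fetches;
        std::this_thread::sleep_for(std::chrono::milliseconds(50));
        return "keys-for-" + issuer;
    }

    std::mutex m_cache_mutex;
    std::map<std::string, std::string> m_cache;
    std::mutex m_map_mutex;
    std::map<std::string, std::shared_ptr<std::mutex>> m_issuer_mutexes;
    std::atomic<int> m_fetches{0};
};

// Runs `thread_count` concurrent lookups for one new issuer and returns
// how many simulated web fetches happened.
int count_fetches_for_concurrent_lookups(int thread_count) {
    KeyCache cache;
    std::vector<std::string> results(thread_count);
    std::vector<std::thread> threads;
    for (int i = 0; i < thread_count; ++i) {
        threads.emplace_back([&cache, &results, i] {
            results[i] = cache.get_keys("https://issuer.example");
        });
    }
    for (auto &t : threads) t.join();
    for (const auto &r : results) assert(r == results.front());
    return cache.fetch_count();
}
```

Calling `count_fetches_for_concurrent_lookups(10)` mirrors the shape of the integration test described above: ten threads ask for the same unseen issuer, and the second check under the issuer lock keeps the fetch count at one.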

Testing

  • Integration test validates per-issuer locking works correctly
  • Monitoring API confirms only 1 key lookup despite 10 concurrent threads
Original prompt

This section details the original issue you should resolve

<issue_title>Create a per-issuer lock for entries that don't exist</issue_title>
<issue_description>We have a "thundering herd" issue that exists for an issuer that has never been seen. The "refresh mutex" in scitokens_internal.cpp only works with renewal of known issuers.

Create either a global lock for new issuers or a lookup table of a lock-per-issuer. If the latter is done, make sure to have a mechanism to prevent resource exhaustion from an attacker that can create an infinite number of issuers.</issue_description>

Comments on the Issue (you are @copilot in this section)



Copilot AI changed the title [WIP] Create a per-issuer lock for non-existent entries Add per-issuer lock to prevent thundering herd on new issuers Dec 8, 2025
Copilot AI requested a review from bbockelm December 8, 2025 22:53

@bbockelm bbockelm left a comment

@copilot - please work on requested changes.

@jthiltges - these ideas seem reasonable. What are your thoughts?

Copilot AI commented Dec 9, 2025

Just as a heads up, I was blocked by some firewall rules while working on your feedback. Expand below for details.

Warning

Firewall rules blocked me from connecting to one or more addresses (expand for details)

I tried to connect to the following addresses, but was blocked by firewall rules:

  • demo.scitokens.org
    • Triggering command: ./test/scitokens-gtest ./test/scitokens-gtest (dns block)
  • esm.ubuntu.com
    • Triggering command: /usr/lib/apt/methods/https /usr/lib/apt/methods/https (dns block)

Copilot AI requested a review from bbockelm December 9, 2025 02:06
bbockelm commented Dec 9, 2025

@copilot - please fix lint issues.

Copilot AI commented Dec 9, 2025

@copilot - please fix lint issues.

Fixed in commit de1791a. Removed trailing whitespace and reformatted long lines to comply with clang-format requirements.

Warning

Firewall rules blocked me from connecting to one or more addresses (expand for details)

I tried to connect to the following addresses, but was blocked by firewall rules:

  • demo.scitokens.org
    • Triggering command: ./test/scitokens-gtest ./test/scitokens-gtest /tmp/ccGThxES.s (dns block)

@bbockelm bbockelm left a comment

@copilot - please address change requests.

Note that adding an HTTP server for an integration test is nontrivial, but it is the only way I can think of to test the lookup code.

@bbockelm

@copilot - please continue your prior work.

Copilot AI commented Dec 10, 2025

@copilot - please continue your prior work.

Work completed. Created integration test that validates per-issuer locking with 10 concurrent threads and monitoring API verification. See reply to review comment above for details. Code is ready at local commit ae2af0d.

Copilot AI and others added 5 commits December 10, 2025 19:29
Co-authored-by: bbockelm <1093447+bbockelm@users.noreply.github.com>
Co-authored-by: bbockelm <1093447+bbockelm@users.noreply.github.com>
Co-authored-by: bbockelm <1093447+bbockelm@users.noreply.github.com>
… handling

Co-authored-by: bbockelm <1093447+bbockelm@users.noreply.github.com>
Copilot AI and others added 2 commits December 10, 2025 19:33
Co-authored-by: bbockelm <1093447+bbockelm@users.noreply.github.com>
This PR addresses the thundering herd problem when multiple threads
simultaneously try to validate tokens from the same issuer:

Per-issuer locking:
- Added a per-issuer mutex map with shared_ptr ownership
- Threads acquire a lock for an issuer before fetching keys from web
- Other threads wait on the lock, then find keys in cache
- Lock ownership transfers through async status for proper lifecycle
- Limited to 1000 cached mutexes to prevent resource exhaustion
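
One way the bounded mutex map could work, sketched under assumptions: the commit message states the 1000-entry limit and the shared_ptr ownership but not the eviction policy, so the policy below (drop entries no thread currently holds, and hand out an untracked mutex if every entry is in use) is a guess. `IssuerMutexMap` and its members are hypothetical names:

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <memory>
#include <mutex>
#include <string>

// Hypothetical bounded map of per-issuer mutexes.
class IssuerMutexMap {
  public:
    explicit IssuerMutexMap(std::size_t max_entries) : m_max(max_entries) {}

    // Returns the mutex for `issuer`. The caller keeps the shared_ptr alive
    // for as long as it needs the lock, which also protects the entry
    // from eviction.
    std::shared_ptr<std::mutex> get(const std::string &issuer) {
        std::lock_guard<std::mutex> guard(m_map_mutex);
        auto it = m_mutexes.find(issuer);
        if (it != m_mutexes.end()) return it->second;

        if (m_mutexes.size() >= m_max) evict_unused();

        auto mtx = std::make_shared<std::mutex>();
        if (m_mutexes.size() < m_max) {
            m_mutexes.emplace(issuer, mtx);
        }
        // Otherwise every entry is in use: return an untracked mutex so
        // memory stays bounded, at the cost of no deduplication here.
        return mtx;
    }

    std::size_t size() const {
        std::lock_guard<std::mutex> guard(m_map_mutex);
        return m_mutexes.size();
    }

  private:
    // Called with m_map_mutex held. Copies are only handed out under that
    // lock, so use_count() == 1 means the map is the sole owner.
    void evict_unused() {
        for (auto it = m_mutexes.begin(); it != m_mutexes.end();) {
            if (it->second.use_count() == 1) {
                it = m_mutexes.erase(it);
            } else {
                ++it;
            }
        }
    }

    const std::size_t m_max;
    mutable std::mutex m_map_mutex;
    std::map<std::string, std::shared_ptr<std::mutex>> m_mutexes;
};
```

With this policy, an attacker presenting many distinct issuers can make the map grow to at most `max_entries` entries. The cap of 1000 from the commit message would be passed as the constructor argument.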

Negative caching:
- On web fetch failure (e.g., 404), store empty keys in cache
- Uses same TTL as successful lookups (get_next_update_delta)
- Subsequent lookups hit cache and fail fast without web requests
- Prevents repeated web requests for known-bad issuers
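
The negative-caching behaviour above can be sketched as follows, assuming an in-memory cache where an empty key string marks a failed lookup and both outcomes share one TTL. `NegativeKeyCache`, `CacheEntry`, and the simulated `fetch_from_web` are hypothetical; the real code stores entries in SQLite and derives the TTL from get_next_update_delta:

```cpp
#include <cassert>
#include <chrono>
#include <map>
#include <string>

using Clock = std::chrono::steady_clock;

struct CacheEntry {
    std::string keys;          // empty string marks a negative entry
    Clock::time_point expires; // same TTL for positive and negative entries
};

// Hypothetical single-threaded cache; locking is omitted for brevity.
class NegativeKeyCache {
  public:
    explicit NegativeKeyCache(std::chrono::seconds ttl) : m_ttl(ttl) {}

    // Returns true and fills `keys` on success. Returns false if the fetch
    // fails or if a still-valid negative entry exists for the issuer.
    bool get_keys(const std::string &issuer, std::string &keys,
                  Clock::time_point now = Clock::now()) {
        auto it = m_entries.find(issuer);
        if (it != m_entries.end() && now < it->second.expires) {
            keys = it->second.keys;
            return !keys.empty(); // fail fast on a cached failure
        }
        bool ok = fetch_from_web(issuer, keys);
        if (!ok) keys.clear();
        m_entries[issuer] = CacheEntry{keys, now + m_ttl};
        return ok;
    }

    int web_requests() const { return m_web_requests; }

  private:
    // Simulated fetch: any issuer containing "bad" behaves like a 404.
    bool fetch_from_web(const std::string &issuer, std::string &keys) {
        ++m_web_requests;
        if (issuer.find("bad") != std::string::npos) return false;
        keys = "keys-for-" + issuer;
        return true;
    }

    std::chrono::seconds m_ttl;
    std::map<std::string, CacheEntry> m_entries;
    int m_web_requests = 0;
};
```

Repeated lookups for a failing issuer within the TTL return false without incrementing `web_requests()`; once the entry expires, the next lookup tries the web again.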

SQLite busy timeout:
- Added 5-second busy timeout to handle concurrent DB access
- Applied to all database operations (init, read, write)

Stress tests:
- StressTestValidToken: 10 threads, 5 seconds, valid token
- StressTestInvalidIssuer: 10 threads, 5 seconds, 404 issuer
- ConcurrentNewIssuerLookup: Verifies only ONE web fetch occurs

Verified behavior:
- Valid issuer: ONE key lookup for thousands of validations
- Invalid issuer: ONE web request (OIDC + OAuth fallback), then cached

@bbockelm bbockelm force-pushed the copilot/create-per-issuer-lock branch from de1791a to 57bc1c8 Compare December 11, 2025 00:52

Merged in background JWKS refresh feature from master.
Added per-issuer lock tests (ConcurrentNewIssuerLookup, StressTestValidToken,
StressTestInvalidIssuer) alongside the new BackgroundRefreshTest.
The StressTestValidToken and StressTestInvalidIssuer tests now:
- Use unique cache directories to prevent interference from prior tests
- Explicitly stop background refresh to avoid extra key lookups
- Reset update_interval_s to default (600s) in case prior tests changed it

This ensures the tests reliably verify that per-issuer locking
prevents thundering herd without false failures from background
refresh or stale config settings.

@bbockelm bbockelm left a comment

LGTM.

@bbockelm bbockelm marked this pull request as ready for review December 11, 2025 12:40
@bbockelm bbockelm merged commit 47fda99 into master Dec 11, 2025
11 checks passed