Fix bug with load_mls_group_with_lock #2814

Closed
neekolas wants to merge 1 commit into main from 11-19-fix_bug_with_load_mls_group_with_lock

Conversation

neekolas commented Nov 20, 2025

tl;dr

  • Fixes a bug where load_mls_group_with_lock neither returns an error nor waits when the group is already locked (synchronously or asynchronously)
  • Replaces many usages of load_mls_group_with_lock with load_mls_group, which does basically the same thing as the old version, but more honestly.

The issue

Previously, load_mls_group_with_lock would call get_lock_sync, which returns an error if the lock can't be acquired. But it ignored the result and wouldn't return early if the call failed. That means that if anyone had already acquired the lock, the lock check did nothing.
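
To illustrate the failure mode, here is a minimal sketch that uses a plain std::sync::Mutex as a stand-in for the group lock. The LockHeld error, the function signatures, and the use of try_lock are all assumptions made for illustration; this is not the code from this PR.

```rust
use std::sync::{Mutex, MutexGuard};

/// Hypothetical error returned when the lock is already taken.
#[derive(Debug, PartialEq)]
pub struct LockHeld;

/// Stand-in for get_lock_sync: never waits, errors if the lock is unavailable.
/// (A poisoned mutex is also reported as LockHeld to keep the sketch short.)
pub fn get_lock_sync(lock: &Mutex<()>) -> Result<MutexGuard<'_, ()>, LockHeld> {
    lock.try_lock().map_err(|_| LockHeld)
}

/// Buggy shape: the Result is bound but never checked, so the
/// operation runs whether or not the lock was acquired.
pub fn load_ignoring_lock<R>(lock: &Mutex<()>, op: impl FnOnce() -> R) -> Result<R, LockHeld> {
    let _guard = get_lock_sync(lock); // error silently discarded
    Ok(op())
}

/// Fixed shape: `?` returns early when the lock is already held.
pub fn load_with_lock<R>(lock: &Mutex<()>, op: impl FnOnce() -> R) -> Result<R, LockHeld> {
    let _guard = get_lock_sync(lock)?;
    Ok(op())
}
```

With another holder already owning the mutex, load_ignoring_lock still runs the closure, while load_with_lock returns Err(LockHeld).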

If you were the only lock holder, you could block someone from calling load_mls_group_with_lock_async, which does successfully wait until the lock is released. But even that would only respect the first caller of load_mls_group_with_lock, so it isn't a functional semaphore or RwLock either.
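
For contrast, "waiting until the lock is released" looks roughly like the sketch below. It is a thread-based stand-in built on the blocking std::sync::Mutex::lock, not an async implementation, and wait_then_run and demo_order are hypothetical names.

```rust
use std::sync::{Arc, Mutex};
use std::thread;
use std::time::Duration;

/// Hypothetical waiting variant: blocks until the lock is free, then runs `op`.
pub fn wait_then_run<R>(lock: &Mutex<()>, op: impl FnOnce() -> R) -> R {
    // Recover the guard even if a previous holder panicked.
    let _guard = lock.lock().unwrap_or_else(|poisoned| poisoned.into_inner());
    op()
}

/// Records the order in which the holder and the waiter do their work.
pub fn demo_order() -> Vec<&'static str> {
    let lock = Arc::new(Mutex::new(()));
    let log = Arc::new(Mutex::new(Vec::new()));

    // The current thread is the first lock holder.
    let holder_guard = lock.lock().unwrap();

    let waiter = {
        let (lock, log) = (Arc::clone(&lock), Arc::clone(&log));
        thread::spawn(move || {
            // Cannot proceed until the holder drops its guard.
            wait_then_run(&lock, || log.lock().unwrap().push("waiter ran"));
        })
    };

    // Give the waiter time to reach the lock and block on it.
    thread::sleep(Duration::from_millis(50));
    log.lock().unwrap().push("holder finished");
    drop(holder_guard);

    waiter.join().unwrap();
    let order = log.lock().unwrap().clone();
    order
}
```

The waiter's closure can only run after the holder's guard is dropped, so the holder's entry always lands in the log first.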

Why not just fix the bug?

I decided to change to the new load_mls_group in most cases for a few reasons.

  1. The current implementation would have produced a ton of lock errors if it actually worked. We call methods that call load_mls_group_with_lock from inside the callbacks passed to load_mls_group_with_lock_async.
  2. Most of the calls to load_mls_group_with_lock are for read-only data. If an app wants to call group.members(), it doesn't want to (and can't) coordinate its calls around whether or not we are syncing or receiving messages from a stream. If this lock actually worked, it would make our SDKs extremely flaky, since we would be returning lock errors everywhere.
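
The nested-call problem from the first point can be sketched as follows. The Group type, its fields, and the method names are hypothetical, and a std::sync::Mutex with try_lock again stands in for the real group lock.

```rust
use std::sync::Mutex;

/// Hypothetical error returned when the lock is already taken.
#[derive(Debug, PartialEq)]
pub struct LockHeld;

/// Hypothetical group with a non-reentrant lock and some read-only data.
pub struct Group {
    lock: Mutex<()>,
    members: Vec<String>,
}

impl Group {
    pub fn new(members: Vec<String>) -> Self {
        Group { lock: Mutex::new(()), members }
    }

    /// Runs `op` while holding the lock; errors if the lock is already held.
    pub fn with_lock<R>(&self, op: impl FnOnce(&Group) -> R) -> Result<R, LockHeld> {
        let _guard = self.lock.try_lock().map_err(|_| LockHeld)?;
        Ok(op(self))
    }

    /// Read-only accessor that insists on taking the lock.
    pub fn members_locked(&self) -> Result<Vec<String>, LockHeld> {
        self.with_lock(|group| group.members.clone())
    }

    /// Read-only accessor that takes no lock at all.
    pub fn members(&self) -> Vec<String> {
        self.members.clone()
    }
}
```

Calling members_locked from inside a with_lock callback fails with LockHeld because the outer call still owns the lock, while the lock-free members succeeds in the same position.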

So, this is safe?

I'm not so sure. We have to thoroughly audit every caller of any method that previously relied on load_mls_group_with_lock, and make sure it shouldn't actually be taking out a lock (and erroring if one isn't available) or using the async method and waiting until the lock becomes available. We generally use load_mls_group_with_lock_async for heavy lifting...but maybe not exclusively. So we shouldn't merge this without some more investigation tomorrow.

Other notes

  • I kept the callback format for load_mls_group...but I probably should just make it a regular function.

claude Bot commented Nov 20, 2025

Claude Code is working…

I'll analyze this and get back to you.



codecov Bot commented Nov 20, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 74.69%. Comparing base (6c66152) to head (802233d).

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2814      +/-   ##
==========================================
+ Coverage   74.67%   74.69%   +0.02%     
==========================================
  Files         385      385              
  Lines       49762    49774      +12     
==========================================
+ Hits        37158    37181      +23     
+ Misses      12604    12593      -11     


neekolas force-pushed the 11-19-fix_bug_with_load_mls_group_with_lock branch 2 times, most recently from 84c325b to 802233d, on November 20, 2025 05:00
neekolas force-pushed the 11-19-fix_bug_with_load_mls_group_with_lock branch from 802233d to 3bda50f on November 20, 2025 21:09
neekolas closed this on Feb 5, 2026
insipx deleted the 11-19-fix_bug_with_load_mls_group_with_lock branch on March 25, 2026 15:09