[Opt] cache thread-local CUDA stream for lock acquisition kernels #248

Open
rhdong wants to merge 1 commit into NVIDIA-Merlin:master from rhdong:hrong/cache-stream-triple-lock

@rhdong rhdong commented Feb 24, 2026

  • Replace the per-call cudaStreamCreate/cudaStreamDestroy pair in lock_read(), lock_update(), and lock_update_read() with a thread_local cached stream. This eliminates the CUDA driver contention that occurred when multiple threads acquired locks concurrently. That contention made triple-group mode slower than the R/W lock in the update-heavy mixed workload (0.93×), even though triple-group mode allows concurrent updaters.

  • CI tests passed on the local machine.

  • Triple-Group vs R/W Lock Concurrency Benchmark

    Config: dim=16, capacity=128M, HBM=16GB, λ=0.75, batch=64K, 200 batches/thread, H100 NVL

    With Stream Cache Optimization

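    | Workload     | Threads  | TG (B-KV/s) | RW (B-KV/s) | Speedup |
    |--------------|----------|-------------|-------------|---------|
    | Read-heavy   | 8F/1U/1I | 1.451       | 1.408       | 1.03×   |
    | Update-heavy | 4F/5U/1I | 1.675       | 0.523       | 3.21×   |
    | Insert-heavy | 4F/2U/4I | 0.551       | 0.459       | 1.20×   |
    | Update-only  | 1U       | 1.054       | 1.046       | 1.01×   |
    | Update-only  | 2U       | 1.613       | 1.125       | 1.43×   |
    | Update-only  | 5U       | 2.279       | 0.591       | 3.86×   |
    | Update-only  | 10U      | 2.569       | 0.535       | 4.80×   |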

    Before vs After Stream Cache

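    | Workload        | Before | After | Change |
    |-----------------|--------|-------|--------|
    | Read-heavy      | 1.02×  | 1.03× |        |
    | Update-heavy    | 0.93×  | 3.21× | fixed  |
    | Insert-heavy    | 0.98×  | 1.20× | +22%   |
    | Update-only 1U  | 1.01×  | 1.01× |        |
    | Update-only 2U  | 1.33×  | 1.43× | +8%    |
    | Update-only 5U  | 2.36×  | 3.86× | +64%   |
    | Update-only 10U | 2.14×  | 4.80× | +124%  |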
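
The stream caching pattern described in the first bullet can be sketched roughly as follows. This is an illustration only, not the code in this PR: `FakeStream`, `create_stream`, `destroy_stream`, `StreamCache`, `cached_lock_stream`, `lock_update_sketch`, and `run_demo` are all hypothetical names, and a counter-backed stand-in replaces `cudaStream_t` so the sketch builds without the CUDA toolkit. In real code, the create and destroy helpers would presumably wrap `cudaStreamCreate` and `cudaStreamDestroy`.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Hypothetical stand-in for cudaStream_t (assumption: not the PR's type).
struct FakeStream {
  int id;
};

// Counters that make the number of create/destroy calls observable.
std::atomic<int> g_streams_created{0};
std::atomic<int> g_streams_destroyed{0};

// Stand-in for cudaStreamCreate.
FakeStream* create_stream() {
  return new FakeStream{g_streams_created.fetch_add(1)};
}

// Stand-in for cudaStreamDestroy.
void destroy_stream(FakeStream* s) {
  delete s;
  g_streams_destroyed.fetch_add(1);
}

// RAII holder: creates one stream on construction, releases it on destruction.
struct StreamCache {
  FakeStream* stream;
  StreamCache() : stream(create_stream()) {}
  ~StreamCache() { destroy_stream(stream); }
  StreamCache(const StreamCache&) = delete;
  StreamCache& operator=(const StreamCache&) = delete;
};

// Returns the calling thread's cached stream. The thread_local object is
// constructed on the first call from each thread and destroyed at thread exit,
// so later calls on the same thread do no create/destroy work.
FakeStream* cached_lock_stream() {
  thread_local StreamCache cache;
  return cache.stream;
}

// Hypothetical lock entry point. Real code would launch the lock acquisition
// kernel on the cached stream and synchronize on it.
void lock_update_sketch() {
  FakeStream* stream = cached_lock_stream();
  (void)stream;
}

// Runs num_threads workers, each calling the lock entry point
// calls_per_thread times, and returns how many streams were created.
int run_demo(int num_threads, int calls_per_thread) {
  const int before = g_streams_created.load();
  std::vector<std::thread> workers;
  for (int t = 0; t < num_threads; ++t) {
    workers.emplace_back([calls_per_thread] {
      for (int i = 0; i < calls_per_thread; ++i) lock_update_sketch();
    });
  }
  for (auto& w : workers) w.join();
  return g_streams_created.load() - before;
}
```

With 4 threads making 100 calls each, `run_demo(4, 100)` returns 4: one stream creation per thread rather than one per call. Whether the actual change destroys the cached stream at thread exit is not stated in this PR; with the real CUDA runtime, teardown order at process exit would need care.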

@rhdong rhdong requested a review from jiashuy February 24, 2026 21:45
@rhdong rhdong self-assigned this Feb 24, 2026