Skip to content

bthread: implement lock-free AtomicInteger128 for RISC-V#3355

Open
Felix-Gong wants to merge 2 commits into
apache:masterfrom
Felix-Gong:riscv-atomic128-opt
Open

bthread: implement lock-free AtomicInteger128 for RISC-V#3355
Felix-Gong wants to merge 2 commits into
apache:masterfrom
Felix-Gong:riscv-atomic128-opt

Conversation

@Felix-Gong

Copy link
Copy Markdown
Contributor

Summary

Replace mutex-based fallback with seqlock implementation for RISC-V platform, bringing it in line with x86 (SSE) and ARM (NEON) lock-free implementations.

Changes

  • Implement seqlock-based atomic 128-bit load/store for RISC-V
  • Use fence instructions for proper memory ordering
  • Maintain backward compatibility with x86/ARM (no changes to existing code paths)

Seqlock Algorithm

  • Reader: read sequence → read data → verify sequence, retry if stale
  • Writer: increment sequence → write data → increment sequence
  • All memory accesses use fence r, rw (acquire) and fence w, w (release) for ordering

Performance Improvement

Tested on SOPHGO SG2044 (RISC-V rv64gcv):

  • bthread_start_urgent latency: ~1.5-2% reduction
  • adding_func throughput: ~25% improvement (541ns vs 722ns)

Test Plan

  • RISC-V cross-compilation verification
  • Native compilation on RISC-V server
  • Official brpc unit tests (bthread_unittest, test_butil, brpc_server_unittest, etc.)
  • Performance benchmark on RISC-V hardware

Notes

  • All changes are wrapped in #elif defined(__riscv) preprocessor guards
  • x86/ARM behavior is completely unchanged
  • The _seq member is added to AtomicInteger128 class but only used on RISC-V (other platforms have (void)_seq; to suppress warnings)

@Felix-Gong Felix-Gong force-pushed the riscv-atomic128-opt branch 2 times, most recently from 2a4b97f to 5cd6d6a Compare June 24, 2026 09:50
@chenBright chenBright requested a review from Copilot June 24, 2026 11:17

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Implements a lock-free (seqlock-based) 128-bit load/store path for bthread::AtomicInteger128 on RISC-V, replacing the prior mutex fallback to improve performance when reading/writing _cpu_time_stat in TaskGroup.

Changes:

  • Adds a sequence counter (_seq) to AtomicInteger128 to support seqlock on RISC-V.
  • Updates AtomicInteger128::load() / store() with RISC-V inline-asm implementations and fences.
  • Keeps existing x86_64 (SSE) and ARM NEON code paths unchanged.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.

File Description
src/bthread/task_group.h Adds _seq member and updates the mutex comment for non-lock-free architectures.
src/bthread/task_group.cpp Implements RISC-V seqlock-based 128-bit load/store paths alongside existing x86/ARM implementations.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +154 to +177
__asm__ volatile(
"fence w, w\n\t"
"sd %0, %1\n\t"
:
: "r"(new_seq), "m"(_seq)
: "memory"
);
__asm__ volatile("fence w, w\n\t" ::: "memory");
__asm__ volatile(
"sd %0, %2\n\t"
"sd %1, %3\n\t"
:
: "r"(value.v1), "r"(value.v2),
"m"(_value.v1), "m"(_value.v2)
: "memory"
);
__asm__ volatile("fence w, w\n\t" ::: "memory");
new_seq++;
__asm__ volatile(
"sd %0, %1\n\t"
:
: "r"(new_seq), "m"(_seq)
: "memory"
);

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed. For sd (store) instructions, memory operands are now correctly declared as outputs using "=m" constraints. For ld (load) instructions, memory operands remain as inputs ("m") since ld only reads memory.

  • sd %1, %0 with : "=m"(_seq) : "r"(new_seq) - memory is output
  • ld %0, %1 with : "=r"(seq0) : "m"(_seq) - memory is input

This correctly informs the compiler about which memory locations are modified.

Comment thread src/bthread/task_group.cpp Outdated
return {value[0], value[1]};
#else // __x86_64__ || __ARM_NEON
// RISC-V and other architectures use mutex fallback
#elif defined(__riscv)

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed. Added && __riscv_xlen == 64 guard to restrict this path to RV64 only. RV32 will fall back to the mutex implementation.

Comment thread src/bthread/task_group.cpp Outdated
(void)_seq;
int64x2_t v = vld1q_s64(reinterpret_cast<int64_t*>(&value));
vst1q_s64(reinterpret_cast<int64_t*>(&_value), v);
#elif defined(__riscv)

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed. Added && __riscv_xlen == 64 guard for store() as well. RV32 builds will use the mutex fallback.

Replace mutex-based fallback with seqlock implementation for RISC-V
platform. This brings RISC-V in line with x86 (SSE) and ARM (NEON)
lock-free implementations.

Seqlock algorithm:
- Reader: read sequence, read data, verify sequence, retry if stale
- Writer: increment sequence, write data, increment sequence
- All memory accesses use fence instructions for ordering

Performance improvement:
- bthread_start_urgent latency: 1.5-2% reduction
- adding_func throughput: 25% improvement (541ns vs 722ns)
@Felix-Gong Felix-Gong force-pushed the riscv-atomic128-opt branch from 5cd6d6a to aee927b Compare June 25, 2026 00:22
- Use "m" (input) for ld instructions reading memory
- Use "=m" (output) for sd instructions writing memory
- Swap operand order in sd templates to match RISC-V syntax (register, memory)

The previous "+m" constraints in input operand positions caused compilation
errors on RISC-V. This fix ensures proper constraint usage:
- ld: memory is input "m", register is output "=r"
- sd: register is input "r", memory is output "=m"
@Felix-Gong Felix-Gong force-pushed the riscv-atomic128-opt branch from 4fa5409 to d7841bf Compare June 25, 2026 05:11
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants