[Enhance]: Extend RVV acceleration coverage in ailego compute kernels #357

@ihb2032

Affected Component

ailego compute kernels, especially the single-compute and batch-compute operator paths under the distance computation stack.

Current Behavior

On zvec v0.2.1, our RVV optimization only covers part of the compute layer in ailego, specifically:

  • batch compute operators
  • single compute operators

The current RVV work is therefore effective but partial: it accelerates important hot paths, but it is not yet a systematic RVV backend for the distance-computation stack used by FLAT and HNSW search flows.
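For readers less familiar with what an RVV-accelerated operator with a scalar fallback looks like, here is a minimal sketch of an FP32 inner product. It is not taken from ailego: the function name `inner_product_f32` is hypothetical, and the layout (one function, compile-time switch on `__riscv_vector`) is an assumption made for illustration. The RVV branch uses the v1.0 C intrinsics from `<riscv_vector.h>` and reduces each strip into a running sum, which is simple rather than fast.

```cpp
#include <cstddef>

#if defined(__riscv_vector)
#include <riscv_vector.h>
#endif

// Hypothetical helper, not an ailego symbol: FP32 inner product of two
// arrays of length n, with an RVV path and a scalar fallback.
float inner_product_f32(const float* a, const float* b, std::size_t n) {
#if defined(__riscv_vector)
  // Running sum kept in element 0 of an LMUL=1 register.
  vfloat32m1_t sum = __riscv_vfmv_v_f_f32m1(0.0f, 1);
  while (n > 0) {
    // Ask the hardware how many elements this strip may process.
    std::size_t vl = __riscv_vsetvl_e32m8(n);
    vfloat32m8_t va = __riscv_vle32_v_f32m8(a, vl);
    vfloat32m8_t vb = __riscv_vle32_v_f32m8(b, vl);
    vfloat32m8_t prod = __riscv_vfmul_vv_f32m8(va, vb, vl);
    // Unordered sum of the strip, added to the running sum.
    sum = __riscv_vfredusum_vs_f32m8_f32m1(prod, sum, vl);
    a += vl;
    b += vl;
    n -= vl;
  }
  return __riscv_vfmv_f_s_f32m1_f32(sum);
#else
  // Scalar fallback used when the V extension is not enabled at build time.
  float sum = 0.0f;
  for (std::size_t i = 0; i < n; ++i) sum += a[i] * b[i];
  return sum;
#endif
}
```

A call such as `inner_product_f32(query, vec, dim)` would return the same value on both paths up to floating-point rounding. The vector reduction sums in a different order than the scalar loop, so the last bits of the result can differ, which is one reason to check recall alongside speed.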

Test environment:

  • Hardware: SpacemiT K1 Muse Pi Pro
  • Memory/Storage: 16 GB RAM + 128 GB storage
  • Dataset: cohere1m
  • zvec version: 0.2.1

Based on our benchmark results, the current partial RVV optimization already brings substantial single-thread improvements on cohere1m. Each entry below reads baseline -> RVV-optimized:

FLAT, TopK=100

  • FP16:
    • Recall: 0.99760 -> 0.99324
    • QPS: 0.05 -> 1.66
    • P99 latency: 19246.82 ms -> 615.80 ms
  • FP32:
    • Recall: 0.99998 -> 0.99999
    • QPS: 0.35 -> 1.02
    • P99 latency: 2860.69 ms -> 988.96 ms
  • INT8:
    • Recall: 0.95123 -> 0.95123
    • QPS: 0.21 -> 1.98
    • P99 latency: 4764.74 ms -> 510.42 ms
  • Refiner:
    • Recall: 0.95123 -> 0.95123
    • QPS: 0.22 -> 1.94
    • P99 latency: 4652.24 ms -> 520.07 ms

HNSW, TopK=100

  • FP16:
    • Recall: 0.93484 -> 0.93478
    • QPS: 15.22 -> 58.52
    • P99 latency: 88.31 ms -> 21.37 ms
  • FP32:
    • Recall: 0.93505 -> 0.93493
    • QPS: 45.09 -> 53.79
    • P99 latency: 27.76 ms -> 23.33 ms
  • INT8:
    • Recall: 0.90944 -> 0.90961
    • QPS: 38.00 -> 65.56
    • P99 latency: 33.30 ms -> 18.94 ms
  • Refiner:
    • Recall: 0.93385 -> 0.93417
    • QPS: 36.04 -> 61.54
    • P99 latency: 34.69 ms -> 19.57 ms

Some representative observations from the current data:

  • FLAT FP16 gets the largest speedup (~31.4x QPS), but recall drops slightly at TopK=100 (0.99760 -> 0.99324).
  • FLAT FP32, INT8, and Refiner show large speedups (roughly 3x to 9x QPS) while recall stays effectively unchanged in our current tests.
  • HNSW FP16 and FP32 improve both latency and throughput (QPS about 3.8x and 1.2x, respectively) with recall differences of at most 0.00012.
  • HNSW INT8 and Refiner improve both throughput (about 1.7x QPS) and tail latency, and recall rises slightly in the current benchmark.

These results show that RVV optimization in ailego is already valuable, but the current coverage is still limited to only part of the operator set.

Desired Improvement

Extend RVV support in ailego from the currently optimized single-compute and batch-compute operators to a broader, more systematic implementation.

Suggested improvements:

  1. Expand RVV coverage to more hot distance-computation paths beyond the currently optimized operators.
  2. Document which ailego operators are RVV-accelerated and which still fall back to scalar implementations.
  3. Add validation for both performance and recall so RVV improvements can be evaluated not only by QPS/latency, but also by result quality.
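As a rough illustration of item 3, a combined recall/performance check could look like the sketch below. Everything in it is hypothetical: `recall_at_k`, `RunMetrics`, `GateLimits`, and `passes_gate` are names invented for this example, and the thresholds are placeholders that maintainers would need to choose. It assumes both runs return document ids for the same queries and that ground-truth ids are available.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <unordered_set>
#include <vector>

// Fraction of the ground-truth top-k ids that appear in the returned top-k.
double recall_at_k(const std::vector<std::uint64_t>& truth,
                   const std::vector<std::uint64_t>& result, std::size_t k) {
  const std::size_t kt = std::min(k, truth.size());
  if (kt == 0) return 0.0;
  std::unordered_set<std::uint64_t> wanted(truth.begin(), truth.begin() + kt);
  const std::size_t kr = std::min(k, result.size());
  std::size_t hits = 0;
  for (std::size_t i = 0; i < kr; ++i) {
    // erase() returns 1 on the first match, so duplicate ids count once.
    hits += wanted.erase(result[i]);
  }
  return static_cast<double>(hits) / static_cast<double>(kt);
}

// Aggregate metrics of one benchmark run (scalar baseline or RVV build).
struct RunMetrics {
  double recall;
  double qps;
  double p99_ms;
};

// Placeholder thresholds; the actual values are a project decision.
struct GateLimits {
  double max_recall_drop;  // largest tolerated recall loss vs. baseline
  double min_qps_ratio;    // RVV QPS must reach baseline QPS times this
};

// True when the RVV run is not worse than allowed on recall, QPS, and P99.
bool passes_gate(const RunMetrics& scalar, const RunMetrics& rvv,
                 const GateLimits& limits) {
  const bool recall_ok = (scalar.recall - rvv.recall) <= limits.max_recall_drop;
  const bool qps_ok = rvv.qps >= scalar.qps * limits.min_qps_ratio;
  const bool p99_ok = rvv.p99_ms <= scalar.p99_ms;
  return recall_ok && qps_ok && p99_ok;
}
```

With the FLAT FP16 numbers from this issue, a gate that tolerates a recall drop of 0.005 would pass, while one that tolerates only 0.001 would flag the 0.00436 drop for review even though QPS and P99 improve.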

From our current results, there is already a strong case for extending RVV coverage: even partial optimization produces major gains in several workloads.

Impact

This enhancement would improve zvec’s usability and performance on RISC-V platforms with RVV support.

Expected benefits include:

  • much better throughput on RVV-capable devices
  • significantly lower tail latency for FLAT and HNSW workloads
  • better out-of-the-box performance on RISC-V environments
  • easier extension of RVV support to additional kernels over time

Our current cohere1m benchmark on SpacemiT K1 Muse Pi Pro already shows that even partial RVV optimization can deliver substantial gains.

At the same time, the current data also shows that recall behavior should remain part of the evaluation criteria, since different data formats and index paths may respond differently to RVV optimization. Broadening coverage together with recall/performance regression checks would make the improvement more complete and safer for upstream users.

Metadata

  • Labels: enhancement
  • Status: Backlog