【WIP】feat: add onerec model implement[3/N] #998

Closed
DragonFive wants to merge 4 commits into jd-opensource:main from DragonFive:onerec/pr5-v1126-capability-parity

@DragonFive (Collaborator) commented Mar 4, 2026

This PR rebases OneRec support onto the current main branch and brings the
OneRec REC/NPU path to capability parity with the current codebase structure.

It is no longer just a registry skeleton PR. In addition to the foundation
contracts, this change wires OneRec through the REC runtime, connects the NPU
block-layer bridge, and aligns the integration with the current
xllm_atb_layers submodule-based dependency model.

What's included

  • OneRec model foundation and registration

    • Add OneRec REC model registration through REGISTER_REC_MODEL
    • Co-locate OneRec model args / tokenizer args registration with the model
      definition in xllm/models/rec/onerec.h
    • Keep OneRec-specific model contracts in the REC model path instead of
      routing through generic LLM registration only
  • REC runtime wiring for OneRec

    • Wire OneRec creation through RecWorkerImpl
    • Add OneRecWorkPipeline request preparation / execution path
    • Extend rec_engine.cpp for the OneRec engine pipeline
    • Extend rec_master.cpp to validate and parse OneRec-specific
      input_tensors, including:
      • sparse_embedding
      • optional decoder_context_embedding
  • OneRec input/model contract extensions

    • Extend ModelInputParams / OneRec model input params for encoder/decoder
      hybrid inputs
    • Add OneRec-specific batch input building in
      onerec_batch_input_builder.cpp
    • Preserve the OneRec request semantics where the first round is a single
      request and later rounds may expand to beam_width branches
  • NPU block-layer integration

    • Add the NpuOneRecBlockLayerImpl bridge and wire OneRec stacks to NPU ATB
      graph construction
    • Add weight loading / merge / init-layer logic for OneRec encoder/decoder
      layers
    • Cover the decoder MoE path required by the current OneRec model layout
    • Add fail-fast guarding so block-layer graph init failures stop model load
      instead of surfacing later as request-time crashes
  • Single-device runtime compatibility fixes

    • Keep valid TP metadata for OneRec even in world_size=1 local mode
    • Add safe metadata fallback for placeholder ProcessGroup rank/world-size
      access used during model construction
  • Submodule-based xllm_atb_layers alignment

    • Align OneRec NPU includes with the submodule-based
      xllm_atb_layers/... layout introduced on current main
    • Stop relying on the historical xllm_kernels/... include pattern
    • Update the third_party/xllm_atb_layers submodule to a OneRec-capable
      revision
  • Targeted bring-up diagnostics

    • Add focused logs around:
      • OneRec worker/model initialization
      • block-layer init / merge
      • request-stage boundaries in the OneRec pipeline
    • These logs are intended to improve bring-up/debug efficiency for the
      current rebaseline effort
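
The input_tensors validation described for rec_master.cpp can be sketched as follows. This is a minimal illustration, not the code in this PR: `TensorSpec`, `InputTensors`, `validate_onerec_inputs`, and the assumed `[num_tokens, hidden_size]` shape are hypothetical stand-ins. Only the tensor names `sparse_embedding` and `decoder_context_embedding`, and the fact that the second one is optional, come from the description above.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for a named request tensor; xLLM's real types differ.
struct TensorSpec {
  std::vector<std::int64_t> shape;  // assumed layout: {num_tokens, hidden_size}
};

using InputTensors = std::unordered_map<std::string, TensorSpec>;

// Returns std::nullopt when the inputs are acceptable, otherwise an error
// message. Rejecting bad shapes here keeps them away from the model code.
std::optional<std::string> validate_onerec_inputs(const InputTensors& inputs,
                                                  std::int64_t hidden_size) {
  assert(hidden_size > 0);
  auto is_valid_embedding = [hidden_size](const TensorSpec& t) {
    return t.shape.size() == 2 && t.shape[0] > 0 && t.shape[1] == hidden_size;
  };

  // sparse_embedding is required.
  auto sparse = inputs.find("sparse_embedding");
  if (sparse == inputs.end()) {
    return "missing required input tensor: sparse_embedding";
  }
  if (!is_valid_embedding(sparse->second)) {
    return "sparse_embedding must have shape [num_tokens, hidden_size]";
  }

  // decoder_context_embedding is optional, but must be well-formed if present.
  auto ctx = inputs.find("decoder_context_embedding");
  if (ctx != inputs.end() && !is_valid_embedding(ctx->second)) {
    return "decoder_context_embedding must have shape [num_tokens, hidden_size]";
  }
  return std::nullopt;
}
```

Returning an error message instead of aborting lets the caller reject a malformed request at parse time while the server keeps running.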

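The request semantics noted above (a single request in the first round, up to beam_width branches in later rounds) affect how many decoder rows a batch needs. The sketch below illustrates that bookkeeping. The function names are hypothetical, and it assumes every round after the first uses the full beam_width, which is stricter than the "may expand" wording in the description.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Number of decoder branches one request contributes in a given round
// (0-based). Assumption: every round after the first uses the full
// beam_width; the PR only says later rounds "may" expand.
std::int64_t branches_in_round(std::int64_t round, std::int64_t beam_width) {
  assert(round >= 0 && beam_width >= 1);
  return round == 0 ? 1 : beam_width;
}

// Total decoder rows for a batch, given each request's current round.
std::int64_t total_decoder_rows(const std::vector<std::int64_t>& rounds,
                                std::int64_t beam_width) {
  std::int64_t rows = 0;
  for (std::int64_t round : rounds) {
    rows += branches_in_round(round, beam_width);
  }
  return rows;
}
```

For example, with beam_width 4, a batch holding one request in its first round and two requests in later rounds would need 1 + 4 + 4 = 9 decoder rows under this assumption.
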
Scope / Restrictions

  • This PR focuses on bringing OneRec onto current main.
  • The scope is limited to the OneRec REC/NPU path and its required
    integration points.
  • This PR does not claim full production-ready end-to-end validation yet.
  • Existing non-OneRec backends are not intentionally refactored in this PR.

Backward Compatibility

  • Existing llm / vlm / embedding / dit model flows are kept intact.
  • Existing backend registration paths remain available.
  • Non-OneRec runtime paths are not intended to change behavior.
  • For OneRec constrained decoding, missing vocab_file remains non-blocking
    when constrained decoding is disabled.
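
The vocab_file rule in the last bullet can be read as a small precondition check. The sketch below is an illustration only: `TokenizerOptions` and `check_vocab_requirement` are hypothetical names, and treating a missing vocab_file as an error when constrained decoding is enabled is an assumption inferred from the bullet, not something the PR states.

```cpp
#include <optional>
#include <string>

// Hypothetical options struct; the real xLLM flags and arguments may differ.
struct TokenizerOptions {
  bool enable_constrained_decoding = false;
  std::optional<std::string> vocab_file;  // unset or empty means "missing"
};

// Returns an error only when constrained decoding is enabled and no vocab
// file is available. With constrained decoding disabled, a missing
// vocab_file is accepted.
std::optional<std::string> check_vocab_requirement(const TokenizerOptions& opts) {
  const bool has_vocab =
      opts.vocab_file.has_value() && !opts.vocab_file->empty();
  if (opts.enable_constrained_decoding && !has_vocab) {
    return "constrained decoding is enabled but vocab_file is not set";
  }
  return std::nullopt;
}
```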

Testing

  • Static registration / wiring checks for the OneRec REC path
  • Source-level review of OneRec REC runtime integration
  • Source-level review of NPU block-layer wiring and submodule include
    alignment
  • Failure-path hardening for OneRec block-layer init
  • Full clean build validation in the target NPU environment
  • End-to-end OneRec inference validation on the target machine

Roadmap / Follow-up

  • Rebaseline OneRec model foundation on top of current main
  • Wire OneRec through the REC runtime path
  • Add OneRec NPU block-layer bridge
  • Align OneRec with the xllm_atb_layers submodule model
  • Finish target-environment compile/runtime validation
  • Trim temporary bring-up diagnostics after validation is stable

@DragonFive changed the title from "feat: add onerec model implement[3/N]" to "[WIP] feat: add onerec model implement[3/N]" on Mar 4, 2026

@gemini-code-assist bot (Contributor) left a comment

Code Review

This pull request implements the onerec model, including the NPU-specific layer implementations and updates to the runtime and scheduler to support it. The changes are extensive, adding new model layers, updating data processing pipelines, and integrating the new model into the engine. However, two high-severity security vulnerabilities were identified: a buffer over-read in the LLMRec raw input processing due to missing validation of input embedding row lengths, and a potential cache key collision in the OneRec model's encoder output cache due to insecure string concatenation of user-supplied request IDs. Both issues could lead to information leakage or denial of service and should be addressed before merging. Additionally, a potential performance issue was found in the caching strategy for the encoder output, which could lead to cache thrashing.
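
The cache key collision concern can be illustrated generically. The actual key format in xllm/models/rec/onerec.h is not shown in this conversation, so both functions below are hypothetical: `naive_key` stands in for plain concatenation of a user-supplied request ID, and `length_prefixed_key` shows one common way to make the composition unambiguous.

```cpp
#include <string>

// Joins two fields with a delimiter. If a user-supplied request ID may itself
// contain the delimiter, two different (id, suffix) pairs can produce the
// same key.
std::string naive_key(const std::string& request_id, const std::string& suffix) {
  return request_id + "_" + suffix;
}

// Prefixes each field with its byte length, so the boundary between fields
// can always be recovered and distinct pairs map to distinct keys.
std::string length_prefixed_key(const std::string& request_id,
                                const std::string& suffix) {
  return std::to_string(request_id.size()) + ":" + request_id +
         std::to_string(suffix.size()) + ":" + suffix;
}
```

With these definitions, `naive_key("a_b", "c")` and `naive_key("a", "b_c")` both yield `a_b_c`, while the length-prefixed variants yield `3:a_b1:c` and `1:a3:b_c`.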

Comment thread xllm/models/rec/onerec.h Outdated
Comment thread xllm/models/rec/onerec.h Outdated
@DragonFive force-pushed the onerec/pr5-v1126-capability-parity branch 8 times, most recently from 205d7c4 to 67941b4 on March 11, 2026 07:02
@DragonFive force-pushed the onerec/pr5-v1126-capability-parity branch from 67941b4 to c1fcd2e on March 13, 2026 02:01
@DragonFive changed the title from "[WIP] feat: add onerec model implement[3/N]" to "feat: add onerec model implement[3/N]" on Mar 13, 2026
@DragonFive changed the title from "feat: add onerec model implement[3/N]" to "【WIP】feat: add onerec model implement[3/N]" on Mar 13, 2026
@DragonFive closed this on Mar 16, 2026