【WIP】feat: add onerec model implement[3/N] #998
DragonFive wants to merge 4 commits into jd-opensource:main from
Code Review
This pull request implements the OneRec model, including the NPU-specific layer implementations and updates to the runtime and scheduler to support it. The changes are extensive: they add new model layers, update data processing pipelines, and integrate the new model into the engine. However, two high-severity security vulnerabilities were identified: a buffer over-read in the LLMRec raw input processing, caused by missing validation of input embedding row lengths, and a potential cache key collision in the OneRec model's encoder output cache, caused by insecure string concatenation of user-supplied request IDs. Both issues could lead to information leakage or denial of service and should be addressed before merging. Additionally, the caching strategy for the encoder output has a potential performance issue that could lead to cache thrashing.
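Both security findings have well-known mitigations. The sketch below is illustrative only and assumes simplified inputs (embedding rows as `std::vector<std::vector<float>>`, cache-key fields as strings); `FlattenEmbeddingRows` and `MakeCacheKey` are hypothetical helpers, not functions from the xllm codebase. The first rejects any row whose length differs from the expected embedding dimension before copying anything, so a short row cannot be read past its end. The second length-prefixes each key field so that different request IDs can never concatenate to the same string.

```cpp
#include <cstddef>
#include <optional>
#include <string>
#include <vector>

// Hypothetical helper: copies embedding rows into one contiguous
// [rows x dim] buffer, but only after every row has been checked.
// Returns std::nullopt if any row is not exactly `dim` floats long,
// so a short row is never read past its end.
std::optional<std::vector<float>> FlattenEmbeddingRows(
    const std::vector<std::vector<float>>& rows, std::size_t dim) {
  if (dim == 0) return std::nullopt;
  for (const auto& row : rows) {
    if (row.size() != dim) return std::nullopt;  // reject malformed input
  }
  std::vector<float> flat;
  flat.reserve(rows.size() * dim);
  for (const auto& row : rows) {
    flat.insert(flat.end(), row.begin(), row.end());
  }
  return flat;
}

// Hypothetical helper: builds a cache key by prefixing every field with
// its length ("<len>:<bytes>"), which makes the encoding unambiguous.
// Plain concatenation would map ("ab", "c") and ("a", "bc") to "abc".
std::string MakeCacheKey(const std::vector<std::string>& parts) {
  std::string key;
  for (const auto& part : parts) {
    key += std::to_string(part.size());
    key += ':';
    key += part;
  }
  return key;
}
```

The length prefix costs a few bytes per key; an alternative is to hash a length-prefixed encoding, but the prefix itself is what removes the collision.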
Branch force-pushed from 205d7c4 to 67941b4, then from 67941b4 to c1fcd2e.
This PR rebases OneRec support onto the current `main` branch and brings the OneRec REC/NPU path to capability parity with the current codebase structure.
It is no longer just a registry skeleton PR. In addition to the foundation contracts, this change wires OneRec through the REC runtime, connects the NPU block-layer bridge, and aligns the integration with the current `xllm_atb_layers` submodule-based dependency model.

What's included
OneRec model foundation and registration
- `REGISTER_REC_MODEL` definition in `xllm/models/rec/onerec.h`
- routing through generic LLM registration only
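A model-registration macro of this kind typically records a factory under a type name at static-initialization time. The sketch below is a generic C++ illustration under that assumption, with a registry kept separate from any LLM registry. `RecModel`, `RegisterRecModel`, `CreateRecModel`, and `EXAMPLE_REGISTER_REC_MODEL` are hypothetical names; the real `REGISTER_REC_MODEL` in xllm may differ in signature and behavior.

```cpp
#include <functional>
#include <map>
#include <memory>
#include <string>
#include <utility>

// Hypothetical base interface for REC models (not xllm's real class).
struct RecModel {
  virtual ~RecModel() = default;
  virtual std::string name() const = 0;
};

using RecModelFactory = std::function<std::unique_ptr<RecModel>()>;

// REC-only registry, kept apart from any LLM registry. The function-local
// static avoids static-initialization-order problems across files.
inline std::map<std::string, RecModelFactory>& RecModelRegistry() {
  static std::map<std::string, RecModelFactory> registry;
  return registry;
}

// Returns false if `type` is already registered (first registration wins).
inline bool RegisterRecModel(const std::string& type, RecModelFactory factory) {
  return RecModelRegistry().emplace(type, std::move(factory)).second;
}

// Returns nullptr for unknown model types instead of throwing.
inline std::unique_ptr<RecModel> CreateRecModel(const std::string& type) {
  auto it = RecModelRegistry().find(type);
  if (it == RecModelRegistry().end()) return nullptr;
  return it->second();
}

// Illustrative macro: registers a factory for ModelClass at static init.
#define EXAMPLE_REGISTER_REC_MODEL(type_name, ModelClass)             \
  [[maybe_unused]] static const bool kRecRegistered_##ModelClass =     \
      RegisterRecModel(type_name, []() -> std::unique_ptr<RecModel> { \
        return std::make_unique<ModelClass>();                         \
      })

// Stand-in model used only to exercise the registry.
struct OneRecModelSketch : RecModel {
  std::string name() const override { return "onerec"; }
};

EXAMPLE_REGISTER_REC_MODEL("onerec", OneRecModelSketch);
```

Keeping the factory map behind a function-local static is a common way to make registration from many translation units safe; the PR does not say whether xllm's registry works this way.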
REC runtime wiring for OneRec
- `RecWorkerImpl` / `OneRecWorkPipeline` request preparation / execution path
- `rec_engine.cpp` for the OneRec engine pipeline
- `rec_master.cpp` to validate and parse OneRec-specific `input_tensors`, including:
  - `sparse_embedding`
  - `decoder_context_embedding`

OneRec input/model contract extensions
- `ModelInputParams` / OneRec model input params for encoder/decoder hybrid inputs
- `onerec_batch_input_builder.cpp` request and later rounds may expand to `beam_width` branches

NPU block-layer integration
- `NpuOneRecBlockLayerImpl` bridge and wire OneRec stacks to NPU ATB graph construction
layers
instead of surfacing later as request-time crashes
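The last item appears to describe failing fast during model construction rather than at request time. A minimal sketch of that idea, assuming a hypothetical `BlockLayerConfig` whose fields are illustrative and not taken from `NpuOneRecBlockLayerImpl`: the layer constructor validates its configuration and throws before any NPU graph would be built, so a bad checkpoint or config is reported at load time.

```cpp
#include <cstdint>
#include <stdexcept>

// Hypothetical per-layer configuration (field names are illustrative).
struct BlockLayerConfig {
  int64_t hidden_size = 0;
  int64_t num_heads = 0;
  int64_t num_kv_heads = 0;
  bool is_decoder = false;
  bool has_cross_attention = false;
};

// Throws std::invalid_argument on an inconsistent config.
void ValidateBlockLayerConfig(const BlockLayerConfig& c) {
  if (c.hidden_size <= 0 || c.num_heads <= 0 || c.num_kv_heads <= 0) {
    throw std::invalid_argument("sizes must be positive");
  }
  if (c.hidden_size % c.num_heads != 0) {
    throw std::invalid_argument("hidden_size must be divisible by num_heads");
  }
  if (c.num_heads % c.num_kv_heads != 0) {
    throw std::invalid_argument("num_heads must be a multiple of num_kv_heads");
  }
  if (c.has_cross_attention && !c.is_decoder) {
    throw std::invalid_argument("cross-attention only allowed in decoder layers");
  }
}

// Hypothetical layer: validation runs in the constructor, before any
// device-side graph would be created.
class BlockLayerSketch {
 public:
  explicit BlockLayerSketch(const BlockLayerConfig& config) : config_(config) {
    ValidateBlockLayerConfig(config_);
  }

 private:
  BlockLayerConfig config_;
};
```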
Single-device runtime compatibility fixes
- `world_size=1` local mode
- `ProcessGroup` rank/world-size access used during model construction
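In single-device mode there may be no real communicator, yet construction code still asks for rank and world size. One common approach, sketched here under the assumption that construction only needs those two queries, is a local stand-in group that reports rank 0 of 1 and a null-safe fallback. `ProcessGroupLike`, `LocalProcessGroup`, and `ShardSize` are hypothetical names, not xllm's `ProcessGroup` API.

```cpp
#include <stdexcept>

// Hypothetical interface exposing only the queries used at construction.
class ProcessGroupLike {
 public:
  virtual ~ProcessGroupLike() = default;
  virtual int rank() const = 0;
  virtual int world_size() const = 0;
};

// Local stand-in for world_size == 1: no communicator is created.
class LocalProcessGroup final : public ProcessGroupLike {
 public:
  int rank() const override { return 0; }
  int world_size() const override { return 1; }
};

// Tensor-parallel shard size for a dimension; a null group is treated as
// single-device instead of being dereferenced.
int ShardSize(int full_dim, const ProcessGroupLike* pg) {
  const int world = (pg != nullptr) ? pg->world_size() : 1;
  if (world <= 0 || full_dim % world != 0) {
    throw std::invalid_argument("full_dim must be divisible by world size");
  }
  return full_dim / world;
}
```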
Submodule-based `xllm_atb_layers` alignment
- `xllm_atb_layers/...` layout introduced on current `main`
- `xllm_kernels/...` include pattern
- `third_party/xllm_atb_layers` submodule to a OneRec-capable revision
Targeted bring-up diagnostics
current rebaseline effort
Scope / Restrictions
main.integration points.
Backward Compatibility
- llm / vlm / embedding / dit model flows are kept intact.
- `vocab_file` remains non-blocking when constrained decoding is disabled.
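The `vocab_file` rule reads as a conditional check: the file is required only when constrained decoding is on. A small sketch under that reading; `DecodingOptions` and `CheckVocabFile` are hypothetical, and xllm's actual option names and error handling may differ.

```cpp
#include <optional>
#include <string>

// Hypothetical option subset; names mirror the PR text, not a real struct.
struct DecodingOptions {
  bool enable_constrained_decoding = false;
  std::string vocab_file;  // empty when not configured
};

// Returns an error message, or std::nullopt when startup may proceed.
// A missing vocab_file only blocks startup if constrained decoding needs it.
std::optional<std::string> CheckVocabFile(const DecodingOptions& opts) {
  if (!opts.enable_constrained_decoding) return std::nullopt;
  if (opts.vocab_file.empty()) {
    return std::string("constrained decoding requires vocab_file");
  }
  return std::nullopt;
}
```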
Testing
alignment
Roadmap / Follow-up
`main` `xllm_atb_layers` submodule model