DRAGON is a device-cloud distributed RAG framework that enables a simultaneous integration of personalized information and generic knowledge.
Framework Overview
Class Diagram
Module List
| Module | Description | File Path |
| --- | --- | --- |
| Dragon | Top class that defines the framework | /dragon/dragon.py |
| Generator | A class that encapsulates the generation of text using a causal language model. It handles model initialization, tokenization, and the generation process. | /dragon/generator.py |
| Indexer | Handles indexing of documents for efficient retrieval. | /dragon/indexer.py |
| Retriever | Retrieves relevant documents based on the query. | /dragon/retriever.py |
| DraftQueue | Manages draft tokens during the generation process. | /dragon/queues.py |
| DraftItem | Represents a single draft token with metadata. | /dragon/queues.py |
| Decoder | Performs the decoding process for text generation. | /dragon/decoder.py |
| OfflineProfiler | Profiles the system performance for offline analysis. | /dragon/profiler.py |
| Transceiver | Handles communication between distributed components. | /dragon/transceiver.py |
| Aggregator | Aggregates results from multiple decoding processes. | /dragon/aggregator.py |
| Rag | Implements the retrieval-augmented generation logic. | |
We evaluated DRAGON on the development dataset of Natural Questions, with the number of retrieved documents set to 4.
Results
| F1 | EM |
| --- | --- |
| 29.99 | 8.96 |
Given the same number of retrieved documents and an infinite context length, the distributed and centralized RAG produce identical results, confirming the correctness of our implementation. We adopt Hugging Face’s evaluation script from here. As this experiment uses a non-instruction-tuned model, the overall performance remains limited. We are currently conducting further experiments with microsoft/Phi-4-mini-instruct, deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B, and Qwen/Qwen2.5-1.5B-Instruct. Preliminary results support the conclusions presented in our paper.
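For readers unfamiliar with the two metrics, the following is a minimal sketch of how exact match (EM) and token-level F1 are commonly computed for extractive QA. It follows the widely used SQuAD-style normalization (lowercasing, stripping punctuation and English articles) and is an illustration only; it is not the evaluation script used for the numbers above, and the function names are our own.

```python
import re
import string
from collections import Counter


def normalize_answer(text: str) -> str:
    """Lowercase, drop punctuation and English articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction: str, reference: str) -> float:
    """1.0 if the normalized strings are identical, else 0.0."""
    return float(normalize_answer(prediction) == normalize_answer(reference))


def f1_score(prediction: str, reference: str) -> float:
    """Harmonic mean of token-level precision and recall."""
    pred_tokens = normalize_answer(prediction).split()
    ref_tokens = normalize_answer(reference).split()
    # Multiset intersection counts each shared token at most as often
    # as it appears in both strings.
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)
```

EM is strict by construction, so a model that produces long free-form continuations can score a moderate F1 while its EM stays low.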
Outputs of DRDG/SW vs. DRDG/TW
Hyperparameters
| Hyperparameter | Value | Description |
| --- | --- | --- |
| retriever.s_passage | 64 | Number of words in a retrieved passage |
| text.with_title | true | Insert title before each passage |
| retriever.passages | Salesforce/wikitext,wikitext-2-raw-v1 | Dataset as passage corpus |
| retriever.s_context | 128 | Maximum number of tokens allowed for passage contexts |
| retriever.n_docs | 4 | Number of retrieved passages |
| retriever.s_aggregate | 4 | Number of parallel decoding processes |
| generator.model | facebook/opt-1.3b | LLM |
| evaluator.max_new_tokens | 100 | Number of tokens in the output sequence |
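The dotted names above suggest a nested configuration grouped by component. As a rough sketch, the same values can be written as a flat Python mapping and expanded into nested sections. How DRAGON actually loads its configuration is not described here, so `FLAT_CONFIG` and `to_nested` are hypothetical helpers for illustration.

```python
# Values copied from the hyperparameter table; the container itself is an
# illustrative assumption, not DRAGON's configuration format.
FLAT_CONFIG = {
    "retriever.s_passage": 64,
    "text.with_title": True,
    "retriever.passages": "Salesforce/wikitext,wikitext-2-raw-v1",
    "retriever.s_context": 128,
    "retriever.n_docs": 4,
    "retriever.s_aggregate": 4,
    "generator.model": "facebook/opt-1.3b",
    "evaluator.max_new_tokens": 100,
}


def to_nested(flat: dict) -> dict:
    """Expand dotted keys such as 'retriever.n_docs' into nested dicts."""
    nested: dict = {}
    for dotted_key, value in flat.items():
        *parents, leaf = dotted_key.split(".")
        node = nested
        for part in parents:
            node = node.setdefault(part, {})
        node[leaf] = value
    return nested


CONFIG = to_nested(FLAT_CONFIG)
```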
Results
| query | output_seq | output_tok |
| --- | --- | --- |
| Carbon dioxide is | a greenhouse gas is gas the Earth, viewed as a thatant global gas. is is viewed as a pollutant of to it is also viewed as a pollutant . Thus it is also viewed as a pollutant is Thus it is also . as a pollutant . is it is also viewed as a pollutant . that it is also viewed as a pollutant. gas it is also viewed as a pollutant . a it is also viewed a pollutant . is it is also viewed as | a greenhouse gas that is produced by the burning of fossil fuels. It is a major component of the atmosphere and is responsible for the greenhouse effect. It is also a greenhouse gas that is produced by the burning of fossil fuels. It is a major component of the atmosphere and is responsible for the greenhouse effect. It is also a greenhouse gas that is produced by the burning of fossil fuels. It is a major component of the atmosphere and is responsible for the greenhouse effect. It is also a greenhouse gas that |
| Ernest Miller Hemingway was | a in writerveston's 18 also a cited as. to of the investon acht Club, of he of also a themervestoningway Club who was a great writer theing he wason aacht man. He was a man who the a greaton, the he was also a great man of the was aon who was a great line of the novel was the a greaton Y line was a novel is was a first line of the novel is also a | a writer and poet who was born in Galveston, Texas in 1882. He was the son of a Methodist minister and a schoolteacher. He was educated at the University of Texas and the University of Texas at Austin. He was a member of the Sigma Phi Epsilon fraternity. He was a member of the Texas Writers Club and the Texas Writers Club. He was a member of the Texas Writers Club. He was a member of the Texas Writers Club. He was a member of the |
Clearly, the outputs of the sequence-wise baseline are disordered, as it relies on post-hoc aggregation that disregards the sequential dependencies between tokens.
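The effect can be reproduced on a toy scale. The sketch below is not DRAGON's decoder: the two bigram tables are invented, and `sequence_wise` and `token_wise` are hypothetical stand-ins for the two aggregation strategies. Sequence-wise aggregation lets each process decode on its own and then votes position by position; token-wise aggregation mixes the next-token distributions at every step so that all processes share one prefix.

```python
from collections import defaultdict

# Invented toy "models": next-token distributions keyed by the previous token.
MODEL_A = {
    "<s>": {"the": 0.6, "a": 0.4},
    "the": {"cat": 0.9, "dog": 0.1},
    "a": {"dog": 0.9, "cat": 0.1},
    "cat": {"sleeps": 0.8, "barks": 0.2},
    "dog": {"barks": 1.0},
}
MODEL_B = {
    "<s>": {"a": 0.7, "the": 0.3},
    "the": {"cat": 0.8, "dog": 0.2},
    "a": {"dog": 0.8, "cat": 0.2},
    "cat": {"sleeps": 1.0},
    "dog": {"barks": 1.0},
}


def greedy_decode(model, steps):
    """Decode one process independently; keep each token with its probability."""
    prev, draft = "<s>", []
    for _ in range(steps):
        token, prob = max(model[prev].items(), key=lambda kv: kv[1])
        draft.append((token, prob))
        prev = token
    return draft


def sequence_wise(models, steps):
    """Post-hoc aggregation: vote per position over finished sequences."""
    drafts = [greedy_decode(m, steps) for m in models]
    result = []
    for pos in range(steps):
        votes = defaultdict(float)
        for draft in drafts:
            token, prob = draft[pos]
            votes[token] += prob
        result.append(max(votes, key=votes.get))
    return result


def token_wise(models, steps):
    """Per-step aggregation: mix distributions, then extend a shared prefix."""
    prev, result = "<s>", []
    for _ in range(steps):
        mixed = defaultdict(float)
        for model in models:
            for token, prob in model[prev].items():
                mixed[token] += prob / len(models)
        prev = max(mixed, key=mixed.get)
        result.append(prev)
    return result


print(" ".join(sequence_wise([MODEL_A, MODEL_B], 3)))  # a cat barks
print(" ".join(token_wise([MODEL_A, MODEL_B], 3)))     # a dog barks
```

In the sequence-wise result, "cat" comes from one process and "a" and "barks" from the other, so the winning tokens were never conditioned on each other. The token-wise result stays coherent because every step is conditioned on the tokens already chosen.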
Correctness of Latency Profiling
We measure both the instantaneous online and offline decoding latencies. As shown in the figure above (from the device's perspective), the cloud-side latency remains relatively stable; hence, we directly use the instantaneous values. While the Least Squares Error (LSE) fitting algorithm is sensitive to outliers, DRAGON provides a more robust estimation. The y-axis intercept reflects the static system overhead, whereas the difference in slope represents the runtime decoding overhead—likely caused by CPU contention between decoding and other concurrent tasks (e.g., aggregation).
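To illustrate why outlier sensitivity matters when fitting a latency line, the sketch below compares an ordinary least-squares fit with the Theil-Sen estimator on synthetic data containing one spike. Theil-Sen is used here only as a well-known robust stand-in; the estimator DRAGON actually uses is not specified in this section, and the data, the meaning of `xs`, and the function names are assumptions.

```python
from itertools import combinations
from statistics import mean, median


def least_squares_fit(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    mx, my = mean(xs), mean(ys)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum(
        (x - mx) ** 2 for x in xs
    )
    return slope, my - slope * mx


def theil_sen_fit(xs, ys):
    """Robust fit: median of pairwise slopes, then median residual intercept."""
    points = list(zip(xs, ys))
    slopes = [
        (y2 - y1) / (x2 - x1)
        for (x1, y1), (x2, y2) in combinations(points, 2)
        if x2 != x1
    ]
    slope = median(slopes)
    intercept = median(y - slope * x for x, y in points)
    return slope, intercept


# Synthetic latencies: static overhead 5.0, per-unit cost 2.0, one outlier.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
ys = [5.0 + 2.0 * x for x in xs]
ys[-1] = 60.0  # a single latency spike

ls_slope, ls_intercept = least_squares_fit(xs, ys)
ts_slope, ts_intercept = theil_sen_fit(xs, ys)
print(f"least squares: slope={ls_slope:.2f}, intercept={ls_intercept:.2f}")
print(f"Theil-Sen:     slope={ts_slope:.2f}, intercept={ts_intercept:.2f}")
```

With a single spike, the least-squares line is pulled well away from the true slope and intercept, while the median-based fit recovers both. Since the intercept is read as static overhead and the slope as per-unit cost, a distorted fit would misattribute latency between the two.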