Novel RAG system that extends the capabilities of VectorRAG by simplifying GraphRAG

SamsungLabs/UnWeaver


UnWeaving the knots of GraphRAG - turns out VectorRAG is almost enough

This repository contains the implementation for the paper "UnWeaving the knots of GraphRAG - turns out VectorRAG is almost enough". The project presents UnWeaver, a novel approach to Retrieval-Augmented Generation (RAG) that challenges the conventional wisdom of using graph-based knowledge representations.

Architecture of UnWeaver

[Diagram: UnWeaver Flow]

The diagram above illustrates the flow of data through the UnWeaver system, showing how documents are processed, indexed, and retrieved for question answering.

Overview

The project consists of two main components:

  • UnWeaver: A RAG system that implements the novel approach described in the paper
  • Evaluation: A comprehensive evaluation framework for assessing RAG system performance

Project Structure

.
├── unweaver/               # UnWeaver RAG system implementation
├── evaluation/             # Evaluation framework
├── data_preprocessing/     # Data preprocessing tools
└── README.md               # This file

Installation

The project uses Poetry for dependency management. Make sure you have Poetry installed on your system.

Prerequisites

  • Python 3.9 or higher
  • Poetry
  • MongoDB (for LLM/Embedding caching if using cache)

Setup

  1. Clone the repository:

     git clone <repository-url>
     cd unweaver_arxiv

  2. Install dependencies for UnWeaver:

     cd unweaver
     poetry install

  3. Install dependencies for Evaluation:

     cd ../evaluation
     poetry install

Usage

Getting data

To obtain the datasets used in the paper and preprocess them into a format digestible by the UnWeaver pipeline, run the data_preprocessing/run.sh script.

Running UnWeaver

The UnWeaver system can be run using the provided shell script or by executing the Python modules directly.

Using the run script (recommended)

The unweaver/run.sh script automates the indexing and querying process for all datasets:

cd unweaver
./run.sh

This script will:

  1. Index the COVID-QA, E-Manual, and TechQA datasets
  2. Query each dataset using the configured retrieval methods
  3. Store results in the index_<dataset_name> directories
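The loop below is a hypothetical sketch of what such a driver script might do; the dataset directory names and the loop shape are assumptions inferred from the manual commands later in this README, not the contents of run.sh itself.

```shell
# Hypothetical sketch of a run.sh-style driver loop; the dataset
# directory names below are assumptions, not taken from the script.
datasets="covid_qa e_manual techqa"
for d in $datasets; do
  # Each dataset is first indexed, then queried against that index:
  echo "index: ../data/$d/files_preprocessed/ -> ./index_$d"
  echo "query: ../data/$d/questions.json against ./index_$d"
  # poetry run python -m unweaver.index ../data/$d/files_preprocessed/ ./index_$d --config configs/custom.json
  # poetry run python -m unweaver.query ../data/$d/questions.json ./index_$d --run_name <run_name> --config configs/custom.json
done
```

The actual commands are commented out above so the sketch stays illustrative; see the Manual execution section for their real form.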

Manual execution

You can also run the indexing and querying steps manually:

Indexing:

cd unweaver
poetry run python -m unweaver.index \
  ../data/<dataset_name>/files_preprocessed/ \
  ./index_<dataset_name> \
  --config configs/custom.json

Querying:

cd unweaver
poetry run python -m unweaver.query \
  ../data/<dataset_name>/questions.json \
  ./index_<dataset_name> \
  --run_name <run_name> \
  --config configs/custom.json

Running Evaluation

To evaluate the results generated by UnWeaver:

cd evaluation
poetry run python -m evaluation \
  ../unweaver/index_<dataset_name> \
  --config configs/custom.json

The evaluation framework will:

  1. Load query results from the specified working directory
  2. Calculate metrics using RAGAS
  3. Generate timing and token usage statistics
  4. Log results to MLflow (if configured)
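RAGAS-style metrics are computed over per-question records that pair the generated answer with its retrieved contexts and a reference answer. A minimal sketch of such a record follows; the field names mirror the common RAGAS input convention and are assumptions, not this repository's exact result schema.

```python
# Illustrative per-question evaluation record; field names follow the
# common RAGAS input convention and are assumptions, not this repo's schema.
record = {
    "question": "What is covered in chapter 3 of the manual?",
    "answer": "The generated answer produced by UnWeaver.",
    "contexts": [
        "Retrieved chunk 1 ...",
        "Retrieved chunk 2 ...",
    ],
    "ground_truth": "The reference answer from questions.json.",
}

# Faithfulness-style metrics compare `answer` against `contexts`,
# while correctness-style metrics compare `answer` against `ground_truth`.
required = {"question", "answer", "contexts", "ground_truth"}
assert required <= set(record)
```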

Configuration

Both UnWeaver and Evaluation use JSON configuration files to control their behavior:

  • UnWeaver config: unweaver/configs/custom.json
  • Evaluation config: evaluation/configs/custom.json

Key configuration parameters include:

  • LLM settings (model, API endpoints, timeouts)
  • Embedder settings (model, dimensions, batch size)
  • Retrieval parameters (top-k values, chunk sizes)
  • Evaluation metrics and MLflow tracking

See the individual READMEs in the unweaver/ and evaluation/ directories for detailed configuration options.
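Purely as an illustration, a custom.json might look roughly like the sketch below; every key name and value here is an assumption meant to mirror the parameter groups listed above, not the repository's actual schema.

```json
{
  "llm": {
    "model": "example-llm",
    "api_endpoint": "http://localhost:8000/v1",
    "timeout_s": 60
  },
  "embedder": {
    "model": "example-embedder",
    "dimensions": 768,
    "batch_size": 32
  },
  "retrieval": {
    "top_k": 10,
    "chunk_size": 512
  }
}
```

Consult the per-component READMEs for the real key names before editing a config.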

Datasets

The project includes three datasets for evaluation:

  1. COVID-QA: Biomedical question answering dataset
  2. E-Manual: Technical manual dataset
  3. TechQA: Technical question answering dataset

Each dataset should be placed in the data/ directory with the following structure:

data/<dataset_name>/
├── questions.json          # Questions for evaluation
├── files/                  # Original documents
└── files_preprocessed/     # Preprocessed documents for indexing
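The exact schema of questions.json is defined by the preprocessing scripts; purely as an illustration, a question file in this style often looks like the sketch below (all field names assumed).

```json
[
  {
    "question": "What safety precautions does the manual list for installation?",
    "answer": "The reference (ground-truth) answer used for evaluation."
  }
]
```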

Citation

If you use this code in your research, please cite our paper:

@article{unweaver2026,
  title={UnWeaving the knots of GraphRAG - turns out VectorRAG is almost enough},
  author={Ryszard Tuora and Mateusz Galiński and Michał Godziszewski and Michał Karpowicz and Mateusz Czyżnikiewicz and Adam Kozakiewicz and Tomasz Ziętkiewicz},
  journal={arXiv preprint arXiv:XXXX.XXXXX},
  year={2026}
}

Contact

For questions or issues, please open an issue on the repository.
