GitHub - MartinBoeckling/rdf2vecgpu: GPU-Accelerated RDF2Vec. A high-performance GPU implementation of RDF2Vec that harnesses CUDA and RAPIDS to generate scalable embeddings for large, dense knowledge graphs.

gpuRDF2Vec

A scalable GPU-based implementation of RDF2Vec embeddings for large and dense knowledge graphs.

Important

This package is under active development and currently in beta. The class and method design will most likely change, so expect breaking changes between releases.

Package installation

Install the package rdf2vecgpu by running the following command:

pip install rdf2vecgpu

Important

Make sure to install the matching CUDA version as outlined in the following section.

Repository setup

The repository setup builds on two major libraries: PyTorch Lightning and the RAPIDS libraries cuDF and cuGraph. We provide exemplary installation details for CUDA 12.6:

PyTorch installation page (CUDA 12.6 installation):

pip install torch torchvision torchaudio

Detailed cuDF installation instructions are linked here (cuDF CUDA 12 install):

pip install \
    --extra-index-url=https://pypi.nvidia.com \
    "cudf-cu12==25.4.*" "dask-cudf-cu12==25.4.*" \
    "cugraph-cu12==25.4.*" "nx-cugraph-cu12==25.4.*"

The requirement files and conda environment files can be found in performance/env_files.

gpuRDF2Vec overview

RDF2Vec is a powerful technique to generate vector embeddings of entities in RDF graphs via random walks and Word2Vec. This repository provides a GPU-optimized reimplementation, enabling:

Speedups on dense graphs with millions of nodes
Scalability to industrial-scale knowledge bases
Reproducible experiments to test and qualify the overall implementation details

Repository Structure

.
├── README.md
├── data
├── data_preparation
│   ├── converstion_to_ttl.py
│   └── merge_text_file.py
├── img
│   └── github_repo_header.png
├── jrdf2vec-1.3-SNAPSHOT.jar
├── performance
│   ├── env_files
│   │   ├── jrdf2vec_environment.yml
│   │   ├── jrdf2vec_requirements.txt
│   │   ├── pyrdf2vec_environment.yml
│   │   ├── pyrdf2vec_requirements.txt
│   │   ├── rdf2vecgpu_environment.yml
│   │   ├── rdf2vecgpu_requirements.txt
│   │   ├── sparkrdf2vec_environment.yml
│   │   └── sparkrdf2vec_requirements.txt
│   ├── evaluation_parameters.py
│   ├── gpu_rdf2vec_performance.py
│   ├── graph_creation.py
│   ├── graph_statistics.py
│   ├── jrdf2vec_based_performance.py
│   ├── pyrdf2vec_based_performance.py
│   ├── spark_rdf2vec_performance.py
│   └── wandb_analysis.py
├── src
│   ├── __init__.py
│   ├── corpus
│   │   ├── __init__.py
│   │   └── walk_corpus.py
│   ├── cpu_based_rdf2vec_approach.py
│   ├── embedders
│   │   ├── __init__.py
│   │   ├── word2vec.py
│   │   └── word2vec_loader.py
│   ├── gpu_rdf2vec.py
│   ├── helper
│   │   ├── __init__.py
│   │   └── functions.py
│   └── reader
│       ├── __init__.py
│       └── kg_reader.py
└── test
    ├── helper
    └── reader
        ├── functions_test.py
        └── kg_reader_test.py

Capability overview

GPU-backed walk generation with CUDA kernels
Batched Word2Vec training with PyTorch Lightning
Pluggable RDF loaders and Parquet, CSV, and TXT integration
A performance comparison can be found in the performance folder
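
As an illustration of what the Word2Vec stage consumes, the following sketch turns one walk into skip-gram (center, context) pairs. It is a plain-Python stand-in, not the package's data loader; the function name `skipgram_pairs` and the sample walk are hypothetical.

```python
def skipgram_pairs(walk, window_size):
    """Return (center, context) pairs for all tokens within `window_size` of each center."""
    pairs = []
    for i, center in enumerate(walk):
        lo = max(0, i - window_size)
        hi = min(len(walk), i + window_size + 1)
        for j in range(lo, hi):
            if j != i:  # a token is never its own context
                pairs.append((center, walk[j]))
    return pairs


walk = ["ex:Berlin", "ex:capitalOf", "ex:Germany"]
pairs = skipgram_pairs(walk, window_size=1)
```

A larger `window_size` produces more pairs per walk, so it directly increases the size of the training set that has to be batched.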

Quick start

from rdf2vecgpu import GPU_RDF2Vec, RDF2VecConfig

# Bundle all hyperparameters in a config object
config = RDF2VecConfig(
    walk_strategy="random",
    walk_depth=4,
    walk_number=100,
    embedding_model="skipgram",
    epochs=5,
    batch_size=None,
    vector_size=100,
    window_size=5,
    min_count=1,
    learning_rate=0.01,
    negative_samples=5,
    random_state=42,
    reproducible=False,
    multi_gpu=False,
    generate_artifact=False,
    cpu_count=20,
)

# Instantiate the pipeline
gpu_rdf2vec_model = GPU_RDF2Vec(config=config)

# Path to the triple dataset
path = "data/wikidata5m/wikidata5m_kg.parquet"

# Read data and receive edge data
edge_data = gpu_rdf2vec_model.read_data(path)

# Fit the Word2Vec model and transform the dataset to an embedding
embeddings = gpu_rdf2vec_model.fit_transform(edge_df=edge_data, walk_vertices=None)

# Write embedding to file format. Return format is a cuDF dataframe
embeddings.to_parquet("data/wikidata5m/wikidata5m_embeddings.parquet", index=False)

Supported file formats:
- .csv
- .parquet
- .orc
- .nt, .nq
- All supported RDFlib file formats

Core RDF2VecConfig parameters (see Configuration reference for the full list):
- walk_strategy: ["random", "bfs"]
- walk_depth: int
- walk_number: int
- walk_weighted: bool (uses cuGraph biased random walks; requires a weights column)
- embedding_model: ["skipgram", "cbow"]
- epochs: int
- batch_size: int | None — if None, a heuristic batch size is picked based on the data loader and the available GPU memory
- vector_size: int
- window_size: int
- min_count: int
- learning_rate: float
- negative_samples: int
- random_state: int
- reproducible: bool
- multi_gpu: bool
- generate_artifact: bool
- cpu_count: int
- literal_predicates, literal_strategy, literal_n_bins, literal_bin_strategy — see the literal handling section of the docs
- tracker: ["none", "mlflow", "wandb"] — pluggable experiment tracking backend

Optional extras

The experiment tracking backends and the test suite are opt-in:

pip install "rdf2vecgpu[mlflow]"
pip install "rdf2vecgpu[wandb]"
pip install "rdf2vecgpu[test]"

Implementation Details

We achieve order-of-magnitude speedups over CPU-bound RDF2Vec on large and dense graphs by engineering both the walk extraction and the Word2Vec training pipelines:

GPU-Native Walk Extraction
- All random-walk and BFS operations leverage cuDF/cuGraph kernels to avoid CPU–GPU data transfers and minimize latency.
- To generate k walks per node in one pass, we replicate node indices in a single cuDF DataFrame rather than looping—fully utilizing GPU parallelism and eliminating Python-loop overhead (∼15× speedup).
- BFS walks currently use GPU-side recursive joins; future work will reconstruct walks entirely in CUDA to remove join overhead.
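
The replication idea from the list above can be sketched without a GPU. In this plain-Python stand-in, the start nodes are repeated k times up front and all walks advance one hop at a time as a single column. The list comprehension in `advance_all` plays the role that one columnar cuDF/cuGraph operation would play on the GPU. The helper names and the toy adjacency are hypothetical and do not mirror the repository's code.

```python
import random


def replicate_starts(nodes, walks_per_node):
    """Repeat every start node `walks_per_node` times so all walks share one flat column."""
    return [node for node in nodes for _ in range(walks_per_node)]


def advance_all(frontier, adjacency, rng):
    """Advance every walk by one hop; walks at a dead end stay where they are."""
    return [
        rng.choice(adjacency[node]) if adjacency.get(node) else node
        for node in frontier
    ]


# Toy graph with integer node ids (hypothetical data); node 3 has no outgoing edges
adjacency = {0: [1, 2], 1: [2], 2: [0], 3: []}
starts = replicate_starts([0, 1, 2, 3], walks_per_node=2)

rng = random.Random(0)
columns = [starts]  # one column per walk position
for _ in range(3):  # loop over depth only, never over individual walks
    columns.append(advance_all(columns[-1], adjacency, rng))

# Transpose the columns into one tuple per walk
walks = list(zip(*columns))
```

The only explicit loop runs over the walk depth, so the amount of Python-level iteration is independent of the number of walks.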

cuDF→PyTorch Lightning Handoff
- Replaced Lightning’s default CPU-based DataLoader with a cuDF-backed pipeline: context/center columns live on GPU as DLPack tensors.
- Initial deep-copy loads incur extra VRAM, but thereafter all sampling/preprocessing occurs on-device, eliminating PCIe stalls.
- An “index-only” strategy (workers pull tensor indices instead of slices) uses CUDA’s pointer arithmetic for constant-time access, collapsing DataLoader overhead from ~85% of epoch time to near parity with model compute.
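
The index-only strategy described above can be illustrated with plain Python lists standing in for GPU-resident tensors. The dataset hands out row positions only, and the batch is resolved against the resident columns in one gather step. `IndexOnlyDataset`, `iter_batches`, and `gather_batch` are hypothetical names for this sketch and are not the classes used in the repository.

```python
class IndexOnlyDataset:
    """Dataset whose items are row positions, not data slices."""

    def __init__(self, n_rows):
        self.n_rows = n_rows

    def __len__(self):
        return self.n_rows

    def __getitem__(self, idx):
        return idx  # no data is copied here


def iter_batches(dataset, batch_size):
    """Yield lists of row indices, one list per batch."""
    for start in range(0, len(dataset), batch_size):
        stop = min(start + batch_size, len(dataset))
        yield [dataset[i] for i in range(start, stop)]


def gather_batch(indices, center, context):
    """Resolve a batch of indices against the resident columns in one step."""
    return [center[i] for i in indices], [context[i] for i in indices]


# Stand-ins for the center/context columns that would live on the GPU
center = [10, 11, 12, 13, 14]
context = [20, 21, 22, 23, 24]

dataset = IndexOnlyDataset(len(center))
batches = [gather_batch(idx, center, context) for idx in iter_batches(dataset, 2)]
```

Because workers only exchange integers, the cost of producing a batch no longer depends on how wide a training sample is.
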

Optimized Word2Vec Training
- Batch-Size Heuristic: Estimate per-sample GPU footprint from cuDF loader, then set initial batch = (total VRAM) / (4 × footprint). This “divide-by-four” rule quickly homes in on a viable batch size, reducing tuning runs.
- Kernel Fusion: All sampling and tensor transforms migrated into PyTorch’s C++ back end, removing Python loops and the GIL, for consistent high throughput.
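
The divide-by-four rule above is simple arithmetic. The sketch below assumes a 24 GiB GPU and a per-sample footprint of 64 bytes. Both numbers are illustrative, and `initial_batch_size` is a hypothetical helper rather than the package's own function.

```python
def initial_batch_size(total_vram_bytes, per_sample_bytes, divisor=4):
    """Initial batch size = total VRAM / (divisor * per-sample footprint)."""
    if per_sample_bytes <= 0:
        raise ValueError("per_sample_bytes must be positive")
    # Integer division, and never return less than one sample per batch
    return max(1, total_vram_bytes // (divisor * per_sample_bytes))


total_vram = 24 * 1024**3  # 24 GiB (assumed GPU size)
footprint = 64             # bytes per training sample (assumed)
batch_size = initial_batch_size(total_vram, footprint)
```

Dividing by four leaves headroom for model weights, gradients, and optimizer state, which is why the result is a starting point rather than a guaranteed fit.
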

Scalable Data-Parallel Training
- We use PyTorch Distributed + NCCL: each GPU holds the same graph shard but a unique walk corpus.
- Gradients are synchronized via all_reduce at regular intervals (~500 ms), amortizing PCIe/NVLink costs and ensuring linear scaling across nodes.
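
The effect of gradient synchronization can be shown in a single process. This sketch simulates two workers with plain Python lists. It does not use PyTorch Distributed or NCCL, and the helper names `all_reduce_mean` and `sgd_step` are hypothetical.

```python
def all_reduce_mean(per_worker_grads):
    """Average gradients element-wise across workers (sum all-reduce, then divide)."""
    n_workers = len(per_worker_grads)
    summed = [sum(values) for values in zip(*per_worker_grads)]
    return [s / n_workers for s in summed]


def sgd_step(weights, grad, lr):
    """Plain SGD update applied identically on every worker."""
    return [w - lr * g for w, g in zip(weights, grad)]


weights = [1.0, 2.0]                    # identical model replica on each worker
grads = [[0.5, 1.0], [1.5, 3.0]]        # each worker saw a different walk corpus
avg = all_reduce_mean(grads)

# Every worker applies the same averaged gradient, so replicas stay in sync
replicas = [sgd_step(weights, avg, lr=0.5) for _ in grads]
```

Since all replicas apply the same averaged gradient, they remain identical after the step even though each worker trained on different walks.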

License

gpuRDF2Vec is released under the MIT license; the full text can be found in the LICENSE file.

Roadmap

Order-aware Word2Vec following Ling, Wang, et al., "Two/too simple adaptations of word2vec for syntax problems". Issue item
Provide spilling for single-GPU training to work around potential OOM issues during RDF2Vec training. Issue item
Provide weighted walks for spatial datasets. Issue item
Provide logging capabilities for the complete Word2Vec pipeline with Wandb and MLflow. Issue item

Report issues and bugs

If you find a bug or unexpected behaviour, please open an issue.

When opening an issue, please tag it with the label Bug and include the following information:
- Environment: OS, Python/CUDA/PyTorch/RAPIDS versions (cuDF, cuGraph)
- Reproduction steps: Exact commands or small code snippet
- Input data graph format & size (attach a minimal sample if possible)
- Observed vs. expected behavior
- Error messages / stack traces (copy-paste or attach logs)

We aim to respond to open issues within 3 business days.
If you have identified a fix, fork the repo, branch off main, implement and test the change, then open a PR referencing the issue.
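
The environment details requested above can be collected with the Python standard library. The sketch below assumes the distribution names used in the installation section (`rdf2vecgpu`, `torch`, `cudf-cu12`, `cugraph-cu12`); packages that are missing are reported as such. The CUDA version is not covered and has to be added separately, for example from the output of `nvidia-smi`.

```python
import platform
import sys
from importlib import metadata


def package_version(dist_name):
    """Return the installed version of a distribution, or a marker if it is missing."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return "not installed"


def environment_report(dist_names):
    """Build a short text block suitable for pasting into an issue."""
    lines = [
        f"OS: {platform.platform()}",
        f"Python: {sys.version.split()[0]}",
    ]
    lines += [f"{name}: {package_version(name)}" for name in dist_names]
    return "\n".join(lines)


report = environment_report(["rdf2vecgpu", "torch", "cudf-cu12", "cugraph-cu12"])
print(report)
```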

Citation

If you use gpuRDF2Vec in your research, please cite the following paper:

@InProceedings{10.1007/978-3-032-09530-5_14,
  author="B{\"o}ckling, Martin and Paulheim, Heiko",
  editor="Garijo, Daniel
  and Kirrane, Sabrina
  and Salatino, Angelo
  and Shimizu, Cogan
  and Acosta, Maribel
  and Nuzzolese, Andrea Giovanni
  and Ferrada, Sebasti{\'a}n
  and Soulard, Thibaut
  and Kozaki, Kouji
  and Takeda, Hideaki
  and Gentile, Anna Lisa",
  title="gpuRDF2vec -- Scalable GPU-Based RDF2vec",
  booktitle="The Semantic Web -- ISWC 2025",
  year="2026",
  publisher="Springer Nature Switzerland",
  address="Cham",
  pages="240--257",
  isbn="978-3-032-09530-5"
}
