VickM12/knowledge-graph

Knowledge Graph from CSV for RAG

This project converts structured CSV data into a knowledge graph and builds a vector index over node descriptions for Retrieval-Augmented Generation (RAG).

Features

  • Auto-infers simple relations from columns ending with _id (excluding the primary id column)
  • Builds a typed, directed multigraph (nodes/edges) using NetworkX
  • Generates text representations of nodes and embeds them with sentence-transformers
  • Provides a lightweight retriever (nearest neighbors) over node embeddings
  • Optional FastAPI service for querying

Quickstart

  1. Create and activate a virtual environment, then install the dependencies (Windows PowerShell):
python -m venv .venv
. .venv/Scripts/Activate.ps1
pip install -r requirements.txt
  2. Try the example dataset:
python -m knowledge_graph.cli build-kg --csv .\examples\employees.csv --entity-type Employee --id-column id --output-dir .\artifacts
python -m knowledge_graph.cli build-index --graph-json .\artifacts\graph.json --output-dir .\artifacts --model all-MiniLM-L6-v2
python -m knowledge_graph.cli query --index .\artifacts --graph .\artifacts\graph.json --q "Who manages Alice?" --k 5
  3. Optional: Run the API:
uvicorn knowledge_graph.app:app --reload

Then POST a query to http://127.0.0.1:8000/query with a JSON body such as:

{"query": "Who manages Alice?", "k": 5}
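For reference, a minimal Python sketch of that request using only the standard library. The URL and body come from the text above. The helper names `build_query_request` and `post_query` are hypothetical, and the shape of the response is not documented here, so it is returned as parsed JSON without further assumptions.

```python
import json
import urllib.request


def build_query_request(query, k=5, base_url="http://127.0.0.1:8000"):
    """Build (but do not send) a POST request for the /query endpoint."""
    payload = json.dumps({"query": query, "k": k}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/query",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def post_query(query, k=5, base_url="http://127.0.0.1:8000"):
    """Send the query and return the decoded JSON response.

    Requires the uvicorn server from the Quickstart to be running.
    """
    request = build_query_request(query, k, base_url)
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the server running, `post_query("Who manages Alice?", k=5)` sends the same body shown above.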

How it works

  • The builder creates one node per CSV row. The node's type is the value of --entity-type and its key is taken from the column named by --id-column.
  • For any other column that ends with _id, an edge is created from the row's node to a target node with the same type (unless configured otherwise). If the target node doesn't appear as a row, it is created as a stub node so the edge is still valid.
  • All other columns are kept as node attributes.
  • A node text description is composed from its attributes and neighbors and embedded with sentence-transformers for retrieval.
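The node and edge rules above can be sketched as follows. This is an illustration under stated assumptions, not the project's implementation: it uses plain dicts and tuples instead of NetworkX to stay dependency-free, the function name `build_graph` is hypothetical, and naming each relation after its column with the `_id` suffix removed is an assumption.

```python
import csv
import io

# Tiny stand-in for a CSV export; the column names are illustrative.
SAMPLE_CSV = "id,name,title,manager_id\n1,Alice,Engineer,2\n2,Bob,Manager,3\n"


def build_graph(csv_text, entity_type, id_column="id"):
    """Return (nodes, edges) inferred from a single CSV table."""
    nodes = {}  # node key -> {"type", "attrs", "stub"}
    edges = []  # (source key, target key, relation name)
    rows = list(csv.DictReader(io.StringIO(csv_text)))

    # One node per row; non-relation columns become node attributes.
    for row in rows:
        attrs = {
            col: value
            for col, value in row.items()
            if col != id_column and not col.endswith("_id")
        }
        nodes[row[id_column]] = {"type": entity_type, "attrs": attrs, "stub": False}

    # Every other *_id column yields an edge; missing targets become stubs.
    for row in rows:
        for col, value in row.items():
            if col == id_column or not col.endswith("_id") or not value:
                continue
            if value not in nodes:
                nodes[value] = {"type": entity_type, "attrs": {}, "stub": True}
            # Assumption: relation name is the column name minus "_id".
            edges.append((row[id_column], value, col[: -len("_id")]))
    return nodes, edges


nodes, edges = build_graph(SAMPLE_CSV, "Employee")
```

In this sample, row 2 points at manager 3, which has no row of its own, so node "3" is created as a stub and the edge from "2" to "3" remains valid.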

CLI

  • build-kg: Build a graph from a CSV.
  • build-index: Build embeddings/index from a graph JSON.
  • query: Query the index and return the top-k nodes and a small subgraph context for RAG.

Run python -m knowledge_graph.cli --help for full options.
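To illustrate what the query command returns conceptually, here is a small sketch of top-k nearest-neighbor retrieval followed by collecting the edges around the hits. Several things are assumptions made for illustration: cosine similarity as the metric, a one-hop neighborhood as the "small subgraph context", the hand-written toy vectors (the real index stores sentence-transformers embeddings), and the function names.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k(query_vec, node_vecs, k):
    """Return the ids of the k nodes most similar to the query vector."""
    ranked = sorted(
        node_vecs.items(),
        key=lambda item: cosine(query_vec, item[1]),
        reverse=True,
    )
    return [node_id for node_id, _ in ranked[:k]]


def one_hop_context(hits, edges):
    """Keep every edge that touches at least one retrieved node."""
    hit_set = set(hits)
    return [edge for edge in edges if edge[0] in hit_set or edge[1] in hit_set]


# Toy 2-D embeddings and edges, purely illustrative.
node_vecs = {"1": [1.0, 0.0], "2": [0.8, 0.6], "3": [0.0, 1.0]}
edges = [("1", "2", "manager"), ("2", "3", "manager")]

hits = top_k([1.0, 0.1], node_vecs, k=2)
context = one_hop_context(hits[:1], edges)
```

The retrieved node ids plus the surrounding edges are the kind of context that can be passed to a language model in a RAG prompt.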

Inputs/Outputs

  • Inputs: CSV with a header row. Required: one id-like column (e.g., id). Optional: relation columns *_id.
  • Outputs:
    • artifacts/graph.json: nodes/edges JSON
    • artifacts/node_texts.jsonl: node textual representations
    • artifacts/vectors.npy: node embeddings
    • artifacts/metadata.json: node ids, mapping, and model info

Notes

  • You can pass --text-columns to explicitly indicate which columns form the node description; otherwise all non-id columns are used.
  • For multi-table setups (multiple CSVs), run build-kg once per CSV; merging the resulting graphs is a future enhancement. For now, auto-inference focuses on single-table exports with foreign-key-like columns.
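As an illustration of the --text-columns note, the sketch below composes a node description either from an explicit column list or, by default, from all non-id columns. The function name `node_text` and the exact output format are hypothetical, and the real descriptions may also include neighbor information.

```python
def node_text(entity_type, key, attrs, text_columns=None):
    """Compose a plain-text description of a node for embedding."""
    if text_columns is None:
        # Default: every column except relation columns ending in "_id".
        columns = [col for col in attrs if not col.endswith("_id")]
    else:
        columns = list(text_columns)
    # Skip columns that are missing or empty for this node.
    parts = [f"{col}: {attrs[col]}" for col in columns if attrs.get(col)]
    return f"{entity_type} {key}. " + "; ".join(parts)


row = {"name": "Alice", "title": "Engineer", "manager_id": "2"}
default_text = node_text("Employee", "1", row)
name_only = node_text("Employee", "1", row, text_columns=["name"])
```

Restricting the description to a few informative columns can keep noisy fields out of the embedding.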
