CGEx is an interactive, explainable AI system that translates natural‑language biomedical questions into Cypher queries and executes them over Neo4j knowledge graphs. It is designed for COVID‑19 and neurodegenerative disease (NDD) research, enabling users to explore evidence across multiple knowledge graphs without writing Cypher.
- Natural Language → Cypher using LLMs (LangChain + OpenAI)
- Multi‑Knowledge‑Graph support (KG selector in UI)
- Dynamic schema extraction from Neo4j to constrain query generation
- Interactive Dash UI with results, explanations, and graphs
- Solution subgraph visualization using Dash Cytoscape
- Explainability (XAI): inspect prompts, queries, and evidence
- Human‑in‑the‑loop feedback (approve / disapprove generated queries)
CGEx/
│
├── cgex.py # Main CGEx pipeline
├── cypher_examples.json # Few‑shot examples
├── requirements.txt # Python dependencies
├── README.md # Project documentation
├── LICENSE # License file
└── .gitignore # Git ignore rules
- Python 3.9+
- Neo4j (Aura or local)
- OpenAI API key
Create and activate a virtual environment to avoid dependency conflicts.
python -m venv .cgex # create venv
.cgex\Scripts\activate # activate venv on Windows
source .cgex/bin/activate # activate venv on Linux/macOS
Install dependencies:
pip install -r requirements.txt
CGEx queries two Neo4j knowledge graphs that are maintained in separate repositories.
The first knowledge graph (CBM KG) has to be imported from this repository:
https://github.com/SCAI-BIO/CBM-Comorbidity-KG
Setup steps:
- Create a Neo4j instance (Neo4j Desktop or Neo4j Aura).
- Open Neo4j Browser.
- Execute the Cypher import scripts provided in the repository above.
- Verify that nodes and relationships are loaded.
- Record the Neo4j connection details (URI, username, password).
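The verification step above can also be scripted. The following is a minimal sketch, assuming the official `neo4j` Python driver is installed; the function names `graph_counts` and `check_graph` are illustrative and are not part of CGEx.

```python
# Count queries used to confirm that the import produced data.
COUNT_NODES = "MATCH (n) RETURN count(n) AS c"
COUNT_RELS = "MATCH ()-[r]->() RETURN count(r) AS c"


def graph_counts(session) -> dict:
    """Return node and relationship counts using an open Neo4j session."""
    nodes = session.run(COUNT_NODES).single()["c"]
    rels = session.run(COUNT_RELS).single()["c"]
    return {"nodes": nodes, "relationships": rels}


def check_graph(uri: str, username: str, password: str) -> dict:
    """Connect with the recorded credentials and report the graph size."""
    # Imported here so the helper above stays usable without the driver.
    from neo4j import GraphDatabase

    with GraphDatabase.driver(uri, auth=(username, password)) as driver:
        driver.verify_connectivity()  # raises if the instance is unreachable
        with driver.session() as session:
            return graph_counts(session)
```

Calling `check_graph(...)` with the recorded URI, username, and password should return non-zero counts once the import scripts have run.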
The second knowledge graph (Hypothesis KG) is already hosted as a shared Neo4j Aura database. Access details are provided in:
https://github.com/SCAI-BIO/covid-NDD-comorbidity-NLP/blob/main/src/comorbidity-hypothesis-db.py
Important notes:
- The Python script does not create or populate the knowledge graph.
- It simply opens Neo4j Browser using existing credentials.
- Users may either:
  - run the script provided in the original repository, or
  - directly access Neo4j Browser using the credentials listed there.
No additional data import is required for this graph.
After setting up access to both knowledge graphs, add their connection details to
a .env file in the root of this repository:
# OpenAI
OPENAI_API_KEY=your_openai_key
# Knowledge Graph 1 (CBM KG)
NEO4J_URI=bolt+s://...
NEO4J_USERNAME=neo4j
NEO4J_PASSWORD=...
NEO4J_HTTP_URI=https://...
# Knowledge Graph 2 (Hypothesis KG)
NEO4J_URI_2=bolt+s://...
NEO4J_USERNAME_2=neo4j
NEO4J_PASSWORD_2=...
NEO4J_HTTP_URI_2=https://...
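As an illustration of how these variables map onto the two graphs, the sketch below reads one set of settings per KG from the environment. It is not taken from `cgex.py`; the helper name `load_kg_config` and the returned field names are assumptions. If the `python-dotenv` package is available, calling its `load_dotenv()` first copies the `.env` entries into `os.environ`.

```python
import os
from typing import Mapping, Optional

# Short field name -> variable name stem used in the .env file.
_FIELDS = {
    "uri": "NEO4J_URI",
    "username": "NEO4J_USERNAME",
    "password": "NEO4J_PASSWORD",
    "http_uri": "NEO4J_HTTP_URI",
}


def load_kg_config(suffix: str = "", env: Optional[Mapping[str, str]] = None) -> dict:
    """Collect the connection settings for one knowledge graph.

    Use suffix "" for Knowledge Graph 1 and "_2" for Knowledge Graph 2.
    """
    env = os.environ if env is None else env
    missing = [stem + suffix for stem in _FIELDS.values() if not env.get(stem + suffix)]
    if missing:
        # Fail early with the exact variable names that need to be set.
        raise KeyError("Missing settings: " + ", ".join(missing))
    return {field: env[stem + suffix] for field, stem in _FIELDS.items()}
```

For example, `load_kg_config("_2")` returns the Hypothesis KG settings, and a missing variable is reported by name before any connection is attempted.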
Start the Dash application:
python cgex.py
Then open your browser at:
http://127.0.0.1:8050
- The user asks a question (e.g., "What is the relationship between COVID‑19 and Alzheimer’s disease?") and selects a KG.
- CGEx extracts the schema of the selected KG dynamically.
- The LLM generates a schema‑constrained Cypher query.
- The query is executed on the selected KG.
- Results are returned as:
  - Raw query output
  - Natural‑language explanation
  - Interactive solution subgraph with edge evidence
- The user can inspect the prompt, approve or disapprove the query, and explore relationship evidence.
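The schema and prompt steps of this workflow can be sketched as follows. This is a simplified illustration, not the prompt CGEx actually sends. The function names, the prompt wording, and the `question`/`cypher` keys assumed for the few‑shot examples are all hypothetical, as are the labels in the usage note.

```python
from typing import Iterable, Mapping, Sequence, Tuple


def format_schema(labels: Iterable[str], triples: Sequence[Tuple[str, str, str]]) -> str:
    """Render node labels and (start, type, end) patterns as plain text."""
    lines = ["Node labels: " + ", ".join(sorted(labels))]
    lines.append("Relationship patterns:")
    lines += [f"(:{start})-[:{rel}]->(:{end})" for start, rel, end in triples]
    return "\n".join(lines)


def build_prompt(question: str, schema_text: str,
                 examples: Sequence[Mapping[str, str]]) -> str:
    """Combine instructions, schema, few-shot examples, and the question."""
    parts = [
        "Translate the question into a Cypher query.",
        "Use only the labels and relationship types in this schema:",
        schema_text,
    ]
    for ex in examples:
        # Assumed example format: {"question": ..., "cypher": ...}
        parts.append(f"Question: {ex['question']}\nCypher: {ex['cypher']}")
    # The prompt ends at "Cypher:" so the model completes the query.
    parts.append(f"Question: {question}\nCypher:")
    return "\n\n".join(parts)
```

With hypothetical labels, `format_schema(["Disease", "Gene"], [("Gene", "ASSOCIATED_WITH", "Disease")])` yields a schema text that `build_prompt` places before the examples. Restricting the prompt to the extracted schema is what keeps the generated query within labels and relationship types that exist in the selected KG.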
CGEx exposes:
- The exact prompt sent to the LLM
- The generated Cypher query
- Edge‑level evidence (PMID, citation, source)
- Graph structure identical to Neo4j Browser (nodes + relationships)
This makes CGEx suitable for research, clinical exploration, and hypothesis generation.
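To illustrate how edge‑level evidence can travel with the visualized subgraph, the sketch below converts a list of edges into the `elements` format used by Dash Cytoscape, where each element is a dictionary with a `data` entry. The input keys (`type`, `pmid`, `citation`, `evidence_source`) and the function name are assumptions for this example and may differ from the property names in the actual knowledge graphs.

```python
from typing import Mapping, Sequence


def to_cytoscape_elements(edges: Sequence[Mapping[str, str]]) -> list:
    """Build Dash Cytoscape elements (nodes first, then edges)."""
    nodes = {}
    for edge in edges:
        for key in ("source", "target"):
            name = edge[key]
            # setdefault keeps one node element per distinct name.
            nodes.setdefault(name, {"data": {"id": name, "label": name}})
    elements = list(nodes.values())
    for i, edge in enumerate(edges):
        elements.append({
            "data": {
                "id": f"e{i}",
                "source": edge["source"],  # Cytoscape: id of the start node
                "target": edge["target"],  # Cytoscape: id of the end node
                "label": edge["type"],
                # Evidence fields ride along and can be shown on click.
                "pmid": edge.get("pmid"),
                "citation": edge.get("citation"),
                "evidence_source": edge.get("evidence_source"),
            }
        })
    return elements
```

Because `source` already names the start node in Cytoscape edge data, the provenance of the evidence is stored under a separate key (`evidence_source` here) to avoid a clash.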
For any questions, suggestions, or collaborations, please contact:
Astha Anand
Email: astha.anand@scai.fraunhofer.de