GitCortex: The Intelligent GitHub Research Agent πŸ§ πŸš€

GitCortex is a next-generation, stateful AI research agent designed to revolutionize how developers explore, analyze, and understand the GitHub ecosystem. Built on a foundation of Hierarchical Multi-Agent Orchestration and Corrective RAG (CRAG), it transforms ambiguous inquiries into precise technical insights.

Tech stack: Python · Next.js · FastAPI · LangGraph · LangChain · MongoDB


πŸ›οΈ System Architecture

GitCortex operates as a distributed system, separating heavy-duty AI reasoning from a high-performance, responsive user interface.

1. Unified Intelligence Backend (FastAPI + LangGraph)

The core logic resides in a stateful graph-based engine. Unlike traditional chatbots, GitCortex executes a multi-step research loop:

  • Hierarchical Intent Routing: A "Guardian" node classifies queries into Greetings, Technical Explanations, or deep GitHub Research to optimize latency.
  • Corrective RAG (CRAG): Every document is critiqued for relevance. If data is found to be irrelevant, the agent automatically pivots to Tavily Web Search or triggers a Query Rewriter.
  • Hybrid Knowledge Retrieval: Leverages Weaviate for localized vector expertise and GitHub MCP for live, real-time repository data.
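The multi-step research loop above can be sketched in plain Python as a small state machine (this is not the actual LangGraph API; node names, the router, and the state shape are illustrative stand-ins for the project's real graph):

```python
# Plain-Python sketch of the stateful research loop: a mutable state dict
# is passed through named nodes until a router function returns "END".

def run_loop(state, nodes, router, max_hops=6):
    """Execute nodes in sequence, letting the router pick the next hop."""
    current = "guardian"
    for _ in range(max_hops):
        state = nodes[current](state)
        current = router(current, state)
        if current == "END":
            return state
    return state  # safety valve against infinite loops

# Illustrative nodes: classify, retrieve, then generate.
nodes = {
    "guardian": lambda s: {**s, "route": "research" if "repo" in s["query"] else "chat"},
    "retrieve": lambda s: {**s, "docs": ["snippet about " + s["query"]]},
    "generate": lambda s: {**s, "answer": "grounded answer"},
}

def router(node, state):
    if node == "guardian":
        # Greetings skip retrieval entirely; research queries take the CRAG path.
        return "retrieve" if state["route"] == "research" else "generate"
    if node == "retrieve":
        return "generate"
    return "END"

result = run_loop({"query": "explain this repo"}, nodes, router)
```

In the real system the router decisions correspond to LangGraph conditional edges, and the state dict is the central "State" object described below.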

2. Modern Reactive Frontend (Next.js 16 + Radix UI)

A premium dashboard built for technical exploration:

  • Real-time Process Tracing: View the agent's "Processing Traces" to see the graph execution in real-time.
  • Rich Markdown Rendering: Full support for syntax highlighting, tables, and complex documentation.
  • Stateful Persistence: Deep integration with MongoDB for persistent conversation history and research threads.

🌟 Key Features

πŸ” Multi-User Security

  • Identity Isolation: Salted & hashed authentication using bcrypt ensures private workspaces for every researcher.
  • PII & Safety Guardrails: Integrated PIIMiddleware and Toxicity Filters screen every message before it reaches the reasoning engine.

🧠 Persistent Conversation Memory

No more repeating yourself. GitCortex remembers context across sessions.

  • MongoDB Snapshots: Every step of the agent's reasoning is saved as a persistent snapshot.
  • Stateful Resumption: Close your browser and pick up exactly where you left off.
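The snapshot-and-resume behavior can be sketched like this (an in-memory dict stands in for MongoDB; function names are illustrative, not the project's real service layer):

```python
# Each reasoning step is saved as a snapshot keyed by thread_id, so a
# conversation can be resumed from its latest persisted state.

snapshots: dict[str, list[dict]] = {}

def save_snapshot(thread_id: str, state: dict) -> None:
    """Append a copy of the current state to the thread's history."""
    snapshots.setdefault(thread_id, []).append(dict(state))

def resume(thread_id: str) -> dict:
    """Return the latest saved state, or a fresh one for a new thread."""
    history = snapshots.get(thread_id)
    return dict(history[-1]) if history else {"messages": []}

# First session: ask a question, persist the state.
state = resume("thread-42")
state["messages"].append("user: explain CRAG")
save_snapshot("thread-42", state)

# "Close the browser", then pick up exactly where you left off.
restored = resume("thread-42")
```

In the actual backend the same pattern runs against MongoDB, so snapshots survive process restarts as well as browser sessions.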

πŸ›‘οΈ Ethical AI Guardrails

Safety is integrated directly into the graph architecture.

  • Audit Logging: Every safety check is tracked in the execution steps for full observability.
  • Hallucination Shield: A final node verifies that the generation is strictly grounded in the retrieved facts.

πŸ—οΈ StateGraph Architecture & CRAG Logic

Note: For the visual architecture diagram of the graph, see the assets/ directory in the repository root.

GitCortex is powered by a StateGraph built with LangGraph. Instead of a simple linear chain, it operates as a state machine where a central "State" object is passed and mutated through a series of discrete, rational nodes.

This architecture implements a strict Corrective Retrieval Augmented Generation (CRAG) workflow. This means the agent actively critiques its own findings and self-corrects before showing you the answer.

1. The Pre-Processing Phase

  • State Initialization: The user's query and the thread_id are loaded. MongoDB fetches the conversational history to establish context.
  • Contextualizer Node: If the query is a conversational follow-up (e.g., "how do I run it?"), an LLM actively rewrites the query into a standalone, explicit search term based on the chat history.
  • Guardian Node (Safety & Routing): This acts as the system's firewall and traffic controller. It first runs strict Ethical AI guardrails (checking for PII, prompt injections, or toxicity). It then classifies the query:
    • Greeting/Chatter: Routed directly to Generation for a fast, low-latency response.
    • General Tech: Routed away from expensive GitHub calls to a general knowledge path.
    • GitHub Research: Routed to the core CRAG pipeline for deep code analysis.
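The Guardian node's firewall-then-router behavior can be sketched in a few lines (the regexes and keyword sets below are illustrative placeholders; the real node uses the project's PIIMiddleware, toxicity filters, and an LLM classifier):

```python
import re

# Illustrative PII patterns: an email address and a US-SSN-like string.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.\w+")
SSN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")

def guardian(query: str) -> str:
    """Run safety guardrails first, then classify the query's intent."""
    if EMAIL.search(query) or SSN.search(query):
        return "blocked"                      # guardrail trips before routing
    words = set(query.lower().split())
    if words & {"hi", "hello", "hey", "thanks"}:
        return "greeting"                     # fast, low-latency path
    if words & {"github", "repo", "repository", "commit"}:
        return "github_research"              # deep CRAG pipeline
    return "general_tech"                     # cheaper general-knowledge path
```

The return value maps to a conditional edge in the graph, so unsafe queries never reach the reasoning engine at all.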

2. The Retrieval & Critique Phase (CRAG)

  • Hybrid Retriever: The system queries two distinct sources simultaneously to build a comprehensive context window:
    • Weaviate Vector DB: For semantic, localized domain expertise and embedded code snippets.
    • GitHub MCP: Directly queries the live GitHub API for real-time repository data, PR statuses, and file contents.
  • The Grader Node (Critique): This is the defining feature of Corrective RAG. An LLM acts as a strict evaluator, scoring every retrieved document with a binary "relevant" or "irrelevant" flag against the user's intent.
  • Conditional Fallbacks (State Edges): Based on the Grader's score, the graph makes autonomous routing decisions:
    • If documents are relevant, the state moves forward to Generation.
    • If documents are missing or irrelevant, the Graph dynamically routes to a Web Search Fallback (using the Tavily API) to find external context, or triggers a Query Rewriter to optimize the internal search terms and loop back to the Retriever.
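The Grader-plus-fallback logic above can be sketched in plain Python. In the real graph an LLM produces the binary relevance flag; simple keyword overlap stands in for it here, and the route labels are illustrative:

```python
def grade(doc: str, query: str) -> str:
    """Stand-in for the LLM Grader: binary relevant/irrelevant flag."""
    overlap = set(doc.lower().split()) & set(query.lower().split())
    return "relevant" if len(overlap) >= 2 else "irrelevant"

def decide_next(docs: list[str], query: str) -> str:
    """Conditional edge: route based on the Grader's verdicts."""
    kept = [d for d in docs if grade(d, query) == "relevant"]
    if kept:
        return "generate"        # grounded context exists: move forward
    if docs:
        return "rewrite_query"   # results exist but miss the intent
    return "web_search"          # nothing retrieved: fall back to web search
```

The `rewrite_query` branch loops back to the retriever with optimized search terms, while `web_search` corresponds to the Tavily fallback described above.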

3. The Grounded Generation Phase

  • Generator Node: An advanced LLM synthesizes the final answer using only the documents that successfully passed the Grader's critique phase. It meticulously formats the output into structured Markdown, complete with processing steps and source tracebacks.
  • Hallucination Shield: Before the response is finalized and persisted to the MongoDB backend, a final verification node acts as an impartial fact-checker. It explicitly verifies that every technical claim in the generated answer is strictly grounded in the retrieved context. If an anomaly or hallucination is detected, the graph enforces a self-correction loop to force a safer, grounded generation.
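The grounding check performed by the Hallucination Shield can be sketched as follows. A real shield uses an LLM judge to verify claims against the retrieved context; lexical support against the context vocabulary stands in for it here:

```python
def is_grounded(answer: str, context: str, threshold: float = 0.5) -> bool:
    """Every sentence of the draft answer must be sufficiently supported
    by the retrieved context, or the whole answer fails the check."""
    ctx_words = set(context.lower().split())
    for sentence in filter(None, (s.strip() for s in answer.split("."))):
        words = sentence.lower().split()
        support = sum(w in ctx_words for w in words) / len(words)
        if support < threshold:
            return False  # an unsupported claim triggers regeneration
    return True

context = "the repo uses fastapi and mongodb for persistence"
ok = is_grounded("The repo uses fastapi and mongodb.", context)
```

When the check fails, the graph routes back to the Generator node instead of persisting the answer, which is the self-correction loop described above.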

πŸ“‚ Project Structure

.
β”œβ”€β”€ backend/                # FastAPI Web Layer
β”‚   β”œβ”€β”€ main.py             # Entry Point (Asynchronous Server)
β”‚   β”œβ”€β”€ auth/               # JWT & Password Hashing Logic
β”‚   β”œβ”€β”€ routers/            # API Endpoints (Chat, Auth, Threads)
β”‚   └── services/           # Graph Orchestration Services
β”œβ”€β”€ src/                    # Core Intelligence Library (LangGraph)
β”‚   β”œβ”€β”€ graph/              # StateGraph & CRAG Logic Definitions
β”‚   β”œβ”€β”€ mcp/                # Hybrid Retrieval (GitHub & Weaviate)
β”‚   β”œβ”€β”€ nodes/              # Rational Reasoning Nodes (Safety, Grade, Retrieve)
β”‚   └── prompts/            # System-Level LLM Instructions
β”œβ”€β”€ frontend/               # Next.js 16 Dashboard (Turbopack)
β”‚   β”œβ”€β”€ src/app/            # App Router (Pages, Layout, Auth)
β”‚   β”œβ”€β”€ src/components/     # Shared React UI & Chat Components
β”‚   └── src/lib/            # API Integration Layer
β”œβ”€β”€ tests/                  # Backend & Logic Test Suite
β”œβ”€β”€ assets/                 # Architecture Diagrams & Media Assets
└── README.md               # Extensive Project Documentation

πŸ› οΈ Setup & Installation

Backend (Python)

  1. Environment: Requires Python 3.13+.
  2. Install Dependencies: pip install -r requirements.txt
  3. Configuration: Populate .env with MongoDB, Weaviate, and GitHub API credentials.
  4. Launch: uvicorn backend.main:app --reload

Frontend (Node.js)

  1. Environment: Requires Node.js 20+.
  2. Install Dependencies: cd frontend && npm install
  3. Launch: npm run dev

πŸ“œ License & Credits

Distributed under the MIT License.

Special thanks to:

  • LangChain / LangGraph Teams: For providing state-of-the-art orchestration frameworks.
  • CRAG Researchers (Yan et al.): For the "Corrective Retrieval Augmented Generation" methodology.
  • Next.js & ShadCN UI: For the premium web foundations.

GitCortex: Transforming the open-source ecosystem into a searchable, actionable knowledge graph.
