GitCortex is a next-generation, stateful AI research agent designed to revolutionize how developers explore, analyze, and understand the GitHub ecosystem. Built on a foundation of Hierarchical Multi-Agent Orchestration and Corrective RAG (CRAG), it transforms ambiguous inquiries into precise technical insights.
GitCortex operates as a distributed system, separating heavy-duty AI reasoning from a high-performance, responsive user interface.
The core logic resides in a stateful graph-based engine. Unlike traditional chatbots, GitCortex executes a multi-step research loop:
- Hierarchical Intent Routing: A "Guardian" node classifies queries into Greetings, Technical Explanations, or deep GitHub Research to optimize latency.
- Corrective RAG (CRAG): Every document is critiqued for relevance. If data is found to be irrelevant, the agent automatically pivots to Tavily Web Search or triggers a Query Rewriter.
- Hybrid Knowledge Retrieval: Leverages Weaviate for localized vector expertise and GitHub MCP for live, real-time repository data.
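The hybrid retrieval step can be sketched in plain Python. This is a simplified stand-in, not the actual Weaviate or GitHub MCP client code: both sources are modeled as plain callables, and results are merged with duplicates removed.

```python
from typing import Callable

def hybrid_retrieve(query: str,
                    vector_search: Callable[[str], list[str]],
                    github_search: Callable[[str], list[str]]) -> list[str]:
    """Merge local vector-store hits with live GitHub results.

    Order is preserved and duplicate documents are dropped, so the
    downstream grader sees each candidate document exactly once.
    """
    merged: list[str] = []
    seen: set[str] = set()
    for doc in vector_search(query) + github_search(query):
        if doc not in seen:
            seen.add(doc)
            merged.append(doc)
    return merged
```

In the real pipeline the two callables would wrap the Weaviate query client and the GitHub MCP tool respectively.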
A premium dashboard built for technical exploration:
- Real-time Process Tracing: View the agent's "Processing Traces" to see the graph execution in real-time.
- Rich Markdown Rendering: Full support for syntax highlighting, tables, and complex documentation.
- Stateful Persistence: Deep integration with MongoDB for persistent conversation history and research threads.
- Identity Isolation: Salted and hashed authentication using `bcrypt` ensures private workspaces for every researcher.
- PII & Safety Guardrails: Integrated PIIMiddleware and Toxicity Filters screen every message before it reaches the reasoning engine.
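A PII screen like the one described above can be approximated with a few patterns. This is a hypothetical stand-in for the PIIMiddleware, not its actual implementation; production systems typically use dedicated PII-detection libraries rather than hand-rolled regexes.

```python
import re

# Illustrative patterns only -- a real PII filter would cover far more cases.
PII_PATTERNS = [
    re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),     # email addresses
    re.compile(r"\b\d[\d\s-]{7,}\d\b"),         # phone-number-like digit runs
]

def screen_message(text: str) -> bool:
    """Return True if the message is safe to forward to the reasoning engine."""
    return not any(pattern.search(text) for pattern in PII_PATTERNS)
```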
No more repeating yourself. GitCortex remembers context across sessions.
- MongoDB Snapshots: Every step of the agent's reasoning is saved as a persistent snapshot.
- Stateful Resumption: Close your browser and pick up exactly where you left off.
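The snapshot-and-resume behavior can be sketched with an in-memory store standing in for MongoDB. The function names and state shape here are illustrative, not GitCortex's actual persistence API: each reasoning step appends a snapshot under its `thread_id`, and resuming a thread loads the most recent one.

```python
from copy import deepcopy

# In-memory stand-in for the MongoDB snapshot collection.
_snapshots: dict[str, list[dict]] = {}

def save_snapshot(thread_id: str, state: dict) -> None:
    """Persist one step of the agent's reasoning for this thread."""
    _snapshots.setdefault(thread_id, []).append(deepcopy(state))

def resume(thread_id: str) -> dict:
    """Return the latest saved state, or a fresh state for a new thread."""
    history = _snapshots.get(thread_id)
    return deepcopy(history[-1]) if history else {"messages": []}
```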
Safety is integrated directly into the graph architecture.
- Audit Logging: Every safety check is tracked in the execution `steps` for full observability.
- Hallucination Shield: A final node verifies that the generation is strictly grounded in the retrieved facts.
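A naive version of such a grounding check can be written in a few lines. This is a toy word-overlap heuristic, assumed for illustration only; the actual Hallucination Shield uses an LLM verifier rather than token counting.

```python
def is_grounded(answer: str, context: list[str], threshold: float = 0.5) -> bool:
    """Toy grounding check: every sentence of the answer must share at
    least `threshold` of its words with the retrieved context."""
    context_words = set(" ".join(context).lower().split())
    for sentence in (s.strip() for s in answer.split(".")):
        words = set(sentence.lower().split())
        if words and len(words & context_words) / len(words) < threshold:
            return False  # sentence looks unsupported -> trigger regeneration
    return True
```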
Note: The visual architecture diagram for the graph is available in the `assets/` directory in the repository root.
GitCortex is powered by a StateGraph built with LangGraph. Instead of a simple linear chain, it operates as a state machine in which a central "State" object is passed through, and mutated by, a series of discrete reasoning nodes.
This architecture implements a strict Corrective Retrieval Augmented Generation (CRAG) workflow. This means the agent actively critiques its own findings and self-corrects before showing you the answer.
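The shape of that central State object might look roughly like the sketch below. The field names here are illustrative assumptions, not GitCortex's actual schema; in the real graph this would be the state type passed to LangGraph's `StateGraph`.

```python
from typing import TypedDict

class ResearchState(TypedDict):
    """Illustrative shape of the state object mutated by each node."""
    query: str                # the (possibly rewritten) user query
    chat_history: list[str]   # prior turns loaded from MongoDB
    documents: list[str]      # retrieved docs that survived grading
    web_search_needed: bool   # set by the grader when a fallback is required
    generation: str           # the final synthesized answer

def initial_state(query: str) -> ResearchState:
    return ResearchState(query=query, chat_history=[], documents=[],
                         web_search_needed=False, generation="")
```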
- State Initialization: The user's query and the `thread_id` are loaded. MongoDB fetches the conversational history to establish context.
- Contextualizer Node: If the query is a conversational follow-up (e.g., "how do I run it?"), an LLM actively rewrites the query into a standalone, explicit search term based on the chat history.
- Guardian Node (Safety & Routing): This acts as the system's firewall and traffic controller. It first runs strict Ethical AI guardrails (checking for PII, prompt injections, or toxicity). It then classifies the query:
- Greeting/Chatter: Routed directly to Generation for a fast, low-latency response.
- General Tech: Routed away from expensive GitHub calls to a general knowledge path.
- GitHub Research: Routed to the core CRAG pipeline for deep code analysis.
- Hybrid Retriever: The system queries two distinct sources simultaneously to build a comprehensive context window:
- Weaviate Vector DB: For semantic, localized domain expertise and embedded code snippets.
- GitHub MCP: Directly queries the live GitHub API for real-time repository data, PR statuses, and file contents.
- The Grader Node (Critique): This is the defining feature of Corrective RAG. An LLM acts as a strict evaluator, scoring every retrieved document with a binary "relevant" or "irrelevant" flag against the user's intent.
- Conditional Fallbacks (State Edges): Based on the Grader's score, the graph makes autonomous routing decisions:
- If documents are relevant, the state securely moves forward to Generation.
- If documents are missing or irrelevant, the Graph dynamically routes to a Web Search Fallback (using the Tavily API) to find external context, or triggers a Query Rewriter to optimize the internal search terms and loop back to the Retriever.
- Generator Node: An advanced LLM synthesizes the final answer using only the documents that successfully passed the Grader's critique phase. It meticulously formats the output into structured Markdown, complete with processing steps and source tracebacks.
- Hallucination Shield: Before the response is finalized and persisted to the MongoDB backend, a final verification node acts as an impartial fact-checker. It explicitly verifies that every technical claim in the generated answer is strictly grounded in the retrieved context. If an anomaly or hallucination is detected, the graph enforces a self-correction loop to force a safer, grounded generation.
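The two routing decisions in the workflow above, the Guardian's triage and the post-grading fallback, can be sketched as plain functions. In the real system these are LangGraph conditional edges backed by LLM classifiers; the keyword checks, route names, and rewrite limit below are assumptions made for illustration.

```python
GREETINGS = {"hi", "hello", "hey", "thanks"}

def guardian_route(query: str) -> str:
    """Toy stand-in for the Guardian node's intent classification."""
    q = query.lower()
    if q.strip("!?,. ") in GREETINGS:
        return "generate"          # fast path: skip retrieval entirely
    if "github.com" in q or "repo" in q:
        return "crag_pipeline"     # deep GitHub research path
    return "general_tech"          # general knowledge, no GitHub calls

def decide_after_grading(relevant_docs: list[str], rewrites: int,
                         max_rewrites: int = 2) -> str:
    """Conditional edge fired after the Grader node scores the documents."""
    if relevant_docs:
        return "generate"          # grounded context found, move forward
    if rewrites < max_rewrites:
        return "rewrite_query"     # loop back to the retriever
    return "web_search"            # Tavily fallback for external context
```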
```
.
├── backend/              # FastAPI Web Layer
│   ├── main.py           # Entry Point (Asynchronous Server)
│   ├── auth/             # JWT & Password Hashing Logic
│   ├── routers/          # API Endpoints (Chat, Auth, Threads)
│   └── services/         # Graph Orchestration Services
├── src/                  # Core Intelligence Library (LangGraph)
│   ├── graph/            # StateGraph & CRAG Logic Definitions
│   ├── mcp/              # Hybrid Retrieval (GitHub & Weaviate)
│   ├── nodes/            # Reasoning Nodes (Safety, Grade, Retrieve)
│   └── prompts/          # System-Level LLM Instructions
├── frontend/             # Next.js 16 Dashboard (Turbopack)
│   ├── src/app/          # App Router (Pages, Layout, Auth)
│   ├── src/components/   # Shared React UI & Chat Components
│   └── src/lib/          # API Integration Layer
├── tests/                # Backend & Logic Test Suite
├── assets/               # Architecture Diagrams & Media Assets
└── README.md             # Extensive Project Documentation
```
- Environment: Requires Python 3.13+.
- Install Dependencies: `pip install -r requirements.txt`
- Configuration: Populate `.env` with MongoDB, Weaviate, and GitHub API credentials.
- Launch: `uvicorn backend.main:app --reload`
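A `.env` for this setup might look like the sketch below. The variable names are illustrative assumptions; check the backend configuration code for the exact keys it reads.

```shell
# Illustrative variable names -- confirm against the backend config.
MONGODB_URI=mongodb://localhost:27017/gitcortex
WEAVIATE_URL=http://localhost:8080
WEAVIATE_API_KEY=your-weaviate-key
GITHUB_TOKEN=your-github-token
TAVILY_API_KEY=your-tavily-key
```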
- Environment: Requires Node.js 20+.
- Install Dependencies: `cd frontend && npm install`
- Launch: `npm run dev`
Distributed under the MIT License.
Special thanks to:
- LangChain / LangGraph Teams: For providing state-of-the-art orchestration frameworks.
- CRAG Researchers (Yan et al.): For the "Corrective Retrieval Augmented Generation" methodology.
- Next.js & ShadCN UI: For the premium web foundations.
GitCortex: Transforming the open-source ecosystem into a searchable, actionable knowledge graph.