GitCortex: The Intelligent GitHub Research Agent πŸ§ πŸš€

GitCortex is a next-generation, stateful AI research agent designed to revolutionize how developers explore, analyze, and understand the GitHub ecosystem. Built on a foundation of Hierarchical Multi-Agent Orchestration and Corrective RAG (CRAG), it transforms ambiguous inquiries into precise technical insights.

Tech stack: Python · Next.js · FastAPI · LangGraph · LangChain · MongoDB


πŸ›οΈ System Architecture

GitCortex operates as a distributed system, separating heavy-duty AI reasoning from a high-performance, responsive user interface.

1. Unified Intelligence Backend (FastAPI + LangGraph)

The core logic resides in a stateful graph-based engine. Unlike traditional chatbots, GitCortex executes a multi-step research loop:

  • Hierarchical Intent Routing: A "Guardian" node classifies queries into Greetings, Technical Explanations, or deep GitHub Research to optimize latency.
  • Corrective RAG (CRAG): Every document is critiqued for relevance. If data is found to be irrelevant, the agent automatically pivots to Tavily Web Search or triggers a Query Rewriter.
  • Hybrid Knowledge Retrieval: Leverages Weaviate for localized vector expertise and GitHub MCP for live, real-time repository data.
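The multi-step research loop above can be sketched in plain Python as a small state machine (this is not the actual LangGraph API; node names, the router, and the state shape are illustrative stand-ins for the project's real graph):

```python
# Plain-Python sketch of the stateful research loop: a mutable state dict
# is passed through named nodes until a router function returns "END".

def run_loop(state, nodes, router, max_hops=6):
    """Execute nodes in sequence, letting the router pick the next hop."""
    current = "guardian"
    for _ in range(max_hops):
        state = nodes[current](state)
        current = router(current, state)
        if current == "END":
            return state
    return state  # safety valve against infinite loops

# Illustrative nodes: classify, retrieve, then generate.
nodes = {
    "guardian": lambda s: {**s, "route": "research" if "repo" in s["query"] else "chat"},
    "retrieve": lambda s: {**s, "docs": ["snippet about " + s["query"]]},
    "generate": lambda s: {**s, "answer": "grounded answer"},
}

def router(node, state):
    if node == "guardian":
        # Greetings skip retrieval entirely; research queries take the CRAG path.
        return "retrieve" if state["route"] == "research" else "generate"
    if node == "retrieve":
        return "generate"
    return "END"

result = run_loop({"query": "explain this repo"}, nodes, router)
```

In the real system the router decisions correspond to LangGraph conditional edges, and the state dict is the central "State" object described below.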

2. Modern Reactive Frontend (Next.js 16 + Radix UI)

A premium dashboard built for technical exploration:

  • Real-time Process Tracing: View the agent's "Processing Traces" to see the graph execution in real-time.
  • Rich Markdown Rendering: Full support for syntax highlighting, tables, and complex documentation.
  • Stateful Persistence: Deep integration with MongoDB for persistent conversation history and research threads.

🌟 Key Features

πŸ” Multi-User Security

  • Identity Isolation: Salted & hashed authentication using bcrypt ensures private workspaces for every researcher.
  • PII & Safety Guardrails: Integrated PIIMiddleware and Toxicity Filters screen every message before it reaches the reasoning engine.

🧠 Persistent Conversation Memory

No more repeating yourself. GitCortex remembers context across sessions.

  • MongoDB Snapshots: Every step of the agent's reasoning is saved as a persistent snapshot.
  • Stateful Resumption: Close your browser and pick up exactly where you left off.
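The snapshot-and-resume behavior can be sketched like this (an in-memory dict stands in for MongoDB; function names are illustrative, not the project's real service layer):

```python
# Each reasoning step is saved as a snapshot keyed by thread_id, so a
# conversation can be resumed from its latest persisted state.

snapshots: dict[str, list[dict]] = {}

def save_snapshot(thread_id: str, state: dict) -> None:
    """Append a copy of the current state to the thread's history."""
    snapshots.setdefault(thread_id, []).append(dict(state))

def resume(thread_id: str) -> dict:
    """Return the latest saved state, or a fresh one for a new thread."""
    history = snapshots.get(thread_id)
    return dict(history[-1]) if history else {"messages": []}

# First session: ask a question, persist the state.
state = resume("thread-42")
state["messages"].append("user: explain CRAG")
save_snapshot("thread-42", state)

# "Close the browser", then pick up exactly where you left off.
restored = resume("thread-42")
```

In the actual backend the same pattern runs against MongoDB, so snapshots survive process restarts as well as browser sessions.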

πŸ›‘οΈ Ethical AI Guardrails

Safety is integrated directly into the graph architecture.

  • Audit Logging: Every safety check is tracked in the execution steps for full observability.
  • Hallucination Shield: A final node verifies that the generation is strictly grounded in the retrieved facts.

πŸ—οΈ StateGraph Architecture & CRAG Logic

Note: For the visual architecture diagram of the graph, see the assets/ directory in the repository root.

GitCortex is powered by a StateGraph built with LangGraph. Instead of a simple linear chain, it operates as a state machine where a central "State" object is passed and mutated through a series of discrete, rational nodes.

This architecture implements a strict Corrective Retrieval Augmented Generation (CRAG) workflow. This means the agent actively critiques its own findings and self-corrects before showing you the answer.

1. The Pre-Processing Phase

  • State Initialization: The user's query and the thread_id are loaded. MongoDB fetches the conversational history to establish context.
  • Contextualizer Node: If the query is a conversational follow-up (e.g., "how do I run it?"), an LLM actively rewrites the query into a standalone, explicit search term based on the chat history.
  • Guardian Node (Safety & Routing): This acts as the system's firewall and traffic controller. It first runs strict Ethical AI guardrails (checking for PII, prompt injections, or toxicity). It then classifies the query:
    • Greeting/Chatter: Routed directly to Generation for a fast, low-latency response.
    • General Tech: Routed away from expensive GitHub calls to a general knowledge path.
    • GitHub Research: Routed to the core CRAG pipeline for deep code analysis.
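The Guardian node's firewall-then-router behavior can be sketched in a few lines (the regexes and keyword sets below are illustrative placeholders; the real node uses the project's PIIMiddleware, toxicity filters, and an LLM classifier):

```python
import re

# Illustrative PII patterns: an email address and a US-SSN-like string.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.\w+")
SSN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")

def guardian(query: str) -> str:
    """Run safety guardrails first, then classify the query's intent."""
    if EMAIL.search(query) or SSN.search(query):
        return "blocked"                      # guardrail trips before routing
    words = set(query.lower().split())
    if words & {"hi", "hello", "hey", "thanks"}:
        return "greeting"                     # fast, low-latency path
    if words & {"github", "repo", "repository", "commit"}:
        return "github_research"              # deep CRAG pipeline
    return "general_tech"                     # cheaper general-knowledge path
```

The return value maps to a conditional edge in the graph, so unsafe queries never reach the reasoning engine at all.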

2. The Retrieval & Critique Phase (CRAG)

  • Hybrid Retriever: The system queries two distinct sources simultaneously to build a comprehensive context window:
    • Weaviate Vector DB: For semantic, localized domain expertise and embedded code snippets.
    • GitHub MCP: Directly queries the live GitHub API for real-time repository data, PR statuses, and file contents.
  • The Grader Node (Critique): This is the defining feature of Corrective RAG. An LLM acts as a strict evaluator, scoring every retrieved document with a binary "relevant" or "irrelevant" flag against the user's intent.
  • Conditional Fallbacks (State Edges): Based on the Grader's score, the graph makes autonomous routing decisions:
    • If documents are relevant, the state moves forward to Generation.
    • If documents are missing or irrelevant, the Graph dynamically routes to a Web Search Fallback (using the Tavily API) to find external context, or triggers a Query Rewriter to optimize the internal search terms and loop back to the Retriever.
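The Grader-plus-fallback logic above can be sketched in plain Python. In the real graph an LLM produces the binary relevance flag; simple keyword overlap stands in for it here, and the route labels are illustrative:

```python
def grade(doc: str, query: str) -> str:
    """Stand-in for the LLM Grader: binary relevant/irrelevant flag."""
    overlap = set(doc.lower().split()) & set(query.lower().split())
    return "relevant" if len(overlap) >= 2 else "irrelevant"

def decide_next(docs: list[str], query: str) -> str:
    """Conditional edge: route based on the Grader's verdicts."""
    kept = [d for d in docs if grade(d, query) == "relevant"]
    if kept:
        return "generate"        # grounded context exists: move forward
    if docs:
        return "rewrite_query"   # results exist but miss the intent
    return "web_search"          # nothing retrieved: fall back to web search
```

The `rewrite_query` branch loops back to the retriever with optimized search terms, while `web_search` corresponds to the Tavily fallback described above.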

3. The Grounded Generation Phase

  • Generator Node: An advanced LLM synthesizes the final answer using only the documents that successfully passed the Grader's critique phase. It meticulously formats the output into structured Markdown, complete with processing steps and source tracebacks.
  • Hallucination Shield: Before the response is finalized and persisted to the MongoDB backend, a final verification node acts as an impartial fact-checker. It explicitly verifies that every technical claim in the generated answer is strictly grounded in the retrieved context. If an anomaly or hallucination is detected, the graph enforces a self-correction loop to force a safer, grounded generation.
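The grounding check performed by the Hallucination Shield can be sketched as follows. A real shield uses an LLM judge to verify claims against the retrieved context; lexical support against the context vocabulary stands in for it here:

```python
def is_grounded(answer: str, context: str, threshold: float = 0.5) -> bool:
    """Every sentence of the draft answer must be sufficiently supported
    by the retrieved context, or the whole answer fails the check."""
    ctx_words = set(context.lower().split())
    for sentence in filter(None, (s.strip() for s in answer.split("."))):
        words = sentence.lower().split()
        support = sum(w in ctx_words for w in words) / len(words)
        if support < threshold:
            return False  # an unsupported claim triggers regeneration
    return True

context = "the repo uses fastapi and mongodb for persistence"
ok = is_grounded("The repo uses fastapi and mongodb.", context)
```

When the check fails, the graph routes back to the Generator node instead of persisting the answer, which is the self-correction loop described above.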

πŸ“‚ Project Structure

.
β”œβ”€β”€ backend/                # FastAPI Web Layer
β”‚   β”œβ”€β”€ main.py             # Entry Point (Asynchronous Server)
β”‚   β”œβ”€β”€ auth/               # JWT & Password Hashing Logic
β”‚   β”œβ”€β”€ routers/            # API Endpoints (Chat, Auth, Threads)
β”‚   └── services/           # Graph Orchestration Services
β”œβ”€β”€ src/                    # Core Intelligence Library (LangGraph)
β”‚   β”œβ”€β”€ graph/              # StateGraph & CRAG Logic Definitions
β”‚   β”œβ”€β”€ mcp/                # Hybrid Retrieval (GitHub & Weaviate)
β”‚   β”œβ”€β”€ nodes/              # Rational Reasoning Nodes (Safety, Grade, Retrieve)
β”‚   └── prompts/            # System-Level LLM Instructions
β”œβ”€β”€ frontend/               # Next.js 16 Dashboard (Turbopack)
β”‚   β”œβ”€β”€ src/app/            # App Router (Pages, Layout, Auth)
β”‚   β”œβ”€β”€ src/components/     # Shared React UI & Chat Components
β”‚   └── src/lib/            # API Integration Layer
β”œβ”€β”€ tests/                  # Backend & Logic Test Suite
β”œβ”€β”€ assets/                 # Architecture Diagrams & Media Assets
└── README.md               # Extensive Project Documentation

πŸ› οΈ Setup & Installation

Backend (Python)

  1. Environment: Requires Python 3.13+.
  2. Install Dependencies: pip install -r requirements.txt
  3. Configuration: Populate .env with MongoDB, Weaviate, and GitHub API credentials.
  4. Launch: uvicorn backend.main:app --reload

Frontend (Node.js)

  1. Environment: Requires Node.js 20+.
  2. Install Dependencies: cd frontend && npm install
  3. Launch: npm run dev

πŸ“œ License & Credits

Distributed under the MIT License.

Special thanks to:

  • LangChain / LangGraph Teams: For providing state-of-the-art orchestration frameworks.
  • CRAG Researchers (Yan et al.): For the "Corrective Retrieval Augmented Generation" methodology.
  • Next.js & ShadCN UI: For the premium web foundations.

GitCortex: Transforming the open-source ecosystem into a searchable, actionable knowledge graph.
