An AI-powered developer assistant that analyzes GitHub repositories and answers questions about the codebase using Retrieval-Augmented Generation (RAG).
The system indexes repository files, performs semantic code search using embeddings and FAISS, and generates explanations with Google Gemini through a React chat interface.
- AI-powered codebase understanding
- Index any public GitHub repository
- Semantic code search using embeddings
- FAISS vector database for fast retrieval
- Google Gemini LLM for explanation generation
- Chat interface for multi-turn questions
- Source file references for answers
- Code snippet highlighting from retrieved context
User (React Chat UI)
↓
FastAPI Backend
↓
Repository Ingestion
↓
Code Parsing + Chunking
↓
Embedding Model
↓
FAISS Vector Database
↓
Semantic Code Search
↓
Gemini LLM
↓
Answer + Sources + Code Snippets
This architecture follows the Retrieval-Augmented Generation (RAG) pattern used in modern AI applications.
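The retrieval half of the pipeline above can be sketched in pure Python. This is a toy stand-in: a bag-of-words counter plays the role of the Sentence Transformers embedding model, and a plain list with cosine ranking plays the role of the FAISS index; the file paths and chunk texts are made up for illustration.

```python
import math
from collections import Counter

def embed(text):
    """Toy bag-of-words 'embedding' standing in for a real embedding model."""
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# Hypothetical indexed chunks; in the real system these come from the cloned repo.
chunks = [
    ("auth/security.py", "class OAuth2PasswordBearer handles token authentication"),
    ("routing/router.py", "APIRouter registers path operations"),
]
index = [(path, embed(text)) for path, text in chunks]  # stands in for FAISS

def retrieve(question, k=1):
    """Return the paths of the k chunks most similar to the question."""
    q = embed(question)
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [path for path, _ in ranked[:k]]

print(retrieve("where is authentication handled?"))  # → ['auth/security.py']
```

In the real service the retrieved chunk texts, not just the paths, are passed to the LLM as grounding context.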
Frontend
- React
- JavaScript
Backend
- FastAPI
- Python
AI / ML
- Sentence Transformers
- FAISS Vector Database
- Google Gemini API
Other Tools
- GitHub repository ingestion
- LangChain components
- REST API architecture
1. User provides a GitHub repository URL.
2. The backend clones the repository.
3. Code files are parsed and split into smaller chunks.
4. Each chunk is converted into vector embeddings.
5. Embeddings are stored in a FAISS vector database.
6. When a question is asked:
   - FAISS retrieves the most relevant code chunks.
   - The retrieved code is sent to the Gemini LLM.
7. The AI generates an explanation with source references and code snippets.
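Step 3 can be sketched with a simple line-based chunker. This is one common strategy, not necessarily what the service does; a production splitter might break on function or class boundaries instead. The overlap keeps context that straddles a chunk edge retrievable from both sides.

```python
def chunk_code(source: str, max_lines: int = 40, overlap: int = 5):
    """Split a source file into overlapping line-based chunks.

    Returns a list of dicts with the 1-based starting line of each chunk,
    so answers can cite where in the file a snippet came from.
    """
    lines = source.splitlines()
    step = max_lines - overlap
    chunks = []
    for start in range(0, max(len(lines), 1), step):
        text = "\n".join(lines[start:start + max_lines])
        if text:
            chunks.append({"start_line": start + 1, "text": text})
    return chunks
```

Each chunk is then embedded and stored alongside its file path and `start_line`, which is what makes source references possible later.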
Enter a GitHub repository URL:
https://github.com/tiangolo/fastapi
Click Index Repository.
Example questions:
Where is authentication implemented?
How does token validation work?
Explain the dependency injection system.
AI:
Authentication is implemented in the FastAPI security module.
Sources:
fastapi/security.py
Code Snippet:
class OAuth2PasswordBearer:
    def __init__(self, tokenUrl: str):
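An answer like the one above is produced by wrapping the retrieved chunks in a grounding prompt before calling the LLM. A minimal sketch of that assembly step (the exact prompt wording used by the service is an assumption here):

```python
def build_prompt(question: str, retrieved: list[tuple[str, str]]) -> str:
    """Assemble the grounding prompt sent to the LLM.

    `retrieved` is a list of (file_path, snippet) pairs returned by the
    vector search; citing the paths in the context is what lets the model
    reference its sources.
    """
    context = "\n\n".join(
        f"# File: {path}\n{snippet}" for path, snippet in retrieved
    )
    return (
        "You are a codebase assistant. Answer using only the context below, "
        "and cite the file paths you used.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )
```

The returned string is what gets passed to the Gemini API as the user message.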
ai-codebase-explainer
│
├── app
│ ├── api
│ │ └── routes.py
│ ├── services
│ │ ├── repo_service.py
│ │ ├── query_service.py
│ │ └── embedding_service.py
│ ├── vectorstore
│ │ └── faiss_store.py
│ └── main.py
│
├── frontend
│ └── React application
│
├── vector_db
│ └── FAISS index storage
│
├── repos
│ └── cloned repositories
│
├── requirements.txt
└── README.md
git clone https://github.com/znixxx30/ai-codebase-explainer.git
cd ai-codebase-explainer
Create a virtual environment:
python -m venv venv
Activate it (Windows):
venv\Scripts\activate
On macOS/Linux:
source venv/bin/activate
Install dependencies:
pip install -r requirements.txt
Create a .env file:
GEMINI_API_KEY=your_api_key_here
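The backend needs to read this key at startup. Projects typically use the `python-dotenv` package for this; a minimal stdlib stand-in that does the same job looks like:

```python
import os

def load_dotenv(path: str = ".env") -> None:
    """Read KEY=VALUE lines from a .env file into os.environ.

    Skips blank lines and comments, and does not overwrite variables
    that are already set in the environment.
    """
    with open(path) as fh:
        for line in fh:
            line = line.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            os.environ.setdefault(key.strip(), value.strip())
```

After loading, the key is available as `os.environ["GEMINI_API_KEY"]` for the Gemini client configuration.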
Run the backend:
uvicorn app.main:app --reload
Run the frontend:
cd frontend
npm install
npm start
Potential upgrades:
- Streaming AI responses
- Improved UI styling
- Repository caching
- Code syntax highlighting
- Support for private repositories
Built an AI-powered developer assistant using Retrieval-Augmented Generation (RAG). The system indexes GitHub repositories, performs semantic code search with embeddings and FAISS, and generates explanations with the Google Gemini LLM, all accessible through a React chat interface.
MIT License


