Code Expert is a free, open-source AI assistant that lets developers input any public GitHub repository URL and instantly ask questions about its code structure, dependencies, and functionality. Powered by advanced RAG (Retrieval Augmented Generation) and Google's Gemini 2.5 Pro, it provides accurate, context-grounded answers to help you explore, understand, and onboard to complex codebases with ease.
- Real-Time Chat: Interact directly with your indexed code and get answers in seconds.
- 100% Free & Open Source: Free to use and modify. Self-hosting or local setup requires your own API keys for Supabase and Google Gemini (see Environment Variables).
- Dual RAG Engines: Compare "Base RAG" (pure semantic similarity) and "Filtered RAG" (semantic similarity with keyword filtering) for optimal answers tailored to your needs.
- Supports Multiple Languages: Automatically chunks and indexes code from various languages including Python, JavaScript, Java, C++, TypeScript, Markdown, and more.
- Powered by Gemini 2.5 Pro: Leverages Google's latest generative AI model for superior code understanding and response generation.
- Supabase Integration: Utilizes Supabase for efficient vector storage and retrieval of code chunks.
- Netlify Functions: Backend logic is deployed as serverless functions on Netlify, ensuring scalability and ease of deployment.
- Intuitive UI: A clean and responsive user interface built with React and Tailwind CSS.
Follow these steps to set up and run Code Expert locally or deploy it.
Before you begin, ensure you have the following installed:
- Node.js: Version 18 or higher.
- npm: Version 8 or higher (comes with Node.js).
- Git: For cloning the repository.
- A Supabase Project: You'll need a Supabase project URL and a `service_role` key.
- A Google Cloud Project: With the Generative AI API enabled and an API key.
- Clone the repository:

  ```bash
  git clone https://github.com/AryamanGupta001/Code-Expert.git
  cd Code-Expert
  ```

- Install dependencies:

  ```bash
  npm install
  ```
Code Expert relies on several environment variables for its functionality.
- Create a `.env` file: Copy the `.env.example` file to `.env` in the root of your project:

  ```bash
  cp .env.example .env
  ```
- Populate `.env`: Open the newly created `.env` file and fill in your credentials:

  ```bash
  # .env
  GITHUB_TOKEN=YOUR_GITHUB_TOKEN_HERE   # Optional: For cloning private repositories
  SUPABASE_URL=YOUR_SUPABASE_URL_HERE
  SUPABASE_SERVICE_KEY=YOUR_SUPABASE_SERVICE_KEY_HERE
  GEMINI_API_KEY=YOUR_GEMINI_API_KEY_HERE
  ```
- `GITHUB_TOKEN`: (Optional) A GitHub Personal Access Token with `repo` scope if you plan to index private repositories.
- `SUPABASE_URL`: Your Supabase project URL, found in your Supabase project settings.
- `SUPABASE_SERVICE_KEY`: Your Supabase `service_role` key, found under Project Settings > API Keys in your Supabase dashboard. Keep this key secure and do not expose it in client-side code.
- `GEMINI_API_KEY`: Your Google AI Studio API key with access to the Gemini API. Ensure the Generative Language API is enabled in your Google Cloud project.
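Since a missing key typically fails only when a request arrives, a small guard at function startup can surface configuration errors early. This is a sketch, not code from the project; the variable names come from the `.env` example above:

```javascript
// Minimal sketch: detect missing required environment variables up front.
// GITHUB_TOKEN is intentionally excluded because it is optional.
const REQUIRED_VARS = ["SUPABASE_URL", "SUPABASE_SERVICE_KEY", "GEMINI_API_KEY"];

function missingEnvVars(env, required = REQUIRED_VARS) {
  return required.filter((name) => !env[name] || env[name].trim() === "");
}

// Usage (e.g., at the top of a Netlify Function handler):
//   const missing = missingEnvVars(process.env);
//   if (missing.length > 0) {
//     throw new Error(`Missing environment variables: ${missing.join(", ")}`);
//   }
```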
- Netlify Environment Variables (for deployment): If deploying to Netlify, you must also configure these environment variables in your Netlify dashboard under Site Settings > Build & deploy > Environment. Add each key exactly as above.
Code Expert uses a PostgreSQL database with the pgvector extension for storing and querying code embeddings.
- Enable the `pgvector` extension: In your Supabase project, navigate to Database > Extensions and enable `pgvector`.

- Create the `code_chunks` table and `match_chunks_by_embedding` function: Go to your Supabase SQL Editor and run the following SQL commands (the table alias `c` avoids ambiguity between the table's columns and the function's output columns):

  ```sql
  create extension if not exists vector;

  create table if not exists code_chunks (
    id uuid primary key default gen_random_uuid(),
    repo_id text not null,
    file_path text not null,
    content text not null,
    embedding vector(768), -- Default for Xenova/microsoft-codebert-base
    metadata jsonb,
    created_at timestamp with time zone default now()
  );

  create function match_chunks_by_embedding(
    query_embedding vector(768), -- Matches the embedding model dimension
    repo_filter text,
    k int
  )
  returns table (
    id uuid,
    repo_id text,
    file_path text,
    content text,
    embedding vector(768), -- Matches the embedding model dimension
    metadata jsonb,
    created_at timestamp with time zone,
    distance float
  ) as $$
  begin
    return query
    select c.*, (c.embedding <=> query_embedding) as distance
    from code_chunks c
    where c.repo_id = repo_filter
    order by c.embedding <=> query_embedding
    limit k;
  end;
  $$ language plpgsql;
  ```
Note: The `embedding` column type and the `query_embedding` parameter in the `match_chunks_by_embedding` function are set to `vector(768)` to match the `Xenova/microsoft-codebert-base` model used in this project. If you change the embedding model, update these dimensions accordingly.
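For intuition, the `<=>` operator used in the SQL above is pgvector's cosine distance operator. A plain JavaScript equivalent of what it computes per row:

```javascript
// Illustrative sketch of pgvector's `<=>` (cosine distance) operator.
// Lower distance means more similar; the SQL function orders results by it.
function cosineDistance(a, b) {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return 1 - dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

cosineDistance([1, 0], [1, 0]); // → 0 (identical direction)
cosineDistance([1, 0], [0, 1]); // → 1 (orthogonal)
```

In the real system, `a` and `b` would be 768-dimensional embedding vectors, per the note above.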
To run the application locally, you'll use Netlify CLI to serve both the frontend and the Netlify Functions.
- Install Netlify CLI (if you haven't already):

  ```bash
  npm install -g netlify-cli
  ```

- Start the development server:

  ```bash
  netlify dev
  ```

  This will typically start the frontend at `http://localhost:8888` and expose your Netlify Functions at `http://localhost:8888/.netlify/functions/<functionName>`.
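When scripting against the local functions, a tiny helper can build those URLs. This is an illustrative sketch, and the function name in the example is hypothetical:

```javascript
// Build the URL of a Netlify Function as exposed by `netlify dev`.
// Base URL and path pattern follow the defaults described above.
function functionUrl(name, base = "http://localhost:8888") {
  return `${base}/.netlify/functions/${name}`;
}

functionUrl("process-repo");
// → "http://localhost:8888/.netlify/functions/process-repo"
```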
- Open the application in your browser (e.g., `http://localhost:8888`).
- Navigate to the "Live Demo" section.
- Paste a public GitHub repository URL (e.g., `https://github.com/facebook/react`) into the input field.
- Click the "Process Repo" button.
- The application will clone the repository, chunk its code files, generate embeddings, and store them in your Supabase database. This process can take some time depending on the size of the repository. A success message will appear once indexing is complete.
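The chunking step in that pipeline (clone → chunk → embed → store) can be sketched as follows. The window size, overlap, and metadata fields below are assumptions for illustration; the project's actual chunker may split code differently:

```javascript
// Naive illustration of code chunking: fixed-size, overlapping line windows.
// linesPerChunk and overlap are arbitrary assumed defaults.
function chunkFile(filePath, content, linesPerChunk = 40, overlap = 5) {
  const lines = content.split("\n");
  const chunks = [];
  const step = linesPerChunk - overlap; // advance per window
  for (let start = 0; start < lines.length; start += step) {
    chunks.push({
      file_path: filePath,
      content: lines.slice(start, start + linesPerChunk).join("\n"),
      metadata: { start_line: start + 1 }, // 1-based start line
    });
    if (start + linesPerChunk >= lines.length) break; // last window reached EOF
  }
  return chunks;
}
```

Each chunk object mirrors the `file_path`, `content`, and `metadata` columns of the `code_chunks` table; in the real pipeline each chunk's `content` would then be embedded and inserted into Supabase.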
Once a repository has been successfully indexed:
- The chat interface will automatically appear below the "Live Demo" section.
- Type your question about the codebase into the input field (e.g., "What does the `UserService` do?").
- Choose between the "Base RAG" and "Filtered RAG" variants:
- Base RAG: Retrieves code chunks purely based on semantic similarity to your question.
- Filtered RAG: Applies additional keyword filtering to prioritize more specific and relevant chunks, often leading to more precise answers for detailed questions.
- Click the "Send" button.
- Code Expert will retrieve relevant code snippets, use them as context for Gemini 2.5 Pro, and provide a grounded answer. You can also view the metrics (context relevance, groundedness) and the source files used to generate the answer.
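The difference between the two variants can be made concrete with a toy keyword filter applied after semantic retrieval. The keyword extraction regex and fallback behavior below are assumptions for illustration, not the project's actual heuristic:

```javascript
// Toy sketch of the kind of keyword filtering "Filtered RAG" layers on top
// of semantic retrieval. Base RAG would return `chunks` unchanged.
function keywordFilter(question, chunks) {
  // Assumed heuristic: lowercase identifier-like words of 3+ characters.
  const keywords = question.toLowerCase().match(/[a-z_][a-z0-9_]{2,}/g) || [];
  const filtered = chunks.filter((chunk) =>
    keywords.some((kw) => chunk.content.toLowerCase().includes(kw))
  );
  // Fall back to the unfiltered set if filtering removes everything.
  return filtered.length > 0 ? filtered : chunks;
}
```

With the question "What does UserService do?", a chunk containing `class UserService` survives the filter while unrelated chunks are dropped, which is why the filtered variant tends to give more precise answers for detailed questions.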
All configuration is managed via environment variables as described in the Environment Variables section.
Code Expert exposes two primary Netlify Functions as its backend API:
- Description: Clones a specified GitHub repository, chunks its code files, generates embeddings, and stores them in the Supabase database.
- Request Body:

  ```jsonc
  {
    "githubUrl": "string" // The URL of the GitHub repository to process
  }
  ```

- Response:

  ```jsonc
  {
    "status": "success",
    "repo_id": "string",   // A unique ID (SHA256 hash) for the processed repository
    "total_chunks": number // The total number of code chunks processed
  }
  ```

- Error Response:

  ```jsonc
  {
    "error": "string" // Description of the error
  }
  ```
- Description: Answers a question about an already indexed repository using Retrieval Augmented Generation (RAG).
- Request Body:

  ```jsonc
  {
    "repo_id": "string",           // The unique ID of the indexed repository
    "question": "string",          // The question to ask about the codebase
    "variant": "base" | "filtered" // The RAG variant to use
  }
  ```

- Response:

  ```jsonc
  {
    "answer": "string", // The AI-generated answer
    "metrics": {
      "context_relevance": number,   // How relevant the retrieved context was (0-1)
      "groundedness": number,        // How much of the answer is supported by the context (0-1)
      "num_chunks_retrieved": number // Number of code chunks used for generation
    },
    "sources": [
      {
        "file_path": "string", // Path to the source file
        "distance": number     // Semantic distance from the question embedding
      }
      // ... more source objects
    ]
  }
  ```

- Error Response:

  ```jsonc
  {
    "error": "string" // Description of the error
  }
  ```
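To make the 0-1 metrics concrete, here is a toy token-overlap approximation of a groundedness score. This is purely illustrative; how the project actually computes its metrics is not shown in this document:

```javascript
// Toy "groundedness": fraction of answer tokens that also appear in the
// retrieved context. NOT the project's real metric; it only illustrates
// what a 0-1 grounding score can mean.
function toyGroundedness(answer, contextChunks) {
  const tokenize = (text) => text.toLowerCase().match(/[a-z0-9_]+/g) || [];
  const contextTokens = new Set(contextChunks.flatMap((c) => tokenize(c.content)));
  const answerTokens = tokenize(answer);
  if (answerTokens.length === 0) return 0;
  const supported = answerTokens.filter((t) => contextTokens.has(t)).length;
  return supported / answerTokens.length;
}

toyGroundedness("foo bar", [{ content: "foo baz" }]); // → 0.5
```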
We welcome contributions to Code Expert! If you'd like to contribute, please follow these steps:
- Fork the repository.
- Clone your forked repository:

  ```bash
  git clone https://github.com/<your-username>/Code-Expert.git
  ```

- Create a new branch:

  ```bash
  git checkout -b feature/your-feature-name
  ```

- Make your changes and ensure they adhere to the existing code style.
- Commit your changes:

  ```bash
  git commit -m "feat: Add new feature"
  ```

- Push to your branch:

  ```bash
  git push origin feature/your-feature-name
  ```

- Open a Pull Request to the `main` branch of the original repository.
Please ensure your code compiles without errors and passes all checks.
- Processing Time: Indexing large repositories can take a significant amount of time (up to 60 seconds or more) due to cloning, chunking, and embedding processes.
- Cold Start Latency: The embedding model (`Xenova/microsoft-codebert-base`) can incur a cold-start delay on Netlify Functions, making initial requests slower.
- API Rate Limits: Heavy usage might hit rate limits on GitHub (for cloning) or the Google Gemini API.
- Public Repositories Only: By default, only public GitHub repositories can be processed. Support for private repositories requires a `GITHUB_TOKEN` with appropriate permissions.
- File Type Support: Only specific code file extensions are processed (`.py`, `.js`, `.java`, `.cpp`, `.ts`, `.tsx`, `.md`).
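The extension allow-list above implies a filter like the following sketch. The list matches the extensions named in this document; the helper name itself is hypothetical:

```javascript
// Sketch of the implied file-type filter: only files whose extension is on
// the supported list get chunked and indexed.
const SUPPORTED_EXTENSIONS = [".py", ".js", ".java", ".cpp", ".ts", ".tsx", ".md"];

function isIndexable(filePath) {
  const lower = filePath.toLowerCase();
  return SUPPORTED_EXTENSIONS.some((ext) => lower.endsWith(ext));
}

isIndexable("src/App.tsx");    // → true
isIndexable("assets/logo.png"); // → false
```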
Here are some planned features and future improvements:
- Enhanced Private Repo Support: More robust authentication for private repositories (e.g., OAuth).
- Alternative LLM Integrations: Support for other generative AI models beyond Gemini.
- Advanced Chunking Strategies: Implement AST-based or semantic chunking for more intelligent code segmentation.
- User Authentication & History: Allow users to log in and persist their chat history and indexed repositories.
- Web UI for Repo Management: A dedicated interface to view, manage, and delete indexed repositories.
- Improved Metrics & Analytics: More detailed insights into RAG performance.
- Streaming Responses: Implement server-sent events for real-time AI response generation.
This project is licensed under the MIT License - see the LICENSE file for details.
Code Expert was developed by Aryaman Gupta.
Special thanks to:
- Netlify: For providing the serverless functions and hosting platform.
- Supabase: For the powerful PostgreSQL database and `pgvector` extension.
- Google Gemini: For the advanced generative AI capabilities.
- Hugging Face Transformers.js: For the client-side embedding models.
- All contributors: Who help make this project better.
For questions, feedback, or support, please open an issue on the GitHub repository or contact the maintainer:
- Aryaman Gupta: LinkedIn