|
1 | | -# Welcome to your organization's demo respository |
| 1 | +# CodeVector |
2 | 2 |
|
3 | | -This code repository (or "repo") is designed to demonstrate the best GitHub has to offer with the least amount of noise. |
| 3 | +CodeVector is a full-stack app for asking questions about a codebase using RAG (retrieval augmented generation). |
| 4 | +You can either upload a local `.zip` repo or import a GitHub repo URL, then chat with the app about the code. |
4 | 5 |
|
5 | | -The repo includes an `index.html` file (so it can render a web page), two GitHub Actions workflows, and a CSS stylesheet dependency. |
| 6 | +## Features |
6 | 7 |
|
7 | | -``` |
8 | | -*** Added by Lorenc: |
9 | | -This is a basic backend file structure. NOTE it is not final, this is just a means to give us an overview and an idea of how the backend might look like.. |
| 8 | +- Upload a local repository as a `.zip` |
| 9 | +- Import a repository from a GitHub URL |
| 10 | +- Chunk and embed source files with OpenAI embeddings |
| 11 | +- Store vectors in Pinecone (namespaced by job ID) |
| 12 | +- Ask questions and get grounded answers from retrieved code context |
| 13 | +- Save query history to Supabase (for uploaded zip jobs) |
| 14 | + |
| 15 | +## Tech Stack |
| 16 | + |
| 17 | +- Frontend: React 19, TypeScript, Vite, React Router |
| 18 | +- Backend: Node.js, Express |
| 19 | +- AI: OpenAI (`text-embedding-3-small`, `gpt-4o-mini` by default) |
| 20 | +- Vector DB: Pinecone |
| 21 | +- Data logging: Supabase |
| 22 | + |
| 23 | +## Repo Structure |
10 | 24 |
|
11 | | -AskMyRepo/ |
12 | | -├── package.json |
13 | | -├── .env |
| 25 | +```text |
| 26 | +askMyRepo/ |
| 27 | +├── client/ # React + Vite frontend |
| 28 | +├── server/ # Express API + RAG pipeline |
14 | 29 | ├── README.md |
15 | | -├── server/ |
16 | | -│ ├── index.js # Entry point for Express app |
17 | | -│ ├── routes/ |
18 | | -│ │ ├── chatRoutes.js # Route to handle chat with LLM |
19 | | -│ │ ├── embedRoutes.js # Route to embed codebase |
20 | | -│ │ └── repoRoutes.js # Route to handle local repo ingestion |
21 | | -│ ├── controllers/ |
22 | | -│ │ ├── openaiController.js # Talk to OpenAI API |
23 | | -│ │ ├── pineconeController.js # Interact with Pinecone |
24 | | -│ │ └── repoController.js # Read + parse local repo |
25 | | -│ ├── services/ |
26 | | -│ │ ├── embeddingService.js # Handles text embedding |
27 | | -│ │ ├── fileService.js # Read files, tokenize, etc. |
28 | | -│ │ └── pineconeService.js # Upsert/query Pinecone |
29 | | -│ ├── utils/ |
30 | | -│ │ ├── chunkCode.js # Chunk large files for embedding |
31 | | -│ │ ├── logger.js # Logger helper |
32 | | -│ │ └── validate.js # Input/format validators |
33 | | -└── scripts/ |
34 | | - └── ingestLocalRepo.js # CLI script to parse & embed repo |
| 30 | +├── package.json # Placeholder/demo package file (root) |
| 31 | +└── index.html # Legacy/demo file at repo root |
| 32 | +``` |
| 33 | + |
| 34 | +## How It Works |
| 35 | + |
| 36 | +1. A repo is uploaded as a `.zip` (`POST /api/repo/upload-local`) or cloned/imported from GitHub (the frontend supports this flow). |
| 37 | +2. Server extracts and scans files (skipping `node_modules`, `.git`, `dist`, `build`). |
| 38 | +3. Files are chunked and embedded with OpenAI. |
| 39 | +4. Embeddings are upserted to Pinecone under a namespace (`jobId`). |
| 40 | +5. User asks a question (`/api/repo/query`). |
| 41 | +6. Relevant chunks are retrieved from Pinecone and passed to the LLM to generate an answer. |
| 42 | + |
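Steps 2 and 3 can be illustrated with a minimal sketch. The README does not say how the server chunks files, so the fixed-size character windows with overlap, the default sizes, and the helper names `shouldSkip` and `chunkText` are all assumptions for illustration, not the project's actual implementation.

```javascript
// Hypothetical helpers; the real scanning and chunking code may differ.
// Directory names come from step 2 of "How It Works".
const SKIPPED_DIRS = new Set(["node_modules", ".git", "dist", "build"]);

// True when any directory segment of a relative file path is in the skip list.
// The last segment (the file name) is ignored.
function shouldSkip(relativeFilePath) {
  return relativeFilePath
    .split(/[\\/]/)
    .slice(0, -1)
    .some((segment) => SKIPPED_DIRS.has(segment));
}

// Split text into overlapping character windows so that code spanning a
// boundary appears in both neighbouring chunks. Sizes are assumed defaults.
function chunkText(text, chunkSize = 1000, overlap = 200) {
  if (chunkSize <= 0 || overlap < 0 || overlap >= chunkSize) {
    throw new RangeError("require chunkSize > 0 and 0 <= overlap < chunkSize");
  }
  const chunks = [];
  const step = chunkSize - overlap;
  for (let start = 0; start < text.length; start += step) {
    chunks.push({ start, text: text.slice(start, start + chunkSize) });
    // Stop once the current window already reaches the end of the text.
    if (start + chunkSize >= text.length) break;
  }
  return chunks;
}
```

Each chunk keeps its `start` offset so a retrieved snippet could be traced back to its position in the source file.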
| 43 | +## Getting Started |
| 44 | + |
| 45 | +### Prerequisites |
| 46 | + |
| 47 | +- Node.js 18+ (recommended) |
| 48 | +- npm |
| 49 | +- OpenAI API key |
| 50 | +- Pinecone index/API key |
| 51 | +- Supabase project (optional in concept, but current server code expects env vars for logging) |
| 52 | + |
| 53 | +### 1. Install dependencies |
| 54 | + |
| 55 | +Backend: |
| 56 | + |
| 57 | +```bash |
| 58 | +cd server |
| 59 | +npm install |
| 60 | +``` |
| 61 | + |
| 62 | +Frontend: |
| 63 | + |
| 64 | +```bash |
| 65 | +cd client |
| 66 | +npm install |
| 67 | +``` |
| 68 | + |
| 69 | +### 2. Configure environment variables (server) |
| 70 | + |
| 71 | +Create `server/.env` with values similar to: |
| 72 | + |
| 73 | +```env |
| 74 | +PORT=3001 |
| 75 | +OPENAI_API_KEY=your_openai_key |
| 76 | +PINECONE_API_KEY=your_pinecone_key |
| 77 | +PINECONE_INDEX=your_index_name |
| 78 | +SUPABASE_URL=https://your-project.supabase.co |
| 79 | +SUPABASE_SERVICE_KEY=your_service_key |
| 80 | +
|
| 81 | +# Optional model tuning |
| 82 | +RAG_MODEL=gpt-4o-mini |
| 83 | +RAG_TEMPERATURE=0.2 |
| 84 | +``` |
| 85 | + |
| 86 | +Notes: |
| 87 | + |
| 88 | +- `PINECONE_INDEX` is required at startup. |
| 89 | +- Current `supabaseService` initializes immediately, so missing Supabase vars may cause runtime issues. |
| 90 | + |
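One way to surface the issues described in the notes above is to check the environment before the server starts. The sketch below is hypothetical: the variable names come from the sample `.env`, but `checkEnv`, `findMissing`, and the split into "required" and "logging" groups are assumptions (the README only states that `PINECONE_INDEX` is required at startup).

```javascript
// Assumed grouping: only PINECONE_INDEX is documented as required at startup;
// treating the two API keys as required is an assumption for illustration.
const REQUIRED_VARS = ["OPENAI_API_KEY", "PINECONE_API_KEY", "PINECONE_INDEX"];
const LOGGING_VARS = ["SUPABASE_URL", "SUPABASE_SERVICE_KEY"];

// Return the names that are unset or blank in the given environment object.
function findMissing(env, names) {
  return names.filter((name) => !env[name] || String(env[name]).trim() === "");
}

// Throw on missing required variables; report missing logging variables
// so the caller can decide whether to warn or disable query logging.
function checkEnv(env = process.env) {
  const missingRequired = findMissing(env, REQUIRED_VARS);
  if (missingRequired.length > 0) {
    throw new Error(
      `Missing required environment variables: ${missingRequired.join(", ")}`
    );
  }
  return { missingLogging: findMissing(env, LOGGING_VARS) };
}
```

Calling `checkEnv()` at the top of the server entry point would turn a late runtime failure into an immediate, readable error.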
| 91 | +### 3. Run the app |
35 | 92 |
|
| 93 | +Backend (from `server/`): |
| 94 | + |
| 95 | +```bash |
| 96 | +npm run dev |
| 97 | +``` |
| 98 | + |
| 99 | +Frontend (from `client/`): |
| 100 | + |
| 101 | +```bash |
| 102 | +npm run dev |
36 | 103 | ``` |
| 104 | + |
| 105 | +Default local endpoints: |
| 106 | + |
| 107 | +- Frontend: Vite dev server (usually `http://localhost:5173`) |
| 108 | +- Backend: `http://localhost:3001` |
| 109 | + |
| 110 | +## Main Routes |
| 111 | + |
| 112 | +### Frontend routes |
| 113 | + |
| 114 | +- `/` -> Upload page |
| 115 | +- `/chat/:projectId` -> Chat for uploaded zip job |
| 116 | +- `/chat/repo/:repoName` -> Chat for imported/cloned repo flow |
| 117 | + |
| 118 | +### Backend routes |
| 119 | + |
| 120 | +Mounted under `/api/repo`: |
| 121 | + |
| 122 | +- `POST /upload-local` -> upload `.zip` file (`repoZip` form field) |
| 123 | +- `POST /query` -> ask a question about a previously indexed job |
| 124 | +- `GET /history/job/:jobId` -> fetch query history for a job |
| 125 | + |
| 126 | +## Example API Usage |
| 127 | + |
| 128 | +Upload a zip: |
| 129 | + |
| 130 | +```bash |
| 131 | +curl -X POST http://localhost:3001/api/repo/upload-local \ |
| 132 | + -F "repoZip=@/path/to/repo.zip" |
| 133 | +``` |
| 134 | + |
| 135 | +Query a job: |
| 136 | + |
| 137 | +```bash |
| 138 | +curl -X POST http://localhost:3001/api/repo/query \ |
| 139 | + -H "Content-Type: application/json" \ |
| 140 | + -d '{"jobId":"YOUR_JOB_ID","question":"What does the upload route do?","topK":5}' |
| 141 | +``` |
| 142 | + |
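The same query can be sent from JavaScript. This is a minimal sketch that mirrors the `curl` call above, assuming Node.js 18+ (which provides a global `fetch`) and the default local backend URL. The helper names `buildQueryRequest` and `queryJob` are hypothetical, and the shape of the JSON response is not documented here, so it is returned unmodified.

```javascript
// Assumes the backend runs on the default local port from the sample .env.
const API_BASE = "http://localhost:3001/api/repo";

// Build the URL and fetch options for POST /api/repo/query.
function buildQueryRequest(jobId, question, topK = 5) {
  return {
    url: `${API_BASE}/query`,
    options: {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ jobId, question, topK }),
    },
  };
}

// Send the query and return the parsed JSON body as-is.
async function queryJob(jobId, question, topK = 5) {
  const { url, options } = buildQueryRequest(jobId, question, topK);
  const response = await fetch(url, options);
  if (!response.ok) {
    throw new Error(`Query failed with status ${response.status}`);
  }
  return response.json();
}
```

Keeping request construction separate from the network call makes the payload easy to inspect or unit test without a running server.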
| 143 | +## Current Notes / Gaps |
| 144 | + |
| 145 | +- Root `package.json` is mostly demo/placeholder metadata and not the main app runner. |
| 146 | +- Some legacy files and alternate routes/controllers exist in `server/` and are not all actively wired. |
| 147 | +- The frontend GitHub import flow posts to a `/repo` endpoint and chat fallback uses `/ask`; depending on your backend wiring, you may need to align these endpoints. |
| 148 | + |
| 149 | +## Contributing |
| 150 | + |
| 151 | +1. Fork the repo |
| 152 | +2. Create a feature branch |
| 153 | +3. Make changes |
| 154 | +4. Test locally |
| 155 | +5. Open a pull request |
| 156 | + |
| 157 | +## License |
| 158 | + |
| 159 | +No explicit project license is defined for the main app folders yet (the root `package.json` lists `MIT`, but verify before publishing). |