This document explains the architecture and operational principles of the incremental indexing system in RagCode.
Incremental indexing allows RagCode to update the search index efficiently by processing only the files that have changed since the last indexing run. This significantly reduces the time and computational resources required to keep the knowledge base up to date.
The system relies on three main concepts:
- State Tracking: Remembering the state (modification time, size) of files from the previous run.
- Change Detection: Comparing the current file system state against the saved state to identify added, modified, or deleted files.
- Selective Updates: Updating the vector database (Qdrant) only for the affected files.
The state of the workspace is persisted in a JSON file located at .ragcode/state.json within the workspace root.
Structure:
{
  "files": {
    "/path/to/file.go": {
      "mod_time": "2023-10-27T10:00:00Z",
      "size": 1024
    }
  },
  "last_indexed": "2023-10-27T10:05:00Z"
}
Incremental indexing can be triggered in two ways:
- Manual: Via the index_workspace tool or the index-all CLI utility.
- Automatic: Whenever any MCP tool (e.g., search_code, get_function_details, find_type_definition, etc.) accesses an already indexed workspace. The Manager detects the collection, runs checkAndReindexIfNeeded in a goroutine, and if changes are detected, starts IndexLanguage in the background without blocking the agent's response.
The diagram below describes the common flow used in both scenarios:
graph TD
A[Start Indexing] --> B{Collection Exists?}
B -- No --> C[Full Indexing]
B -- Yes --> D["Load State (.ragcode/state.json)"]
D --> E[Scan Current Files]
E --> F{Compare with State}
F -->|New/Modified| G[Add to Index List]
F -->|Deleted/Modified| H[Add to Delete List]
F -->|Unchanged| I[Ignore]
H --> J[Delete Old Chunks from Qdrant]
G --> K[Index New Content]
J --> L[Update State]
K --> L
L --> M[Save State]
M --> N[Finish]
The WorkspaceManager detects the workspace and attempts to load .ragcode/state.json. If the file doesn't exist, it assumes a fresh state.
The system iterates through all currently detected source files for the target language:
- Modified: If a file exists in the state but has a different mod_time or size, it is marked for re-indexing.
- New: If a file is not in the state, it is marked for indexing.
- Deleted: If a file is in the state but no longer exists on disk, it is marked for deletion.
For every file marked as Modified or Deleted, the system performs a cleanup in the vector database.
- It calls DeleteByMetadata(ctx, "file", filePath).
- This removes all code chunks associated with that specific file path, ensuring no duplicate or phantom results remain.
The system runs the standard indexing pipeline (Analyzer -> Chunker -> Embedder -> Vector DB) only for the list of new or modified files.
Finally, the in-memory state is updated with the new file information, and state.json is rewritten to disk.
- Speed: Re-indexing a project with thousands of files takes seconds if only a few files changed.
- Efficiency: Reduces LLM embedding costs by not re-embedding unchanged code.
- Consistency: Ensures the search index accurately reflects the current code, including deletions.
- First Run: Any MCP tool receiving a file_path will detect the workspace and trigger collection creation + full indexing.
- Subsequent Runs: The same tools check state.json and automatically trigger incremental re-indexing when they detect changed files. There is no need to call index_workspace manually.
You can still force a manual run using index_workspace:
# First run - indexes all files
index_workspace --file_path /path/to/project
# Subsequent runs - only indexes changed files
index_workspace --file_path /path/to/project
The index-all command-line utility also supports incremental indexing:
# First run
./bin/index-all -paths /path/to/project
# Output: "📝 Indexing 77 new/modified files..."
# Second run (no changes)
./bin/index-all -paths /path/to/project
# Output: "✨ No code changes detected for language 'go'"
Currently, markdown files are re-indexed on every run. The incremental logic applies only to source code files (Go, PHP, etc.). Future versions will extend incremental indexing to documentation files as well.
The state file lives at .ragcode/state.json under the workspace root. Add the .ragcode/ directory to .gitignore, because it contains local indexing state that should not be shared between developers.
To verify incremental indexing is working:
- Initial Index: Run indexing on a project and note the number of files indexed.
- No-Change Run: Run indexing again without modifying any files. You should see "No code changes detected".
- Single File Modification: Modify one file and run indexing. You should see "Indexing 1 new/modified files...".
- New File Addition: Add a new file and run indexing. The new file should be detected and indexed.
Example output showing successful incremental operation:
🔎 Indexing Go files in '.' (incremental)...
2025/11/23 22:40:56 🚀 Starting indexing for workspace: .
2025/11/23 22:40:56 Collection: do-ai-code
2025/11/23 22:40:56 Language: go
2025/11/23 22:40:56 ✨ No code changes detected for language 'go'
- Server logs are written by default to ~/.local/state/ragcode/mcp.log (configurable via logging.path).
- Set logging.level: debug to see messages like 🔄 Auto-detected file changes…, 📝 Indexing N new/modified files…, ✨ No code changes detected…, etc.
- Example monitoring command:
tail -f ~/.local/state/ragcode/mcp.log | grep -E "Auto-reindex|Indexing|No code"