This repository contains a Python-based server implementing the Model Context Protocol (MCP). It acts as an intelligent agent tool provider, exposing code search capabilities powered by Azure AI Search and Azure OpenAI embeddings to compatible clients like Visual Studio Code Copilot Chat.
The server is designed specifically for scenarios where the project codebase is very large and developers typically work with only a small subset locally. This server searches a comprehensive index of the entire codebase, providing more reliable results than local file searches.
- Hybrid Code Search: Combines keyword search with semantic vector search using Azure OpenAI embeddings for natural language queries (
code_searchtool). - Full File Content Retrieval: Fetches the complete content of a specific file from the Azure AI Search index (
get_file_contenttool). - Symbol Listing: Lists functions and classes defined within a specific file, as extracted during indexing (
list_symbols_in_filetool). - Optimized for Large Codebases: Explicitly designed to overcome limitations of local file searching when the workspace is incomplete.
- MCP SDK Implementation: Uses the
model-context-protocolPython SDK for clean handling of the MCP protocol. - SSE Transport: Supports Server-Sent Events (SSE) for robust integration with clients like VS Code.
- STDIO Transport: Also supports standard I/O for alternative integrations or testing.
- NLTK Stop Word Filtering: Enhances keyword search relevance by filtering common English stop words.
- Python: Version 3.9 or higher recommended.
- pip & venv: Standard Python package management tools.
- Git: For cloning the repository.
- Azure Subscription: Access to Azure services.
- Azure AI Search Service: An active Azure AI Search instance.
- Azure OpenAI Service: An active Azure OpenAI instance with access to an embedding model (e.g.,
text-embedding-3-large,text-embedding-ada-002). You'll need a deployment of this model. - Pre-existing Azure AI Search Index: This server consumes an existing index. You must have already indexed your codebase into Azure AI Search. The index schema must include at least the following fields:
filePath(string, filterable, retrievable) - The unique path identifier for the file.content(string, retrievable) - The full content of the file.embedding(vector, searchable, retrievable) - The vector embedding of the file content (or relevant chunks).functions(Collection(Edm.String), retrievable) - Optional, but needed forlist_symbols_in_file. A list of function names defined in the file.classes(Collection(Edm.String), retrievable) - Optional, but needed forlist_symbols_in_file. A list of class names defined in the file. (Creating and populating this index is outside the scope of this server - see Azure AI Search documentation for indexing strategies).
-
Clone the Repository:
git clone https://github.com/Arthur742Ramos/azure-code-search-mcp cd azure-code-search-mcp -
Install Requirements: Create a file named
requirements.txtin the root of the repository with the following content:# requirements.txt python-dotenv>=1.0.0 numpy>=1.24.0 openai>=1.1.0 azure-identity>=1.15.0 azure-search-documents>=11.4.0 mcp>=1.6.0 uvicorn>=0.24.0 nltk>=3.8.1Then, install the packages using pip:
pip install -r requirements.txt
-
Download NLTK Data: The first time you run the server, it will attempt to download the necessary 'stopwords' corpus from NLTK automatically. Ensure you have an internet connection during the first run.
The server requires Azure credentials and endpoints, which should be provided via a .env file in the root directory of the project.
-
Create
.envfile: Create a file named.env. -
Add Environment Variables: Copy the following template into your
.envfile and replace the placeholder values with your actual Azure service details:# .env file AZURE_SEARCH_ENDPOINT="https://YOUR_SEARCH_SERVICE_NAME.search.windows.net" AZURE_SEARCH_INDEX_NAME="your-code-index-name" AZURE_SEARCH_ADMIN_KEY="YOUR_AZURE_AI_SEARCH_ADMIN_API_KEY" AZURE_OPENAI_ENDPOINT="https://YOUR_AOAI_RESOURCE_NAME.openai.azure.com/" AZURE_OPENAI_API_KEY="YOUR_AZURE_OPENAI_API_KEY" AZURE_OPENAI_EMBEDDING_DEPLOYMENT="your-embedding-deployment-name" # e.g., text-embedding-3-large # Optional: Override default server settings # PORT=8000 # HOST="0.0.0.0" # LOG_LEVEL="info"
AZURE_SEARCH_ENDPOINT: Full URL of your Azure AI Search service.AZURE_SEARCH_INDEX_NAME: The name of the index containing your codebase.AZURE_SEARCH_ADMIN_KEY: An Admin API key for your Azure AI Search service (required for querying withAzureKeyCredential). Consider using Managed Identity andDefaultAzureCredentialin production environments for better security.AZURE_OPENAI_ENDPOINT: Full URL of your Azure OpenAI service.AZURE_OPENAI_API_KEY: API key for your Azure OpenAI service.AZURE_OPENAI_EMBEDDING_DEPLOYMENT: The specific deployment name of your embedding model within Azure OpenAI.
Ensure your virtual environment is activated and the .env file is configured correctly.
-
Start with SSE Transport (Default for VS Code):
python mcp_sdk_server.py # Assuming your file is named mcp_sdk_server.pyThe server will start listening on
http://0.0.0.0:8000by default. -
Start on a Different Port:
python mcp_sdk_server.py --port 8080
-
Start with STDIO Transport:
python mcp_sdk_server.py --transport stdio
Keep the terminal window open while you intend to use the server. Monitor the logs for connection information, tool calls, and potential errors.
To allow VS Code features like Copilot Chat to use this server as an agent tool:
-
Ensure the Server is Running: Start the server using the
ssetransport as described above. Note the host and port (e.g.,http://localhost:8000). -
Open VS Code User Settings (JSON):
- Press
Ctrl+Shift+P(Windows/Linux) orCmd+Shift+P(macOS). - Type
Preferences: Open User Settings (JSON)and select it.
- Press
-
Add MCP Server Configuration: Add the following JSON object to your
settings.json. The exact location might vary depending on the VS Code extension providing the MCP client:Option A (Directly at Root): Try this structure first.
// In settings.json { // ... other settings ... "mcp": { "servers": { "code_Search": { // This name becomes @code_search in chat "type": "sse", // MUST be "sse" "url": "http://localhost:8000/sse" // URL to the GET /sse endpoint // Adjust port if you changed it! } // You could add other custom MCP servers here } }, // ... more settings ... }
Option B (Nested under Copilot): If Option A doesn't work, try nesting it under
github.copilot.advanced(this was common previously).// In settings.json { // ... other settings ... "github.copilot.advanced": { // ... potentially other advanced settings ... "mcp": { "servers": { "code_search": { // This name becomes @code_search in chat "type": "sse", "url": "http://localhost:8000/sse" } } } }, // ... more settings ... }
- Replace
"code_search"with your preferred name for invoking the agent (e.g.,@AzureSearch,@MyCode). - Ensure
"type": "sse"matches the server's transport. - Verify the
"url"points to the correcthost:port/ssewhere your server is running. Uselocalhostor127.0.0.1for the host if running locally. - Note: If neither structure works, consult the documentation for the specific VS Code extension (e.g., Copilot Chat, specific Agent extensions) you are using to see where it expects MCP server configurations.
- Replace
-
Save
settings.json. -
Restart VS Code: Completely close and reopen VS Code to ensure it picks up the new settings.
-
Test in Chat:
- Open the Copilot ChaAgent view (or equivalent).
- Type
#code_search(or the name you chose). It might appear in autocompletion. - Ask questions that should trigger the tools:
#code_search where is the EnlistmentCleaner class defined?#code_search show me how database connections are configured- (After a search)
#code_search get file content for src/database/connection.py - (After a search)
#code_search list symbols in file src/api/auth.py
- Observe the chat response and the server logs in your terminal.
- Server Fails to Start: Check the terminal logs. Common issues include missing environment variables in
.envor incorrect Azure credentials/endpoints. Ensure NLTK data download succeeds on first run. - Connection Refused (VS Code): Ensure the server is running and accessible. Verify the
urlinsettings.jsonmatches the host and port shown in the server startup logs. Check firewalls if not running locally. - 404 Not Found Errors: Double-check the
urlpath insettings.jsonis exactly/sse. - Tool Errors in Chat: Check the server logs for Python exceptions during tool execution (e.g., Azure API errors, embedding failures, search failures). Improve error handling in the server code if needed.
- Tool Not Used as Expected: Refine the tool descriptions in
list_tools_handlerto be clearer about the tool's purpose and when it should be used (as demonstrated in the code). Restart VS Code after changing descriptions.
This project is licensed under the MIT License - see the LICENSE file for details (assuming you add an MIT license file).