
ServiceNow Documentation Vectorizer

A tool for vectorizing ServiceNow documentation for semantic search and for integration with AI-powered IDEs (Cursor, Cline, Roocode) via an MCP (Model Context Protocol) server.

Features

  • Intelligent Document Chunking: Preserves markdown structure and headers while splitting large documents
  • Vector Database Storage: Uses ChromaDB for efficient similarity search
  • Flexible Embeddings: Supports both OpenAI and local embedding models (Sentence Transformers)
  • MCP Server: Exposes vectorized data to AI-powered IDEs
  • Interactive Query Tool: Rich CLI interface for searching documentation
  • Batch Processing: Efficiently handles large documentation files (2MB+)

Quick Start

Super Quick Setup (One Command)

macOS/Linux:

./setup.sh

Windows:

setup.bat

This will automatically:

  • Create a virtual environment
  • Install all dependencies
  • Configure your environment
  • Verify the setup

1. Manual Installation

# Clone the repository
git clone <your-repo-url>
cd now_docs_vectorized

# Create virtual environment (recommended)
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt

2. Configuration

Option A: Interactive Setup (Recommended)

python setup_env.py

Option B: Manual Setup

Create a .env file from the example:

cp .env.example .env

Edit .env to configure:

  • Embedding model (OpenAI or local)
  • Chunk sizes
  • Database settings

For local embeddings (no API key required):

EMBEDDING_MODEL_TYPE=local
EMBEDDING_MODEL_NAME=all-MiniLM-L6-v2

For OpenAI embeddings:

EMBEDDING_MODEL_TYPE=openai
EMBEDDING_MODEL_NAME=text-embedding-3-small
OPENAI_API_KEY=your_api_key_here
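
These values are read by config.py at startup. As a rough sketch of how such .env-driven settings are typically loaded with python-dotenv (the actual config.py may differ):

import os

from dotenv import load_dotenv

load_dotenv()  # copies key=value pairs from .env into the process environment

EMBEDDING_MODEL_TYPE = os.getenv("EMBEDDING_MODEL_TYPE", "local")
EMBEDDING_MODEL_NAME = os.getenv("EMBEDDING_MODEL_NAME", "all-MiniLM-L6-v2")
OPENAI_API_KEY = os.getenv("OPENAI_API_KEY")  # only needed for openai mode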

3. Verify Setup

# Test that everything is installed correctly
python test_setup.py

4. Index Documentation

# Index the ServiceNow documentation
python index_docs.py

# Options:
python index_docs.py --docs-path your-docs.md
python index_docs.py --chunk-size 1500 --chunk-overlap 300
python index_docs.py --reset  # Clear existing database first

5. Query Documentation

Interactive Mode:

python query_docs.py --interactive

Single Query:

python query_docs.py "Now Assist AI agents"

Advanced Options:

python query_docs.py "your query" --max-results 10 --threshold 0.5 --show-metadata

JSON Output (for programmatic use):

python query_docs.py "your query" --json-output
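
Because the JSON mode is meant for programmatic use, it pairs naturally with a small wrapper script. A minimal sketch that shells out to the query tool (the exact shape of the returned JSON depends on query_docs.py, so the parsing here is illustrative):

import json
import subprocess

# Run the query tool and capture its JSON output.
raw = subprocess.run(
    ["python", "query_docs.py", "Now Assist AI agents", "--json-output"],
    capture_output=True, text=True, check=True,
).stdout

# Field names depend on query_docs.py's actual output schema.
for hit in json.loads(raw):
    print(hit)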

MCP Server Integration

Starting the MCP Server

python mcp_server.py

Configuring with Cursor/Cline

Add the server to your IDE's MCP configuration file (for Cursor this is typically .cursor/mcp.json; Cline and Roocode have equivalent settings). Use absolute paths so the server starts regardless of the IDE's working directory:

{
  "servers": {
    "servicenow-docs": {
      "command": "python",
      "args": ["/path/to/now_docs_vectorized/mcp_server.py"],
      "env": {
        "PYTHONPATH": "/path/to/now_docs_vectorized"
      }
    }
  }
}

Available MCP Tools

  1. search_docs: Search documentation with semantic similarity

    {
      "query": "Now Assist configuration",
      "max_results": 10,
      "score_threshold": 0.7
    }
  2. get_doc_stats: Get statistics about indexed documentation

    {}
  3. search_by_headers: Search by section headers

    {
      "header_pattern": "AI agents",
      "max_results": 5
    }
  4. get_context: Get surrounding context for a specific chunk

    {
      "chunk_id": "abc123...",
      "context_size": 2
    }
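
Outside an IDE, the same tools can be exercised programmatically. A minimal sketch using the official MCP Python SDK over stdio (the tool name and arguments are taken from the examples above; adjust paths for your checkout):

import asyncio

from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client

async def main():
    # Launch mcp_server.py as a subprocess and talk to it over stdio.
    server = StdioServerParameters(command="python", args=["mcp_server.py"])
    async with stdio_client(server) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()
            result = await session.call_tool(
                "search_docs",
                {"query": "Now Assist configuration", "max_results": 5},
            )
            print(result.content)

asyncio.run(main())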

Architecture

┌─────────────────────┐
│  Markdown Document  │
└──────────┬──────────┘
           │
           v
┌─────────────────────┐
│  Document Processor │ ← Splits into chunks, preserving headers
└──────────┬──────────┘
           │
           v
┌─────────────────────┐
│  Embedding Model    │ ← OpenAI or Local (Sentence Transformers)
└──────────┬──────────┘
           │
           v
┌─────────────────────┐
│  ChromaDB Vector    │
│     Database        │
└──────────┬──────────┘
           │
           ├──────────────┐
           v              v
┌─────────────────┐  ┌─────────────────┐
│   Query Tool    │  │   MCP Server    │
└─────────────────┘  └─────────────────┘
                            │
                            v
                     ┌─────────────────┐
                     │  IDE (Cursor,   │
                     │  Cline, etc.)   │
                     └─────────────────┘
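
The same flow can be reproduced in a few lines with the libraries the project builds on. A minimal sketch using ChromaDB and Sentence Transformers (the collection name and sample chunks are placeholders; index_docs.py and query_docs.py do the real work):

import chromadb
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")
client = chromadb.PersistentClient(path="chroma_db")
collection = client.get_or_create_collection("servicenow_docs")  # name is illustrative

# Index: embed each chunk and store it alongside its vector.
chunks = ["Now Assist can be configured from...", "AI agents are defined by..."]
collection.add(
    ids=[str(i) for i in range(len(chunks))],
    documents=chunks,
    embeddings=model.encode(chunks).tolist(),
)

# Query: embed the question and retrieve the nearest chunks.
hits = collection.query(
    query_embeddings=model.encode(["Now Assist AI agents"]).tolist(),
    n_results=3,
)
print(hits["documents"])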

Project Structure

now_docs_vectorized/
├── config.py                          # Configuration management
├── document_processor.py              # Markdown chunking and parsing
├── vector_store.py                    # ChromaDB operations
├── index_docs.py                      # Main indexing script
├── query_docs.py                      # Interactive query tool
├── mcp_server.py                      # MCP server implementation
├── setup_env.py                       # Interactive environment setup
├── setup.sh                           # Automated setup script (macOS/Linux)
├── setup.bat                          # Automated setup script (Windows)
├── requirements.txt                   # Python dependencies
├── mcp.json                           # MCP server configuration
├── chroma_db/                         # Vector database storage
├── venv/                              # Python virtual environment
├── zurich-intelligent-experiences.md  # ServiceNow documentation
└── README.md                          # This file

Performance

  • Document Size: Tested with 2MB+ markdown files (45,000+ lines)
  • Indexing Speed: ~100-200 chunks/second (depends on embedding model)
  • Query Speed: <100ms for similarity search
  • Memory Usage: ~500MB for typical documentation set

Embedding Model Comparison

Model                      Speed   Quality  Cost  Requirements
all-MiniLM-L6-v2 (local)   Fast    Good     Free  No API needed
all-mpnet-base-v2 (local)  Medium  Better   Free  No API needed
text-embedding-3-small     Fast    Better   $     OpenAI API
text-embedding-3-large     Slow    Best     $$    OpenAI API

Troubleshooting

Issue: "No module named 'chromadb'"

pip install --upgrade chromadb

Issue: "Failed to connect to vector store"

# Index (or re-index) the documentation first
python index_docs.py --reset

Issue: "OPENAI_API_KEY not set"

Either:

  1. Set the API key in the .env file, or
  2. Switch to local embeddings:
    EMBEDDING_MODEL_TYPE=local

Issue: Slow indexing

  • Use smaller chunk sizes
  • Use local embeddings instead of OpenAI
  • Increase batch size in index_docs.py

Advanced Usage

Custom Chunking Strategy

Edit config.py:

chunk_size: int = 1500  # Larger chunks for more context
chunk_overlap: int = 300  # More overlap for better continuity
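
Header preservation is what keeps chunks traceable to their section. One way such chunking can be implemented with LangChain's splitters, which the project builds on (document_processor.py's actual strategy may differ):

from pathlib import Path

from langchain_text_splitters import (
    MarkdownHeaderTextSplitter,
    RecursiveCharacterTextSplitter,
)

# First split on markdown headers so every piece carries its section metadata...
header_splitter = MarkdownHeaderTextSplitter(
    headers_to_split_on=[("#", "h1"), ("##", "h2"), ("###", "h3")]
)
sections = header_splitter.split_text(
    Path("zurich-intelligent-experiences.md").read_text()
)

# ...then break oversized sections into overlapping chunks.
chunker = RecursiveCharacterTextSplitter(chunk_size=1500, chunk_overlap=300)
chunks = chunker.split_documents(sections)
print(len(chunks), chunks[0].metadata)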

Using Different Vector Databases

The system is designed to be modular. To use a different vector database:

  1. Implement the interface in vector_store.py
  2. Update the initialization in VectorStoreManager
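
As a rough illustration, the interface only needs two operations; the names below are hypothetical, not the actual ones in vector_store.py:

from typing import Protocol

class VectorStore(Protocol):  # hypothetical interface, for illustration only
    def add(self, ids: list[str], texts: list[str],
            embeddings: list[list[float]]) -> None: ...

    def query(self, embedding: list[float], n_results: int) -> list[dict]: ...

# Any backend satisfying this Protocol (e.g., a FAISS- or pgvector-backed
# class) can then be swapped in where VectorStoreManager builds its store.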

Batch Processing Multiple Files

from pathlib import Path

from document_processor import DocumentProcessor

# Chunk every markdown file under docs/ in one pass.
processor = DocumentProcessor()
files = list(Path("docs").glob("*.md"))
all_chunks = processor.process_multiple_files(files)

Development

Running Tests

pytest tests/

Code Formatting

black .

Contributing

  1. Fork the repository
  2. Create a feature branch
  3. Make your changes
  4. Add tests
  5. Submit a pull request

License

[Your License Here]

Acknowledgments

  • Built with LangChain, ChromaDB, and Sentence Transformers
  • Model Context Protocol (MCP) by Anthropic
  • ServiceNow documentation format and structure
