The vCon MCP server provides four search tools with different capabilities, from simple filtering to advanced semantic search.
`search_vcons` (metadata search)

Best for: Finding vCons by metadata (subject, parties, dates)
Searches:
- Subject line
- Party names, emails, phone numbers
- Creation dates
Does NOT search:
- Dialog content
- Analysis content
- Attachments
Example:
```json
{
  "subject": "customer support",
  "party_name": "John Doe",
  "start_date": "2024-01-01T00:00:00Z",
  "limit": 10
}
```

Returns: Complete vCon objects matching the filters
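MCP clients invoke tools through a JSON-RPC `tools/call` request. As a minimal sketch, the filter object above could be wrapped as follows. The helper name `build_tool_call` is hypothetical, and an MCP client library would normally build and send this message for you.

```python
import json
from typing import Any


def build_tool_call(name: str, arguments: dict[str, Any], request_id: int = 1) -> dict[str, Any]:
    """Wrap tool arguments in a JSON-RPC 2.0 `tools/call` request (MCP convention)."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }


# Same filters as the search_vcons example above.
request = build_tool_call(
    "search_vcons",
    {
        "subject": "customer support",
        "party_name": "John Doe",
        "start_date": "2024-01-01T00:00:00Z",
        "limit": 10,
    },
)
print(json.dumps(request, indent=2))
```

The same wrapper applies to the other search tools; only the tool name and the argument object change.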
`search_vcons_content` (keyword search)

Best for: Finding specific words or phrases in conversation content
Searches:
- ✅ Subject
- ✅ Dialog bodies (conversations, transcripts)
- ✅ Analysis bodies (summaries, sentiment, etc.)
- ✅ Party information (names, emails, phones)
- ❌ Attachments (not indexed for full-text search)
Features:
- Full-text search with ranking
- Typo tolerance via trigram indexing
- Highlighted snippets in results
- Tag filtering support
- Date range filtering
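Trigram matching tolerates typos because a misspelled word still shares some of its three-character fragments with the intended word. The sketch below imitates the general approach, in the style of PostgreSQL's `pg_trgm`. Whether the server uses exactly this padding and scoring is an assumption, the word splitting is simplified to whitespace, and the function names are illustrative.

```python
def trigrams(text: str) -> set[str]:
    """Return the character trigrams of each word, padded pg_trgm-style."""
    grams: set[str] = set()
    for word in text.lower().split():
        padded = f"  {word} "  # two leading spaces, one trailing space
        grams.update(padded[i:i + 3] for i in range(len(padded) - 2))
    return grams


def trigram_similarity(a: str, b: str) -> float:
    """Shared trigrams divided by all distinct trigrams, in [0, 1]."""
    ta, tb = trigrams(a), trigrams(b)
    if not ta and not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


# An exact match, a transposition typo, and an unrelated word.
for candidate in ("refund", "refnud", "invoice"):
    print(candidate, round(trigram_similarity("refund", candidate), 3))
```

The typo scores well above the unrelated word, which is what allows a similarity cutoff to accept near misses while rejecting different terms.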
Example:
```json
{
  "query": "billing issue refund",
  "tags": {"department": "sales"},
  "start_date": "2024-01-01T00:00:00Z",
  "limit": 50
}
```

Returns: Ranked results with snippets showing where matches were found
Result format:

```json
{
  "success": true,
  "count": 5,
  "results": [
    {
      "vcon_id": "uuid",
      "content_type": "analysis", // or "subject", "dialog", "party"
      "content_index": 0,
      "relevance_score": 0.85,
      "snippet": "...regarding the billing issue and potential refund..."
    }
  ]
}
```

`search_vcons_semantic` (semantic search)

Best for: Finding conversations by meaning, not just keywords
Searches:
- ✅ Subject (embedded)
- ✅ Dialog bodies (embedded)
- ✅ Analysis bodies with `encoding='none'` or `NULL` (embedded)
- ❌ Analysis with `encoding='base64url'` or `encoding='json'` (not embedded)
- ❌ Attachments (not embedded)
Features:
- Finds conceptually similar content
- Works across paraphrases and synonyms
- AI embeddings using 384-dimensional vectors
- Tag filtering support
- Similarity threshold control
Requirements:
- Embeddings must be generated first (see embedding documentation)
- Currently requires a pre-computed embedding vector (384 dimensions)
Example:
```json
{
  "query": "customer angry about late delivery",
  "threshold": 0.7,
  "limit": 20
}
```

Returns: Similar conversations ranked by semantic similarity
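To illustrate how `threshold` and `limit` shape a semantic result set, here is a minimal sketch that assumes similarity is cosine similarity and that the threshold is inclusive. Neither detail is confirmed by this guide, and the function names and toy vectors are hypothetical.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError(f"dimension mismatch: {len(a)} vs {len(b)}")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def filter_by_threshold(
    query: Sequence[float],
    candidates: dict[str, Sequence[float]],
    threshold: float = 0.7,
    limit: int = 20,
) -> list[tuple[str, float]]:
    """Keep candidates at or above the threshold, best first, capped at limit."""
    scored = [(vcon_id, cosine_similarity(query, vec)) for vcon_id, vec in candidates.items()]
    kept = [item for item in scored if item[1] >= threshold]
    kept.sort(key=lambda item: item[1], reverse=True)
    return kept[:limit]


# Toy 3-dimensional vectors stand in for real 384-dimensional embeddings.
query = [1.0, 0.0, 0.0]
candidates = {
    "vcon-a": [0.9, 0.1, 0.0],
    "vcon-b": [0.0, 1.0, 0.0],
    "vcon-c": [0.7, 0.7, 0.0],
}
matches = filter_by_threshold(query, candidates, threshold=0.7)
print(matches)
```

Raising the threshold trades recall for precision: in this toy data a threshold of 0.75 would drop `vcon-c`, which only partly points in the query's direction.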
`search_vcons_hybrid` (hybrid search)

Best for: Comprehensive search combining exact matches and conceptual similarity
Searches:
- Everything from keyword search (subject, dialog, analysis, parties)
- Everything from semantic search (embedded content)
Features:
- Combines full-text and semantic search
- Adjustable weighting between keyword and semantic results
- Best of both worlds: exact matches + conceptual matches
- Tag filtering support
Example:
```json
{
  "query": "billing dispute",
  "semantic_weight": 0.6,
  "tags": {"priority": "high"},
  "limit": 30
}
```

Parameters:

- `semantic_weight`: 0-1 (default 0.6)
  - 0.0 = 100% keyword search
  - 1.0 = 100% semantic search
  - 0.6 = 60% semantic, 40% keyword (recommended)

Returns: Combined results with both keyword and semantic scores
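The weighting described above can be read as a linear blend of the two scores. The sketch below assumes both scores are already normalized to the range 0-1 and that the server combines them linearly. The actual combination formula is not stated here, so treat `hybrid_score` as a hypothetical illustration.

```python
def hybrid_score(keyword_score: float, semantic_score: float, semantic_weight: float = 0.6) -> float:
    """Blend two scores: semantic_weight for semantic, the remainder for keyword."""
    if not 0.0 <= semantic_weight <= 1.0:
        raise ValueError("semantic_weight must be between 0 and 1")
    return semantic_weight * semantic_score + (1.0 - semantic_weight) * keyword_score


# A result with a strong keyword match but only moderate semantic similarity.
keyword_score, semantic_score = 0.9, 0.4
for weight in (0.0, 0.3, 0.6, 1.0):
    print(weight, round(hybrid_score(keyword_score, semantic_score, weight), 3))
```

Under this reading, lowering `semantic_weight` favors results that contain the exact query terms, while raising it favors paraphrases and related topics.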
Attachments are NOT indexed for search in the current implementation.
Why?
- Binary content: Many attachments contain binary data (PDFs, images, audio) that isn't suitable for text-based search
- Encoding: Attachments with `encoding='base64url'` contain encoded data, not searchable text
- Structured data: Attachments with `encoding='json'` contain structured data that produces poor quality embeddings
Attachments of type `tags` with `encoding='json'` ARE used for filtering, but not for content search.
Example tags attachment:
```json
{
  "type": "tags",
  "encoding": "json",
  "body": ["department:sales", "priority:high", "region:west"]
}
```

These tags can be used with the `tags` parameter of the search tools that support tag filtering:

```json
{
  "query": "customer complaint",
  "tags": {"department": "sales", "priority": "high"}
}
```

Potential future support for attachment content search:
- Text extraction: Extract text from PDFs, Word docs, etc.
- Audio transcription: Transcribe audio attachments to searchable text
- OCR: Extract text from images
- Selective indexing: Index only attachments with text content
If you need to search attachment content, consider:
- Extracting text and adding it as an analysis element
- Adding a summary of attachment content as an analysis
- Using attachment metadata in tags
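For the last workaround, it helps to see how the `key:value` strings of a tags attachment relate to the `tags` filter object. The sketch below assumes that every requested tag must match (AND semantics), which this guide does not state explicitly. `parse_tags` and `matches_tags` are hypothetical helpers.

```python
def parse_tags(body: list[str]) -> dict[str, str]:
    """Turn ["key:value", ...] into {"key": "value"}; entries without ':' are skipped."""
    tags: dict[str, str] = {}
    for entry in body:
        key, sep, value = entry.partition(":")
        if sep:
            tags[key.strip()] = value.strip()
    return tags


def matches_tags(vcon_tags: dict[str, str], wanted: dict[str, str]) -> bool:
    """True when every requested tag is present with the same value."""
    return all(vcon_tags.get(key) == value for key, value in wanted.items())


# The tags attachment from the example above.
attachment = {
    "type": "tags",
    "encoding": "json",
    "body": ["department:sales", "priority:high", "region:west"],
}
vcon_tags = parse_tags(attachment["body"])
print(matches_tags(vcon_tags, {"department": "sales", "priority": "high"}))  # True
print(matches_tags(vcon_tags, {"department": "support"}))  # False
```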
Analysis elements are included in search, with filtering based on encoding:
| Encoding | Keyword Search | Semantic Search | Notes |
|---|---|---|---|
| `none` or `NULL` | ✅ Yes | ✅ Yes | Plain text content, ideal for search |
| `json` | ✅ Yes | ❌ No | Included in keyword search only |
| `base64url` | ✅ Yes | ❌ No | Included in keyword search only |
Analysis with encoding='none' contains human-readable text like:
- Conversation summaries
- Transcriptions
- Sentiment analysis results
- Translation output
- Natural language insights
These are ideal for semantic search because they contain meaningful natural language.
Analysis with encoding='json' or encoding='base64url' typically contains:
- Structured data (poor quality embeddings)
- Binary content (not suitable for embeddings)
- Encoded data (not searchable as text)
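The encoding rules above reduce to a small lookup. This sketch only restates the table as code: `search_coverage` is a hypothetical name, and the way the server actually checks the encoding is not shown in this guide.

```python
from typing import Optional


def search_coverage(encoding: Optional[str]) -> dict[str, bool]:
    """Report which search modes include an analysis body with this encoding."""
    plain_text = encoding is None or encoding == "none"
    if not plain_text and encoding not in ("json", "base64url"):
        raise ValueError(f"unknown encoding: {encoding!r}")
    # Every analysis body is keyword-searchable; only plain text is embedded.
    return {"keyword": True, "semantic": plain_text}


for encoding in (None, "none", "json", "base64url"):
    print(encoding, search_coverage(encoding))
```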
| Feature | search_vcons | search_vcons_content | search_vcons_semantic | search_vcons_hybrid |
|---|---|---|---|---|
| Subject | ✅ Filter | ✅ Search | ✅ Search | ✅ Search |
| Dialog | ❌ | ✅ Search | ✅ Search | ✅ Search |
| Analysis | ❌ | ✅ Search | ✅ (encoding=none) | ✅ All |
| Attachments | ❌ | ❌ | ❌ | ❌ |
| Party Info | ✅ Filter | ✅ Search | ❌ | ✅ Search |
| Tags | ❌ | ✅ Filter | ✅ Filter | ✅ Filter |
| Ranking | ❌ | ✅ Relevance | ✅ Similarity | ✅ Combined |
| Snippets | ❌ | ✅ Yes | ❌ | ❌ |
| Requires Embeddings | ❌ | ❌ | ✅ | ✅ |
- `search_vcons`: Quick metadata lookups
  - "Find vCons with party email john@example.com"
  - "Show me vCons from last week"
  - "List vCons with subject containing 'urgent'"
- `search_vcons_content`: Keyword-based content search
  - "Find conversations mentioning 'refund'"
  - "Search for 'technical support' in dialog"
  - "Find analysis containing 'positive sentiment'"
- `search_vcons_semantic`: Concept-based search
  - "Find conversations where customer was unhappy"
  - "Show me calls about payment issues"
  - "Find similar conversations to this one"
- `search_vcons_hybrid`: Comprehensive search
  - "Find all billing-related conversations" (gets both exact matches and related topics)
  - "Search for customer complaints" (finds variations and synonyms)
  - Best when you want both precision and recall
- Use filters: Date ranges and tags can dramatically reduce search scope
- Set appropriate limits: Start with smaller limits (10-20) for faster results
- Choose the right tool: Don't use semantic search if keyword search is sufficient
- Pre-generate embeddings: Semantic search requires embeddings to be generated beforehand
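The first two tips can be combined in a small helper that produces a `start_date` in the ISO 8601 format used by the examples, together with a modest `limit`. This is a sketch under assumptions: `recent_window_filter` is a hypothetical name, and only the `start_date` and `limit` parameters shown in this guide are used.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def recent_window_filter(days: int, limit: int = 10, now: Optional[datetime] = None) -> dict:
    """Build filter arguments restricting a search to the last `days` days (UTC)."""
    # `now` should be timezone-aware; it defaults to the current UTC time.
    current = now or datetime.now(timezone.utc)
    start = (current - timedelta(days=days)).astimezone(timezone.utc)
    return {"start_date": start.strftime("%Y-%m-%dT%H:%M:%SZ"), "limit": limit}


# A fixed clock keeps the example reproducible.
fixed_now = datetime(2024, 1, 8, 12, 0, 0, tzinfo=timezone.utc)
print(recent_window_filter(7, now=fixed_now))
```

The resulting dictionary can be merged into the argument object of a search request alongside `query` or `tags`.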
For semantic and hybrid search to work effectively, you need to generate embeddings for your vCons.
See the following guides:
- INGEST_AND_EMBEDDINGS.md - Complete guide to embedding generation
- EMBEDDING_STRATEGY_UPGRADE.md - Details on which content is embedded
Quick start:

```bash
# Generate embeddings continuously
npm run sync:embeddings

# Or as part of full sync
npm run sync

# Check embedding coverage
npm run embeddings:check
```

If a search returns no results:

- Check that the content exists in dialog or analysis
- Try a simpler query (fewer words)
- Use wildcards or partial words
- Check date range filters

If semantic search is not working:

- Semantic search currently requires pre-computed embeddings
- Use `search_vcons_content` for keyword search instead
- Generate embeddings using the scripts in `/scripts/`

If embedding dimensions do not match:

- The system uses 384-dimensional embeddings
- If you're providing embeddings, ensure they match this dimension
- Use `text-embedding-3-small` with `dimensions=384` (OpenAI)
- Or use `sentence-transformers/all-MiniLM-L6-v2` (Hugging Face)
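
A dimension mismatch is easier to diagnose when it is caught before the request is sent. The following sketch checks a vector against the 384 dimensions stated above. `validate_embedding` is a hypothetical client-side helper, not part of the server.

```python
import math
from typing import Sequence

EXPECTED_DIMENSIONS = 384  # dimension stated in this guide


def validate_embedding(vector: Sequence[float], expected: int = EXPECTED_DIMENSIONS) -> list[float]:
    """Return the vector as floats, or raise if its size or values are unusable."""
    if len(vector) != expected:
        raise ValueError(f"expected {expected} dimensions, got {len(vector)}")
    values = [float(x) for x in vector]
    if not all(math.isfinite(x) for x in values):
        raise ValueError("embedding contains NaN or infinite values")
    return values


ok = validate_embedding([0.0] * 384)
print(len(ok))

try:
    # 1536 is the default output size of text-embedding-3-small without dimensions=384.
    validate_embedding([0.0] * 1536)
except ValueError as err:
    print(err)
```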

If results are not relevant:

- For keyword search: Try simpler, more specific terms
- For semantic search: Ensure embeddings are up to date
- For hybrid search: Adjust the `semantic_weight` parameter
- Consider using tags to filter results
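
Query listing several related terms:

```json
{
  "query": "customer complaint angry upset frustrated",
  "limit": 20
}
```

Query combined with tag and date filters:

```json
{
  "query": "pricing quote proposal",
  "tags": {
    "department": "sales",
    "priority": "high"
  },
  "start_date": "2024-01-01T00:00:00Z"
}
```

Hybrid search weighted toward keyword matching (`semantic_weight` of 0.3):

```json
{
  "query": "billing invoice payment",
  "semantic_weight": 0.3,
  "limit": 30
}
```

To find conversations similar to a given vCon:

- Get the vCon's embedding from the database
- Use it in semantic search:

```json
{
  "embedding": [0.123, 0.456, ...], // 384 dimensions
  "threshold": 0.75,
  "limit": 10
}
```

Related documentation:

- Getting Started - Getting started with vCon MCP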
- Ingest and Embeddings - Embedding generation
- Search Optimization Guide - Database search performance