Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
88 changes: 6 additions & 82 deletions KNOWN-ISSUES.md
Original file line number Diff line number Diff line change
Expand Up @@ -2,52 +2,7 @@

## Search Functionality

### 1. NOT Operator Issues

**Status:** Known bug, not yet fixed

**Issue:** The NOT operator has two problems:

1. **Standalone NOT crashes:** `km search "NOT foo"` throws FTS5 syntax error
2. **NOT doesn't exclude:** `km search "foo NOT bar"` returns documents containing both instead of excluding "bar"

**Examples:**
```bash
# Problem 1: Standalone NOT crashes
km search "NOT important"
# Error: SQLite Error 1: 'fts5: syntax error near "NOT"'

# Problem 2: NOT doesn't exclude
km put "foo and bar together"
km put "only foo here"
km search "foo NOT bar"
# Expected: 1 result (only foo here)
# Actual: 2 results (both documents)
```

**Root Cause:**
- FTS5 requires NOT to have a left operand (e.g., `foo NOT bar`), standalone `NOT term` is invalid
- Even when valid, FTS query extraction passes `"NOT (bar)"` to SQLite FTS5 which doesn't work as expected
- No LINQ post-filtering is applied to exclude NOT terms
- The architecture assumes FTS handles all logic, but NOT needs LINQ filtering

**Workaround:**
- For literal text containing "NOT", use quotes: `km search '"NOT important"'`
- Avoid using NOT as a boolean operator

**Fix Required:**
1. Handle standalone NOT gracefully (either treat as literal or provide clear error)
2. Split query: extract positive terms for FTS, negative terms for filtering
3. Apply LINQ filter to FTS results using QueryLinqBuilder
4. Filter out documents matching NOT terms

**Files Affected:**
- `src/Core/Search/NodeSearchService.cs:190` - ExtractLogical NOT handling
- Need to add LINQ filtering after line 89

---

### 2. Field Queries with Quoted Values Fail
### 1. Field Queries with Quoted Values Fail

**Status:** Known bug, not yet fixed

Expand All @@ -70,41 +25,10 @@ km search 'content:"user:password"'

---

## Resolved Issues
## Notes

### Quoted Phrases Don't Escape Operators (Resolved)

**Status:** Fixed

**Issue:** Cannot search for literal phrases containing reserved words like "AND", "OR", "NOT".

**Example:**
```bash
km put "Meeting with Alice AND Bob"
km search '"Alice AND Bob"'
# Now works correctly and finds the document
```

**Resolution:**
- The tokenizer correctly handles quoted strings and preserves them as literal text
- The FTS query extractor properly quotes phrases containing reserved words
- E2E tests added in `SearchEndToEndTests.cs` to prevent regression (tests: `KnownIssue2_*`)

---

## Testing Gaps

These bugs were discovered through comprehensive E2E testing. Previous tests only verified:
- AST structure correctness
- LINQ expression building
- Direct FTS calls

But did NOT test:
- Full pipeline: Parse -> Extract FTS -> Search -> Filter -> Rank
- Default settings (MinRelevance=0.3)
- Actual result verification

**Lesson:** Exit code testing and structure testing are insufficient. Must test actual behavior with real data.

---
### Query Syntax

The infix query parser requires explicit AND between terms:
- `foo AND NOT bar` (correct) instead of `foo NOT bar` (incorrect - ignores NOT)
- `(foo OR baz) AND NOT bar` (correct) instead of `(foo OR baz) NOT bar` (incorrect)
180 changes: 169 additions & 11 deletions src/Core/Search/NodeSearchService.cs
Original file line number Diff line number Diff line change
Expand Up @@ -6,6 +6,25 @@

namespace KernelMemory.Core.Search;

/// <summary>
/// Result of FTS query extraction from the AST.
/// Contains the FTS query string for SQLite and a list of NOT terms for post-filtering.
/// SQLite FTS5 has limited NOT support (requires left operand), so NOT terms
/// are filtered via LINQ after FTS returns initial results.
/// </summary>
/// <param name="FtsQuery">The FTS5 query string for positive terms.</param>
/// <param name="NotTerms">Terms to exclude via LINQ post-filtering. Each term includes optional field info.</param>
[System.Diagnostics.CodeAnalysis.SuppressMessage("Performance", "CA1819:Properties should not return arrays")]
public sealed record FtsQueryResult(string FtsQuery, NotTerm[] NotTerms);

/// <summary>
/// Represents a term that should be excluded from search results.
/// Used for LINQ post-filtering since SQLite FTS5 NOT has limitations.
/// </summary>
/// <param name="Term">The term to exclude.</param>
/// <param name="Field">Optional field to check (title/description/content). If null, checks all fields.</param>
public sealed record NotTerm(string Term, string? Field);

/// <summary>
/// Per-node search service.
/// Executes searches within a single node's indexes.
Expand Down Expand Up @@ -61,12 +80,12 @@ public NodeSearchService(
// Query the FTS index
var maxResults = request.MaxResultsPerNode ?? SearchConstants.DefaultMaxResultsPerNode;

// Convert QueryNode to FTS query string
var ftsQuery = this.ExtractFtsQuery(queryNode);
// Convert QueryNode to FTS query string and extract NOT terms for post-filtering
var queryResult = this.ExtractFtsQuery(queryNode);

// Search the FTS index
var ftsMatches = await this._ftsIndex.SearchAsync(
ftsQuery,
queryResult.FtsQuery,
maxResults,
cts.Token).ConfigureAwait(false);

Expand Down Expand Up @@ -95,6 +114,13 @@ public NodeSearchService(
}
}

// Apply NOT term filtering via LINQ (SQLite FTS5 NOT has limitations)
// Filter out any documents that contain the NOT terms
if (queryResult.NotTerms.Length > 0)
{
results = this.ApplyNotTermFiltering(results, queryResult.NotTerms);
}

stopwatch.Stop();
return ([.. results], stopwatch.Elapsed);
}
Expand All @@ -117,11 +143,79 @@ public NodeSearchService(
}

/// <summary>
/// Extract FTS query string from query AST.
/// Converts the AST to SQLite FTS5 query syntax.
/// Only includes text search terms; filtering is done via LINQ on results.
/// Apply NOT term filtering to results via LINQ.
/// Excludes documents that contain any of the NOT terms.
/// </summary>
/// <param name="results">The search results to filter.</param>
/// <param name="notTerms">The terms to exclude.</param>
/// <returns>Filtered results excluding documents containing NOT terms.</returns>
private List<SearchIndexResult> ApplyNotTermFiltering(List<SearchIndexResult> results, NotTerm[] notTerms)
{
return results
.Where(result => !this.ContainsAnyNotTerm(result, notTerms))
.ToList();
}

/// <summary>
/// Check if a result contains any of the NOT terms.
/// </summary>
/// <param name="result">The search result to check.</param>
/// <param name="notTerms">The NOT terms to check for.</param>
/// <returns>True if the result contains any NOT term.</returns>
private bool ContainsAnyNotTerm(SearchIndexResult result, NotTerm[] notTerms)
{
foreach (var notTerm in notTerms)
{
if (this.ContainsNotTerm(result, notTerm))
{
return true;
}
}

return false;
}

/// <summary>
/// Check if a result contains a specific NOT term.
/// </summary>
/// <param name="result">The search result to check.</param>
/// <param name="notTerm">The NOT term to check for.</param>
/// <returns>True if the result contains the NOT term.</returns>
private bool ContainsNotTerm(SearchIndexResult result, NotTerm notTerm)
{
// Case-insensitive contains check
var term = notTerm.Term;

// Check specific field if specified
if (notTerm.Field != null)
{
var fieldValue = notTerm.Field.ToLowerInvariant() switch
{
"title" => result.Title ?? string.Empty,
"description" => result.Description ?? string.Empty,
"content" => result.Content ?? string.Empty,
_ => string.Empty
};

return fieldValue.Contains(term, StringComparison.OrdinalIgnoreCase);
}

// Check all FTS fields (title, description, content)
var title = result.Title ?? string.Empty;
var description = result.Description ?? string.Empty;
var content = result.Content ?? string.Empty;

return title.Contains(term, StringComparison.OrdinalIgnoreCase) ||
description.Contains(term, StringComparison.OrdinalIgnoreCase) ||
content.Contains(term, StringComparison.OrdinalIgnoreCase);
}

/// <summary>
/// Extract FTS query string and NOT terms from query AST.
/// Converts the AST to SQLite FTS5 query syntax for positive terms.
/// NOT terms are collected separately for LINQ post-filtering.
/// </summary>
private string ExtractFtsQuery(QueryNode queryNode)
private FtsQueryResult ExtractFtsQuery(QueryNode queryNode)
{
var visitor = new FtsQueryExtractor();
return visitor.Extract(queryNode);
Expand All @@ -131,9 +225,12 @@ private string ExtractFtsQuery(QueryNode queryNode)
/// Visitor that extracts FTS query terms from the AST.
/// Focuses only on TextSearchNode and field-specific text searches.
/// Logical operators are preserved for FTS query syntax.
/// NOT operators are handled specially - their terms are collected for LINQ post-filtering.
/// </summary>
private sealed class FtsQueryExtractor
{
private readonly List<NotTerm> _notTerms = [];

/// <summary>
/// SQLite FTS5 reserved words that must be quoted when used as search terms.
/// These keywords have special meaning in FTS5 query syntax.
Expand All @@ -143,10 +240,15 @@ private sealed class FtsQueryExtractor
"AND", "OR", "NOT", "NEAR"
};

public string Extract(QueryNode node)
public FtsQueryResult Extract(QueryNode node)
{
var terms = this.ExtractTerms(node);
return string.IsNullOrEmpty(terms) ? "*" : terms;

// If only NOT terms exist (no positive terms), use wildcard to get all documents
// then filter with NOT terms
var ftsQuery = string.IsNullOrEmpty(terms) ? "*" : terms;

return new FtsQueryResult(ftsQuery, [.. this._notTerms]);
}

private string ExtractTerms(QueryNode node)
Expand Down Expand Up @@ -198,6 +300,14 @@ private string ExtractTextSearch(TextSearchNode node)

private string ExtractLogical(LogicalNode node)
{
// Handle NOT and NOR specially - collect terms for LINQ post-filtering
if (node.Operator == LogicalOperator.Not || node.Operator == LogicalOperator.Nor)
{
this.CollectNotTerms(node);
// Return empty string - NOT terms are not included in FTS query
return string.Empty;
}

var childTerms = node.Children
.Select(this.ExtractTerms)
.Where(t => !string.IsNullOrEmpty(t))
Expand All @@ -212,12 +322,60 @@ private string ExtractLogical(LogicalNode node)
{
LogicalOperator.And => string.Join(" AND ", childTerms.Select(t => $"({t})")),
LogicalOperator.Or => string.Join(" OR ", childTerms.Select(t => $"({t})")),
LogicalOperator.Not => childTerms.Length > 0 ? $"NOT ({childTerms[0]})" : string.Empty,
LogicalOperator.Nor => string.Join(" AND ", childTerms.Select(t => $"NOT ({t})")),
_ => string.Empty
};
}

/// <summary>
/// Collect NOT terms from a NOT or NOR node.
/// These terms will be filtered via LINQ after FTS returns results.
/// </summary>
private void CollectNotTerms(LogicalNode node)
{
foreach (var child in node.Children)
{
this.CollectNotTermsFromNode(child);
}
}

/// <summary>
/// Recursively collect NOT terms from a node.
/// </summary>
private void CollectNotTermsFromNode(QueryNode node)
{
switch (node)
{
case TextSearchNode textNode:
// Extract the term and optional field
this._notTerms.Add(new NotTerm(textNode.SearchText, textNode.Field?.FieldPath));
break;

case ComparisonNode comparisonNode:
// Handle field:value comparisons for NOT
if ((comparisonNode.Operator == ComparisonOperator.Contains ||
comparisonNode.Operator == ComparisonOperator.Equal) &&
comparisonNode.Field?.FieldPath != null &&
comparisonNode.Value != null)
{
var term = comparisonNode.Value.AsString();
this._notTerms.Add(new NotTerm(term, comparisonNode.Field.FieldPath));
}

break;

case LogicalNode logicalNode:
// Recursively collect from nested logical nodes
// For nested NOT/NOR, we add all children as NOT terms
// For nested AND/OR within NOT, all their children become NOT terms
foreach (var child in logicalNode.Children)
{
this.CollectNotTermsFromNode(child);
}

break;
}
}

private string ExtractComparison(ComparisonNode node)
{
// Extract text search from Contains OR Equal operator on FTS fields
Expand Down
7 changes: 4 additions & 3 deletions src/Core/Search/Query/Parsers/InfixQueryParser.cs
Original file line number Diff line number Diff line change
Expand Up @@ -161,12 +161,13 @@ private List<Token> Tokenize(string query)
continue;
}

// Quoted string
if (query[i] == '"')
// Quoted string (double or single quotes)
if (query[i] == '"' || query[i] == '\'')
{
var quoteChar = query[i];
i++;
var start = i;
while (i < query.Length && query[i] != '"')
while (i < query.Length && query[i] != quoteChar)
{
i++;
}
Expand Down
Loading
Loading