fix: filter audit recursion from default search #277

Open
EtanHey wants to merge 6 commits into main from fix/audit-recursion-filter

Conversation

@EtanHey
Owner

@EtanHey EtanHey commented May 3, 2026

Summary

  • Exclude audit/eval memories from default Python hybrid/search paths using tag-based detection for audit, r02/r0x, audit-pollution-source, and agent=auditor.
  • Add include_audit opt-in to brain_search/brain_recall and document the default filter in tool descriptions.
  • Mirror the default audit-recursion exclusion in BrainBar Swift search and add MCP regression coverage.

Tests

  • uv run pytest tests/test_audit_recursion_filter.py -q
  • swift test --package-path brain-bar --filter MCPRouterTests/testBrainSearchExcludesAuditRecursionByDefaultAndAllowsOptIn
  • swift test --package-path brain-bar
  • ./scripts/run_tests.sh
  • Pre-push hook reran ./scripts/run_tests.sh

Notes

  • CodeRabbit CLI local review could not start due to an account hourly cap / usage-based add-on limit.
  • Follow-up to R02 audit-recursion contamination: default retrieval should stop returning auditor prompt chunks unless include_audit=true.

Note

Medium Risk
Changes default retrieval behavior across Swift and Python by filtering out audit/eval-tagged chunks, which can affect recall quality and any workflows that relied on those results. Adds new SQL predicates and tag-detection logic (with legacy DB fallbacks) that could impact query performance or edge-case matching.

Overview
Default search now excludes audit/eval “recursion” chunks in both Swift and Python unless callers pass include_audit=true.

This threads a new include_audit flag through brain_search/brain_recall (tool schemas + router/handlers) and the underlying retrieval paths (hybrid_search, think, recall, plus Swift BrainDatabase.search/FTS and exact chunk-id lookup). Filtering is implemented via tag-based predicates (e.g., audit, agent=auditor, r0[0-9]) with a chunk_tags-table fast path and a JSON-tag fallback for legacy/readonly DBs.

Adds regression coverage verifying default exclusion, explicit opt-in inclusion, and the absence of false positives for tags containing r0x-like substrings; also adjusts exact chunk-id lookups to return an empty structured result when filtered, rather than short-circuiting with None.
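The tag-based predicate described above can be sketched roughly as follows. This is an illustrative simplification, not the repo's exact code: the real AUDIT_RECURSION_TAG_PATTERNS and _audit_recursion_exclusion_sql live in src/brainlayer/search_repo.py and may differ in pattern details.

```python
# Illustrative sketch of the tag-based audit exclusion; names and
# patterns are assumptions, not the exact definitions in search_repo.py.
AUDIT_TAG_PATTERNS = (
    "LOWER(tag) LIKE '%audit%'",
    "LOWER(tag) = 'agent=auditor'",
    "LOWER(tag) GLOB 'r0[0-9]'",
)

def exclusion_predicate(chunk_id_expr: str = "chunks.id") -> str:
    """Build the NOT EXISTS fragment appended when include_audit is false."""
    patterns = " OR ".join(AUDIT_TAG_PATTERNS)
    return (
        f"NOT EXISTS (SELECT 1 FROM chunk_tags t "
        f"WHERE t.chunk_id = {chunk_id_expr} AND ({patterns}))"
    )

def build_query(include_audit: bool = False) -> str:
    # Default search path: append the predicate unless the caller opts in.
    query = "SELECT id FROM chunks WHERE 1=1"
    if not include_audit:
        query += f" AND {exclusion_predicate()}"
    return query
```

With include_audit=true, the WHERE clause is left untouched and audit-tagged chunks flow through unchanged.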

Reviewed by Cursor Bugbot for commit 4a8c8ee. Bugbot is set up for automated code reviews on this repo.

Note

Filter audit recursion tags from default search results across all search paths

  • Adds an include_audit parameter (default false) to all search entry points — hybrid_search, search, think, recall, and exact chunk-id lookup — excluding chunks tagged with audit/eval markers (audit, agent=auditor, r0[0-9], r0x) unless explicitly opted in.
  • Introduces SQL exclusion predicates in search_repo.py using a NOT EXISTS / json_each approach, with a faster path via a dedicated chunk_tags table when available.
  • Mirrors the same filtering in the Swift BrainDatabase via a static auditRecursionTagExclusionSQL helper applied to both FTS and exact-ID search paths.
  • Exposes include_audit as an optional boolean in the brain_search and brain_recall MCP tool schemas for both the Python and Swift layers.
  • Behavioral Change: search results now exclude audit/eval-tagged chunks by default; callers that previously relied on these chunks appearing in results must explicitly pass include_audit: true.
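For illustration, the opt-in flag's exposure in an MCP tool schema and the handler-side default might look like the sketch below; the exact schema shape and handler code in src/brainlayer/mcp/__init__.py are assumptions here, not quoted source.

```python
# Hypothetical tool-schema fragment and flag parsing; the real schema
# in src/brainlayer/mcp/__init__.py may be shaped differently.
BRAIN_SEARCH_SCHEMA = {
    "name": "brain_search",
    "inputSchema": {
        "type": "object",
        "properties": {
            "query": {"type": "string"},
            "include_audit": {
                "type": "boolean",
                "description": (
                    "Include audit/eval-tagged chunks. Defaults to false; "
                    "default search excludes audit-recursion memories."
                ),
            },
        },
        "required": ["query"],
    },
}

def parse_include_audit(arguments: dict) -> bool:
    # Mirror the handler default: an absent flag means audit chunks stay excluded.
    return bool(arguments.get("include_audit", False))
```

Callers that relied on audit chunks appearing must now send {"query": "...", "include_audit": true} explicitly.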

Macroscope summarized 4a8c8ee.

Summary by CodeRabbit

  • New Features

    • Search results now exclude audit/evaluation memories by default for cleaner, more relevant results; an opt-in flag lets you include them when needed (applies to search, recall, and think flows and tool calls).
  • Tests

    • Added tests to verify default exclusion, opt-in inclusion, and correct handling of similar tag patterns.


@greptile-apps greptile-apps Bot left a comment


Your free trial has ended. If you'd like to continue receiving code reviews, you can add a payment method here.

@coderabbitai

coderabbitai Bot commented May 3, 2026

📝 Walkthrough

Walkthrough

Search APIs gain an opt-in include_audit flag (default false) that suppresses audit-recursion–tagged chunks unless enabled; the flag is threaded through MCP/tool schemas, routing, engine/handler logic, vector-store search queries, SQL exclusion helpers, and tests.

Changes

Audit Recursion Filtering Throughout Search Pipeline

Layer / File(s) Summary
Tool Schema & Router
src/brainlayer/mcp/__init__.py, brain-bar/Sources/BrainBar/MCPRouter.swift
Adds include_audit: boolean to brain_search/brain_recall tool schemas and parses/forwards the flag from MCP router into internal recall/search paths.
API Signatures / Wiring
brain-bar/Sources/BrainBar/BrainDatabase.swift, src/brainlayer/mcp/search_handler.py, src/brainlayer/engine.py, src/brainlayer/search_repo.py
Adds includeAudit/include_audit parameters to public/internal search methods (search, searchCandidates, _brain_search, _brain_recall, _search, think, recall, SearchMixin.hybrid_search, _binary_search, etc.) and threads the flag through call chains.
Audit Detection Helpers
src/brainlayer/search_repo.py, brain-bar/Sources/BrainBar/BrainDatabase.swift
Introduces audit-recursion detection constants and helpers (AUDIT_RECURSION_TAG_PATTERNS, _audit_recursion_exclusion_sql, auditRecursionTagExclusionSQL(alias:), _is_audit_recursion_metadata) and a SQL NOT EXISTS fragment generator for excluding audit-tagged chunks.
SQL/Query Changes
src/brainlayer/search_repo.py, brain-bar/Sources/BrainBar/BrainDatabase.swift
Appends audit-exclusion NOT EXISTS (...) to vector/FTS/binary WHERE clauses and FTS extra filters when include_audit is false; scopes hybrid cache keys by include_audit; adds post-RRF metadata filtering when needed.
VectorStore Availability Tracking
src/brainlayer/vector_store.py
Tracks presence of chunk_tags table to choose between tag-table vs JSON tag exclusion SQL paths.
Tests
brain-bar/Tests/BrainBarTests/MCPRouterTests.swift, tests/test_audit_recursion_filter.py, tests/test_search_exact_chunk_id.py
Adds tests verifying default exclusion of audit-recursion chunks, opt-in inclusion via include_audit, correct handling of r0x-like substrings, JSON fallback for legacy readonly DB, and forwarding of include_audit through engine/handler/store calls.

Sequence Diagram

sequenceDiagram
    participant Client as Client
    participant Router as MCP Router
    participant Handler as Search Handler
    participant Repo as Search Repo
    participant Store as VectorStore/DB

    Client->>Router: brain_search(query, include_audit=true)
    Router->>Handler: _brain_search(query, include_audit=true)
    Handler->>Repo: _search(query, include_audit=true)
    Repo->>Store: hybrid_search(query, include_audit=true)
    Note over Store: If include_audit == false,\napply NOT EXISTS audit-exclusion SQL\n(or JSON-tag fallback)
    Store-->>Repo: results (filtered or full)
    Repo-->>Handler: post-RRF filter applied if needed
    Handler-->>Router: formatted response
    Router-->>Client: search results

Estimated Code Review Effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly Related PRs

Poem

🐇 A hop, a flag, a careful trace,

Hidden audits find a quieter place.
Flip the switch and let them show,
Search remembers what you want to know.
🥕🔎

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

  • Docstring Coverage: ⚠️ Warning. Docstring coverage is 46.94%, below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them.

✅ Passed checks (4 passed)

  • Title check: ✅ Passed. The title 'fix: filter audit recursion from default search' directly and clearly summarizes the main change: audit-recursion filtering becomes the default, with an opt-in flag to include those chunks.
  • Linked Issues check: ✅ Passed. Check skipped because no linked issues were found for this pull request.
  • Out of Scope Changes check: ✅ Passed. Check skipped because no linked issues were found for this pull request.
  • Description check: ✅ Passed. Check skipped because CodeRabbit’s high-level summary is enabled.



@EtanHey
Owner Author

EtanHey commented May 3, 2026

@codex review
@cursor review
@bugbot review


@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

exact_chunk_hit = _exact_chunk_lookup_result(
query,

P1 Badge Apply include_audit filter to exact chunk-id lookups

_brain_search now defaults include_audit to false, but chunk-id-shaped queries still short-circuit through _exact_chunk_lookup_result before reaching _search(..., include_audit=...). That means an audit-tagged chunk can still be returned by default when the query equals its chunk ID, which breaks the new “exclude audit/eval by default” contract and can reintroduce audit-recursion contamination through direct-id retrieval.

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread src/brainlayer/search_repo.py Outdated
AUDIT_RECURSION_TAG_PATTERNS = (
"LOWER(tag) LIKE '%audit%'",
"LOWER(tag) = 'r02'",
"LOWER(tag) GLOB '*r0[0-9]*'",


P2 Badge Restrict r0x audit tag match to whole tags

The new audit filter uses LOWER(tag) GLOB '*r0[0-9]*', which matches any tag containing r0<digit> as a substring, not just dedicated audit tags (for example, tags like error01 or mirror07). Because this predicate is applied across default search paths, unrelated memories can be silently excluded and retrieval quality regresses; the match should be constrained to exact audit-tag forms.
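One possible narrowing, sketched here against SQLite's own GLOB semantics (a GLOB pattern without leading/trailing `*` must match the entire string, so whole-tag forms like r02 match while error01 and mirror07 do not). The helper name and the combined substring check are illustrative, not the PR's actual fix:

```python
import sqlite3

# Hypothetical whole-tag pattern: no surrounding '*', so GLOB must
# match the entire tag rather than any substring of it.
WHOLE_TAG_PATTERN = "r0[0-9]"

def is_audit_tag(tag: str) -> bool:
    """Check one tag using SQLite's GLOB, plus an 'audit' substring check."""
    with sqlite3.connect(":memory:") as conn:
        row = conn.execute(
            "SELECT LOWER(?) GLOB ?", (tag, WHOLE_TAG_PATTERN)
        ).fetchone()
    return bool(row[0]) or "audit" in tag.lower()

print(is_audit_tag("r02"))      # True: whole-tag match
print(is_audit_tag("error01"))  # False: substring no longer matches
print(is_audit_tag("mirror07")) # False
```

Testing the predicate through SQLite itself (rather than reimplementing GLOB in Python) keeps the unit test faithful to how the SQL filter will actually behave.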

Useful? React with 👍 / 👎.

Comment thread src/brainlayer/mcp/search_handler.py
@EtanHey
Owner Author

EtanHey commented May 3, 2026

@codex review
@cursor review
@BugBot review




@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 316769ba1f


Comment thread src/brainlayer/search_repo.py Outdated
Comment on lines +50 to +52
"SELECT 1 FROM chunk_tags audit_tags "
"WHERE audit_tags.chunk_id = {chunk_id_expr} "
f"AND ({' OR '.join(AUDIT_RECURSION_TAG_PATTERNS)})"


P1 Badge Guard audit-tag filter when chunk_tags table is unavailable

This new default filter hard-depends on chunk_tags, but VectorStore._init_readonly_db() explicitly skips migrations, so opening a legacy read-only DB that predates chunk_tags will now make default search fail with no such table: chunk_tags (because include_audit defaults to False and always injects this predicate). That turns previously working read-only retrieval into an error path unless callers know to opt into include_audit=true, so the filter should be conditional on table existence or fall back to chunks.tags JSON scanning.
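The suggested fallback could look roughly like this: probe sqlite_master for the table and only use the chunk_tags fast path when it exists. Function name, predicate text, and the JSON column name are illustrative assumptions, not the merged implementation:

```python
import sqlite3

def audit_exclusion_sql(conn: sqlite3.Connection, chunk_id_expr: str) -> str:
    """Pick the exclusion predicate based on whether chunk_tags exists.

    Hypothetical sketch: on legacy read-only DBs that predate the
    chunk_tags table (migrations are skipped there), fall back to
    scanning the JSON `tags` column via json_each instead of erroring
    with `no such table: chunk_tags`.
    """
    has_table = conn.execute(
        "SELECT 1 FROM sqlite_master WHERE type='table' AND name='chunk_tags'"
    ).fetchone() is not None
    if has_table:
        return (
            f"NOT EXISTS (SELECT 1 FROM chunk_tags t "
            f"WHERE t.chunk_id = {chunk_id_expr} "
            f"AND LOWER(t.tag) LIKE '%audit%')"
        )
    return (
        "NOT EXISTS (SELECT 1 FROM json_each(chunks.tags) j "
        "WHERE LOWER(j.value) LIKE '%audit%')"
    )
```

Probing once at connection setup (and caching the result, as the VectorStore availability tracking described above does) avoids re-querying sqlite_master on every search.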

Useful? React with 👍 / 👎.

Comment thread src/brainlayer/search_repo.py Outdated

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 3

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
src/brainlayer/mcp/search_handler.py (1)

174-199: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Stop exact chunk-id routing after an audit-filtered hit.

Returning None here tells _brain_search() the exact lookup missed, so a chunk-id-shaped query for an excluded audit chunk can still fan out into fuzzy hybrid/entity search and produce unrelated results. Once the chunk exists and is intentionally hidden, this path should return an explicit empty exact-result sentinel instead of falling through.
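The sentinel idea can be sketched as below. The constant name, dict shape, and tag check are hypothetical; the point is only that "found but intentionally hidden" must be distinguishable from "not found":

```python
# Hypothetical sentinel: a distinct object the caller can identity-check.
EMPTY_EXACT_RESULT: dict = {"results": [], "exact_match": True, "filtered": True}

def exact_chunk_lookup(chunk, include_audit: bool):
    if chunk is None:
        return None  # genuine miss: caller may fall back to fuzzy search
    if not include_audit and "audit" in chunk.get("tags", []):
        return EMPTY_EXACT_RESULT  # found but hidden: stop the fan-out
    return {"results": [chunk], "exact_match": True, "filtered": False}
```

Callers then treat None as "proceed to hybrid/entity search" and the sentinel as "return an empty exact result to the user", so a hidden audit chunk's ID can no longer surface unrelated fuzzy matches.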

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/brainlayer/mcp/search_handler.py` around lines 174 - 199, The exact-chunk
branch currently returns None for audit-filtered hits which signals
`_brain_search()` to treat it as a miss and fall back to fuzzy/entity searches;
change the branch so that when a chunk is found but excluded (checks around
include_audit and `_is_audit_recursion_metadata`, or other exclusion checks like
superseded_by/aggregated_into/archived_at, tag/project/content_type mismatches)
you return an explicit empty exact-result sentinel (e.g., a module-level
constant like EMPTY_EXACT_RESULT or a distinct dict/object) instead of None so
`_brain_search()` can detect "intentional hidden exact hit" and avoid fan-out;
update code paths that inspect exact results to recognize this sentinel.
src/brainlayer/search_repo.py (1)

394-415: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Over-fetch KNN candidates when audit filtering is active.

include_audit=False adds another filter that can discard nearest neighbors after the vector match, but needs_overfetch still ignores it. If a query is close to many audit-tagged chunks, both semantic paths can return fewer than n_results non-audit candidates even though valid matches exist.

Suggested fix
-            needs_overfetch = (
-                entity_id
-                or (source_filter and source_filter != "claude_code")
-                or source_filter_like
-                or correction_category
-            )
+            needs_overfetch = (
+                entity_id
+                or (source_filter and source_filter != "claude_code")
+                or source_filter_like
+                or correction_category
+                or not include_audit
+            )
-        needs_overfetch = (
-            entity_id or (source_filter and source_filter != "claude_code") or source_filter_like or correction_category
-        )
+        needs_overfetch = (
+            entity_id
+            or (source_filter and source_filter != "claude_code")
+            or source_filter_like
+            or correction_category
+            or not include_audit
+        )

Also applies to: 735-749

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/brainlayer/search_repo.py` around lines 394 - 415, The overfetch decision
(needs_overfetch) ignores audit filtering; include the audit flag so that when
include_audit is False we also over-fetch—update the needs_overfetch expression
to include "or not include_audit" (or equivalent) so effective_k is bumped when
audit filtering can discard neighbors, and make the identical change in the
other copy of this logic (the block around lines 735-749) so both semantic paths
over-fetch when audit filtering is active.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@brain-bar/Tests/BrainBarTests/MCPRouterTests.swift`:
- Around line 618-630: The opt-in test currently only checks that an audit chunk
appears (via optInResponse/optInText) and misses verifying ordinary result
chunks; update the test that calls router.handle with include_audit=true to also
assert that a normal (non-audit) chunk is still present—e.g., inspect
optInResponse["result"] -> "content" array (same as optInText extraction) and
assert there exists at least one item whose "text" does NOT contain the audit
marker (e.g., does not contain "audit-recurs"), in addition to the existing
XCTAssertTrue that checks for the audit chunk; keep using the same router.handle
call and optInResponse/optInText variables to locate and assert both conditions.

In `@tests/test_audit_recursion_filter.py`:
- Around line 7-19: The test helper _insert_chunk currently writes rows to
chunks and chunk_vectors but never populates the chunk_tags table, so the new
NOT EXISTS (...) FROM chunk_tags predicate used by hybrid_search/search_repo.py
is not exercised; update _insert_chunk to also insert one row per tag into
chunk_tags (associating chunk_id with each tag) when tags are provided so the
tag-side index is seeded and the NOT EXISTS branch is actually tested.

In `@tests/test_search_exact_chunk_id.py`:
- Line 128: The test currently accesses
mock_store.kg_hybrid_search.call_args.kwargs without verifying the mock was
invoked; change the assertion to first assert the mock was called (e.g., use
mock_store.kg_hybrid_search.assert_called() or assert
mock_store.kg_hybrid_search.called) and only then inspect call_args.kwargs to
check ["include_audit"] is True, or replace with
mock_store.kg_hybrid_search.assert_called_once() and assert the expected kwargs
via the mock's assert_called_once_with/inspect of call_args; this ensures a
clear failure message if kg_hybrid_search was never invoked.
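The assertion ordering the comment asks for can be illustrated with a MagicMock standing in for the store (the mock setup here is a test-only sketch, not the repo's fixture):

```python
from unittest.mock import MagicMock

# Stand-in for the vector store used by the test under review.
mock_store = MagicMock()
mock_store.kg_hybrid_search(query="q", include_audit=True)

# First: fail with a clear message if kg_hybrid_search was never invoked...
mock_store.kg_hybrid_search.assert_called_once()
# ...and only then inspect the forwarded keyword arguments.
assert mock_store.kg_hybrid_search.call_args.kwargs["include_audit"] is True
```

Without the assert_called_once() guard, an uncalled mock makes call_args None and the kwargs access raises AttributeError, which obscures the real failure.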

---

Outside diff comments:
In `@src/brainlayer/mcp/search_handler.py`:
- Around line 174-199: The exact-chunk branch currently returns None for
audit-filtered hits which signals `_brain_search()` to treat it as a miss and
fall back to fuzzy/entity searches; change the branch so that when a chunk is
found but excluded (checks around include_audit and
`_is_audit_recursion_metadata`, or other exclusion checks like
superseded_by/aggregated_into/archived_at, tag/project/content_type mismatches)
you return an explicit empty exact-result sentinel (e.g., a module-level
constant like EMPTY_EXACT_RESULT or a distinct dict/object) instead of None so
`_brain_search()` can detect "intentional hidden exact hit" and avoid fan-out;
update code paths that inspect exact results to recognize this sentinel.

In `@src/brainlayer/search_repo.py`:
- Around line 394-415: The overfetch decision (needs_overfetch) ignores audit
filtering; include the audit flag so that when include_audit is False we also
over-fetch—update the needs_overfetch expression to include "or not
include_audit" (or equivalent) so effective_k is bumped when audit filtering can
discard neighbors, and make the identical change in the other copy of this logic
(the block around lines 735-749) so both semantic paths over-fetch when audit
filtering is active.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 48f221ce-ad4d-453f-b933-ced6053444a7

📥 Commits

Reviewing files that changed from the base of the PR and between 219f697 and 316769b.

📒 Files selected for processing (8)
  • brain-bar/Sources/BrainBar/BrainDatabase.swift
  • brain-bar/Sources/BrainBar/MCPRouter.swift
  • brain-bar/Tests/BrainBarTests/MCPRouterTests.swift
  • src/brainlayer/mcp/__init__.py
  • src/brainlayer/mcp/search_handler.py
  • src/brainlayer/search_repo.py
  • tests/test_audit_recursion_filter.py
  • tests/test_search_exact_chunk_id.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
  • GitHub Check: Cursor Bugbot
  • GitHub Check: test (3.12)
  • GitHub Check: test (3.13)
  • GitHub Check: test (3.11)
🧰 Additional context used
📓 Path-based instructions (2)
**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

**/*.py: Flag risky DB or concurrency changes explicitly and do not hand-wave lock behavior
Enforce one-write-at-a-time concurrency constraint; reads are safe but brain_digest is write-heavy and must not run in parallel with other MCP work
Run pytest before claiming behavior changed safely; current test suite has 929 tests

**/*.py: Use paths.py:get_db_path() for all database path resolution; all scripts and CLI must use this function rather than hardcoding paths
When performing bulk database operations: stop enrichment workers first, checkpoint WAL before and after, drop FTS triggers before bulk deletes, batch deletes in 5-10K chunks, and checkpoint every 3 batches

Files:

  • tests/test_audit_recursion_filter.py
  • src/brainlayer/mcp/search_handler.py
  • tests/test_search_exact_chunk_id.py
  • src/brainlayer/search_repo.py
  • src/brainlayer/mcp/__init__.py
src/brainlayer/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

src/brainlayer/**/*.py: Use retry logic on SQLITE_BUSY errors; each worker must use its own database connection to handle concurrency safely
Classification must preserve ai_code, stack_trace, and user_message verbatim; skip noise entries entirely and summarize build_log and dir_listing entries (structure only)
Use AST-aware chunking via tree-sitter; never split stack traces; mask large tool output
For enrichment backend selection: use Groq as primary backend (cloud, configured in launchd plist), Gemini as fallback via enrichment_controller.py, and Ollama as offline last-resort; allow override via BRAINLAYER_ENRICH_BACKEND env var
Configure enrichment rate via BRAINLAYER_ENRICH_RATE environment variable (default 0.2 = 12 RPM)
Implement chunk lifecycle columns: superseded_by, aggregated_into, archived_at on chunks table; exclude lifecycle-managed chunks from default search; allow include_archived=True to show history
Implement brain_supersede with safety gate for personal data (journals, notes, health/finance); use soft-delete for brain_archive with timestamp
Add supersedes parameter to brain_store for atomic store-and-replace operations
Run linting and formatting with: ruff check src/ && ruff format src/
Run tests with pytest
Use PRAGMA wal_checkpoint(FULL) before and after bulk database operations to prevent WAL bloat

Files:

  • src/brainlayer/mcp/search_handler.py
  • src/brainlayer/search_repo.py
  • src/brainlayer/mcp/__init__.py
🧠 Learnings (2)
📚 Learning: 2026-03-18T00:12:08.774Z
Learnt from: EtanHey
Repo: EtanHey/brainlayer PR: 87
File: brain-bar/Sources/BrainBar/BrainBarServer.swift:118-129
Timestamp: 2026-03-18T00:12:08.774Z
Learning: In Swift files under brain-bar/Sources/BrainBar, enforce that when a critical dependency like the database is nil due to startup ordering (socket before DB), any tool handler that accesses the database must throw an explicit error (e.g., ToolError.noDatabase) instead of returning a default/empty value. Do not allow silent defaults (e.g., guard let db else { return ... }). Flag patterns that silently return defaults when db is nil, as this masks startup timing issues. This guidance applies broadly to similar Swift files in the BrainBar module, not just this one location.

Applied to files:

  • brain-bar/Sources/BrainBar/MCPRouter.swift
  • brain-bar/Sources/BrainBar/BrainDatabase.swift
📚 Learning: 2026-03-29T18:45:40.988Z
Learnt from: EtanHey
Repo: EtanHey/brainlayer PR: 133
File: brain-bar/Sources/BrainBar/BrainDatabase.swift:0-0
Timestamp: 2026-03-29T18:45:40.988Z
Learning: In the BrainBar module’s Swift database layer (notably BrainDatabase.swift), ensure that the `search()` function’s `unreadOnly=true` path orders results by the delivery frontier cursor so the watermark `maxRowID` stays contiguous. Specifically, when `unreadOnly` is enabled, the query must include `ORDER BY c.rowid ASC` (e.g., via `let orderByClause = unreadOnly ? "c.rowid ASC" : "f.rank"`). Do not replace the unread-only ordering with relevance-based sorting (e.g., `f.rank`) unconditionally or for the unread-only path, as it can introduce gaps in the watermark and incorrectly mark unseen rows as delivered. Flag any future change to the `ORDER BY` clause in this function that makes relevance sorting apply to the unread-only case.

Applied to files:

  • brain-bar/Sources/BrainBar/MCPRouter.swift
  • brain-bar/Sources/BrainBar/BrainDatabase.swift
🔇 Additional comments (2)
src/brainlayer/mcp/__init__.py (1)

508-512: LGTM — include_audit schema and routing look correct.

False (Python) serializes to JSON false consistently with existing boolean defaults (confirm, stats). Both brain_search and brain_recall handlers correctly default to False via .get("include_audit", False). The deprecated brainlayer_search alias intentionally omits the flag (excluded audit is the safe default, and the alias is deprecated with no schema exposure of the field).

Also applies to: 800-804, 1231-1231, 1311-1311

brain-bar/Sources/BrainBar/MCPRouter.swift (1)

228-228: LGTM — includeAudit extraction, pass-through, and schema entry are consistent with existing patterns.

The ?? false default is correct, the named-argument pass-through to db.search(...) aligns with the rest of the call site, and omitting a "default" key from the schema dict follows the same convention as "unread_only" and "stats". The guard let db = database else { throw ToolError.noDatabase } guard at line 243 remains in place.

Also applies to: 268-269, 838-838

Comment thread brain-bar/Tests/BrainBarTests/MCPRouterTests.swift
Comment thread tests/test_audit_recursion_filter.py
Comment thread tests/test_search_exact_chunk_id.py
@EtanHey
Owner Author

EtanHey commented May 3, 2026

@codex review
@cursor review
@BugBot review

Final follow-up pushed: exact chunk/entity edge paths fixed; r0x matcher narrowed; formatter CI fixed.



@chatgpt-codex-connector

Codex Review: Didn't find any major issues. Delightful!


Comment thread src/brainlayer/mcp/search_handler.py


@EtanHey
Owner Author

EtanHey commented May 3, 2026

@codex review
@cursor review
@BugBot review

Final review-fix pushed: hidden exact chunk-id queries no longer fan out, KNN overfetch accounts for audit filtering, and tests now seed chunk_tags/verify opt-in ordinary results.

Comment thread src/brainlayer/search_repo.py
@chatgpt-codex-connector

Codex Review: Didn't find any major issues. Delightful!


@EtanHey
Owner Author

EtanHey commented May 3, 2026

@codex review
@cursor review
@BugBot review

Review-fix pushed at eb29a30: legacy readonly DBs without chunk_tags now fall back to JSON tag filtering, internal brain_search recall/think routes forward include_audit, and KNN overfetch no longer triggers solely for default audit filtering.




@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/brainlayer/search_repo.py (1)

425-431: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Audit exclusion needs to trigger KNN overfetch.

Line 425 and Line 762 still compute effective_k as if no extra post-KNN filter was added. Because the audit predicate is applied after candidate selection, nearest-neighbor queries can return fewer than n_results whenever the top candidates are mostly audit-tagged.

Suggested fix
-            needs_overfetch = (
-                entity_id
-                or (source_filter and source_filter != "claude_code")
-                or source_filter_like
-                or correction_category
-            )
+            needs_overfetch = (
+                entity_id
+                or (source_filter and source_filter != "claude_code")
+                or source_filter_like
+                or correction_category
+                or not include_audit
+            )
-        needs_overfetch = (
-            entity_id or (source_filter and source_filter != "claude_code") or source_filter_like or correction_category
-        )
+        needs_overfetch = (
+            entity_id
+            or (source_filter and source_filter != "claude_code")
+            or source_filter_like
+            or correction_category
+            or not include_audit
+        )

Also applies to: 762-765

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/brainlayer/search_repo.py` around lines 425 - 431, The computed
effective_k (variable effective_k) doesn't account for the post-KNN audit
exclusion filter, so add the audit-exclusion predicate to the needs_overfetch
boolean (alongside entity_id, source_filter, source_filter_like,
correction_category) so that overfetching occurs when audit filtering will be
applied; update both occurrences (the block using needs_overfetch and
effective_k around the first diff and the similar block near lines 762-765) so
effective_k = min(n_results * 10, 1000) when the audit exclusion predicate is
true.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/brainlayer/search_repo.py`:
- Around line 41-45: AUDIT_RECURSION_TAG_PATTERNS currently misses exact 'r0x'
literal matches so chunks tagged exactly "r01"/"r02" still pass; update
AUDIT_RECURSION_TAG_PATTERNS to include an explicit equality pattern that
matches r0 followed by a single digit (e.g. an SQL equality/GLOB that targets
the entire tag, referenced via tag_expr) and mirror the same exact-match check
in the metadata-fallback logic (the code that inspects tags in plain Python
around lines 149-161) so those literal r0x tags are filtered out in both the SQL
filter and the metadata fallback.
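
The tag predicates named in this comment (`LIKE '%audit%'`, `= 'agent=auditor'`, `GLOB 'r0[0-9]'`, plus the requested `= 'r0x'` equality) can be exercised directly in SQLite. This is an illustrative sketch, not the actual `AUDIT_RECURSION_TAG_PATTERNS` constant:

```python
import sqlite3

# Illustrative stand-in for the SQL tag filter described in the review;
# the real predicate list lives in src/brainlayer/search_repo.py.
conn = sqlite3.connect(":memory:")

def is_audit_tag(tag):
    return conn.execute(
        "SELECT ? LIKE '%audit%' OR ? = 'agent=auditor' "
        "OR ? GLOB 'r0[0-9]' OR ? = 'r0x'",
        (tag, tag, tag, tag),
    ).fetchone()[0]

print(is_audit_tag("r02"))     # 1: matched by GLOB 'r0[0-9]'
print(is_audit_tag("r0x"))     # 1: needs the explicit equality check
print(is_audit_tag("recall"))  # 0: ordinary tags pass through
```

Without the `= 'r0x'` term, the second case returns 0, which is exactly the leak the comment flags: `x` is not a digit, so the GLOB class misses it, and `r0x` does not contain the substring `audit`.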

In `@tests/test_audit_recursion_filter.py`:
- Around line 17-24: Tests only seed chunk_vectors but not chunk_vectors_binary,
so VectorStore.hybrid_search() still takes the FTS path; add an INSERT into
chunk_vectors_binary for each chunk so the binary KNN path (_binary_search) is
exercised. Specifically, after the existing INSERT INTO chunk_vectors, also
execute an INSERT INTO chunk_vectors_binary (chunk_id, embedding) VALUES (?, ?)
using the same chunk_id and the binary-serialized embedding (use the project’s
binary serializer helper if available, e.g., serialize_f32_binary or convert the
float array to the required blob) before inserting tags so hybrid_search() can
prefer the binary path.
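
A seeding helper along the lines the comment asks for might look like this. Table and column names follow the comment; the float32 little-endian packing is an assumption standing in for the project's own serializer (e.g. the `serialize_f32_binary` helper it mentions):

```python
import sqlite3
import struct

# Hypothetical test-fixture helper: seed both vector tables so
# hybrid_search() takes the binary KNN path instead of falling back to FTS.
def serialize_f32(vec):
    # Assumed blob format: packed little-endian float32s.
    return struct.pack(f"<{len(vec)}f", *vec)

def seed_chunk(conn, chunk_id, embedding):
    blob = serialize_f32(embedding)
    conn.execute(
        "INSERT INTO chunk_vectors (chunk_id, embedding) VALUES (?, ?)",
        (chunk_id, blob),
    )
    # The missing insert from the test under review: without this row,
    # _binary_search() sees an empty index and is never exercised.
    conn.execute(
        "INSERT INTO chunk_vectors_binary (chunk_id, embedding) VALUES (?, ?)",
        (chunk_id, blob),
    )

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE chunk_vectors (chunk_id TEXT, embedding BLOB)")
conn.execute("CREATE TABLE chunk_vectors_binary (chunk_id TEXT, embedding BLOB)")
seed_chunk(conn, "c1", [0.1, 0.2, 0.3])
```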

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: a89861a6-ddfe-4d54-aebb-9c519269de3c

📥 Commits

Reviewing files that changed from the base of the PR and between 316769b and eb29a30.

📒 Files selected for processing (7)
  • brain-bar/Tests/BrainBarTests/MCPRouterTests.swift
  • src/brainlayer/engine.py
  • src/brainlayer/mcp/search_handler.py
  • src/brainlayer/search_repo.py
  • src/brainlayer/vector_store.py
  • tests/test_audit_recursion_filter.py
  • tests/test_search_exact_chunk_id.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
  • GitHub Check: Cursor Bugbot
  • GitHub Check: test (3.13)
  • GitHub Check: test (3.12)
  • GitHub Check: test (3.11)
  • GitHub Check: Macroscope - Correctness Check
🧰 Additional context used
📓 Path-based instructions (2)
**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

**/*.py: Flag risky DB or concurrency changes explicitly and do not hand-wave lock behavior
Enforce one-write-at-a-time concurrency constraint; reads are safe but brain_digest is write-heavy and must not run in parallel with other MCP work
Run pytest before claiming behavior changed safely; current test suite has 929 tests

**/*.py: Use paths.py:get_db_path() for all database path resolution; all scripts and CLI must use this function rather than hardcoding paths
When performing bulk database operations: stop enrichment workers first, checkpoint WAL before and after, drop FTS triggers before bulk deletes, batch deletes in 5-10K chunks, and checkpoint every 3 batches

Files:

  • src/brainlayer/engine.py
  • tests/test_search_exact_chunk_id.py
  • src/brainlayer/search_repo.py
  • src/brainlayer/vector_store.py
  • src/brainlayer/mcp/search_handler.py
  • tests/test_audit_recursion_filter.py
src/brainlayer/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

src/brainlayer/**/*.py: Use retry logic on SQLITE_BUSY errors; each worker must use its own database connection to handle concurrency safely
Classification must preserve ai_code, stack_trace, and user_message verbatim; skip noise entries entirely and summarize build_log and dir_listing entries (structure only)
Use AST-aware chunking via tree-sitter; never split stack traces; mask large tool output
For enrichment backend selection: use Groq as primary backend (cloud, configured in launchd plist), Gemini as fallback via enrichment_controller.py, and Ollama as offline last-resort; allow override via BRAINLAYER_ENRICH_BACKEND env var
Configure enrichment rate via BRAINLAYER_ENRICH_RATE environment variable (default 0.2 = 12 RPM)
Implement chunk lifecycle columns: superseded_by, aggregated_into, archived_at on chunks table; exclude lifecycle-managed chunks from default search; allow include_archived=True to show history
Implement brain_supersede with safety gate for personal data (journals, notes, health/finance); use soft-delete for brain_archive with timestamp
Add supersedes parameter to brain_store for atomic store-and-replace operations
Run linting and formatting with: ruff check src/ && ruff format src/
Run tests with pytest
Use PRAGMA wal_checkpoint(FULL) before and after bulk database operations to prevent WAL bloat

Files:

  • src/brainlayer/engine.py
  • src/brainlayer/search_repo.py
  • src/brainlayer/vector_store.py
  • src/brainlayer/mcp/search_handler.py

@chatgpt-codex-connector

Codex Review: Didn't find any major issues. You're on a roll.

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".


@greptile-apps greptile-apps Bot left a comment

Your free trial has ended. If you'd like to continue receiving code reviews, you can add a payment method here.

@EtanHey
Owner Author

EtanHey commented May 3, 2026

@codex review
@cursor review
@BugBot review

Latest review-fix pushed at 4a8c8ee: the audit-recursion shorthand r0x is now filtered, the tests build the binary index, and KNN overfetch is restored for default audit filtering, with an audit-heavy binary-path regression test.


@cursor cursor Bot left a comment


Cursor Bugbot has reviewed your changes and found 1 potential issue.

Reviewed by Cursor Bugbot for commit 4a8c8ee.

OR \(tagValue) GLOB 'r0[0-9]'
)
"""
}

Swift audit filter missing r0x tag exclusion

Medium Severity

The Swift auditRecursionTagExclusionSQL only checks LIKE '%audit%', = 'agent=auditor', and GLOB 'r0[0-9]'. It does not exclude chunks tagged r0x because 'x' is not a digit and "r0x" doesn't contain "audit". The Python counterpart explicitly includes = 'r0x' in AUDIT_RECURSION_TAG_PATTERNS. Chunks tagged solely with r0x will leak through in Swift but be correctly filtered in Python.



@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 4a8c8ee8c1


FROM json_each(\(tagsJSON)) audit_tags
WHERE \(tagValue) LIKE '%audit%'
OR \(tagValue) = 'agent=auditor'
OR \(tagValue) GLOB 'r0[0-9]'

P1: Add r0x shorthand to Swift audit exclusion

The default BrainBar filter claims to exclude r02/r0x audit recursion tags, but this SQL only matches GLOB 'r0[0-9]', so chunks tagged exactly r0x still leak into default brain_search results when include_audit is false. This creates an inconsistent contract versus the Python path (which does filter r0x) and allows the audit-recursion contamination this change is meant to prevent.

