fix: filter audit recursion from default search #277

Open
EtanHey wants to merge 6 commits into main from fix/audit-recursion-filter

Conversation

@EtanHey
Owner

@EtanHey EtanHey commented May 3, 2026

Summary

  • Exclude audit/eval memories from default Python hybrid/search paths using tag-based detection for audit, r02/r0x, audit-pollution-source, and agent=auditor.
  • Add include_audit opt-in to brain_search/brain_recall and document the default filter in tool descriptions.
  • Mirror the default audit-recursion exclusion in BrainBar Swift search and add MCP regression coverage.

Tests

  • uv run pytest tests/test_audit_recursion_filter.py -q
  • swift test --package-path brain-bar --filter MCPRouterTests/testBrainSearchExcludesAuditRecursionByDefaultAndAllowsOptIn
  • swift test --package-path brain-bar
  • ./scripts/run_tests.sh
  • Pre-push hook reran ./scripts/run_tests.sh

Notes

  • CodeRabbit CLI local review could not start due to an account hourly cap / usage-based add-on limit.
  • Follow-up to R02 audit-recursion contamination: default retrieval should stop returning auditor prompt chunks unless include_audit=true.

Note

Medium Risk
Changes default retrieval behavior across Swift and Python by filtering out audit/eval-tagged chunks, which can affect recall quality and any workflows that relied on those results. Adds new SQL predicates and tag-detection logic (with legacy DB fallbacks) that could impact query performance or edge-case matching.

Overview
Default search now excludes audit/eval “recursion” chunks in both Swift and Python unless callers pass include_audit=true.

This threads a new include_audit flag through brain_search/brain_recall (tool schemas + router/handlers) and the underlying retrieval paths (hybrid_search, think, recall, plus Swift BrainDatabase.search/FTS and exact chunk-id lookup). Filtering is implemented via tag-based predicates (e.g., audit, agent=auditor, r0[0-9]) with a chunk_tags-table fast path and a JSON-tag fallback for legacy/readonly DBs.

Adds regression coverage verifying default exclusion, explicit opt-in inclusion, and the absence of false positives for tags containing r0x-like substrings; also adjusts exact chunk-id lookups to return an empty structured result when filtered, rather than short-circuiting with None.
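The tag-based predicate described above can be sketched roughly as follows. This is an illustrative simplification, not the repo's exact code: the real AUDIT_RECURSION_TAG_PATTERNS and _audit_recursion_exclusion_sql live in src/brainlayer/search_repo.py and may differ in pattern details.

```python
# Illustrative sketch of the tag-based audit exclusion; names and
# patterns are assumptions, not the exact definitions in search_repo.py.
AUDIT_TAG_PATTERNS = (
    "LOWER(tag) LIKE '%audit%'",
    "LOWER(tag) = 'agent=auditor'",
    "LOWER(tag) GLOB 'r0[0-9]'",
)

def exclusion_predicate(chunk_id_expr: str = "chunks.id") -> str:
    """Build the NOT EXISTS fragment appended when include_audit is false."""
    patterns = " OR ".join(AUDIT_TAG_PATTERNS)
    return (
        f"NOT EXISTS (SELECT 1 FROM chunk_tags t "
        f"WHERE t.chunk_id = {chunk_id_expr} AND ({patterns}))"
    )

def build_query(include_audit: bool = False) -> str:
    # Default search path: append the predicate unless the caller opts in.
    query = "SELECT id FROM chunks WHERE 1=1"
    if not include_audit:
        query += f" AND {exclusion_predicate()}"
    return query
```

With include_audit=true, the WHERE clause is left untouched and audit-tagged chunks flow through unchanged.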

Reviewed by Cursor Bugbot for commit 4a8c8ee. Bugbot is set up for automated code reviews on this repo.

Note

Filter audit recursion tags from default search results across all search paths

  • Adds an include_audit parameter (default false) to all search entry points — hybrid_search, search, think, recall, and exact chunk-id lookup — excluding chunks tagged with audit/eval markers (audit, agent=auditor, r0[0-9], r0x) unless explicitly opted in.
  • Introduces SQL exclusion predicates in search_repo.py using a NOT EXISTS / json_each approach, with a faster path via a dedicated chunk_tags table when available.
  • Mirrors the same filtering in the Swift BrainDatabase via a static auditRecursionTagExclusionSQL helper applied to both FTS and exact-ID search paths.
  • Exposes include_audit as an optional boolean in the brain_search and brain_recall MCP tool schemas for both the Python and Swift layers.
  • Behavioral Change: search results now exclude audit/eval-tagged chunks by default; callers that previously relied on these chunks appearing in results must explicitly pass include_audit: true.
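For illustration, the opt-in flag's exposure in an MCP tool schema and the handler-side default might look like the sketch below; the exact schema shape and handler code in src/brainlayer/mcp/__init__.py are assumptions here, not quoted source.

```python
# Hypothetical tool-schema fragment and flag parsing; the real schema
# in src/brainlayer/mcp/__init__.py may be shaped differently.
BRAIN_SEARCH_SCHEMA = {
    "name": "brain_search",
    "inputSchema": {
        "type": "object",
        "properties": {
            "query": {"type": "string"},
            "include_audit": {
                "type": "boolean",
                "description": (
                    "Include audit/eval-tagged chunks. Defaults to false; "
                    "default search excludes audit-recursion memories."
                ),
            },
        },
        "required": ["query"],
    },
}

def parse_include_audit(arguments: dict) -> bool:
    # Mirror the handler default: an absent flag means audit chunks stay excluded.
    return bool(arguments.get("include_audit", False))
```

Callers that relied on audit chunks appearing must now send {"query": "...", "include_audit": true} explicitly.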

Macroscope summarized 4a8c8ee.

Summary by CodeRabbit

  • New Features

    • Search results now exclude audit/evaluation memories by default for cleaner, more relevant results; an opt-in flag lets you include them when needed (applies to search, recall, and think flows and tool calls).
  • Tests

    • Added tests to verify default exclusion, opt-in inclusion, and correct handling of similar tag patterns.


@greptile-apps greptile-apps Bot left a comment


Your free trial has ended. If you'd like to continue receiving code reviews, you can add a payment method here.

@coderabbitai

coderabbitai Bot commented May 3, 2026

📝 Walkthrough

Walkthrough

Search APIs gain an opt-in include_audit flag (default false) that suppresses audit-recursion–tagged chunks unless enabled; the flag is threaded through MCP/tool schemas, routing, engine/handler logic, vector-store search queries, SQL exclusion helpers, and tests.

Changes

Audit Recursion Filtering Throughout Search Pipeline

Layer / File(s) Summary
Tool Schema & Router
src/brainlayer/mcp/__init__.py, brain-bar/Sources/BrainBar/MCPRouter.swift
Adds include_audit: boolean to brain_search/brain_recall tool schemas and parses/forwards the flag from MCP router into internal recall/search paths.
API Signatures / Wiring
brain-bar/Sources/BrainBar/BrainDatabase.swift, src/brainlayer/mcp/search_handler.py, src/brainlayer/engine.py, src/brainlayer/search_repo.py
Adds includeAudit/include_audit parameters to public/internal search methods (search, searchCandidates, _brain_search, _brain_recall, _search, think, recall, SearchMixin.hybrid_search, _binary_search, etc.) and threads the flag through call chains.
Audit Detection Helpers
src/brainlayer/search_repo.py, brain-bar/Sources/BrainBar/BrainDatabase.swift
Introduces audit-recursion detection constants and helpers (AUDIT_RECURSION_TAG_PATTERNS, _audit_recursion_exclusion_sql, auditRecursionTagExclusionSQL(alias:), _is_audit_recursion_metadata) and a SQL NOT EXISTS fragment generator for excluding audit-tagged chunks.
SQL/Query Changes
src/brainlayer/search_repo.py, brain-bar/Sources/BrainBar/BrainDatabase.swift
Appends audit-exclusion NOT EXISTS (...) to vector/FTS/binary WHERE clauses and FTS extra filters when include_audit is false; scopes hybrid cache keys by include_audit; adds post-RRF metadata filtering when needed.
VectorStore Availability Tracking
src/brainlayer/vector_store.py
Tracks presence of chunk_tags table to choose between tag-table vs JSON tag exclusion SQL paths.
Tests
brain-bar/Tests/BrainBarTests/MCPRouterTests.swift, tests/test_audit_recursion_filter.py, tests/test_search_exact_chunk_id.py
Adds tests verifying default exclusion of audit-recursion chunks, opt-in inclusion via include_audit, correct handling of r0x-like substrings, JSON fallback for legacy readonly DB, and forwarding of include_audit through engine/handler/store calls.

Sequence Diagram

sequenceDiagram
    participant Client as Client
    participant Router as MCP Router
    participant Handler as Search Handler
    participant Repo as Search Repo
    participant Store as VectorStore/DB

    Client->>Router: brain_search(query, include_audit=true)
    Router->>Handler: _brain_search(query, include_audit=true)
    Handler->>Repo: _search(query, include_audit=true)
    Repo->>Store: hybrid_search(query, include_audit=true)
    Note over Store: If include_audit == false,\napply NOT EXISTS audit-exclusion SQL\n(or JSON-tag fallback)
    Store-->>Repo: results (filtered or full)
    Repo-->>Handler: post-RRF filter applied if needed
    Handler-->>Router: formatted response
    Router-->>Client: search results

Estimated Code Review Effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly Related PRs

Poem

🐇 A hop, a flag, a careful trace,

Hidden audits find a quieter place.
Flip the switch and let them show,
Search remembers what you want to know.
🥕🔎

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

  • Docstring Coverage: ⚠️ Warning. Docstring coverage is 46.94%, below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them.

✅ Passed checks (4 passed)

  • Title check: ✅ Passed. The title 'fix: filter audit recursion from default search' directly and clearly summarizes the main change: audit-recursion filtering becomes the default, with an opt-in flag to include those chunks.
  • Linked Issues check: ✅ Passed. Check skipped because no linked issues were found for this pull request.
  • Out of Scope Changes check: ✅ Passed. Check skipped because no linked issues were found for this pull request.
  • Description check: ✅ Passed. Check skipped because CodeRabbit’s high-level summary is enabled.



@EtanHey
Owner Author

EtanHey commented May 3, 2026

@codex review
@cursor review
@bugbot review


@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

exact_chunk_hit = _exact_chunk_lookup_result(
query,

P1 Badge Apply include_audit filter to exact chunk-id lookups

_brain_search now defaults include_audit to false, but chunk-id-shaped queries still short-circuit through _exact_chunk_lookup_result before reaching _search(..., include_audit=...). That means an audit-tagged chunk can still be returned by default when the query equals its chunk ID, which breaks the new “exclude audit/eval by default” contract and can reintroduce audit-recursion contamination through direct-id retrieval.

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread src/brainlayer/search_repo.py Outdated
AUDIT_RECURSION_TAG_PATTERNS = (
"LOWER(tag) LIKE '%audit%'",
"LOWER(tag) = 'r02'",
"LOWER(tag) GLOB '*r0[0-9]*'",


P2 Badge Restrict r0x audit tag match to whole tags

The new audit filter uses LOWER(tag) GLOB '*r0[0-9]*', which matches any tag containing r0<digit> as a substring, not just dedicated audit tags (for example, tags like error01 or mirror07). Because this predicate is applied across default search paths, unrelated memories can be silently excluded and retrieval quality regresses; the match should be constrained to exact audit-tag forms.
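One possible narrowing, sketched here against SQLite's own GLOB semantics (a GLOB pattern without leading/trailing `*` must match the entire string, so whole-tag forms like r02 match while error01 and mirror07 do not). The helper name and the combined substring check are illustrative, not the PR's actual fix:

```python
import sqlite3

# Hypothetical whole-tag pattern: no surrounding '*', so GLOB must
# match the entire tag rather than any substring of it.
WHOLE_TAG_PATTERN = "r0[0-9]"

def is_audit_tag(tag: str) -> bool:
    """Check one tag using SQLite's GLOB, plus an 'audit' substring check."""
    with sqlite3.connect(":memory:") as conn:
        row = conn.execute(
            "SELECT LOWER(?) GLOB ?", (tag, WHOLE_TAG_PATTERN)
        ).fetchone()
    return bool(row[0]) or "audit" in tag.lower()

print(is_audit_tag("r02"))      # True: whole-tag match
print(is_audit_tag("error01"))  # False: substring no longer matches
print(is_audit_tag("mirror07")) # False
```

Testing the predicate through SQLite itself (rather than reimplementing GLOB in Python) keeps the unit test faithful to how the SQL filter will actually behave.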

Useful? React with 👍 / 👎.

Comment thread src/brainlayer/mcp/search_handler.py
@EtanHey
Owner Author

EtanHey commented May 3, 2026

@codex review
@cursor review
@BugBot review




@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 316769ba1f


Comment thread src/brainlayer/search_repo.py Outdated
Comment on lines +50 to +52
"SELECT 1 FROM chunk_tags audit_tags "
"WHERE audit_tags.chunk_id = {chunk_id_expr} "
f"AND ({' OR '.join(AUDIT_RECURSION_TAG_PATTERNS)})"


P1 Badge Guard audit-tag filter when chunk_tags table is unavailable

This new default filter hard-depends on chunk_tags, but VectorStore._init_readonly_db() explicitly skips migrations, so opening a legacy read-only DB that predates chunk_tags will now make default search fail with no such table: chunk_tags (because include_audit defaults to False and always injects this predicate). That turns previously working read-only retrieval into an error path unless callers know to opt into include_audit=true, so the filter should be conditional on table existence or fall back to chunks.tags JSON scanning.
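The suggested fallback could look roughly like this: probe sqlite_master for the table and only use the chunk_tags fast path when it exists. Function name, predicate text, and the JSON column name are illustrative assumptions, not the merged implementation:

```python
import sqlite3

def audit_exclusion_sql(conn: sqlite3.Connection, chunk_id_expr: str) -> str:
    """Pick the exclusion predicate based on whether chunk_tags exists.

    Hypothetical sketch: on legacy read-only DBs that predate the
    chunk_tags table (migrations are skipped there), fall back to
    scanning the JSON `tags` column via json_each instead of erroring
    with `no such table: chunk_tags`.
    """
    has_table = conn.execute(
        "SELECT 1 FROM sqlite_master WHERE type='table' AND name='chunk_tags'"
    ).fetchone() is not None
    if has_table:
        return (
            f"NOT EXISTS (SELECT 1 FROM chunk_tags t "
            f"WHERE t.chunk_id = {chunk_id_expr} "
            f"AND LOWER(t.tag) LIKE '%audit%')"
        )
    return (
        "NOT EXISTS (SELECT 1 FROM json_each(chunks.tags) j "
        "WHERE LOWER(j.value) LIKE '%audit%')"
    )
```

Probing once at connection setup (and caching the result, as the VectorStore availability tracking described above does) avoids re-querying sqlite_master on every search.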

Useful? React with 👍 / 👎.

Comment thread src/brainlayer/search_repo.py Outdated

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 3

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
src/brainlayer/mcp/search_handler.py (1)

174-199: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Stop exact chunk-id routing after an audit-filtered hit.

Returning None here tells _brain_search() the exact lookup missed, so a chunk-id-shaped query for an excluded audit chunk can still fan out into fuzzy hybrid/entity search and produce unrelated results. Once the chunk exists and is intentionally hidden, this path should return an explicit empty exact-result sentinel instead of falling through.
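The sentinel idea can be sketched as below. The constant name, dict shape, and tag check are hypothetical; the point is only that "found but intentionally hidden" must be distinguishable from "not found":

```python
# Hypothetical sentinel: a distinct object the caller can identity-check.
EMPTY_EXACT_RESULT: dict = {"results": [], "exact_match": True, "filtered": True}

def exact_chunk_lookup(chunk, include_audit: bool):
    if chunk is None:
        return None  # genuine miss: caller may fall back to fuzzy search
    if not include_audit and "audit" in chunk.get("tags", []):
        return EMPTY_EXACT_RESULT  # found but hidden: stop the fan-out
    return {"results": [chunk], "exact_match": True, "filtered": False}
```

Callers then treat None as "proceed to hybrid/entity search" and the sentinel as "return an empty exact result to the user", so a hidden audit chunk's ID can no longer surface unrelated fuzzy matches.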

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/brainlayer/mcp/search_handler.py` around lines 174 - 199, The exact-chunk
branch currently returns None for audit-filtered hits which signals
`_brain_search()` to treat it as a miss and fall back to fuzzy/entity searches;
change the branch so that when a chunk is found but excluded (checks around
include_audit and `_is_audit_recursion_metadata`, or other exclusion checks like
superseded_by/aggregated_into/archived_at, tag/project/content_type mismatches)
you return an explicit empty exact-result sentinel (e.g., a module-level
constant like EMPTY_EXACT_RESULT or a distinct dict/object) instead of None so
`_brain_search()` can detect "intentional hidden exact hit" and avoid fan-out;
update code paths that inspect exact results to recognize this sentinel.
src/brainlayer/search_repo.py (1)

394-415: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Over-fetch KNN candidates when audit filtering is active.

include_audit=False adds another filter that can discard nearest neighbors after the vector match, but needs_overfetch still ignores it. If a query is close to many audit-tagged chunks, both semantic paths can return fewer than n_results non-audit candidates even though valid matches exist.

Suggested fix
-            needs_overfetch = (
-                entity_id
-                or (source_filter and source_filter != "claude_code")
-                or source_filter_like
-                or correction_category
-            )
+            needs_overfetch = (
+                entity_id
+                or (source_filter and source_filter != "claude_code")
+                or source_filter_like
+                or correction_category
+                or not include_audit
+            )
-        needs_overfetch = (
-            entity_id or (source_filter and source_filter != "claude_code") or source_filter_like or correction_category
-        )
+        needs_overfetch = (
+            entity_id
+            or (source_filter and source_filter != "claude_code")
+            or source_filter_like
+            or correction_category
+            or not include_audit
+        )

Also applies to: 735-749

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/brainlayer/search_repo.py` around lines 394 - 415, The overfetch decision
(needs_overfetch) ignores audit filtering; include the audit flag so that when
include_audit is False we also over-fetch—update the needs_overfetch expression
to include "or not include_audit" (or equivalent) so effective_k is bumped when
audit filtering can discard neighbors, and make the identical change in the
other copy of this logic (the block around lines 735-749) so both semantic paths
over-fetch when audit filtering is active.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@brain-bar/Tests/BrainBarTests/MCPRouterTests.swift`:
- Around line 618-630: The opt-in test currently only checks that an audit chunk
appears (via optInResponse/optInText) and misses verifying ordinary result
chunks; update the test that calls router.handle with include_audit=true to also
assert that a normal (non-audit) chunk is still present—e.g., inspect
optInResponse["result"] -> "content" array (same as optInText extraction) and
assert there exists at least one item whose "text" does NOT contain the audit
marker (e.g., does not contain "audit-recurs"), in addition to the existing
XCTAssertTrue that checks for the audit chunk; keep using the same router.handle
call and optInResponse/optInText variables to locate and assert both conditions.

In `@tests/test_audit_recursion_filter.py`:
- Around line 7-19: The test helper _insert_chunk currently writes rows to
chunks and chunk_vectors but never populates the chunk_tags table, so the new
NOT EXISTS (...) FROM chunk_tags predicate used by hybrid_search/search_repo.py
is not exercised; update _insert_chunk to also insert one row per tag into
chunk_tags (associating chunk_id with each tag) when tags are provided so the
tag-side index is seeded and the NOT EXISTS branch is actually tested.

In `@tests/test_search_exact_chunk_id.py`:
- Line 128: The test currently accesses
mock_store.kg_hybrid_search.call_args.kwargs without verifying the mock was
invoked; change the assertion to first assert the mock was called (e.g., use
mock_store.kg_hybrid_search.assert_called() or assert
mock_store.kg_hybrid_search.called) and only then inspect call_args.kwargs to
check ["include_audit"] is True, or replace with
mock_store.kg_hybrid_search.assert_called_once() and assert the expected kwargs
via the mock's assert_called_once_with/inspect of call_args; this ensures a
clear failure message if kg_hybrid_search was never invoked.
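The assertion ordering the comment asks for can be illustrated with a MagicMock standing in for the store (the mock setup here is a test-only sketch, not the repo's fixture):

```python
from unittest.mock import MagicMock

# Stand-in for the vector store used by the test under review.
mock_store = MagicMock()
mock_store.kg_hybrid_search(query="q", include_audit=True)

# First: fail with a clear message if kg_hybrid_search was never invoked...
mock_store.kg_hybrid_search.assert_called_once()
# ...and only then inspect the forwarded keyword arguments.
assert mock_store.kg_hybrid_search.call_args.kwargs["include_audit"] is True
```

Without the assert_called_once() guard, an uncalled mock makes call_args None and the kwargs access raises AttributeError, which obscures the real failure.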

---

Outside diff comments:
In `@src/brainlayer/mcp/search_handler.py`:
- Around line 174-199: The exact-chunk branch currently returns None for
audit-filtered hits which signals `_brain_search()` to treat it as a miss and
fall back to fuzzy/entity searches; change the branch so that when a chunk is
found but excluded (checks around include_audit and
`_is_audit_recursion_metadata`, or other exclusion checks like
superseded_by/aggregated_into/archived_at, tag/project/content_type mismatches)
you return an explicit empty exact-result sentinel (e.g., a module-level
constant like EMPTY_EXACT_RESULT or a distinct dict/object) instead of None so
`_brain_search()` can detect "intentional hidden exact hit" and avoid fan-out;
update code paths that inspect exact results to recognize this sentinel.

In `@src/brainlayer/search_repo.py`:
- Around line 394-415: The overfetch decision (needs_overfetch) ignores audit
filtering; include the audit flag so that when include_audit is False we also
over-fetch—update the needs_overfetch expression to include "or not
include_audit" (or equivalent) so effective_k is bumped when audit filtering can
discard neighbors, and make the identical change in the other copy of this logic
(the block around lines 735-749) so both semantic paths over-fetch when audit
filtering is active.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 48f221ce-ad4d-453f-b933-ced6053444a7

📥 Commits

Reviewing files that changed from the base of the PR and between 219f697 and 316769b.

📒 Files selected for processing (8)
  • brain-bar/Sources/BrainBar/BrainDatabase.swift
  • brain-bar/Sources/BrainBar/MCPRouter.swift
  • brain-bar/Tests/BrainBarTests/MCPRouterTests.swift
  • src/brainlayer/mcp/__init__.py
  • src/brainlayer/mcp/search_handler.py
  • src/brainlayer/search_repo.py
  • tests/test_audit_recursion_filter.py
  • tests/test_search_exact_chunk_id.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
  • GitHub Check: Cursor Bugbot
  • GitHub Check: test (3.12)
  • GitHub Check: test (3.13)
  • GitHub Check: test (3.11)
🧰 Additional context used
📓 Path-based instructions (2)
**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

**/*.py: Flag risky DB or concurrency changes explicitly and do not hand-wave lock behavior
Enforce one-write-at-a-time concurrency constraint; reads are safe but brain_digest is write-heavy and must not run in parallel with other MCP work
Run pytest before claiming behavior changed safely; current test suite has 929 tests

**/*.py: Use paths.py:get_db_path() for all database path resolution; all scripts and CLI must use this function rather than hardcoding paths
When performing bulk database operations: stop enrichment workers first, checkpoint WAL before and after, drop FTS triggers before bulk deletes, batch deletes in 5-10K chunks, and checkpoint every 3 batches

Files:

  • tests/test_audit_recursion_filter.py
  • src/brainlayer/mcp/search_handler.py
  • tests/test_search_exact_chunk_id.py
  • src/brainlayer/search_repo.py
  • src/brainlayer/mcp/__init__.py
src/brainlayer/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

src/brainlayer/**/*.py: Use retry logic on SQLITE_BUSY errors; each worker must use its own database connection to handle concurrency safely
Classification must preserve ai_code, stack_trace, and user_message verbatim; skip noise entries entirely and summarize build_log and dir_listing entries (structure only)
Use AST-aware chunking via tree-sitter; never split stack traces; mask large tool output
For enrichment backend selection: use Groq as primary backend (cloud, configured in launchd plist), Gemini as fallback via enrichment_controller.py, and Ollama as offline last-resort; allow override via BRAINLAYER_ENRICH_BACKEND env var
Configure enrichment rate via BRAINLAYER_ENRICH_RATE environment variable (default 0.2 = 12 RPM)
Implement chunk lifecycle columns: superseded_by, aggregated_into, archived_at on chunks table; exclude lifecycle-managed chunks from default search; allow include_archived=True to show history
Implement brain_supersede with safety gate for personal data (journals, notes, health/finance); use soft-delete for brain_archive with timestamp
Add supersedes parameter to brain_store for atomic store-and-replace operations
Run linting and formatting with: ruff check src/ && ruff format src/
Run tests with pytest
Use PRAGMA wal_checkpoint(FULL) before and after bulk database operations to prevent WAL bloat

Files:

  • src/brainlayer/mcp/search_handler.py
  • src/brainlayer/search_repo.py
  • src/brainlayer/mcp/__init__.py
🧠 Learnings (2)
📚 Learning: 2026-03-18T00:12:08.774Z
Learnt from: EtanHey
Repo: EtanHey/brainlayer PR: 87
File: brain-bar/Sources/BrainBar/BrainBarServer.swift:118-129
Timestamp: 2026-03-18T00:12:08.774Z
Learning: In Swift files under brain-bar/Sources/BrainBar, enforce that when a critical dependency like the database is nil due to startup ordering (socket before DB), any tool handler that accesses the database must throw an explicit error (e.g., ToolError.noDatabase) instead of returning a default/empty value. Do not allow silent defaults (e.g., guard let db else { return ... }). Flag patterns that silently return defaults when db is nil, as this masks startup timing issues. This guidance applies broadly to similar Swift files in the BrainBar module, not just this one location.

Applied to files:

  • brain-bar/Sources/BrainBar/MCPRouter.swift
  • brain-bar/Sources/BrainBar/BrainDatabase.swift
📚 Learning: 2026-03-29T18:45:40.988Z
Learnt from: EtanHey
Repo: EtanHey/brainlayer PR: 133
File: brain-bar/Sources/BrainBar/BrainDatabase.swift:0-0
Timestamp: 2026-03-29T18:45:40.988Z
Learning: In the BrainBar module’s Swift database layer (notably BrainDatabase.swift), ensure that the `search()` function’s `unreadOnly=true` path orders results by the delivery frontier cursor so the watermark `maxRowID` stays contiguous. Specifically, when `unreadOnly` is enabled, the query must include `ORDER BY c.rowid ASC` (e.g., via `let orderByClause = unreadOnly ? "c.rowid ASC" : "f.rank"`). Do not replace the unread-only ordering with relevance-based sorting (e.g., `f.rank`) unconditionally or for the unread-only path, as it can introduce gaps in the watermark and incorrectly mark unseen rows as delivered. Flag any future change to the `ORDER BY` clause in this function that makes relevance sorting apply to the unread-only case.

Applied to files:

  • brain-bar/Sources/BrainBar/MCPRouter.swift
  • brain-bar/Sources/BrainBar/BrainDatabase.swift
🔇 Additional comments (2)
src/brainlayer/mcp/__init__.py (1)

508-512: LGTM — include_audit schema and routing look correct.

False (Python) serializes to JSON false consistently with existing boolean defaults (confirm, stats). Both brain_search and brain_recall handlers correctly default to False via .get("include_audit", False). The deprecated brainlayer_search alias intentionally omits the flag (excluded audit is the safe default, and the alias is deprecated with no schema exposure of the field).

Also applies to: 800-804, 1231-1231, 1311-1311

brain-bar/Sources/BrainBar/MCPRouter.swift (1)

228-228: LGTM — includeAudit extraction, pass-through, and schema entry are consistent with existing patterns.

The ?? false default is correct, the named-argument pass-through to db.search(...) aligns with the rest of the call site, and omitting a "default" key from the schema dict follows the same convention as "unread_only" and "stats". The guard let db = database else { throw ToolError.noDatabase } guard at line 243 remains in place.

Also applies to: 268-269, 838-838

Comment thread brain-bar/Tests/BrainBarTests/MCPRouterTests.swift
Comment thread tests/test_audit_recursion_filter.py
Comment thread tests/test_search_exact_chunk_id.py
@EtanHey
Owner Author

EtanHey commented May 3, 2026

@codex review
@cursor review
@BugBot review

Final follow-up pushed: exact chunk/entity edge paths fixed; r0x matcher narrowed; formatter CI fixed.



@chatgpt-codex-connector

Codex Review: Didn't find any major issues. Delightful!


Comment thread src/brainlayer/mcp/search_handler.py


@EtanHey
Owner Author

EtanHey commented May 3, 2026

@codex review
@cursor review
@BugBot review

Final review-fix pushed: hidden exact chunk-id queries no longer fan out, KNN overfetch accounts for audit filtering, and tests now seed chunk_tags/verify opt-in ordinary results.

Comment thread src/brainlayer/search_repo.py
@chatgpt-codex-connector

Codex Review: Didn't find any major issues. Delightful!


@EtanHey
Owner Author

EtanHey commented May 3, 2026

@codex review
@cursor review
@BugBot review

Review-fix pushed at eb29a30: legacy readonly DBs without chunk_tags now fall back to JSON tag filtering, internal brain_search recall/think routes forward include_audit, and KNN overfetch no longer triggers solely for default audit filtering.




@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/brainlayer/search_repo.py (1)

425-431: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Audit exclusion needs to trigger KNN overfetch.

Line 425 and Line 762 still compute effective_k as if no extra post-KNN filter was added. Because the audit predicate is applied after candidate selection, nearest-neighbor queries can return fewer than n_results whenever the top candidates are mostly audit-tagged.

Suggested fix
-            needs_overfetch = (
-                entity_id
-                or (source_filter and source_filter != "claude_code")
-                or source_filter_like
-                or correction_category
-            )
+            needs_overfetch = (
+                entity_id
+                or (source_filter and source_filter != "claude_code")
+                or source_filter_like
+                or correction_category
+                or not include_audit
+            )
-        needs_overfetch = (
-            entity_id or (source_filter and source_filter != "claude_code") or source_filter_like or correction_category
-        )
+        needs_overfetch = (
+            entity_id
+            or (source_filter and source_filter != "claude_code")
+            or source_filter_like
+            or correction_category
+            or not include_audit
+        )

Also applies to: 762-765

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/brainlayer/search_repo.py` around lines 425 - 431, The computed
effective_k (variable effective_k) doesn't account for the post-KNN audit
exclusion filter, so add the audit-exclusion predicate to the needs_overfetch
boolean (alongside entity_id, source_filter, source_filter_like,
correction_category) so that overfetching occurs when audit filtering will be
applied; update both occurrences (the block using needs_overfetch and
effective_k around the first diff and the similar block near lines 762-765) so
effective_k = min(n_results * 10, 1000) when the audit exclusion predicate is
true.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/brainlayer/search_repo.py`:
- Around line 41-45: AUDIT_RECURSION_TAG_PATTERNS currently misses exact 'r0x'
literal matches so chunks tagged exactly "r01"/"r02" still pass; update
AUDIT_RECURSION_TAG_PATTERNS to include an explicit equality pattern that
matches r0 followed by a single digit (e.g. an SQL equality/GLOB that targets
the entire tag, referenced via tag_expr) and mirror the same exact-match check
in the metadata-fallback logic (the code that inspects tags in plain Python
around lines 149-161) so those literal r0x tags are filtered out in both the SQL
filter and the metadata fallback.
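
The tag predicates named in this comment (`LIKE '%audit%'`, `= 'agent=auditor'`, `GLOB 'r0[0-9]'`, plus the requested `= 'r0x'` equality) can be exercised directly in SQLite. This is an illustrative sketch, not the actual `AUDIT_RECURSION_TAG_PATTERNS` constant:

```python
import sqlite3

# Illustrative stand-in for the SQL tag filter described in the review;
# the real predicate list lives in src/brainlayer/search_repo.py.
conn = sqlite3.connect(":memory:")

def is_audit_tag(tag):
    return conn.execute(
        "SELECT ? LIKE '%audit%' OR ? = 'agent=auditor' "
        "OR ? GLOB 'r0[0-9]' OR ? = 'r0x'",
        (tag, tag, tag, tag),
    ).fetchone()[0]

print(is_audit_tag("r02"))     # 1: matched by GLOB 'r0[0-9]'
print(is_audit_tag("r0x"))     # 1: needs the explicit equality check
print(is_audit_tag("recall"))  # 0: ordinary tags pass through
```

Without the `= 'r0x'` term, the second case returns 0, which is exactly the leak the comment flags: `x` is not a digit, so the GLOB class misses it, and `r0x` does not contain the substring `audit`.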

In `@tests/test_audit_recursion_filter.py`:
- Around line 17-24: Tests only seed chunk_vectors but not chunk_vectors_binary,
so VectorStore.hybrid_search() still takes the FTS path; add an INSERT into
chunk_vectors_binary for each chunk so the binary KNN path (_binary_search) is
exercised. Specifically, after the existing INSERT INTO chunk_vectors, also
execute an INSERT INTO chunk_vectors_binary (chunk_id, embedding) VALUES (?, ?)
using the same chunk_id and the binary-serialized embedding (use the project’s
binary serializer helper if available, e.g., serialize_f32_binary or convert the
float array to the required blob) before inserting tags so hybrid_search() can
prefer the binary path.
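
A seeding helper along the lines the comment asks for might look like this. Table and column names follow the comment; the float32 little-endian packing is an assumption standing in for the project's own serializer (e.g. the `serialize_f32_binary` helper it mentions):

```python
import sqlite3
import struct

# Hypothetical test-fixture helper: seed both vector tables so
# hybrid_search() takes the binary KNN path instead of falling back to FTS.
def serialize_f32(vec):
    # Assumed blob format: packed little-endian float32s.
    return struct.pack(f"<{len(vec)}f", *vec)

def seed_chunk(conn, chunk_id, embedding):
    blob = serialize_f32(embedding)
    conn.execute(
        "INSERT INTO chunk_vectors (chunk_id, embedding) VALUES (?, ?)",
        (chunk_id, blob),
    )
    # The missing insert from the test under review: without this row,
    # _binary_search() sees an empty index and is never exercised.
    conn.execute(
        "INSERT INTO chunk_vectors_binary (chunk_id, embedding) VALUES (?, ?)",
        (chunk_id, blob),
    )

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE chunk_vectors (chunk_id TEXT, embedding BLOB)")
conn.execute("CREATE TABLE chunk_vectors_binary (chunk_id TEXT, embedding BLOB)")
seed_chunk(conn, "c1", [0.1, 0.2, 0.3])
```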

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: a89861a6-ddfe-4d54-aebb-9c519269de3c

📥 Commits

Reviewing files that changed from the base of the PR and between 316769b and eb29a30.

📒 Files selected for processing (7)
  • brain-bar/Tests/BrainBarTests/MCPRouterTests.swift
  • src/brainlayer/engine.py
  • src/brainlayer/mcp/search_handler.py
  • src/brainlayer/search_repo.py
  • src/brainlayer/vector_store.py
  • tests/test_audit_recursion_filter.py
  • tests/test_search_exact_chunk_id.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
  • GitHub Check: Cursor Bugbot
  • GitHub Check: test (3.13)
  • GitHub Check: test (3.12)
  • GitHub Check: test (3.11)
  • GitHub Check: Macroscope - Correctness Check
🧰 Additional context used
📓 Path-based instructions (2)
**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

**/*.py: Flag risky DB or concurrency changes explicitly and do not hand-wave lock behavior
Enforce one-write-at-a-time concurrency constraint; reads are safe but brain_digest is write-heavy and must not run in parallel with other MCP work
Run pytest before claiming behavior changed safely; current test suite has 929 tests

**/*.py: Use paths.py:get_db_path() for all database path resolution; all scripts and CLI must use this function rather than hardcoding paths
When performing bulk database operations: stop enrichment workers first, checkpoint WAL before and after, drop FTS triggers before bulk deletes, batch deletes in 5-10K chunks, and checkpoint every 3 batches

Files:

  • src/brainlayer/engine.py
  • tests/test_search_exact_chunk_id.py
  • src/brainlayer/search_repo.py
  • src/brainlayer/vector_store.py
  • src/brainlayer/mcp/search_handler.py
  • tests/test_audit_recursion_filter.py
src/brainlayer/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

src/brainlayer/**/*.py: Use retry logic on SQLITE_BUSY errors; each worker must use its own database connection to handle concurrency safely
Classification must preserve ai_code, stack_trace, and user_message verbatim; skip noise entries entirely and summarize build_log and dir_listing entries (structure only)
Use AST-aware chunking via tree-sitter; never split stack traces; mask large tool output
For enrichment backend selection: use Groq as primary backend (cloud, configured in launchd plist), Gemini as fallback via enrichment_controller.py, and Ollama as offline last-resort; allow override via BRAINLAYER_ENRICH_BACKEND env var
Configure enrichment rate via BRAINLAYER_ENRICH_RATE environment variable (default 0.2 = 12 RPM)
Implement chunk lifecycle columns: superseded_by, aggregated_into, archived_at on chunks table; exclude lifecycle-managed chunks from default search; allow include_archived=True to show history
Implement brain_supersede with safety gate for personal data (journals, notes, health/finance); use soft-delete for brain_archive with timestamp
Add supersedes parameter to brain_store for atomic store-and-replace operations
Run linting and formatting with: ruff check src/ && ruff format src/
Run tests with pytest
Use PRAGMA wal_checkpoint(FULL) before and after bulk database operations to prevent WAL bloat

Files:

  • src/brainlayer/engine.py
  • src/brainlayer/search_repo.py
  • src/brainlayer/vector_store.py
  • src/brainlayer/mcp/search_handler.py

@chatgpt-codex-connector

Codex Review: Didn't find any major issues. You're on a roll.

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".


@greptile-apps greptile-apps Bot left a comment

Your free trial has ended. If you'd like to continue receiving code reviews, you can add a payment method here.

@EtanHey
Owner Author

EtanHey commented May 3, 2026

@codex review
@cursor review
@BugBot review

Latest review-fix pushed at 4a8c8ee: the audit-recursion shorthand r0x is now filtered, the tests build the binary index, and KNN overfetch is restored for default audit filtering, with an audit-heavy binary-path regression test.


@cursor cursor Bot left a comment


Cursor Bugbot has reviewed your changes and found 1 potential issue.

Reviewed by Cursor Bugbot for commit 4a8c8ee.

OR \(tagValue) GLOB 'r0[0-9]'
)
"""
}

Swift audit filter missing r0x tag exclusion

Medium Severity

The Swift auditRecursionTagExclusionSQL only checks LIKE '%audit%', = 'agent=auditor', and GLOB 'r0[0-9]'. It does not exclude chunks tagged r0x because 'x' is not a digit and "r0x" doesn't contain "audit". The Python counterpart explicitly includes = 'r0x' in AUDIT_RECURSION_TAG_PATTERNS. Chunks tagged solely with r0x will leak through in Swift but be correctly filtered in Python.



@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 4a8c8ee8c1


FROM json_each(\(tagsJSON)) audit_tags
WHERE \(tagValue) LIKE '%audit%'
OR \(tagValue) = 'agent=auditor'
OR \(tagValue) GLOB 'r0[0-9]'

P1: Add r0x shorthand to Swift audit exclusion

The default BrainBar filter claims to exclude r02/r0x audit recursion tags, but this SQL only matches GLOB 'r0[0-9]', so chunks tagged exactly r0x still leak into default brain_search results when include_audit is false. This creates an inconsistent contract versus the Python path (which does filter r0x) and allows the audit-recursion contamination this change is meant to prevent.

