fix: filter audit recursion from default search #277
Conversation
📝 Walkthrough
Search APIs gain an opt-in `include_audit` flag.
Changes: Audit Recursion Filtering Throughout Search Pipeline
Sequence Diagram

```mermaid
sequenceDiagram
    participant Client as Client
    participant Router as MCP Router
    participant Handler as Search Handler
    participant Repo as Search Repo
    participant Store as VectorStore/DB
    Client->>Router: brain_search(query, include_audit=true)
    Router->>Handler: _brain_search(query, include_audit=true)
    Handler->>Repo: _search(query, include_audit=true)
    Repo->>Store: hybrid_search(query, include_audit=true)
    Note over Store: If include_audit == false,<br/>apply NOT EXISTS audit-exclusion SQL<br/>(or JSON-tag fallback)
    Store-->>Repo: results (filtered or full)
    Repo-->>Handler: post-RRF filter applied if needed
    Handler-->>Router: formatted response
    Router-->>Client: search results
```
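The audit-exclusion step in the diagram can be sketched as a small SQL-building helper. This is a minimal illustration assuming the pattern list, `chunk_tags` table, and column names shown elsewhere in this review; it is not the repository's actual implementation.

```python
# Assumed pattern list (mirrors the excerpt reviewed below; illustrative only).
AUDIT_RECURSION_TAG_PATTERNS = (
    "LOWER(tag) LIKE '%audit%'",
    "LOWER(tag) = 'r02'",
)

def audit_exclusion_sql(chunk_id_expr: str = "chunks.id",
                        include_audit: bool = False) -> str:
    """Return an SQL fragment that excludes audit-tagged chunks by default."""
    if include_audit:
        return ""  # opt-in: no extra filtering
    return (
        "AND NOT EXISTS (SELECT 1 FROM chunk_tags audit_tags "
        f"WHERE audit_tags.chunk_id = {chunk_id_expr} "
        f"AND ({' OR '.join(AUDIT_RECURSION_TAG_PATTERNS)}))"
    )

print(audit_exclusion_sql())
print(repr(audit_exclusion_sql(include_audit=True)))  # empty string when opted in
```

The empty-string return for the opt-in case lets callers concatenate the fragment unconditionally into a larger WHERE clause.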
Estimated Code Review Effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
Possibly Related PRs
Poem
🚥 Pre-merge checks
❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings.
✨ Finishing Touches
📝 Generate docstrings
🧪 Generate unit tests (beta)
@codex review
@cursor review
@bugbot review
💡 Codex Review
brainlayer/src/brainlayer/mcp/search_handler.py
Lines 509 to 510 in 110f436
_brain_search now defaults include_audit to false, but chunk-id-shaped queries still short-circuit through _exact_chunk_lookup_result before reaching _search(..., include_audit=...). That means an audit-tagged chunk can still be returned by default when the query equals its chunk ID, which breaks the new “exclude audit/eval by default” contract and can reintroduce audit-recursion contamination through direct-id retrieval.
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
```python
AUDIT_RECURSION_TAG_PATTERNS = (
    "LOWER(tag) LIKE '%audit%'",
    "LOWER(tag) = 'r02'",
    "LOWER(tag) GLOB '*r0[0-9]*'",
```
Restrict r0x audit tag match to whole tags
The new audit filter uses LOWER(tag) GLOB '*r0[0-9]*', which matches any tag containing r0<digit> as a substring, not just dedicated audit tags (for example, tags like error01 or mirror07). Because this predicate is applied across default search paths, unrelated memories can be silently excluded and retrieval quality regresses; the match should be constrained to exact audit-tag forms.
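The false-positive behavior is easy to reproduce in SQLite with the review's own example tags; the single-column table shape here is hypothetical, used only to exercise the predicates.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE chunk_tags (tag TEXT)")  # hypothetical shape
con.executemany("INSERT INTO chunk_tags VALUES (?)",
                [("r02",), ("r07",), ("error01",), ("mirror07",)])

# Substring form from the diff: matches any tag containing r0<digit>.
loose = [t for (t,) in con.execute(
    "SELECT tag FROM chunk_tags WHERE LOWER(tag) GLOB '*r0[0-9]*'")]
# Whole-tag form: only dedicated audit tags match.
strict = [t for (t,) in con.execute(
    "SELECT tag FROM chunk_tags WHERE LOWER(tag) GLOB 'r0[0-9]'")]

print(loose)   # ['r02', 'r07', 'error01', 'mirror07']
print(strict)  # ['r02', 'r07']
```

Dropping the surrounding `*` wildcards anchors the pattern to the entire tag value, which is exactly the constraint the comment asks for.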
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 316769ba1f
```python
"SELECT 1 FROM chunk_tags audit_tags "
"WHERE audit_tags.chunk_id = {chunk_id_expr} "
f"AND ({' OR '.join(AUDIT_RECURSION_TAG_PATTERNS)})"
```
Guard audit-tag filter when chunk_tags table is unavailable
This new default filter hard-depends on chunk_tags, but VectorStore._init_readonly_db() explicitly skips migrations, so opening a legacy read-only DB that predates chunk_tags will now make default search fail with no such table: chunk_tags (because include_audit defaults to False and always injects this predicate). That turns previously working read-only retrieval into an error path unless callers know to opt into include_audit=true, so the filter should be conditional on table existence or fall back to chunks.tags JSON scanning.
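One way to make the filter conditional, as suggested, is to probe `sqlite_master` before emitting the predicate. A minimal sketch; the helper name and table schemas are assumptions, not the repository's code:

```python
import sqlite3

def has_table(con: sqlite3.Connection, name: str) -> bool:
    # Probe sqlite_master so a legacy read-only DB that predates the
    # table falls back gracefully instead of raising "no such table".
    row = con.execute(
        "SELECT 1 FROM sqlite_master WHERE type = 'table' AND name = ?",
        (name,),
    ).fetchone()
    return row is not None

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE chunks (id INTEGER, tags TEXT)")
print(has_table(con, "chunk_tags"))  # False -> use JSON-tag fallback
con.execute("CREATE TABLE chunk_tags (chunk_id INTEGER, tag TEXT)")
print(has_table(con, "chunk_tags"))  # True -> use NOT EXISTS fast path
```

Checking once at connection setup and caching the result avoids paying the probe on every search.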
Actionable comments posted: 3
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
src/brainlayer/mcp/search_handler.py (1)
174-199: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win
Stop exact chunk-id routing after an audit-filtered hit.
Returning `None` here tells `_brain_search()` the exact lookup missed, so a chunk-id-shaped query for an excluded audit chunk can still fan out into fuzzy hybrid/entity search and produce unrelated results. Once the chunk exists and is intentionally hidden, this path should return an explicit empty exact-result sentinel instead of falling through.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/brainlayer/mcp/search_handler.py` around lines 174-199, the exact-chunk branch currently returns None for audit-filtered hits, which signals `_brain_search()` to treat it as a miss and fall back to fuzzy/entity searches; change the branch so that when a chunk is found but excluded (checks around include_audit and `_is_audit_recursion_metadata`, or other exclusion checks like superseded_by/aggregated_into/archived_at, tag/project/content_type mismatches) you return an explicit empty exact-result sentinel (e.g., a module-level constant like EMPTY_EXACT_RESULT or a distinct dict/object) instead of None so `_brain_search()` can detect an intentionally hidden exact hit and avoid fan-out; update code paths that inspect exact results to recognize this sentinel.
src/brainlayer/search_repo.py (1)
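The sentinel idea can be sketched like this; `EMPTY_EXACT_RESULT` and the lookup shape are hypothetical names taken from the prompt above, not the actual handler code:

```python
# Distinguish "no such chunk" (None) from "chunk exists but is hidden".
EMPTY_EXACT_RESULT = {"results": [], "exact_match_filtered": True}

def exact_chunk_lookup(chunks: dict, chunk_id: str,
                       include_audit: bool = False):
    chunk = chunks.get(chunk_id)
    if chunk is None:
        return None  # true miss: caller may fall back to fuzzy search
    if not include_audit and "audit" in chunk.get("tags", []):
        return EMPTY_EXACT_RESULT  # intentionally hidden: no fan-out
    return {"results": [chunk], "exact_match_filtered": False}

chunks = {"c1": {"id": "c1", "tags": ["audit"]}}
print(exact_chunk_lookup(chunks, "c1") is EMPTY_EXACT_RESULT)  # True
print(exact_chunk_lookup(chunks, "missing"))                   # None
```

Because the sentinel is a distinct module-level object, callers can test identity (`result is EMPTY_EXACT_RESULT`) to tell an intentional hide apart from a genuine miss.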
394-415: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win
Over-fetch KNN candidates when audit filtering is active.
`include_audit=False` adds another filter that can discard nearest neighbors after the vector match, but `needs_overfetch` still ignores it. If a query is close to many audit-tagged chunks, both semantic paths can return fewer than `n_results` non-audit candidates even though valid matches exist.
Suggested fix:

```diff
-    needs_overfetch = (
-        entity_id
-        or (source_filter and source_filter != "claude_code")
-        or source_filter_like
-        or correction_category
-    )
+    needs_overfetch = (
+        entity_id
+        or (source_filter and source_filter != "claude_code")
+        or source_filter_like
+        or correction_category
+        or not include_audit
+    )
```

```diff
-    needs_overfetch = (
-        entity_id or (source_filter and source_filter != "claude_code") or source_filter_like or correction_category
-    )
+    needs_overfetch = (
+        entity_id
+        or (source_filter and source_filter != "claude_code")
+        or source_filter_like
+        or correction_category
+        or not include_audit
+    )
```

Also applies to: 735-749
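A toy model shows why the flag needs to feed the over-fetch decision: any filter applied after candidate selection can starve the result set. Everything below is illustrative, not the repository's code.

```python
def knn(candidates, k):
    return candidates[:k]  # stand-in for vector search: best-first order

def search(candidates, n_results, include_audit=False, overfetch=False):
    k = min(n_results * 10, 1000) if overfetch else n_results
    hits = knn(candidates, k)
    if not include_audit:
        hits = [c for c in hits if "audit" not in c["tags"]]
    return hits[:n_results]

# The nearest neighbors are mostly audit-tagged.
cands = [{"id": i, "tags": ["audit"] if i < 4 else []} for i in range(20)]
print(len(search(cands, 5)))                  # 1: filter starves the results
print(len(search(cands, 5, overfetch=True)))  # 5: over-fetch compensates
```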
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/brainlayer/search_repo.py` around lines 394 - 415, The overfetch decision (needs_overfetch) ignores audit filtering; include the audit flag so that when include_audit is False we also over-fetch—update the needs_overfetch expression to include "or not include_audit" (or equivalent) so effective_k is bumped when audit filtering can discard neighbors, and make the identical change in the other copy of this logic (the block around lines 735-749) so both semantic paths over-fetch when audit filtering is active.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@brain-bar/Tests/BrainBarTests/MCPRouterTests.swift`:
- Around line 618-630: The opt-in test currently only checks that an audit chunk
appears (via optInResponse/optInText) and misses verifying ordinary result
chunks; update the test that calls router.handle with include_audit=true to also
assert that a normal (non-audit) chunk is still present—e.g., inspect
optInResponse["result"] -> "content" array (same as optInText extraction) and
assert there exists at least one item whose "text" does NOT contain the audit
marker (e.g., does not contain "audit-recurs"), in addition to the existing
XCTAssertTrue that checks for the audit chunk; keep using the same router.handle
call and optInResponse/optInText variables to locate and assert both conditions.
In `@tests/test_audit_recursion_filter.py`:
- Around line 7-19: The test helper _insert_chunk currently writes rows to
chunks and chunk_vectors but never populates the chunk_tags table, so the new
NOT EXISTS (...) FROM chunk_tags predicate used by hybrid_search/search_repo.py
is not exercised; update _insert_chunk to also insert one row per tag into
chunk_tags (associating chunk_id with each tag) when tags are provided so the
tag-side index is seeded and the NOT EXISTS branch is actually tested.
In `@tests/test_search_exact_chunk_id.py`:
- Line 128: The test currently accesses
mock_store.kg_hybrid_search.call_args.kwargs without verifying the mock was
invoked; change the assertion to first assert the mock was called (e.g., use
mock_store.kg_hybrid_search.assert_called() or assert
mock_store.kg_hybrid_search.called) and only then inspect call_args.kwargs to
check ["include_audit"] is True, or replace with
mock_store.kg_hybrid_search.assert_called_once() and assert the expected kwargs
via the mock's assert_called_once_with/inspect of call_args; this ensures a
clear failure message if kg_hybrid_search was never invoked.
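The ordering the last comment asks for (assert the call happened before inspecting `call_args`) looks like this with `unittest.mock`; the mock attribute name simply mirrors the test under review:

```python
from unittest import mock

store = mock.Mock()
store.kg_hybrid_search(query="q", include_audit=True)

# If the mock was never invoked, call_args is None and .kwargs raises
# AttributeError; asserting the call first gives a clear failure message.
store.kg_hybrid_search.assert_called_once()
assert store.kg_hybrid_search.call_args.kwargs["include_audit"] is True
print("ok")
```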
---
Outside diff comments:
In `@src/brainlayer/mcp/search_handler.py`:
- Around line 174-199: The exact-chunk branch currently returns None for
audit-filtered hits which signals `_brain_search()` to treat it as a miss and
fall back to fuzzy/entity searches; change the branch so that when a chunk is
found but excluded (checks around include_audit and
`_is_audit_recursion_metadata`, or other exclusion checks like
superseded_by/aggregated_into/archived_at, tag/project/content_type mismatches)
you return an explicit empty exact-result sentinel (e.g., a module-level
constant like EMPTY_EXACT_RESULT or a distinct dict/object) instead of None so
`_brain_search()` can detect "intentional hidden exact hit" and avoid fan-out;
update code paths that inspect exact results to recognize this sentinel.
In `@src/brainlayer/search_repo.py`:
- Around line 394-415: The overfetch decision (needs_overfetch) ignores audit
filtering; include the audit flag so that when include_audit is False we also
over-fetch—update the needs_overfetch expression to include "or not
include_audit" (or equivalent) so effective_k is bumped when audit filtering can
discard neighbors, and make the identical change in the other copy of this logic
(the block around lines 735-749) so both semantic paths over-fetch when audit
filtering is active.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: 48f221ce-ad4d-453f-b933-ced6053444a7
📒 Files selected for processing (8)
- brain-bar/Sources/BrainBar/BrainDatabase.swift
- brain-bar/Sources/BrainBar/MCPRouter.swift
- brain-bar/Tests/BrainBarTests/MCPRouterTests.swift
- src/brainlayer/mcp/__init__.py
- src/brainlayer/mcp/search_handler.py
- src/brainlayer/search_repo.py
- tests/test_audit_recursion_filter.py
- tests/test_search_exact_chunk_id.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
- GitHub Check: Cursor Bugbot
- GitHub Check: test (3.12)
- GitHub Check: test (3.13)
- GitHub Check: test (3.11)
🧰 Additional context used
📓 Path-based instructions (2)
**/*.py
📄 CodeRabbit inference engine (AGENTS.md)
**/*.py:
- Flag risky DB or concurrency changes explicitly and do not hand-wave lock behavior
- Enforce one-write-at-a-time concurrency constraint; reads are safe but brain_digest is write-heavy and must not run in parallel with other MCP work
- Run pytest before claiming behavior changed safely; current test suite has 929 tests

**/*.py:
- Use `paths.py:get_db_path()` for all database path resolution; all scripts and CLI must use this function rather than hardcoding paths
- When performing bulk database operations: stop enrichment workers first, checkpoint WAL before and after, drop FTS triggers before bulk deletes, batch deletes in 5-10K chunks, and checkpoint every 3 batches

Files:
- tests/test_audit_recursion_filter.py
- src/brainlayer/mcp/search_handler.py
- tests/test_search_exact_chunk_id.py
- src/brainlayer/search_repo.py
- src/brainlayer/mcp/__init__.py
src/brainlayer/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
src/brainlayer/**/*.py:
- Use retry logic on `SQLITE_BUSY` errors; each worker must use its own database connection to handle concurrency safely
- Classification must preserve `ai_code`, `stack_trace`, and `user_message` verbatim; skip `noise` entries entirely and summarize `build_log` and `dir_listing` entries (structure only)
- Use AST-aware chunking via tree-sitter; never split stack traces; mask large tool output
- For enrichment backend selection: use Groq as primary backend (cloud, configured in launchd plist), Gemini as fallback via `enrichment_controller.py`, and Ollama as offline last-resort; allow override via `BRAINLAYER_ENRICH_BACKEND` env var
- Configure enrichment rate via `BRAINLAYER_ENRICH_RATE` environment variable (default 0.2 = 12 RPM)
- Implement chunk lifecycle columns `superseded_by`, `aggregated_into`, `archived_at` on the chunks table; exclude lifecycle-managed chunks from default search; allow `include_archived=True` to show history
- Implement `brain_supersede` with safety gate for personal data (journals, notes, health/finance); use soft-delete for `brain_archive` with timestamp
- Add `supersedes` parameter to `brain_store` for atomic store-and-replace operations
- Run linting and formatting with `ruff check src/ && ruff format src/`
- Run tests with `pytest`
- Use `PRAGMA wal_checkpoint(FULL)` before and after bulk database operations to prevent WAL bloat

Files:
- src/brainlayer/mcp/search_handler.py
- src/brainlayer/search_repo.py
- src/brainlayer/mcp/__init__.py
🧠 Learnings (2)
📚 Learning: 2026-03-18T00:12:08.774Z
Learnt from: EtanHey
Repo: EtanHey/brainlayer PR: 87
File: brain-bar/Sources/BrainBar/BrainBarServer.swift:118-129
Timestamp: 2026-03-18T00:12:08.774Z
Learning: In Swift files under brain-bar/Sources/BrainBar, enforce that when a critical dependency like the database is nil due to startup ordering (socket before DB), any tool handler that accesses the database must throw an explicit error (e.g., ToolError.noDatabase) instead of returning a default/empty value. Do not allow silent defaults (e.g., guard let db else { return ... }). Flag patterns that silently return defaults when db is nil, as this masks startup timing issues. This guidance applies broadly to similar Swift files in the BrainBar module, not just this one location.
Applied to files:
brain-bar/Sources/BrainBar/MCPRouter.swiftbrain-bar/Sources/BrainBar/BrainDatabase.swift
📚 Learning: 2026-03-29T18:45:40.988Z
Learnt from: EtanHey
Repo: EtanHey/brainlayer PR: 133
File: brain-bar/Sources/BrainBar/BrainDatabase.swift:0-0
Timestamp: 2026-03-29T18:45:40.988Z
Learning: In the BrainBar module’s Swift database layer (notably BrainDatabase.swift), ensure that the `search()` function’s `unreadOnly=true` path orders results by the delivery frontier cursor so the watermark `maxRowID` stays contiguous. Specifically, when `unreadOnly` is enabled, the query must include `ORDER BY c.rowid ASC` (e.g., via `let orderByClause = unreadOnly ? "c.rowid ASC" : "f.rank"`). Do not replace the unread-only ordering with relevance-based sorting (e.g., `f.rank`) unconditionally or for the unread-only path, as it can introduce gaps in the watermark and incorrectly mark unseen rows as delivered. Flag any future change to the `ORDER BY` clause in this function that makes relevance sorting apply to the unread-only case.
Applied to files:
brain-bar/Sources/BrainBar/MCPRouter.swiftbrain-bar/Sources/BrainBar/BrainDatabase.swift
🔇 Additional comments (2)
src/brainlayer/mcp/__init__.py (1)
508-512: LGTM — `include_audit` schema and routing look correct.
`False` (Python) serializes to JSON `false` consistently with existing boolean defaults (`confirm`, `stats`). Both `brain_search` and `brain_recall` handlers correctly default to `False` via `.get("include_audit", False)`. The deprecated `brainlayer_search` alias intentionally omits the flag (excluded audit is the safe default, and the alias is deprecated with no schema exposure of the field).
Also applies to: 800-804, 1231-1231, 1311-1311
brain-bar/Sources/BrainBar/MCPRouter.swift (1)
228-228: LGTM — `includeAudit` extraction, pass-through, and schema entry are consistent with existing patterns.
The `?? false` default is correct, the named-argument pass-through to `db.search(...)` aligns with the rest of the call site, and omitting a `"default"` key from the schema dict follows the same convention as `"unread_only"` and `"stats"`. The `guard let db = database else { throw ToolError.noDatabase }` guard at line 243 remains in place.
Also applies to: 268-269, 838-838
Codex Review: Didn't find any major issues. Delightful!
Codex Review: Didn't find any major issues. Delightful!
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/brainlayer/search_repo.py (1)
425-431: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win
Audit exclusion needs to trigger KNN overfetch.
Line 425 and Line 762 still compute `effective_k` as if no extra post-KNN filter was added. Because the audit predicate is applied after candidate selection, nearest-neighbor queries can return fewer than `n_results` whenever the top candidates are mostly audit-tagged.
Suggested fix:

```diff
-    needs_overfetch = (
-        entity_id
-        or (source_filter and source_filter != "claude_code")
-        or source_filter_like
-        or correction_category
-    )
+    needs_overfetch = (
+        entity_id
+        or (source_filter and source_filter != "claude_code")
+        or source_filter_like
+        or correction_category
+        or not include_audit
+    )
```

```diff
-    needs_overfetch = (
-        entity_id or (source_filter and source_filter != "claude_code") or source_filter_like or correction_category
-    )
+    needs_overfetch = (
+        entity_id
+        or (source_filter and source_filter != "claude_code")
+        or source_filter_like
+        or correction_category
+        or not include_audit
+    )
```

Also applies to: 762-765
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/brainlayer/search_repo.py` around lines 425 - 431, The computed effective_k (variable effective_k) doesn't account for the post-KNN audit exclusion filter, so add the audit-exclusion predicate to the needs_overfetch boolean (alongside entity_id, source_filter, source_filter_like, correction_category) so that overfetching occurs when audit filtering will be applied; update both occurrences (the block using needs_overfetch and effective_k around the first diff and the similar block near lines 762-765) so effective_k = min(n_results * 10, 1000) when the audit exclusion predicate is true.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/brainlayer/search_repo.py`:
- Around line 41-45: AUDIT_RECURSION_TAG_PATTERNS currently misses exact 'r0x'
literal matches so chunks tagged exactly "r01"/"r02" still pass; update
AUDIT_RECURSION_TAG_PATTERNS to include an explicit equality pattern that
matches r0 followed by a single digit (e.g. an SQL equality/GLOB that targets
the entire tag, referenced via tag_expr) and mirror the same exact-match check
in the metadata-fallback logic (the code that inspects tags in plain Python
around lines 149-161) so those literal r0x tags are filtered out in both the SQL
filter and the metadata fallback.
In `@tests/test_audit_recursion_filter.py`:
- Around line 17-24: Tests only seed chunk_vectors but not chunk_vectors_binary,
so VectorStore.hybrid_search() still takes the FTS path; add an INSERT into
chunk_vectors_binary for each chunk so the binary KNN path (_binary_search) is
exercised. Specifically, after the existing INSERT INTO chunk_vectors, also
execute an INSERT INTO chunk_vectors_binary (chunk_id, embedding) VALUES (?, ?)
using the same chunk_id and the binary-serialized embedding (use the project’s
binary serializer helper if available, e.g., serialize_f32_binary or convert the
float array to the required blob) before inserting tags so hybrid_search() can
prefer the binary path.
---
Outside diff comments:
In `@src/brainlayer/search_repo.py`:
- Around line 425-431: The computed effective_k (variable effective_k) doesn't
account for the post-KNN audit exclusion filter, so add the audit-exclusion
predicate to the needs_overfetch boolean (alongside entity_id, source_filter,
source_filter_like, correction_category) so that overfetching occurs when audit
filtering will be applied; update both occurrences (the block using
needs_overfetch and effective_k around the first diff and the similar block near
lines 762-765) so effective_k = min(n_results * 10, 1000) when the audit
exclusion predicate is true.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: a89861a6-ddfe-4d54-aebb-9c519269de3c
📒 Files selected for processing (7)
- brain-bar/Tests/BrainBarTests/MCPRouterTests.swift
- src/brainlayer/engine.py
- src/brainlayer/mcp/search_handler.py
- src/brainlayer/search_repo.py
- src/brainlayer/vector_store.py
- tests/test_audit_recursion_filter.py
- tests/test_search_exact_chunk_id.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
- GitHub Check: Cursor Bugbot
- GitHub Check: test (3.13)
- GitHub Check: test (3.12)
- GitHub Check: test (3.11)
- GitHub Check: Macroscope - Correctness Check
🧰 Additional context used
📓 Path-based instructions (2)
**/*.py
📄 CodeRabbit inference engine (AGENTS.md)
**/*.py:
- Flag risky DB or concurrency changes explicitly and do not hand-wave lock behavior
- Enforce one-write-at-a-time concurrency constraint; reads are safe but brain_digest is write-heavy and must not run in parallel with other MCP work
- Run pytest before claiming behavior changed safely; current test suite has 929 tests

**/*.py:
- Use `paths.py:get_db_path()` for all database path resolution; all scripts and CLI must use this function rather than hardcoding paths
- When performing bulk database operations: stop enrichment workers first, checkpoint WAL before and after, drop FTS triggers before bulk deletes, batch deletes in 5-10K chunks, and checkpoint every 3 batches

Files:
- src/brainlayer/engine.py
- tests/test_search_exact_chunk_id.py
- src/brainlayer/search_repo.py
- src/brainlayer/vector_store.py
- src/brainlayer/mcp/search_handler.py
- tests/test_audit_recursion_filter.py
src/brainlayer/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
src/brainlayer/**/*.py:
- Use retry logic on `SQLITE_BUSY` errors; each worker must use its own database connection to handle concurrency safely
- Classification must preserve `ai_code`, `stack_trace`, and `user_message` verbatim; skip `noise` entries entirely and summarize `build_log` and `dir_listing` entries (structure only)
- Use AST-aware chunking via tree-sitter; never split stack traces; mask large tool output
- For enrichment backend selection: use Groq as primary backend (cloud, configured in launchd plist), Gemini as fallback via `enrichment_controller.py`, and Ollama as offline last-resort; allow override via `BRAINLAYER_ENRICH_BACKEND` env var
- Configure enrichment rate via `BRAINLAYER_ENRICH_RATE` environment variable (default 0.2 = 12 RPM)
- Implement chunk lifecycle columns `superseded_by`, `aggregated_into`, `archived_at` on the chunks table; exclude lifecycle-managed chunks from default search; allow `include_archived=True` to show history
- Implement `brain_supersede` with safety gate for personal data (journals, notes, health/finance); use soft-delete for `brain_archive` with timestamp
- Add `supersedes` parameter to `brain_store` for atomic store-and-replace operations
- Run linting and formatting with `ruff check src/ && ruff format src/`
- Run tests with `pytest`
- Use `PRAGMA wal_checkpoint(FULL)` before and after bulk database operations to prevent WAL bloat

Files:
- src/brainlayer/engine.py
- src/brainlayer/search_repo.py
- src/brainlayer/vector_store.py
- src/brainlayer/mcp/search_handler.py
Codex Review: Didn't find any major issues. You're on a roll.
Cursor Bugbot has reviewed your changes and found 1 potential issue.
❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.
Reviewed by Cursor Bugbot for commit 4a8c8ee. Configure here.
```swift
        OR \(tagValue) GLOB 'r0[0-9]'
    )
    """
}
```
Swift audit filter missing r0x tag exclusion
Medium Severity
The Swift auditRecursionTagExclusionSQL only checks LIKE '%audit%', = 'agent=auditor', and GLOB 'r0[0-9]'. It does not exclude chunks tagged r0x because 'x' is not a digit and "r0x" doesn't contain "audit". The Python counterpart explicitly includes = 'r0x' in AUDIT_RECURSION_TAG_PATTERNS. Chunks tagged solely with r0x will leak through in Swift but be correctly filtered in Python.
Reviewed by Cursor Bugbot for commit 4a8c8ee. Configure here.
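The gap Bugbot describes is straightforward to demonstrate against SQLite; the single-column table and the predicate strings below paraphrase the Swift SQL for illustration only.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE chunk_tags (tag TEXT)")  # simplified stand-in
con.executemany("INSERT INTO chunk_tags VALUES (?)",
                [("r02",), ("r0x",), ("agent=auditor",), ("design",)])

# These predicates select the tags to EXCLUDE from default search.
swift_like = ("LOWER(tag) LIKE '%audit%' OR LOWER(tag) = 'agent=auditor' "
              "OR LOWER(tag) GLOB 'r0[0-9]'")
with_r0x = swift_like + " OR LOWER(tag) = 'r0x'"

q = "SELECT tag FROM chunk_tags WHERE {}"
excluded = [t for (t,) in con.execute(q.format(swift_like))]
fixed = [t for (t,) in con.execute(q.format(with_r0x))]
print(excluded)  # r0x is absent, so r0x-tagged chunks leak into results
print(fixed)     # adding the equality pattern catches r0x
```

Since `x` is not in `[0-9]` and `r0x` does not contain `audit`, only an explicit equality check closes the gap, matching the Python side's `= 'r0x'` pattern.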
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 4a8c8ee8c1
```swift
    FROM json_each(\(tagsJSON)) audit_tags
    WHERE \(tagValue) LIKE '%audit%'
        OR \(tagValue) = 'agent=auditor'
        OR \(tagValue) GLOB 'r0[0-9]'
```
Add r0x shorthand to Swift audit exclusion
The default BrainBar filter claims to exclude r02/r0x audit recursion tags, but this SQL only matches GLOB 'r0[0-9]', so chunks tagged exactly r0x still leak into default brain_search results when include_audit is false. This creates an inconsistent contract versus the Python path (which does filter r0x) and allows the audit-recursion contamination this change is meant to prevent.


Summary
Tests
Notes
Note
Medium Risk
Changes default retrieval behavior across Swift and Python by filtering out audit/eval-tagged chunks, which can affect recall quality and any workflows that relied on those results. Adds new SQL predicates and tag-detection logic (with legacy DB fallbacks) that could impact query performance or edge-case matching.
Overview
Default search now excludes audit/eval “recursion” chunks unless callers pass `include_audit=true`/`True` (Swift/Python).
This threads a new `include_audit` flag through `brain_search`/`brain_recall` (tool schemas plus router/handlers) and the underlying retrieval paths (`hybrid_search`, `think`, `recall`, plus Swift `BrainDatabase.search`/FTS and exact chunk-id lookup). Filtering is implemented via tag-based predicates (e.g., `audit`, `agent=auditor`, `r0[0-9]`) with a `chunk_tags`-table fast path and a JSON-tag fallback for legacy/readonly DBs.
Adds regression coverage verifying default exclusion, explicit opt-in inclusion, and avoiding false positives for tags containing `r0x` substrings, and adjusts exact chunk-id lookups to return an empty structured result when filtered rather than short-circuiting with `None`.
Reviewed by Cursor Bugbot for commit 4a8c8ee. Bugbot is set up for automated code reviews on this repo. Configure here.
Note
Filter audit recursion tags from default search results across all search paths
- Adds an `include_audit` parameter (default `false`) to all search entry points — `hybrid_search`, `search`, `think`, `recall`, and exact chunk-id lookup — excluding chunks tagged with audit/eval markers (`audit`, `agent=auditor`, `r0[0-9]`, `r0x`) unless explicitly opted in.
- Filtering uses a `NOT EXISTS`/`json_each` approach, with a faster path via a dedicated `chunk_tags` table when available.
- Swift adds an `auditRecursionTagExclusionSQL` helper applied to both FTS and exact-ID search paths.
- Exposes `include_audit` as an optional boolean in the `brain_search` and `brain_recall` MCP tool schemas for both the Python and Swift layers.
- Callers can restore the previous behavior with `include_audit: true`.

Macroscope summarized 4a8c8ee.
Summary by CodeRabbit
New Features
Tests