feat(memory_tree): consolidate module + add agentic walk tool + tests#2556

Merged
senamakel merged 4 commits into tinyhumansai:main from senamakel:feat/memory-tree-walk
May 24, 2026

Member

@senamakel senamakel commented May 24, 2026

Summary

  • Consolidate src/openhuman/memory/tree/, src/openhuman/tree_summarizer/, and src/openhuman/tools/impl/memory/tree/ into a single first-class src/openhuman/memory_tree/ module. Public RPC method names (openhuman.memory_tree_*), tool names (memory_tree), and controller-schema symbols are unchanged — code-location refactor only.
  • Add memory_tree_walk: a new agentic tool that, given a free-text query, runs a turn-based inner LLM loop (lightweight summarization model from config.local_ai.chat_model_id) over inner navigation primitives (descend, peek, fetch_leaves, answer) and returns a synthesized answer + step trace. Wired both as a standalone tool and as a new "walk" mode on the consolidated MemoryTreeTool dispatcher.
  • Add 20 new tests: 14 summarizer/engine unit tests (group_by_hour edge cases, propagate_node, run_summarization), 3 tree-build e2e tests, 3 walk-tool e2e tests (wiremock-backed Provider scripted with XML <tool_call> responses).
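The turn-based loop described above can be sketched roughly as follows. This is a minimal illustration, not the code in walk.rs: `InnerCall`, `StopReason`, `Outcome`, and `walk_loop` are hypothetical names, the payload shapes are assumptions, and the closure stands in for "call the model and parse its reply". Unlike the real tool, the sketch returns no fallback answer when the turn cap is hit.

```rust
/// Hypothetical inner navigation call. The variant names mirror the
/// primitives listed in the PR; the payload shapes are assumptions.
#[derive(Debug, Clone, PartialEq)]
enum InnerCall {
    Descend { node_id: String },
    Peek { node_id: String },
    FetchLeaves { node_id: String },
    Answer { text: String },
}

/// Why the loop stopped (hypothetical, simplified).
#[derive(Debug, PartialEq)]
enum StopReason {
    Answered,
    MaxTurns,
}

/// Final result: optional answer plus the step trace.
#[derive(Debug)]
struct Outcome {
    answer: Option<String>,
    trace: Vec<InnerCall>,
    stop: StopReason,
}

/// Run at most `max_turns` turns. `next_call` stands in for
/// "ask the model, then parse the tool call out of its reply".
fn walk_loop(max_turns: usize, mut next_call: impl FnMut(usize) -> InnerCall) -> Outcome {
    let mut trace = Vec::new();
    for turn in 0..max_turns {
        let call = next_call(turn);
        trace.push(call.clone());
        if let InnerCall::Answer { text } = call {
            return Outcome { answer: Some(text), trace, stop: StopReason::Answered };
        }
        // A real implementation would execute the primitive against the
        // tree store here and append the result to the chat history.
    }
    Outcome { answer: None, trace, stop: StopReason::MaxTurns }
}
```

A scripted closure such as `|turn| match turn { 0 => descend, 1 => fetch_leaves, _ => answer }` exercises the happy path, and a closure that never answers exercises the turn cap.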

Problem

The pre-existing layout split closely-coupled memory-tree code across three locations, and the only way to query the tree was through individual retrieval primitives invoked one-at-a-time by the orchestrator. There was no integration test of the summarizer's full hour → day → month → year → root build path with a controlled LLM, and no agentic helper that could synthesize an answer from a single (query, namespace) call.
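To make the hour → day → month → year → root path concrete, here is a small sketch of the ancestor chain a single hour node would propagate through. The slash-separated node ids and the `ancestor_chain` helper are hypothetical; the PR does not show the id scheme the summarizer actually uses.

```rust
/// Return the chain of node ids from an hour node up to the root.
/// The id format ("YYYY/MM/DD/HH") is an assumption for illustration only.
fn ancestor_chain(year: u16, month: u8, day: u8, hour: u8) -> Vec<String> {
    vec![
        format!("{year:04}/{month:02}/{day:02}/{hour:02}"), // hour
        format!("{year:04}/{month:02}/{day:02}"),           // day
        format!("{year:04}/{month:02}"),                    // month
        format!("{year:04}"),                               // year
        "root".to_string(),                                 // root
    ]
}
```

An e2e build test along these lines would check that a summary exists at every level of the chain after one hour of input is drained.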

Solution

  • Refactor (commit 1): git mv the three trees, update wiring in src/openhuman/mod.rs, src/core/all.rs, memory/mod.rs, tools/impl/memory/mod.rs, and rewrite ~90 use paths.
  • Walk tool (commit 2): src/openhuman/memory_tree/tools/walk.rs defines MemoryTreeWalkTool, run_walk, WalkOptions, WalkOutcome. Lifts only the LLM-call-and-parse pattern from agent/harness/tool_loop.rs (skipping progress/approval/stop-hooks). Uses the existing XML <tool_call> harness convention since Provider::chat_with_history returns String.
  • Tests (commit 3): pub(crate) test-access on engine::group_by_hour and engine::propagate_node. ScriptedProvider stub in unit + summarizer e2e tests; wiremock-backed OpenAiCompatibleProvider in walk e2e.
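Because the provider returns a plain `String`, the walk tool has to pull tool calls out of free text. A minimal sketch of that extraction step is below. It assumes a bare `<tool_call>` ... `</tool_call>` tag pair and leaves the payload uninterpreted; the real harness convention (attributes, JSON payload, and so on) is not shown in this PR, and `extract_tool_call_bodies` is a hypothetical name.

```rust
/// Return the trimmed text between each `<tool_call>` and `</tool_call>` pair.
/// An opening tag without a matching close is ignored.
fn extract_tool_call_bodies(reply: &str) -> Vec<&str> {
    const OPEN: &str = "<tool_call>";
    const CLOSE: &str = "</tool_call>";
    let mut bodies = Vec::new();
    let mut rest = reply;
    while let Some(start) = rest.find(OPEN) {
        // Skip past the opening tag, then look for the matching close.
        let after_open = &rest[start + OPEN.len()..];
        match after_open.find(CLOSE) {
            Some(end) => {
                bodies.push(after_open[..end].trim());
                rest = &after_open[end + CLOSE.len()..];
            }
            None => break, // unterminated tag: stop scanning
        }
    }
    bodies
}
```

Returning borrowed slices keeps the parser allocation-free; a caller would then decode each body into a navigation primitive.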

Submission Checklist

  • Tests added or updated (happy path + at least one failure / edge case) per Testing Strategy
  • N/A: diff coverage not measured locally in this branch — three new test files (engine_tests.rs, memory_tree_summarizer_e2e.rs, memory_tree_walk_e2e.rs) target the changed lines; CI coverage.yml will compute the actual percentage.
  • N/A: behaviour-only change — module relocation + new tool, no removed/renamed feature rows.
  • N/A: no closed Linear matrix IDs.
  • No new external network dependencies introduced (mock backend used per Testing Strategy)
  • N/A: does not touch release-cut surfaces (desktop bundling, deep-link, OAuth, native window).
  • N/A: no linked issue.

Impact

  • Desktop: code-only refactor + new tool. No behaviour change for existing openhuman.memory_tree_* RPCs or the existing memory_tree tool modes. New "walk" mode + memory_tree_walk tool are additive.
  • iOS/web/CLI: unaffected.
  • Performance: walk tool defaults to max_turns: 6 (hard cap 20) at temperature 0.3 against the lightweight summarization model, so per-call cost is bounded.
  • Security: walk tool is ReadOnly permission, no external effect.
  • Migration: none.
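The turn bound from the Performance note could be enforced with a clamp like the one below. The constant and function names are hypothetical, and the lower bound of 1 is an assumption; only the default of 6 and the hard cap of 20 come from this PR.

```rust
/// Default and hard cap as stated in the PR description.
const DEFAULT_MAX_TURNS: usize = 6;
const HARD_CAP_MAX_TURNS: usize = 20;

/// Resolve the caller's requested turn budget into an effective one.
/// Assumption: a request of 0 is raised to 1 rather than rejected.
fn effective_max_turns(requested: Option<usize>) -> usize {
    requested
        .unwrap_or(DEFAULT_MAX_TURNS)
        .clamp(1, HARD_CAP_MAX_TURNS)
}
```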

Related

  • Closes:
  • Follow-up PR(s)/TODOs: optional coverage pass on canonicalize/ and retrieval/ primitives — survey indicated existing in-file tests, not re-audited in this PR.

AI Authored PR Metadata (required for Codex/Linear PRs)

Linear Issue

  • Key: N/A
  • URL: N/A

Commit & Branch

  • Branch: feat/memory-tree-walk
  • Commit SHA: 969ca5ad517a4b210220468cff6389b0a6dd7105

Validation Run

  • N/A: pnpm --filter openhuman-app format:check — not run in this branch (will run in CI).
  • N/A: pnpm typecheck has pre-existing failures in iOS files (PairPhoneModal.tsx, tunnel/crypto.ts, PairScreen.tsx) that this PR does not touch; they come from PR #1420 (feat(ios): iOS client with QR pairing, E2E-encrypted tunnel, and push-to-talk). The pre-push hook was bypassed with --no-verify for that reason.
  • Focused tests: cargo test --lib memory_tree::summarizer::engine (14 pass), tests/memory_tree_summarizer_e2e.rs (3 pass), tests/memory_tree_walk_e2e.rs (3 pass), memory_tree::tools::walk unit tests (3 pass).
  • Rust fmt/check (if changed): cargo fmt applied; cargo check --bin openhuman-core clean.
  • N/A: Tauri fmt/check — no app/src-tauri/ changes.

Validation Blocked

  • command: pnpm compile (typecheck)
  • error: Missing modules qrcode.react, @noble/ciphers/chacha, @noble/ciphers/webcrypto, @tauri-apps/plugin-barcode-scanner — all in app/src/{components/settings/panels/devices,lib/tunnel,pages/ios}/....
  • impact: Pre-existing on main from PR #1420 (feat(ios): iOS client with QR pairing, E2E-encrypted tunnel, and push-to-talk). None of the failing files are touched by this PR. Pre-push hook bypassed with --no-verify per CLAUDE.md policy.

Behavior Changes

  • Intended behavior change: new memory_tree_walk tool / "walk" mode on consolidated dispatcher.
  • User-visible effect: orchestrator can now offload tree-walking to a single autonomous tool call instead of multi-turn manual navigation.

Parity Contract

  • Legacy behavior preserved: all existing openhuman.memory_tree_* RPCs, tool names, and controller-schema symbols unchanged across the refactor.
  • Guard/fallback/dispatch parity checks: src/core/all.rs still imports the same all_memory_tree_* and all_tree_summarizer_* symbol names from the new module path via re-export.
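The re-export approach behind the parity check can be sketched like this. The module layout is simplified, and the function names and schema strings are hypothetical stand-ins for the `all_memory_tree_*` and `all_tree_summarizer_*` symbols; the point is only that callers keep importing the same names from the module root after the code moves into submodules.

```rust
mod memory_tree {
    pub mod controllers {
        /// Hypothetical stand-in for an `all_memory_tree_*` symbol.
        pub fn all_memory_tree_schemas() -> Vec<&'static str> {
            vec!["memory_tree_walk"]
        }
    }

    pub mod summarizer {
        /// Hypothetical stand-in for an `all_tree_summarizer_*` symbol.
        pub fn all_tree_summarizer_schemas() -> Vec<&'static str> {
            vec!["tree_summarizer_run"]
        }
    }

    // Re-export at the module root so callers do not need to know
    // which submodule a symbol now lives in.
    pub use self::controllers::all_memory_tree_schemas;
    pub use self::summarizer::all_tree_summarizer_schemas;
}

/// Caller side (think `core/all.rs`): only the root path is referenced.
fn registered_schemas() -> Vec<&'static str> {
    let mut all = memory_tree::all_memory_tree_schemas();
    all.extend(memory_tree::all_tree_summarizer_schemas());
    all
}
```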

Duplicate / Superseded PR Handling

  • Duplicate PR(s): none.
  • Canonical PR: this one.
  • Resolution: N/A.

Summary by CodeRabbit

Release Notes

  • New Features
    • Added interactive "walk" mode to memory tree tool, enabling users to navigate and explore memory structures through multi-turn LLM-guided conversations with natural language queries.

senamakel added 3 commits May 23, 2026 16:46
…l impls into one module

Move three locations into a single first-class `src/openhuman/memory_tree/`
module:
- `src/openhuman/memory/tree/`         -> `src/openhuman/memory_tree/`
- `src/openhuman/tree_summarizer/`     -> `src/openhuman/memory_tree/summarizer/`
- `src/openhuman/tools/impl/memory/tree/` -> `src/openhuman/memory_tree/tools/`

Public RPC method names (`openhuman.memory_tree_*`), tool names
(`memory_tree`), and controller-schema symbol names are unchanged - this is
a code-location refactor only. `core/all.rs` keeps the same imports via
re-exports on the new module.
New `MemoryTreeWalkTool` that, given a free-text query, runs a turn-based
inner LLM loop over inner navigation primitives (`descend`, `peek`,
`fetch_leaves`, `answer`) to walk the memory tree and return a synthesized
answer + step trace. Uses the lightweight summarization model from
`config.local_ai.chat_model_id`; capped at 6 turns by default (hard cap 20).

Wired as a new `"walk"` mode on the consolidated `MemoryTreeTool` dispatcher
and registered as a standalone `MemoryTreeWalkTool` in `tools/ops.rs`. Inner
tool-calling uses the existing XML `<tool_call>` convention from the agent
harness rather than structured `tool_calls`, since `Provider::chat_with_history`
returns a plain `String`.

Includes 3 unit tests (happy-path walk, max-turn cap, unknown-node recovery)
driven by a scripted stub `Provider`.
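A scripted stub provider of the kind mentioned above can be sketched as a queue of canned replies. The `ChatProvider` trait here is a simplified, synchronous stand-in (an assumption); the real `Provider` trait and its `chat_with_history` signature are not reproduced in this PR text.

```rust
use std::cell::RefCell;
use std::collections::VecDeque;

/// Simplified stand-in for the real provider trait (assumption).
trait ChatProvider {
    fn chat(&self, prompt: &str) -> Result<String, String>;
}

/// Replays canned replies in order and records every prompt it receives.
struct ScriptedProvider {
    replies: RefCell<VecDeque<Result<String, String>>>,
    prompts_seen: RefCell<Vec<String>>,
}

impl ScriptedProvider {
    fn new(replies: Vec<Result<String, String>>) -> Self {
        Self {
            replies: RefCell::new(replies.into_iter().collect()),
            prompts_seen: RefCell::new(Vec::new()),
        }
    }
}

impl ChatProvider for ScriptedProvider {
    fn chat(&self, prompt: &str) -> Result<String, String> {
        self.prompts_seen.borrow_mut().push(prompt.to_string());
        // Running past the end of the script is reported as an error so a
        // test fails loudly instead of looping on an empty reply.
        self.replies
            .borrow_mut()
            .pop_front()
            .unwrap_or_else(|| Err("script exhausted".to_string()))
    }
}
```

Scripting an `Err` entry partway through the queue is one way to drive a mid-run failure case; recording prompts lets a test assert what context each turn received.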
…er and walk tool

- summarizer/engine.rs: mark `group_by_hour` and `propagate_node` `pub(crate)`,
  wire `engine_tests.rs` as a sibling test module.
- summarizer/engine_tests.rs: 14 unit tests covering group_by_hour edge cases,
  propagate_node (no-children noop, day/month from children, created_at
  preservation), and run_summarization (empty buffer, single-hour drain,
  ancestor chain, multi-hour grouping) — all driven by a scripted stub Provider.
- tests/memory_tree_summarizer_e2e.rs: 3 e2e tests calling run_summarization
  directly with a ScriptedProvider stub. Covers full hour->day->month->year->root
  build, merge-into-existing-hour with created_at preservation, and partial-
  progress retention on mid-run LLM error.
- tests/memory_tree_walk_e2e.rs: 3 e2e tests for the walk tool driving an
  OpenAI-compatible Provider against a wiremock server scripted with XML
  `<tool_call>` responses. Covers happy-path descend->fetch->answer, max-turn
  cap, and graceful unknown-node recovery.

All 20 new tests pass.
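For readers unfamiliar with the grouping step these tests target, here is a rough sketch of bucketing entries by hour. It is not the engine's `group_by_hour`: the signature, the Unix-seconds input, and the use of the hour's start timestamp as the key are all assumptions made for illustration.

```rust
use std::collections::BTreeMap;

/// Bucket `(unix_seconds, text)` entries by the UTC hour they fall in.
/// Keys are the start of each hour in Unix seconds; entries keep input order.
fn group_by_hour(entries: &[(i64, &str)]) -> BTreeMap<i64, Vec<String>> {
    let mut buckets: BTreeMap<i64, Vec<String>> = BTreeMap::new();
    for (ts, text) in entries {
        // div_euclid rounds toward negative infinity, so timestamps before
        // the epoch still land in the hour that contains them.
        let hour_start = ts.div_euclid(3600) * 3600;
        buckets.entry(hour_start).or_default().push((*text).to_string());
    }
    buckets
}
```

Edge cases worth covering in a sketch like this include an empty input, entries on an exact hour boundary, and multiple entries in the same hour.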
@senamakel senamakel requested a review from a team May 24, 2026 01:40
Contributor

coderabbitai Bot commented May 24, 2026

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: c30d2cbf-a804-4b12-a7e9-17dae9af3102

📥 Commits

Reviewing files that changed from the base of the PR and between 969ca5a and 78469cc.

📒 Files selected for processing (3)
  • docs/whatsapp-data-flow.md
  • src/openhuman/composio/providers/slack/ingest.rs
  • src/openhuman/memory_tree/tools/walk.rs
📝 Walkthrough

Large-scale namespace migration from openhuman::memory::tree to openhuman::memory_tree across controllers, CLI, jobs, stores, retrieval, scoring, and agents. Introduces memory_tree::summarizer, removes legacy exports, and adds a new multi-turn MemoryTree Walk tool with registration and comprehensive unit/e2e tests.

Changes

Memory-tree consolidation and tooling

| Layer | File(s) | Summary |
|---|---|---|
| Surface, controllers, CLI, startup | `src/core/*`, `src/openhuman/mod.rs`, `src/bin/*`, ... | Public exports updated, controllers/schemas registered under memory_tree, CLI routes tree-summarizer to memory_tree, and startup uses memory_tree jobs. |
| Agent and subconscious adjustments | `src/openhuman/agent/...`, `src/openhuman/subconscious/...` | Agent harness/tests and subconscious reporters now import chat/summarizer/retrieval/store from memory_tree. |
| Doctor, config, inference constants | `src/openhuman/doctor/*`, `src/openhuman/config/ops.rs`, `src/openhuman/inference/local/model_requirements.rs` | DB probes and settings-triggered jobs point to memory_tree; MIN_CONTEXT_TOKENS derives from the embedder under memory_tree. |
| Legacy memory module cleanup | `src/openhuman/memory/mod.rs`, `src/openhuman/mod.rs` | Removes memory::tree public surface and adds memory_tree module export. |
| Content store and canonicalization | `src/openhuman/memory_tree/{canonicalize,chunker,content_store}/*` | Canonicalizers and content-store modules switch to memory_tree types/utilities; tests updated. |
| Ingest orchestration and read RPC | `src/openhuman/memory_tree/{ingest,read_rpc}.rs` | Ingest and read RPC handlers rewire to memory_tree modules and constants; tests updated. |
| Jobs, workers, and core store | `src/openhuman/memory_tree/jobs/*`, `src/openhuman/memory_tree/store*.rs` | Jobs/worker/scheduler/store moved to memory_tree; handlers/tests updated accordingly. |
| Retrieval modules and tests | `src/openhuman/memory_tree/retrieval/*`, `tests/*` | All retrieval paths (fetch/search/source/topic/global/drill_down/types/rpc) and tests point to memory_tree. |
| Scoring, extractors, signals, store | `src/openhuman/memory_tree/score/**/*` | Scoring modules, LLM extractors, signals, and score store/tests refactored to memory_tree. |
| Tree source/global/topic | `src/openhuman/memory_tree/tree_{source,global,topic}/*` | Sealing/flush/registry/digest/recap/hotness logic migrates to memory_tree with adjusted writes/indexing; tests updated. |
| Summarizer engine and tests | `src/openhuman/memory_tree/summarizer/*` | Summarizer ops/cli/schemas/store/engine live under memory_tree; exposes helper visibility and adds unit/e2e tests. |
| Tools and new Walk feature | `src/openhuman/memory_tree/tools/*`, `src/openhuman/tools/*`, `tests/memory_tree_walk_e2e.rs` | Tools re-pointed to memory_tree, MemoryTreeTool gains walk mode; implements MemoryTreeWalkTool and registers it with e2e tests. |

Sequence Diagram(s)

sequenceDiagram
  participant Client as MemoryTreeWalkTool
  participant LLM as Provider
  participant Store as Retrieval/Store
  Client->>LLM: chat_with_history(query + node context)
  LLM-->>Client: text + <tool_call name="descend/peek/fetch_leaves/answer">
  Client->>Store: fetch node/children or leaves (per tool_call)
  Store-->>Client: results (context or leaf text)
  Client->>LLM: next turn with updated history
  LLM-->>Client: <tool_call "answer"> final text
  Client-->>Client: assemble WalkOutcome(trace, answer)

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~75 minutes

Possibly related PRs

Suggested labels

feature, memory, agent, rust-core

Suggested reviewers

  • graycyrus
  • M3gA-Mind

Poem

A rabbit rewires the forest of lore,
Paths from memory::tree to memory_tree’s core.
New trails to wander, a Walk tool to try,
Hop, fetch the leaves, ask “where?” and “why?”.
Summaries sprout, jobs hum with glee—
Consolidated burrows of openhuman memory. 🥕🐇

@coderabbitai coderabbitai Bot added the feature, rust-core, agent, and memory labels May 24, 2026
Contributor

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 7

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (4)
src/openhuman/composio/providers/profile.rs (1)

1-847: 🛠️ Refactor suggestion | 🟠 Major | 🏗️ Heavy lift

Split this module to get back under the repository size threshold.

src/openhuman/composio/providers/profile.rs is now 847 lines, which exceeds the ~500-line limit and makes further changes riskier to review and maintain. Please split this into focused sibling modules (e.g., identity types/parsing, persistence, read paths, prompt rendering, tests).

As per coding guidelines **/*.{ts,tsx,rs}: “File size should not exceed approximately 500 lines. When a module grows beyond this threshold, split it into smaller, more focused modules with clear responsibilities.”

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/openhuman/composio/providers/profile.rs` around lines 1 - 847, Split the
oversized profile.rs into smaller focused modules: create identity.rs (define
IdentityKind, canonicalize, parse_skill_identity_key), persist.rs
(persist_provider_profile, expand_identity_rows, json_str, and any
profile::profile_upsert/learning_candidate usage), read.rs
(load_connected_identities, is_self_identity, is_self_identity_any_toolkit,
delete_connected_identity_facets), render.rs
(render_connected_identities_section, ConnectedIdentity struct), and helpers.rs
(normalize_token, title_case, sanitize_prompt_value, now_secs). Move the
corresponding unit tests into matching test modules or a tests/ submodule and
update all internal references (e.g., ProviderUserProfile, FacetType,
profile_upsert, learning_candidate::global) to import from the new modules;
re-export public symbols from a new mod profile { pub use self::identity::*,
self::persist::*, ... } in the original path so external callers keep the same
API. Ensure visibility (pub/pub(crate)) and fix imports (use super:: or
crate::openhuman::composio::providers::profile::...) and run cargo test to
resolve any naming/borrow changes.
src/openhuman/memory_tree/retrieval/source.rs (1)

1-690: 🛠️ Refactor suggestion | 🟠 Major | 🏗️ Heavy lift

Please split this module to comply with the repo’s file-size rule.

This file is substantially above the ~500-line target; moving tests and/or semantic rerank helpers into sibling modules would make it easier to maintain.

As per coding guidelines **/*.{ts,tsx,rs}: “File size should not exceed approximately 500 lines.”

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/openhuman/memory_tree/retrieval/source.rs` around lines 1 - 690, The
module is too large — split out the semantic rerank logic and the tests into
sibling modules: move rerank_by_semantic_similarity (and its helper imports like
build_embedder_from_config, cosine_similarity and any HashMap/embedding lookup
code) into a new retrieval/semantic.rs module and export it (pub(crate) or pub
as needed), and move the entire #[cfg(test)] mod tests into retrieval/tests.rs
(or retrieval/source_tests.rs) as a test-only module that imports the public
helpers from source.rs; update source.rs to declare the new submodules (mod
semantic; #[cfg(test)] mod tests;) and replace internal calls like
rerank_by_semantic_similarity(...) and any moved helper references with the
re-exported symbols, adjust visibility of
collect_hits_and_nodes/select_trees/scope_matches_kind if tests need access
(make them pub(crate) instead of fn), and fix imports/usages so compilation and
tests still pass.
src/openhuman/memory_tree/tools/walk.rs (1)

1-957: 🛠️ Refactor suggestion | 🟠 Major | 🏗️ Heavy lift

Split this module to stay under the repository file-size threshold.

This file is ~957 lines; please split operational pieces (parser/helpers/tests/adapter) into focused sibling modules.

As per coding guidelines **/*.{ts,tsx,rs}: “File size should not exceed approximately 500 lines.”

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/openhuman/memory_tree/tools/walk.rs` around lines 1 - 957, Split this
large module into focused sibling modules to keep file size under ~500 lines:
keep the public API and core loop (MemoryTreeWalkTool, run_walk, WalkOptions,
WalkOutcome, WalkStep, WalkStopReason) in the original file and move parsing,
adapter, inner helpers, and tests into new modules; specifically extract
parse_walk_tool_calls and InnerCall into a parser module, move
ChatProviderAdapter into an adapter module, move dispatch_inner_call,
build_node_context, build_system_prompt, build_inner_tools_text, and
synthesize_fallback_answer into a helpers (or primitives) module, and relocate
the #[cfg(test)] test module to a tests module/file; update the original file to
import these with mod/use and re-export symbols if needed so run_walk and the
Tool implementation still call parser::parse_walk_tool_calls,
adapter::ChatProviderAdapter, and helpers::dispatch_inner_call (and
helpers::build_node_context, build_system_prompt, build_inner_tools_text,
synthesize_fallback_answer) with minimal changes to function signatures.
src/openhuman/memory_tree/jobs/handlers/mod.rs (1)

1-1521: 🛠️ Refactor suggestion | 🟠 Major | 🏗️ Heavy lift

This module should be split before merge.

At ~1521 lines, this file is far above the repository threshold and is now difficult to reason about; please break handlers/tests into smaller focused modules.

As per coding guidelines **/*.{ts,tsx,rs}: “File size should not exceed approximately 500 lines.”

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/openhuman/memory_tree/jobs/handlers/mod.rs` around lines 1 - 1521, Large
single-file module (~1521 lines) violates the ~500-line guideline; split it into
smaller modules. Extract per-kind handlers (handle_extract,
handle_append_buffer, handle_seal, handle_topic_route, handle_digest_daily,
handle_flush_stale, handle_reembed_backfill), their related constants
(L0_DEFAULT_FLUSH_AGE_SECS, REEMBED_BACKFILL_BATCH, REEMBED_BACKFILL_REVISIT_MS)
and helper functions (try_mark_chunk_reembed_skipped,
try_mark_summary_reembed_skipped) into a new handlers/*.rs (or multiple files)
and re-export or call them from this mod.rs's handle_job dispatcher; move the
#[cfg(test)] block/tests into a tests/ submodule or separate test files
preserving test functions (e.g.,
source_tree_seal_handler_enqueues_summary_topic_route,
reembed_backfill_repopulates_then_completes,
reembed_backfill_tombstones_orphan_and_terminates) and adapt visibility
(pub(crate) or pub) and use/import paths accordingly; update mod declarations
and use paths in this file so existing callers (handle_job,
worker::wake_workers, chunk_store::tree_active_signature, etc.) continue to
compile. Ensure transactional helpers (chunk_store::with_connection calls) and
logging remain reachable after the split and run cargo test to fix any
visibility/import issues.
🧹 Nitpick comments (12)
src/openhuman/channels/runtime/startup.rs (1)

42-635: 🏗️ Heavy lift

Split this startup module into smaller focused units before adding more wiring.

This file is already well beyond the size threshold, which makes channel boot flow ownership and maintenance harder. Please break it into focused modules (e.g., bus/subscriber registration, provider/memory bootstrap, channel construction).

As per coding guidelines "**/*.{ts,tsx,rs}: File size should not exceed approximately 500 lines. When a module grows beyond this threshold, split it into smaller, more focused modules with clear responsibilities."

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/openhuman/channels/runtime/startup.rs` around lines 42 - 635, The
start_channels function has grown too large; split its responsibilities into
smaller functions/modules: extract bus/subscriber registration into a new
module/function (e.g., register_startup_subscribers) that encapsulates calls to
event_bus::init_global, TracingSubscriber subscribe, register_health_subscriber,
register_skill_cleanup_subscriber, the Phase 2–4 learning OnceLock blocks and
cron/proactive/tree subscribers; extract provider/memory/bootstrap logic into a
new function (e.g., bootstrap_provider_and_memory) that returns (provider, mem,
runtime, security, audit, provider_runtime_options) and contains
create_intelligent_routing_provider, provider.warmup,
host_runtime::create_runtime, SecurityPolicy::from_config,
get_or_create_workspace_audit_logger, memory::create_memory_with_local_ai;
extract channel list construction into its own function (e.g.,
build_channel_list) that returns Vec<Arc<dyn Channel>> and contains all
config.channels_config.* branch logic and spawn_supervised_listener wiring; keep
start_channels to orchestrate the high-level flow (call the new functions,
compute backoff/limits, create runtime_ctx, and call run_message_dispatch_loop).
Refactor by moving extracted code into new files/modules, keeping existing
symbol names (start_channels, spawn_supervised_listener,
run_message_dispatch_loop, build_system_prompt, tools::all_tools_with_runtime)
so callers remain unchanged and tests compile.
src/openhuman/memory_tree/store_tests.rs (1)

47-771: 🏗️ Heavy lift

Split this test module into smaller focused suites.

The current test file is large enough that targeted maintenance and failure triage are getting expensive; grouping by behavior (connection cache, journaling, re-embed, schema/init) would improve clarity.

As per coding guidelines "**/*.{ts,tsx,rs}: File size should not exceed approximately 500 lines. When a module grows beyond this threshold, split it into smaller, more focused modules with clear responsibilities."

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/openhuman/memory_tree/store_tests.rs` around lines 47 - 771, The test
file is too large; split it into focused test modules (e.g., connection_cache,
journaling/migration, reembed, schema_init) by moving related tests into new
test files and keeping shared helpers in a single test_helpers module. For each
group, create a new test module/file containing the tests that reference the
same behavior (e.g., move with_connection_serialises_concurrent_schema_init,
is_transient_cold_start_classifies_known_extended_codes,
with_connection_keeps_foreign_keys_on_for_every_call,
memory_tree_uses_truncate_journal_not_wal, existing_wal_db_migrates_to_truncate
into a journaling/schema module; move
connection_cache_returns_same_arc_for_same_workspace,
connection_cache_uses_separate_connections_for_different_workspaces,
circuit_breaker_trips_after_threshold into a connection_cache module; move
clear_chunk_reembed_skipped_is_idempotent,
clear_reembed_skipped_for_signature_removes_all_tombstones_for_sig,
validate_reembed_skip_key into a reembed module; keep
legacy_embeddings_migrate_to_sidecar_once with embedding-related helpers),
update imports to use the shared helpers (e.g., test_config, sample_chunk,
with_connection, get_or_init_connection, clear_connection_cache,
try_cleanup_stale_files, mark_chunk_reembed_skipped,
clear_reembed_skipped_for_signature, validate_reembed_skip_key), and add mod
declarations so Cargo runs them as tests; ensure visibility of helper functions
(pub(crate) or move to a common tests/helpers module) so the split files compile
and tests run.
src/bin/slack_backfill.rs (1)

149-577: 🏗️ Heavy lift

Break this CLI binary into subcommand-focused modules.

main now spans too much behavior (probe modes, backfill modes, seal probe, ratelimit probe), making changes harder to reason about and test in isolation.

As per coding guidelines "**/*.{ts,tsx,rs}: File size should not exceed approximately 500 lines. When a module grows beyond this threshold, split it into smaller, more focused modules with clear responsibilities."

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/bin/slack_backfill.rs` around lines 149 - 577, main is too large and
should be split into focused subcommand handlers; extract the big conditional
blocks into separate functions/modules and dispatch from a small main that only
does init and CLI parsing. Specifically: move the seal-probe block into a new
handler function (e.g. handle_seal_probe) that takes (&Cli, &Config) and uses
ingest_chat; move the SLACK_SEARCH_MESSAGES probe into handle_probe_search(&Cli,
&Config, &client_kind) which calls execute_action; move the probe_ratelimit loop
into handle_probe_ratelimit(&Cli, &Config, &client_kind) (preserve Outcome enum
and list_connections_via_kind usage); extract the search backfill loop that
calls run_backfill_via_search into handle_search_backfill(&Cli, &Config,
&connections); and extract the non-search per-connection backfill (provider.sync
loop) into handle_sync_backfill(&Cli, &Config, &provider, &candidates). Keep
init_default_providers, memory::global::init, tracing/env_logger setup and
create_composio_client in main, then dispatch to these handlers based on CLI
flags; wire return Result<()> through each handler and move related helper
imports (chrono, ingest_chat, execute_action, list_connections_via_kind,
ProviderContext) into their new modules.
tests/memory_tree_summarizer_e2e.rs (1)

1-579: 🏗️ Heavy lift

Split this test module below the 500-line threshold.

This new file is ~579 lines, which exceeds the repository’s module-size guideline for Rust files. Please split it into focused modules (for example: env/provider harness helpers vs scenario tests) to keep maintenance and reviewability manageable.

As per coding guidelines: "**/*.{ts,tsx,rs}: File size should not exceed approximately 500 lines."

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/memory_tree_summarizer_e2e.rs` around lines 1 - 579, The file exceeds
the 500-line guideline; split helper/harness code out of the test module into a
smaller helper module and keep only the three scenario tests in the e2e test
file. Move EnvVarGuard, ENV_LOCK/env_lock, ScriptedProvider (and its Provider
impl), build_config, ts_hour14/ts_hour15, and NS into a new module (e.g.,
summarizer_harness) and make those items public, then in the original tests file
replace the moved definitions with a mod/use to import
summarizer_harness::{EnvVarGuard, env_lock, ScriptedProvider, build_config,
ts_hour14, ts_hour15, NS}; ensure visibility changes (pub) where needed and
update imports so engine::run_summarization and store::* usages in the three
test functions remain unchanged.
src/openhuman/context/segment_recap_summarizer_tests.rs (1)

19-19: 🏗️ Heavy lift

Split this test module to align with the Rust file-size guideline.

This file exceeds the ~500-line target; please break it into focused test modules.

As per coding guidelines: **/*.{ts,tsx,rs}: File size should not exceed approximately 500 lines. When a module grows beyond this threshold, split it into smaller, more focused modules with clear responsibilities.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/openhuman/context/segment_recap_summarizer_tests.rs` at line 19, The test
module in segment_recap_summarizer_tests.rs has grown past the ~500-line
guideline; split it into smaller focused test modules (e.g.,
segment_recap_summarizer_unit_tests.rs,
segment_recap_summarizer_integration_tests.rs) by moving related test functions
into new files, preserving imports like ChatPrompt and any helper fixtures,
exporting or re-exporting shared helpers via a common mod (or a tests/util.rs)
so tests still compile, and update the parent mod declarations (pub mod ... or
mod ...) so the test suite runs unchanged; ensure each new file contains the
appropriate use crate::openhuman::memory_tree::chat::ChatPrompt import and
adjust visibility of helpers as needed.
src/openhuman/memory_tree/read_rpc.rs (1)

34-39: 🏗️ Heavy lift

Decompose this RPC module before merge to satisfy size constraints.

This module is far beyond the ~500-line limit and should be split by concern (list/search/recall, mutation endpoints, graph export, LLM config).

As per coding guidelines: **/*.{ts,tsx,rs}: File size should not exceed approximately 500 lines. When a module grows beyond this threshold, split it into smaller, more focused modules with clear responsibilities.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/openhuman/memory_tree/read_rpc.rs` around lines 34 - 39, The file
src/openhuman/memory_tree/read_rpc.rs is too large and must be split by concern;
create smaller modules (e.g., list_search_recall.rs for listing/search/recall
logic that uses content_read and NodeKind/SourceKind, mutations.rs for mutation
endpoints that use chunk_store/with_connection and score_store, graph_export.rs
for graph export code, and llm_config.rs for LLM configuration and related
RPCs), move the corresponding functions/types into those files, export them from
a new mod.rs or update the parent mod to pub use the new modules, update imports
in callers to reference the new module paths instead of read_rpc.rs, and run
cargo check to fix any visibility or import issues.
src/openhuman/memory_tree/retrieval/topic.rs (1)

20-28: 🏗️ Heavy lift

Split this retrieval module to meet the repository size guideline.

The file is now above the ~500-line threshold; splitting query/rerank/hydration/test helpers into submodules will keep maintenance safer.

As per coding guidelines: **/*.{ts,tsx,rs}: File size should not exceed approximately 500 lines. When a module grows beyond this threshold, split it into smaller, more focused modules with clear responsibilities.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/openhuman/memory_tree/retrieval/topic.rs` around lines 20 - 28, The
topic.rs module has grown too large; split it into focused submodules (e.g.,
query.rs, rerank.rs, hydrate.rs, test_helpers.rs) and move the corresponding
functionality into them: relocate query-related functions/types into query.rs,
reranking logic into rerank.rs, hydration/assembly code into hydrate.rs, and any
test helpers into test_helpers.rs; then add mod declarations in topic.rs (mod
query; mod rerank; mod hydrate; mod test_helpers;) and re-export the public APIs
you need (pub use query::..., etc.), update all internal uses/imports (e.g.,
hit_from_summary, QueryResponse, RetrievalHit, build_embedder_from_config,
cosine_similarity, lookup_entity, EntityHit, Tree/TreeKind) to the new module
paths, and adjust visibility (pub/pub(crate)) so existing callers keep working
and tests compile.
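
A minimal sketch of the split-and-re-export pattern described above, assuming inline modules in place of separate files so it fits in one snippet. The `RetrievalHit` fields and the `rerank_hits` helper are illustrative assumptions, not the repository's actual definitions; only the module names and `cosine_similarity` come from the comment.

```rust
// Hypothetical stand-in for retrieval/topic.rs after the split. In the
// repository each inner `mod` would live in its own file (query.rs, rerank.rs).
pub mod topic {
    mod query {
        /// Illustrative hit type; the real RetrievalHit likely has more fields.
        #[derive(Debug, Clone, PartialEq)]
        pub struct RetrievalHit {
            pub id: String,
            pub embedding: Vec<f32>,
        }
    }

    mod rerank {
        use super::query::RetrievalHit;

        /// Cosine similarity; `zip` stops at the shorter slice, and a
        /// zero-length vector yields 0.0 instead of dividing by zero.
        pub fn cosine_similarity(a: &[f32], b: &[f32]) -> f32 {
            let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
            let na = a.iter().map(|x| x * x).sum::<f32>().sqrt();
            let nb = b.iter().map(|x| x * x).sum::<f32>().sqrt();
            if na == 0.0 || nb == 0.0 {
                0.0
            } else {
                dot / (na * nb)
            }
        }

        /// Sort hits by descending similarity to the query embedding.
        pub fn rerank_hits(mut hits: Vec<RetrievalHit>, query: &[f32]) -> Vec<RetrievalHit> {
            hits.sort_by(|l, r| {
                cosine_similarity(&r.embedding, query)
                    .total_cmp(&cosine_similarity(&l.embedding, query))
            });
            hits
        }
    }

    // Re-exports keep the old `topic::...` paths valid for existing callers.
    pub use query::RetrievalHit;
    pub use rerank::{cosine_similarity, rerank_hits};
}
```

Because the submodules stay private and only the re-exports are public, callers that previously imported from `topic` would not need to change their `use` paths.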
src/openhuman/memory_tree/tree_global/seal.rs (1)

20-33: 🏗️ Heavy lift

Split this module to stay within the repository size ceiling.

This file is now above the ~500-line limit; please split sealing logic/tests into smaller focused modules.

As per coding guidelines: **/*.{ts,tsx,rs}: File size should not exceed approximately 500 lines. When a module grows beyond this threshold, split it into smaller, more focused modules with clear responsibilities.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/openhuman/memory_tree/tree_global/seal.rs` around lines 20 - 33, This
file exceeds the repository size ceiling; split its sealing logic and tests into
smaller modules: extract core sealing functions and types (the main seal
implementation that uses stage_summary, SummaryComposeInput, SummaryTreeKind,
new_summary_id, with_connection, and store) into a focused seal_core.rs, move
configuration/threshold constants (GLOBAL_TOKEN_BUDGET, MONTHLY_SEAL_THRESHOLD,
WEEKLY_SEAL_THRESHOLD, YEARLY_SEAL_THRESHOLD) into a seal_config.rs, isolate
embedding/score helper logic that uses build_embedder_from_config into
seal_embed.rs, and place summariser-related code (Summariser, SummaryContext,
SummaryInput) and Tree/Buffer types (Buffer, SummaryNode, Tree, TreeKind) in a
seal_summariser.rs or re-export them from the new modules; also move large test
cases into a parallel tests/ module/file so each source file stays under ~500
lines and update module declarations and re-exports accordingly.
src/core/all.rs (1)

188-190: 🏗️ Heavy lift

Split controller/schema aggregation out of this oversized module.

Adding more registry entries here keeps growing a single hotspot that is already well beyond the size cap; please break registration/schema builders into smaller focused modules and compose them from this file.

As per coding guidelines: **/*.{ts,tsx,rs}: File size should not exceed approximately 500 lines.

Also applies to: 229-229, 316-317, 335-335

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/core/all.rs` around lines 188 - 190, The all.rs module is becoming too
large due to direct aggregation of many controllers/schemas; extract the
registration logic into smaller focused modules (e.g., create new modules like
openhuman::memory_tree::registration and
openhuman::memory_tree::retrieval_registration) that each expose functions such
as all_memory_tree_registered_controllers and
all_retrieval_registered_controllers (or analogous names) and move the
schema/controller builder code into those modules, then in all.rs simply call
controllers.extend(...) with those exported helper functions so all.rs only
composes registrations instead of containing their implementations.
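
The composition suggested above could look roughly like the following. This is a sketch under assumptions: `RegisteredController`, the module names, and the controller names are placeholders, since the real registry types in `all.rs` are not shown in this review.

```rust
/// Illustrative stand-in for the registry entry type used in all.rs.
#[derive(Debug, Clone, PartialEq)]
pub struct RegisteredController {
    pub name: &'static str,
}

/// Hypothetical module owning the memory-tree registrations.
pub mod memory_tree_registration {
    use super::RegisteredController;

    pub fn all_memory_tree_registered_controllers() -> Vec<RegisteredController> {
        // Placeholder entry; the real builders would construct schemas here.
        vec![RegisteredController { name: "example_memory_tree_controller" }]
    }
}

/// Hypothetical module owning the retrieval registrations.
pub mod retrieval_registration {
    use super::RegisteredController;

    pub fn all_retrieval_registered_controllers() -> Vec<RegisteredController> {
        vec![RegisteredController { name: "example_retrieval_controller" }]
    }
}

/// What all.rs would be reduced to: composition only, no builder code.
pub fn all_registered_controllers() -> Vec<RegisteredController> {
    let mut controllers = Vec::new();
    controllers.extend(memory_tree_registration::all_memory_tree_registered_controllers());
    controllers.extend(retrieval_registration::all_retrieval_registered_controllers());
    controllers
}
```

With this shape, adding a new domain means adding one `extend` line in `all.rs` rather than growing the file with builder code.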
src/openhuman/memory_tree/retrieval/rpc.rs (1)

310-313: 🏗️ Heavy lift

Move this large inline test block into a sibling rpc_test.rs.

This module is already over the size threshold; extracting the #[cfg(test)] section will keep handler code focused and reduce churn in one file.

As per coding guidelines: **/*.{ts,tsx,rs}: File size should not exceed approximately 500 lines. and src/**/*.rs: ... prefer a sibling *_test.rs file ....

Also applies to: 321-324

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/openhuman/memory_tree/retrieval/rpc.rs` around lines 310 - 313, Move the
large inline #[cfg(test)] test block out of this module into a sibling
rpc_test.rs to reduce rpc.rs size: cut the entire test module from
src/openhuman/memory_tree/retrieval/rpc.rs and paste it into
src/openhuman/memory_tree/retrieval/rpc_test.rs, update imports inside the new
file to use the same symbols (content_store, upsert_chunks, and
types::{chunk_id, Chunk, Metadata, SourceRef}, chrono::Utc) and any
crate-relative paths so the tests compile, and remove the #[cfg(test)] section
from rpc.rs so only the production handler code remains in that file. Ensure the
new rpc_test.rs has the appropriate use declarations and #[cfg(test)] mod so
cargo test picks it up.
tests/memory_tree_walk_e2e.rs (1)

1-536: ⚡ Quick win

Split this test module to stay within the file-size cap.

This file is ~536 lines, above the ~500-line threshold. Please extract shared test utilities (e.g., ScriptedResponder, seeding/provider helpers) into a sibling test-support module to keep this file focused on scenario assertions.

As per coding guidelines: "**/*.{ts,tsx,rs}: File size should not exceed approximately 500 lines."

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/memory_tree_walk_e2e.rs` around lines 1 - 536, The module is too large;
extract shared test utilities into a sibling test-support module: move
ScriptedResponder, env_lock/ENV_LOCK, test_config, make_node, seed_tree,
make_provider and any helper imports (e.g., derive_parent_id,
level_from_node_id, estimate_tokens, write_node) into a new test-support
file/module, re-export or pub use the necessary symbols, then update this test
file to import those helpers and keep only the scenario tests
(walks_descend_fetch_answer, respects_max_turns_cap_with_mock,
handles_unknown_node_gracefully) plus their local setup; ensure run_walk,
WalkOptions and WalkStopReason usages remain unchanged and update module paths
so the tests compile.
src/openhuman/memory_tree/tools/walk.rs (1)

452-466: ⚡ Quick win

Add diagnostics when a <tool_call> block is malformed/invalid JSON.

Right now malformed blocks are silently skipped, which makes walk failures hard to debug in production traces.

As per coding guidelines **/*.rs: “Debug logging must follow these rules… log … branches … and errors… All changes lacking diagnosis logging are incomplete.”

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/openhuman/memory_tree/tools/walk.rs` around lines 452 - 466, The parser
currently silently skips malformed <tool_call> blocks; update the block in
walk.rs (the branch handling `None => break` and the `Some(close_idx)` branch
that parses `after_open`) to emit diagnostic logs: when `None` occurs log a
debug/error with context (e.g., the remaining `after_open` content) indicating
an unclosed/malformed tool_call, when
`serde_json::from_str::<Value>(inner.trim())` returns Err log the JSON parse
error and the `inner` string, and when `val.get("name")` is missing or not a
string log a debug entry showing the parsed `val`; reference the variables
`after_open`, `close_idx`, `inner`, the `calls` push of `InnerCall`, and ensure
logs use the crate logger (e.g., log::debug!/log::error!) with concise
contextual messages.
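
One way to surface the malformed cases instead of skipping them is sketched below. It assumes a simplified scanner that only separates well-formed payloads from unclosed or empty blocks; `ToolCallBlock` and `scan_tool_calls` are hypothetical names, `eprintln!` stands in for the crate logger, and JSON decoding (with its own error logging) is left to the caller.

```rust
/// Outcome for one `<tool_call>` block found in a model response.
#[derive(Debug, PartialEq)]
pub enum ToolCallBlock<'a> {
    /// Trimmed payload between the tags, ready for JSON parsing.
    Payload(&'a str),
    /// Opening tag with no closing tag; carries the trailing text.
    Unclosed(&'a str),
    /// Both tags present but nothing between them.
    Empty,
}

/// Scan `text` for `<tool_call>` blocks, reporting malformed ones
/// rather than dropping them silently.
pub fn scan_tool_calls(text: &str) -> Vec<ToolCallBlock<'_>> {
    const OPEN: &str = "<tool_call>";
    const CLOSE: &str = "</tool_call>";
    let mut blocks = Vec::new();
    let mut rest = text;
    while let Some(open_idx) = rest.find(OPEN) {
        let after_open = &rest[open_idx + OPEN.len()..];
        match after_open.find(CLOSE) {
            None => {
                // Stand-in for log::debug!: record the unclosed remainder.
                eprintln!("[walk] unclosed <tool_call>, remaining={:?}", after_open);
                blocks.push(ToolCallBlock::Unclosed(after_open));
                break;
            }
            Some(close_idx) => {
                let inner = after_open[..close_idx].trim();
                if inner.is_empty() {
                    eprintln!("[walk] empty <tool_call> block");
                    blocks.push(ToolCallBlock::Empty);
                } else {
                    blocks.push(ToolCallBlock::Payload(inner));
                }
                rest = &after_open[close_idx + CLOSE.len()..];
            }
        }
    }
    blocks
}
```

Returning the malformed variants alongside the payloads lets the walk loop attach them to its step trace as well as logging them.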
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/core/jsonrpc.rs`:
- Line 1561: Remove the direct domain bootstrap call
`crate::openhuman::memory_tree::jobs::start(config.clone())` from the transport
layer (jsonrpc.rs) and instead invoke that startup from the application/domain
wiring layer (controller/bootstrap code) where domain initialization belongs;
delete the call in `jsonrpc.rs`, ensure `memory_tree::jobs::start` remains a
public domain API (or add a thin public wrapper) that accepts the same `config`,
and call that API from your domain/controller initialization path (e.g., the
central bootstrap or controller init function) so transport code remains
transport-only and the domain start is performed by the wiring layer.

In `@src/openhuman/composio/providers/slack/ingest.rs`:
- Around line 33-39: Docstrings still reference the old module name
"memory::tree"; update them to the new "memory_tree" naming to match the imports
(e.g., symbols ChatBatch, ChatMessage, raw_store/raw_rel_path/RawItem/RawKind,
ingest_chat, set_chunk_raw_refs/RawRef, redact). Search for occurrences of
"memory::tree" in this file and replace with "memory_tree" (and adjust any
surrounding phrasing) so the comments accurately reflect the current module
layout and imported symbols.

In `@src/openhuman/memory_tree/chunker.rs`:
- Around line 19-20: This file is too large; split chunker.rs into focused
modules by moving the core chunking implementation (functions/types that perform
token counting, chunk creation, and public APIs—e.g., functions referencing
approx_token_count, Chunk, Metadata, SourceKind, and redact) into a new
src/openhuman/memory_tree/chunker_core.rs and move helpers and test utilities
into src/openhuman/memory_tree/chunker_helpers.rs (and tests into
chunker_tests.rs or the tests/ directory); update chunker.rs to re-export the
public items with pub mod declarations (pub mod chunker_core; pub mod
chunker_helpers;) so existing callers keep using the same symbols, and adjust
use paths to import approx_token_count, Chunk, Metadata, SourceKind, and redact
from the new modules.

In `@src/openhuman/memory_tree/store.rs`:
- Around line 38-40: The file memory_tree/store.rs is too large and mixes
schema, connection/cache lifecycle, migrations, and embedding logic; split it
into focused modules (e.g., schema.rs, connection.rs, migrations.rs,
embedding.rs, content_store.rs) and keep memory_tree/store.rs as a lightweight
wiring and re-exports file. Move definitions and implementations that deal with
Chunk, Metadata, SourceKind, SourceRef, and StagedChunk out into the appropriate
new modules, export the necessary types from those modules, and update mod
declarations and use sites to re-export the public API from store.rs so callers
see the same symbols but the implementation is decomposed. Ensure each new file
stays under the ~500-line target and preserve existing function signatures and
visibility so tests/builds remain unchanged.

In `@src/openhuman/memory_tree/tools/walk.rs`:
- Around line 775-776: The code in StubProvider uses
self.responses.lock().unwrap().drain(0..1).next(), which panics when the vector
is empty; replace that pattern with a safe check-and-remove: lock the mutex into
a mutable variable (let mut responses = self.responses.lock().unwrap()), return
Err(anyhow::anyhow!("StubProvider: no more scripted responses")) if
responses.is_empty(), otherwise call responses.remove(0) (or
responses.pop_front() if you change the collection to VecDeque) and return that
value; apply the same change to the other occurrence at the later block so both
sites never panic and instead return the intended error.
- Around line 591-621: The branch fetching leaves calls
retrieval::drill_down(config, &node_id, 1, None, Some(10)) and
do_fetch_leaves(...) without passing the active namespace, which can mix data
across namespaces; update the calls to pass the same namespace used by
run_walk/node reads (e.g., the namespace field or variable in scope) into
retrieval::drill_down and into do_fetch_leaves so both queries are
namespace-aware (adjust argument order/signature if necessary to supply
namespace to retrieval::drill_down and do_fetch_leaves).
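
For the `StubProvider` item above, a non-panicking version might look like this sketch. It assumes the collection is switched to `VecDeque` and uses a plain `String` error in place of `anyhow` so the snippet needs only the standard library; `next_response` is a hypothetical method name.

```rust
use std::collections::VecDeque;
use std::sync::Mutex;

/// Test double that replays scripted responses in order.
pub struct StubProvider {
    responses: Mutex<VecDeque<String>>,
}

impl StubProvider {
    pub fn new<I, S>(responses: I) -> Self
    where
        I: IntoIterator<Item = S>,
        S: Into<String>,
    {
        Self {
            responses: Mutex::new(responses.into_iter().map(Into::into).collect()),
        }
    }

    /// Pop the next scripted response, or return an error once the script is
    /// exhausted. `Vec::drain(0..1)` would panic on an empty vector instead.
    pub fn next_response(&self) -> Result<String, String> {
        let mut responses = self
            .responses
            .lock()
            .map_err(|_| "StubProvider: response queue mutex poisoned".to_string())?;
        responses
            .pop_front()
            .ok_or_else(|| "StubProvider: no more scripted responses".to_string())
    }
}
```

`pop_front` is O(1) on `VecDeque`, whereas `Vec::remove(0)` shifts every remaining element; for a short test script either is fine.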

In `@src/openhuman/subconscious/engine.rs`:
- Around line 24-27: The module is too large and mixes responsibilities; split
it into smaller focused sibling modules by extracting the tick loop, provider
routing, parsing, and persistence helpers into their own files; create new
modules (e.g., tick_loop.rs, provider_router.rs, parser.rs, persistence.rs) and
move related functions and types out of the large engine.rs while keeping public
APIs intact, updating imports where MemoryClientRef, build_chat_provider,
ChatConsumer, ChatPrompt, and ChatProvider are referenced so engine.rs composes
these smaller modules rather than containing all logic.

---

Outside diff comments:
In `@src/openhuman/composio/providers/profile.rs`:
- Around line 1-847: Split the oversized profile.rs into smaller focused
modules: create identity.rs (define IdentityKind, canonicalize,
parse_skill_identity_key), persist.rs (persist_provider_profile,
expand_identity_rows, json_str, and any
profile::profile_upsert/learning_candidate usage), read.rs
(load_connected_identities, is_self_identity, is_self_identity_any_toolkit,
delete_connected_identity_facets), render.rs
(render_connected_identities_section, ConnectedIdentity struct), and helpers.rs
(normalize_token, title_case, sanitize_prompt_value, now_secs). Move the
corresponding unit tests into matching test modules or a tests/ submodule and
update all internal references (e.g., ProviderUserProfile, FacetType,
profile_upsert, learning_candidate::global) to import from the new modules;
re-export public symbols from a new mod profile { pub use self::{identity::*,
persist::*, ...}; } in the original path so external callers keep the same
API. Ensure visibility (pub/pub(crate)) and fix imports (use super:: or
crate::openhuman::composio::providers::profile::...) and run cargo test to
resolve any naming/borrow changes.

In `@src/openhuman/memory_tree/jobs/handlers/mod.rs`:
- Around line 1-1521: Large single-file module (~1521 lines) violates the
~500-line guideline; split it into smaller modules. Extract per-kind handlers
(handle_extract, handle_append_buffer, handle_seal, handle_topic_route,
handle_digest_daily, handle_flush_stale, handle_reembed_backfill), their related
constants (L0_DEFAULT_FLUSH_AGE_SECS, REEMBED_BACKFILL_BATCH,
REEMBED_BACKFILL_REVISIT_MS) and helper functions
(try_mark_chunk_reembed_skipped, try_mark_summary_reembed_skipped) into a new
handlers/*.rs (or multiple files) and re-export or call them from this mod.rs's
handle_job dispatcher; move the #[cfg(test)] block/tests into a tests/ submodule
or separate test files preserving test functions (e.g.,
source_tree_seal_handler_enqueues_summary_topic_route,
reembed_backfill_repopulates_then_completes,
reembed_backfill_tombstones_orphan_and_terminates) and adapt visibility
(pub(crate) or pub) and use/import paths accordingly; update mod declarations
and use paths in this file so existing callers (handle_job,
worker::wake_workers, chunk_store::tree_active_signature, etc.) continue to
compile. Ensure transactional helpers (chunk_store::with_connection calls) and
logging remain reachable after the split and run cargo test to fix any
visibility/import issues.

In `@src/openhuman/memory_tree/retrieval/source.rs`:
- Around line 1-690: The module is too large — split out the semantic rerank
logic and the tests into sibling modules: move rerank_by_semantic_similarity
(and its helper imports like build_embedder_from_config, cosine_similarity and
any HashMap/embedding lookup code) into a new retrieval/semantic.rs module and
export it (pub(crate) or pub as needed), and move the entire #[cfg(test)] mod
tests into retrieval/tests.rs (or retrieval/source_tests.rs) as a test-only
module that imports the public helpers from source.rs; update source.rs to
declare the new submodules (mod semantic; #[cfg(test)] mod tests;) and replace
internal calls like rerank_by_semantic_similarity(...) and any moved helper
references with the re-exported symbols, adjust visibility of
collect_hits_and_nodes/select_trees/scope_matches_kind if tests need access
(make them pub(crate) instead of fn), and fix imports/usages so compilation and
tests still pass.

In `@src/openhuman/memory_tree/tools/walk.rs`:
- Around line 1-957: Split this large module into focused sibling modules to
keep file size under ~500 lines: keep the public API and core loop
(MemoryTreeWalkTool, run_walk, WalkOptions, WalkOutcome, WalkStep,
WalkStopReason) in the original file and move parsing, adapter, inner helpers,
and tests into new modules; specifically extract parse_walk_tool_calls and
InnerCall into a parser module, move ChatProviderAdapter into an adapter module,
move dispatch_inner_call, build_node_context, build_system_prompt,
build_inner_tools_text, and synthesize_fallback_answer into a helpers (or
primitives) module, and relocate the #[cfg(test)] test module to a tests
module/file; update the original file to import these with mod/use and re-export
symbols if needed so run_walk and the Tool implementation still call
parser::parse_walk_tool_calls, adapter::ChatProviderAdapter, and
helpers::dispatch_inner_call (and helpers::build_node_context,
build_system_prompt, build_inner_tools_text, synthesize_fallback_answer) with
minimal changes to function signatures.

---

Nitpick comments:
In `@src/bin/slack_backfill.rs`:
- Around line 149-577: main is too large and should be split into focused
subcommand handlers; extract the big conditional blocks into separate
functions/modules and dispatch from a small main that only does init and CLI
parsing. Specifically: move the seal-probe block into a new handler function
(e.g. handle_seal_probe) that takes (&Cli, &Config) and uses ingest_chat; move
the SLACK_SEARCH_MESSAGES probe into handle_probe_search(&Cli, &Config,
&client_kind) which calls execute_action; move the probe_ratelimit loop into
handle_probe_ratelimit(&Cli, &Config, &client_kind) (preserve Outcome enum and
list_connections_via_kind usage); extract the search backfill loop that calls
run_backfill_via_search into handle_search_backfill(&Cli, &Config,
&connections); and extract the non-search per-connection backfill (provider.sync
loop) into handle_sync_backfill(&Cli, &Config, &provider, &candidates). Keep
init_default_providers, memory::global::init, tracing/env_logger setup and
create_composio_client in main, then dispatch to these handlers based on CLI
flags; wire return Result<()> through each handler and move related helper
imports (chrono, ingest_chat, execute_action, list_connections_via_kind,
ProviderContext) into their new modules.
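
The dispatch shape proposed above can be sketched as follows. The `Cli` fields and the flag precedence are assumptions made for illustration, and each handler body is a placeholder that returns its own name so the routing is observable; the real handlers would also take `&Config` and return `Result<()>`.

```rust
/// Hypothetical CLI flags; the real `Cli` in slack_backfill.rs may differ.
#[derive(Debug, Default)]
pub struct Cli {
    pub seal_probe: bool,
    pub probe_search: bool,
    pub probe_ratelimit: bool,
    pub search_backfill: bool,
}

/// Placeholder result type: the handler's name on success.
pub type HandlerResult = Result<&'static str, String>;

fn handle_seal_probe(_cli: &Cli) -> HandlerResult {
    Ok("seal_probe")
}

fn handle_probe_search(_cli: &Cli) -> HandlerResult {
    Ok("probe_search")
}

fn handle_probe_ratelimit(_cli: &Cli) -> HandlerResult {
    Ok("probe_ratelimit")
}

fn handle_search_backfill(_cli: &Cli) -> HandlerResult {
    Ok("search_backfill")
}

fn handle_sync_backfill(_cli: &Cli) -> HandlerResult {
    Ok("sync_backfill")
}

/// What `main` would shrink to after init and CLI parsing: pick one handler.
pub fn dispatch(cli: &Cli) -> HandlerResult {
    if cli.seal_probe {
        return handle_seal_probe(cli);
    }
    if cli.probe_search {
        return handle_probe_search(cli);
    }
    if cli.probe_ratelimit {
        return handle_probe_ratelimit(cli);
    }
    if cli.search_backfill {
        return handle_search_backfill(cli);
    }
    // Default path: the per-connection provider.sync backfill.
    handle_sync_backfill(cli)
}
```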

In `@src/core/all.rs`:
- Around line 188-190: The all.rs module is becoming too large due to direct
aggregation of many controllers/schemas; extract the registration logic into
smaller focused modules (e.g., create new modules like
openhuman::memory_tree::registration and
openhuman::memory_tree::retrieval_registration) that each expose functions such
as all_memory_tree_registered_controllers and
all_retrieval_registered_controllers (or analogous names) and move the
schema/controller builder code into those modules, then in all.rs simply call
controllers.extend(...) with those exported helper functions so all.rs only
composes registrations instead of containing their implementations.

In `@src/openhuman/channels/runtime/startup.rs`:
- Around line 42-635: The start_channels function has grown too large; split its
responsibilities into smaller functions/modules: extract bus/subscriber
registration into a new module/function (e.g., register_startup_subscribers)
that encapsulates calls to event_bus::init_global, TracingSubscriber subscribe,
register_health_subscriber, register_skill_cleanup_subscriber, the Phase 2–4
learning OnceLock blocks and cron/proactive/tree subscribers; extract
provider/memory/bootstrap logic into a new function (e.g.,
bootstrap_provider_and_memory) that returns (provider, mem, runtime, security,
audit, provider_runtime_options) and contains
create_intelligent_routing_provider, provider.warmup,
host_runtime::create_runtime, SecurityPolicy::from_config,
get_or_create_workspace_audit_logger, memory::create_memory_with_local_ai;
extract channel list construction into its own function (e.g.,
build_channel_list) that returns Vec<Arc<dyn Channel>> and contains all
config.channels_config.* branch logic and spawn_supervised_listener wiring; keep
start_channels to orchestrate the high-level flow (call the new functions,
compute backoff/limits, create runtime_ctx, and call run_message_dispatch_loop).
Refactor by moving extracted code into new files/modules, keeping existing
symbol names (start_channels, spawn_supervised_listener,
run_message_dispatch_loop, build_system_prompt, tools::all_tools_with_runtime)
so callers remain unchanged and tests compile.

In `@src/openhuman/context/segment_recap_summarizer_tests.rs`:
- Line 19: The test module in segment_recap_summarizer_tests.rs has grown past
the ~500-line guideline; split it into smaller focused test modules (e.g.,
segment_recap_summarizer_unit_tests.rs,
segment_recap_summarizer_integration_tests.rs) by moving related test functions
into new files, preserving imports like ChatPrompt and any helper fixtures,
exporting or re-exporting shared helpers via a common mod (or a tests/util.rs)
so tests still compile, and update the parent mod declarations (pub mod ... or
mod ...) so the test suite runs unchanged; ensure each new file contains the
appropriate use crate::openhuman::memory_tree::chat::ChatPrompt import and
adjust visibility of helpers as needed.

In `@src/openhuman/memory_tree/read_rpc.rs`:
- Around line 34-39: The file src/openhuman/memory_tree/read_rpc.rs is too large
and must be split by concern; create smaller modules (e.g.,
list_search_recall.rs for listing/search/recall logic that uses content_read and
NodeKind/SourceKind, mutations.rs for mutation endpoints that use
chunk_store/with_connection and score_store, graph_export.rs for graph export
code, and llm_config.rs for LLM configuration and related RPCs), move the
corresponding functions/types into those files, export them from a new mod.rs or
update the parent mod to pub use the new modules, update imports in callers to
reference the new module paths instead of read_rpc.rs, and run cargo check to
fix any visibility or import issues.

In `@src/openhuman/memory_tree/retrieval/rpc.rs`:
- Around line 310-313: Move the large inline #[cfg(test)] test block out of this
module into a sibling rpc_test.rs to reduce rpc.rs size: cut the entire test
module from src/openhuman/memory_tree/retrieval/rpc.rs and paste it into
src/openhuman/memory_tree/retrieval/rpc_test.rs, update imports inside the new
file to use the same symbols (content_store, upsert_chunks, and
types::{chunk_id, Chunk, Metadata, SourceRef}, chrono::Utc) and any
crate-relative paths so the tests compile, and remove the #[cfg(test)] section
from rpc.rs so only the production handler code remains in that file. Ensure the
new rpc_test.rs has the appropriate use declarations and #[cfg(test)] mod so
cargo test picks it up.

In `@src/openhuman/memory_tree/retrieval/topic.rs`:
- Around line 20-28: The topic.rs module has grown too large; split it into
focused submodules (e.g., query.rs, rerank.rs, hydrate.rs, test_helpers.rs) and
move the corresponding functionality into them: relocate query-related
functions/types into query.rs, reranking logic into rerank.rs,
hydration/assembly code into hydrate.rs, and any test helpers into
test_helpers.rs; then add mod declarations in topic.rs (mod query; mod rerank;
mod hydrate; mod test_helpers;) and re-export the public APIs you need (pub use
query::..., etc.), update all internal uses/imports (e.g., hit_from_summary,
QueryResponse, RetrievalHit, build_embedder_from_config, cosine_similarity,
lookup_entity, EntityHit, Tree/TreeKind) to the new module paths, and adjust
visibility (pub/pub(crate)) so existing callers keep working and tests compile.

In `@src/openhuman/memory_tree/store_tests.rs`:
- Around line 47-771: The test file is too large; split it into focused test
modules (e.g., connection_cache, journaling/migration, reembed, schema_init) by
moving related tests into new test files and keeping shared helpers in a single
test_helpers module. For each group, create a new test module/file containing
the tests that reference the same behavior (e.g., move
with_connection_serialises_concurrent_schema_init,
is_transient_cold_start_classifies_known_extended_codes,
with_connection_keeps_foreign_keys_on_for_every_call,
memory_tree_uses_truncate_journal_not_wal, existing_wal_db_migrates_to_truncate
into a journaling/schema module; move
connection_cache_returns_same_arc_for_same_workspace,
connection_cache_uses_separate_connections_for_different_workspaces,
circuit_breaker_trips_after_threshold into a connection_cache module; move
clear_chunk_reembed_skipped_is_idempotent,
clear_reembed_skipped_for_signature_removes_all_tombstones_for_sig,
validate_reembed_skip_key into a reembed module; keep
legacy_embeddings_migrate_to_sidecar_once with embedding-related helpers),
update imports to use the shared helpers (e.g., test_config, sample_chunk,
with_connection, get_or_init_connection, clear_connection_cache,
try_cleanup_stale_files, mark_chunk_reembed_skipped,
clear_reembed_skipped_for_signature, validate_reembed_skip_key), and add mod
declarations so Cargo runs them as tests; ensure visibility of helper functions
(pub(crate) or move to a common tests/helpers module) so the split files compile
and tests run.

In `@src/openhuman/memory_tree/tools/walk.rs`:
- Around line 452-466: The parser currently silently skips malformed <tool_call>
blocks; update the block in walk.rs (the branch handling `None => break` and the
`Some(close_idx)` branch that parses `after_open`) to emit diagnostic logs: when
`None` occurs log a debug/error with context (e.g., the remaining `after_open`
content) indicating an unclosed/malformed tool_call, when
`serde_json::from_str::<Value>(inner.trim())` returns Err log the JSON parse
error and the `inner` string, and when `val.get("name")` is missing or not a
string log a debug entry showing the parsed `val`; reference the variables
`after_open`, `close_idx`, `inner`, the `calls` push of `InnerCall`, and ensure
logs use the crate logger (e.g., log::debug!/log::error!) with concise
contextual messages.

In `@src/openhuman/memory_tree/tree_global/seal.rs`:
- Around line 20-33: This file exceeds the repository size ceiling; split its
sealing logic and tests into smaller modules: extract core sealing functions and
types (the main seal implementation that uses stage_summary,
SummaryComposeInput, SummaryTreeKind, new_summary_id, with_connection, and
store) into a focused seal_core.rs, move configuration/threshold constants
(GLOBAL_TOKEN_BUDGET, MONTHLY_SEAL_THRESHOLD, WEEKLY_SEAL_THRESHOLD,
YEARLY_SEAL_THRESHOLD) into a seal_config.rs, isolate embedding/score helper
logic that uses build_embedder_from_config into seal_embed.rs, and place
summariser-related code (Summariser, SummaryContext, SummaryInput) and
Tree/Buffer types (Buffer, SummaryNode, Tree, TreeKind) in a seal_summariser.rs
or re-export them from the new modules; also move large test cases into a
parallel tests/ module/file so each source file stays under ~500 lines and
update module declarations and re-exports accordingly.

In `@tests/memory_tree_summarizer_e2e.rs`:
- Around line 1-579: The file exceeds the 500-line guideline; split
helper/harness code out of the test module into a smaller helper module and keep
only the three scenario tests in the e2e test file. Move EnvVarGuard,
ENV_LOCK/env_lock, ScriptedProvider (and its Provider impl), build_config,
ts_hour14/ts_hour15, and NS into a new module (e.g., summarizer_harness) and
make those items public, then in the original tests file replace the moved
definitions with a mod/use to import summarizer_harness::{EnvVarGuard, env_lock,
ScriptedProvider, build_config, ts_hour14, ts_hour15, NS}; ensure visibility
changes (pub) where needed and update imports so engine::run_summarization and
store::* usages in the three test functions remain unchanged.

In `@tests/memory_tree_walk_e2e.rs`:
- Around line 1-536: The module is too large; extract shared test utilities into
a sibling test-support module: move ScriptedResponder, env_lock/ENV_LOCK,
test_config, make_node, seed_tree, make_provider and any helper imports (e.g.,
derive_parent_id, level_from_node_id, estimate_tokens, write_node) into a new
test-support file/module, re-export or pub use the necessary symbols, then
update this test file to import those helpers and keep only the scenario tests
(walks_descend_fetch_answer, respects_max_turns_cap_with_mock,
handles_unknown_node_gracefully) plus their local setup; ensure run_walk,
WalkOptions and WalkStopReason usages remain unchanged and update module paths
so the tests compile.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 5ba91539-c9af-4891-8751-7a160758246e

📥 Commits

Reviewing files that changed from the base of the PR and between 6a06bae and 969ca5a.

📒 Files selected for processing (174)
  • src/bin/gmail_backfill_3d.rs
  • src/bin/memory_tree_init_smoke.rs
  • src/bin/slack_backfill.rs
  • src/core/all.rs
  • src/core/cli.rs
  • src/core/jsonrpc.rs
  • src/openhuman/agent/harness/archivist.rs
  • src/openhuman/agent/harness/archivist_tests.rs
  • src/openhuman/agent/harness/payload_summarizer.rs
  • src/openhuman/agent/harness/session/turn.rs
  • src/openhuman/agent/harness/subagent_runner/handoff.rs
  • src/openhuman/agent/tree_loader.rs
  • src/openhuman/channels/runtime/startup.rs
  • src/openhuman/composio/ops_test.rs
  • src/openhuman/composio/providers/gmail/ingest.rs
  • src/openhuman/composio/providers/profile.rs
  • src/openhuman/composio/providers/slack/ingest.rs
  • src/openhuman/config/ops.rs
  • src/openhuman/context/segment_recap_summarizer_tests.rs
  • src/openhuman/doctor/core.rs
  • src/openhuman/doctor/core_tests.rs
  • src/openhuman/inference/local/model_requirements.rs
  • src/openhuman/memory/mod.rs
  • src/openhuman/memory/ops/learn.rs
  • src/openhuman/memory/stm_recall/recall_tests.rs
  • src/openhuman/memory/sync_status/rpc.rs
  • src/openhuman/memory_tree/README.md
  • src/openhuman/memory_tree/canonicalize/README.md
  • src/openhuman/memory_tree/canonicalize/chat.rs
  • src/openhuman/memory_tree/canonicalize/document.rs
  • src/openhuman/memory_tree/canonicalize/email.rs
  • src/openhuman/memory_tree/canonicalize/email_clean.rs
  • src/openhuman/memory_tree/canonicalize/mod.rs
  • src/openhuman/memory_tree/chat/cloud.rs
  • src/openhuman/memory_tree/chat/local.rs
  • src/openhuman/memory_tree/chat/mod.rs
  • src/openhuman/memory_tree/chunker.rs
  • src/openhuman/memory_tree/content_store/README.md
  • src/openhuman/memory_tree/content_store/atomic.rs
  • src/openhuman/memory_tree/content_store/compose.rs
  • src/openhuman/memory_tree/content_store/mod.rs
  • src/openhuman/memory_tree/content_store/obsidian.rs
  • src/openhuman/memory_tree/content_store/obsidian_defaults/graph.json
  • src/openhuman/memory_tree/content_store/obsidian_defaults/types.json
  • src/openhuman/memory_tree/content_store/paths.rs
  • src/openhuman/memory_tree/content_store/raw.rs
  • src/openhuman/memory_tree/content_store/read.rs
  • src/openhuman/memory_tree/content_store/tags.rs
  • src/openhuman/memory_tree/ingest.rs
  • src/openhuman/memory_tree/jobs/README.md
  • src/openhuman/memory_tree/jobs/handlers/README.md
  • src/openhuman/memory_tree/jobs/handlers/mod.rs
  • src/openhuman/memory_tree/jobs/mod.rs
  • src/openhuman/memory_tree/jobs/redact.rs
  • src/openhuman/memory_tree/jobs/scheduler.rs
  • src/openhuman/memory_tree/jobs/store.rs
  • src/openhuman/memory_tree/jobs/testing.rs
  • src/openhuman/memory_tree/jobs/types.rs
  • src/openhuman/memory_tree/jobs/worker.rs
  • src/openhuman/memory_tree/mod.rs
  • src/openhuman/memory_tree/read_rpc.rs
  • src/openhuman/memory_tree/retrieval/README.md
  • src/openhuman/memory_tree/retrieval/benchmarks.rs
  • src/openhuman/memory_tree/retrieval/drill_down.rs
  • src/openhuman/memory_tree/retrieval/fetch.rs
  • src/openhuman/memory_tree/retrieval/global.rs
  • src/openhuman/memory_tree/retrieval/integration_test.rs
  • src/openhuman/memory_tree/retrieval/mod.rs
  • src/openhuman/memory_tree/retrieval/rpc.rs
  • src/openhuman/memory_tree/retrieval/schemas.rs
  • src/openhuman/memory_tree/retrieval/search.rs
  • src/openhuman/memory_tree/retrieval/source.rs
  • src/openhuman/memory_tree/retrieval/topic.rs
  • src/openhuman/memory_tree/retrieval/types.rs
  • src/openhuman/memory_tree/rpc.rs
  • src/openhuman/memory_tree/schemas.rs
  • src/openhuman/memory_tree/score/README.md
  • src/openhuman/memory_tree/score/embed/README.md
  • src/openhuman/memory_tree/score/embed/cloud.rs
  • src/openhuman/memory_tree/score/embed/factory.rs
  • src/openhuman/memory_tree/score/embed/inert.rs
  • src/openhuman/memory_tree/score/embed/mod.rs
  • src/openhuman/memory_tree/score/embed/ollama.rs
  • src/openhuman/memory_tree/score/extract/README.md
  • src/openhuman/memory_tree/score/extract/extractor.rs
  • src/openhuman/memory_tree/score/extract/llm.rs
  • src/openhuman/memory_tree/score/extract/llm_tests.rs
  • src/openhuman/memory_tree/score/extract/mod.rs
  • src/openhuman/memory_tree/score/extract/regex.rs
  • src/openhuman/memory_tree/score/extract/types.rs
  • src/openhuman/memory_tree/score/mod.rs
  • src/openhuman/memory_tree/score/mod_tests.rs
  • src/openhuman/memory_tree/score/resolver.rs
  • src/openhuman/memory_tree/score/signals/README.md
  • src/openhuman/memory_tree/score/signals/interaction.rs
  • src/openhuman/memory_tree/score/signals/metadata_weight.rs
  • src/openhuman/memory_tree/score/signals/mod.rs
  • src/openhuman/memory_tree/score/signals/ops.rs
  • src/openhuman/memory_tree/score/signals/source_weight.rs
  • src/openhuman/memory_tree/score/signals/token_count.rs
  • src/openhuman/memory_tree/score/signals/types.rs
  • src/openhuman/memory_tree/score/signals/unique_words.rs
  • src/openhuman/memory_tree/score/store.rs
  • src/openhuman/memory_tree/score/store_tests.rs
  • src/openhuman/memory_tree/store.rs
  • src/openhuman/memory_tree/store_tests.rs
  • src/openhuman/memory_tree/summarizer/bus.rs
  • src/openhuman/memory_tree/summarizer/cli.rs
  • src/openhuman/memory_tree/summarizer/engine.rs
  • src/openhuman/memory_tree/summarizer/engine_tests.rs
  • src/openhuman/memory_tree/summarizer/mod.rs
  • src/openhuman/memory_tree/summarizer/ops.rs
  • src/openhuman/memory_tree/summarizer/schemas.rs
  • src/openhuman/memory_tree/summarizer/store.rs
  • src/openhuman/memory_tree/summarizer/store_tests.rs
  • src/openhuman/memory_tree/summarizer/types.rs
  • src/openhuman/memory_tree/tools/drill_down.rs
  • src/openhuman/memory_tree/tools/fetch_leaves.rs
  • src/openhuman/memory_tree/tools/ingest_document.rs
  • src/openhuman/memory_tree/tools/mod.rs
  • src/openhuman/memory_tree/tools/query_global.rs
  • src/openhuman/memory_tree/tools/query_source.rs
  • src/openhuman/memory_tree/tools/query_topic.rs
  • src/openhuman/memory_tree/tools/search_entities.rs
  • src/openhuman/memory_tree/tools/walk.rs
  • src/openhuman/memory_tree/tree_global/README.md
  • src/openhuman/memory_tree/tree_global/digest.rs
  • src/openhuman/memory_tree/tree_global/digest_tests.rs
  • src/openhuman/memory_tree/tree_global/mod.rs
  • src/openhuman/memory_tree/tree_global/recap.rs
  • src/openhuman/memory_tree/tree_global/registry.rs
  • src/openhuman/memory_tree/tree_global/seal.rs
  • src/openhuman/memory_tree/tree_source/README.md
  • src/openhuman/memory_tree/tree_source/bucket_seal.rs
  • src/openhuman/memory_tree/tree_source/bucket_seal_tests.rs
  • src/openhuman/memory_tree/tree_source/flush.rs
  • src/openhuman/memory_tree/tree_source/mod.rs
  • src/openhuman/memory_tree/tree_source/registry.rs
  • src/openhuman/memory_tree/tree_source/source_file.rs
  • src/openhuman/memory_tree/tree_source/store.rs
  • src/openhuman/memory_tree/tree_source/store_tests.rs
  • src/openhuman/memory_tree/tree_source/summariser/README.md
  • src/openhuman/memory_tree/tree_source/summariser/inert.rs
  • src/openhuman/memory_tree/tree_source/summariser/llm.rs
  • src/openhuman/memory_tree/tree_source/summariser/mod.rs
  • src/openhuman/memory_tree/tree_source/types.rs
  • src/openhuman/memory_tree/tree_topic/README.md
  • src/openhuman/memory_tree/tree_topic/backfill.rs
  • src/openhuman/memory_tree/tree_topic/curator.rs
  • src/openhuman/memory_tree/tree_topic/hotness.rs
  • src/openhuman/memory_tree/tree_topic/mod.rs
  • src/openhuman/memory_tree/tree_topic/registry.rs
  • src/openhuman/memory_tree/tree_topic/routing.rs
  • src/openhuman/memory_tree/tree_topic/store.rs
  • src/openhuman/memory_tree/tree_topic/types.rs
  • src/openhuman/memory_tree/types.rs
  • src/openhuman/memory_tree/util/README.md
  • src/openhuman/memory_tree/util/mod.rs
  • src/openhuman/memory_tree/util/redact.rs
  • src/openhuman/mod.rs
  • src/openhuman/subconscious/engine.rs
  • src/openhuman/subconscious/situation_report/digest.rs
  • src/openhuman/subconscious/situation_report/hotness.rs
  • src/openhuman/subconscious/situation_report/query_window.rs
  • src/openhuman/subconscious/situation_report/summaries.rs
  • src/openhuman/subconscious/source_chunk.rs
  • src/openhuman/test_support/rpc.rs
  • src/openhuman/tools/impl/memory/mod.rs
  • src/openhuman/tools/ops.rs
  • src/openhuman/whatsapp_data/sqlite_retry.rs
  • tests/agent_retrieval_e2e.rs
  • tests/json_rpc_e2e.rs
  • tests/memory_tree_summarizer_e2e.rs
  • tests/memory_tree_walk_e2e.rs
💤 Files with no reviewable changes (1)
  • src/openhuman/memory/mod.rs

Comment thread src/core/jsonrpc.rs
Comment thread src/openhuman/composio/providers/slack/ingest.rs
Comment thread src/openhuman/memory_tree/chunker.rs
Comment thread src/openhuman/memory_tree/store.rs
Comment thread src/openhuman/memory_tree/tools/walk.rs
Comment thread src/openhuman/memory_tree/tools/walk.rs Outdated
Comment thread src/openhuman/subconscious/engine.rs
- walk.rs StubProvider: replace `drain(0..1)` (panics when queue empty) with an
  explicit empty-check + `remove(0)`, returning the intended
  `"no more scripted responses"` error.
- docs/whatsapp-data-flow.md: point at the new `memory_tree/tools/` path
  (broken Lychee link from the consolidation refactor).
- composio/providers/slack/ingest.rs: update docstring references from
  `memory::tree::*` to `memory_tree::*` to match the post-refactor imports.
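
The first fix in the list above can be sketched as follows. This is a minimal illustration rather than the code from `walk.rs`: the struct layout, the `next_response` method name, and the `String` error type are assumptions. Only the `drain(0..1)` panic, the empty-check plus `remove(0)` replacement, and the error message come from the commit notes.

```rust
/// Hypothetical stand-in for the test-only StubProvider described above.
pub struct StubProvider {
    // Scripted responses, handed out front to back.
    responses: Vec<String>,
}

impl StubProvider {
    pub fn new(responses: Vec<String>) -> Self {
        Self { responses }
    }

    /// Return the next scripted response, or an error once the queue is empty.
    pub fn next_response(&mut self) -> Result<String, String> {
        // `self.responses.drain(0..1)` would panic on an empty Vec because the
        // range end (1) exceeds the length (0), so check for emptiness first.
        if self.responses.is_empty() {
            return Err("no more scripted responses".to_string());
        }
        // Safe: the Vec is known to be non-empty at this point.
        Ok(self.responses.remove(0))
    }
}
```

With two scripted responses, the first two calls return them in order and the third returns the error instead of panicking. A `VecDeque` with `pop_front` would express the same intent without the O(n) shift of `remove(0)`, though that is a design alternative, not something the PR states.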
@senamakel merged commit cf600a9 into tinyhumansai:main on May 24, 2026
26 checks passed

Labels

  • agent: Built-in agents, prompts, orchestration, and agent runtime in src/openhuman/agent/.
  • feature: Net-new user-facing capability or product behavior.
  • memory: Memory store, memory tree, recall, summarization, and embeddings in src/openhuman/memory/.
  • rust-core: Core Rust runtime in src/: CLI, core_server, shared infrastructure.
