feat: add 6 Boost VC AI capability domains#2261
Conversation
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 WalkthroughWalkthroughAdds six OpenHuman domains (voice_assistant, live_captions, voice_actions, operator_inbox, chat_with_data, guided_flows) with engines, types, RPC handlers, schemas, persistence/connectors, shared utilities, capability catalog updates, Cargo dependency additions, and JSON-RPC E2E tests. ChangesAll domains and integration
Estimated code review effort: Possibly related PRs:
Suggested reviewers:
|
7049d70 to
77b0c4c
Compare
There was a problem hiding this comment.
Actionable comments posted: 2
Note
Due to the large number of review comments, Critical severity comments were prioritized as inline comments.
🟠 Major comments (24)
src/openhuman/inference/provider/reliable.rs-1039-1046 (1)
1039-1046:⚠️ Potential issue | 🟠 Major | ⚡ Quick winRetry loop within a single stream candidate is ineffective.
After a stream yields an error, the code sleeps and re-polls the same
candidate_stream. However, HTTP-based streams don't recover after errors—once a connection fails or returns an error, continued polling will typically yieldNoneor more errors, not fresh data. The backoff time is wasted since the stream cannot initiate a new request.The comment "stream may yield more items" is misleading: the stream might yield additional errors or
None, but it won't recover to yield successful chunks.🐛 Suggested fix: remove ineffective in-stream retry
- Some(Err(ref e)) => { - let non_retryable = is_stream_error_non_retryable(e); - - tracing::warn!( - provider = provider_name, - model = current_model, - attempt = attempts + 1, - error = %e, - "Streaming failed{}", if non_retryable { " (non-retryable)" } else { "" } - ); - - if non_retryable || attempts >= max_retries { - break; // Move to next candidate - } - - attempts += 1; - tokio::time::sleep(Duration::from_millis(backoff_ms)).await; - backoff_ms = (backoff_ms.saturating_mul(2)).min(10_000); - // Continue inner loop — stream may yield more items - } + Some(Err(ref e)) => { + tracing::warn!( + provider = provider_name, + model = current_model, + error = %e, + non_retryable = is_stream_error_non_retryable(e), + "Streaming failed, moving to next candidate" + ); + break; // Move to next candidate — can't retry same stream + }If actual retries are desired, streams must be created lazily inside the spawned task so a new request can be initiated per retry attempt.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/openhuman/inference/provider/reliable.rs` around lines 1039 - 1046, The in-stream retry inside the candidate_stream polling loop is ineffective; when the stream (from candidate_stream) yields an error you should stop polling that stream and re-create a fresh HTTP stream for a new attempt instead of sleeping and repolling the same stream. Modify the logic in the function that drives candidate_stream (the block containing the attempts/backoff variables) to remove the inner “attempts += 1; sleep; backoff_ms = …; continue” retry path when an error is returned from the stream; instead break out of the current stream loop on error and perform retries by recreating the stream (i.e., move the retry loop to wrap stream creation so each attempt constructs a new candidate_stream) or spawn the task to lazily create a new HTTP stream per retry attempt.src/openhuman/chat_with_data/rpc.rs-6-23 (1)
6-23:⚠️ Potential issue | 🟠 Major | ⚡ Quick winFail fast on missing required RPC inputs instead of defaulting.
Line 6, Line 7, Line 22, Line 30, Line 31, Line 41, and Line 67 silently substitute defaults (
"unnamed","csv",0,"") for required fields. This accepts invalid requests and can create incorrect state.Suggested direction
pub async fn handle_query(p: Map<String, Value>) -> Result<Value, String> { - let id = p.get("dataset_id").and_then(|v| v.as_str()).unwrap_or(""); - let question = p.get("question").and_then(|v| v.as_str()).unwrap_or(""); + let Some(id) = p.get("dataset_id").and_then(|v| v.as_str()) else { + return Ok(json!({"ok":false,"error":"missing required field: dataset_id"})); + }; + let Some(question) = p.get("question").and_then(|v| v.as_str()) else { + return Ok(json!({"ok":false,"error":"missing required field: question"})); + };Also applies to: 30-32, 41-42, 67-67
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/openhuman/chat_with_data/rpc.rs` around lines 6 - 23, The current RPC handler silently substitutes defaults for required fields (variables name, source, columns, row_count) before calling engine::register_dataset, allowing invalid requests to proceed; change the parsing to validate and fail fast: check that p.get("name"), p.get("source") (and map to a valid DataSource variant), p.get("columns") as an array, and p.get("row_count") are present and of the expected types, return an error or Err early (instead of unwrap_or defaults) when any required field is missing/invalid, and only call engine::register_dataset with validated values; ensure you surface useful error messages for each missing/invalid field so callers can correct requests.src/openhuman/chat_with_data/rpc.rs-5-74 (1)
5-74: 🛠️ Refactor suggestion | 🟠 Major | ⚡ Quick winAdd structured debug/trace logs on RPC entry, exit, and error branches.
There are no RPC-level logs for method entry/exit or failures. Please add stable-prefix logs (for example,
[chat_with_data][rpc]) with method anddataset_idwhere available.As per coding guidelines: use
log/tracingatdebugortracelevel on RPC entry/exit, error paths, and state transitions with grep-friendly prefixes and correlation fields.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/openhuman/chat_with_data/rpc.rs` around lines 5 - 74, Add structured debug/trace logs at RPC entry, exit, and on error paths for each handler (handle_register_dataset, handle_query, handle_generate_insight, handle_list_datasets, handle_list_insights, handle_get_dataset). At the top of each function emit an entry log like "[chat_with_data][rpc] enter <handler>" with correlation fields (e.g., dataset_id, name, question) where available; on successful return emit an exit log "[chat_with_data][rpc] exit <handler>" with result identifiers (dataset_id, insight_id, number of results); on errors log "[chat_with_data][rpc] error <handler>" including the error string and the same correlation fields. Use the project's logging facade (log::debug or tracing::debug/trace per style guide) and keep the stable prefix "[chat_with_data][rpc]" so logs are grep-friendly.src/openhuman/chat_with_data/rpc.rs-5-66 (1)
5-66: 🛠️ Refactor suggestion | 🟠 Major | ⚡ Quick winAdopt
RpcOutcome<T>for all RPC handlers.These handlers currently use
Result<Value, String>directly instead of the domain RPC return pattern, which makes this module inconsistent with core controller conventions.As per coding guidelines:
src/openhuman/*/rpc.rsandsrc/openhuman/**/rpc.rsrequire using theRpcOutcome<T>pattern for RPC controller return types.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/openhuman/chat_with_data/rpc.rs` around lines 5 - 66, Several RPC handlers (handle_register_dataset, handle_query, handle_generate_insight, handle_list_datasets, handle_list_insights, handle_get_dataset) return Result<Value, String> instead of the project’s RpcOutcome<T> pattern; change each function signature to return RpcOutcome<Value>, convert returned Ok(json!(...)) into RpcOutcome::ok(json!(...)) and convert error branches into RpcOutcome::err(...) (or RpcOutcome::err_string) as appropriate, and add any necessary use/import for RpcOutcome so compilation succeeds. Ensure you update all match arms (e.g., engine::query_dataset and engine::generate_insight branches) to produce RpcOutcome variants instead of Result and keep payload shapes identical.src/openhuman/chat_with_data/engine.rs-36-46 (1)
36-46:⚠️ Potential issue | 🟠 Major | ⚡ Quick winPrevent silent dataset overwrite on name collisions.
Line 36 derives
idfrom normalizedname, and Line 45 inserts unconditionally. Re-registering the same dataset name overwrites existing metadata without warning.Suggested fix
pub fn register_dataset( name: &str, source: DataSource, columns: Vec<String>, row_count: u64, ) -> DatasetMeta { - let id = format!("ds-{}", name.to_lowercase().replace(' ', "_")); + let base = format!("ds-{}", name.to_lowercase().replace(' ', "_")); + let mut store = DATASETS.lock().unwrap(); + let mut id = base.clone(); + let mut n = 1u32; + while store.contains_key(&id) { + id = format!("{base}-{n}"); + n += 1; + } let d = DatasetMeta { id: id.clone(), name: name.into(), source, columns, row_count, registered_at: now_epoch(), }; - DATASETS.lock().unwrap().insert(id, d.clone()); + store.insert(id, d.clone()); tracing::info!(dataset_id = %d.id, "[chat_with_data] dataset registered"); d }🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/openhuman/chat_with_data/engine.rs` around lines 36 - 46, The code currently computes id from name and unconditionally inserts into DATASETS, silently overwriting existing entries (id, DatasetMeta). Before inserting, acquire the DATASETS lock and check for an existing entry for that id; if one exists, either return an error / early-continue or log a warning with the existing dataset id and do not overwrite (use tracing::warn or tracing::error). If you want to allow re-registration only when metadata is identical, compare the existing DatasetMeta with the new DatasetMeta and only insert if they match; otherwise avoid the insert and surface the conflict. Update the block that creates id and calls DATASETS.lock().unwrap().insert(...) to perform this existence check and conditional behavior.src/openhuman/operator_inbox/rpc.rs-12-14 (1)
12-14:⚠️ Potential issue | 🟠 Major | ⚡ Quick winValidate required RPC fields instead of silently defaulting.
Defaulting required inputs (
sender,subject,body,follow_up_at) to empty/zero accepts malformed requests and can persist invalid triage data.Suggested validation pattern
- let sender = p.get("sender").and_then(|v| v.as_str()).unwrap_or(""); - let subject = p.get("subject").and_then(|v| v.as_str()).unwrap_or(""); - let body = p.get("body").and_then(|v| v.as_str()).unwrap_or(""); + let sender = match p.get("sender").and_then(|v| v.as_str()) { + Some(v) if !v.is_empty() => v, + _ => return Ok(json!({"ok":false,"error":"missing required field: sender"})), + }; + let subject = match p.get("subject").and_then(|v| v.as_str()) { + Some(v) if !v.is_empty() => v, + _ => return Ok(json!({"ok":false,"error":"missing required field: subject"})), + }; + let body = match p.get("body").and_then(|v| v.as_str()) { + Some(v) if !v.is_empty() => v, + _ => return Ok(json!({"ok":false,"error":"missing required field: body"})), + }; ... - let at = p.get("follow_up_at").and_then(|v| v.as_u64()).unwrap_or(0); + let at = match p.get("follow_up_at").and_then(|v| v.as_u64()) { + Some(v) => v, + None => return Ok(json!({"ok":false,"error":"missing required field: follow_up_at"})), + };Also applies to: 39-40
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/openhuman/operator_inbox/rpc.rs` around lines 12 - 14, The code currently defaults required RPC inputs (sender, subject, body, and follow_up_at) to empty/zero which accepts malformed requests; update the request parsing in rpc.rs to validate these fields instead: for sender, subject, body (and follow_up_at at the other location) verify p.get(...).and_then(|v| v.as_str()) (or as_i64/as_u64 for follow_up_at) returns Some and return an RPC error/early Err response when any required field is missing or wrong-typed, rather than using unwrap_or defaults; reference the variables sender, subject, body and follow_up_at in the validation so the handler rejects invalid requests and logs/returns a clear error.src/openhuman/operator_inbox/engine.rs-117-123 (1)
117-123:⚠️ Potential issue | 🟠 Major | ⚡ Quick winPrevent ID collisions in triage record generation.
uuid_v4()is time-derived and can produce duplicate IDs under concurrent/same-resolution calls;HashMap::insertthen silently overwrites an existing triage record.Suggested fix
+use std::sync::atomic::{AtomicU64, Ordering}; + +static ID_SEQ: AtomicU64 = AtomicU64::new(0); + fn uuid_v4() -> String { use std::time::{SystemTime, UNIX_EPOCH}; let t = SystemTime::now() .duration_since(UNIX_EPOCH) .unwrap_or_default(); - format!("oi-{:x}-{:x}", t.as_secs(), t.subsec_nanos()) + let seq = ID_SEQ.fetch_add(1, Ordering::Relaxed); + format!("oi-{:x}-{:x}-{:x}", t.as_secs(), t.subsec_nanos(), seq) }Also applies to: 32-32
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/openhuman/operator_inbox/engine.rs` around lines 117 - 123, The uuid_v4() function generates time-derived IDs which can collide under concurrent/same-resolution calls (causing HashMap::insert to overwrite triage records); replace uuid_v4() with a proper random UUID v4 generator (e.g., use the uuid crate Uuid::new_v4() or otherwise combine the current time with cryptographic randomness) so each triage record ID is globally unique; update any call sites that rely on uuid_v4 (the uuid_v4 function and places where its output is used as triage record keys) to use the new UUID generation approach.src/openhuman/operator_inbox/engine.rs-33-33 (1)
33-33: 🛠️ Refactor suggestion | 🟠 Major | ⚡ Quick winAdd debug/trace diagnostics for state transitions and error branches.
Current logs are sparse and at
info; this module mutates triage state in multiple paths (generate_draft,schedule_followup,archive_triage) and returns not-found errors without diagnostic traces.As per coding guidelines: "Use
log/tracingatdebugortracelevel on ... error paths, state transitions ..."Also applies to: 65-69, 37-93
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/openhuman/operator_inbox/engine.rs` at line 33, Replace the lone info-level log at the tracing::info! call with finer-grained debug/trace diagnostics: change the current info log to debug or trace and add additional trace/debug logs at each triage state mutation and error path inside generate_draft, schedule_followup, and archive_triage so transitions and reasons are recorded (e.g., old_state -> new_state, IDs like rec.id, priority, and contextual data). Also add debug/trace logging immediately before returning not-found or other errors to include diagnostic context (triage_id, expected state, and any relevant identifiers) and keep the existing info-level message only for high-level successful milestones. Ensure logs use structured fields (triage_id = %rec.id, priority = ?rec.priority, etc.) consistent with the existing tracing usage.src/openhuman/operator_inbox/rpc.rs-5-71 (1)
5-71: 🛠️ Refactor suggestion | 🟠 Major | ⚡ Quick winAdd debug/trace logs on RPC entry, exit, and error responses.
These handlers currently have no RPC diagnostics, which makes request-path debugging and incident triage harder.
As per coding guidelines: "Use
log/tracingatdebugortracelevel on RPC entry and exit, error paths..."🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/openhuman/operator_inbox/rpc.rs` around lines 5 - 71, Each RPC handler (handle_triage_message, handle_generate_draft, handle_schedule_followup, handle_get_triage, handle_list_triage, handle_archive) must emit trace/debug logs on entry (including relevant request parameters like source/sender/triage_id/tone/follow_up_at but avoid logging sensitive body content), on successful exit (including key result identifiers such as triage_id, draft_id, follow_up_at) and on error paths (log the error returned from engine::triage_message, engine::generate_draft, engine::schedule_followup, engine::get_triage, engine::list_triage, engine::archive_triage). Use the project logging crate (log or tracing) at trace/debug for entry/exit and error for failures, include clear context fields (e.g., "handler" => "handle_generate_draft", "triage_id" => id) so logs can be correlated, and add logs immediately before calling the engine:: functions and immediately before returning the JSON response or error.src/openhuman/operator_inbox/rpc.rs-5-5 (1)
5-5: 🛠️ Refactor suggestion | 🟠 Major | ⚡ Quick winSwitch RPC handlers to
RpcOutcome<T>signatures.These handlers currently use
Result<Value, String>instead of the domain RPC contract type, which makes this module inconsistent with the core RPC pattern.As per coding guidelines: "Use the
RpcOutcome<T>pattern for RPC controller return types in domainrpc.rsfiles."Also applies to: 21-21, 38-38, 47-47, 57-57, 65-65
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/openhuman/operator_inbox/rpc.rs` at line 5, Change the RPC handler signatures from Result<Value, String> to the domain RPC type RpcOutcome<Value> (e.g. pub async fn handle_triage_message(p: Map<String, Value>) -> RpcOutcome<Value>) and update all other RPC handlers in this file to the same pattern; import or path-resolve RpcOutcome (use crate::rpc::RpcOutcome or equivalent), convert existing Ok(value) returns to the RpcOutcome success form and convert Err(string) returns to the RpcOutcome error form (replace direct Result::Err/Err(String) usages with the RpcOutcome error constructor), and adjust any early-return error handling in functions like handle_triage_message to produce RpcOutcome errors instead of String results.src/openhuman/live_captions/store.rs-11-29 (1)
11-29:⚠️ Potential issue | 🟠 Major | ⚡ Quick winPrevent silent transcript overwrite on ID collision.
start_transcriptcurrently overwrites existing records whenidalready exists, anduuid_v4()is timestamp-based (collision-prone under concurrency). This can silently replace prior transcripts.Proposed fix
-pub fn start_transcript( +pub fn start_transcript( id: Option<String>, source: CaptionSource, title: Option<String>, -) -> Transcript { - let tid = id.unwrap_or_else(uuid_v4); +) -> Result<Transcript, String> { let now = now_epoch(); + let mut store = TRANSCRIPTS.lock().unwrap(); + let tid = if let Some(explicit) = id { + if store.contains_key(&explicit) { + return Err(format!("transcript already exists: {explicit}")); + } + explicit + } else { + let mut candidate = uuid_v4(); + while store.contains_key(&candidate) { + candidate = uuid_v4(); + } + candidate + }; let t = Transcript { id: tid.clone(), source, @@ - TRANSCRIPTS.lock().unwrap().insert(tid, t.clone()); + store.insert(tid, t.clone()); tracing::info!(transcript_id = %t.id, "[live_captions] transcript started"); - t + Ok(t) }Also applies to: 124-130
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/openhuman/live_captions/store.rs` around lines 11 - 29, start_transcript currently may silently overwrite an existing Transcript when a provided id collides (and uuid_v4 is timestamp-based); update start_transcript to check TRANSCRIPTS.lock().unwrap() for an existing key before inserting: if an id was provided and already exists, return an error or a Result indicating collision; if no id provided, generate a non-collision id using a safe loop with a truly random UUID generator (replace uuid_v4 with a uuid::Uuid::new_v4()-based generator) and retry until TRANSCRIPTS does not contain the key; ensure the function signature and callers are adjusted to propagate/handle the Result to avoid silent overwrite.src/openhuman/live_captions/rpc.rs-6-100 (1)
6-100: 🛠️ Refactor suggestion | 🟠 Major | 🏗️ Heavy liftAlign RPC handlers to
RpcOutcome<T>instead of rawResult<Value, String>.These handlers use
Result<Value, String>, which diverges from the required core RPC controller contract for domainrpc.rsfiles.As per coding guidelines, "Use
RpcOutcome<T>for all RPC controller return types in Rust core (see AGENTS.md for details)."🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/openhuman/live_captions/rpc.rs` around lines 6 - 100, Change these handlers to return RpcOutcome<Value> instead of Result<Value, String> and adapt their success/error returns accordingly: update the signatures of handle_start_transcript, handle_append_segment, handle_complete_transcript, handle_summarize_transcript, handle_get_transcript, and handle_list_transcripts to return RpcOutcome<Value>, import RpcOutcome if necessary, replace plain Ok(json!(...)) with the success constructor (e.g. RpcOutcome::ok(...) or equivalent) and replace Err/Ok-wrapped error branches to the RpcOutcome error constructor (e.g. RpcOutcome::err(...) or equivalent) so all return paths conform to the RpcOutcome<T> contract used by the core RPC controller.src/openhuman/live_captions/rpc.rs-25-41 (1)
25-41:⚠️ Potential issue | 🟠 Major | ⚡ Quick winDo not silently coerce missing required fields in
append_segment.Using
unwrap_or("")/unwrap_or(0)for required inputs allows invalid writes (e.g., emptytext, missingtranscript_id) instead of deterministic request rejection.Proposed fix
pub async fn handle_append_segment(p: Map<String, Value>) -> Result<Value, String> { - let tid = p - .get("transcript_id") - .and_then(|v| v.as_str()) - .unwrap_or(""); + let tid = match p.get("transcript_id").and_then(|v| v.as_str()) { + Some(v) if !v.is_empty() => v, + _ => return Ok(json!({ "ok": false, "error": "missing required field: transcript_id" })), + }; + let text = match p.get("text").and_then(|v| v.as_str()) { + Some(v) if !v.is_empty() => v.to_string(), + _ => return Ok(json!({ "ok": false, "error": "missing required field: text" })), + }; + let start_ms = match p.get("start_ms").and_then(|v| v.as_u64()) { + Some(v) => v, + None => return Ok(json!({ "ok": false, "error": "missing required field: start_ms" })), + }; + let end_ms = match p.get("end_ms").and_then(|v| v.as_u64()) { + Some(v) => v, + None => return Ok(json!({ "ok": false, "error": "missing required field: end_ms" })), + }; let seg = CaptionSegment { - text: p - .get("text") - .and_then(|v| v.as_str()) - .unwrap_or("") - .to_string(), - start_ms: p.get("start_ms").and_then(|v| v.as_u64()).unwrap_or(0), - end_ms: p.get("end_ms").and_then(|v| v.as_u64()).unwrap_or(0), + text, + start_ms, + end_ms,🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/openhuman/live_captions/rpc.rs` around lines 25 - 41, The handler handle_append_segment must not silently coerce required fields; replace the unwrap_or("")/unwrap_or(0) usage for required inputs (transcript_id / text and any required numeric fields like start_ms/end_ms) with explicit validation: fetch tid from p.get("transcript_id") and return Err(String) if missing or not a string, likewise require a non-empty text from p.get("text") and return Err if absent/empty, and validate start_ms/end_ms (and other truly required fields like confidence/is_final if needed) by checking their types and returning Err with a clear message on failure; only construct the CaptionSegment once all required fields are validated, using optional mapping for truly optional fields such as speaker.src/openhuman/voice_assistant/rpc.rs-27-162 (1)
27-162: 🛠️ Refactor suggestion | 🟠 Major | ⚡ Quick winAdd debug/trace RPC entry/exit and branch logs for full observability.
Only
start_session/stop_sessioncurrently emit info logs; other handlers and key branches (decode failures, not-found, spawn decision, poll empty/non-empty) should emit structured debug/trace logs with stable context fields.As per coding guidelines: “Use
log/tracingatdebugortracelevel on RPC entry and exit, error paths, state transitions, and any branch that is hard to infer from tests alone.”🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/openhuman/voice_assistant/rpc.rs` around lines 27 - 162, The handlers lack consistent debug/trace logging on RPC entry/exit and key branches; add structured debug/trace logs in handle_start_session, handle_push_audio, handle_poll_response, handle_get_status and handle_stop_session: log entry with request context (e.g., session_id, stt_provider, tts_provider) at start of each handler, log parameter parse failures where serde_json::from_value is used, log decode failures where decode_pcm16le_b64 is called, log session-missing / error paths returned by SessionRegistry::with_session and SessionRegistry::start/stop, log the branch decision when spawning brain::run_turn in handle_push_audio (include turn_started boolean), and log poll outcome in handle_poll_response showing whether pcm_base64/transcript/reply_text were empty; ensure logs use debug or trace levels and include stable context fields (session_id, turn_started, error) before returning RpcOutcome::new or propagating errors.src/openhuman/voice_assistant/brain.rs-33-76 (1)
33-76:⚠️ Potential issue | 🟠 Major | ⚡ Quick winReset session state on STT/LLM/TTS failures.
run_turnsetsProcessing(Line 35), but?returns from STT/LLM/TTS failures without restoring state. A failed turn can leave the session permanently stuck inprocessing.Proposed fix pattern
pub async fn run_turn(session_id: &str) -> Result<(), String> { @@ - let transcript = run_stt(&config, &pcm, &stt_provider_name, language.as_deref()).await?; + let transcript = match run_stt(&config, &pcm, &stt_provider_name, language.as_deref()).await { + Ok(v) => v, + Err(e) => { + let _ = SessionRegistry::with_session(session_id, |s| s.state = SessionState::Listening); + return Err(e); + } + }; @@ - let reply = run_llm(&config, &transcript, &history).await?; + let reply = match run_llm(&config, &transcript, &history).await { + Ok(v) => v, + Err(e) => { + let _ = SessionRegistry::with_session(session_id, |s| s.state = SessionState::Listening); + return Err(e); + } + }; @@ - let tts_pcm = run_tts(&config, &reply, &tts_provider_name).await?; + let tts_pcm = match run_tts(&config, &reply, &tts_provider_name).await { + Ok(v) => v, + Err(e) => { + let _ = SessionRegistry::with_session(session_id, |s| s.state = SessionState::Listening); + return Err(e); + } + };Also applies to: 85-94
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/openhuman/voice_assistant/brain.rs` around lines 33 - 76, The session is set to SessionState::Processing via SessionRegistry::with_session but early returns on STT/LLM/TTS failures (e.g., errors from run_stt, empty transcript, or later LLM/TTS calls) leave the session stuck in Processing; before any ?-based early return or .await that can error (notably around run_stt, transcript emptiness check, and the LLM/TTS steps), call SessionRegistry::with_session(session_id, |s| { s.state = SessionState::Listening; }) to restore state, or wrap the fallible calls to map_err/and_then so you reset state on error (reference symbols: SessionRegistry::with_session, SessionState::Processing, SessionState::Listening, run_stt, transcript) so the session is always returned to Listening on failure.src/openhuman/voice_assistant/session.rs-91-97 (1)
91-97:⚠️ Potential issue | 🟠 Major | ⚡ Quick winFeed VAD only with retained samples to prevent false turn triggers.
Line 96 currently feeds VAD with
samples, even whento_push < samples.len(). That lets VAD advance on audio that was not buffered, which can trigger spurious end-of-utterance turns.Proposed fix
let remaining = MAX_INBOUND_SAMPLES.saturating_sub(self.inbound_pcm.len()); let to_push = samples.len().min(remaining); + if to_push == 0 { + return VadEvent::Idle; + } self.inbound_pcm.extend_from_slice(&samples[..to_push]); self.inbound_samples += to_push; - self.vad.feed(samples) + self.vad.feed(&samples[..to_push])🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/openhuman/voice_assistant/session.rs` around lines 91 - 97, The VAD is being fed the full `samples` buffer even when only `to_push` samples are actually retained, causing VAD state to advance on audio not stored; change the call in the function that computes `remaining`/`to_push` so that `self.vad.feed` receives the retained slice (`&samples[..to_push]`) and skip feeding entirely when `to_push == 0` to avoid advancing VAD on discarded audio; update references around `remaining`, `to_push`, `self.inbound_pcm`, `self.inbound_samples`, and the `self.vad.feed` call accordingly.src/openhuman/voice_assistant/rpc.rs-66-76 (1)
66-76:⚠️ Potential issue | 🟠 Major | ⚡ Quick winPrevent overlapping brain turns for the same session.
handle_push_audiocan spawn multiplerun_turntasks for one session if multipleEndOfUtteranceevents arrive before the first task flips state. Add an atomic state gate before spawning.Proposed fix pattern
- let event = SessionRegistry::with_session(&req.session_id, |s| s.push_inbound_pcm(&samples))?; - - let turn_started = matches!(event, VadEvent::EndOfUtterance); + let turn_started = SessionRegistry::with_session(&req.session_id, |s| { + let event = s.push_inbound_pcm(&samples); + if matches!(event, VadEvent::EndOfUtterance) && s.state == SessionState::Listening { + s.state = SessionState::Processing; + true + } else { + false + } + })?;🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/openhuman/voice_assistant/rpc.rs` around lines 66 - 76, The EndOfUtterance branch can spawn overlapping brain::run_turn tasks for the same session; add an atomic per-session gate (e.g. an AtomicBool on the Session struct with helper methods like start_turn_if_idle() and clear_turn()) and check it before spawning: inside the VadEvent::EndOfUtterance handling (the code that calls SessionRegistry::with_session and checks matches!(event, VadEvent::EndOfUtterance)), call start_turn_if_idle() and only spawn the tokio::spawn if it returns true; ensure the spawned task clears the gate (via SessionRegistry::with_session or a helper clear_turn()) when brain::run_turn completes or errors so subsequent EndOfUtterance events can start the next turn.src/openhuman/voice_assistant/brain.rs-271-276 (1)
271-276:⚠️ Potential issue | 🟠 Major | ⚡ Quick winUse a UTF-8-safe truncation method in
truncate.The current implementation at line 275 slices at an arbitrary byte boundary using
&s[..max], which will panic whenmaxlands inside a multi-byte UTF-8 character. This is reachable since the function is called with&transcriptand&replyfrom external sources that may contain non-ASCII text (e.g., translated STT output, LLM responses in other languages).The tests only cover ASCII input and miss this edge case. Replace with a UTF-8-safe approach:
Proposed fix
fn truncate(s: &str, max: usize) -> &str { if s.len() <= max { s } else { - &s[..max] + let end = s + .char_indices() + .take_while(|(i, _)| *i <= max) + .last() + .map(|(i, _)| i) + .unwrap_or(0); + &s[..end] } }🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/openhuman/voice_assistant/brain.rs` around lines 271 - 276, The function truncate currently slices bytes with &s[..max], which can split multi-byte UTF-8 characters; change it to compute the byte index of the max-th unicode scalar boundary and slice there (e.g. use s.char_indices().nth(max) to get the byte offset and return &s[..offset] or s if none), keeping the signature fn truncate(s: &str, max: usize) -> &str; update the implementation in the truncate function referenced by callers (e.g., where &transcript and &reply are passed) so it never panics on non-ASCII input.src/openhuman/voice_actions/engine.rs-145-180 (1)
145-180: 🛠️ Refactor suggestion | 🟠 Major | ⚡ Quick winState transitions are under-logged for reject/execute/fail paths.
Please add structured
debug/tracelogs for transition success/failure in these mutation paths for parity with recognize/confirm observability.As per coding guidelines, “Use
log/tracingatdebugortracelevel on RPC entry and exit, error paths, state transitions, and any branch that is hard to infer from tests alone.”🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/openhuman/voice_actions/engine.rs` around lines 145 - 180, Add structured debug/trace logs to the mutation functions reject_intent, mark_executed, and mark_failed around RPC entry/exit, error branches, and state transitions: log when each function is entered (including intent_id), log and include the current status when the get_mut lookup fails or when the status precondition is wrong (e.g., not Pending for reject_intent, not Confirmed for mark_executed), and log the state change after updating intent.status and any attached payloads (result or error) before returning Ok. Use the existing INTENTS lock/unlock points to capture pre- and post-change intent state and include IntentStatus and intent_id in messages at debug/trace level for parity with recognize/confirm observability.src/openhuman/voice_actions/schemas.rs-61-105 (1)
61-105:⚠️ Potential issue | 🟠 Major | ⚡ Quick winController schemas are out of sync with RPC response payloads.
recognizeandget_intentschemas do not describe several fields actually returned by handlers, creating contract drift for schema-driven callers.Suggested schema alignment (example)
// s_recognize outputs +FieldSchema { name: "namespace", ty: TypeSchema::String, comment: "Target namespace.", required: true }, +FieldSchema { name: "function", ty: TypeSchema::String, comment: "Target function.", required: true }, +FieldSchema { name: "confidence", ty: TypeSchema::Number, comment: "Recognition confidence.", required: true }, // s_get outputs +FieldSchema { name: "utterance", ty: TypeSchema::String, comment: "Original utterance.", required: true }, +FieldSchema { name: "safety", ty: TypeSchema::String, comment: "Safety level.", required: true }, +FieldSchema { name: "result", ty: TypeSchema::Json, comment: "Execution result.", required: false }, +FieldSchema { name: "error", ty: TypeSchema::String, comment: "Execution error.", required: false },Also applies to: 175-213
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/openhuman/voice_actions/schemas.rs` around lines 61 - 105, The ControllerSchema for s_recognize (and the get_intent schema) is missing several fields that the actual RPC handlers return, causing contract drift; inspect the recognize and get_intent handler implementations to enumerate the real response keys (e.g., any intent metadata like confidence, slots/parameters, raw_text/raw_intent, confirmation_required/confirmation_token, error/message fields) and update the outputs vector in s_recognize and the get_intent schema to include those exact fields with appropriate TypeSchema entries and comments and required flags so the schema matches the handler payloads. Ensure names match the handler JSON keys exactly and add/remove required: true accordingly.src/openhuman/voice_actions/rpc.rs-6-57 (2)
6-57: 🛠️ Refactor suggestion | 🟠 Major | ⚡ Quick winRPC handlers are missing required debug/trace observability.
Please add structured logs on handler entry, successful exit, and error branches with stable
[voice_actions][rpc]context and identifiers (e.g.,intent_idwhen present).As per coding guidelines, “Use
log/tracingatdebugortracelevel on RPC entry and exit, error paths, state transitions, and any branch that is hard to infer from tests alone.”🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/openhuman/voice_actions/rpc.rs` around lines 6 - 57, All RPC handlers (handle_recognize, handle_confirm, handle_reject, handle_get_intent, handle_list_mappings) lack structured observability; add tracing/log statements at entry, on success exit, and on error branches using the stable context tag "[voice_actions][rpc]". At each handler entry emit a debug/trace log including the incoming params (mask sensitive fields), on success include intent_id/namespace/function where available and any status/result, and on errors log the error with intent_id when present; use the project's logging API (log::debug/log::trace and log::error or tracing equivalents) and ensure logs are structured (key=value pairs) so fields like intent_id, action, status, confidence/safety are present for handle_recognize, handle_confirm, handle_reject, handle_get_intent, and mappings count for handle_list_mappings.
6-57:⚠️ Potential issue | 🟠 Major | ⚡ Quick winRPC handlers should use
RpcOutcome<T>instead ofResult<Value, String>.This file currently bypasses the repository's RPC outcome contract, which risks inconsistency across controller wiring and shared RPC tooling. All other domain
rpc.rsfiles in the codebase follow theRpcOutcome<T>pattern (seevoice_assistant,people,whatsapp_data, and others).🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/openhuman/voice_actions/rpc.rs` around lines 6 - 57, Change the RPC handlers (handle_recognize, handle_confirm, handle_reject, handle_get_intent, handle_list_mappings) to return the repository-wide RpcOutcome<Value> instead of Result<Value, String>: update each function signature to -> RpcOutcome<Value>, import RpcOutcome, and convert the current match arms so successful json! responses are returned via the RpcOutcome success constructor (e.g., RpcOutcome::ok or equivalent) and errors are returned via the RpcOutcome error constructor (e.g., RpcOutcome::err or equivalent) instead of Ok(json!(...)) / Ok(json!( { "ok": false, ... })); keep the same payload shapes but use the RpcOutcome API consistently across all handlers.src/openhuman/voice_actions/engine.rs-208-214 (1)
208-214:⚠️ Potential issue | 🟠 Major | ⚡ Quick winTimestamp-only intent IDs can collide and overwrite stored intents.
format!("va-{:x}-{:x}", secs, nanos)is not collision-safe under concurrent recognitions. A duplicate key will replace an existingVoiceIntentinINTENTS.Suggested fix
+use std::sync::atomic::{AtomicU64, Ordering}; + +static INTENT_SEQ: AtomicU64 = AtomicU64::new(1); + fn uuid_v4() -> String { use std::time::{SystemTime, UNIX_EPOCH}; let t = SystemTime::now() .duration_since(UNIX_EPOCH) .unwrap_or_default(); - format!("va-{:x}-{:x}", t.as_secs(), t.subsec_nanos()) + let seq = INTENT_SEQ.fetch_add(1, Ordering::Relaxed); + format!("va-{:x}-{:x}-{:x}", t.as_secs(), t.subsec_nanos(), seq) }🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/openhuman/voice_actions/engine.rs` around lines 208 - 214, The current uuid_v4() function generates IDs from timestamps only, which can collide under concurrent recognitions and overwrite entries in INTENTS; replace the timestamp-only scheme with a proper random UUID (e.g., use uuid::Uuid::new_v4()) and return a string ID (optionally prefixed with "va-") so generated IDs are collision-resistant; update the uuid_v4 function to call Uuid::new_v4().to_string(), add the uuid crate/imports, and ensure callers that expect the old format still work with the new ID string for VoiceIntent keys in INTENTS.src/openhuman/voice_actions/engine.rs-172-179 (1)
172-179:⚠️ Potential issue | 🟠 Major | ⚡ Quick win
mark_failedallows invalid lifecycle transitions.This currently permits changing any intent state (including
Rejected/Executed) toFailed, which breaks intent state integrity.Suggested guard
pub fn mark_failed(intent_id: &str, error: &str) -> Result<VoiceIntent, String> { let mut store = INTENTS.lock().unwrap(); let intent = store .get_mut(intent_id) .ok_or_else(|| format!("intent not found: {intent_id}"))?; + if intent.status != IntentStatus::Confirmed { + return Err("intent must be confirmed before failure is recorded".into()); + } intent.status = IntentStatus::Failed; intent.error = Some(error.to_string()); Ok(intent.clone()) }🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/openhuman/voice_actions/engine.rs` around lines 172 - 179, mark_failed currently mutates any intent's status to IntentStatus::Failed allowing invalid lifecycle transitions; update the function mark_failed to first fetch the mutable intent from INTENTS (using the existing get_mut/intent variable), check intent.status against allowed source states (e.g., only IntentStatus::Pending and IntentStatus::InProgress), and if the current status is not allowed return an Err with a descriptive message (e.g., "invalid transition from {current_status} to Failed"); only set intent.status = IntentStatus::Failed and intent.error = Some(...) when the guard permits the transition.
🟡 Minor comments (4)
src/openhuman/guided_flows/schemas.rs-197-247 (1)
197-247:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winMissing
recommendationoutput field inschema_get_session.The
handle_get_sessionRPC handler (rpc.rs lines 82-89) returns arecommendationobject when the session is completed, but this schema doesn't declare it. This mismatch means clients relying on the schema won't expect the recommendation field.🔧 Proposed fix
FieldSchema { name: "answers_count", ty: TypeSchema::F64, comment: "Number of answers submitted.", required: true, }, + FieldSchema { + name: "recommendation", + ty: TypeSchema::Json, + comment: "Recommendation (if completed).", + required: false, + }, ], } }🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/openhuman/guided_flows/schemas.rs` around lines 197 - 247, The outputs of ControllerSchema in schema_get_session are missing the optional "recommendation" field returned by handle_get_session; update the outputs Vec in schema_get_session to include a FieldSchema for "recommendation" (use the appropriate TypeSchema for the recommendation object or TypeSchema::Object/Map if a struct isn't defined) and mark it required: false so clients know it may be absent unless the session is completed.src/openhuman/live_captions/store.rs-33-43 (1)
33-43:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winReject invalid segment time windows before appending.
append_segmentshould validateend_ms >= start_ms; otherwise malformed segments are accepted and downstream duration math becomes inconsistent.Proposed fix
pub fn append_segment(transcript_id: &str, segment: CaptionSegment) -> Result<Transcript, String> { + if segment.end_ms < segment.start_ms { + return Err("segment end_ms must be >= start_ms".into()); + } let mut store = TRANSCRIPTS.lock().unwrap();🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/openhuman/live_captions/store.rs` around lines 33 - 43, In append_segment, reject segments whose time window is invalid by checking the CaptionSegment's start_ms and end_ms before mutating the store: ensure end_ms >= start_ms (or end_ms > start_ms if open interval desired) and return an Err with a clear message (e.g., "invalid segment time window: end_ms < start_ms") instead of appending; locate this check at the top of append_segment (before accessing TRANSCRIPTS or pushing to t.segments) and keep the existing TranscriptState and timestamp update logic unchanged.src/openhuman/voice_assistant/session.rs-164-177 (1)
164-177:⚠️ Potential issue | 🟡 Minor | ⚡ Quick win
startdocs do not match behavior.The doc says “Returns error if session_id already exists”, but existing sessions are intentionally restarted (Lines 174-178). Please align the comment with runtime behavior to avoid contract confusion.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/openhuman/voice_assistant/session.rs` around lines 164 - 177, The docstring for the start function is wrong — it claims the function "Returns error if session_id already exists" but the implementation (session::start) actually performs an idempotent restart by detecting session_id via registry_map().contains_key, logging a restart, removing the old entry and continuing. Update the comment on start to describe the actual behavior: that calling start with an existing session_id will close/remove the old session and open a new one (i.e., idempotent restart), and optionally mention the log message emitted (debug "{LOG_PREFIX} restarting existing session={session_id}").src/openhuman/voice_actions/rpc.rs-7-8 (1)
7-8:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winRequired RPC fields should be validated explicitly, not defaulted to empty strings.
Using
unwrap_or("")hides malformed requests and returns less precise downstream errors. Return a direct"missing required field: ..."response instead.Also applies to: 19-20, 27-28, 35-36
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/openhuman/voice_actions/rpc.rs` around lines 7 - 8, The RPC handler currently uses patterns like p.get("utterance").and_then(|v| v.as_str()).unwrap_or("") which masks malformed requests; instead explicitly validate required fields (e.g., "utterance" and the other keys referenced around lines 19-20, 27-28, 35-36) and return a clear error response such as "missing required field: <field_name>" when a field is absent or not a string; locate the call sites (e.g., the variable bindings for utterance used before calling engine::recognize_intent) and replace the unwrap_or behavior with an early-return error path that sends the missing-field response, and apply the same validation pattern to the other fields mentioned so downstream functions always receive validated strings.
🧹 Nitpick comments (5)
src/openhuman/inference/provider/reliable.rs (2)
74-81: 💤 Low valueAuth failure hints are less comprehensive than
is_non_retryable.The non-streaming
is_non_retryablefunction (lines 33-45) checks for many more auth failure patterns like"incorrect api key","missing api key","authentication failed","permission denied","access denied", and"invalid token". These would incorrectly be treated as retryable in the streaming path, wasting retry attempts on unrecoverable errors.♻️ Suggested fix to align auth hints
StreamError::Provider(msg) => { let lower = msg.to_lowercase(); - lower.contains("invalid api key") + let auth_hints = [ + "invalid api key", + "incorrect api key", + "missing api key", + "api key not set", + "authentication failed", + "auth failed", + "unauthorized", + "forbidden", + "permission denied", + "access denied", + "invalid token", + ]; + auth_hints.iter().any(|hint| lower.contains(hint)) || lower.contains("unauthorized") || lower.contains("forbidden") || lower.contains("model") && (lower.contains("not found") || lower.contains("unsupported")) }Or better, extract a shared
AUTH_FAILURE_HINTSconstant to keep both classifiers in sync.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/openhuman/inference/provider/reliable.rs` around lines 74 - 81, The StreamError::Provider branch in reliable.rs only checks a subset of auth failure messages and so will treat many non-retryable auth errors as retryable; update the streaming-path checks to match the broader set used by is_non_retryable (include patterns like "incorrect api key", "missing api key", "authentication failed", "permission denied", "access denied", "invalid token", etc.), or better yet factor the list into a shared AUTH_FAILURE_HINTS constant and have both is_non_retryable and the StreamError::Provider matcher consult that constant so both classifiers stay in sync.
1008-1069: 💤 Low valueConsider adding structured logging with request correlation.
The spawned streaming task lacks a correlation ID to tie log entries back to the originating request. Per coding guidelines, include correlation fields like request IDs for debuggability, especially for async/background tasks.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/openhuman/inference/provider/reliable.rs` around lines 1008 - 1069, The background task spawned with tokio::spawn over candidate_streams needs a request correlation id added and propagated: capture/clone the request_id (or correlation_id) into the move closure that contains tokio::spawn, create a tracing span (e.g., let span = tracing::info_span!("streaming", request_id = %request_id)) and instrument the async block with .instrument(span) (or at minimum include request_id as a field on each tracing::warn call), and add the same request_id field to the final tx.send error message logs; update the tracing::warn invocations inside the loop (and any other logs inside the spawned block) to include request_id = %request_id so all entries can be correlated to the originating request.src/openhuman/guided_flows/rpc.rs (1)
7-94: 💤 Low valueConsider adding trace-level logging on RPC entry/exit.
Per coding guidelines, RPC handlers should use
tracingatdebugortracelevel on entry, exit, and error paths with structured context. This aids debugging without impacting the functional code.💡 Example for handle_start_flow
pub async fn handle_start_flow(p: Map<String, Value>) -> Result<Value, String> { let flow_id = p.get("flow_id").and_then(|v| v.as_str()).unwrap_or(""); + tracing::trace!(flow_id, "[guided_flows][rpc] start_flow called"); let session_id = p .get("session_id") .and_then(|v| v.as_str()) .map(String::from); match engine::start_flow(flow_id, session_id) { - Ok(s) => Ok(json!({ + Ok(s) => { + tracing::trace!(session_id = %s.session_id, "[guided_flows][rpc] start_flow succeeded"); + Ok(json!({ "ok": true, "session_id": s.session_id, ... - })), - Err(e) => Ok(json!({ "ok": false, "error": e })), + })) + } + Err(e) => { + tracing::debug!(flow_id, error = %e, "[guided_flows][rpc] start_flow failed"); + Ok(json!({ "ok": false, "error": e })) + } } }🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/openhuman/guided_flows/rpc.rs` around lines 7 - 94, Add tracing debug/trace logs at entry, successful exit, and error branches for each RPC handler (handle_list_flows, handle_start_flow, handle_submit_answer, handle_get_session). Use the tracing crate (e.g., trace! or debug!) with structured fields: for handle_list_flows log entry and number of flows on success; for handle_start_flow include flow_id and session_id on entry and log session_id/flow_id/current_step on success or the error on failure; for handle_submit_answer include session_id/step_id/value on entry and log session_id/current_step/state and optional recommendation fields on success or the error on failure; for handle_get_session include session_id on entry and log returned session metadata or the error on failure. Ensure logs live in the async fn bodies near the match points (both Ok and Err branches) so structured context is available for debugging.src/openhuman/live_captions/rpc.rs (1)
6-100: ⚡ Quick winAdd structured
debug/tracelogs on RPC entry, exit, and error branches.The handlers currently have no RPC-path diagnostics, which makes incident triage and request tracing difficult.
As per coding guidelines, "Use
log/tracingatdebugortracelevel on RPC entry and exit, error paths, state transitions, and any branch that is hard to infer from tests alone. Use structured, grep-friendly context with stable prefixes (e.g.[domain],[rpc],[ui-flow]) and include correlation fields such as request IDs, method names, and entity IDs when available."🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/openhuman/live_captions/rpc.rs` around lines 6 - 100, Add structured debug/trace logs to all RPC handlers (handle_start_transcript, handle_append_segment, handle_complete_transcript, handle_summarize_transcript, handle_get_transcript, handle_list_transcripts): log at entry (trace/debug) with a stable prefix like "[rpc]" and fields method=<fn name> and any request ids or key args (transcript_id, source, title, segment info), log on success/exit with method, transcript_id and result summary (state, segment_count, summary), and log errors on Err branches with method, transcript_id and the error; use the project's log/tracing macros consistently (e.g., trace!/debug!/error!) and keep messages structured and grep-friendly so callers can correlate requests.src/openhuman/voice_actions/schemas.rs (1)
258-272: ⚡ Quick winPlease verify handler export convention for schema wiring (
handle_*vs privateh_*).This file uses private
h_*wrappers; repository guidance calls outhandle_*(_: Map<String, Value>) -> ControllerFuturehandler functions in domain schema wiring. Confirm whether this domain intentionally deviates from the standard pattern.As per coding guidelines, “For each domain, define controller schema metadata in a dedicated
schemas.rsmodule … and export … handler functionsfn handle_*(_: Map<String, Value>) -> ControllerFuturefrom the domainmod.rs…”🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/openhuman/voice_actions/schemas.rs` around lines 258 - 272, The schema wiring currently defines private wrapper functions h_recognize, h_confirm, h_reject, h_get, and h_list which call super::rpc::handle_*; change these to follow the project convention by exporting public handler functions named handle_recognize, handle_confirm, handle_reject, handle_get, and handle_list with the same signature fn handle_*(_: Map<String, Value>) -> ControllerFuture (or alternatively keep the h_* wrappers but add pub fn handle_* that simply forwards to the existing h_*), so the domain exposes the expected handle_* symbols for schema wiring and consumers.
|
@coderabbitai review |
✅ Actions performedReview triggered.
|
|
@coderabbitai resolve |
✅ Actions performedComments resolved and changes approved. |
2d23a1f to
8b5be07
Compare
There was a problem hiding this comment.
Actionable comments posted: 3
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
src/openhuman/voice_assistant/noise_cancel.rs (1)
69-74:⚠️ Potential issue | 🟠 Major | ⚡ Quick win
feed_reference()never affectsprocess().
process()only callscancel_echo()when a freshreferenceslice is passed, but the stateful path stores reference audio inreference_bufviafeed_reference(). Withreference = None, the buffered echo reference is ignored, so echo cancellation is effectively disabled for the normal session flow.Suggested fix
- if self.config.echo_cancel_enabled { - if let Some(ref_signal) = reference { - samples = self.cancel_echo(&samples, ref_signal); - } + if let Some(ref_signal) = reference { + self.feed_reference(ref_signal); + } + if self.config.echo_cancel_enabled && !self.reference_buf.is_empty() { + samples = self.cancel_echo(&samples, &[]); }🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/openhuman/voice_assistant/noise_cancel.rs` around lines 69 - 74, process() currently only calls cancel_echo(&samples, ref_signal) when a fresh reference slice is passed, so feed_reference()'s buffered reference (reference_buf) is never used; change process() to fall back to the stored reference_buf when reference is None (e.g., use reference.as_ref().or(self.reference_buf.as_deref()) or equivalent) and pass that slice into cancel_echo(), ensuring ownership/borrowing matches cancel_echo(&samples, ref_signal) and that reference_buf is not cleared prematurely.src/openhuman/voice_assistant/session.rs (1)
302-309:⚠️ Potential issue | 🟠 Major | ⚡ Quick winCleanup is still incomplete for evicted sessions.
evict_nc_state()only runs instop(), but sessions are also removed here via restart, idle eviction, and LRU eviction. Those paths still leaveNC_STATESentries behind, so the leak survives the automated cleanup paths.Suggested fix
impl SessionRegistry { + fn remove_session( + map: &mut HashMap<String, VoiceAssistantSession>, + session_id: &str, + ) -> Option<VoiceAssistantSession> { + super::brain::evict_nc_state(session_id); + map.remove(session_id) + } + pub fn start( session_id: &str, stt_provider: &str, @@ if map.contains_key(session_id) { debug!("{LOG_PREFIX} restarting existing session={session_id}"); - map.remove(session_id); + Self::remove_session(&mut map, session_id); } @@ warn!("{LOG_PREFIX} evicting LRU session={lru_id} (at capacity {MAX_SESSIONS})"); - map.remove(&lru_id); + Self::remove_session(&mut map, &lru_id); } } @@ pub fn stop(session_id: &str) -> Result<VoiceAssistantSession, String> { - super::brain::evict_nc_state(session_id); let mut map = registry_map() .lock() .map_err(|e| format!("{LOG_PREFIX} lock poisoned: {e}"))?; - map.remove(session_id) + Self::remove_session(&mut map, session_id) .ok_or_else(|| format!("{LOG_PREFIX} session not found: {session_id}")) } }🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/openhuman/voice_assistant/session.rs` around lines 302 - 309, The NC_STATES cleanup (super::brain::evict_nc_state) is only invoked in stop() but sessions get removed elsewhere (restart flow, idle eviction, LRU eviction), leaving NC_STATES entries behind; update all places that remove a session from registry_map (e.g., restart handler, idle eviction routine, LRU eviction logic) to call super::brain::evict_nc_state(session_id) before or as part of removal, or factor removal into a single helper (e.g., a remove_session(session_id) that calls evict_nc_state and then registry_map().lock().remove) and replace direct map.remove(...) calls with that helper to ensure consistent cleanup.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/openhuman/live_captions/voice_profiles.rs`:
- Around line 119-127: The save_profiles_to_disk function currently writes
directly to the target file which can leave voice_profiles.json corrupted on
crash; change save_profiles_to_disk to perform an atomic write by writing the
JSON to a temporary file in the same directory (use profiles_path() to derive
the dir and final filename), flush and sync the temp file (fsync) and then
atomically rename (std::fs::rename) the temp file over the target; also handle
and log errors from creating/writing/syncing/renaming the temp file (instead of
relying on std::fs::write) so failures don't truncate or lose the original file.
- Around line 42-43: The info! log currently emits raw user-supplied profile
names which leaks PII; update the logging in the block that calls
save_profiles_to_disk(&store) and the info! invocation to avoid printing the
`name` variable—log only the profile `id` or a redacted/hashed form instead
(e.g., omit `name` entirely or replace with a deterministic hash) so the
callsite around save_profiles_to_disk and the info!("[voice-profiles] registered
...") no longer includes raw profile names.
In `@src/openhuman/voice_actions/engine.rs`:
- Around line 33-56: The eviction logic in get_context is never run by
record_context so CONTEXTS can grow past MAX_CONTEXTS and the current ">" check
also misses the first overflow; extract the eviction block into a helper (e.g.,
evict_contexts or ensure_capacity) that accepts the shared store and now, then
call this helper from both get_context and record_context at the proper time;
when calling before inserting use a check of store.len() >= MAX_CONTEXTS (or
call the helper after inserting and keep store.len() > MAX_CONTEXTS) so the
first insert beyond capacity is handled, and keep the two-pass
stale-then-oldest-by-last_active eviction behavior unchanged.
---
Outside diff comments:
In `@src/openhuman/voice_assistant/noise_cancel.rs`:
- Around line 69-74: process() currently only calls cancel_echo(&samples,
ref_signal) when a fresh reference slice is passed, so feed_reference()'s
buffered reference (reference_buf) is never used; change process() to fall back
to the stored reference_buf when reference is None (e.g., use
reference.as_ref().or(self.reference_buf.as_deref()) or equivalent) and pass
that slice into cancel_echo(), ensuring ownership/borrowing matches
cancel_echo(&samples, ref_signal) and that reference_buf is not cleared
prematurely.
In `@src/openhuman/voice_assistant/session.rs`:
- Around line 302-309: The NC_STATES cleanup (super::brain::evict_nc_state) is
only invoked in stop() but sessions get removed elsewhere (restart flow, idle
eviction, LRU eviction), leaving NC_STATES entries behind; update all places
that remove a session from registry_map (e.g., restart handler, idle eviction
routine, LRU eviction logic) to call super::brain::evict_nc_state(session_id)
before or as part of removal, or factor removal into a single helper (e.g., a
remove_session(session_id) that calls evict_nc_state and then
registry_map().lock().remove) and replace direct map.remove(...) calls with that
helper to ensure consistent cleanup.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 9ad982e8-50ed-4ac8-b577-55218961d2a5
⛔ Files ignored due to path filters (1)
Cargo.lockis excluded by!**/*.lock
📒 Files selected for processing (8)
Cargo.tomldocs/YC_CAPABILITIES.mdsrc/openhuman/live_captions/voice_profiles.rssrc/openhuman/voice_actions/engine.rssrc/openhuman/voice_assistant/brain.rssrc/openhuman/voice_assistant/noise_cancel.rssrc/openhuman/voice_assistant/session.rssrc/openhuman/voice_assistant/wake_word.rs
✅ Files skipped from review due to trivial changes (1)
- docs/YC_CAPABILITIES.md
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (3)
src/openhuman/chat_with_data/rpc.rs (1)
36-82: ⚡ Quick winConsider adding input length limits to prevent abuse.
The
questionparameter is used directly in SQL generation and LLM prompts without length validation. A malicious or accidental very long question could cause performance issues or exceed LLM context limits.♻️ Proposed fix
+const MAX_QUESTION_LEN: usize = 2000; + pub async fn handle_query(p: Map<String, Value>) -> Result<Value, String> { let id = p.get("dataset_id").and_then(|v| v.as_str()).unwrap_or(""); let question = p.get("question").and_then(|v| v.as_str()).unwrap_or(""); if id.is_empty() { return Ok(json!({"ok": false, "error": "dataset_id is required"})); } if question.is_empty() { return Ok(json!({"ok": false, "error": "question is required"})); } + if question.len() > MAX_QUESTION_LEN { + return Ok(json!({"ok": false, "error": format!("question exceeds {} chars", MAX_QUESTION_LEN)})); + }🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/openhuman/chat_with_data/rpc.rs` around lines 36 - 82, The handle_query function currently accepts an unbounded question string used by sql_gen::generate_sql_for_question, try_llm_sql_gen, and engine::query_dataset; add a max length check (e.g. const MAX_QUESTION_LEN) and enforce it early: trim whitespace, if question.len() > MAX_QUESTION_LEN return an error JSON like {"ok":false,"error":"question too long"} or alternatively truncate the question before passing to sql_gen::generate_sql_for_question and try_llm_sql_gen (but still return a warning/caveat in the response); apply the same limit/trim when calling try_llm_sql_gen and engine::query_dataset, and log the truncation/limit hit using debug! for visibility.src/openhuman/live_captions/translate.rs (1)
11-12: 💤 Low valueRemove unused imports.
serde_json::jsonandtracing::warnare imported but never used in this file.♻️ Proposed fix
-use serde_json::json; -use tracing::{debug, warn}; +use tracing::debug;🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/openhuman/live_captions/translate.rs` around lines 11 - 12, The file imports unused symbols—remove serde_json::json and tracing::warn to clean up unused imports: update the import lines in src/openhuman/live_captions/translate.rs so that serde_json::json is no longer imported and tracing only imports debug (e.g., change or replace the use lines that include json and warn), leaving any remaining used imports like tracing::debug intact; ensure there are no other references to json or warn in functions within this module before removing.src/openhuman/voice_assistant/noise_cancel.rs (1)
244-246: 💤 Low valueTest assertion is trivially true.
out_rms >= 0.0is always true for any RMS calculation. Consider asserting something meaningful, such as that the output differs from zero (signal passed through) or that the length matches.The test comment says "verify output is produced without panic and has correct length" — but the length check is on line 242. The RMS assertion adds no additional verification.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/openhuman/voice_assistant/noise_cancel.rs` around lines 244 - 246, The RMS assertion is vacuous because out_rms >= 0.0 is always true; replace it with a meaningful check: assert that the output buffer out has the expected length (already present) and that it contains non-zero signal (e.g., verify out.iter().any(|&s| s != 0) or that out_rms is greater than a small epsilon like 1e-6) so the test ensures output was actually produced by the processing code (referencing out and out_rms).
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/openhuman/voice_assistant/ws_transport.rs`:
- Around line 100-110: The spawned task can leak the processing lock if
super::brain::run_turn panics because SessionRegistry::release_processing(&sid)
is only called on normal Err returns; fix by ensuring release_processing runs in
all cases: wrap the task body with a RAII drop guard that calls
SessionRegistry::release_processing(&sid) in its Drop impl (created inside the
tokio::spawn), or run the work inside std::panic::catch_unwind (or await the
JoinHandle and cleanup on panic) so that regardless of panic or error you always
call SessionRegistry::release_processing(&sid); apply this around the
tokio::spawn closure surrounding super::brain::run_turn and the acquired flag
logic from SessionRegistry::try_acquire_processing.
---
Nitpick comments:
In `@src/openhuman/chat_with_data/rpc.rs`:
- Around line 36-82: The handle_query function currently accepts an unbounded
question string used by sql_gen::generate_sql_for_question, try_llm_sql_gen, and
engine::query_dataset; add a max length check (e.g. const MAX_QUESTION_LEN) and
enforce it early: trim whitespace, if question.len() > MAX_QUESTION_LEN return
an error JSON like {"ok":false,"error":"question too long"} or alternatively
truncate the question before passing to sql_gen::generate_sql_for_question and
try_llm_sql_gen (but still return a warning/caveat in the response); apply the
same limit/trim when calling try_llm_sql_gen and engine::query_dataset, and log
the truncation/limit hit using debug! for visibility.
In `@src/openhuman/live_captions/translate.rs`:
- Around line 11-12: The file imports unused symbols—remove serde_json::json and
tracing::warn to clean up unused imports: update the import lines in
src/openhuman/live_captions/translate.rs so that serde_json::json is no longer
imported and tracing only imports debug (e.g., change or replace the use lines
that include json and warn), leaving any remaining used imports like
tracing::debug intact; ensure there are no other references to json or warn in
functions within this module before removing.
In `@src/openhuman/voice_assistant/noise_cancel.rs`:
- Around line 244-246: The RMS assertion is vacuous because out_rms >= 0.0 is
always true; replace it with a meaningful check: assert that the output buffer
out has the expected length (already present) and that it contains non-zero
signal (e.g., verify out.iter().any(|&s| s != 0) or that out_rms is greater than
a small epsilon like 1e-6) so the test ensures output was actually produced by
the processing code (referencing out and out_rms).
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: e2a4ea5c-03eb-4566-a7d8-354ffffe5d50
📒 Files selected for processing (18)
Cargo.tomldocs/YC_CAPABILITIES.mdsrc/openhuman/chat_with_data/engine.rssrc/openhuman/chat_with_data/rpc.rssrc/openhuman/chat_with_data/sql_gen.rssrc/openhuman/inference/provider/reliable.rssrc/openhuman/live_captions/mod.rssrc/openhuman/live_captions/rpc.rssrc/openhuman/live_captions/store.rssrc/openhuman/live_captions/translate.rssrc/openhuman/live_captions/voice_profiles.rssrc/openhuman/operator_inbox/rpc.rssrc/openhuman/voice_actions/engine.rssrc/openhuman/voice_assistant/brain.rssrc/openhuman/voice_assistant/noise_cancel.rssrc/openhuman/voice_assistant/session.rssrc/openhuman/voice_assistant/types.rssrc/openhuman/voice_assistant/ws_transport.rs
💤 Files with no reviewable changes (1)
- src/openhuman/voice_assistant/types.rs
✅ Files skipped from review due to trivial changes (1)
- docs/YC_CAPABILITIES.md
There was a problem hiding this comment.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (6)
src/openhuman/chat_with_data/rpc.rs (4)
52-53:⚠️ Potential issue | 🟠 Major | ⚡ Quick winKeep the dataset lookup failure in the handler's normal response shape.
All other validation/runtime failures in this handler come back as
{"ok": false, ...}, butengine::get_dataset(id)?escapes as a top-levelErr(String). A nonexistent dataset will therefore produce a different wire shape than the rest ofhandle_query, which is a client-facing contract break.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/openhuman/chat_with_data/rpc.rs` around lines 52 - 53, The dataset lookup uses engine::get_dataset(id)? which bubbles an Err out of handle_query and breaks the handler's uniform response shape; replace the fallible ? usage with explicit error handling inside handle_query (e.g., match or map_err) and return the handler's normal {"ok": false, ...} response object when the dataset is not found or lookup fails so all validation/runtime failures use the same wire format.
58-72:⚠️ Potential issue | 🟠 Major | 🏗️ Heavy liftThis branch returns SQL text instead of answering the query.
Once the LLM-generated SQL passes validation, the handler stops and returns
"Generated SQL (LLM): ..."as the answer. That means the complex-query path never actually queries the dataset, so users get a SQL string back instead of the result they asked for.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/openhuman/chat_with_data/rpc.rs` around lines 58 - 72, The Fallback branch currently returns the validated LLM SQL string as the answer instead of executing it; after sql_gen::validate_sql and sql_gen::is_safe_query succeed for the llm_sql produced by try_llm_sql_gen(&ds, question).await, execute that llm_sql against the dataset (use the dataset execution/query function on ds, e.g., whatever method runs SQL for ds) and build the JSON response from the execution result (rows/count) rather than returning the SQL text; ensure you await the execution, handle execution errors (map them to an Err result), and keep metadata like "source": "llm" and an appropriate confidence/caveats field.
6-6: 🛠️ Refactor suggestion | 🟠 Major | 🏗️ Heavy liftSwitch these handlers to the repo's
RpcOutcome<T>contract.Every exported controller in this domain
rpc.rsstill returnsResult<Value, String>. That bypasses the Rust-core RPC pattern the rest of the controller layer expects and makes success/error handling inconsistent at the boundary. As per coding guidelines,src/openhuman/**/rpc.rs: "Use theRpcOutcome<T>pattern for RPC controller return types in domainrpc.rsfiles."Also applies to: 36-36, 125-125, 170-170, 178-178, 186-186, 196-196, 220-220, 231-231
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/openhuman/chat_with_data/rpc.rs` at line 6, Change the RPC handler return types from Result<Value, String> to the repo's RpcOutcome<T> contract (e.g., make handle_register_dataset return RpcOutcome<Value>), import or qualify RpcOutcome, and update all success/error returns to use RpcOutcome::Ok(payload) and RpcOutcome::Err(error_string) (instead of Ok(...) / Err(...)). Apply the same change to every exported controller in this file (including the other handlers referenced in the comment), and adjust any early-return/error mapping to produce a String error or convert existing error types into the RpcOutcome::Err payload so the function signatures and callers align with the project's RPC pattern.
202-217:⚠️ Potential issue | 🟠 Major | ⚡ Quick winDon't count rows that were stripped down to empty maps.
This parser drops every non-
f64field, but still keeps the row even if nothing survives. A row like{"name":"acme"}becomes{}here, gets passed toengine::ingest_rows, and incrementsingested, which silently loses data while reporting success.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/openhuman/chat_with_data/rpc.rs` around lines 202 - 217, The current mapping builds Vec<HashMap<String,f64>> including maps that became empty after filtering non-f64 values, causing engine::ingest_rows(id, rows) to receive and count empty rows; modify the transformation that produces rows (the closure over rows_val) to drop any map with .is_empty() (or len()==0) so only rows with at least one f64 field are collected, then compute count from the filtered rows before calling engine::ingest_rows(id, rows) and returning the ingested count.src/openhuman/voice_assistant/ws_transport.rs (1)
100-117:⚠️ Potential issue | 🟠 Major | 🏗️ Heavy liftCompleted turns won't be pushed unless another inbound frame arrives.
run_turn()is spawned asynchronously, but the outbound queue is only drained insideprocess_ws_frame(), andhandle_ws()only calls that when a new binary frame comes in. If the client stops sending audio afterEndOfUtterance, transcript/TTS generated by the completed turn can sit in the session indefinitely, so this endpoint doesn't actually stream server-side results after capture ends.Also applies to: 119-179, 229-246
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/openhuman/voice_assistant/ws_transport.rs` around lines 100 - 117, The spawned async run_turn currently returns results but nothing forces the session outbound queue to be drained unless a new inbound frame arrives; after run_turn completes (in the tokio::spawn block where run_turn(&sid) is awaited) ensure the session's outbound queue is flushed/pushed so transcripts/TTS are sent immediately — for example, invoke the same outbound-drain logic used by process_ws_frame (or call a helper exposed from handle_ws/session manager) after run_turn finishes, or have run_turn notify SessionRegistry/Session to trigger an immediate flush; update the tokio::spawn block (the run_turn await path and error path) to call that flush/notify so EndOfUtterance-produced outputs are streamed without requiring another inbound frame.src/openhuman/voice_assistant/noise_cancel.rs (1)
88-95:⚠️ Potential issue | 🟠 Major | ⚡ Quick winDon't feed the buffered reference back into itself.
This branch clones
self.reference_bufintobuf_refand passes it tocancel_echo(), butcancel_echo()always appends itsreferenceargument back intoself.reference_buf. WhenreferenceisNone, everyprocess()call duplicates the entire buffered reference history and shifts the NLMS window, which will quickly misalign echo cancellation.
🧹 Nitpick comments (1)
src/openhuman/chat_with_data/rpc.rs (1)
6-33: ⚡ Quick winFill in the missing structured debug logs for the RPC boundary.
Only some handlers log entry, and the success/error exits are mostly silent. That leaves dataset mutations, list/read calls, and failure branches much harder to reconstruct in production than the repo standard expects. As per coding guidelines,
src/**/*.rs: "Uselog/tracingatdebugortracelevel on RPC entry and exit, error paths, state transitions, and any branch that is hard to infer from tests alone."Also applies to: 36-84, 125-143, 170-240
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/openhuman/chat_with_data/rpc.rs` around lines 6 - 33, Add structured RPC entry/exit and error logs for handle_register_dataset: log an entry at the start with debug! including fields name, source, columns length (or columns), and row_count; after calling engine::register_dataset, log a debug!/trace! success with the returned dataset's id, name, columns and row_count; on Err(e) log an error! with the error string and the input fields. Apply the same pattern (entry debug, success debug with returned fields, and error error!) to the other handlers noted (the blocks covering lines 36-84, 125-143, 170-240) so all RPC boundaries consistently emit structured logs.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/openhuman/inference/provider/reliable.rs`:
- Around line 1056-1068: The handler that currently logs errors inside the
Some(Err(ref e)) arm uses is_stream_error_non_retryable(e) only for logging and
then continues to try other providers, which masks actionable non-retryable
failures; change the logic in the matching arms (the Some(Err(ref e)) block and
the similar block around 1081–1086) so that if is_stream_error_non_retryable(e)
is true you immediately propagate the error out (e.g., return Err(...) or
convert e into the function's error type) instead of falling through to the next
candidate; otherwise keep the existing warning-and-continue behavior. Ensure you
use the existing is_stream_error_non_retryable function to decide and propagate
the original error (preserving its message) so callers can trigger
sign-in/config recovery.
---
Outside diff comments:
In `@src/openhuman/chat_with_data/rpc.rs`:
- Around line 52-53: The dataset lookup uses engine::get_dataset(id)? which
bubbles an Err out of handle_query and breaks the handler's uniform response
shape; replace the fallible ? usage with explicit error handling inside
handle_query (e.g., match or map_err) and return the handler's normal {"ok":
false, ...} response object when the dataset is not found or lookup fails so all
validation/runtime failures use the same wire format.
- Around line 58-72: The Fallback branch currently returns the validated LLM SQL
string as the answer instead of executing it; after sql_gen::validate_sql and
sql_gen::is_safe_query succeed for the llm_sql produced by try_llm_sql_gen(&ds,
question).await, execute that llm_sql against the dataset (use the dataset
execution/query function on ds, e.g., whatever method runs SQL for ds) and build
the JSON response from the execution result (rows/count) rather than returning
the SQL text; ensure you await the execution, handle execution errors (map them
to an Err result), and keep metadata like "source": "llm" and an appropriate
confidence/caveats field.
- Line 6: Change the RPC handler return types from Result<Value, String> to the
repo's RpcOutcome<T> contract (e.g., make handle_register_dataset return
RpcOutcome<Value>), import or qualify RpcOutcome, and update all success/error
returns to use RpcOutcome::Ok(payload) and RpcOutcome::Err(error_string)
(instead of Ok(...) / Err(...)). Apply the same change to every exported
controller in this file (including the other handlers referenced in the
comment), and adjust any early-return/error mapping to produce a String error or
convert existing error types into the RpcOutcome::Err payload so the function
signatures and callers align with the project's RPC pattern.
- Around line 202-217: The current mapping builds Vec<HashMap<String,f64>>
including maps that became empty after filtering non-f64 values, causing
engine::ingest_rows(id, rows) to receive and count empty rows; modify the
transformation that produces rows (the closure over rows_val) to drop any map
with .is_empty() (or len()==0) so only rows with at least one f64 field are
collected, then compute count from the filtered rows before calling
engine::ingest_rows(id, rows) and returning the ingested count.
In `@src/openhuman/voice_assistant/ws_transport.rs`:
- Around line 100-117: The spawned async run_turn currently returns results but
nothing forces the session outbound queue to be drained unless a new inbound
frame arrives; after run_turn completes (in the tokio::spawn block where
run_turn(&sid) is awaited) ensure the session's outbound queue is flushed/pushed
so transcripts/TTS are sent immediately — for example, invoke the same
outbound-drain logic used by process_ws_frame (or call a helper exposed from
handle_ws/session manager) after run_turn finishes, or have run_turn notify
SessionRegistry/Session to trigger an immediate flush; update the tokio::spawn
block (the run_turn await path and error path) to call that flush/notify so
EndOfUtterance-produced outputs are streamed without requiring another inbound
frame.
---
Nitpick comments:
In `@src/openhuman/chat_with_data/rpc.rs`:
- Around line 6-33: Add structured RPC entry/exit and error logs for
handle_register_dataset: log an entry at the start with debug! including fields
name, source, columns length (or columns), and row_count; after calling
engine::register_dataset, log a debug!/trace! success with the returned
dataset's id, name, columns and row_count; on Err(e) log an error! with the
error string and the input fields. Apply the same pattern (entry debug, success
debug with returned fields, and error error!) to the other handlers noted (the
blocks covering lines 36-84, 125-143, 170-240) so all RPC boundaries
consistently emit structured logs.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 50c2ee07-b583-4073-aafc-e73b9ee8c262
📒 Files selected for processing (5)
src/openhuman/chat_with_data/rpc.rssrc/openhuman/inference/provider/reliable.rssrc/openhuman/live_captions/translate.rssrc/openhuman/voice_assistant/noise_cancel.rssrc/openhuman/voice_assistant/ws_transport.rs
Implement all sub-issues of tinyhumansai#1830: - tinyhumansai#1831: voice_assistant — standalone STT→LLM→TTS session loop - tinyhumansai#1832: live_captions — real-time transcript + summarization - tinyhumansai#1833: voice_actions — utterance→controller intent mapping - tinyhumansai#1834: operator_inbox — message triage + draft generation - tinyhumansai#1835: chat_with_data — NL dataset querying + insights - tinyhumansai#1836: guided_flows — quiz state machine + recommendations Each domain: types, engine, rpc, schemas, wired into core/all.rs controller registry + about_app/catalog.rs. 128 unit tests, zero clippy warnings. Closes tinyhumansai#1830 Closes tinyhumansai#1831 Closes tinyhumansai#1832 Closes tinyhumansai#1833 Closes tinyhumansai#1834 Closes tinyhumansai#1835 Closes tinyhumansai#1836
- Register 6 new domains in build_declared_controller_schemas() - Fix lock ordering in guided_flows::submit_answer (FLOWS→SESSIONS) to match start_flow ordering and prevent potential deadlock
- voice_assistant/brain.rs: use floor_char_boundary for safe multi-byte truncation - operator_inbox/engine.rs: use get(3..) with fallback instead of direct slice - add edge-case tests for both
- 6 JSON-RPC E2E tests covering full lifecycle per domain - capability plan doc with commercial inspirations + non-goals - fix unused warn imports (clippy clean)
- voice_actions: use recognize/list_mappings (not register_action) - chat_with_data: use source/columns/row_count params - cargo fmt applied
Replace fake keyword-matching query engine with real SQL generation using sqlparser for validation. Generates syntactically correct SQL for aggregation, time-filter, and group-by patterns. Includes safety checks blocking DROP/DELETE/UPDATE operations. Closes part of tinyhumansai#1835
Parse raw RFC 5322 emails into structured fields (sender, subject, body, threading, attachments). Extract urgency signals for priority scoring. Uses mail-parser crate already in project deps. Closes part of tinyhumansai#1834
- quote sql identifiers (injection prevention) - ws processing lock + brain turn session check - pre-alloc noise cancel buffers (no per-frame alloc) - VecDeque for session history (O(1) pop_front) - evict_old_intents loops until under cap - atomic voice profile writes (tmp+rename) - input validation all rpc handlers - rm dead DetectedLanguage/SpeechEmotion structs - doc gaps: voice cloning, neural diarize as future work
a218f04 to
30bbe08
Compare
|
@coderabbitai review |
✅ Actions performedReview triggered.
|
There was a problem hiding this comment.
Actionable comments posted: 20
🧹 Nitpick comments (2)
src/openhuman/chat_with_data/anomaly.rs (1)
186-190: ⚡ Quick winUse debug-level for detector completion logs.
This is diagnostic noise and should stay at
debug/tracelevel.As per coding guidelines, “Use `log` / `tracing` at `debug` or `trace` level for Rust diagnostic logs.”Proposed fix
- info!( + debug!( anomaly_count = combined.len(), series_len = data.len(), "[chat-with-data-anomaly] combined detection complete" );🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/openhuman/chat_with_data/anomaly.rs` around lines 186 - 190, The log at the end of the anomaly detection currently uses info! and produces diagnostic noise; change the info! macro call that logs anomaly_count = combined.len(), series_len = data.len(), "[chat-with-data-anomaly] combined detection complete" to a debug! (or trace!) call instead so this detector completion message is emitted at debug level; update the invocation in anomaly.rs where combined and data are referenced to use debug! with the same fields and message.src/openhuman/chat_with_data/engine.rs (1)
332-339: 💤 Low valueNon-deterministic column selection for anomaly scan.
r.values().next()retrieves an arbitrary column value sinceHashMapiteration order is not guaranteed. This means different runs could analyze different columns, leading to inconsistent anomaly detection results.Proposed fix: use sorted keys for deterministic selection
let values: Option<Vec<f64>> = ROW_STORE.lock().ok().and_then(|store| { store.get(ds_id).map(|rows| { - // Use the first numeric column's values. - rows.iter() - .filter_map(|r| r.values().next().copied()) + // Use the first column alphabetically for deterministic results. + let col = rows.first().and_then(|r| r.keys().min().cloned()); + rows.iter() + .filter_map(|r| col.as_ref().and_then(|c| r.get(c).copied())) .collect() }) });🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/openhuman/chat_with_data/engine.rs` around lines 332 - 339, The current anomaly scan picks a non-deterministic column via r.values().next()—change it to pick a deterministic numeric column by inspecting column keys in a stable order: for each row in the ROW_STORE lookup (the block using ROW_STORE.lock().and_then(|store| store.get(ds_id)...)), obtain the row's keys (e.g., r.keys()), sort them (or select the smallest key), then use that selected key to extract the numeric value (e.g., r.get(&selected_key)). Collect those values into values: Option<Vec<f64>> so the same column is used every run; update the code paths around rows.iter().filter_map(|r| r.values().next().copied()).collect() to use the deterministic key lookup instead.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/openhuman/about_app/catalog.rs`:
- Around line 1228-1231: The catalog entries whose descriptions mention cloud
fallback/LLM-backed flows currently set leaves_device: false; update those
entries to conservatively declare potential egress by setting leaves_device:
true (or replace with a dedicated conditional constant like
PRIVACY_CONDITIONAL_CLOUD if your codebase defines one) for the items containing
the shown description fields so the runtime catalog accurately reports possible
remote data transfer; change the leaves_device flag for the items around the
provided description snippets (and the other noted blocks at lines with similar
descriptions) rather than altering the descriptive text.
In `@src/openhuman/chat_with_data/anomaly.rs`:
- Around line 54-57: Validate tuning inputs and data before running detection:
in detect_zscore (and similarly in detect_mad and detect_iqr) check that
threshold > 0 (and k > 0 where k is used), that data.len() meets the minimum
(already checks <3), and that all data values (and threshold/k if floats) are
finite (use is_finite semantics); if any check fails, return an empty
Vec<Anomaly> immediately. Update the guards at the top of detect_zscore and the
same guard logic in the other detector functions referenced to perform these
validations so invalid parameters or non-finite inputs do not produce misleading
anomalies.
In `@src/openhuman/chat_with_data/db_connector.rs`:
- Around line 56-77: The current code silently drops per-row mapping errors by
using filter_map(|r| r.ok()) after stmt.query_map(...) when building rows;
replace that by collecting the iterator of Result<HashMap<_,_>, _> into a
Result<Vec<_>, _> so mapping errors are surfaced and propagated (e.g., remove
filter_map and use a collect that yields Result<Vec<_>, E>, then map_err to
include LOG_PREFIX and context) for the rows variable created from
stmt.query_map; apply the same change to the other occurrences (the blocks
around the other ranges mentioned) so any rusqlite row-mapping errors from
query_map are not silently ignored but returned or logged with contextual
information.
- Line 39: The debug log currently prints raw SQL via debug!("{LOG_PREFIX}
executing on {db_path}: {sql}"), which risks exposing sensitive literals; update
the logging in the DB execution path to omit the full SQL and instead log
structured metadata such as LOG_PREFIX, db_path, the SQL length (sql.len()), and
an optional non-reversible fingerprint (e.g., SHA256 of sql) or a
redacted/normalized query form; reference the existing variables LOG_PREFIX,
db_path, and sql in your change and ensure no raw SQL text is emitted to logs.
In `@src/openhuman/chat_with_data/engine.rs`:
- Around line 404-412: delete_dataset currently removes the dataset entry from
DATASETS but leaves per-row entries in ROW_STORE causing orphaned data; update
the delete_dataset function to also acquire the ROW_STORE lock (similar to
DATASETS.lock()), and remove all rows whose dataset_id matches the deleted
dataset (e.g., iterate and retain only rows where row.dataset_id != dataset_id
or remove_by_key predicate), ensuring both locks are error-checked and that
ROW_STORE is cleaned before returning Ok from delete_dataset.
In `@src/openhuman/chat_with_data/rpc.rs`:
- Around line 52-54: Replace the use of the question-mark operator on
engine::get_dataset(id) so dataset lookup errors return the same JSON error
shape as other validations; call engine::get_dataset(id) in a matched Result (or
map_err) and on Err return Ok(json!({"ok": false, "error": format!("failed to
get dataset: {}", err)})) (or similar message), so the handler consistently
returns Ok(JSON) instead of propagating Err(String) — update the code around the
ds assignment where engine::get_dataset(id) is invoked to perform this explicit
error-to-JSON conversion.
In `@src/openhuman/chat_with_data/webhooks.rs`:
- Around line 33-49: Validate and normalize webhook target URLs in
register_webhook (and again in the code path that performs the POST) before
storing or sending: parse the provided url string, ensure scheme is http or
https, reject non-absolute URLs, resolve the hostname to IP(s) and block
loopback, link-local (169.254/fe80::), and private ranges (10/8, 172.16/12,
192.168/16 and IPv6 equivalents) and localhost names, and reject URLs containing
credentials or unsupported schemes; return a clear Err on failure and only store
a WebhookConfig when validation passes so the code that calls WebhookConfig::url
(and the sender around line ~103) can safely POST to allowed external endpoints.
- Line 48: The info! log calls that print the full webhook URL (e.g.,
info!("{LOG_PREFIX} registered webhook {id} -> {url}")) must redact sensitive
parts before logging; parse the url string (use url::Url::parse(&url)), strip
userinfo (username/password) and remove query and fragment (or rebuild from
scheme, host, and path), then log the redacted_url instead of the raw url;
update all occurrences (the info! on registration and the other info/debug logs
at the same sites referenced by id and url) to use the redacted value.
In `@src/openhuman/guided_flows/rpc.rs`:
- Around line 59-60: The call to try_llm_personalize in submit_answer is
unbounded and can stall the request path; wrap the await in a
tokio::time::timeout with a reasonable Duration (e.g., few seconds) and handle
the timeout/error by logging and falling back to a non-personalized
recommendation (or returning a controlled error) so the request continues; apply
the same timeout wrapper and handling to the other try_llm_personalize
invocation around lines 103-106 to ensure both personalization calls are
bounded.
- Around line 8-23: Change the handler to use the domain RpcOutcome<T> pattern:
update handle_list_flows signature from Result<Value, String> to
RpcOutcome<Value>, replace the raw JSON envelope return with a success
RpcOutcome containing the JSON payload produced from engine::list_flows(), and
map any failure paths into an RpcOutcome error using the project's RpcOutcome
constructors; apply the same change to the other RPC handlers in this file (the
other functions in the noted ranges) so all guided_flows RPC controllers
consistently return RpcOutcome<T>.
- Around line 149-195: The current parsing of steps silently drops malformed
entries because steps_raw -> .iter().filter_map(...) builds steps by skipping
any invalid objects; instead, change the logic that builds steps into an
explicit mapper that validates each entry and returns a
Result<super::types::FlowStep, String> (or similar error) so you can collect
into Result<Vec<_>, _> and return a JSON error when any step is malformed;
specifically replace the filter_map block that produces FlowStep with a map that
checks obj presence, required fields ("id", "prompt", "answer_type" etc.), and
conversions (choices, branches, next), returning Err with a clear message
(including the index or failing field) on first invalid step, then propagate
that error to return Ok(json!({"ok": false, "error": ...})). Ensure the new code
still constructs super::types::FlowStep and uses the same field mappings but
rejects instead of skipping invalid entries.
In `@src/openhuman/guided_flows/scoring.rs`:
- Around line 170-173: The normalization currently seeds max_score with 0.0
which breaks when all scores are negative; change the fold seed for max_score to
f64::MIN so max_score = scored.iter().map(|(_, s)| *s).fold(f64::MIN, f64::max)
(keep min_score as f64::MAX), recompute range = max_score - min_score and ensure
any subsequent normalization logic using range in this module (e.g., where
scored is iterated) guards against range == 0.0 to avoid division by zero.
In `@src/openhuman/live_captions/diarize.rs`:
- Around line 25-30: The code uses fixed WINDOW_SAMPLES and HOP_SAMPLES
(8_000/4_000) and ignores the diarize function's sample_rate parameter; update
the logic in diarize (and any related places) to either validate/enforce
sample_rate == 16_000 (return an error if not) or compute window and hop sizes
from sample_rate (e.g., WINDOW_SAMPLES = (0.5 * sample_rate) as usize,
HOP_SAMPLES = (0.25 * sample_rate) as usize) and replace uses of the constants
WINDOW_SAMPLES/HOP_SAMPLES with these computed values so non-16k inputs are
handled correctly. Ensure CHANGE_THRESHOLD usage remains unchanged.
- Around line 119-120: samples_to_ms currently divides by sample_rate and will
panic on sample_rate == 0; update the public function samples_to_ms(samples:
usize, sample_rate: u32) -> u64 to explicitly guard against sample_rate == 0
(e.g., early-return 0 or handle via Result/Error) before performing the division
so no division-by-zero can occur; modify the function body that references
samples and sample_rate to check sample_rate first and then compute (samples as
u64 * 1000) / sample_rate as u64 when safe.
In `@src/openhuman/live_captions/persist.rs`:
- Around line 37-41: The code is vulnerable to path traversal because
transcript.id is interpolated directly into filenames (used with dir.join and
passed to std::fs::write / std::fs::remove_file); fix by validating/sanitizing
IDs before using them — either reject/normalize any ID containing path
separators or suspicious components and only allow a safe charset (e.g.,
[A-Za-z0-9._-]) or replace the filename with a safe derived name (e.g., a
hex/hash of transcript.id or a UUID). Alternatively, after computing path =
dir.join(...), canonicalize the resulting path and canonicalize dir and assert
the file path starts_with the dir canonical path before writing/removing; apply
this same check for the functions that write (where std::fs::write is called)
and delete (where std::fs::remove_file or similar is used).
In `@src/openhuman/live_captions/rpc.rs`:
- Around line 271-281: The call to store::get_transcript in
handle_export_transcript currently uses the `?` operator which propagates errors
inconsistently; replace it with explicit error handling that matches the
JSON/string error pattern used elsewhere (e.g., match or map_err on
store::get_transcript(tid) and return Err(String) with the same JSON/error
message format) so failures produce the same error shape as other RPC handlers;
update the binding of `t` only on Ok and return early with the formatted Err on
Err.
- Around line 89-118: The function handle_summarize_transcript mixes early ?
propagation from store::get_transcript with JSON-style Ok(json!({"ok": false,
...})) error returns; make all error paths return JSON values instead of
propagating Err. Replace the store::get_transcript(tid)? call with a match or
map_err that converts its error into Ok(json!({"ok": false, "error": err})) and
return that; change the TranscriptState::Completed check to return
Ok(json!({"ok": false, "error": "transcript must be completed before
summarizing"})) rather than Err(...); keep the existing JSON returns for
try_llm_summarize and store::summarize_transcript paths (and still call
store::set_summary on success) so every failure path in
handle_summarize_transcript returns a consistent Ok(json!({...})) Value
response.
In `@src/openhuman/live_captions/store.rs`:
- Around line 37-49: The eviction currently only removes the oldest transcript
if it's Completed, so when the store is at MAX_TRANSCRIPTS and no Completed
items exist the new insert still pushes past capacity; change the logic in the
block around store.len(), MAX_TRANSCRIPTS, and store.insert to fall back to
evicting the absolute oldest transcript (ignoring TranscriptState) when no
Completed candidate is found — locate the code using symbols store.len(),
TranscriptState::Completed, oldest, warn!, store.remove(&old_id) and
store.insert(tid, t.clone()) and update the selection to min_by_key over all
entries if the Completed-filter yields None, then remove that id before
inserting.
In `@src/openhuman/live_captions/translate.rs`:
- Around line 62-77: The from_codes function currently matches src and tgt
exactly; normalize both inputs before matching by converting them to a canonical
form (e.g., lowercase and trimmed) so mixed- or upper-case codes like "EN" or "
En " succeed; update the beginning of the from_codes implementation to compute
normalized_src and normalized_tgt (from src and tgt via to_lowercase() and
trim()) and use those identifiers in the match arms (referring to from_codes,
normalized_src, normalized_tgt, and the existing enum variants like EnEs, EsEn,
etc.).
In `@src/openhuman/live_captions/types.rs`:
- Around line 52-54: The duration_ms function currently returns the end_ms of
the last segment (self.segments.last()), which assumes segments are time-sorted;
change it to compute the maximum end_ms across all segments (e.g., iterate over
self.segments, map to s.end_ms, and take max) and return that value or 0 if
there are no segments so duration_ms() is robust to out-of-order segments.
---
Nitpick comments:
In `@src/openhuman/chat_with_data/anomaly.rs`:
- Around line 186-190: The log at the end of the anomaly detection currently
uses info! and produces diagnostic noise; change the info! macro call that logs
anomaly_count = combined.len(), series_len = data.len(),
"[chat-with-data-anomaly] combined detection complete" to a debug! (or trace!)
call instead so this detector completion message is emitted at debug level;
update the invocation in anomaly.rs where combined and data are referenced to
use debug! with the same fields and message.
In `@src/openhuman/chat_with_data/engine.rs`:
- Around line 332-339: The current anomaly scan picks a non-deterministic column
via r.values().next()—change it to pick a deterministic numeric column by
inspecting column keys in a stable order: for each row in the ROW_STORE lookup
(the block using ROW_STORE.lock().and_then(|store| store.get(ds_id)...)), obtain
the row's keys (e.g., r.keys()), sort them (or select the smallest key), then
use that selected key to extract the numeric value (e.g., r.get(&selected_key)).
Collect those values into values: Option<Vec<f64>> so the same column is used
every run; update the code paths around rows.iter().filter_map(|r|
r.values().next().copied()).collect() to use the deterministic key lookup
instead.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: ccef3678-a2a4-4e26-a8ee-cf29ce1cc607
⛔ Files ignored due to path filters (1)
Cargo.lockis excluded by!**/*.lock
📒 Files selected for processing (61)
Cargo.tomldocs/YC_CAPABILITIES.mddocs/boost-vc-capability-plan.mdsrc/core/all.rssrc/openhuman/about_app/catalog.rssrc/openhuman/chat_with_data/anomaly.rssrc/openhuman/chat_with_data/db_connector.rssrc/openhuman/chat_with_data/engine.rssrc/openhuman/chat_with_data/mod.rssrc/openhuman/chat_with_data/rpc.rssrc/openhuman/chat_with_data/schemas.rssrc/openhuman/chat_with_data/sql_gen.rssrc/openhuman/chat_with_data/types.rssrc/openhuman/chat_with_data/webhooks.rssrc/openhuman/guided_flows/engine.rssrc/openhuman/guided_flows/mod.rssrc/openhuman/guided_flows/rpc.rssrc/openhuman/guided_flows/schemas.rssrc/openhuman/guided_flows/scoring.rssrc/openhuman/guided_flows/types.rssrc/openhuman/inference/provider/reliable.rssrc/openhuman/inference/provider/reliable_tests.rssrc/openhuman/live_captions/diarize.rssrc/openhuman/live_captions/mod.rssrc/openhuman/live_captions/persist.rssrc/openhuman/live_captions/rpc.rssrc/openhuman/live_captions/schemas.rssrc/openhuman/live_captions/store.rssrc/openhuman/live_captions/translate.rssrc/openhuman/live_captions/types.rssrc/openhuman/live_captions/voice_profiles.rssrc/openhuman/meet_agent/brain.rssrc/openhuman/meet_agent/rpc.rssrc/openhuman/meet_agent/wav.rssrc/openhuman/mod.rssrc/openhuman/operator_inbox/connection.rssrc/openhuman/operator_inbox/engine.rssrc/openhuman/operator_inbox/imap_client.rssrc/openhuman/operator_inbox/mod.rssrc/openhuman/operator_inbox/parser.rssrc/openhuman/operator_inbox/poller.rssrc/openhuman/operator_inbox/rpc.rssrc/openhuman/operator_inbox/schemas.rssrc/openhuman/operator_inbox/types.rssrc/openhuman/util.rssrc/openhuman/voice_actions/engine.rssrc/openhuman/voice_actions/llm_intent.rssrc/openhuman/voice_actions/mod.rssrc/openhuman/voice_actions/rpc.rssrc/openhuman/voice_actions/schemas.rssrc/openhuman/voice_actions/types.rssrc/openhuman/voice_assistant/brain.rssrc/openhuman/voice_assistant/mod.rssrc/openhuman/voice_assistant/noise_cancel.rssrc/openhuman/voice_assistant/rpc.rssrc/openhuman/voice_assistant/schemas.rssrc/openhuman/voice_assistant/session.rssrc/openhuman/voice_assistant/types.rssrc/openhuman/voice_assistant/wake_word.rssrc/openhuman/voice_assistant/ws_transport.rstests/json_rpc_e2e.rs
💤 Files with no reviewable changes (27)
- src/openhuman/operator_inbox/connection.rs
- src/openhuman/mod.rs
- src/openhuman/operator_inbox/mod.rs
- src/openhuman/operator_inbox/poller.rs
- src/openhuman/voice_actions/llm_intent.rs
- src/openhuman/operator_inbox/parser.rs
- src/openhuman/util.rs
- src/openhuman/operator_inbox/types.rs
- src/openhuman/voice_actions/mod.rs
- src/openhuman/voice_assistant/mod.rs
- src/openhuman/voice_actions/types.rs
- src/openhuman/operator_inbox/schemas.rs
- src/openhuman/voice_actions/rpc.rs
- src/openhuman/operator_inbox/imap_client.rs
- src/openhuman/voice_assistant/schemas.rs
- src/openhuman/voice_assistant/types.rs
- tests/json_rpc_e2e.rs
- src/openhuman/operator_inbox/engine.rs
- src/openhuman/voice_actions/schemas.rs
- src/openhuman/voice_assistant/wake_word.rs
- src/openhuman/voice_assistant/noise_cancel.rs
- src/openhuman/voice_assistant/brain.rs
- src/openhuman/voice_assistant/rpc.rs
- src/openhuman/voice_assistant/session.rs
- src/openhuman/voice_assistant/ws_transport.rs
- src/openhuman/voice_actions/engine.rs
- src/openhuman/operator_inbox/rpc.rs
✅ Files skipped from review due to trivial changes (2)
- docs/boost-vc-capability-plan.md
- docs/YC_CAPABILITIES.md
- privacy: mark LLM-backed caps leaves_device=true - security: SSRF guard on webhook URLs, path traversal on transcript IDs - logging: redact raw SQL and webhook URLs - robustness: input validation (anomaly, diarize, scoring, translate) - correctness: row decode errors propagated, capacity enforced, ROW_STORE cleanup - consistency: ? -> match for JSON error responses - perf: 4s timeout on LLM personalization
|
@coderabbitai review |
✅ Actions performedReview triggered.
|
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (1)
src/openhuman/live_captions/rpc.rs (1)
216-224: 💤 Low valueMinor inconsistency:
handle_transcribe_audiouses?for missing required fields.Lines 220 and 224 use
.ok_or("...")?which returnsErr(String), while other handlers in this file (e.g.,handle_append_segmentat lines 33-38) returnOk(json!({"ok": false, "error": ...}))for validation failures. Consider aligning with the JSON error pattern for consistency.Suggested fix for consistency
pub async fn handle_transcribe_audio(p: Map<String, Value>) -> Result<Value, String> { let transcript_id = p .get("transcript_id") .and_then(|v| v.as_str()) - .ok_or("missing transcript_id")?; + .unwrap_or(""); + if transcript_id.is_empty() { + return Ok(json!({"ok": false, "error": "transcript_id is required"})); + } let audio_b64 = p .get("audio_base64") .and_then(|v| v.as_str()) - .ok_or("missing audio_base64")?; + .unwrap_or(""); + if audio_b64.is_empty() { + return Ok(json!({"ok": false, "error": "audio_base64 is required"})); + }🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/openhuman/live_captions/rpc.rs` around lines 216 - 224, handle_transcribe_audio currently uses .ok_or("...")? which returns Err(String) on missing fields; make it consistent with other handlers like handle_append_segment by returning a JSON error object instead. Replace the .ok_or(...)? validations for "transcript_id" and "audio_base64" so that when they are missing you return Ok(json!({"ok": false, "error": "missing transcript_id"})) or Ok(json!({"ok": false, "error": "missing audio_base64"})) respectively, keeping the function signature and other logic intact.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/openhuman/chat_with_data/webhooks.rs`:
- Around line 57-61: The current IPv6 SSRF check only rejects loopback (::1) in
the host.parse::<std::net::Ipv6Addr>() branch (uses ip.is_loopback()), but
misses link-local and unique-local ranges; update that branch to also reject
IPv6 link-local and unique-local addresses by checking
ip.is_unicast_link_local() and ip.is_unique_local() (in the same
host.parse::<std::net::Ipv6Addr>() block alongside ip.is_loopback()) and return
the same error when any of those conditions are true.
---
Nitpick comments:
In `@src/openhuman/live_captions/rpc.rs`:
- Around line 216-224: handle_transcribe_audio currently uses .ok_or("...")?
which returns Err(String) on missing fields; make it consistent with other
handlers like handle_append_segment by returning a JSON error object instead.
Replace the .ok_or(...)? validations for "transcript_id" and "audio_base64" so
that when they are missing you return Ok(json!({"ok": false, "error": "missing
transcript_id"})) or Ok(json!({"ok": false, "error": "missing audio_base64"}))
respectively, keeping the function signature and other logic intact.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: df946f3b-8d82-4578-ad96-61bd1e747374
📒 Files selected for processing (16)
src/openhuman/about_app/catalog.rssrc/openhuman/about_app/types.rssrc/openhuman/chat_with_data/anomaly.rssrc/openhuman/chat_with_data/db_connector.rssrc/openhuman/chat_with_data/engine.rssrc/openhuman/chat_with_data/rpc.rssrc/openhuman/chat_with_data/webhooks.rssrc/openhuman/guided_flows/rpc.rssrc/openhuman/guided_flows/schemas.rssrc/openhuman/guided_flows/scoring.rssrc/openhuman/live_captions/diarize.rssrc/openhuman/live_captions/persist.rssrc/openhuman/live_captions/rpc.rssrc/openhuman/live_captions/store.rssrc/openhuman/live_captions/translate.rssrc/openhuman/live_captions/types.rs
|
@coderabbitai review |
✅ Actions performedReview triggered.
|
There was a problem hiding this comment.
♻️ Duplicate comments (1)
src/openhuman/chat_with_data/webhooks.rs (1)
33-70:⚠️ Potential issue | 🟠 Major | 🏗️ Heavy liftResolve and re-check DNS hostnames before posting.
validate_webhook_urlonly screens the literal host string, so a hostname that resolves to a private address still passes here, andnotify_insightlater posts to that stored URL unchanged. That leaves a remaining SSRF hole via internal DNS names or DNS rebinding unless you resolve/block private results and re-validate at send time as well.Also applies to: 146-155
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/openhuman/chat_with_data/webhooks.rs` around lines 33 - 70, The current validate_webhook_url only checks the raw host string (in validate_webhook_url) and doesn't prevent a hostname that resolves to private/internal IPs; update the logic to resolve DNS for the webhook host and re-check all resolved A/AAAA addresses for loopback, private, link-local and RFC1918 ranges (same checks currently applied to parsed host IPs), and perform the same resolution-and-validate step in notify_insight just before sending the request to prevent DNS rebinding or changed DNS records from bypassing validation; ensure you reject the webhook when any resolved address is disallowed and log the resolution result so failures are auditable.
🧹 Nitpick comments (1)
src/openhuman/chat_with_data/webhooks.rs (1)
61-68: ⚡ Quick winAdd regression tests for the new IPv6 guards.
The new
fe80::/10andfc00::/7branches are not exercised by the current tests, so this SSRF protection could regress silently. A couple of explicit rejects for URLs likehttp://[fe80::1]/hookandhttp://[fc00::1]/hookwould lock this down.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/openhuman/chat_with_data/webhooks.rs` around lines 61 - 68, Add regression tests that exercise the new IPv6 guards so the fe80::/10 and fc00::/7 branches cannot regress: write unit tests that construct URLs like "http://[fe80::1]/hook" and "http://[fc00::1]/hook", pass them into the webhook URL validation function in webhooks.rs (the code that computes let segments = ip.segments()), and assert that the call returns an Err containing the exact messages "link-local addresses are not allowed" and "private addresses are not allowed" respectively, ensuring those branches are explicitly covered.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Duplicate comments:
In `@src/openhuman/chat_with_data/webhooks.rs`:
- Around line 33-70: The current validate_webhook_url only checks the raw host
string (in validate_webhook_url) and doesn't prevent a hostname that resolves to
private/internal IPs; update the logic to resolve DNS for the webhook host and
re-check all resolved A/AAAA addresses for loopback, private, link-local and
RFC1918 ranges (same checks currently applied to parsed host IPs), and perform
the same resolution-and-validate step in notify_insight just before sending the
request to prevent DNS rebinding or changed DNS records from bypassing
validation; ensure you reject the webhook when any resolved address is
disallowed and log the resolution result so failures are auditable.
---
Nitpick comments:
In `@src/openhuman/chat_with_data/webhooks.rs`:
- Around line 61-68: Add regression tests that exercise the new IPv6 guards so
the fe80::/10 and fc00::/7 branches cannot regress: write unit tests that
construct URLs like "http://[fe80::1]/hook" and "http://[fc00::1]/hook", pass
them into the webhook URL validation function in webhooks.rs (the code that
computes let segments = ip.segments()), and assert that the call returns an Err
containing the exact messages "link-local addresses are not allowed" and
"private addresses are not allowed" respectively, ensuring those branches are
explicitly covered.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: fc220e35-a39c-4a01-ac65-1cd758034ec3
📒 Files selected for processing (1)
src/openhuman/chat_with_data/webhooks.rs
|
@senamakel, I need your help here. See the docs(docs/VC_CAPABILITIES.md) in this commit and my approach, and let's discuss if there is anything wrong or right or whatever that is, I need your hand here. |
resolve Cargo.toml + json_rpc_e2e.rs conflicts
keep both voice AI schemas and upstream devices/mobile caps
Summary
Problem
OH lacked voice foundation, captioning, voice actions, inbox triage, data analytics, and guided recommendation capabilities needed for the Boost VC roadmap.
Solution
6 new domain modules under src/openhuman/, each following controller-registry pattern. No new heavy native deps (sqlparser is pure Rust). LLM calls use the established create_chat_provider("agentic", &config) pattern with graceful fallback when no session token is available.
Acceptance Criteria
#1831 — Add a free voice foundation inspired by Deepgram
.github/workflows/coverage.yml).#1832 — Add live captions and transcript workflows inspired by Ava
.github/workflows/coverage.yml).#1833 — Add voice-driven desktop actions inspired by Screevo
.github/workflows/coverage.yml).#1834 — Add an operator inbox and outreach assistant inspired by Kriya
.github/workflows/coverage.yml).#1835 — Add chat-with-data and proactive insights inspired by Athenic
.github/workflows/coverage.yml).#1836 — Add guided recommendation flows inspired by Octane AI
.github/workflows/coverage.yml).Submission Checklist
Impact
Related
Closes #1830
Closes #1831
Closes #1832
Closes #1833
Closes #1834
Closes #1835
Closes #1836
Summary by CodeRabbit
New Features
Documentation