release: promote recursive RLM trace observability #96

Merged
namastex888 merged 3 commits into main from dev
May 28, 2026

Conversation

@namastex888 (Collaborator) commented May 28, 2026

Summary

Promote dev to main for the recursive RLM child-run observability release.

Included

  • recursive child_start/child_end events
  • inherited bounded child flags and ancestry env
  • root/child/total usage split reporting
  • Langfuse parent trace + child spans
  • fail-closed recursive max-depth default and child process-group abort cleanup
  • bounded Langfuse flush and child usage budget accounting

Proof

Summary by CodeRabbit

  • New Features

    • Added Langfuse observability integration for tracing LLM calls with support for parent/child span tracking.
    • Introduced usage breakdown metrics to distinguish between root and child LLM consumption.
    • Enhanced recursive LLM call tracing with detailed lifecycle logging for child processes.
  • Tests

    • Added comprehensive test suite for recursive tracing helpers and Langfuse integration.

github-actions Bot and others added 3 commits May 26, 2026 07:53
Adds recursive-tree observability for rlm_query child runs.

Includes bounded child invocation, usage splits, Langfuse spans, fail-closed recursion controls, and process-group abort cleanup.

coderabbitai Bot commented May 28, 2026

Walkthrough

This PR introduces recursive RLM child process tracing and observability via Langfuse integration. It adds a LangfuseTraceRecorder for distributed trace collection, extends RLM query APIs to support recursive depth/callbacks/structured results, introduces usage delta accounting for root-vs-child splits, adds child lifecycle event logging, and integrates everything into the main rlmLoop with usage breakdown reporting and budget tracking.
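
The root-vs-child split described in the walkthrough can be pictured as plain subtraction over a running total. The sketch below is illustrative only. The field names follow the `usage` fields visible in this PR's review snippets (`inputTokens`, `outputTokens`, `totalCost`, `llmCalls`); the helper name `splitUsage` and the assumption that the main accumulator already includes child usage are hypothetical and may not match `buildUsageBreakdown` in src/rlm.ts.

```typescript
interface UsageStats {
  inputTokens: number;
  outputTokens: number;
  totalCost: number;
  llmCalls: number;
}

interface UsageBreakdown {
  root: UsageStats;
  child: UsageStats;
  total: UsageStats;
}

// Hypothetical helper: assumes `total` already includes everything in `child`,
// so root usage is whatever remains after subtracting the child share.
function splitUsage(total: UsageStats, child: UsageStats): UsageBreakdown {
  // Clamp at zero so a late or duplicated child report never yields negative root usage.
  const sub = (a: number, b: number): number => Math.max(0, a - b);
  const root: UsageStats = {
    inputTokens: sub(total.inputTokens, child.inputTokens),
    outputTokens: sub(total.outputTokens, child.outputTokens),
    totalCost: sub(total.totalCost, child.totalCost),
    llmCalls: sub(total.llmCalls, child.llmCalls),
  };
  return { root, child, total };
}

const breakdown = splitUsage(
  { inputTokens: 1200, outputTokens: 300, totalCost: 0.75, llmCalls: 5 },
  { inputTokens: 800, outputTokens: 200, totalCost: 0.5, llmCalls: 3 },
);
console.log(breakdown.root);
```

Keeping `total` as the source of truth and deriving `root` means the three parts cannot drift apart, at the price of trusting that child usage was merged into the total exactly once.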

Changes

Recursive RLM tracing with Langfuse observability

| Layer | File(s) | Summary |
| --- | --- | --- |
| Langfuse trace recorder implementation | src/langfuse.ts | LangfuseTraceRecorder class conditionally records trace and span creation/update events to Langfuse public ingestion API, with configurable host/auth, in-memory event queuing, and batched flush with timeout and error handling. |
| Recursive child execution types and contract | src/llm.ts | Defines RlmChildResult (answer/runId/usage/raw), RlmChildInvocationOptions (output/maxIterations/timeout/depth/cost/tokens/log/stats/noSession), and UsageBreakdown (root/child/total) types that establish the data model for recursive invocation and usage accounting. |
| Child process environment and output helpers | src/llm.ts | buildRlmChildArgs generates bounded CLI arguments from options, buildChildEnv wires parent/child/correlation identifiers and recursion depth into child process environment, parseRlmChildOutput extracts structured result (answer/runId/usage) from child JSON stdout. |
| Enhanced RLM query functions with recursive lifecycle | src/llm.ts | Rewrites rlmQuery and rlmQueryBatched to support recursive depth limits, optional logger callbacks (childStart/childEnd), lifecycle hooks (onChildStart/onChildEnd), child process environment inheritance with ancestry tracking, and structured error handling mapping exit/spawn/parse outcomes to RlmChildResult. |
| Updated LLM request handler for recursive options | src/llm.ts | handleLLMRequest signature extended to accept optional childUsage accumulator and recursiveOptions (including logger and lifecycle callbacks); switch cases for rlm_query and rlm_query_batched now invoke the updated recursive APIs, merge returned usage into both main and child accumulators, and return only answer strings. |
| Child lifecycle event logging | src/logger.ts | EventType union expanded to include child_start and child_end event types; Logger class adds childStart (correlation/prompt/depth) and childEnd (correlation/runId/tokens/cost/time/error) methods that emit structured JSONL events. |
| Usage delta and breakdown types | src/llm.ts | usageDelta function computes root-vs-child token/cost differences, UsageBreakdown type groups root/child/total UsageStats for comprehensive usage reporting across recursive boundaries. |
| Output stats with usage splits | src/output.ts | RLMResult optionally carries usageBreakdown; StatsData extended with usage_split (root/child/total); new UsageSplitStats type defines token/cost/call-count shape; toUsageSplitStats converts UsageStats to output shape; buildStats populates stats.usage_split from result breakdown. |
| RLM loop Langfuse and usage tracking | src/rlm.ts | rlmLoop creates and flushes LangfuseTraceRecorder per run; introduces childUsage accumulator and helpers buildUsageBreakdown (root/child/total split) and buildRemainingChildBudget (child cost/token limits); repl.onLLMRequest handler tracks child usage deltas, passes Langfuse callbacks and budget context into handleLLMRequest, records deltas back into BudgetTracker; buildResult extended to attach usageBreakdown on all completion paths (normal/abort/timeout). |
| Comprehensive tests and public API updates | tests/recursive-trace.test.ts, src/index.ts, package.json, src/version.ts | New test suite validates parseRlmChildOutput, buildRlmChildArgs, buildChildEnv, usageDelta, buildStats with usage splits, and LangfuseTraceRecorder batch emission; src/index.ts exports LangfuseTraceRecorder and adds UsageBreakdown to type re-exports; version bumped to 0.260528.1. |
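
To make the "structured result from child JSON stdout" row concrete, here is a rough sketch of how such a parser might behave. The fallback rules mirror the pre-PR code quoted later in this thread (use `answer` when the JSON has one, otherwise fall back to the raw stdout, and report an error for empty output). The function name `parseChildOutputSketch`, the JSON keys `runId` and `usage`, and the `isUsageStats` guard are assumptions made for illustration; the real `parseRlmChildOutput` may differ.

```typescript
interface UsageStats {
  inputTokens: number;
  outputTokens: number;
  totalCost: number;
  llmCalls: number;
}

interface RlmChildResult {
  answer: string;
  runId?: string;
  usage?: UsageStats;
  raw?: unknown;
}

// Hypothetical guard: accepts only objects carrying all four numeric usage fields.
function isUsageStats(value: unknown): value is UsageStats {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  return ["inputTokens", "outputTokens", "totalCost", "llmCalls"].every(
    (key) => typeof v[key] === "number",
  );
}

// Illustrative parser, not the repository's implementation.
function parseChildOutputSketch(stdout: string): RlmChildResult {
  const trimmed = stdout.trim();
  if (trimmed === "") return { answer: "Error: empty response from child rlmx" };
  try {
    const parsed: unknown = JSON.parse(trimmed);
    // Valid JSON that is not an object (a number, a string) is treated as plain text.
    if (typeof parsed !== "object" || parsed === null) return { answer: trimmed };
    const obj = parsed as Record<string, unknown>;
    const result: RlmChildResult = {
      answer: typeof obj.answer === "string" ? obj.answer : trimmed,
      raw: parsed,
    };
    if (typeof obj.runId === "string") result.runId = obj.runId; // assumed key name
    if (isUsageStats(obj.usage)) result.usage = obj.usage; // assumed key name
    return result;
  } catch {
    // Non-JSON stdout is passed through as the answer.
    return { answer: trimmed };
  }
}

console.log(parseChildOutputSketch('{"answer":"done","runId":"run-1"}'));
```

Returning a result object on every path, including the error paths, lets the caller log `childEnd` and update spans without a separate exception branch.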

Possibly related PRs

  • automagik-dev/rlmx#45: Both PRs modify the core handleLLMRequest API and rlm_query/rlm_query_batched request plumbing in src/llm.ts, so this PR's recursive tracing implementation directly extends the retrieved PR's handler infrastructure.
  • automagik-dev/rlmx#31: Both PRs update versioning surfaces (package.json version and src/version.ts VERSION constant), indicating this PR's changes are part of a coordinated release or CI/CD-driven version update sequence.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Poem

🐰 A recursive trace through Langfuse glow,
Child spans dancing, splitting the flow,
Each delta tracked, each token accounted,
The RLM's journey, beautifully mounted!

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 77.78% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title 'release: promote recursive RLM trace observability' accurately reflects the main changes in the PR, which add recursive tracing, observability features via Langfuse integration, and usage breakdown reporting. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

Warning

There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure.

🔧 ESLint

If the error stems from missing dependencies, add them to the package.json file. For unrecoverable errors (e.g., due to private dependencies), disable the tool in the CodeRabbit configuration.

ESLint skipped: no ESLint configuration detected in root package.json. To enable, add eslint to devDependencies.


@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request introduces a minimal Langfuse ingestion recorder (LangfuseTraceRecorder) to enable recursive RLM tree observability without external SDK dependencies. It adds tracking for recursive child process execution, including child start/end logging, usage breakdown (root vs. child vs. total splits), and budget limits enforcement across recursion. Feedback highlights two issues in src/llm.ts: first, when the maximum recursion depth is reached, the early return does not trigger the onChildStart and onChildEnd callbacks, leading to incomplete Langfuse traces; second, if a child process fails to spawn, both error and close events can fire, potentially causing duplicate callback invocations and double promise resolution, which can be resolved using a guard flag.

Comment thread src/llm.ts
Comment on lines +410 to +431
    if (options.maxDepth !== undefined && currentDepth >= options.maxDepth) {
      const error = `Error: max recursive rlm_query depth ${options.maxDepth} reached`;
      const result: RlmChildResult = { answer: error };
      options.logger?.childStart({
        child_correlation_id: correlationId,
        prompt_preview: prompt.slice(0, 200),
        depth,
      });
      options.logger?.childEnd({
        child_correlation_id: correlationId,
        child_run_id: null,
        input_tokens: 0,
        output_tokens: 0,
        cost: 0,
        llm_calls: 0,
        time_ms: 0,
        is_error: true,
        error_message: error,
      });
      resolve(result);
      return;
    }

medium

When the maximum recursion depth is reached, the early return block logs the event to the local logger but does not invoke the onChildStart and onChildEnd callbacks. This prevents the aborted child run from being recorded as a span in Langfuse, leading to incomplete or missing traces for aborted recursive calls.

Invoking these callbacks ensures that the aborted run is correctly represented in Langfuse as a failed span with the appropriate error message.

    if (options.maxDepth !== undefined && currentDepth >= options.maxDepth) {
      const error = `Error: max recursive rlm_query depth ${options.maxDepth} reached`;
      const result: RlmChildResult = { answer: error };
      options.logger?.childStart({
        child_correlation_id: correlationId,
        prompt_preview: prompt.slice(0, 200),
        depth,
      });
      const spanId = options.onChildStart?.({ correlationId, prompt, depth });
      options.logger?.childEnd({
        child_correlation_id: correlationId,
        child_run_id: null,
        input_tokens: 0,
        output_tokens: 0,
        cost: 0,
        llm_calls: 0,
        time_ms: 0,
        is_error: true,
        error_message: error,
      });
      options.onChildEnd?.({ spanId, result, durationMs: 0, isError: true, errorMessage: error });
      resolve(result);
      return;
    }
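
The PR summary mentions a fail-closed max-depth default, but where that default is applied is not shown in this thread. As a rough illustration of what "fail-closed" could mean for a depth limit, the sketch below falls back to a small bound whenever the configured value is missing or malformed, instead of treating it as unlimited. Every name here (`DEFAULT_MAX_DEPTH`, `resolveMaxDepth`, `depthLimitReached`) and the default value of 1 are hypothetical.

```typescript
// Hypothetical default: allow one level of children when nothing is configured.
const DEFAULT_MAX_DEPTH = 1;

// Parses a configured limit. Anything missing or malformed falls back to the
// default rather than to "no limit", which is the fail-closed part.
function resolveMaxDepth(raw: string | undefined): number {
  if (raw === undefined || raw.trim() === "") return DEFAULT_MAX_DEPTH;
  const parsed = Number(raw);
  return Number.isInteger(parsed) && parsed >= 0 ? parsed : DEFAULT_MAX_DEPTH;
}

// True when a caller at `currentDepth` must not spawn another child.
function depthLimitReached(currentDepth: number, maxDepth?: number): boolean {
  return currentDepth >= (maxDepth ?? DEFAULT_MAX_DEPTH);
}

console.log(resolveMaxDepth("3"), resolveMaxDepth("not-a-number"));
console.log(depthLimitReached(1), depthLimitReached(1, 3));
```

With this shape, a guard such as the one reviewed above would never be skipped just because `maxDepth` was left undefined.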

Comment thread src/llm.ts
Comment on lines 480 to 532
     child.on("close", (code) => {
+      const durationMs = Date.now() - startMs;
       if (code !== 0) {
-        resolve(`Error: child rlmx exited with code ${code}. ${stderr}`.trim());
+        const errorMessage = `Error: child rlmx exited with code ${code}. ${stderr}`.trim();
+        const result: RlmChildResult = { answer: errorMessage };
+        options.logger?.childEnd({
+          child_correlation_id: correlationId,
+          child_run_id: null,
+          input_tokens: 0,
+          output_tokens: 0,
+          cost: 0,
+          llm_calls: 0,
+          time_ms: durationMs,
+          is_error: true,
+          error_message: errorMessage,
+        });
+        options.onChildEnd?.({ spanId, result, durationMs, isError: true, errorMessage });
+        resolve(result);
         return;
       }
-      try {
-        const result = JSON.parse(stdout);
-        resolve(result.answer ?? stdout);
-      } catch {
-        resolve(stdout.trim() || `Error: empty response from child rlmx`);
-      }
+      const result = parseRlmChildOutput(stdout);
+      options.logger?.childEnd({
+        child_correlation_id: correlationId,
+        child_run_id: result.runId ?? null,
+        input_tokens: result.usage?.inputTokens ?? 0,
+        output_tokens: result.usage?.outputTokens ?? 0,
+        cost: result.usage?.totalCost ?? 0,
+        llm_calls: result.usage?.llmCalls ?? 0,
+        time_ms: durationMs,
+      });
+      options.onChildEnd?.({ spanId, result, durationMs });
+      resolve(result);
     });

     child.on("error", (err) => {
-      resolve(`Error: failed to spawn child rlmx: ${err.message}`);
+      const durationMs = Date.now() - startMs;
+      const errorMessage = `Error: failed to spawn child rlmx: ${err.message}`;
+      const result: RlmChildResult = { answer: errorMessage };
+      options.logger?.childEnd({
+        child_correlation_id: correlationId,
+        child_run_id: null,
+        input_tokens: 0,
+        output_tokens: 0,
+        cost: 0,
+        llm_calls: 0,
+        time_ms: durationMs,
+        is_error: true,
+        error_message: errorMessage,
+      });
+      options.onChildEnd?.({ spanId, result, durationMs, isError: true, errorMessage });
+      resolve(result);
     });
   });

medium

If the child process fails to spawn, Node.js can emit both the error and close events. Without a guard, this can result in onChildEnd being called twice and the promise being resolved twice, which may lead to duplicate span updates or errors in Langfuse.

Introducing a resolved flag ensures that the completion logic and callbacks are executed exactly once.

    let resolved = false;

    child.on("close", (code) => {
      if (resolved) return;
      resolved = true;
      const durationMs = Date.now() - startMs;
      if (code !== 0) {
        const errorMessage = `Error: child rlmx exited with code ${code}. ${stderr}`.trim();
        const result: RlmChildResult = { answer: errorMessage };
        options.logger?.childEnd({
          child_correlation_id: correlationId,
          child_run_id: null,
          input_tokens: 0,
          output_tokens: 0,
          cost: 0,
          llm_calls: 0,
          time_ms: durationMs,
          is_error: true,
          error_message: errorMessage,
        });
        options.onChildEnd?.({ spanId, result, durationMs, isError: true, errorMessage });
        resolve(result);
        return;
      }
      const result = parseRlmChildOutput(stdout);
      options.logger?.childEnd({
        child_correlation_id: correlationId,
        child_run_id: result.runId ?? null,
        input_tokens: result.usage?.inputTokens ?? 0,
        output_tokens: result.usage?.outputTokens ?? 0,
        cost: result.usage?.totalCost ?? 0,
        llm_calls: result.usage?.llmCalls ?? 0,
        time_ms: durationMs,
      });
      options.onChildEnd?.({ spanId, result, durationMs });
      resolve(result);
    });

    child.on("error", (err) => {
      if (resolved) return;
      resolved = true;
      const durationMs = Date.now() - startMs;
      const errorMessage = `Error: failed to spawn child rlmx: ${err.message}`;
      const result: RlmChildResult = { answer: errorMessage };
      options.logger?.childEnd({
        child_correlation_id: correlationId,
        child_run_id: null,
        input_tokens: 0,
        output_tokens: 0,
        cost: 0,
        llm_calls: 0,
        time_ms: durationMs,
        is_error: true,
        error_message: errorMessage,
      });
      options.onChildEnd?.({ spanId, result, durationMs, isError: true, errorMessage });
      resolve(result);
    });
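
The same guard can also be factored into a small wrapper so the `close` and `error` handlers share one completion path instead of each checking a flag. This is only a sketch of that idea; the helper name `once` and the sample messages are illustrative and not taken from the repository.

```typescript
// Wraps a completion handler so that only the first call has any effect.
function once<T>(handler: (value: T) => void): (value: T) => void {
  let settled = false;
  return (value: T): void => {
    if (settled) return;
    settled = true;
    handler(value);
  };
}

// Usage: simulate a spawn failure where both "error" and "close" fire.
const seen: string[] = [];
const finish = once((answer: string) => {
  // In the real handler this is where childEnd logging, onChildEnd,
  // and resolve(result) would run, exactly once.
  seen.push(answer);
});

finish("Error: failed to spawn child rlmx: ENOENT"); // from the "error" handler
finish("Error: child rlmx exited with code -2."); // from the "close" handler, ignored
console.log(seen.length); // 1
```

Routing both handlers through a single wrapped function keeps the logger call, the span update, and the promise resolution from ever getting out of step.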

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 4d6912c1ff

Comment thread src/llm.ts
       const batch = prompts.slice(i, i + MAX_CONCURRENT);
       const batchResults = await Promise.all(
-        batch.map((p) => rlmQuery(p, cwd, signal))
+        batch.map((p) => rlmQuery(p, cwd, signal, options))

P1: Enforce shared budgets across batched children

When a REPL request uses rlm_query_batched, every prompt is spawned with the same options object, including the maxCost/maxTokens values that rlmLoop computed once before the request, and the parent budget is only updated after all batched children finish. In a batch of several recursive prompts, each child can therefore consume the full remaining global budget, so --max-cost or --max-tokens can be exceeded by up to a factor of the batch size before the parent notices. Please allocate or update the remaining budget per child, or stop scheduling additional children once the aggregate child usage reaches the parent limit.
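
One way to read this suggestion is sketched below: divide whatever budget remains among the children of each concurrent slice, deduct what they actually spent before scheduling the next slice, and stop spawning once nothing is left. This is a minimal sketch under stated assumptions, not the project's fix. `runBatchWithSharedBudget`, the `RunChild` callback, and the cost-only budget are hypothetical simplifications (a token budget would follow the same pattern).

```typescript
interface ChildOutcome {
  answer: string;
  cost: number; // what the child actually spent
}

// Hypothetical stand-in for spawning one child with a per-child cost cap.
type RunChild = (prompt: string, maxCost: number) => Promise<ChildOutcome>;

async function runBatchWithSharedBudget(
  prompts: string[],
  remainingCost: number,
  maxConcurrent: number,
  runChild: RunChild,
): Promise<string[]> {
  const answers: string[] = [];
  let remaining = remainingCost;
  for (let i = 0; i < prompts.length; i += maxConcurrent) {
    const slice = prompts.slice(i, i + maxConcurrent);
    if (remaining <= 0) {
      // Budget is gone: report an error per prompt instead of spawning children.
      for (const _prompt of slice) {
        answers.push("Error: budget exhausted before child was scheduled");
      }
      continue;
    }
    // Each concurrent child gets an equal share, so the slice as a whole
    // cannot spend more than what is left.
    const perChild = remaining / slice.length;
    const outcomes = await Promise.all(slice.map((p) => runChild(p, perChild)));
    for (const outcome of outcomes) {
      answers.push(outcome.answer);
      remaining -= outcome.cost;
    }
  }
  return answers;
}
```

An even split is the simplest policy and can leave budget unused when one child finishes cheaply; the next slice picks up that slack because `remaining` is recomputed from actual spend.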

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 1

🧹 Nitpick comments (1)
src/langfuse.ts (1)

120-147: 💤 Low value

Events are discarded before confirming successful ingestion.

Line 122 splices (removes) all events from the queue before the fetch completes. If the POST fails or times out, those events are permanently lost with no opportunity to retry or restore them.

Consider either:

  1. Splice after successful response, or
  2. Restore events to the queue on failure if retry is desired.

If the current "fire and forget on failure" behavior is intentional for bounded observability, a brief comment clarifying this design choice would help future readers.

♻️ Option: Restore events on failure
 async flush(): Promise<void> {
   if (!this.enabled || this.queue.length === 0) return;
   const batch = this.queue.splice(0, this.queue.length);
   const auth = Buffer.from(`${this.publicKey}:${this.secretKey}`).toString("base64");
   const abortController = new AbortController();
   const timeoutHandle = setTimeout(() => abortController.abort(), this.flushTimeoutMs);
   try {
     const res = await this.fetchImpl(`${this.host}/api/public/ingestion`, {
       method: "POST",
       headers: {
         "content-type": "application/json",
         authorization: `Basic ${auth}`,
       },
       body: JSON.stringify({ batch }),
       signal: abortController.signal,
     });
     if (!res.ok) {
       throw new Error(`Langfuse ingestion failed: ${res.status} ${await res.text().catch(() => "")}`.trim());
     }
   } catch (err) {
+    this.queue.unshift(...batch); // Restore on any failure, including the non-OK throw above
     if (err instanceof Error && err.name === "AbortError") {
       throw new Error(`Langfuse ingestion timed out after ${this.flushTimeoutMs}ms`);
     }
     throw err;
   } finally {
     clearTimeout(timeoutHandle);
   }
 }
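
The first option in the list above (remove events only after a successful response) is not shown in the diff, so here is a rough sketch of it. The class `EventQueueSketch` and the `PostBatch` callback are hypothetical stand-ins for the recorder and its HTTP call. The timeout and auth handling from the real `flush()` are left out to keep the queue semantics visible.

```typescript
type IngestionEvent = Record<string, unknown>;

// Hypothetical stand-in for the POST to the ingestion endpoint.
type PostBatch = (batch: IngestionEvent[]) => Promise<{ ok: boolean; status: number }>;

class EventQueueSketch {
  private queue: IngestionEvent[] = [];
  private readonly post: PostBatch;

  constructor(post: PostBatch) {
    this.post = post;
  }

  enqueue(event: IngestionEvent): void {
    this.queue.push(event);
  }

  get pending(): number {
    return this.queue.length;
  }

  async flush(): Promise<void> {
    const count = this.queue.length;
    if (count === 0) return;
    // Copy instead of splice: the queue is untouched until the POST succeeds.
    const batch = this.queue.slice(0, count);
    const res = await this.post(batch);
    if (!res.ok) {
      throw new Error(`ingestion failed: ${res.status}`);
    }
    // Drop only what was sent; events enqueued during the await stay queued.
    this.queue.splice(0, count);
  }
}
```

One caveat of this variant: two overlapping `flush()` calls could send the same events twice, so a caller would need to serialize flushes or accept duplicates. It also retains events indefinitely if ingestion keeps failing, which may conflict with the bounded-flush goal stated in the PR summary.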
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/langfuse.ts` around lines 120 - 147, The flush() implementation currently
removes events from this.queue immediately (const batch =
this.queue.splice(...)) before the network request, which causes permanent loss
if the POST fails or times out; change the logic to only remove (splice) the
items after a successful response, or if you prefer to attempt the request then
restore on failure, keep a copy of the batch (e.g., const batch =
this.queue.slice(...)) and on error reinsert the events back into this.queue
(e.g., unshift/concat) so they can be retried; update the error/timeout handling
around fetchImpl and flushTimeoutMs accordingly, or if dropping on failure is
intentional add a clear comment in flush() explaining the design choice.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 24e0e306-7622-4259-9fd4-ef9856e46cb0

📥 Commits

Reviewing files that changed from the base of the PR and between 02d97a6 and 4d6912c.

⛔ Files ignored due to path filters (15)
  • dist/src/index.d.ts is excluded by !**/dist/**
  • dist/src/index.js is excluded by !**/dist/**
  • dist/src/langfuse.d.ts is excluded by !**/dist/**
  • dist/src/langfuse.js is excluded by !**/dist/**
  • dist/src/llm.d.ts is excluded by !**/dist/**
  • dist/src/llm.js is excluded by !**/dist/**
  • dist/src/logger.d.ts is excluded by !**/dist/**
  • dist/src/logger.js is excluded by !**/dist/**
  • dist/src/output.d.ts is excluded by !**/dist/**
  • dist/src/output.js is excluded by !**/dist/**
  • dist/src/rlm.js is excluded by !**/dist/**
  • dist/src/version.d.ts is excluded by !**/dist/**
  • dist/src/version.js is excluded by !**/dist/**
  • dist/tests/recursive-trace.test.d.ts is excluded by !**/dist/**
  • dist/tests/recursive-trace.test.js is excluded by !**/dist/**
📒 Files selected for processing (9)
  • package.json
  • src/index.ts
  • src/langfuse.ts
  • src/llm.ts
  • src/logger.ts
  • src/output.ts
  • src/rlm.ts
  • src/version.ts
  • tests/recursive-trace.test.ts

Comment thread src/llm.ts
Comment on lines 464 to 468
     if (signal) {
-      signal.addEventListener("abort", () => child.kill("SIGTERM"), {
+      signal.addEventListener("abort", terminateChildTree, {
         once: true,
       });
     }

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Missing pre-spawn abort check allows child to run after parent aborts.

If the signal is already aborted when rlmQuery is called, the abort event listener won't fire (since the event already happened), and the child process will spawn and run to completion.

Proposed fix: check signal.aborted before spawning
     const spanId = options.onChildStart?.({ correlationId, prompt, depth });
     const startMs = Date.now();
+
+    if (signal?.aborted) {
+      const durationMs = Date.now() - startMs;
+      const errorMessage = "Error: child rlmx aborted before spawn";
+      const result: RlmChildResult = { answer: errorMessage };
+      options.logger?.childEnd({
+        child_correlation_id: correlationId,
+        child_run_id: null,
+        input_tokens: 0,
+        output_tokens: 0,
+        cost: 0,
+        llm_calls: 0,
+        time_ms: durationMs,
+        is_error: true,
+        error_message: errorMessage,
+      });
+      options.onChildEnd?.({ spanId, result, durationMs, isError: true, errorMessage });
+      resolve(result);
+      return;
+    }
+
     const child = spawn(
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/llm.ts` around lines 464 - 468, The current logic adds an abort listener
to the incoming AbortSignal but doesn't check signal.aborted before spawning the
child, allowing the child to run if the signal was already aborted; update the
rlmQuery (or wherever the child is spawned) to first check if signal &&
signal.aborted and, if so, call terminateChildTree (or otherwise
short-circuit/throw) instead of proceeding to spawn the child, otherwise
continue to add the once listener as now—this ensures pre-spawn aborts are
handled.
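
Stripped of logging and span bookkeeping, the pattern this comment asks for is "check `aborted` first, then subscribe". The sketch below isolates that ordering. `AbortSignalLike`, `ChildHandle`, and `startUnlessAborted` are hypothetical names introduced here; the interface is a minimal subset of the standard `AbortSignal` so the example stays self-contained.

```typescript
// Minimal subset of the standard AbortSignal used by this sketch.
interface AbortSignalLike {
  readonly aborted: boolean;
  addEventListener(type: "abort", listener: () => void, options?: { once?: boolean }): void;
}

// Hypothetical handle for a spawned child process.
interface ChildHandle {
  terminate(): void;
}

// Returns null without starting anything when the signal has already fired;
// otherwise starts the child and wires up abort cleanup.
function startUnlessAborted(
  signal: AbortSignalLike | undefined,
  start: () => ChildHandle,
): ChildHandle | null {
  // An "abort" listener added after the fact never runs, so the flag must be checked first.
  if (signal?.aborted) return null;
  const child = start();
  signal?.addEventListener("abort", () => child.terminate(), { once: true });
  return child;
}

// Usage with an already-aborted signal: the start callback is never invoked.
let spawned = 0;
const alreadyAborted: AbortSignalLike = { aborted: true, addEventListener() {} };
const handle = startUnlessAborted(alreadyAborted, () => {
  spawned += 1;
  return { terminate() {} };
});
console.log(handle, spawned); // null 0
```

In `rlmQuery` the `null` branch would correspond to the proposed early return that logs `childEnd` with an error and resolves without spawning.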

@namastex888 namastex888 merged commit 865458a into main May 28, 2026
8 checks passed
@namastex888 namastex888 deleted the dev branch May 28, 2026 08:22