fix: emit Langfuse generations for root RLM calls by namastex888 · Pull Request #99 · automagik-dev/rlmx

namastex888 · 2026-05-29T15:48:54Z

Summary

emit Langfuse GENERATION create/update events for root RLM model calls
cover both normal root iterations and forced final-answer calls
include model, input/output, usageDetails, costDetails, and latency metadata
add regression coverage in recursive trace tests

Operator impact

Operators no longer see root-only RLMX traces with observations=0 while local stats show real LLM work. Root model calls now appear in Langfuse as named generation observations:

Model call — root iteration N
Model call — forced final answer

Verification

npm run build
node --test dist/tests/recursive-trace.test.js
focused suite: 15 pass / 0 fail
npm run check
npm test — 376 pass / 0 fail
live Langfuse smoke:
- trace/run id: b5ebbd98-ad21-415e-8a66-86259cf7e9eb
- project path: /project/cmpprjzwq001yvn07marcb6sg/traces/b5ebbd98-ad21-415e-8a66-86259cf7e9eb
- generation observations: 2
- Langfuse total cost: $0.01321075
- model: anthropic/claude-opus-4-8

Notes

This PR intentionally only includes the root-generation trace hardening files and generated dist artifacts for those files.
The working tree has unrelated local changes from broader RLMX/University work that are not part of this PR.

Summary by CodeRabbit

New Features
- Added recording of root generation lifecycle events for LLM calls with model, input, output, usage, and error tracking.
- LLM calls are now instrumented with tracing integration.
Tests
- Added test coverage for root generation tracing events.

coderabbitai · 2026-05-29T15:49:08Z

📝 Walkthrough


This PR extends Langfuse tracing support in rlmx by adding root generation lifecycle event recording. It defines data contracts for root generation start/end payloads, implements corresponding methods in LangfuseTraceRecorder, integrates tracing into the main RLM iteration loop and forced-final-answer path, and validates event emission with a new test.

Changes

Root Generation Tracing

Layer / File(s)	Summary
Root Generation Data Contract `src/langfuse.ts`	Exported `RootGenerationStartData` and `RootGenerationEndData` interfaces define the input payload structure for recording root generation lifecycle events, including model, input, iteration, output, duration, usage, and error metadata.
Root Generation Recorder Methods `src/langfuse.ts`	`rootGenerationStart()` and `rootGenerationEnd()` methods on `LangfuseTraceRecorder` generate IDs, enqueue `generation-create` and `generation-update` Langfuse events with timestamps, and compute token totals and cost aggregates from provided usage data.
Main Iteration Loop Tracing `src/rlm.ts`	Main RLM iteration loop wraps the `llmComplete()` call with `rootGenerationStart()` before execution and `rootGenerationEnd()` after, capturing model, message list, iteration number, response, duration, and usage stats.
Forced Final Answer Tracing `src/rlm.ts`	`forceFinalAnswer()` function signature extended to accept `langfuse` recorder and `iteration` parameter (default 0); the forced-final `llmComplete()` call is wrapped with root generation start/end using constructed messages and iteration metadata.
Root Generation Event Test `tests/recursive-trace.test.ts`	Test validates that `LangfuseTraceRecorder` emits correct `generation-create` and `generation-update` event payloads with trace/model linkage, input/output, token usage including cache read/write, and cost details.
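As a rough illustration of the recorder behaviour summarized in the table above, here is a minimal sketch of a start/end pair that queues `generation-create` and `generation-update` events. The class name, field names, and the choice to count cache tokens in the total are assumptions made for this example; the actual contracts are the ones in `src/langfuse.ts`.

```typescript
import { randomUUID } from "node:crypto";

// Hypothetical usage shape; the real payload type lives in src/langfuse.ts.
interface UsageSketch {
  inputTokens: number;
  outputTokens: number;
  cacheReadTokens?: number;
  cacheWriteTokens?: number;
  totalCost?: number;
}

interface QueuedEventSketch {
  id: string;
  type: "generation-create" | "generation-update";
  timestamp: string;
  body: Record<string, unknown>;
}

class RootGenerationRecorderSketch {
  readonly queue: QueuedEventSketch[] = [];
  private readonly traceId: string;

  constructor(traceId: string) {
    this.traceId = traceId;
  }

  // Queue a generation-create event and return the new generation id.
  start(data: { name: string; model: string; input: unknown; iteration: number }): string {
    const generationId = randomUUID();
    const now = new Date().toISOString();
    this.queue.push({
      id: randomUUID(),
      type: "generation-create",
      timestamp: now,
      body: {
        id: generationId,
        traceId: this.traceId,
        name: data.name,
        model: data.model,
        input: data.input,
        startTime: now,
        metadata: { iteration: data.iteration },
      },
    });
    return generationId;
  }

  // Queue a generation-update event with output, token totals, and cost.
  end(generationId: string, data: { output: unknown; durationMs: number; usage?: UsageSketch }): void {
    const input = data.usage?.inputTokens ?? 0;
    const output = data.usage?.outputTokens ?? 0;
    const cacheRead = data.usage?.cacheReadTokens ?? 0;
    const cacheWrite = data.usage?.cacheWriteTokens ?? 0;
    const now = new Date().toISOString();
    this.queue.push({
      id: randomUUID(),
      type: "generation-update",
      timestamp: now,
      body: {
        id: generationId,
        traceId: this.traceId,
        output: data.output,
        endTime: now,
        // Assumption: the total counts cache tokens as well.
        usageDetails: { input, output, cacheRead, cacheWrite, total: input + output + cacheRead + cacheWrite },
        costDetails: { total: data.usage?.totalCost ?? 0 },
        metadata: { latencyMs: data.durationMs },
      },
    });
  }
}
```

A caller would invoke `start()` immediately before the model call and `end()` with the returned id once the response (or an error) is available.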

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

automagik-dev/rlmx#96: Extends Langfuse recursive tracing with child generation lifecycle methods, providing complementary parent trace and span instrumentation that works alongside this PR's root generation tracing.

Poem

🐰 A root generation blooms so bright,
Langfuse captures every flight,
Loop and forced path both now trace,
Events flowing into place,
Testing ensures the metrics gleam! ✨

🚥 Pre-merge checks | ✅ 5

✅ Passed checks (5 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title 'fix: emit Langfuse generations for root RLM calls' directly and specifically describes the main change: adding Langfuse generation event emissions for root RLM model calls.
Docstring Coverage	✅ Passed	Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.


Warning

There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure.

🔧 ESLint

If the error stems from missing dependencies, add them to the package.json file. For unrecoverable errors (e.g., due to private dependencies), disable the tool in the CodeRabbit configuration.

ESLint skipped: no ESLint configuration detected in root package.json. To enable, add eslint to devDependencies.


chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 6d0c50fc11


chatgpt-codex-connector · 2026-05-29T15:50:45Z

      const llmStartMs = Date.now();
+      const generationId = langfuse.rootGenerationStart({
+        name: `Model call — root iteration ${iteration + 1}`,
+        input: messages,


Snapshot generation input before queueing

Passing the live messages array here means the queued generation-create event does not preserve the prompt that was actually sent for this LLM call. rootGenerationStart stores the object reference and the batch is JSON-serialized only later during langfuse.flush(), while the loop appends assistant/user messages and mutates the last user message for soft-limit nudges before that flush. In multi-iteration runs, earlier generation inputs will therefore show the final mutated conversation (often including that generation's own output) instead of the per-call input, corrupting the Langfuse trace.
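The effect described above can be reproduced in isolation. The sketch below uses a hypothetical queue that, like the recorder in this PR, serializes only at flush time; `DeferredQueueSketch` and `snapshotMessages` are illustrative names, not part of the codebase, and the JSON round-trip assumes messages are plain JSON-compatible objects.

```typescript
type Message = { role: string; content: string };

// Hypothetical stand-in for a recorder that queues events and serializes on flush.
class DeferredQueueSketch {
  private readonly events: { input: Message[] }[] = [];

  enqueue(input: Message[]): void {
    this.events.push({ input }); // keeps whatever reference it is given
  }

  flush(): string {
    return JSON.stringify(this.events); // serialization happens only here
  }
}

// Copy the conversation at call time (JSON round-trip; plain data only).
function snapshotMessages(messages: Message[]): Message[] {
  return JSON.parse(JSON.stringify(messages)) as Message[];
}

const messages: Message[] = [{ role: "user", content: "question" }];

const liveQueue = new DeferredQueueSketch();
const snapshotQueue = new DeferredQueueSketch();
liveQueue.enqueue(messages);                       // live reference
snapshotQueue.enqueue(snapshotMessages(messages)); // frozen copy

// The loop keeps appending to the conversation before flush() runs.
messages.push({ role: "assistant", content: "answer" });

const liveCount: number = JSON.parse(liveQueue.flush())[0].input.length;
const snapshotCount: number = JSON.parse(snapshotQueue.flush())[0].input.length;
console.log(liveCount, snapshotCount); // prints "2 1"
```

The live-reference queue reports the later assistant message as part of the input, while the snapshot queue preserves the single-message prompt that was actually sent.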


gemini-code-assist

Code Review

This pull request integrates Langfuse tracing for root generations in the RLM loop, capturing start and end events for both standard iterations and forced final answers, and adds corresponding test coverage. The reviewer suggests wrapping the forced final answer LLM call in a try...catch block to prevent generations from being left dangling in Langfuse if an error occurs. Additionally, they recommend adding a top-level cost field to the Langfuse payload to ensure proper cost tracking in the Langfuse UI.

gemini-code-assist · 2026-05-29T15:51:01Z

+  const generationId = langfuse?.rootGenerationStart({
+    name: "Model call — forced final answer",
+    input: forceMessages,
+    model: `${config.model.provider}/${config.model.model}`,
+    iteration,
+  });
+  const llmStartMs = Date.now();
  const response = await llmComplete(forceMessages, config.model, {
    signal,
    cacheConfig,
    thinkingLevel: config.gemini.thinkingLevel,
    outputSchema: config.output.schema,
    geminiConfig: config.gemini,
  });
+  if (generationId) {
+    langfuse?.rootGenerationEnd(generationId, {
+      output: response.text,
+      durationMs: Date.now() - llmStartMs,
+      usage: response.usage,
+    });
+  }


Similar to the main loop iteration, if llmComplete throws an error during the forced final answer call, the generation is left dangling in Langfuse. Wrapping this call in a try...catch block ensures the generation is correctly closed with an error status.

const generationId = langfuse?.rootGenerationStart({
  name: "Model call — forced final answer",
  input: forceMessages,
  model: `${config.model.provider}/${config.model.model}`,
  iteration,
});
const llmStartMs = Date.now();
let response: any;
try {
  response = await llmComplete(forceMessages, config.model, {
    signal,
    cacheConfig,
    thinkingLevel: config.gemini.thinkingLevel,
    outputSchema: config.output.schema,
    geminiConfig: config.gemini,
  });
  if (generationId) {
    langfuse?.rootGenerationEnd(generationId, {
      output: response.text,
      durationMs: Date.now() - llmStartMs,
      usage: response.usage,
    });
  }
} catch (err) {
  if (generationId) {
    langfuse?.rootGenerationEnd(generationId, {
      output: null,
      durationMs: Date.now() - llmStartMs,
      isError: true,
      errorMessage: err instanceof Error ? err.message : String(err),
    });
  }
  throw err;
}

gemini-code-assist · 2026-05-29T15:51:01Z

+      costDetails: {
+        total: data.usage?.totalCost ?? 0,
+      },


Langfuse uses the standard top-level cost field (a number) to track and display generation costs in the UI. Passing only costDetails might result in the cost not being displayed correctly in standard Langfuse cost columns. Adding cost at the top level of the payload ensures standard cost tracking works out of the box.

cost: data.usage?.totalCost ?? 0,
costDetails: {
  total: data.usage?.totalCost ?? 0,
},

coderabbitai

Actionable comments posted: 1

🧹 Nitpick comments (1)

src/rlm.ts (1)
435-447: ⚡ Quick win

Root generations are not closed/flushed on error or non-final exits.

flush() is only invoked inside finalize(). If llmComplete() throws, control jumps to the catch (Lines 680-700) which never calls langfuse.flush(); the empty-abort path (Lines 650-669) also skips it. In both cases the queued trace-create/generation-create events are silently dropped, so the very root generations this PR adds won't surface on timeouts/errors. Additionally, the isError/errorMessage fields on RootGenerationEndData are never populated, so a failed root call is never closed with an error status.

Consider wrapping the call in try/catch to emit rootGenerationEnd({ ..., isError: true, errorMessage }), and ensure flush() runs on all exit paths.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/rlm.ts` around lines 435 - 447, Wrap the llmComplete(...) call in a
try/catch/finally so that on any error or early-abort you call
langfuse.rootGenerationEnd(generationId, {..., isError: true, errorMessage:
err.message}) before rethrowing/handling, and ensure langfuse.flush() is invoked
in the finally (or on all exit paths including the empty-abort branch) so queued
trace-create/generation-create events are flushed; update the non-error/normal
path to still call rootGenerationEnd(...) with usage/output and then flush, and
ensure the finalize() behavior is preserved or moved into the finally block to
always populate RootGenerationEndData and call flush().
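One way to read the suggestion above is as a small wrapper that always closes the generation and always flushes. The following is a sketch under stated assumptions: `RecorderLike` and `tracedCall` are hypothetical names for this example, the method signatures are simplified, and flushing after every call is shown only for clarity (the real loop may prefer a single flush per exit path).

```typescript
// Hypothetical minimal recorder interface; method names mirror those in the review.
interface RecorderLike {
  rootGenerationStart(data: { name: string; input: unknown }): string;
  rootGenerationEnd(
    id: string,
    data: { output: unknown; durationMs: number; isError?: boolean; errorMessage?: string },
  ): void;
  flush(): Promise<void>;
}

// Wrap a model call so the generation is closed on success and on failure,
// and queued events are flushed on every exit path.
async function tracedCall<T>(
  recorder: RecorderLike,
  name: string,
  input: unknown,
  call: () => Promise<T>,
): Promise<T> {
  const id = recorder.rootGenerationStart({ name, input });
  const startMs = Date.now();
  try {
    const result = await call();
    recorder.rootGenerationEnd(id, { output: result, durationMs: Date.now() - startMs });
    return result;
  } catch (err) {
    recorder.rootGenerationEnd(id, {
      output: null,
      durationMs: Date.now() - startMs,
      isError: true,
      errorMessage: err instanceof Error ? err.message : String(err),
    });
    throw err; // preserve the original failure for the caller
  } finally {
    await recorder.flush();
  }
}
```

The `finally` block runs after either branch, so a timeout or thrown error still results in a closed generation and a flushed queue before the error propagates.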



ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 2124e5d3-13e7-47f5-a759-66d3faa3214f

📥 Commits

Reviewing files that changed from the base of the PR and between d18743f and 6d0c50f.

⛔ Files ignored due to path filters (4)

dist/src/langfuse.d.ts is excluded by !**/dist/**
dist/src/langfuse.js is excluded by !**/dist/**
dist/src/rlm.js is excluded by !**/dist/**
dist/tests/recursive-trace.test.js is excluded by !**/dist/**

📒 Files selected for processing (3)

src/langfuse.ts
src/rlm.ts
tests/recursive-trace.test.ts

coderabbitai · 2026-05-29T15:54:00Z

+      const generationId = langfuse.rootGenerationStart({
+        name: `Model call — root iteration ${iteration + 1}`,
+        input: messages,
+        model: `${config.model.provider}/${config.model.model}`,
+        iteration,
+      });


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Snapshot messages when starting the root generation.

input: messages stores a reference to the live array. Events are only serialized later in flush() (JSON.stringify(batch)), but messages keeps getting mutated each iteration (messages.push(...) and the nudge lastMsg.content += nudge). As a result every generation-create will serialize the final full conversation as its input, not the input at that iteration — defeating the per-iteration capture.

🐛 Capture a snapshot at start time

 const generationId = langfuse.rootGenerationStart({
   name: `Model call — root iteration ${iteration + 1}`,
-  input: messages,
+  input: [...messages],
   model: `${config.model.provider}/${config.model.model}`,
   iteration,
 });
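Note that a spread copy protects against later `messages.push(...)` calls but shares the message objects themselves, so an in-place edit such as the nudge `lastMsg.content += nudge` would still show up in the recorded input. The sketch below contrasts a shallow copy with a deep copy; `ChatMessage` and `appendNudge` are hypothetical names for this example, and the JSON round-trip assumes plain JSON-compatible messages.

```typescript
type ChatMessage = { role: string; content: string };

// Mimics the in-place nudge on the last message described in the review.
function appendNudge(list: ChatMessage[], nudge: string): void {
  const last = list[list.length - 1];
  if (last) last.content += nudge;
}

const conversation: ChatMessage[] = [{ role: "user", content: "task" }];

const shallowCopy = [...conversation]; // new array, same message objects
const deepCopy = JSON.parse(JSON.stringify(conversation)) as ChatMessage[]; // fully independent

appendNudge(conversation, " [nudge]");                      // mutates a shared object
conversation.push({ role: "assistant", content: "reply" }); // affects neither copy's length

// The shallow copy keeps its length but sees the nudged content.
console.log(shallowCopy.length, shallowCopy[0]?.content);
// The deep copy keeps both its length and the original content.
console.log(deepCopy.length, deepCopy[0]?.content);
```

If message objects can be edited in place between the call and the flush, a deep copy (or serializing the input at enqueue time) is the safer snapshot.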


🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/rlm.ts` around lines 429 - 434, The generation-create currently passes a live reference (messages) to langfuse.rootGenerationStart so later serialization in flush() captures the mutated final conversation instead of the per-iteration input; fix by snapshotting messages when calling langfuse.rootGenerationStart (create a deep copy of the messages array/objects — e.g., clone each message object or use a safe deep-clone) and pass that snapshot as input; ensure this change is applied where generationId is created (langfuse.rootGenerationStart) so subsequent flush()/JSON.stringify(batch) serializes the immutable per-iteration input rather than the live messages array.

Squash-merge by Drogo after all checks green. Follow-up to PR #99 root-generation Langfuse fix.

fix: emit Langfuse generations for root RLM calls

6d0c50f

chatgpt-codex-connector Bot reviewed May 29, 2026


gemini-code-assist Bot reviewed May 29, 2026


coderabbitai Bot reviewed May 29, 2026


namastex888 merged commit 21a55cf into main May 29, 2026
8 checks passed

namastex888 added a commit that referenced this pull request May 30, 2026

feat: harden model routes and expose CLI schema

0b6a0e2

Squash-merge by Drogo after all checks green. Follow-up to PR #99 root-generation Langfuse fix.

namastex888 merged 1 commit into main from fix/rlmx-root-generation-langfuse-observations
