fix: emit Langfuse generations for root RLM calls#99
Conversation
📝 WalkthroughWalkthroughThis PR extends Langfuse tracing support in rlmx by adding root generation lifecycle event recording. It defines data contracts for root generation start/end payloads, implements corresponding methods in ChangesRoot Generation Tracing
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Possibly related PRs
Poem
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Warning There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure. 🔧 ESLint
ESLint skipped: no ESLint configuration detected in root package.json. To enable, add Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 6d0c50fc11
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| const llmStartMs = Date.now(); | ||
| const generationId = langfuse.rootGenerationStart({ | ||
| name: `Model call — root iteration ${iteration + 1}`, | ||
| input: messages, |
There was a problem hiding this comment.
Snapshot generation input before queueing
Passing the live messages array here means the queued generation-create event does not preserve the prompt that was actually sent for this LLM call. rootGenerationStart stores the object reference and the batch is JSON-serialized only later during langfuse.flush(), while the loop appends assistant/user messages and mutates the last user message for soft-limit nudges before that flush. In multi-iteration runs, earlier generation inputs will therefore show the final mutated conversation (often including that generation's own output) instead of the per-call input, corrupting the Langfuse trace.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
Code Review
This pull request integrates Langfuse tracing for root generations in the RLM loop, capturing start and end events for both standard iterations and forced final answers, and adds corresponding test coverage. The reviewer suggests wrapping the forced final answer LLM call in a try...catch block to prevent generations from being left dangling in Langfuse if an error occurs. Additionally, they recommend adding a top-level cost field to the Langfuse payload to ensure proper cost tracking in the Langfuse UI.
| const generationId = langfuse?.rootGenerationStart({ | ||
| name: "Model call — forced final answer", | ||
| input: forceMessages, | ||
| model: `${config.model.provider}/${config.model.model}`, | ||
| iteration, | ||
| }); | ||
| const llmStartMs = Date.now(); | ||
| const response = await llmComplete(forceMessages, config.model, { | ||
| signal, | ||
| cacheConfig, | ||
| thinkingLevel: config.gemini.thinkingLevel, | ||
| outputSchema: config.output.schema, | ||
| geminiConfig: config.gemini, | ||
| }); | ||
| if (generationId) { | ||
| langfuse?.rootGenerationEnd(generationId, { | ||
| output: response.text, | ||
| durationMs: Date.now() - llmStartMs, | ||
| usage: response.usage, | ||
| }); | ||
| } |
There was a problem hiding this comment.
Similar to the main loop iteration, if llmComplete throws an error during the forced final answer call, the generation is left dangling in Langfuse. Wrapping this call in a try...catch block ensures the generation is correctly closed with an error status.
const generationId = langfuse?.rootGenerationStart({
name: "Model call — forced final answer",
input: forceMessages,
model: `${config.model.provider}/${config.model.model}`,
iteration,
});
const llmStartMs = Date.now();
let response: any;
try {
response = await llmComplete(forceMessages, config.model, {
signal,
cacheConfig,
thinkingLevel: config.gemini.thinkingLevel,
outputSchema: config.output.schema,
geminiConfig: config.gemini,
});
if (generationId) {
langfuse?.rootGenerationEnd(generationId, {
output: response.text,
durationMs: Date.now() - llmStartMs,
usage: response.usage,
});
}
} catch (err) {
if (generationId) {
langfuse?.rootGenerationEnd(generationId, {
output: null,
durationMs: Date.now() - llmStartMs,
isError: true,
errorMessage: err instanceof Error ? err.message : String(err),
});
}
throw err;
}| costDetails: { | ||
| total: data.usage?.totalCost ?? 0, | ||
| }, |
There was a problem hiding this comment.
Langfuse uses the standard top-level cost field (a number) to track and display generation costs in the UI. Passing only costDetails might result in the cost not being displayed correctly in standard Langfuse cost columns. Adding cost at the top level of the payload ensures standard cost tracking works out of the box.
cost: data.usage?.totalCost ?? 0,
costDetails: {
total: data.usage?.totalCost ?? 0,
},There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (1)
src/rlm.ts (1)
435-447: ⚡ Quick winRoot generations are not closed/flushed on error or non-final exits.
flush()is only invoked insidefinalize(). IfllmComplete()throws, control jumps to thecatch(Lines 680-700) which never callslangfuse.flush(); the empty-abort path (Lines 650-669) also skips it. In both cases the queuedtrace-create/generation-createevents are silently dropped, so the very root generations this PR adds won't surface on timeouts/errors. Additionally, theisError/errorMessagefields onRootGenerationEndDataare never populated, so a failed root call is never closed with an error status.Consider wrapping the call in try/catch to emit
rootGenerationEnd({ ..., isError: true, errorMessage }), and ensureflush()runs on all exit paths.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/rlm.ts` around lines 435 - 447, Wrap the llmComplete(...) call in a try/catch/finally so that on any error or early-abort you call langfuse.rootGenerationEnd(generationId, {..., isError: true, errorMessage: err.message}) before rethrowing/handling, and ensure langfuse.flush() is invoked in the finally (or on all exit paths including the empty-abort branch) so queued trace-create/generation-create events are flushed; update the non-error/normal path to still call rootGenerationEnd(...) with usage/output and then flush, and ensure the finalize() behavior is preserved or moved into the finally block to always populate RootGenerationEndData and call flush().
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/rlm.ts`:
- Around line 429-434: The generation-create currently passes a live reference
(messages) to langfuse.rootGenerationStart so later serialization in flush()
captures the mutated final conversation instead of the per-iteration input; fix
by snapshotting messages when calling langfuse.rootGenerationStart (create a
deep copy of the messages array/objects — e.g., clone each message object or use
a safe deep-clone) and pass that snapshot as input; ensure this change is
applied where generationId is created (langfuse.rootGenerationStart) so
subsequent flush()/JSON.stringify(batch) serializes the immutable per-iteration
input rather than the live messages array.
---
Nitpick comments:
In `@src/rlm.ts`:
- Around line 435-447: Wrap the llmComplete(...) call in a try/catch/finally so
that on any error or early-abort you call
langfuse.rootGenerationEnd(generationId, {..., isError: true, errorMessage:
err.message}) before rethrowing/handling, and ensure langfuse.flush() is invoked
in the finally (or on all exit paths including the empty-abort branch) so queued
trace-create/generation-create events are flushed; update the non-error/normal
path to still call rootGenerationEnd(...) with usage/output and then flush, and
ensure the finalize() behavior is preserved or moved into the finally block to
always populate RootGenerationEndData and call flush().
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 2124e5d3-13e7-47f5-a759-66d3faa3214f
⛔ Files ignored due to path filters (4)
dist/src/langfuse.d.tsis excluded by!**/dist/**dist/src/langfuse.jsis excluded by!**/dist/**dist/src/rlm.jsis excluded by!**/dist/**dist/tests/recursive-trace.test.jsis excluded by!**/dist/**
📒 Files selected for processing (3)
src/langfuse.tssrc/rlm.tstests/recursive-trace.test.ts
| const generationId = langfuse.rootGenerationStart({ | ||
| name: `Model call — root iteration ${iteration + 1}`, | ||
| input: messages, | ||
| model: `${config.model.provider}/${config.model.model}`, | ||
| iteration, | ||
| }); |
There was a problem hiding this comment.
Snapshot messages when starting the root generation.
input: messages stores a reference to the live array. Events are only serialized later in flush() (JSON.stringify(batch)), but messages keeps getting mutated each iteration (messages.push(...) and the nudge lastMsg.content += nudge). As a result every generation-create will serialize the final full conversation as its input, not the input at that iteration — defeating the per-iteration capture.
🐛 Capture a snapshot at start time
const generationId = langfuse.rootGenerationStart({
name: `Model call — root iteration ${iteration + 1}`,
- input: messages,
+ input: [...messages],
model: `${config.model.provider}/${config.model.model}`,
iteration,
});📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| const generationId = langfuse.rootGenerationStart({ | |
| name: `Model call — root iteration ${iteration + 1}`, | |
| input: messages, | |
| model: `${config.model.provider}/${config.model.model}`, | |
| iteration, | |
| }); | |
| const generationId = langfuse.rootGenerationStart({ | |
| name: `Model call — root iteration ${iteration + 1}`, | |
| input: [...messages], | |
| model: `${config.model.provider}/${config.model.model}`, | |
| iteration, | |
| }); |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@src/rlm.ts` around lines 429 - 434, The generation-create currently passes a
live reference (messages) to langfuse.rootGenerationStart so later serialization
in flush() captures the mutated final conversation instead of the per-iteration
input; fix by snapshotting messages when calling langfuse.rootGenerationStart
(create a deep copy of the messages array/objects — e.g., clone each message
object or use a safe deep-clone) and pass that snapshot as input; ensure this
change is applied where generationId is created (langfuse.rootGenerationStart)
so subsequent flush()/JSON.stringify(batch) serializes the immutable
per-iteration input rather than the live messages array.
Squash-merge by Drogo after all checks green. Follow-up to PR #99 root-generation Langfuse fix.
Summary
GENERATIONcreate/update events for root RLM model callsOperator impact
Operators no longer see root-only RLMX traces with
observations=0while local stats show real LLM work. Root model calls now appear in Langfuse as named generation observations:Model call — root iteration NModel call — forced final answerVerification
npm run buildnode --test dist/tests/recursive-trace.test.jsnpm run checknpm test— 376 pass / 0 failb5ebbd98-ad21-415e-8a66-86259cf7e9eb/project/cmpprjzwq001yvn07marcb6sg/traces/b5ebbd98-ad21-415e-8a66-86259cf7e9eb$0.01321075anthropic/claude-opus-4-8Notes
Summary by CodeRabbit
New Features
Tests