
fix: reconcile Auto Run summary with cumulative session stats#735

Open
pedramamini wants to merge 3 commits into main from 734-cumulative-autorun-summary-stats

Conversation

@pedramamini
Collaborator

@pedramamini pedramamini commented Apr 6, 2026

Summary

  • Auto Run summary now reconciles in-memory counters with persisted history entries, ensuring accurate cumulative stats (tasks, tokens, cost, duration) even when sessions span app restarts
  • Applied the fix to both the renderer (useBatchProcessor.ts) and CLI (batch-processor.ts) codepaths

Closes #734

Root Cause

In-memory cumulative counters (totalCompletedTasks, totalInputTokens, totalOutputTokens, totalCost) are initialized fresh each time startBatchRun is called. For long-running sessions that survive app restarts, manual stop/restart cycles, or process kills, the final "Auto Run stopped" summary only reflected stats from the most recent invocation — while all individual task history entries persisted on disk across restarts.

Fix

Before generating the final summary, read all persisted history entries for the session and compute cumulative totals from task entries (filtering out loop/session summary entries). Use Math.max(inMemory, historyDerived) to ensure the summary is accurate regardless of restart history. Falls back gracefully to in-memory counters if history read fails.
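The reconciliation described above can be sketched in isolation as follows. This is a minimal sketch, not the PR's actual code: the `HistoryEntry` and `Totals` shapes and the `reconcileTotals` helper are assumptions introduced for illustration, modeled on the fields quoted in the review comments further down.

```typescript
// Hypothetical shapes for illustration; the real history entry type may differ.
interface HistoryEntry {
  type: string;
  summary?: string;
  usageStats?: { inputTokens?: number; outputTokens?: number; totalCostUsd?: number };
  elapsedTimeMs?: number;
}

interface Totals {
  tasks: number;
  inputTokens: number;
  outputTokens: number;
  cost: number;
  elapsedMs: number;
}

// Summary prefixes that mark loop, session, and PR summary entries (not tasks).
const NON_TASK_PREFIXES = ['Loop ', 'Auto Run ', 'PR created', 'PR creation failed'];

function isTaskEntry(e: HistoryEntry): boolean {
  const summary = e.summary;
  if (e.type !== 'AUTO' || !summary) return false;
  return !NON_TASK_PREFIXES.some((prefix) => summary.startsWith(prefix));
}

function reconcileTotals(inMemory: Totals, entries: HistoryEntry[]): Totals {
  const tasks = entries.filter(isTaskEntry);
  // Only reconcile when history knows about more tasks than this process does.
  if (tasks.length <= inMemory.tasks) return inMemory;

  let inputTokens = 0;
  let outputTokens = 0;
  let cost = 0;
  let elapsedMs = 0;
  for (const e of tasks) {
    inputTokens += e.usageStats?.inputTokens ?? 0;
    outputTokens += e.usageStats?.outputTokens ?? 0;
    cost += e.usageStats?.totalCostUsd ?? 0;
    elapsedMs += e.elapsedTimeMs ?? 0;
  }

  // Take the larger of the in-memory and history-derived value for each stat.
  return {
    tasks: Math.max(inMemory.tasks, tasks.length),
    inputTokens: Math.max(inMemory.inputTokens, inputTokens),
    outputTokens: Math.max(inMemory.outputTokens, outputTokens),
    cost: Math.max(inMemory.cost, cost),
    elapsedMs: Math.max(inMemory.elapsedMs, elapsedMs),
  };
}
```

A caller would wrap `reconcileTotals` in a try/catch around the history read so that a failed read leaves the in-memory totals untouched.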

Test plan

  • Run an Auto Run session, let it complete naturally — summary should match in-memory counters (no regression)
  • Start an Auto Run, restart the app mid-session, resume the Auto Run, then stop — summary should reflect tasks from all invocations
  • Verify the summary entry in history shows correct cumulative tasks, tokens, cost, and duration
  • Run npm run lint and npm test — all pass

Summary by CodeRabbit

  • Documentation

    • Reformatted release notes for improved readability
    • Updated multi-phase project organization guidelines to use simplified flat directory structure
  • Bug Fixes

    • Improved accuracy of task metrics by reconciling in-memory counters with persisted history records

- Standardized multi-phase auto-run docs into one flat, dated subdirectory 📁
- Explicitly banned nested project/feature folder structures for phase outputs 🚫
- Improved guidance for clean lexicographic sorting with zero-padded phases 🔢
- Made it easier to add entire effort folders to auto-run at once ➕
- Clarified organization rules so related phase documents stay tightly grouped 🧭
…734)

In-memory cumulative counters (tasks, tokens, cost, duration) reset when
startBatchRun is re-invoked (e.g. after app restart), but individual task
history entries persist on disk. For long-running sessions spanning
restarts, the final summary only reflected the last invocation's stats.

Now, before generating the summary, we read all persisted history entries
for the session and compute cumulative totals. The summary uses the max
of in-memory counters vs history-derived totals, ensuring accuracy
regardless of how many times the batch processor was restarted.

Fixes both the renderer (useBatchProcessor) and CLI (batch-processor)
codepaths.
@coderabbitai

coderabbitai bot commented Apr 6, 2026

📝 Walkthrough

Walkthrough

This pull request adds cumulative session statistics reconciliation to Auto Run summaries by reading persisted history entries from disk and cross-validating in-memory counters. It also includes documentation formatting updates and a clarification on multi-phase effort directory structure.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Documentation Formatting & Structure: docs/releases.md, src/prompts/maestro-system-prompt.md | Normalized line endings and heading structure in release notes; clarified multi-phase effort directory organization to use a flat subdirectory structure rather than nested directories. |
| Auto-Run History Reconciliation: src/cli/services/batch-processor.ts, src/renderer/hooks/batch/useBatchProcessor.ts | Added logic to read persisted AUTO history entries, filter non-task summaries, and reconcile cumulative session totals (tasks, tokens, cost, elapsed time) against in-memory counters before emitting final auto-run summaries. |

Sequence Diagram

sequenceDiagram
    participant Client as Client/Renderer
    participant Processor as Batch Processor
    participant History as History Persistence
    participant Summary as Summary Generator
    
    Client->>Processor: Stop auto-run session
    Processor->>History: Read all AUTO entries for session.id
    History-->>Processor: Return history entries
    Processor->>Processor: Filter non-task entries (exclude Loop/PR/summary types)
    Processor->>Processor: Aggregate tokens & elapsed from history
    alt History task count > in-memory count
        Processor->>Processor: Update totals using Math.max(history, in-memory)
    else
        Processor->>Processor: Keep in-memory totals
    end
    Processor->>Summary: Compose final summary with reconciled totals
    Summary-->>Client: Emit "Auto Run completed" with cumulative stats
    Note over Client,Processor: Accurate session stats across all loop iterations

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

Poem

🐰 Hops through history with glee,
Reading every entry carefully,
No more lost loops or truncated tales—
Cumulative counts now never fail!
Session stats at last ring true, 🎉

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Out of Scope Changes check | ⚠️ Warning | All changes are scoped to the issue objectives except for a documentation formatting update in docs/releases.md (line normalization), which appears to be an unrelated maintenance change bundled into this PR. | Remove the docs/releases.md formatting changes or move them to a separate documentation-focused PR to keep scope focused on the functional fix. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title accurately describes the main fix: reconciling Auto Run summary with cumulative session stats to address issue #734. |
| Linked Issues check | ✅ Passed | The code changes fully implement the objectives from issue #734: reconciling persisted history entries with in-memory totals across both renderer and CLI codepaths, filtering non-task summaries, and using Math.max to ensure cumulative stats reflect the whole session. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%. |


@greptile-apps

greptile-apps bot commented Apr 6, 2026

Greptile Summary

This PR fixes a real bug where Auto Run summary stats were reset to zero on app restart by reconciling in-memory counters against persisted history entries before writing the final summary. The fix is applied to both the renderer (useBatchProcessor.ts) and CLI (batch-processor.ts) codepaths, and the approach is sound for the restart scenario.

However, the reconciliation reads all history entries for the session without scoping them to the current Auto Run invocation by timestamp. This creates a regression for the more common case where a user runs Auto Run more than once on the same agent session: every subsequent run will absorb all prior task entries and report inflated task counts, token usage, and cost in its summary.

  • src/renderer/hooks/batch/useBatchProcessor.ts — New try/catch block calls window.maestro.history.getAll(session.cwd, sessionId) and applies Math.max reconciliation. Works correctly for the restart scenario but over-counts on repeat runs of the same agent session.
  • src/cli/services/batch-processor.ts — Identical logic via readHistory(undefined, session.id). Same flaw; both paths must be kept in sync.
  • docs/releases.md / src/prompts/maestro-system-prompt.md — Formatting and minor prose changes only; no functional impact.
  • Both catch blocks are empty, swallowing unexpected errors without logging — this conflicts with the project's Sentry-first error-handling guidelines in CLAUDE.md.

Confidence Score: 2/5

Not safe to merge as-is — the reconciliation logic will produce inflated Auto Run summaries for any session that has more than one completed Auto Run invocation

The intended fix (handling app-restart continuity) is correct in principle, but the implementation has a clear logic bug that regresses the more common case of running Auto Run multiple times on the same agent. The missing timestamp boundary means all prior task history is summed into every new summary. The fix is also duplicated across two code paths that must stay in sync.

src/renderer/hooks/batch/useBatchProcessor.ts and src/cli/services/batch-processor.ts — specifically the history filter predicate that lacks a batchStartTime lower bound

Important Files Changed

| Filename | Overview |
| --- | --- |
| src/renderer/hooks/batch/useBatchProcessor.ts | Adds history-based stat reconciliation before the Auto Run final summary, but the unbounded history query (no timestamp fence) will overcount tasks/tokens/cost on any repeat run of the same agent session |
| src/cli/services/batch-processor.ts | Mirrors the renderer reconciliation logic with the identical over-counting flaw; readHistory fetches all session entries with no timestamp boundary |
| docs/releases.md | Whitespace and list-formatting adjustments to release notes only; no functional changes |
| src/prompts/maestro-system-prompt.md | Minor prose update to the system prompt; no functional impact |

Sequence Diagram

sequenceDiagram
    participant User
    participant BatchProcessor
    participant HistoryStore

    User->>BatchProcessor: Stop / complete Auto Run
    BatchProcessor->>HistoryStore: getAll(cwd, sessionId)
    HistoryStore-->>BatchProcessor: ALL entries for session (no timestamp bound)
    BatchProcessor->>BatchProcessor: filter to AUTO task entries
    Note over BatchProcessor: ⚠️ Includes entries from prior completed runs
    alt taskEntries.length > totalCompletedTasks
        BatchProcessor->>BatchProcessor: sum tokens/cost from ALL task entries
        BatchProcessor->>BatchProcessor: Math.max(inMemory, historyDerived)
        Note over BatchProcessor: Correct for restart scenario<br/>Overcounts for repeat-run scenario
    else
        BatchProcessor->>BatchProcessor: Use in-memory counters unchanged
    end
    BatchProcessor->>HistoryStore: addHistoryEntry(finalSummary)
    BatchProcessor-->>User: Show Auto Run summary

Reviews (1): Last reviewed commit: "style: format docs/releases.md with pret..." | Re-trigger Greptile

Comment on lines +1611 to +1622
const allEntries = await window.maestro.history.getAll(session.cwd, sessionId);
if (Array.isArray(allEntries) && allEntries.length > 0) {
    // Filter to individual task entries (exclude loop/session summaries)
    const taskEntries = allEntries.filter(
        (e) =>
            e.type === 'AUTO' &&
            e.summary &&
            !e.summary.startsWith('Loop ') &&
            !e.summary.startsWith('Auto Run ') &&
            !e.summary.startsWith('PR created') &&
            !e.summary.startsWith('PR creation failed')
    );

P1 History entries not scoped to the current Auto Run invocation

window.maestro.history.getAll(session.cwd, sessionId) returns all history entries ever written for this session, including entries from previous, fully completed Auto Run sessions on the same agent. The reconciliation guard taskEntries.length > totalCompletedTasks (line 1624) therefore fires at summary time on every subsequent run on the same agent, because the old entries are still present in history.

Concrete failure scenario:

  • Run #1 on agent A completes normally → 30 task entries written to history.
  • Run #2 on agent A completes 20 tasks.
  • At summary time: taskEntries.length = 50 (30 old + 20 new), totalCompletedTasks = 20.
  • Guard fires → summary reports 50 tasks / all tokens / all cost from both runs instead of the correct 20.

The same issue exists in src/cli/services/batch-processor.ts around line 309.
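The failure scenario above can be reproduced with a few lines of arithmetic. The sketch below is illustrative only: `HistoryEntry`, `makeTasks`, and `reconciledTaskCount` are hypothetical names, and the function mirrors only the unscoped guard-plus-`Math.max` step quoted in this comment.

```typescript
// Hypothetical minimal entry shape; only the fields the filter inspects.
interface HistoryEntry {
  type: string;
  summary: string;
}

const NON_TASK_PREFIXES = ['Loop ', 'Auto Run ', 'PR created', 'PR creation failed'];

function isTaskEntry(e: HistoryEntry): boolean {
  return e.type === 'AUTO' && !NON_TASK_PREFIXES.some((p) => e.summary.startsWith(p));
}

// Unscoped reconciliation: every AUTO task entry for the session is counted,
// regardless of which Auto Run wrote it.
function reconciledTaskCount(inMemoryTasks: number, entries: HistoryEntry[]): number {
  const historyTasks = entries.filter(isTaskEntry).length;
  return historyTasks > inMemoryTasks ? Math.max(inMemoryTasks, historyTasks) : inMemoryTasks;
}

function makeTasks(count: number, label: string): HistoryEntry[] {
  const entries: HistoryEntry[] = [];
  for (let i = 1; i <= count; i++) {
    entries.push({ type: 'AUTO', summary: `${label} task ${i}` });
  }
  return entries;
}

const run1 = makeTasks(30, 'Run 1'); // written by the earlier, completed run
const run2 = makeTasks(20, 'Run 2'); // written by the current run

// The current process counted 20 tasks, but history holds 30 + 20 entries.
const reported = reconciledTaskCount(20, [...run1, ...run2]);
console.log(reported); // 50, although run #2 completed only 20 tasks
```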

Fix: add a batchStartTime lower-bound to the filter so only entries that pre-date the current invocation are pulled in for reconciliation, while entries written during this invocation are already captured by in-memory counters:

const taskEntries = allEntries.filter(
    (e) =>
        e.type === 'AUTO' &&
        e.summary &&
        !e.summary.startsWith('Loop ') &&
        !e.summary.startsWith('Auto Run ') &&
        !e.summary.startsWith('PR created') &&
        !e.summary.startsWith('PR creation failed') &&
        e.timestamp < batchStartTime  // only include pre-restart entries
);
// In-memory totalCompletedTasks already covers the current invocation;
// add taskEntries.length for prior-restart invocations:
finalTotalTasks = totalCompletedTasks + taskEntries.length;

Comment on lines +307 to +319
const allEntries = readHistory(undefined, session.id);
if (allEntries.length > 0) {
    const taskEntries = allEntries.filter(
        (e) =>
            e.type === 'AUTO' &&
            e.summary &&
            !e.summary.startsWith('Loop ') &&
            !e.summary.startsWith('Auto Run ') &&
            !e.summary.startsWith('PR created') &&
            !e.summary.startsWith('PR creation failed')
    );

    if (taskEntries.length > finalTotalTasks) {

P1 Same over-counting issue as in the renderer path

readHistory(undefined, session.id) returns all history for the session with no time boundary. After a successfully-completed Auto Run, launching a second batch on the same session will trigger the reconciliation guard (old entries outnumber the new in-memory tasks) and inflate the summary stats.

Apply the same e.timestamp < batchStartTime filter described in the renderer comment:

const taskEntries = allEntries.filter(
    (e) =>
        e.type === 'AUTO' &&
        e.summary &&
        !e.summary.startsWith('Loop ') &&
        !e.summary.startsWith('Auto Run ') &&
        !e.summary.startsWith('PR created') &&
        !e.summary.startsWith('PR creation failed') &&
        e.timestamp < batchStartTime  // scope to prior-restart entries only
);

Comment on lines +1648 to +1650
} catch {
// Fall back to in-memory counters if history read fails
}

P2 Silent catch block swallows unexpected errors

An empty catch {} silences anything beyond the expected "history not found" case — e.g., a malformed IPC response or a type assertion failure. Per the project's guidelines in CLAUDE.md, unexpected errors should be logged and surfaced to Sentry rather than disappearing silently.

Suggested change
-} catch {
-    // Fall back to in-memory counters if history read fails
-}
+} catch (err) {
+    // Fall back to in-memory counters if history read fails
+    console.warn('[BatchProcessor] History reconciliation failed, using in-memory counters', err);
+}

Comment on lines +341 to +343
} catch {
// Fall back to in-memory counters if history read fails
}

P2 Silent catch hides reconciliation failures in the CLI path

Same pattern as the renderer: unexpected errors are swallowed without any trace. At a minimum, log with the project's logger so failures are visible in debug output:

Suggested change
-} catch {
-    // Fall back to in-memory counters if history read fails
-}
+} catch (err) {
+    // Fall back to in-memory counters if history read fails
+    logger.warn('History reconciliation failed, using in-memory counters', String(err));
+}


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 6

🧹 Nitpick comments (2)
docs/releases.md (2)

20-20: Fix heading hierarchy regression (# used where nested headings are expected).

These headings are nested under release sections but were changed to top-level #, which can break document outline/TOC structure. Please demote them to ### (or ## where appropriate) consistently.

Proposed markdown fix
-# Major 0.15.x Additions
+### Major 0.15.x Additions

-# Smaller Changes in 014.x
+### Smaller Changes in 0.14.x

-# Other Changes
+### Other Changes

-# Onboarding, Wizard, and Tours
+### Onboarding, Wizard, and Tours

-# UI / UX Enhancements
+### UI / UX Enhancements

-# Auto Run Workflow Improvements
+### Auto Run Workflow Improvements

-# Application Behavior / Core Fixes
+### Application Behavior / Core Fixes

-# Update System
+### Update System

Also applies to: 80-80, 176-176, 268-268, 276-276, 286-286, 291-291, 297-297

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/releases.md` at line 20, Several headings in docs/releases.md (e.g., the
"Major 0.15.x Additions" heading and the other occurrences noted at lines 80,
176, 268, 276, 286, 291, 297) were mistakenly rendered as top-level `#`
headings; change those `#` headings to the appropriate nested level (use `###`
for subsection items or `##` where they are direct children of a release
section) so the document outline/TOC hierarchy is restored—scan for headings
using a single `#` in that file and demote them consistently to `###` (or `##`
when it should be a direct child of a release header).

439-448: Use a single bullet style for Markdown list consistency.

These sections use • while surrounding lists use -. Standardizing improves renderer compatibility and keeps formatting consistent.

Proposed markdown fix
-• Enhanced mobile web interface with session sync and history panel 📱
+- Enhanced mobile web interface with session sync and history panel 📱
...
-• Added splash screen with logo and progress bar during startup 🎨
+- Added splash screen with logo and progress bar during startup 🎨

-• Added template variables for dynamic AI command customization 🎯
+- Added template variables for dynamic AI command customization 🎯
...
-• Updated documentation with new features and template references 📖
+- Updated documentation with new features and template references 📖

Also applies to: 462-471

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/releases.md` around lines 439 - 448, The Markdown release notes use
mixed bullet characters (the diff shows a bullet list starting with "Enhanced
mobile web interface with session sync and history panel 📱" that uses • while
other lists use -); standardize all lists to a single bullet style by replacing
every occurrence of the • bullets in this section (and the similar block around
"Also applies to: 462-471") with the hyphen (-) form so all Markdown lists use
the same dash-style bullets.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@docs/releases.md`:
- Line 74: Fix the user-facing typo in the docs string "🗄️ Document Graphs.
Launch from file preview or from the FIle tree panel." by changing the token
"FIle" to "File" so the sentence reads "Launch from file preview or from the
File tree panel."; edit the string in docs/releases.md accordingly.

In `@src/cli/services/batch-processor.ts`:
- Around line 298-338: The reconciled totals (finalTotalTasks,
finalTotalInputTokens, finalTotalOutputTokens, finalTotalCost,
finalTotalElapsedMs) are computed but never used when emitting the terminal
"complete" event, so update the generator/emit logic that currently sends
totalTasksCompleted, totalElapsedMs, and totalCost to use the corresponding
finalTotal* variables instead; locate the code that emits the terminal
complete/summary event (the generator that emits the final CLI/JSONL complete
event) and replace references to the in-memory counters (totalCompletedTasks,
totalInputTokens, totalOutputTokens, totalCost, elapsedMs) with finalTotalTasks,
finalTotalInputTokens, finalTotalOutputTokens, finalTotalCost,
finalTotalElapsedMs so resumed runs report the reconciled totals.
- Around line 341-343: Replace the empty catch after readHistory() so the
exception is not swallowed: change to catch (err) { import and call the Sentry
helper captureException (or captureMessage) from src/utils/sentry.ts with
contextual data including session.id (e.g., captureException(err, { extra: {
sessionId: session.id } })), also emit a concise processLogger.warn/error that
includes session.id and the error message; if you can identify
expected/recoverable errors (e.g., err.code === 'NETWORK_ERROR') allow the
fallback to in-memory totals, otherwise re-throw the error so it bubbles to
Sentry/upper layers. }

In `@src/prompts/maestro-system-prompt.md`:
- Line 48: Update the multi-phase effort guidance in the four system prompt
files (wizard-inline-system.md, wizard-inline-new.md,
wizard-inline-iterate-generation.md, and wizard-document-generation.md) to
exactly match the new directive in maestro-system-prompt.md: state that
multi-phase efforts with 3+ phase documents should be placed in a single flat
subdirectory directly under {{AUTORUN_FOLDER}} prefixed with today's date
(example format YYYY-MM-DD-Feature-Name/FEATURE-NAME-01.md) and explicitly
prohibit nested subdirectories (e.g., “Do NOT create nested subdirectories — all
phase documents for a given effort go into one folder, never project/feature/
nesting”); replace the outdated wording in the existing multi-phase guidance
blocks with this precise phrasing so all prompts are consistent.

In `@src/renderer/hooks/batch/useBatchProcessor.ts`:
- Around line 1648-1650: When calling window.maestro.history.getAll() in
useBatchProcessor (in src/renderer/hooks/batch/useBatchProcessor.ts) don't
silently swallow exceptions: catch the error, call captureException(error, {
sessionId, cwd: session.cwd }) from the Sentry utilities (import from
src/utils/sentry.ts) to log context, then handle expected recoverable errors
explicitly or re-throw unexpected ones so Sentry can capture them; ensure the
fallback to in-memory counters only occurs after logging/explicit handling.
- Around line 1613-1645: The current filter that builds taskEntries incorrectly
includes "Document stalled:" AUTO entries; update the predicate used when
creating taskEntries in useBatchProcessor.ts to also exclude summaries that
start with "Document stalled:" (e.g., add && !e.summary.startsWith('Document
stalled:')). To keep CLI and renderer logic aligned, extract this predicate into
a shared exported helper (e.g., isIndividualTaskEntry or isAutoTaskEntry) and
import/use that helper both in useBatchProcessor.ts (where taskEntries is
computed) and in src/cli/services/batch-processor.ts so both paths share the
same exclusion rules.
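The shared predicate suggested above might look like the following sketch. The name `isAutoTaskEntry` comes from the suggestion itself; the `AutoHistoryEntry` shape and the prefix list constant are assumptions, and where the helper would live (and how it is exported) is left open here.

```typescript
// Hypothetical minimal shape; the project's real history entry type has more fields.
interface AutoHistoryEntry {
  type: string;
  summary?: string;
}

// Prefixes of AUTO entries that are not individual tasks, including the
// 'Document stalled:' case raised in this comment.
const NON_TASK_SUMMARY_PREFIXES = [
  'Loop ',
  'Auto Run ',
  'PR created',
  'PR creation failed',
  'Document stalled:',
];

// In a real codebase this would be exported from a module shared by the
// renderer and CLI paths so both use the same exclusion rules.
function isAutoTaskEntry(entry: AutoHistoryEntry): boolean {
  const summary = entry.summary;
  if (entry.type !== 'AUTO' || !summary) return false;
  return !NON_TASK_SUMMARY_PREFIXES.some((prefix) => summary.startsWith(prefix));
}
```

Both call sites would then reduce to `allEntries.filter(isAutoTaskEntry)`, so a new excluded prefix only has to be added in one place.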

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 7545d087-870e-4255-a85b-ad19eb6a650a

📥 Commits

Reviewing files that changed from the base of the PR and between 3e0e338 and 0ea6368.

📒 Files selected for processing (4)
  • docs/releases.md
  • src/cli/services/batch-processor.ts
  • src/prompts/maestro-system-prompt.md
  • src/renderer/hooks/batch/useBatchProcessor.ts


The major contributions to 0.14.x remain:

🗄️ Document Graphs. Launch from file preview or from the FIle tree panel. Explore relationships between Markdown documents that contain links between documents and to URLs.

⚠️ Potential issue | 🟡 Minor

Correct typo in user-facing text (FIle → File).

Small but visible docs typo in “File tree panel”.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/releases.md` at line 74, Fix the user-facing typo in the docs string
"🗄️ Document Graphs. Launch from file preview or from the FIle tree panel." by
changing the token "FIle" to "File" so the sentence reads "Launch from file
preview or from the File tree panel."; edit the string in docs/releases.md
accordingly.

Comment on lines +298 to +338
let finalTotalTasks = totalCompletedTasks;
let finalTotalInputTokens = totalInputTokens;
let finalTotalOutputTokens = totalOutputTokens;
let finalTotalCost = totalCost;
let finalTotalElapsedMs = Date.now() - batchStartTime;

// Reconcile in-memory counters with persisted history entries.
// In-memory counters reset on process restart, but history entries persist on disk.
try {
    const allEntries = readHistory(undefined, session.id);
    if (allEntries.length > 0) {
        const taskEntries = allEntries.filter(
            (e) =>
                e.type === 'AUTO' &&
                e.summary &&
                !e.summary.startsWith('Loop ') &&
                !e.summary.startsWith('Auto Run ') &&
                !e.summary.startsWith('PR created') &&
                !e.summary.startsWith('PR creation failed')
        );

        if (taskEntries.length > finalTotalTasks) {
            let historyInputTokens = 0;
            let historyOutputTokens = 0;
            let historyCost = 0;
            let historyElapsedMs = 0;

            for (const entry of taskEntries) {
                if (entry.usageStats) {
                    historyInputTokens += entry.usageStats.inputTokens || 0;
                    historyOutputTokens += entry.usageStats.outputTokens || 0;
                    historyCost += entry.usageStats.totalCostUsd || 0;
                }
                historyElapsedMs += entry.elapsedTimeMs || 0;
            }

            finalTotalTasks = Math.max(finalTotalTasks, taskEntries.length);
            finalTotalInputTokens = Math.max(finalTotalInputTokens, historyInputTokens);
            finalTotalOutputTokens = Math.max(finalTotalOutputTokens, historyOutputTokens);
            finalTotalCost = Math.max(finalTotalCost, historyCost);
            finalTotalElapsedMs = Math.max(finalTotalElapsedMs, historyElapsedMs);

⚠️ Potential issue | 🟠 Major

Reuse the reconciled totals for the terminal complete event.

finalTotal* never leave this helper, so the generator still emits totalTasksCompleted, totalElapsedMs, and totalCost from the current process only. After a resume, CLI/JSONL consumers will still see underreported totals even though the persisted summary entry is corrected.
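One way to address this, sketched under assumptions: have the reconciliation step return its totals so the same values feed both the persisted summary and the terminal event. The `RunTotals` and `CompleteEvent` shapes and both function names are hypothetical; only the field names `totalTasksCompleted`, `totalElapsedMs`, and `totalCost` are taken from this comment.

```typescript
// Hypothetical shapes; the real CLI event type is not shown in this PR page.
interface RunTotals {
  totalTasksCompleted: number;
  totalElapsedMs: number;
  totalCost: number;
}

interface CompleteEvent extends RunTotals {
  type: 'complete';
}

// Stand-in for the reconciliation helper: it returns the reconciled totals
// instead of keeping them in local variables.
function reconcileRunTotals(inMemory: RunTotals, fromHistory: RunTotals): RunTotals {
  return {
    totalTasksCompleted: Math.max(inMemory.totalTasksCompleted, fromHistory.totalTasksCompleted),
    totalElapsedMs: Math.max(inMemory.totalElapsedMs, fromHistory.totalElapsedMs),
    totalCost: Math.max(inMemory.totalCost, fromHistory.totalCost),
  };
}

// The terminal event is built from the reconciled totals, not the raw counters.
function buildCompleteEvent(finalTotals: RunTotals): CompleteEvent {
  return { type: 'complete', ...finalTotals };
}

const finalTotals = reconcileRunTotals(
  { totalTasksCompleted: 2, totalElapsedMs: 100, totalCost: 0.5 },
  { totalTasksCompleted: 5, totalElapsedMs: 300, totalCost: 1.5 }
);
const completeEvent = buildCompleteEvent(finalTotals);
```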

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/cli/services/batch-processor.ts` around lines 298 - 338, The reconciled
totals (finalTotalTasks, finalTotalInputTokens, finalTotalOutputTokens,
finalTotalCost, finalTotalElapsedMs) are computed but never used when emitting
the terminal "complete" event, so update the generator/emit logic that currently
sends totalTasksCompleted, totalElapsedMs, and totalCost to use the
corresponding finalTotal* variables instead; locate the code that emits the
terminal complete/summary event (the generator that emits the final CLI/JSONL
complete event) and replace references to the in-memory counters
(totalCompletedTasks, totalInputTokens, totalOutputTokens, totalCost, elapsedMs)
with finalTotalTasks, finalTotalInputTokens, finalTotalOutputTokens,
finalTotalCost, finalTotalElapsedMs so resumed runs report the reconciled
totals.

Comment on lines +341 to +343
} catch {
// Fall back to in-memory counters if history read fails
}

⚠️ Potential issue | 🟠 Major

Don't silently swallow readHistory() failures.

Falling back to in-memory totals is fine, but the empty catch hides the exact failure mode this reconciliation depends on. Please capture/log the exception with session.id before continuing.

As per coding guidelines, "Do not silently swallow errors. Let unhandled exceptions bubble up to Sentry for error tracking in production. Handle expected/recoverable errors explicitly (e.g., NETWORK_ERROR). For unexpected errors, re-throw them to allow Sentry to capture them. Use Sentry utilities (captureException, captureMessage) from src/utils/sentry.ts for explicit error reporting with context."
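A minimal sketch of a non-silent fallback follows. The `Reporter` type stands in for whatever reporting helper the project exposes; the actual signature of `captureException` in src/utils/sentry.ts is not shown on this page, so the reporter is injected as a parameter rather than imported. The sketch also does not distinguish expected from unexpected errors, which the quoted guideline asks for.

```typescript
// Hypothetical reporter signature; an assumption, not the project's Sentry helper.
type Reporter = (error: unknown, context: { sessionId: string }) => void;

function taskCountWithFallback(
  readHistoryTaskCount: () => number,
  inMemoryCount: number,
  sessionId: string,
  report: Reporter
): number {
  try {
    return Math.max(inMemoryCount, readHistoryTaskCount());
  } catch (err) {
    // Report the failure with session context, then fall back to in-memory counters.
    report(err, { sessionId });
    return inMemoryCount;
  }
}
```

Injecting the reporter keeps the fallback testable without a live error-tracking client.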

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/cli/services/batch-processor.ts` around lines 341 - 343, Replace the
empty catch after readHistory() so the exception is not swallowed: change to
catch (err) { import and call the Sentry helper captureException (or
captureMessage) from src/utils/sentry.ts with contextual data including
session.id (e.g., captureException(err, { extra: { sessionId: session.id } })),
also emit a concise processLogger.warn/error that includes session.id and the
error message; if you can identify expected/recoverable errors (e.g., err.code
=== 'NETWORK_ERROR') allow the fallback to in-memory totals, otherwise re-throw
the error so it bubbles to Sentry/upper layers. }

When a user wants an auto-run document (or playbook), create a detailed multi-document, multi-point Markdown implementation plan in the `{{AUTORUN_FOLDER}}` folder. Use the format `$PREFIX-XX.md`, where `XX` is the two-digit phase number (01, 02, etc.) and `$PREFIX` is the effort name. Always zero-pad phase numbers to ensure correct lexicographic sorting. Break phases by relevant context; do not mix unrelated task results in the same document. If working within a file, group and fix all type issues in that file together. If working with an MCP, keep all related tasks in the same document. Each task must be written as `- [ ] ...` so auto-run can execute and check them off with comments on completion.

-**Multi-phase efforts:** When creating 3 or more phase documents for a single effort, place them in a dedicated subdirectory prefixed with today's date (e.g., `{{AUTORUN_FOLDER}}/YYYY-MM-DD-Feature-Name/FEATURE-NAME-01.md`). This allows users to add the entire folder at once and keeps related documents organized with a clear creation date.
+**Multi-phase efforts:** When creating 3 or more phase documents for a single effort, place them in a single flat subdirectory directly under `{{AUTORUN_FOLDER}}`, prefixed with today's date (e.g., `{{AUTORUN_FOLDER}}/YYYY-MM-DD-Feature-Name/FEATURE-NAME-01.md`). Do NOT create nested subdirectories — all phase documents for a given effort go into one folder, never `project/feature/` nesting. This allows users to add the entire folder at once and keeps related documents organized with a clear creation date.

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Description: Find all files with multi-phase effort directory guidance to ensure consistency

# Search for references to multi-phase efforts and YYYY-MM-DD-Feature-Name pattern
rg -n -C3 --type=md "multi-phase|Multi-phase" | grep -A3 -B3 "YYYY-MM-DD"

# Also search for the specific example pattern
rg -n --type=md "YYYY-MM-DD-Feature-Name"

Repository: RunMaestro/Maestro

Length of output: 8495


Update multi-phase effort guidance across all system prompt files.

The new guidance in maestro-system-prompt.md explicitly prohibits nested subdirectories ("Do NOT create nested subdirectories — all phase documents for a given effort go into one folder, never project/feature/ nesting"), but four other files retain outdated wording that omits this critical prohibition:

  • src/prompts/wizard-inline-system.md:51
  • src/prompts/wizard-inline-new.md:34
  • src/prompts/wizard-inline-iterate-generation.md:201
  • src/prompts/wizard-document-generation.md:278

Update these to match the new directive: use "single flat subdirectory directly under" and include the explicit prohibition against nested structures.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/prompts/maestro-system-prompt.md` at line 48, Update the multi-phase
effort guidance in the four system prompt files (wizard-inline-system.md,
wizard-inline-new.md, wizard-inline-iterate-generation.md, and
wizard-document-generation.md) to exactly match the new directive in
maestro-system-prompt.md: state that multi-phase efforts with 3+ phase documents
should be placed in a single flat subdirectory directly under {{AUTORUN_FOLDER}}
prefixed with today's date (example format
YYYY-MM-DD-Feature-Name/FEATURE-NAME-01.md) and explicitly prohibit nested
subdirectories (e.g., “Do NOT create nested subdirectories — all phase documents
for a given effort go into one folder, never project/feature/ nesting”); replace
the outdated wording in the existing multi-phase guidance blocks with this
precise phrasing so all prompts are consistent.
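
To make the layout rule concrete, here is a small sketch of how a phase document path could be derived. The function `phaseDocPath` and its parameters are hypothetical. Deriving the file prefix by upper-casing the feature name is an assumption read off the example `YYYY-MM-DD-Feature-Name/FEATURE-NAME-01.md`, and efforts with fewer than 3 phases are assumed to sit directly in the Auto Run folder.

```typescript
// Hypothetical helper; names and parameters are illustrative only.
function phaseDocPath(
	autoRunFolder: string,
	featureName: string, // e.g. "Feature-Name"
	phase: number, // 1-based phase number
	totalPhases: number,
	today: Date
): string {
	const prefix = featureName.toUpperCase();
	// Zero-pad to two digits so the files sort lexicographically.
	const fileName = `${prefix}-${String(phase).padStart(2, '0')}.md`;

	// Assumption: fewer than 3 phases means no dated subdirectory.
	if (totalPhases < 3) return `${autoRunFolder}/${fileName}`;

	// One flat, date-prefixed folder directly under the Auto Run folder; no nesting.
	// Note: toISOString() yields the UTC date, which can differ from the local date.
	const date = today.toISOString().slice(0, 10);
	return `${autoRunFolder}/${date}-${featureName}/${fileName}`;
}
```

For example, phase 1 of a three-phase effort named `Feature-Name` on 6 April 2026 (UTC) resolves to `<folder>/2026-04-06-Feature-Name/FEATURE-NAME-01.md`.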

Comment on lines +1613 to +1645
// Filter to individual task entries (exclude loop/session summaries)
const taskEntries = allEntries.filter(
	(e) =>
		e.type === 'AUTO' &&
		e.summary &&
		!e.summary.startsWith('Loop ') &&
		!e.summary.startsWith('Auto Run ') &&
		!e.summary.startsWith('PR created') &&
		!e.summary.startsWith('PR creation failed')
);

if (taskEntries.length > totalCompletedTasks) {
	const historyTasks = taskEntries.length;
	let historyInputTokens = 0;
	let historyOutputTokens = 0;
	let historyCost = 0;
	let historyElapsedMs = 0;

	for (const entry of taskEntries) {
		if (entry.usageStats) {
			historyInputTokens += entry.usageStats.inputTokens || 0;
			historyOutputTokens += entry.usageStats.outputTokens || 0;
			historyCost += entry.usageStats.totalCostUsd || 0;
		}
		historyElapsedMs += entry.elapsedTimeMs || 0;
	}

	// Use history-derived totals when they exceed in-memory counters
	totalCompletedTasks = Math.max(totalCompletedTasks, historyTasks);
	totalInputTokens = Math.max(totalInputTokens, historyInputTokens);
	totalOutputTokens = Math.max(totalOutputTokens, historyOutputTokens);
	totalCost = Math.max(totalCost, historyCost);
	totalElapsedMs = Math.max(totalElapsedMs, historyElapsedMs);

⚠️ Potential issue | 🟠 Major

Exclude `Document stalled:` entries from the reconciliation set.

Those entries are also AUTO and are written at Lines 1287-1295, so this filter will count them as completed tasks and inflate the final stop/completion summary. Please exclude them explicitly here, and ideally share the predicate with src/cli/services/batch-processor.ts so the two paths stay aligned.

Minimal fix
+const isAutoRunTaskHistoryEntry = (entry: HistoryEntry) =>
+	entry.type === 'AUTO' &&
+	!!entry.summary &&
+	!entry.summary.startsWith('Loop ') &&
+	!entry.summary.startsWith('Auto Run ') &&
+	!entry.summary.startsWith('PR created') &&
+	!entry.summary.startsWith('PR creation failed') &&
+	!entry.summary.startsWith('Document stalled:');
+
 ...
-const taskEntries = allEntries.filter(
-	(e) =>
-		e.type === 'AUTO' &&
-		e.summary &&
-		!e.summary.startsWith('Loop ') &&
-		!e.summary.startsWith('Auto Run ') &&
-		!e.summary.startsWith('PR created') &&
-		!e.summary.startsWith('PR creation failed')
-);
+const taskEntries = allEntries.filter(isAutoRunTaskHistoryEntry);
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
// Filter to individual task entries (exclude loop/session summaries)
const isAutoRunTaskHistoryEntry = (entry: HistoryEntry) =>
	entry.type === 'AUTO' &&
	!!entry.summary &&
	!entry.summary.startsWith('Loop ') &&
	!entry.summary.startsWith('Auto Run ') &&
	!entry.summary.startsWith('PR created') &&
	!entry.summary.startsWith('PR creation failed') &&
	!entry.summary.startsWith('Document stalled:');
const taskEntries = allEntries.filter(isAutoRunTaskHistoryEntry);
if (taskEntries.length > totalCompletedTasks) {
	const historyTasks = taskEntries.length;
	let historyInputTokens = 0;
	let historyOutputTokens = 0;
	let historyCost = 0;
	let historyElapsedMs = 0;
	for (const entry of taskEntries) {
		if (entry.usageStats) {
			historyInputTokens += entry.usageStats.inputTokens || 0;
			historyOutputTokens += entry.usageStats.outputTokens || 0;
			historyCost += entry.usageStats.totalCostUsd || 0;
		}
		historyElapsedMs += entry.elapsedTimeMs || 0;
	}
	// Use history-derived totals when they exceed in-memory counters
	totalCompletedTasks = Math.max(totalCompletedTasks, historyTasks);
	totalInputTokens = Math.max(totalInputTokens, historyInputTokens);
	totalOutputTokens = Math.max(totalOutputTokens, historyOutputTokens);
	totalCost = Math.max(totalCost, historyCost);
	totalElapsedMs = Math.max(totalElapsedMs, historyElapsedMs);
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/renderer/hooks/batch/useBatchProcessor.ts` around lines 1613 - 1645, The
current filter that builds taskEntries incorrectly includes "Document stalled:"
AUTO entries; update the predicate used when creating taskEntries in
useBatchProcessor.ts to also exclude summaries that start with "Document
stalled:" (e.g., add && !e.summary.startsWith('Document stalled:')). To keep CLI
and renderer logic aligned, extract this predicate into a shared exported helper
(e.g., isIndividualTaskEntry or isAutoTaskEntry) and import/use that helper both
in useBatchProcessor.ts (where taskEntries is computed) and in
src/cli/services/batch-processor.ts so both paths share the same exclusion
rules.
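
One way the shared helper could be shaped is sketched below. This is a sketch, not the project's implementation: the `HistoryEntry` interface is a minimal stand-in containing only the fields used in the diff above, and `Totals`, `NON_TASK_PREFIXES`, and `reconcileTotals` are hypothetical names. In the real codebase the predicate would presumably be exported from a module that both the renderer and the CLI import.

```typescript
// Minimal stand-in for the project's HistoryEntry type (assumed shape).
interface HistoryEntry {
	type: string;
	summary?: string;
	elapsedTimeMs?: number;
	usageStats?: { inputTokens?: number; outputTokens?: number; totalCostUsd?: number };
}

// Hypothetical container for the cumulative counters.
interface Totals {
	tasks: number;
	inputTokens: number;
	outputTokens: number;
	cost: number;
	elapsedMs: number;
}

// Summary prefixes of AUTO entries that are not individual tasks.
const NON_TASK_PREFIXES: string[] = [
	'Loop ',
	'Auto Run ',
	'PR created',
	'PR creation failed',
	'Document stalled:',
];

function isAutoRunTaskHistoryEntry(entry: HistoryEntry): boolean {
	if (entry.type !== 'AUTO' || !entry.summary) return false;
	const summary: string = entry.summary;
	return !NON_TASK_PREFIXES.some((prefix) => summary.startsWith(prefix));
}

// Keep the in-memory counters unless history shows more completed tasks,
// in which case take the larger value for each counter.
function reconcileTotals(inMemory: Totals, allEntries: HistoryEntry[]): Totals {
	const taskEntries = allEntries.filter(isAutoRunTaskHistoryEntry);
	if (taskEntries.length <= inMemory.tasks) return inMemory;

	const history: Totals = {
		tasks: taskEntries.length,
		inputTokens: 0,
		outputTokens: 0,
		cost: 0,
		elapsedMs: 0,
	};
	for (const entry of taskEntries) {
		history.inputTokens += entry.usageStats?.inputTokens || 0;
		history.outputTokens += entry.usageStats?.outputTokens || 0;
		history.cost += entry.usageStats?.totalCostUsd || 0;
		history.elapsedMs += entry.elapsedTimeMs || 0;
	}
	return {
		tasks: Math.max(inMemory.tasks, history.tasks),
		inputTokens: Math.max(inMemory.inputTokens, history.inputTokens),
		outputTokens: Math.max(inMemory.outputTokens, history.outputTokens),
		cost: Math.max(inMemory.cost, history.cost),
		elapsedMs: Math.max(inMemory.elapsedMs, history.elapsedMs),
	};
}
```

Keeping the prefixes in a single list means a newly introduced non-task entry type only has to be excluded in one place.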

Comment on lines +1648 to +1650
} catch {
// Fall back to in-memory counters if history read fails
}

⚠️ Potential issue | 🟠 Major

Don't swallow history reconciliation errors.

If window.maestro.history.getAll() throws, the UI quietly falls back to partial totals and gives us no telemetry to debug why reconciliation failed. Capture the exception with sessionId/session.cwd context before using the fallback.

As per coding guidelines, "Do not silently swallow errors. Let unhandled exceptions bubble up to Sentry for error tracking in production. Handle expected/recoverable errors explicitly (e.g., NETWORK_ERROR). For unexpected errors, re-throw them to allow Sentry to capture them. Use Sentry utilities (captureException, captureMessage) from src/utils/sentry.ts for explicit error reporting with context."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/renderer/hooks/batch/useBatchProcessor.ts` around lines 1648 - 1650, When
calling window.maestro.history.getAll() in useBatchProcessor (in
src/renderer/hooks/batch/useBatchProcessor.ts) don't silently swallow
exceptions: catch the error, call captureException(error, { sessionId, cwd:
session.cwd }) from the Sentry utilities (import from src/utils/sentry.ts) to
log context, then handle expected recoverable errors explicitly or re-throw
unexpected ones so Sentry can capture them; ensure the fallback to in-memory
counters only occurs after logging/explicit handling.
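
For the renderer path, the fallback can be made explicit by wrapping the history read so that a failure is reported before the caller keeps its in-memory counters. The sketch below assumes a hypothetical `report` callback standing in for the Sentry helper and a `getAll` function standing in for window.maestro.history.getAll(); neither signature is confirmed by this review.

```typescript
// Hypothetical reporter signature standing in for the project's Sentry helper.
type ReportError = (err: unknown, context: Record<string, unknown>) => void;

// Returns the history entries, or null after reporting a failure with context.
// A null result tells the caller to keep its in-memory counters.
async function readHistoryOrNull<T>(
	getAll: () => Promise<T[]>,
	report: ReportError,
	context: { sessionId: string; cwd: string }
): Promise<T[] | null> {
	try {
		return await getAll();
	} catch (err) {
		report(err, context);
		return null;
	}
}
```

This variant always falls back after reporting; a stricter version would re-throw errors that are not known to be recoverable, as the coding guideline quoted above asks.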

Development

Successfully merging this pull request may close these issues.

Auto Run summary only reports last loop iteration, not cumulative session stats
