fix: reconcile Auto Run summary with cumulative session stats #735
pedramamini wants to merge 3 commits into main
- Standardized multi-phase auto-run docs into one flat, dated subdirectory 📁
- Explicitly banned nested project/feature folder structures for phase outputs 🚫
- Improved guidance for clean lexicographic sorting with zero-padded phases 🔢
- Made it easier to add entire effort folders to auto-run at once ➕
- Clarified organization rules so related phase documents stay tightly grouped 🧭
…734)
In-memory cumulative counters (tasks, tokens, cost, duration) reset when startBatchRun is re-invoked (e.g. after app restart), but individual task history entries persist on disk. For long-running sessions spanning restarts, the final summary only reflected the last invocation's stats.
Now, before generating the summary, we read all persisted history entries for the session and compute cumulative totals. The summary uses the max of in-memory counters vs history-derived totals, ensuring accuracy regardless of how many times the batch processor was restarted.
Fixes both the renderer (useBatchProcessor) and CLI (batch-processor) codepaths.
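The reconciliation described above can be sketched roughly as follows. This is a minimal illustration rather than the PR's actual code: the `RunTotals` shape and the `isTaskEntry` and `reconcileTotals` names are assumptions made for the example, and `HistoryEntry` is limited to the fields that appear in the diff.

```typescript
// Fields limited to those referenced in this PR; the real HistoryEntry may carry more.
interface HistoryEntry {
  type: string;
  summary?: string;
  usageStats?: { inputTokens?: number; outputTokens?: number; totalCostUsd?: number };
  elapsedTimeMs?: number;
}

// Hypothetical container for the five counters the summary reports.
interface RunTotals {
  tasks: number;
  inputTokens: number;
  outputTokens: number;
  cost: number;
  elapsedMs: number;
}

// Summary prefixes that mark loop, session, and PR entries rather than individual tasks.
const NON_TASK_PREFIXES = ['Loop ', 'Auto Run ', 'PR created', 'PR creation failed'];

function isTaskEntry(entry: HistoryEntry): boolean {
  const summary = entry.summary;
  if (entry.type !== 'AUTO' || !summary) return false;
  return !NON_TASK_PREFIXES.some((prefix) => summary.startsWith(prefix));
}

function reconcileTotals(inMemory: RunTotals, entries: HistoryEntry[]): RunTotals {
  const taskEntries = entries.filter(isTaskEntry);
  // History only wins when it knows about more tasks than the current process does.
  if (taskEntries.length <= inMemory.tasks) return inMemory;

  const fromHistory: RunTotals = {
    tasks: taskEntries.length,
    inputTokens: 0,
    outputTokens: 0,
    cost: 0,
    elapsedMs: 0,
  };
  for (const entry of taskEntries) {
    if (entry.usageStats) {
      fromHistory.inputTokens += entry.usageStats.inputTokens || 0;
      fromHistory.outputTokens += entry.usageStats.outputTokens || 0;
      fromHistory.cost += entry.usageStats.totalCostUsd || 0;
    }
    fromHistory.elapsedMs += entry.elapsedTimeMs || 0;
  }

  // Take the larger value per counter, as the commit message describes.
  return {
    tasks: Math.max(inMemory.tasks, fromHistory.tasks),
    inputTokens: Math.max(inMemory.inputTokens, fromHistory.inputTokens),
    outputTokens: Math.max(inMemory.outputTokens, fromHistory.outputTokens),
    cost: Math.max(inMemory.cost, fromHistory.cost),
    elapsedMs: Math.max(inMemory.elapsedMs, fromHistory.elapsedMs),
  };
}
```

Keeping the reconciliation in a pure function like this would make it straightforward to share between the renderer and CLI paths, though whether that fits the codebase is a judgment call for the author.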
📝 Walkthrough
This pull request adds cumulative session statistics reconciliation to Auto Run summaries by reading persisted history entries from disk and cross-validating in-memory counters. It also includes documentation formatting updates and a clarification on multi-phase effort directory structure.
Sequence Diagram
sequenceDiagram
participant Client as Client/Renderer
participant Processor as Batch Processor
participant History as History Persistence
participant Summary as Summary Generator
Client->>Processor: Stop auto-run session
Processor->>History: Read all AUTO entries for session.id
History-->>Processor: Return history entries
Processor->>Processor: Filter non-task entries (exclude Loop/PR/summary types)
Processor->>Processor: Aggregate tokens & elapsed from history
alt History task count > in-memory count
Processor->>Processor: Update totals using Math.max(history, in-memory)
else
Processor->>Processor: Keep in-memory totals
end
Processor->>Summary: Compose final summary with reconciled totals
Summary-->>Client: Emit "Auto Run completed" with cumulative stats
Note over Client,Processor: Accurate session stats across all loop iterations
Estimated code review effort
🎯 3 (Moderate) | ⏱️ ~25 minutes
🚥 Pre-merge checks | ✅ 4 | ❌ 1
❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
Greptile Summary
This PR fixes a real bug where Auto Run summary stats were reset to zero on app restart. It does so by reconciling in-memory counters against persisted history entries before writing the final summary. The fix is applied to both the renderer (useBatchProcessor) and CLI (batch-processor) code paths.
However, the reconciliation reads all history entries for the session without scoping them to the current Auto Run invocation by timestamp. This creates a regression for the more common case where a user runs Auto Run more than once on the same agent session: every subsequent run will absorb all prior task entries and report inflated task counts, token usage, and cost in its summary.
Confidence Score: 2/5
Not safe to merge as-is: the reconciliation logic will produce inflated Auto Run summaries for any session that has more than one completed Auto Run invocation.
The intended fix (handling app-restart continuity) is correct in principle, but the implementation has a clear logic bug that regresses the more common case of running Auto Run multiple times on the same agent. The missing timestamp boundary means all prior task history is summed into every new summary. The fix is also duplicated across two code paths that must stay in sync.
src/renderer/hooks/batch/useBatchProcessor.ts and src/cli/services/batch-processor.ts, specifically the history filter predicate that lacks a batchStartTime lower bound.
Sequence Diagram
sequenceDiagram
participant User
participant BatchProcessor
participant HistoryStore
User->>BatchProcessor: Stop / complete Auto Run
BatchProcessor->>HistoryStore: getAll(cwd, sessionId)
HistoryStore-->>BatchProcessor: ALL entries for session (no timestamp bound)
BatchProcessor->>BatchProcessor: filter to AUTO task entries
Note over BatchProcessor: ⚠️ Includes entries from prior completed runs
alt taskEntries.length > totalCompletedTasks
BatchProcessor->>BatchProcessor: sum tokens/cost from ALL task entries
BatchProcessor->>BatchProcessor: Math.max(inMemory, historyDerived)
Note over BatchProcessor: Correct for restart scenario<br/>Overcounts for repeat-run scenario
else
BatchProcessor->>BatchProcessor: Use in-memory counters unchanged
end
BatchProcessor->>HistoryStore: addHistoryEntry(finalSummary)
BatchProcessor-->>User: Show Auto Run summary
Reviews (1): Last reviewed commit: "style: format docs/releases.md with pret..."
const allEntries = await window.maestro.history.getAll(session.cwd, sessionId);
if (Array.isArray(allEntries) && allEntries.length > 0) {
  // Filter to individual task entries (exclude loop/session summaries)
  const taskEntries = allEntries.filter(
    (e) =>
      e.type === 'AUTO' &&
      e.summary &&
      !e.summary.startsWith('Loop ') &&
      !e.summary.startsWith('Auto Run ') &&
      !e.summary.startsWith('PR created') &&
      !e.summary.startsWith('PR creation failed')
  );
History entries not scoped to the current Auto Run invocation
window.maestro.history.getAll(session.cwd, sessionId) returns all history entries ever written for this session — including entries from previous, fully-completed Auto Run sessions on the same agent. The reconciliation guard taskEntries.length > totalCompletedTasks (line 1624) fires at the start of every second run on the same agent because the old entries are still present in history.
Concrete failure scenario:
- Run #1 on agent A completes normally → 30 task entries written to history.
- Run #2 on agent A completes 20 tasks.
- At summary time: taskEntries.length = 50 (30 old + 20 new), totalCompletedTasks = 20.
- Guard fires → summary reports 50 tasks / all tokens / all cost from both runs instead of the correct 20.
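The arithmetic in this scenario can be reproduced with a small sketch. Everything here is illustrative: `TimedEntry`, `makeTaskEntries`, `countTaskEntries`, and the timestamps are made up for the example, and the scoped variant assumes the start time of the logical run (`runTwoStartedAt`) is known and survives a restart, which nothing in the PR confirms.

```typescript
// Minimal entry shape for this illustration; `timestamp` is in milliseconds.
interface TimedEntry {
  type: string;
  summary: string;
  timestamp: number;
}

// Builds `count` plain task entries starting at `startMs`, one millisecond apart.
function makeTaskEntries(count: number, startMs: number): TimedEntry[] {
  const entries: TimedEntry[] = [];
  for (let i = 0; i < count; i++) {
    entries.push({ type: 'AUTO', summary: `Task ${i + 1}`, timestamp: startMs + i });
  }
  return entries;
}

// Counts AUTO entries, optionally keeping only those at or after `sinceMs`.
function countTaskEntries(entries: TimedEntry[], sinceMs?: number): number {
  return entries.filter(
    (e) => e.type === 'AUTO' && (sinceMs === undefined || e.timestamp >= sinceMs)
  ).length;
}

const runTwoStartedAt = 10000; // hypothetical start time of run #2
const sessionEntries = makeTaskEntries(30, 1000).concat(makeTaskEntries(20, runTwoStartedAt));
const inMemoryCompleted = 20; // what run #2 counted for itself

const unscopedCount = countTaskEntries(sessionEntries); // 50: both runs
const guardFires = unscopedCount > inMemoryCompleted; // true
const reportedTasks = Math.max(inMemoryCompleted, unscopedCount); // 50, inflated
const scopedCount = countTaskEntries(sessionEntries, runTwoStartedAt); // 20: run #2 only
```

With a time window applied, the history-derived count matches the in-memory count and the guard no longer fires for a run that never restarted.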
The same issue exists in src/cli/services/batch-processor.ts around line 309.
Fix: add a batchStartTime bound to the filter so that only entries that pre-date the current invocation are pulled in for reconciliation, while entries written during this invocation are already captured by in-memory counters:
const taskEntries = allEntries.filter(
(e) =>
e.type === 'AUTO' &&
e.summary &&
!e.summary.startsWith('Loop ') &&
!e.summary.startsWith('Auto Run ') &&
!e.summary.startsWith('PR created') &&
!e.summary.startsWith('PR creation failed') &&
e.timestamp < batchStartTime // only include pre-restart entries
);
// In-memory totalCompletedTasks already covers the current invocation;
// add taskEntries.length for prior-restart invocations:
finalTotalTasks = totalCompletedTasks + taskEntries.length;
const allEntries = readHistory(undefined, session.id);
if (allEntries.length > 0) {
  const taskEntries = allEntries.filter(
    (e) =>
      e.type === 'AUTO' &&
      e.summary &&
      !e.summary.startsWith('Loop ') &&
      !e.summary.startsWith('Auto Run ') &&
      !e.summary.startsWith('PR created') &&
      !e.summary.startsWith('PR creation failed')
  );

  if (taskEntries.length > finalTotalTasks) {
Same over-counting issue as in the renderer path
readHistory(undefined, session.id) returns all history for the session with no time boundary. After a successfully-completed Auto Run, launching a second batch on the same session will trigger the reconciliation guard (old entries outnumber the new in-memory tasks) and inflate the summary stats.
Apply the same e.timestamp < batchStartTime filter described in the renderer comment:
const taskEntries = allEntries.filter(
(e) =>
e.type === 'AUTO' &&
e.summary &&
!e.summary.startsWith('Loop ') &&
!e.summary.startsWith('Auto Run ') &&
!e.summary.startsWith('PR created') &&
!e.summary.startsWith('PR creation failed') &&
e.timestamp < batchStartTime // scope to prior-restart entries only
);
} catch {
  // Fall back to in-memory counters if history read fails
}
Silent catch block swallows unexpected errors
An empty catch {} silences anything beyond the expected "history not found" case — e.g., a malformed IPC response or a type assertion failure. Per the project's guidelines in CLAUDE.md, unexpected errors should be logged and surfaced to Sentry rather than disappearing silently.
-} catch {
-  // Fall back to in-memory counters if history read fails
-}
+} catch (err) {
+  // Fall back to in-memory counters if history read fails
+  console.warn('[BatchProcessor] History reconciliation failed, using in-memory counters', err);
+}
} catch {
  // Fall back to in-memory counters if history read fails
}
Silent catch hides reconciliation failures in the CLI path
Same pattern as the renderer: unexpected errors are swallowed without any trace. At a minimum, log with the project's logger so failures are visible in debug output:
-} catch {
-  // Fall back to in-memory counters if history read fails
-}
+} catch (err) {
+  // Fall back to in-memory counters if history read fails
+  logger.warn('History reconciliation failed, using in-memory counters', String(err));
+}
Actionable comments posted: 6
🧹 Nitpick comments (2)
docs/releases.md (2)
20-20: Fix heading hierarchy regression (# used where nested headings are expected).
These headings are nested under release sections but were changed to top-level #, which can break document outline/TOC structure. Please demote them to ### (or ## where appropriate) consistently.
Proposed markdown fix
-# Major 0.15.x Additions
+### Major 0.15.x Additions
-# Smaller Changes in 014.x
+### Smaller Changes in 0.14.x
-# Other Changes
+### Other Changes
-# Onboarding, Wizard, and Tours
+### Onboarding, Wizard, and Tours
-# UI / UX Enhancements
+### UI / UX Enhancements
-# Auto Run Workflow Improvements
+### Auto Run Workflow Improvements
-# Application Behavior / Core Fixes
+### Application Behavior / Core Fixes
-# Update System
+### Update System
Also applies to: 80-80, 176-176, 268-268, 276-276, 286-286, 291-291, 297-297
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@docs/releases.md` at line 20, Several headings in docs/releases.md (e.g., the "Major 0.15.x Additions" heading and the other occurrences noted at lines 80, 176, 268, 276, 286, 291, 297) were mistakenly rendered as top-level `#` headings; change those `#` headings to the appropriate nested level (use `###` for subsection items or `##` where they are direct children of a release section) so the document outline/TOC hierarchy is restored—scan for headings using a single `#` in that file and demote them consistently to `###` (or `##` when it should be a direct child of a release header).
439-448: Use a single bullet style for Markdown list consistency.
These sections use • while surrounding lists use -. Standardizing improves renderer compatibility and keeps formatting consistent.
Proposed markdown fix
-• Enhanced mobile web interface with session sync and history panel 📱
+- Enhanced mobile web interface with session sync and history panel 📱
...
-• Added splash screen with logo and progress bar during startup 🎨
+- Added splash screen with logo and progress bar during startup 🎨
-• Added template variables for dynamic AI command customization 🎯
+- Added template variables for dynamic AI command customization 🎯
...
-• Updated documentation with new features and template references 📖
+- Updated documentation with new features and template references 📖
Also applies to: 462-471
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@docs/releases.md` around lines 439 - 448, The Markdown release notes use mixed bullet characters (the diff shows a bullet list starting with "Enhanced mobile web interface with session sync and history panel 📱" that uses • while other lists use -); standardize all lists to a single bullet style by replacing every occurrence of the • bullets in this section (and the similar block around "Also applies to: 462-471") with the hyphen (-) form so all Markdown lists use the same dash-style bullets.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@docs/releases.md`:
- Line 74: Fix the user-facing typo in the docs string "🗄️ Document Graphs.
Launch from file preview or from the FIle tree panel." by changing the token
"FIle" to "File" so the sentence reads "Launch from file preview or from the
File tree panel."; edit the string in docs/releases.md accordingly.
In `@src/cli/services/batch-processor.ts`:
- Around line 298-338: The reconciled totals (finalTotalTasks,
finalTotalInputTokens, finalTotalOutputTokens, finalTotalCost,
finalTotalElapsedMs) are computed but never used when emitting the terminal
"complete" event, so update the generator/emit logic that currently sends
totalTasksCompleted, totalElapsedMs, and totalCost to use the corresponding
finalTotal* variables instead; locate the code that emits the terminal
complete/summary event (the generator that emits the final CLI/JSONL complete
event) and replace references to the in-memory counters (totalCompletedTasks,
totalInputTokens, totalOutputTokens, totalCost, elapsedMs) with finalTotalTasks,
finalTotalInputTokens, finalTotalOutputTokens, finalTotalCost,
finalTotalElapsedMs so resumed runs report the reconciled totals.
- Around line 341-343: Replace the empty catch after readHistory() so the
exception is not swallowed: change to catch (err) { import and call the Sentry
helper captureException (or captureMessage) from src/utils/sentry.ts with
contextual data including session.id (e.g., captureException(err, { extra: {
sessionId: session.id } })), also emit a concise processLogger.warn/error that
includes session.id and the error message; if you can identify
expected/recoverable errors (e.g., err.code === 'NETWORK_ERROR') allow the
fallback to in-memory totals, otherwise re-throw the error so it bubbles to
Sentry/upper layers. }
In `@src/prompts/maestro-system-prompt.md`:
- Line 48: Update the multi-phase effort guidance in the four system prompt
files (wizard-inline-system.md, wizard-inline-new.md,
wizard-inline-iterate-generation.md, and wizard-document-generation.md) to
exactly match the new directive in maestro-system-prompt.md: state that
multi-phase efforts with 3+ phase documents should be placed in a single flat
subdirectory directly under {{AUTORUN_FOLDER}} prefixed with today's date
(example format YYYY-MM-DD-Feature-Name/FEATURE-NAME-01.md) and explicitly
prohibit nested subdirectories (e.g., “Do NOT create nested subdirectories — all
phase documents for a given effort go into one folder, never project/feature/
nesting”); replace the outdated wording in the existing multi-phase guidance
blocks with this precise phrasing so all prompts are consistent.
In `@src/renderer/hooks/batch/useBatchProcessor.ts`:
- Around line 1648-1650: When calling window.maestro.history.getAll() in
useBatchProcessor (in src/renderer/hooks/batch/useBatchProcessor.ts) don't
silently swallow exceptions: catch the error, call captureException(error, {
sessionId, cwd: session.cwd }) from the Sentry utilities (import from
src/utils/sentry.ts) to log context, then handle expected recoverable errors
explicitly or re-throw unexpected ones so Sentry can capture them; ensure the
fallback to in-memory counters only occurs after logging/explicit handling.
- Around line 1613-1645: The current filter that builds taskEntries incorrectly
includes "Document stalled:" AUTO entries; update the predicate used when
creating taskEntries in useBatchProcessor.ts to also exclude summaries that
start with "Document stalled:" (e.g., add && !e.summary.startsWith('Document
stalled:')). To keep CLI and renderer logic aligned, extract this predicate into
a shared exported helper (e.g., isIndividualTaskEntry or isAutoTaskEntry) and
import/use that helper both in useBatchProcessor.ts (where taskEntries is
computed) and in src/cli/services/batch-processor.ts so both paths share the
same exclusion rules.
📒 Files selected for processing (4)
- docs/releases.md
- src/cli/services/batch-processor.ts
- src/prompts/maestro-system-prompt.md
- src/renderer/hooks/batch/useBatchProcessor.ts
The major contributions to 0.14.x remain:

🗄️ Document Graphs. Launch from file preview or from the FIle tree panel. Explore relationships between Markdown documents that contain links between documents and to URLs.
Correct typo in user-facing text (FIle → File).
Small but visible docs typo in “File tree panel”.
let finalTotalTasks = totalCompletedTasks;
let finalTotalInputTokens = totalInputTokens;
let finalTotalOutputTokens = totalOutputTokens;
let finalTotalCost = totalCost;
let finalTotalElapsedMs = Date.now() - batchStartTime;

// Reconcile in-memory counters with persisted history entries.
// In-memory counters reset on process restart, but history entries persist on disk.
try {
  const allEntries = readHistory(undefined, session.id);
  if (allEntries.length > 0) {
    const taskEntries = allEntries.filter(
      (e) =>
        e.type === 'AUTO' &&
        e.summary &&
        !e.summary.startsWith('Loop ') &&
        !e.summary.startsWith('Auto Run ') &&
        !e.summary.startsWith('PR created') &&
        !e.summary.startsWith('PR creation failed')
    );

    if (taskEntries.length > finalTotalTasks) {
      let historyInputTokens = 0;
      let historyOutputTokens = 0;
      let historyCost = 0;
      let historyElapsedMs = 0;

      for (const entry of taskEntries) {
        if (entry.usageStats) {
          historyInputTokens += entry.usageStats.inputTokens || 0;
          historyOutputTokens += entry.usageStats.outputTokens || 0;
          historyCost += entry.usageStats.totalCostUsd || 0;
        }
        historyElapsedMs += entry.elapsedTimeMs || 0;
      }

      finalTotalTasks = Math.max(finalTotalTasks, taskEntries.length);
      finalTotalInputTokens = Math.max(finalTotalInputTokens, historyInputTokens);
      finalTotalOutputTokens = Math.max(finalTotalOutputTokens, historyOutputTokens);
      finalTotalCost = Math.max(finalTotalCost, historyCost);
      finalTotalElapsedMs = Math.max(finalTotalElapsedMs, historyElapsedMs);
Reuse the reconciled totals for the terminal complete event.
finalTotal* never leave this helper, so the generator still emits totalTasksCompleted, totalElapsedMs, and totalCost from the current process only. After a resume, CLI/JSONL consumers will still see underreported totals even though the persisted summary entry is corrected.
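One way to address this is to compute the reconciled totals once and feed the same object to both the persisted summary and the emitted event. The sketch below is only an illustration under assumptions: `SummaryTotals`, `reconcileForSummary`, `toCompleteEvent`, and the `CompleteEvent` shape are hypothetical, and only the `totalTasksCompleted`, `totalElapsedMs`, and `totalCost` field names are taken from the comment above.

```typescript
// Hypothetical totals shape shared by the summary writer and the event emitter.
interface SummaryTotals {
  tasks: number;
  cost: number;
  elapsedMs: number;
}

// Hypothetical event shape; only the three total* field names come from the review comment.
interface CompleteEvent {
  type: 'complete';
  totalTasksCompleted: number;
  totalElapsedMs: number;
  totalCost: number;
}

// Takes the larger value per counter, mirroring the Math.max reconciliation in the diff.
function reconcileForSummary(inMemory: SummaryTotals, fromHistory: SummaryTotals): SummaryTotals {
  return {
    tasks: Math.max(inMemory.tasks, fromHistory.tasks),
    cost: Math.max(inMemory.cost, fromHistory.cost),
    elapsedMs: Math.max(inMemory.elapsedMs, fromHistory.elapsedMs),
  };
}

// Builds the terminal event from reconciled totals instead of the raw in-memory counters.
function toCompleteEvent(totals: SummaryTotals): CompleteEvent {
  return {
    type: 'complete',
    totalTasksCompleted: totals.tasks,
    totalElapsedMs: totals.elapsedMs,
    totalCost: totals.cost,
  };
}

// Example: a resumed process saw 4 tasks, while history records 12 for the run.
const reconciledTotals = reconcileForSummary(
  { tasks: 4, cost: 0.4, elapsedMs: 60000 },
  { tasks: 12, cost: 1.2, elapsedMs: 180000 }
);
const completeEvent = toCompleteEvent(reconciledTotals);
```

Returning the totals from the helper, rather than keeping them in local variables, is what lets the generator reuse them.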
} catch {
  // Fall back to in-memory counters if history read fails
}
Don't silently swallow readHistory() failures.
Falling back to in-memory totals is fine, but the empty catch hides the exact failure mode this reconciliation depends on. Please capture/log the exception with session.id before continuing.
As per coding guidelines, "Do not silently swallow errors. Let unhandled exceptions bubble up to Sentry for error tracking in production. Handle expected/recoverable errors explicitly (e.g., NETWORK_ERROR). For unexpected errors, re-throw them to allow Sentry to capture them. Use Sentry utilities (captureException, captureMessage) from src/utils/sentry.ts for explicit error reporting with context."
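A minimal sketch of that guideline applied to a history read might look like the following. The names are assumptions: `ReportError` stands in for a reporter such as `captureException` (whose real signature is not shown on this page), and `readEntriesOrFallback` and `isExpectedError` are hypothetical helpers. The `NETWORK_ERROR` code is used only because the guideline cites it as an example.

```typescript
// Stand-in for an error reporter; the real helper's signature is an assumption here.
type ReportError = (error: unknown, context: { sessionId: string }) => void;

// Treats errors carrying code === 'NETWORK_ERROR' as expected, per the guideline's example.
function isExpectedError(error: unknown): boolean {
  return (
    typeof error === 'object' &&
    error !== null &&
    (error as { code?: unknown }).code === 'NETWORK_ERROR'
  );
}

function readEntriesOrFallback<T>(
  readEntries: () => T[],
  sessionId: string,
  report: ReportError
): T[] {
  try {
    return readEntries();
  } catch (error) {
    // Record the failure with context before deciding how to handle it.
    report(error, { sessionId });
    if (isExpectedError(error)) {
      return []; // recoverable: the caller keeps its in-memory counters
    }
    throw error; // unexpected: let it bubble up to the caller
  }
}
```

An empty array here means "nothing to reconcile", so the caller's in-memory totals pass through unchanged on a recoverable failure.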
 When a user wants an auto-run document (or playbook), create a detailed multi-document, multi-point Markdown implementation plan in the `{{AUTORUN_FOLDER}}` folder. Use the format `$PREFIX-XX.md`, where `XX` is the two-digit phase number (01, 02, etc.) and `$PREFIX` is the effort name. Always zero-pad phase numbers to ensure correct lexicographic sorting. Break phases by relevant context; do not mix unrelated task results in the same document. If working within a file, group and fix all type issues in that file together. If working with an MCP, keep all related tasks in the same document. Each task must be written as `- [ ] ...` so auto-run can execute and check them off with comments on completion.

-**Multi-phase efforts:** When creating 3 or more phase documents for a single effort, place them in a dedicated subdirectory prefixed with today's date (e.g., `{{AUTORUN_FOLDER}}/YYYY-MM-DD-Feature-Name/FEATURE-NAME-01.md`). This allows users to add the entire folder at once and keeps related documents organized with a clear creation date.
+**Multi-phase efforts:** When creating 3 or more phase documents for a single effort, place them in a single flat subdirectory directly under `{{AUTORUN_FOLDER}}`, prefixed with today's date (e.g., `{{AUTORUN_FOLDER}}/YYYY-MM-DD-Feature-Name/FEATURE-NAME-01.md`). Do NOT create nested subdirectories — all phase documents for a given effort go into one folder, never `project/feature/` nesting. This allows users to add the entire folder at once and keeps related documents organized with a clear creation date.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Description: Find all files with multi-phase effort directory guidance to ensure consistency
# Search for references to multi-phase efforts and YYYY-MM-DD-Feature-Name pattern
rg -n -C3 --type=md "multi-phase|Multi-phase" | grep -A3 -B3 "YYYY-MM-DD"
# Also search for the specific example pattern
rg -n --type=md "YYYY-MM-DD-Feature-Name"
Repository: RunMaestro/Maestro
Length of output: 8495
Update multi-phase effort guidance across all system prompt files.
The new guidance in maestro-system-prompt.md explicitly prohibits nested subdirectories ("Do NOT create nested subdirectories — all phase documents for a given effort go into one folder, never project/feature/ nesting"), but four other files retain outdated wording that omits this critical prohibition:
- src/prompts/wizard-inline-system.md:51
- src/prompts/wizard-inline-new.md:34
- src/prompts/wizard-inline-iterate-generation.md:201
- src/prompts/wizard-document-generation.md:278
Update these to match the new directive: use "single flat subdirectory directly under" and include the explicit prohibition against nested structures.
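To make the flat layout concrete, the naming rule can be expressed as a small path builder. This is a sketch under assumptions: `padPhase` and `phaseDocPath` are hypothetical helpers, the slug rules (spaces to hyphens, upper-cased file prefix) are inferred from the `YYYY-MM-DD-Feature-Name/FEATURE-NAME-01.md` example, and `'autorun'` is a placeholder for `{{AUTORUN_FOLDER}}`.

```typescript
// Formats a phase number as two digits (1 -> "01") so file names sort lexicographically.
function padPhase(phase: number): string {
  return phase < 10 ? `0${phase}` : String(phase);
}

// Builds "<autorunFolder>/YYYY-MM-DD-Feature-Name/FEATURE-NAME-XX.md".
// There is exactly one directory level under the Auto Run folder, never project/feature/ nesting.
function phaseDocPath(
  autorunFolder: string,
  effortName: string,
  phase: number,
  today: Date
): string {
  const datePrefix = today.toISOString().slice(0, 10); // UTC date as YYYY-MM-DD
  const folderSlug = effortName.trim().split(/\s+/).join('-'); // "Feature Name" -> "Feature-Name"
  const filePrefix = folderSlug.toUpperCase(); // "FEATURE-NAME"
  return `${autorunFolder}/${datePrefix}-${folderSlug}/${filePrefix}-${padPhase(phase)}.md`;
}

const exampleDay = new Date('2025-01-15T12:00:00Z'); // arbitrary date for the example
const phasePaths = [1, 2, 10].map((n) => phaseDocPath('autorun', 'Feature Name', n, exampleDay));
// phasePaths[0] is "autorun/2025-01-15-Feature-Name/FEATURE-NAME-01.md"
```

Because phases are zero-padded, a plain string sort of these paths keeps phase 10 after phase 02, which is the point of the padding rule.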
// Filter to individual task entries (exclude loop/session summaries)
const taskEntries = allEntries.filter(
  (e) =>
    e.type === 'AUTO' &&
    e.summary &&
    !e.summary.startsWith('Loop ') &&
    !e.summary.startsWith('Auto Run ') &&
    !e.summary.startsWith('PR created') &&
    !e.summary.startsWith('PR creation failed')
);

if (taskEntries.length > totalCompletedTasks) {
  const historyTasks = taskEntries.length;
  let historyInputTokens = 0;
  let historyOutputTokens = 0;
  let historyCost = 0;
  let historyElapsedMs = 0;

  for (const entry of taskEntries) {
    if (entry.usageStats) {
      historyInputTokens += entry.usageStats.inputTokens || 0;
      historyOutputTokens += entry.usageStats.outputTokens || 0;
      historyCost += entry.usageStats.totalCostUsd || 0;
    }
    historyElapsedMs += entry.elapsedTimeMs || 0;
  }

  // Use history-derived totals when they exceed in-memory counters
  totalCompletedTasks = Math.max(totalCompletedTasks, historyTasks);
  totalInputTokens = Math.max(totalInputTokens, historyInputTokens);
  totalOutputTokens = Math.max(totalOutputTokens, historyOutputTokens);
  totalCost = Math.max(totalCost, historyCost);
  totalElapsedMs = Math.max(totalElapsedMs, historyElapsedMs);
Exclude Document stalled: entries from the reconciliation set.
Those entries are also AUTO and are written at Lines 1287-1295, so this filter will count them as completed tasks and inflate the final stop/completion summary. Please exclude them explicitly here, and ideally share the predicate with src/cli/services/batch-processor.ts so the two paths stay aligned.
Minimal fix
+const isAutoRunTaskHistoryEntry = (entry: HistoryEntry) =>
+ entry.type === 'AUTO' &&
+ !!entry.summary &&
+ !entry.summary.startsWith('Loop ') &&
+ !entry.summary.startsWith('Auto Run ') &&
+ !entry.summary.startsWith('PR created') &&
+ !entry.summary.startsWith('PR creation failed') &&
+ !entry.summary.startsWith('Document stalled:');
+
...
-const taskEntries = allEntries.filter(
- (e) =>
- e.type === 'AUTO' &&
- e.summary &&
- !e.summary.startsWith('Loop ') &&
- !e.summary.startsWith('Auto Run ') &&
- !e.summary.startsWith('PR created') &&
- !e.summary.startsWith('PR creation failed')
-);
+const taskEntries = allEntries.filter(isAutoRunTaskHistoryEntry);
📝 Committable suggestion
// Filter to individual task entries (exclude loop/session summaries)
const isAutoRunTaskHistoryEntry = (entry: HistoryEntry) =>
  entry.type === 'AUTO' &&
  !!entry.summary &&
  !entry.summary.startsWith('Loop ') &&
  !entry.summary.startsWith('Auto Run ') &&
  !entry.summary.startsWith('PR created') &&
  !entry.summary.startsWith('PR creation failed') &&
  !entry.summary.startsWith('Document stalled:');
const taskEntries = allEntries.filter(isAutoRunTaskHistoryEntry);
if (taskEntries.length > totalCompletedTasks) {
  const historyTasks = taskEntries.length;
  let historyInputTokens = 0;
  let historyOutputTokens = 0;
  let historyCost = 0;
  let historyElapsedMs = 0;
  for (const entry of taskEntries) {
    if (entry.usageStats) {
      historyInputTokens += entry.usageStats.inputTokens || 0;
      historyOutputTokens += entry.usageStats.outputTokens || 0;
      historyCost += entry.usageStats.totalCostUsd || 0;
    }
    historyElapsedMs += entry.elapsedTimeMs || 0;
  }
  // Use history-derived totals when they exceed in-memory counters
  totalCompletedTasks = Math.max(totalCompletedTasks, historyTasks);
  totalInputTokens = Math.max(totalInputTokens, historyInputTokens);
  totalOutputTokens = Math.max(totalOutputTokens, historyOutputTokens);
  totalCost = Math.max(totalCost, historyCost);
  totalElapsedMs = Math.max(totalElapsedMs, historyElapsedMs);
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/renderer/hooks/batch/useBatchProcessor.ts` around lines 1613 - 1645, The
current filter that builds taskEntries incorrectly includes "Document stalled:"
AUTO entries; update the predicate used when creating taskEntries in
useBatchProcessor.ts to also exclude summaries that start with "Document
stalled:" (e.g., add && !e.summary.startsWith('Document stalled:')). To keep CLI
and renderer logic aligned, extract this predicate into a shared exported helper
(e.g., isIndividualTaskEntry or isAutoTaskEntry) and import/use that helper both
in useBatchProcessor.ts (where taskEntries is computed) and in
src/cli/services/batch-processor.ts so both paths share the same exclusion
rules.
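A minimal sketch of how the shared predicate and the aggregation it feeds might look, assuming simplified entry shapes. `HistoryEntryLike`, `UsageStatsLike`, `RunTotals`, `NON_TASK_SUMMARY_PREFIXES`, and `reconcileTotals` are hypothetical names introduced for illustration; only the field names and summary prefixes come from the suggestion above.

```typescript
// Hypothetical minimal shapes; the real HistoryEntry type likely has more fields.
interface UsageStatsLike {
  inputTokens?: number;
  outputTokens?: number;
  totalCostUsd?: number;
}

interface HistoryEntryLike {
  type: string;
  summary?: string;
  usageStats?: UsageStatsLike;
  elapsedTimeMs?: number;
}

interface RunTotals {
  tasks: number;
  inputTokens: number;
  outputTokens: number;
  cost: number;
  elapsedMs: number;
}

// Summary prefixes that mark loop, session, PR, and stall entries rather than tasks.
const NON_TASK_SUMMARY_PREFIXES = [
  'Loop ',
  'Auto Run ',
  'PR created',
  'PR creation failed',
  'Document stalled:',
];

// In a real codebase this would be exported from a module shared by the
// renderer and the CLI so both paths apply the same exclusion rules.
function isAutoRunTaskHistoryEntry(entry: HistoryEntryLike): boolean {
  const summary = entry.summary;
  if (entry.type !== 'AUTO' || !summary) return false;
  return !NON_TASK_SUMMARY_PREFIXES.some((prefix) => summary.startsWith(prefix));
}

// Returns the larger of the in-memory and history-derived value for each counter.
function reconcileTotals(inMemory: RunTotals, allEntries: HistoryEntryLike[]): RunTotals {
  const taskEntries = allEntries.filter(isAutoRunTaskHistoryEntry);
  // Same gate as the suggestion: reconcile only when history holds more tasks.
  if (taskEntries.length <= inMemory.tasks) return inMemory;

  const history: RunTotals = {
    tasks: taskEntries.length,
    inputTokens: 0,
    outputTokens: 0,
    cost: 0,
    elapsedMs: 0,
  };
  for (const entry of taskEntries) {
    history.inputTokens += entry.usageStats?.inputTokens ?? 0;
    history.outputTokens += entry.usageStats?.outputTokens ?? 0;
    history.cost += entry.usageStats?.totalCostUsd ?? 0;
    history.elapsedMs += entry.elapsedTimeMs ?? 0;
  }

  return {
    tasks: Math.max(inMemory.tasks, history.tasks),
    inputTokens: Math.max(inMemory.inputTokens, history.inputTokens),
    outputTokens: Math.max(inMemory.outputTokens, history.outputTokens),
    cost: Math.max(inMemory.cost, history.cost),
    elapsedMs: Math.max(inMemory.elapsedMs, history.elapsedMs),
  };
}
```

Keeping the prefixes in one list means a new non-task entry type only needs to be added in one place, which is the drift the review comment is trying to prevent.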
| } catch { | ||
| // Fall back to in-memory counters if history read fails | ||
| } |
Don't swallow history reconciliation errors.
If window.maestro.history.getAll() throws, the UI quietly falls back to partial totals and gives us no telemetry to debug why reconciliation failed. Capture the exception with sessionId/session.cwd context before using the fallback.
As per coding guidelines, "Do not silently swallow errors. Let unhandled exceptions bubble up to Sentry for error tracking in production. Handle expected/recoverable errors explicitly (e.g., NETWORK_ERROR). For unexpected errors, re-throw them to allow Sentry to capture them. Use Sentry utilities (captureException, captureMessage) from src/utils/sentry.ts for explicit error reporting with context."
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/renderer/hooks/batch/useBatchProcessor.ts` around lines 1648 - 1650, When
calling window.maestro.history.getAll() in useBatchProcessor (in
src/renderer/hooks/batch/useBatchProcessor.ts) don't silently swallow
exceptions: catch the error, call captureException(error, { sessionId, cwd:
session.cwd }) from the Sentry utilities (import from src/utils/sentry.ts) to
log context, then handle expected recoverable errors explicitly or re-throw
unexpected ones so Sentry can capture them; ensure the fallback to in-memory
counters only occurs after logging/explicit handling.
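A rough sketch of the report-then-fall-back shape this comment asks for. The reporter is passed in as a parameter because the exact signature of `captureException` in src/utils/sentry.ts is not shown here; `ReportError` and `readHistoryWithReporting` are hypothetical names, and whether unexpected errors should be re-thrown instead of falling back is left to the project's guidelines.

```typescript
// Hypothetical reporter signature; the real Sentry helper may differ.
type ReportError = (error: unknown, context: Record<string, unknown>) => void;

// Reads history, reporting any failure with session context before
// signalling the caller (via null) to fall back to in-memory counters.
async function readHistoryWithReporting<T>(
  readHistory: () => Promise<T[]>,
  reportError: ReportError,
  context: { sessionId: string; cwd: string },
): Promise<T[] | null> {
  try {
    return await readHistory();
  } catch (error) {
    // Report first so the fallback is never silent.
    reportError(error, context);
    return null;
  }
}
```

A caller would pass something like `() => window.maestro.history.getAll()` as `readHistory` and skip reconciliation when the result is `null`.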
Summary
useBatchProcessor.ts) and CLI (batch-processor.ts) codepaths
Closes #734
Root Cause
In-memory cumulative counters (`totalCompletedTasks`, `totalInputTokens`, `totalOutputTokens`, `totalCost`) are initialized fresh each time `startBatchRun` is called. For long-running sessions that survive app restarts, manual stop/restart cycles, or process kills, the final "Auto Run stopped" summary only reflected stats from the most recent invocation, while all individual task history entries persisted on disk across restarts.
Fix
Before generating the final summary, read all persisted history entries for the session and compute cumulative totals from task entries (filtering out loop/session summary entries). Use `Math.max(inMemory, historyDerived)` to ensure the summary is accurate regardless of restart history. Falls back gracefully to in-memory counters if history read fails.
Test plan
`npm run lint` and `npm test`: all pass
Summary by CodeRabbit
Documentation
Bug Fixes