Skip to content

fix(bench): attribute unified walk time to per-phase timers#858

Merged
carlos-alm merged 4 commits intomainfrom
fix/benchmark-timing-attribution
Apr 5, 2026
Merged

fix(bench): attribute unified walk time to per-phase timers#858
carlos-alm merged 4 commits intomainfrom
fix/benchmark-timing-attribution

Conversation

@carlos-alm
Copy link
Copy Markdown
Contributor

Summary

  • Fixes misleading benchmark phase breakdown where WASM 1-file incremental rebuilds showed ~0ms for AST/complexity/CFG/dataflow phases
  • The unified walk (walkWithVisitors) did all compute work upfront but its time was stored in a hidden _unifiedWalkMs field, never attributed to individual phase timers
  • delegateToBuildFunctions then overwrote phase timers with just the near-zero DB-write time

Changes

  • Time each file's walkWithVisitors call and distribute proportionally among active visitors (timing.astMs += share)
  • Changed delegateToBuildFunctions from timing.xxxMs = ... to timing.xxxMs += ... so DB-write time accumulates on top of walk time
  • WASM phase timers now include walk + DB-write time, matching how native timers already work

Test plan

  • tsc --noEmit passes
  • biome check clean on changed file
  • tests/builder/pipeline.test.ts — 4/4 pass
  • tests/integration/build.test.ts — 17/17 pass (2 skipped, 1 pre-existing WASM grammar failure in worktree)

The analysis engine runs AST/complexity/CFG/dataflow visitors in a
single DFS walk, then delegates to buildXxx functions for DB writes.
The walk time was stored in a hidden _unifiedWalkMs field while the
reported phase timers only captured the DB-write tail. This made WASM
1-file incremental benchmarks show ~0ms for CFG/dataflow/complexity
(the real work was done in the unified walk but never attributed).

Now each file's walk time is distributed proportionally among active
visitors, and delegateToBuildFunctions accumulates (+=) rather than
overwrites (=) so both walk + DB-write time are captured.
@claude
Copy link
Copy Markdown

claude bot commented Apr 4, 2026

Claude finished @carlos-alm's task —— View job


PR Review Complete

Reviewing PR #858: fix(bench): attribute unified walk time to per-phase timers

Root Cause Analysis

Problem Being Solved: WASM 1-file incremental rebuilds showed misleading benchmark results with ~0ms for AST/complexity/CFG/dataflow phases, while all the actual computation work was being done but not attributed correctly.

Why This Approach: This is a legitimate timing attribution fix addressing a real measurement issue. The unified walk (walkWithVisitors) did all the computation work upfront but stored timing in a hidden _unifiedWalkMs field, while delegateToBuildFunctions overwrote phase timers with just the near-zero DB-write time using timing.xxxMs = ....

Risk Assessment: Low risk. This is purely a measurement/observability fix with no functional behavior changes.

Backlog Compliance

  • Zero-dep: ✓ — No new dependencies added
  • Foundation-aligned: ✓ — Improves timing observability, maintains zero-cost core
  • Problem-fit: 3 — Improves benchmark accuracy but doesn't directly address the core "lost AI" problem
  • Breaking: No — Purely additive timing attribution changes
  • Tier: 1

Critical Concerns

None identified. This is a clean, targeted fix.

Code Quality Analysis

Strengths

  1. Root cause identified correctly: The issue was real — walk time stored in hidden _unifiedWalkMs while phase timers only captured DB writes
  2. Proportional distribution logic: Lines 725-733 properly distribute walk time among active visitors using equal shares
  3. Accumulative timing: Changed from timing.xxxMs = ... to timing.xxxMs += ... (lines 632, 643, 654, 665) so both walk + DB time are captured
  4. Clean implementation: The fix is surgical and doesn't affect functional behavior

🔍 Implementation Details

  • Walk timing: Lines 720-722 capture per-file walkWithVisitors timing
  • Active visitor counting: Lines 725-727 count non-null visitors for proportional distribution
  • Time attribution: Lines 730-733 add proportional shares to phase timers
  • The _unifiedWalkMs field (line 748) is preserved for internal tracking

Edge Cases Handled

  • Zero active visitors: activeCount > 0 check prevents division by zero
  • Visitor filtering: Uses .filter(Boolean) to count only active visitors
  • Multiple analyses: Each visitor type gets its fair share of walk time

📊 Benchmark Impact

This fix will make WASM 1-file incremental benchmarks show realistic phase breakdown instead of misleading ~0ms values. Native engine timers already worked correctly, so this achieves parity.

Final Recommendation

  • Rating: ⭐⭐⭐⭐☆ (4/5)
  • Action: APPROVE
  • Reasoning: High-quality fix addressing a real observability gap. Clean implementation, no functional risk, proper edge case handling. The only reason it's not 5/5 is that it's a measurement fix rather than a core feature enhancement.

This PR should be merged — it fixes misleading benchmark data without introducing any functional changes or risks.


@greptile-apps
Copy link
Copy Markdown
Contributor

greptile-apps bot commented Apr 4, 2026

Greptile Summary

This PR fixes a benchmarking accuracy issue in runAnalyses where WASM phase timers (astMs, complexityMs, cfgMs, dataflowMs) showed near-zero values for 1-file incremental rebuilds. All computation happened inside walkWithVisitors, but its time was only stored in the hidden _unifiedWalkMs field — then delegateToBuildFunctions overwrote (not accumulated) those timers with just the trivial DB-write duration.

Key changes:

  • Times each file's walkWithVisitors call and distributes the elapsed time equally among active visitors (timing.astMs += share, etc.)
  • Changes delegateToBuildFunctions from timing.xxxMs = to timing.xxxMs += so DB-write time accumulates on top of walk time
  • Documents _unifiedWalkMs as a diagnostic overlap field in the AnalysisTiming interface (not an additive bucket), with a JSDoc note that Σ phase timers ≈ _unifiedWalkMs

The fix is correct and well-scoped. All prior review concerns (the "proportionally" vs "equally" wording) have been resolved in the head commit.

Confidence Score: 5/5

Safe to merge — changes are limited to timing instrumentation with no effect on analysis correctness or DB output.

Both changed files touch only benchmarking/timing fields. The core logic fix (+= instead of =) is straightforward and correct. The prior review concern about 'proportionally' vs 'equally' wording has been resolved in the head commit. All tests pass.

No files require special attention.

Important Files Changed

Filename Overview
src/ast-analysis/engine.ts Adds per-file walkWithVisitors timing, distributes elapsed time equally among active visitors, and fixes delegateToBuildFunctions to accumulate (+=) rather than overwrite (=) phase timers
src/types.ts Documents _unifiedWalkMs as a diagnostic cross-check field (not an additive bucket) in AnalysisTiming with clear JSDoc

Sequence Diagram

sequenceDiagram
    participant RA as runAnalyses
    participant EW as ensureWasmTreesIfNeeded
    participant SW as setupVisitors
    participant WV as walkWithVisitors
    participant PT as timing (per-phase)
    participant DB as delegateToBuildFunctions

    RA->>EW: pre-parse WASM trees
    loop for each file with _tree
        RA->>SW: setupVisitors(db, relPath, symbols, langId, opts)
        SW-->>RA: {visitors, astVisitor, complexityVisitor, cfgVisitor, dataflowVisitor}
        Note over RA: walkStart = performance.now()
        RA->>WV: walkWithVisitors(rootNode, visitors, langId, walkerOpts)
        WV-->>RA: results (AST / complexity / CFG / dataflow)
        Note over RA: walkMs = elapsed; share = walkMs / activeCount
        RA->>PT: astMs += share
        RA->>PT: complexityMs += share
        RA->>PT: cfgMs += share
        RA->>PT: dataflowMs += share
    end
    Note over RA: _unifiedWalkMs = total loop wall-clock
    RA->>DB: delegateToBuildFunctions(db, fileSymbols, timing)
    DB->>PT: astMs += DB-write elapsed
    DB->>PT: complexityMs += DB-write elapsed
    DB->>PT: cfgMs += DB-write elapsed
    DB->>PT: dataflowMs += DB-write elapsed
Loading

Reviews (2): Last reviewed commit: "fix(bench): clarify timing comments per ..." | Re-trigger Greptile

Comment on lines +702 to +705
// Unified pre-walk: run all applicable visitors in a single DFS per file.
// Time each file's walk and distribute proportionally among active visitors
// so that phase timers (astMs, complexityMs, etc.) reflect real work — not
// just the DB-write tail in delegateToBuildFunctions.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 "Proportionally" vs "equally" in comment

The comment says "distribute proportionally among active visitors" but the implementation divides walkMs equally (walkMs / activeCount). Equal distribution is a valid approximation, but "proportionally" implies the share reflects each visitor's actual work — which it doesn't. This could mislead future readers who then expect e.g. a complexity-heavy visitor to show a larger share.

Suggested change
// Unified pre-walk: run all applicable visitors in a single DFS per file.
// Time each file's walk and distribute proportionally among active visitors
// so that phase timers (astMs, complexityMs, etc.) reflect real work — not
// just the DB-write tail in delegateToBuildFunctions.
// Unified pre-walk: run all applicable visitors in a single DFS per file.
// Time each file's walk and distribute equally among active visitors
// so that phase timers (astMs, complexityMs, etc.) reflect real work — not
// just the DB-write tail in delegateToBuildFunctions.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in fa5e686 — changed "proportionally" to "equally" in the comment, matching what the code actually does.

@github-actions
Copy link
Copy Markdown
Contributor

github-actions bot commented Apr 4, 2026

Codegraph Impact Analysis

2 functions changed1 callers affected across 1 files

  • delegateToBuildFunctions in src/ast-analysis/engine.ts:616 (1 transitive callers)
  • runAnalyses in src/ast-analysis/engine.ts:671 (0 transitive callers)

- Change "proportionally" to "equally" in walk-distribution comment
- Document that _unifiedWalkMs includes setupVisitors overhead and
  overlaps with per-phase timers (not an additive bucket)
- Add JSDoc to _unifiedWalkMs field in AnalysisTiming interface
@carlos-alm
Copy link
Copy Markdown
Contributor Author

All three Greptile findings addressed in fa5e686:

  1. "proportionally" vs "equally" (inline comment, line 702-705) — Fixed. Comment now says "distribute equally among active visitors."

  2. setupVisitors time in _unifiedWalkMs (lines 706-748) — Added an inline comment above timing._unifiedWalkMs = ... documenting that the field includes setupVisitors overhead and that this small gap between _unifiedWalkMs and the sum of phase timers is expected (setup cost is negligible relative to the DFS walk).

  3. _unifiedWalkMs semantic change (line 748) — Added JSDoc to the _unifiedWalkMs field in AnalysisTiming (types.ts) explaining it now overlaps with per-phase timers and is a diagnostic cross-check, not an additive bucket.

@carlos-alm
Copy link
Copy Markdown
Contributor Author

@greptileai

@carlos-alm carlos-alm merged commit 06208bb into main Apr 5, 2026
13 checks passed
@carlos-alm carlos-alm deleted the fix/benchmark-timing-attribution branch April 5, 2026 07:03
@github-actions github-actions bot locked and limited conversation to collaborators Apr 5, 2026
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant