
[aw][perf] vscode_parser: _finalize_summary creates redundant dict copies that VSCodeLogSummary.__post_init__ immediately re [Content truncated due to length] #904

@microsasa

Description


_finalize_summary in src/copilot_usage/vscode_parser.py calls dict() on each of the four accumulator mapping fields before passing them to VSCodeLogSummary(...). VSCodeLogSummary.__post_init__ then immediately calls dict() on all four fields again before wrapping them in MappingProxyType. This results in two dict() copies per mapping field (eight total) when one would suffice.

Location

File: src/copilot_usage/vscode_parser.py
Functions: _finalize_summary (lines ~558–569) and VSCodeLogSummary.__post_init__ (lines ~84–103)

# _finalize_summary
def _finalize_summary(acc: _SummaryAccumulator) -> VSCodeLogSummary:
    return VSCodeLogSummary(
        ...
        requests_by_model=dict(acc.requests_by_model),      # copy 1
        duration_by_model=dict(acc.duration_by_model),      # copy 1
        requests_by_category=dict(acc.requests_by_category),  # copy 1
        requests_by_date=dict(acc.requests_by_date),        # copy 1
        ...
    )

# VSCodeLogSummary.__post_init__
def __post_init__(self) -> None:
    _wrap = types.MappingProxyType
    if self.requests_by_model is not _EMPTY_MAPPING:
        object.__setattr__(
            self, "requests_by_model", _wrap(dict(self.requests_by_model))  # copy 2
        )
    # ... same pattern for the other three fields

What makes it slow

_finalize_summary is called once per changed log file in get_vscode_summary (to build that file's partial summary) and once more for the overall summary on any call with a cache miss. Each call allocates four plain dicts from the accumulator, and __post_init__ immediately copies each of them again, so the first set of dicts is garbage as soon as construction finishes.

For a VS Code session spanning D days with M models and N categories, the redundant copies cost O(D + 2M + N) extra dict-item copies per _finalize_summary call (requests_by_model and duration_by_model each hold M entries). For a year-long session, requests_by_date alone has ~365 entries, so the wasted work is non-trivial, and it recurs for every per-file and overall summary built.
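A rough way to gauge this is a `timeit` micro-benchmark that mimics both paths on a 365-entry `defaultdict`, standing in for one year of requests_by_date. This is a sketch, not a measurement from the project: the helper names, key format, entry count and repeat count are all assumptions, and absolute timings depend on the machine and Python version.

```python
"""Micro-benchmark sketch: two dict() copies vs one before wrapping in MappingProxyType."""
import timeit
from collections import defaultdict
from datetime import date, timedelta
from types import MappingProxyType


def build_accumulator(days: int = 365) -> defaultdict[str, int]:
    # Hypothetical stand-in for acc.requests_by_date: one ISO-date key per day.
    acc: defaultdict[str, int] = defaultdict(int)
    start = date(2024, 1, 1)
    for offset in range(days):
        acc[(start + timedelta(days=offset)).isoformat()] += 1
    return acc


def two_copies(acc: defaultdict[str, int]) -> MappingProxyType:
    # Current behaviour: _finalize_summary copies, then __post_init__ copies again.
    return MappingProxyType(dict(dict(acc)))


def one_copy(acc: defaultdict[str, int]) -> MappingProxyType:
    # Proposed behaviour: only __post_init__ copies.
    return MappingProxyType(dict(acc))


def compare(number: int = 10_000) -> tuple[float, float]:
    # Returns total seconds for (two_copies, one_copy) over `number` runs.
    acc = build_accumulator()
    old = timeit.timeit(lambda: two_copies(acc), number=number)
    new = timeit.timeit(lambda: one_copy(acc), number=number)
    return old, new


if __name__ == "__main__":
    old, new = compare()
    print(f"two copies: {old:.4f}s  one copy: {new:.4f}s")
```

Whatever the absolute numbers, both functions return equal mappings; the one-copy path simply skips a full intermediate dict.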

Fix

Remove the dict() wrappers from _finalize_summary and pass the accumulator defaultdict fields directly. VSCodeLogSummary.__post_init__ already converts them to plain dicts via dict(self.requests_by_model), so each field is copied exactly once:

def _finalize_summary(acc: _SummaryAccumulator) -> VSCodeLogSummary:
    return VSCodeLogSummary(
        ...
        requests_by_model=acc.requests_by_model,      # pass defaultdict; __post_init__ copies once
        duration_by_model=acc.duration_by_model,
        requests_by_category=acc.requests_by_category,
        requests_by_date=acc.requests_by_date,
        ...
    )

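__post_init__'s dict(self.requests_by_model) already converts a defaultdict into a plain dict. Because an accumulator defaultdict is never the _EMPTY_MAPPING sentinel, the `is not _EMPTY_MAPPING` guard always takes the copy branch, so the frozen summary still never aliases the accumulator. Copy count drops from 2× to 1× per field (four fewer full dict copies per VSCodeLogSummary construction).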
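The conversion claim can be checked in isolation. The sketch below uses a bare `defaultdict(int)` as a hypothetical stand-in for one accumulator field (keys and counts are made up) and applies the same `MappingProxyType(dict(...))` step that the quoted `__post_init__` performs.

```python
from collections import defaultdict
from types import MappingProxyType

# Hypothetical stand-in for acc.requests_by_model; keys and counts are made up.
requests_by_model: defaultdict[str, int] = defaultdict(int)
requests_by_model["model-a"] += 2
requests_by_model["model-b"] += 1

# The single copy __post_init__ performs: a plain dict, wrapped read-only.
snapshot = MappingProxyType(dict(requests_by_model))

# Later mutation of the accumulator does not leak into the snapshot.
requests_by_model["model-a"] += 10

print(type(dict(requests_by_model)).__name__)  # prints "dict", not "defaultdict"
print(snapshot["model-a"])  # prints 2, the value at copy time

try:
    snapshot["unknown-model"]  # plain-dict lookup: no default_factory
except KeyError:
    print("KeyError for a missing key")
```

Because the copy happens before the wrap, dropping the earlier `dict()` call in _finalize_summary changes nothing a caller can observe: the summary still holds an independent plain dict with no `default_factory`.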

Testing requirement

Add or extend a unit test that calls _finalize_summary (or build_vscode_summary) with an accumulator containing entries across multiple dates and models. Assert that the resulting VSCodeLogSummary fields contain the correct counts. This verifies that the single-copy path through __post_init__ produces the same result as the former two-copy path.

Generated by Performance Analysis · ● 3.8M ·

Metadata


Labels

aw: Created by agentic workflow
aw-dispatched: Issue has been dispatched to implementer
perf: Performance improvement
