[aw][perf] vscode_parser: `get_vscode_summary` stats every log file on every call before checking the global summary cache

## Summary

`get_vscode_summary` calls `safe_file_identity(p)` (a `stat()` syscall) for **every** discovered log file to build `current_ids`, and only then compares against `_vscode_summary_cache.file_ids`. O(n\_log\_files) stat syscalls happen on every call, even when the global summary cache is perfectly valid.

## File & Function

`src/copilot_usage/vscode_parser.py` · `get_vscode_summary`

## What Makes It Slow

```python
logs = _cached_discover_vscode_logs(base_path)
log_ids: list[tuple[Path, tuple[int, int] | None]] = [
    (p, safe_file_identity(p)) for p in logs   # ← stat() every log file
]
current_ids: frozenset[...] = frozenset(log_ids)  # ← O(n) hash construction

if (
    _vscode_summary_cache is not None
    and _vscode_summary_cache.file_ids == current_ids  # ← cache check is AFTER stats
):
    return _vscode_summary_cache.summary
```

`_cached_discover_vscode_logs` already knows which log files exist via its discovery cache. But the file **content** identity (mtime + size) is not retained there — so `get_vscode_summary` re-stats every file to detect changes. When the summary is valid (nothing changed), all those stats are wasted.

A VS Code power user accumulates many log files across multiple window sessions. Fifty files across two VS Code installs is easily reached after a week of active use; at 5 µs/stat on a local SSD that is 250 µs wasted per call, and proportionally more on network filesystems.
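The estimate above is plain multiplication, and the per-stat cost is easy to check on a given machine. The sketch below is illustrative only: `estimate_wasted_us` and `measure_stat_cost_us` are hypothetical helpers, not part of `vscode_parser`, and the 5 µs figure is the assumption stated in the text rather than a measured value.

```python
import os
import time
from pathlib import Path


def estimate_wasted_us(n_files: int, stat_cost_us: float) -> float:
    """Per-call overhead of statting every log file, in microseconds."""
    return n_files * stat_cost_us


def measure_stat_cost_us(paths: list[Path], repeats: int = 200) -> float:
    """Average wall-clock cost of one os.stat() call over the given paths."""
    start = time.perf_counter()
    for _ in range(repeats):
        for p in paths:
            os.stat(p)
    elapsed = time.perf_counter() - start
    # Convert seconds per call to microseconds per call.
    return elapsed / (repeats * len(paths)) * 1e6


# 50 files at an assumed 5 µs per stat gives the 250 µs quoted above.
print(estimate_wasted_us(50, 5.0))
```

Running `measure_stat_cost_us` against the real log directory would show how far a particular filesystem deviates from the assumed figure.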

## Concrete Fix

Extend `_VSCodeDiscoveryCache` to also store the per-log-file identities computed during discovery. Have `_cached_discover_vscode_logs` return them alongside the paths so `get_vscode_summary` can build `current_ids` from cached data without extra stat calls.

```python
@dataclass(slots=True)
class _VSCodeDiscoveryCache:
    root_id: tuple[int, int]
    child_ids: _ChildIds
    log_paths: tuple[Path, ...]
    log_file_ids: tuple[tuple[int, int] | None, ...]  # ← added: one entry per log_path
```

When `_cached_discover_vscode_logs` runs the glob (cache miss), it stats each found log file and stores the results alongside the paths. On the next call (cache hit), the stored `log_file_ids` are returned without any additional `stat()` calls.

`get_vscode_summary` then receives `(log_paths, log_file_ids)` from `_cached_discover_vscode_logs` and builds `current_ids` directly from the cached IDs:

```python
logs, log_file_ids = _cached_discover_vscode_logs(base_path)
current_ids = frozenset(zip(logs, log_file_ids))
# no safe_file_identity() calls needed
```

On a discovery cache miss (e.g., new log file appeared), the stat calls are unavoidable — they produce the data that fills the cache. On a cache hit, all stat calls are eliminated.

Alternatively, a simpler intermediate fix leaves the return type unchanged. Before statting anything, check that `_vscode_summary_cache` is not None and compare the set of discovered log paths against the paths recorded in `_vscode_summary_cache.file_ids` (ignoring the IDs). If the path sets are equal, skip the stat loop entirely and reuse the stored IDs; the files are statted only when the path sets differ. The path-set comparison has to come first, because it is what makes skipping the stats possible.

## Expected Improvement

For a user with 50 VS Code log files, steady-state cost drops from **50 stat() calls per invocation** to **0** (discovery cache hit path). The savings are proportional to the number of log files and compound with the sibling fix for `_scan_child_ids`.

## Testing Requirement

Monkeypatch `safe_file_identity` and count calls to it (excluding the root/child discovery path). After warming the caches with one call, a second call with unchanged files must invoke `safe_file_identity` for individual log files **0 times**:

```python
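from pathlib import Path

# Import path inferred from src/copilot_usage/vscode_parser.py; adjust if the package layout differs.
from copilot_usage import vscode_parser
from copilot_usage.vscode_parser import get_vscode_summary


def test_get_vscode_summary_no_log_stats_on_cache_hit(tmp_path, monkeypatch):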
    # create log files and warm both discovery cache and summary cache
    ...
    stat_calls: list[Path] = []
    original = vscode_parser.safe_file_identity
    def spy(p: Path) -> tuple[int, int] | None:
        stat_calls.append(p)
        return original(p)
    monkeypatch.setattr(vscode_parser, "safe_file_identity", spy)
    get_vscode_summary(tmp_path)
    log_file_stats = [p for p in stat_calls if p.suffix == ".log"]
    assert log_file_stats == [], "log files must not be stat'd on a warm cache hit"
```




> Generated by [Performance Analysis](https://github.com/microsasa/cli-tools/actions/runs/24233054810/agentic_workflow) · ● 3.9M · [◷](https://github.com/search?q=repo%3Amicrosasa%2Fcli-tools+is%3Aissue+%22gh-aw-workflow-call-id%3A+microsasa%2Fcli-tools%2Fperf-analysis%22&type=issues)
