Summary
get_vscode_summary calls safe_file_identity(p) (a stat() syscall) for every discovered log file to build current_ids, and only then compares against _vscode_summary_cache.file_ids. O(n_log_files) stat syscalls happen on every call, even when the global summary cache is perfectly valid.
File & Function
src/copilot_usage/vscode_parser.py · get_vscode_summary
What Makes It Slow
logs = _cached_discover_vscode_logs(base_path)
log_ids: list[tuple[Path, tuple[int, int] | None]] = [
    (p, safe_file_identity(p)) for p in logs  # ← stat() every log file
]
current_ids: frozenset[...] = frozenset(log_ids)  # ← O(n) hash construction
if (
    _vscode_summary_cache is not None
    and _vscode_summary_cache.file_ids == current_ids  # ← cache check is AFTER stats
):
    return _vscode_summary_cache.summary
_cached_discover_vscode_logs already knows which log files exist via its discovery cache. But the file content identity (mtime + size) is not retained there — so get_vscode_summary re-stats every file to detect changes. When the summary is valid (nothing changed), all those stats are wasted.
A VS Code power user accumulates many log files across multiple window sessions. Fifty files across two VS Code installs is easily reached after a week of active use; at 5 µs/stat on a local SSD that is 250 µs wasted per call, and proportionally more on network filesystems.
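The 5 µs/stat figure is an estimate, so it may be worth measuring on the target filesystem. The following is a rough sketch of a micro-benchmark; the helper names (`mean_stat_seconds`, `make_fake_logs`) and the `session-N.log` file names are made up for illustration and are not part of `vscode_parser`.

```python
import os
import tempfile
import time
from pathlib import Path


def mean_stat_seconds(paths: list[Path], repeats: int = 200) -> float:
    """Return the mean wall-clock seconds per os.stat() call over `paths`."""
    start = time.perf_counter()
    for _ in range(repeats):
        for p in paths:
            os.stat(p)
    elapsed = time.perf_counter() - start
    return elapsed / (repeats * len(paths))


def make_fake_logs(directory: Path, count: int = 50) -> list[Path]:
    """Create `count` small stand-in log files (hypothetical names)."""
    paths = []
    for i in range(count):
        p = directory / f"session-{i}.log"
        p.write_text("placeholder\n")
        paths.append(p)
    return paths


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        logs = make_fake_logs(Path(tmp))
        per_stat = mean_stat_seconds(logs)
        # Per-call overhead that the current code pays on every invocation.
        print(
            f"{per_stat * 1e6:.2f} µs per stat, "
            f"{per_stat * len(logs) * 1e6:.1f} µs for {len(logs)} files"
        )
```

Numbers from a warm local page cache will be on the optimistic side; pointing the script at a directory on a network mount should give a more realistic upper bound.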
Concrete Fix
Extend _VSCodeDiscoveryCache to also store the per-log-file identities computed during discovery. Have _cached_discover_vscode_logs return them alongside the paths so get_vscode_summary can build current_ids from cached data without extra stat calls.
@dataclass(slots=True)
class _VSCodeDiscoveryCache:
    root_id: tuple[int, int]
    child_ids: _ChildIds
    log_paths: tuple[Path, ...]
    log_file_ids: tuple[tuple[int, int] | None, ...]  # ← added: one entry per log_path
When _cached_discover_vscode_logs runs the glob (cache miss), it stats each found log file and stores the results alongside the paths. On the next call (cache hit), the stored log_file_ids are returned without any additional stat() calls.
get_vscode_summary then receives (log_paths, log_file_ids) from _cached_discover_vscode_logs and builds current_ids directly from the cached IDs:
logs, log_file_ids = _cached_discover_vscode_logs(base_path)
current_ids = frozenset(zip(logs, log_file_ids))
# no safe_file_identity() calls needed
On a discovery cache miss (e.g., new log file appeared), the stat calls are unavoidable — they produce the data that fills the cache. On a cache hit, all stat calls are eliminated.
Alternatively, as a simpler intermediate fix that leaves the return type unchanged: when _vscode_summary_cache is not None, first compare the set of discovered log paths (ignoring IDs) against the paths recorded in _vscode_summary_cache.file_ids. If the two path sets are equal, skip the stat loop and compare frozensets built from the stored IDs, so that files are stat'd lazily, only when the path sets differ. The shortcut depends on the path-set comparison happening before any stat call.
Expected Improvement
For a user with 50 VS Code log files, steady-state cost drops from 50 stat() calls per invocation to 0 (discovery cache hit path). The savings are proportional to the number of log files and compound with the sibling fix for _scan_child_ids.
Testing Requirement
Monkeypatch safe_file_identity and count calls to it (excluding the root/child discovery path). After warming the caches with one call, a second call with unchanged files must invoke safe_file_identity for individual log files 0 times:
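from pathlib import Path

# import path assumed from src/copilot_usage/vscode_parser.py
from copilot_usage import vscode_parser
from copilot_usage.vscode_parser import get_vscode_summary


def test_get_vscode_summary_no_log_stats_on_cache_hit(tmp_path, monkeypatch):
    # create log files and warm both discovery cache and summary cache
    ...
    stat_calls: list[Path] = []
    original = vscode_parser.safe_file_identity

    def spy(p: Path) -> tuple[int, int] | None:
        # record every path that is stat'd, then defer to the real function
        stat_calls.append(p)
        return original(p)

    monkeypatch.setattr(vscode_parser, "safe_file_identity", spy)
    get_vscode_summary(tmp_path)  # second call: both caches are warm
    log_file_stats = [p for p in stat_calls if p.suffix == ".log"]
    assert log_file_stats == [], "log files must not be stat'd on a warm cache hit"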
Generated by Performance Analysis · ● 3.9M · ◷