Summary
The public discover_vscode_logs function (src/copilot_usage/vscode_parser.py, lines 199–235, exported in __all__) always runs the full multi-level Path.glob(_GLOB_PATTERN) traversal on every call. The private _cached_discover_vscode_logs (lines 255–325) implements identical logic with an O(1) steady-state cache that avoids the glob when the root directory and its sentinel child are unchanged.
Because discover_vscode_logs is the documented public entry point, callers using it directly (e.g., listing log paths before deciding whether to call get_vscode_summary) pay full glob cost — O(n_window_dirs) filesystem traversal — on each invocation. get_vscode_summary already bypasses this by calling _cached_discover_vscode_logs internally, creating an inconsistency where the public function is slower than the internal one.
What makes it slow
The glob pattern "*/window*/exthost/GitHub.copilot-chat/GitHub Copilot Chat.log" requires traversing at minimum three directory levels under each candidate root. On a mature VS Code installation with many session directories (each containing multiple window*/ subdirectories), this glob touches hundreds of inodes on every call.
Currently, repeated calls to discover_vscode_logs() with an unchanged directory each pay this full traversal cost. _cached_discover_vscode_logs reduces steady-state cost to two stat calls (root + sentinel child), which is at least an order of magnitude cheaper.
Concrete fix
Make discover_vscode_logs delegate to _cached_discover_vscode_logs:
    def discover_vscode_logs(base_path: Path | None = None) -> list[Path]:
        """Find all VS Code Copilot Chat log files.
        ...
        """
        return _cached_discover_vscode_logs(base_path)
_cached_discover_vscode_logs already handles both the base_path=None (default candidates) and base_path is not None (explicit directory) cases with identical externally visible behaviour. The one implementation difference is that the uncached function calls candidate.is_dir() while the cached version uses candidate.stat(); both give the same result for existing directories and for missing paths.
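The is_dir() versus stat() equivalence can be checked with a small standalone comparison. The helper names below are hypothetical and are not taken from vscode_parser.py. The sketch also covers a third case, a candidate that exists but is a regular file, where a bare stat() succeeds and an explicit S_ISDIR check is needed to match is_dir(); whether the cached function performs that check is not stated here.

```python
from __future__ import annotations

import stat
from pathlib import Path


def is_dir_via_is_dir(candidate: Path) -> bool:
    """Uncached style: Path.is_dir() returns False for missing paths."""
    return candidate.is_dir()


def is_dir_via_stat(candidate: Path) -> bool:
    """Cached style: stat() raises for missing paths, so catch OSError."""
    try:
        st = candidate.stat()
    except OSError:
        return False
    # Without this check, a regular file would be accepted as a candidate root.
    return stat.S_ISDIR(st.st_mode)


def compare(candidate: Path) -> tuple[bool, bool]:
    """Return both answers so they can be compared side by side."""
    return (is_dir_via_is_dir(candidate), is_dir_via_stat(candidate))
```

Calling compare() on an existing directory, a missing path, and a regular file should yield matching pairs in all three cases.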
Expected improvement
Steady-state cost for repeated calls drops from an O(n_inodes) Path.glob traversal to O(1): two stat syscalls (root identity + sentinel child identity). On a VS Code installation with 20 session directories each containing 3 window directories, that avoids traversing at least 60 window directories on every call.
Testing requirement
Add a test to tests/copilot_usage/test_vscode_parser.py:
- Create a temporary VS Code log directory structure (matching _GLOB_PATTERN) and call discover_vscode_logs(tmp_path) twice with no changes between calls.
- Spy on Path.glob (or monkeypatch the glob call in vscode_parser) and assert it is called at most once across both calls: the second call must use the cache rather than re-running the glob.
This follows the project's deterministic perf-test convention (call-count assertion, no wall-clock timing), matching the pattern in TestVscodeDiscoveryCache.
Generated by Performance Analysis · ● 4.4M · ◷