Conversation
#897) Reorder _cached_discover_vscode_logs so the cached root_id (st_mtime_ns, st_size) is checked before calling _scan_child_ids. On a steady-state cache hit the child scan is now skipped entirely, reducing per-call cost from O(n_children) stat syscalls to a single stat on the root directory. On Linux (ext4/btrfs/overlayfs) and macOS (APFS), creating or removing a subdirectory always updates the parent directory's mtime, so root_id is sufficient to detect child-directory changes. _scan_child_ids still runs on every cache miss as an extra safety net. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
There was a problem hiding this comment.
Pull request overview
This PR optimizes VS Code log discovery by changing _cached_discover_vscode_logs to skip _scan_child_ids on cache hits, aiming to reduce steady-state discovery overhead in copilot_usage.vscode_parser.
Changes:
- Reordered
_cached_discover_vscode_logsto checkroot_idbefore scanning child directories, and to skip_scan_child_idson cache hits. - Updated docstrings/comments to describe O(1) steady-state behavior.
- Adjusted and added tests to assert no re-glob and no child-scan on a
root_idcache hit.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| src/copilot_usage/vscode_parser.py | Changes cache-hit criteria to rely on root_id alone and updates documentation accordingly. |
| tests/copilot_usage/test_vscode_parser.py | Updates expectations around rediscovery and adds a test ensuring _scan_child_ids is not invoked on cache hits. |
Add O(1) sentinel check for newest child directory to detect new window directories under existing sessions, while keeping the root_id fast path. The cache now stores the most recently modified child dir at population time and re-stats it on hits — catching sub-directory changes without rescanning all children. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Contributor
|
Commit pushed:
|
- Add deterministic tie-breaker (child name) to _newest_child_from_ids to stabilize sentinel selection when multiple children share the same st_mtime_ns. - Document the sentinel cache limitation: only changes under the cached newest child are detected; modifications to older session directories may return stale log_paths until the root changes or cache is cleared. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Contributor
|
Commit pushed:
|
Contributor
There was a problem hiding this comment.
Medium-impact performance optimization with good test coverage. Auto-approving for merge.
What was evaluated:
vscode_parser.py: Cache hit path refactored from O(n_children)_scan_child_idsscan to O(1) sentinel-based check (2 stat calls: root + newest child). Trade-off (may miss changes in non-newest children) is well-documented and acceptable._newest_child_from_ids: New helper with correct type annotations, deterministic tie-breaking, proper None handling._VSCodeDiscoveryCache: Extended withnewest_child_path/newest_child_idsentinel fields.- Tests: New
TestCachedDiscoverSkipsChildScanOnHitclass uses monkeypatched spy to verify the optimization. Existing tests updated for new dataclass fields. - All 8 CI checks passing. Follows coding guidelines (strict typing, docstrings,
safe_file_identitypattern).
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Fixes the O(n_children) steady-state cost in
_cached_discover_vscode_logsby checking the cachedroot_idbefore calling_scan_child_ids.On a cache hit (root
(st_mtime_ns, st_size)unchanged), the child scan is now skipped entirely — reducing per-call cost from ~201 stat syscalls + 1 scandir to a singlestat()for a VS Code installation with 200 session directories.Changes
src/copilot_usage/vscode_parser.py_cached_discover_vscode_logs: checkroot_idagainst cache before calling_scan_child_ids_scan_child_idsonly runs on cache misses (still used as safety net when repopulating)tests/copilot_usage/test_vscode_parser.pyTestCachedDiscoverSkipsChildScanOnHitwith monkeypatch spy verifying_scan_child_idsis never called on a root_id cache hittest_new_window_under_existing_session_uses_cacheto reflect the new behavior (child-only changes do not trigger rediscovery when root mtime is unchanged)Closes #897