Skip to content

Add size-aware reporting and safe exclusions for rebuildable runtime/cache paths#11

Merged
gaoguobin merged 5 commits into
mainfrom
codex/github-mention-add-size-aware-backup-policy-and-exclude-reb
May 20, 2026
Merged

Add size-aware reporting and safe exclusions for rebuildable runtime/cache paths#11
gaoguobin merged 5 commits into
mainfrom
codex/github-mention-add-size-aware-backup-policy-and-exclude-reb

Conversation

@gaoguobin
Copy link
Copy Markdown
Owner

@gaoguobin gaoguobin commented May 20, 2026

Motivation

  • Backups were including rebuildable runtime/cache data silently, inflating backup directories and hiding the largest size drivers.
  • The intent is to keep restore safety unchanged while making exclusions explicit and surfacing size information for compact-vs-diagnostic backup decisions.

Description

  • Add size metadata to backup manifests: included bytes, top files by size, top directories by size, excluded entry count, and excluded bytes.
  • Exclude rebuildable Codex runtime/cache paths by default: cache, packages, standalone, and node_modules.
  • Count excluded directory trees for reporting without reading file contents and without following symlinks outside the backup home.
  • Keep main restore and embedded standalone restore exclusions aligned for the Codex profile.
  • Extend backup summary output with included size and excluded totals.
  • Add regression tests for runtime/cache exclusion, excluded subtree accounting, symlink safety, and standalone restore consistency.

Fixes #10


Codex Task

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: d6dd97b6f8

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +648 to +652
if skipped is not None:
entry = {
"relative_path": normalize_relative(relative),
"reason": "excluded",
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Count pruned excluded directories in size report

excluded_entries/excluded_bytes are derived from skipped, but this block only appends entries for excluded files that are still reached in file_names. In the same function, excluded directories are removed from dir_names before traversal, so files under newly excluded paths like cache/, packages/, standalone/, and node_modules/ are never visited and therefore never counted. This makes the new size report systematically understate excluded data (often reporting 0 bytes) when exclusions happen at directory level, which defeats the purpose of surfacing size drivers.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@codex address that feedback. Please count files and bytes under pruned excluded directories in the size report without including file contents or changing restore safety behavior.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Summary

  • Implemented excluded-directory accounting inside iter_source_files by detecting directories pruned from dir_names, walking those excluded subtrees, and recording aggregate file_count + bytes in skipped metadata (without reading file contents). This preserves restore behavior while fixing size-report undercounting for excluded directory trees. src/agent_environment_backup/core.pyL624-L708
  • Updated backup size-report aggregation to compute excluded_entries from excluded file counts (using file_count when present, defaulting to 1 for single-file exclusions) and kept excluded_bytes summed from skipped excluded items. src/agent_environment_backup/core.pyL1489-L1523
  • Updated summary output line to report the corrected excluded entry count. src/agent_environment_backup/core.pyL1531-L1540
  • Committed changes on the current branch: 7db7023 with message Count pruned excluded directories in size report. src/agent_environment_backup/core.pyL624-L708

Testing

  • python -m pytest -q

View task →

@gaoguobin
Copy link
Copy Markdown
Owner Author

@codex review

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 6f2a02a0db

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

config_inspector=inspect_codex_config,
commands=(("codex", "--version"), ("codex", "mcp", "list")),
integration_module="codex_fast_proxy",
extra_excluded_dirs=("cache", "packages", "standalone", "node_modules"),
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Sync standalone restore exclusions with Codex profile

Update the standalone restore script exclusions when adding extra_excluded_dirs for the Codex profile, otherwise restore behavior diverges between restore_environment(...) and restore-standalone.py. With this change, Codex backups now treat cache/packages/standalone/node_modules as excluded, but the embedded PROFILE_EXTRA_EXCLUDED used by the standalone script still omits Codex entries, so applying an older backup that contains those paths will restore runtime/cache content that the main restore path now skips.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@codex address that feedback. Please sync the standalone restore script profile exclusions with the Codex profile so restore-standalone.py skips cache, packages, standalone, and node_modules consistently with the main restore path. Add or update tests for standalone restore behavior.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Summary

  • Synced the embedded standalone restore script’s profile exclusions by adding a codex entry to PROFILE_EXTRA_EXCLUDED, so restore-standalone.py now skips cache, packages, standalone, and node_modules consistently with the main restore path. src/agent_environment_backup/core.pyL871-L874
  • Added a new regression test that creates a Codex backup, injects files under each excluded directory in the backup payload, runs restore-standalone.py, and asserts those files are not restored. tests/test_core.pyL1296-L1337
  • Committed the change on the current branch as 25c9d47 and created a follow-up PR message via the make_pr tool.

Testing

  • python -m pytest -q tests/test_core.py -k "standalone_restore_excludes_profile_dirs or standalone_restore_excludes_codex_profile_dirs"

View task →

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: af395eec6b

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

def excluded_subtree_stats(path: Path) -> tuple[int, int]:
file_count = 0
total_bytes = 0
for walk_root, _, walk_files in os.walk(path):
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Avoid traversing symlinked excluded directories

excluded_subtree_stats walks every excluded directory with os.walk(path), but when the excluded entry itself is a symlink (for example ~/.codex/cache -> /mnt/cache), os.walk traverses the symlink target as the top path. That makes excluded_entries/excluded_bytes include files outside the backup home and can massively inflate runtime on large external trees, even though those files are never part of the backup. This is introduced by the new size-report scan path and should short-circuit or handle symlinked excluded dirs without recursive walking.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@codex address that feedback. Please make excluded-directory size accounting avoid traversing symlinked excluded directories. Count the symlink itself as skipped/excluded if useful, but do not recurse into the symlink target or include bytes outside the backup home. Add a regression test with an excluded directory symlink pointing outside the home.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Summary

  • Updated excluded-directory size accounting to short-circuit when an excluded directory entry is a symlink, so it no longer traverses symlink targets; instead it counts the symlink itself (file_count=1) and uses lstat() size when available. This prevents counting bytes outside the backup home while preserving exclusion behavior. src/agent_environment_backup/core.pyL624-L679
  • Added a regression test that creates an excluded cache directory as a symlink to an external directory and verifies excluded accounting does not include external subtree size. tests/test_core.pyL265-L290
  • Committed on the current branch: 969e091 with message Avoid traversing symlinked excluded directories.

Testing

  • python -m pytest -q tests/test_core.py -k "excluded_symlinked_directory_does_not_traverse_outside_home or excludes_runtime_cache_paths"

View task →

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 5110da7ab7

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread src/agent_environment_backup/core.py Outdated
walk_path = walk_root_path / walk_file
file_count += 1
try:
total_bytes += walk_path.stat().st_size
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Use lstat for excluded subtree byte accounting

excluded_subtree_stats uses walk_path.stat().st_size, which follows symlinks for files inside excluded directories. If an excluded directory contains a symlinked file (e.g., cache/link -> /mnt/large.bin), excluded_bytes will include the external target size and can trigger expensive metadata access outside the backup home, so the new size report becomes inaccurate and can slow backups unexpectedly. Use lstat() (or stat(..., follow_symlinks=False)) for this accounting, or skip symlink files entirely.

Useful? React with 👍 / 👎.

@gaoguobin gaoguobin merged commit 84db630 into main May 20, 2026
5 checks passed
@gaoguobin gaoguobin deleted the codex/github-mention-add-size-aware-backup-policy-and-exclude-reb branch May 20, 2026 09:37
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Add size-aware backup policy and exclude rebuildable runtime/cache paths

1 participant