feat: Phase 5 - Backup Automation #5
Conversation
Complete backup framework with:

**Python Backup Manager:**
- BackupManager: Full/incremental/differential backups with tar/gzip
- RetentionManager: Automated cleanup with daily/weekly/monthly/yearly retention
- BackupVerifier: Integrity checks with SHA256 checksums
- Models: BackupConfig, BackupJob, BackupResult, BackupMetadata

**Features:**
- Metadata storage in JSON
- Pre/post backup script support
- Exclude patterns
- Restore with path traversal protection
- Checksum validation

**Bash Scripts:**
- backup-files.sh: Directory backup with compression

Includes 18 unit tests covering all core functionality.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
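As a quick orientation, here is a minimal usage sketch of the manager described above. It is pieced together from the reviewer's class diagram below, so the import path, field names, and defaults are assumptions rather than confirmed API.

```python
# Hypothetical usage sketch; import path and constructor details are inferred
# from the PR description and class diagram, not verified project code.
from pathlib import Path

from backup_manager import BackupConfig, BackupManager, BackupType  # assumed exports

config = BackupConfig(
    name="etc-daily",
    source_path=Path("/etc"),
    destination_path=Path("/var/backups/etc"),
    backup_type=BackupType.FULL,
    compression=True,
)

manager = BackupManager(work_dir=Path("/var/backups"))
result = manager.create_backup(config)

if result.success:
    print(f"Backup written to {result.job.backup_file}")
else:
    print(f"Backup failed: {result.message}")
```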
Reviewer's Guide

Implements a complete backup automation framework including a Python backup manager with metadata and checksum handling, a retention manager that enforces time-based retention policies, a verifier for integrity checks, a Bash helper script, and unit tests plus documentation for the new components.

Sequence diagram for backup creation with metadata and scripts

sequenceDiagram
actor User
participant BackupManager
participant FileSystem
User->>BackupManager: create_backup(config)
activate BackupManager
BackupManager->>BackupManager: create BackupJob(status RUNNING)
alt pre_backup_script configured
BackupManager->>FileSystem: execute pre_backup_script
FileSystem-->>BackupManager: script exit code
end
BackupManager->>BackupManager: _generate_backup_filename(config)
BackupManager->>FileSystem: ensure destination_path exists
FileSystem-->>BackupManager: directory ready
alt backup_type FULL
BackupManager->>BackupManager: _create_full_backup(config, backup_file)
else backup_type INCREMENTAL
BackupManager->>BackupManager: _create_incremental_backup(config, backup_file)
else backup_type DIFFERENTIAL
BackupManager->>BackupManager: _create_differential_backup(config, backup_file)
end
BackupManager->>FileSystem: stat backup_file
FileSystem-->>BackupManager: size_bytes
BackupManager->>BackupManager: _count_files_in_archive(backup_file)
BackupManager->>BackupManager: _save_metadata(job, config)
BackupManager->>BackupManager: _calculate_checksum(backup_file)
BackupManager->>FileSystem: write metadata JSON
alt post_backup_script configured
BackupManager->>FileSystem: execute post_backup_script
FileSystem-->>BackupManager: script exit code
end
BackupManager->>BackupManager: update job status COMPLETED
BackupManager-->>User: BackupResult(success True, job)
deactivate BackupManager
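The `_calculate_checksum` step in the diagram above is described in the PR summary as SHA256-based. A plausible shape for such a helper is sketched below; the chunk size and exact signature are assumptions, not the PR's code.

```python
# Sketch of a SHA256 checksum helper as implied by the _calculate_checksum step;
# chunked reading keeps memory usage flat for large archives.
import hashlib
from pathlib import Path

def calculate_sha256(file_path: Path, chunk_size: int = 1 << 16) -> str:
    digest = hashlib.sha256()
    with file_path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```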
Sequence diagram for retention cleanup and verification workflow

sequenceDiagram
actor Operator
participant RetentionManager
participant BackupVerifier
participant FileSystem
Operator->>RetentionManager: cleanup_old_backups(backup_dir, config_name, dry_run)
activate RetentionManager
RetentionManager->>FileSystem: find *.json metadata
FileSystem-->>RetentionManager: list of metadata files
RetentionManager->>RetentionManager: parse BackupMetadata and sort
RetentionManager->>RetentionManager: apply_retention(backups, dry_run)
RetentionManager-->>Operator: stats(kept, deleted, would_delete, total_size_freed)
deactivate RetentionManager
Operator->>BackupVerifier: verify_all_backups(backup_dir)
activate BackupVerifier
BackupVerifier->>FileSystem: scan for *.tar* archives
loop for each backup_file
BackupVerifier->>BackupVerifier: verify_backup(backup_file)
BackupVerifier->>FileSystem: read metadata JSON
FileSystem-->>BackupVerifier: BackupMetadata
BackupVerifier->>BackupVerifier: _calculate_checksum(backup_file)
BackupVerifier->>BackupVerifier: _test_extraction(backup_file)
BackupVerifier->>BackupVerifier: build VerificationResult
end
BackupVerifier-->>Operator: dict backup_id -> VerificationResult
deactivate BackupVerifier
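A short usage sketch of the verification workflow shown above. Only `verify_all_backups(backup_dir)` and the `VerificationResult` fields come from the diagrams; the no-argument `BackupVerifier()` constructor and the import path are assumptions.

```python
# Hedged sketch of running the verification pass over a backup directory.
from pathlib import Path

from backup_manager import BackupVerifier  # assumed export

results = BackupVerifier().verify_all_backups(Path("/var/backups"))
for backup_id, result in results.items():
    if result.is_valid:
        print(f"{backup_id}: OK ({result.size_bytes} bytes)")
    else:
        print(f"{backup_id}: FAILED: {'; '.join(result.errors)}")
```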
Class diagram for backup automation managers and models

classDiagram
class BackupType {
<<enumeration>>
FULL
INCREMENTAL
DIFFERENTIAL
}
class BackupStatus {
<<enumeration>>
PENDING
RUNNING
COMPLETED
FAILED
VERIFIED
}
class RetentionPolicy {
+int keep_daily
+int keep_weekly
+int keep_monthly
+int keep_yearly
+int min_backups
+validate_positive(v int) int
}
class BackupConfig {
+str name
+Path source_path
+Path destination_path
+BackupType backup_type
+bool compression
+bool encryption
+RetentionPolicy retention_policy
+list~str~ exclude_patterns
+str schedule_cron
+bool enabled
+Path pre_backup_script
+Path post_backup_script
+str notification_email
}
class BackupJob {
+str job_id
+str config_name
+BackupType backup_type
+datetime started_at
+datetime finished_at
+BackupStatus status
+Path backup_file
+int size_bytes
+int files_count
+str error_message
+float duration_seconds
+is_running() bool
+is_completed() bool
+is_failed() bool
}
class BackupResult {
+bool success
+BackupJob job
+str message
+list~str~ warnings
}
class BackupMetadata {
+str backup_id
+datetime created_at
+str config_name
+BackupType backup_type
+str source_path
+str hostname
+int size_bytes
+int files_count
+bool compression
+bool encryption
+str checksum
+str parent_backup_id
}
class VerificationResult {
+Path backup_file
+bool is_valid
+bool can_extract
+bool checksum_match
+int size_bytes
+datetime verified_at
+list~str~ errors
}
class BackupManager {
-Path work_dir
+BackupManager(work_dir Path)
+create_backup(config BackupConfig) BackupResult
+list_backups(config_name str) list~BackupMetadata~
+restore_backup(backup_file Path, restore_path Path, overwrite bool) bool
-_create_full_backup(config BackupConfig, output_file Path) void
-_create_incremental_backup(config BackupConfig, output_file Path) void
-_create_differential_backup(config BackupConfig, output_file Path) void
-_generate_backup_filename(config BackupConfig) Path
-_count_files_in_archive(archive_path Path) int
-_save_metadata(job BackupJob, config BackupConfig) void
-_calculate_checksum(file_path Path) str
-_run_script(script_path Path) void
}
class RetentionManager {
-RetentionPolicy policy
+RetentionManager(policy RetentionPolicy)
+apply_retention(backups list~BackupMetadata~, dry_run bool) tuple
+cleanup_old_backups(backup_dir Path, config_name str, dry_run bool) dict
-_get_backups_in_range(backups list~BackupMetadata~, start datetime, end datetime) set~str~
-_get_weekly_backups(backups list~BackupMetadata~, start datetime, end datetime) set~str~
-_get_monthly_backups(backups list~BackupMetadata~, start datetime, end datetime) set~str~
-_get_yearly_backups(backups list~BackupMetadata~, start datetime, end datetime) set~str~
-_find_backup_file(backup_dir Path, backup_id str) Path
}
class BackupVerifier {
+verify_backup(backup_file Path) VerificationResult
+verify_all_backups(backup_dir Path) dict~str,VerificationResult~
-_calculate_checksum(file_path Path) str
-_test_extraction(backup_file Path) bool
}
BackupConfig --> RetentionPolicy : has
BackupJob --> BackupType : uses
BackupJob --> BackupStatus : uses
BackupResult --> BackupJob : wraps
BackupMetadata --> BackupType : uses
VerificationResult --> BackupMetadata : relates via metadata
BackupManager --> BackupConfig : input
BackupManager --> BackupJob : creates
BackupManager --> BackupResult : returns
BackupManager --> BackupMetadata : persists
RetentionManager --> RetentionPolicy : configured_with
RetentionManager --> BackupMetadata : manages
BackupVerifier --> BackupMetadata : reads
BackupVerifier --> VerificationResult : returns
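The `model_validate_json` calls quoted in the review comments indicate Pydantic v2 models. As an illustration only, `RetentionPolicy` could look roughly like this; the defaults and the exact validator wiring are assumptions, only the field and method names come from the class diagram above.

```python
# Illustrative Pydantic v2 shape for RetentionPolicy; defaults are invented.
from pydantic import BaseModel, field_validator

class RetentionPolicy(BaseModel):
    keep_daily: int = 7
    keep_weekly: int = 4
    keep_monthly: int = 12
    keep_yearly: int = 3
    min_backups: int = 3

    @field_validator("keep_daily", "keep_weekly", "keep_monthly", "keep_yearly", "min_backups")
    @classmethod
    def validate_positive(cls, v: int) -> int:
        # Reject negative retention counts
        if v < 0:
            raise ValueError("retention counts must be non-negative")
        return v
```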
Flow diagram for cron, Bash script, and Python backup system

flowchart LR
subgraph System
Cron["Cron scheduler"]
BashScript["backup-files.sh
Bash backup script"]
PythonApp["Python backup_manager
(BackupManager, RetentionManager, BackupVerifier)"]
FS["Filesystem
/var/backups or custom dir"]
end
Cron -->|daily backup| BashScript
Cron -->|weekly cleanup| PythonApp
BashScript -->|create tar.gz + .sha256| FS
PythonApp -->|create .tar/.tar.gz + metadata .json| FS
PythonApp -->|list, verify, cleanup backups| FS
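The "weekly cleanup" edge in the flowchart suggests a small Python entry point that cron can call. A hedged sketch follows, assuming the `RetentionManager` API from the class diagram; the paths, policy values, and config name are placeholders.

```python
# Hypothetical weekly-cleanup entry point for cron; only the API shape
# (RetentionManager(policy).cleanup_old_backups(...)) comes from the diagrams.
from pathlib import Path

from backup_manager import RetentionManager, RetentionPolicy  # assumed exports

def weekly_cleanup() -> None:
    policy = RetentionPolicy(keep_daily=7, keep_weekly=4, keep_monthly=12, keep_yearly=3, min_backups=3)
    stats = RetentionManager(policy).cleanup_old_backups(
        backup_dir=Path("/var/backups"),
        config_name="etc-daily",
        dry_run=False,
    )
    print(f"kept={stats['kept']} deleted={stats['deleted']} freed={stats['total_size_freed']} bytes")

if __name__ == "__main__":
    weekly_cleanup()
```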
Caution: Review failed. The pull request is closed.

Walkthrough

This pull request introduces a complete backup automation system comprising Python package infrastructure with Pydantic data models, a BackupManager orchestrating backup workflows, a RetentionManager enforcing retention policies, a BackupVerifier validating backup integrity, supporting Bash scripts, example configuration, and comprehensive unit tests.
Sequence Diagrams

sequenceDiagram
actor User
participant Manager as BackupManager
participant Scripts as Pre/Post Scripts
participant Archive as Tar Archive
participant Metadata as JSON Metadata
participant FS as File System
User->>Manager: create_backup(config)
alt
Manager->>Scripts: _run_script(pre_backup)
Scripts-->>Manager: ✓ Complete
end
Manager->>Archive: _create_full_backup()
Archive->>FS: Read source paths
FS-->>Archive: Files (excluding patterns)
Archive-->>Manager: Archive created
Manager->>Manager: _calculate_checksum()
Manager->>Manager: _count_files_in_archive()
Manager->>Metadata: _save_metadata(job, config)
Metadata->>FS: Write .json metadata
alt
Manager->>Scripts: _run_script(post_backup)
Scripts-->>Manager: ✓ Complete
end
Manager-->>User: BackupResult (success)
sequenceDiagram
participant Manager as BackupManager
participant Verifier as BackupVerifier
participant Archive as Tar Archive
participant Metadata as JSON Metadata
participant Temp as Temp Directory
Manager->>Verifier: verify_backup(backup_file)
Verifier->>Verifier: Check file exists
alt File missing
Verifier-->>Manager: VerificationResult (invalid)
end
alt Metadata exists
Verifier->>Metadata: Load .json
Verifier->>Verifier: _calculate_checksum()
Verifier->>Verifier: Compare with metadata
end
Verifier->>Archive: Open as tar
alt Archive corrupted
Verifier-->>Manager: VerificationResult (invalid)
end
Verifier->>Verifier: _test_extraction()
Verifier->>Temp: Extract sample members
Temp-->>Verifier: ✓ Extracted
Verifier-->>Manager: VerificationResult (valid)
sequenceDiagram
participant Retention as RetentionManager
participant Manager as BackupManager
participant FS as File System
participant Metadata as JSON Metadata
Retention->>Manager: cleanup_old_backups(backup_dir, config_name)
Manager->>FS: Scan backup_dir
FS-->>Manager: List of backup files
Manager->>Metadata: Load .json for each backup
Metadata-->>Manager: BackupMetadata[]
Manager->>Retention: apply_retention(backups)
Retention->>Retention: Filter by retention policy<br/>(daily, weekly, monthly, yearly)
Retention->>Retention: Group by time windows<br/>keep max 1 per window
Retention->>Retention: Enforce min_backups threshold
Retention-->>Manager: (kept[], deleted[])
alt dry_run = false
Manager->>FS: Delete backup files (deleted[])
Manager->>FS: Delete .json metadata (deleted[])
FS-->>Manager: ✓ Cleaned
end
Manager-->>Retention: {kept: N, deleted: M, freed: bytes}
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~75 minutes
Hey there - I've reviewed your changes - here's some feedback:
Blocking issues:
- Detected subprocess function 'run' without a static string. If this data can be controlled by a malicious actor, it may be an instance of command injection. Audit the use of this call to ensure it is not controllable by an external resource. You may consider using 'shlex.escape()'. (link)
General comments:
- In `RetentionManager.cleanup_old_backups`, `backup_file` can be `None` (or fail the existence check) but is still used unconditionally to construct `metadata_file`, which will raise if `_find_backup_file` returns `None`; guard this usage or derive the metadata path from the existing `metadata_file` instead.
- In `BackupManager.restore_backup`, the `overwrite` parameter is currently unused; either implement overwrite semantics (e.g., skip/replace existing paths) or remove the parameter to avoid a misleading API.
- The separation between `BackupManager.work_dir` (used for listing backups) and `BackupConfig.destination_path` (where backups/metadata are written) can cause `list_backups` and retention cleanup to miss backups when these differ; consider using a single authoritative backup root or threading `destination_path` consistently into listing/cleanup routines.
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- In `RetentionManager.cleanup_old_backups`, `backup_file` can be `None` (or fail the existence check) but is still used unconditionally to construct `metadata_file`, which will raise if `_find_backup_file` returns `None`; guard this usage or derive the metadata path from the existing `metadata_file` instead.
- In `BackupManager.restore_backup`, the `overwrite` parameter is currently unused; either implement overwrite semantics (e.g., skip/replace existing paths) or remove the parameter to avoid a misleading API.
- The separation between `BackupManager.work_dir` (used for listing backups) and `BackupConfig.destination_path` (where backups/metadata are written) can cause `list_backups` and retention cleanup to miss backups when these differ; consider using a single authoritative backup root or threading `destination_path` consistently into listing/cleanup routines.
## Individual Comments
### Comment 1
<location> `5-backup-automation/backup_manager/retention.py:218-223` </location>
<code_context>
+ """
+ # Find metadata files
+ backups = []
+ for metadata_file in backup_dir.rglob("*.json"):
+ try:
+ metadata = BackupMetadata.model_validate_json(
+ metadata_file.read_text()
+ )
+ if config_name is None or metadata.config_name == config_name:
+ backups.append((metadata, metadata_file))
+ except Exception:
</code_context>
<issue_to_address>
**issue (bug_risk):** The metadata path discovered here is later ignored, and `cleanup_old_backups` reconstructs it from `backup_file`, which can be None.
This can raise when `_find_backup_file` returns `None`, and it also ignores the already-known `metadata_file` path in favor of a naming convention that might drift. Consider storing and reusing the original `metadata_file` path for deletion, and explicitly handle a `None` `backup_file` when removing the archive itself.
</issue_to_address>
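To make the suggested guard concrete, here is a minimal standalone sketch of deleting an archive/metadata pair while tolerating a missing archive; the names mirror the quoted snippet, and pairing each metadata object with its discovered path is an assumption about the surrounding refactor.

```python
# Minimal sketch of the guarded deletion: reuse the metadata_file that was
# already discovered and tolerate a None or missing backup_file.
from pathlib import Path
from typing import Optional

def delete_backup_pair(backup_file: Optional[Path], metadata_file: Path) -> int:
    """Delete an archive and its metadata JSON, returning bytes freed."""
    freed = 0
    if backup_file is not None and backup_file.exists():
        freed = backup_file.stat().st_size
        backup_file.unlink()
    if metadata_file.exists():
        metadata_file.unlink()
    return freed
```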
### Comment 2
<location> `5-backup-automation/backup_manager/manager.py:283` </location>
<code_context>
+ backups = []
+
+ # Find all metadata files
+ for metadata_file in self.work_dir.rglob("*.json"):
+ try:
+ metadata = BackupMetadata.model_validate_json(
</code_context>
<issue_to_address>
**question (bug_risk):** The listing of backups searches `self.work_dir`, but metadata is written next to `destination_path` in `_save_metadata`.
This means backups whose `destination_path` is outside `work_dir` won’t be listed. Either ensure `destination_path` is always under `work_dir`, or change `list_backups` to search the actual backup destinations (or accept a directory parameter) so the lookup matches where metadata is stored.
</issue_to_address>
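One way to make the lookup match wherever the metadata was actually written is to accept the search directory explicitly, as in this sketch; the helper name, the extra parameter, and the `backup_manager.models` import path are assumptions, not the PR's API.

```python
# Sketch of a listing helper that searches an explicit directory, so it can be
# pointed at destination_path (or work_dir) consistently.
from pathlib import Path
from typing import Optional

from backup_manager.models import BackupMetadata  # assumed module layout

def list_backups_in(search_root: Path, config_name: Optional[str] = None) -> list[BackupMetadata]:
    found: list[BackupMetadata] = []
    for metadata_file in search_root.rglob("*.json"):
        try:
            metadata = BackupMetadata.model_validate_json(metadata_file.read_text())
        except Exception:
            continue  # skip unrelated or corrupt JSON files
        if config_name is None or metadata.config_name == config_name:
            found.append(metadata)
    return sorted(found, key=lambda m: m.created_at, reverse=True)
```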
### Comment 3
<location> `5-backup-automation/backup_manager/retention.py:33-42` </location>
<code_context>
+ """
+ self.policy = policy
+
+ def apply_retention(
+ self, backups: list[BackupMetadata], dry_run: bool = False
+ ) -> tuple[list[BackupMetadata], list[BackupMetadata]]:
+ """
+ Apply the retention policy.
+
+ Args:
+ backups: List of backups (newest first).
+ dry_run: Simulate only, does not delete.
+
+ Returns:
+ Tuple of (backups to keep, backups to delete).
+ """
+ if len(backups) <= self.policy.min_backups:
+ return backups, []
+
</code_context>
<issue_to_address>
**nitpick (bug_risk):** `dry_run` parameter is unused in retention logic and could mislead callers.
`apply_retention` accepts `dry_run` but never uses it. Since `cleanup_old_backups` already prevents deletions on dry runs, either drop `dry_run` from this method or implement the same semantics here (especially if this function may later mutate state). An unused flag implies behavior that doesn’t exist and can mislead callers.
</issue_to_address>
### Comment 4
<location> `5-backup-automation/scripts/backup-files.sh:53-59` </location>
<code_context>
+echo "Destination: $BACKUP_FILE"
+
+# Create backup / Mentés létrehozása
+tar -czf "$BACKUP_FILE" \
+ --exclude='*.tmp' \
+ --exclude='*.log' \
+ --exclude='.cache' \
+ --exclude='node_modules' \
+ -C "$(dirname "$SOURCE_DIR")" \
+ "$(basename "$SOURCE_DIR")" 2>&1 | grep -v "Removing leading"
+
+# Calculate size / Méret számítása
</code_context>
<issue_to_address>
**issue (bug_risk):** Using `grep` in the tar pipeline with `set -euo pipefail` can cause the script to fail even on successful backups.
With `set -euo pipefail`, the `tar` stderr pipe into `grep -v "Removing leading"` makes the pipeline's exit status depend on `grep`, which exits non-zero whenever it selects no lines; a clean tar run that emits no such warnings will therefore abort the script. Consider dropping the filter or tolerating grep's exit status (e.g., appending `|| true`).
</issue_to_address>
### Comment 5
<location> `5-backup-automation/backup_manager/retention.py:199` </location>
<code_context>
+
+ return set(yearly_backups.values())
+
+ def cleanup_old_backups(
+ self,
+ backup_dir: Path,
</code_context>
<issue_to_address>
**issue (complexity):** Consider refactoring metadata handling, periodic backup selection, and the `apply_retention` API to remove redundant work and make the retention logic clearer and more focused.
You can reduce complexity and I/O in a few focused places without changing behavior.
### 1. Avoid double `rglob` + duplicate JSON parsing
You already load all metadata in `cleanup_old_backups`, then `_find_backup_file` walks the tree again and re-parses JSON. You can derive the backup file path once when scanning and carry that forward.
Example refactor:
```python
def cleanup_old_backups(
self,
backup_dir: Path,
config_name: Optional[str] = None,
dry_run: bool = False,
) -> dict[str, int]:
backups: list[tuple[BackupMetadata, Path, Path]] = []
for metadata_file in backup_dir.rglob("*.json"):
try:
metadata = BackupMetadata.model_validate_json(metadata_file.read_text())
if config_name is None or metadata.config_name == config_name:
backup_file = self._backup_file_from_metadata_path(metadata_file)
backups.append((metadata, metadata_file, backup_file))
except Exception:
continue
backups.sort(key=lambda x: x[0].created_at, reverse=True)
backup_objects = [b[0] for b in backups]
to_keep, to_delete = self.apply_retention(backup_objects, dry_run)
ids_to_delete = {b.backup_id for b in to_delete}
deleted_count = 0
total_size_freed = 0
if not dry_run:
for metadata, metadata_file, backup_file in backups:
if metadata.backup_id not in ids_to_delete:
continue
if backup_file.exists():
total_size_freed += backup_file.stat().st_size
backup_file.unlink()
deleted_count += 1
if metadata_file.exists():
metadata_file.unlink()
return {
"kept": len(to_keep),
"deleted": len(to_delete) if not dry_run else 0,
"would_delete": len(to_delete) if dry_run else 0,
"total_size_freed": total_size_freed,
}
```
Then replace `_find_backup_file` with a deterministic helper used only at discovery time:
```python
def _backup_file_from_metadata_path(self, metadata_file: Path) -> Path:
# metadata: backup.tar.json -> backup.tar
# metadata: backup.tar.gz.json -> backup.tar.gz
base = metadata_file.with_suffix("") # strip .json
if base.suffix == ".tar":
return base
if base.with_suffix("").suffix == ".tar":
# backup.tar.gz.json -> backup.tar.gz
return base.with_suffix("")
# Fallback: if future formats appear, just drop .json
return base
```
This removes the second `rglob` and duplicated JSON parsing, and makes the mapping from metadata → archive explicit.
### 2. Collapse weekly/monthly/yearly selection
The three helpers are structurally identical: range filter + grouping key + first per group. You can factor that into one generic helper to reduce surface area:
```python
def _get_periodic_backups(
self,
backups: list[BackupMetadata],
start: datetime,
end: datetime,
key_func,
) -> set[str]:
selected: dict[tuple, str] = {}
for backup in backups:
if not (start <= backup.created_at <= end):
continue
key = key_func(backup.created_at)
if key not in selected:
selected[key] = backup.backup_id
return set(selected.values())
def _get_weekly_backups(self, backups, start, end) -> set[str]:
return self._get_periodic_backups(
backups,
start,
end,
key_func=lambda dt: (dt.year, dt.isocalendar()[1]),
)
def _get_monthly_backups(self, backups, start, end) -> set[str]:
return self._get_periodic_backups(
backups,
start,
end,
key_func=lambda dt: (dt.year, dt.month),
)
def _get_yearly_backups(self, backups, start, end) -> set[str]:
return self._get_periodic_backups(
backups,
start,
end,
key_func=lambda dt: (dt.year,),
)
```
Behavior stays identical, but the rule is centralized and easier to tweak.
### 3. Remove unused `dry_run` in `apply_retention`
`apply_retention` is pure and ignores `dry_run`. Keeping the parameter suggests side effects that don’t exist.
You can simplify the API and eliminate the misleading abstraction:
```python
def apply_retention(
self, backups: list[BackupMetadata]
) -> tuple[list[BackupMetadata], list[BackupMetadata]]:
...
```
And update the call site:
```python
to_keep, to_delete = self.apply_retention(backup_objects)
```
If you do plan to add side effects at the `apply_retention` layer later, wiring `dry_run` into actual behavior now would also resolve the confusion, but as-is the simplest option is to drop it.
</issue_to_address>
### Comment 6
<location> `5-backup-automation/backup_manager/manager.py:263-268` </location>
<code_context>
subprocess.run(
[str(script_path)],
check=True,
capture_output=True,
timeout=300, # 5 minute timeout
)
</code_context>
<issue_to_address>
**security (python.lang.security.audit.dangerous-subprocess-use-audit):** Detected subprocess function 'run' without a static string. If this data can be controlled by a malicious actor, it may be an instance of command injection. Audit the use of this call to ensure it is not controllable by an external resource. You may consider using 'shlex.escape()'.
*Source: opengrep*
</issue_to_address>
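The call already uses the list form (no shell), so the residual risk is that `script_path` itself is attacker-controlled. One possible mitigation sketch follows; the allow-list directory is invented and not part of the PR.

```python
# Hardening sketch for the flagged subprocess call; ALLOWED_SCRIPT_DIR is a
# hypothetical allow-list root, not project configuration.
import subprocess
from pathlib import Path

ALLOWED_SCRIPT_DIR = Path("/etc/backup-scripts")

def run_script_safely(script_path: Path) -> None:
    resolved = script_path.resolve()
    if ALLOWED_SCRIPT_DIR.resolve() not in resolved.parents:
        raise ValueError(f"Script outside allowed directory: {resolved}")
    if not resolved.is_file():
        raise FileNotFoundError(resolved)
    # List form avoids a shell, so no quoting is needed; check=True surfaces failures.
    subprocess.run([str(resolved)], check=True, capture_output=True, timeout=300)
```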
### Comment 7
<location> `5-backup-automation/backup_manager/retention.py:255` </location>
<code_context>
"deleted": len(to_delete) if not dry_run else 0,
</code_context>
<issue_to_address>
**suggestion (code-quality):** Swap if/else branches of if expression to remove negation ([`swap-if-expression`](https://docs.sourcery.ai/Reference/Rules-and-In-Line-Suggestions/Python/Default-Rules/swap-if-expression))
```suggestion
"deleted": 0 if dry_run else len(to_delete),
```
<br/><details><summary>Explanation</summary>Negated conditions are more difficult to read than positive ones, so it is best
to avoid them where we can. By swapping the `if` and `else` conditions around we
can invert the condition and make it positive.
</details>
</issue_to_address>
### Comment 8
<location> `5-backup-automation/backup_manager/manager.py:68` </location>
<code_context>
def create_backup(self, config: BackupConfig) -> BackupResult:
"""
Create a backup from a configuration.
Args:
config: Backup configuration.
Returns:
BackupResult with the result.
"""
# Create the job
job = BackupJob(
job_id=str(uuid.uuid4()),
config_name=config.name,
backup_type=config.backup_type,
started_at=datetime.now(),
status=BackupStatus.RUNNING,
)
try:
# Run the pre-backup script
if config.pre_backup_script and config.pre_backup_script.exists():
self._run_script(config.pre_backup_script)
# Create the backup file
backup_file = self._generate_backup_filename(config)
job.backup_file = backup_file
# Dispatch on backup type
if config.backup_type == BackupType.FULL:
self._create_full_backup(config, backup_file)
elif config.backup_type == BackupType.INCREMENTAL:
self._create_incremental_backup(config, backup_file)
else:
self._create_differential_backup(config, backup_file)
# Collect metadata
job.size_bytes = backup_file.stat().st_size
job.files_count = self._count_files_in_archive(backup_file)
job.finished_at = datetime.now()
job.status = BackupStatus.COMPLETED
# Compute duration
if job.finished_at:
duration = (job.finished_at - job.started_at).total_seconds()
job.duration_seconds = duration
# Save metadata
self._save_metadata(job, config)
# Run the post-backup script
if config.post_backup_script and config.post_backup_script.exists():
self._run_script(config.post_backup_script)
return BackupResult(
success=True,
job=job,
message=f"Backup completed successfully: {backup_file}",
)
except Exception as e:
job.status = BackupStatus.FAILED
job.finished_at = datetime.now()
job.error_message = str(e)
return BackupResult(
success=False,
job=job,
message=f"Backup failed: {e}",
)
</code_context>
<issue_to_address>
**issue (code-quality):** Extract code out into method ([`extract-method`](https://docs.sourcery.ai/Reference/Default-Rules/refactorings/extract-method/))
</issue_to_address>
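As a rough illustration of the extract-method direction (not the author's code), the optional script hooks could become a small helper so `create_backup` reads linearly; the class and method names below are invented stand-ins.

```python
# Illustrative helper shape only; BackupManagerSketch stands in for the real class.
from pathlib import Path
from typing import Optional

class BackupManagerSketch:
    def _run_script(self, script: Path) -> None:
        ...  # stub: the real manager shells out to the script with a timeout

    def _run_hook(self, script: Optional[Path]) -> None:
        """Run an optional pre/post-backup hook only if it is configured and exists."""
        if script and script.exists():
            self._run_script(script)
```

With a helper like this, the pre/post blocks in `create_backup` collapse to `self._run_hook(config.pre_backup_script)` and `self._run_hook(config.post_backup_script)`, and the backup-type branching can move into a similar `_build_archive` helper.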
### Comment 9
<location> `5-backup-automation/backup_manager/manager.py:232-234` </location>
<code_context>
def _save_metadata(self, job: BackupJob, config: BackupConfig) -> None:
"""
Save metadata to a JSON file.
Args:
job: Backup job.
config: Configuration.
"""
if not job.backup_file:
return
metadata = BackupMetadata(
backup_id=job.job_id,
created_at=job.started_at,
config_name=config.name,
backup_type=config.backup_type,
source_path=str(config.source_path),
hostname=socket.gethostname(),
size_bytes=job.size_bytes or 0,
files_count=job.files_count or 0,
compression=config.compression,
encryption=config.encryption,
checksum=self._calculate_checksum(job.backup_file),
)
metadata_file = job.backup_file.with_suffix(
job.backup_file.suffix + ".json"
)
metadata_file.write_text(metadata.model_dump_json(indent=2))
</code_context>
<issue_to_address>
**suggestion (code-quality):** Use f-string instead of string concatenation ([`use-fstring-for-concatenation`](https://docs.sourcery.ai/Reference/Default-Rules/refactorings/use-fstring-for-concatenation/))
```suggestion
metadata_file = job.backup_file.with_suffix(f"{job.backup_file.suffix}.json")
```
</issue_to_address>
### Comment 10
<location> `5-backup-automation/backup_manager/retention.py:236` </location>
<code_context>
def cleanup_old_backups(
self,
backup_dir: Path,
config_name: Optional[str] = None,
dry_run: bool = False,
) -> dict[str, int]:
"""
Clean up old backups.
Args:
backup_dir: Backups directory.
config_name: Configuration name to filter by.
dry_run: Simulate only.
Returns:
Statistics dict (kept, deleted, total_size_freed).
"""
# Find metadata files
backups = []
for metadata_file in backup_dir.rglob("*.json"):
try:
metadata = BackupMetadata.model_validate_json(
metadata_file.read_text()
)
if config_name is None or metadata.config_name == config_name:
backups.append((metadata, metadata_file))
except Exception:
continue
# Sort by date (newest first)
backups.sort(key=lambda x: x[0].created_at, reverse=True)
backup_objects = [b[0] for b in backups]
# Apply retention
to_keep, to_delete = self.apply_retention(backup_objects, dry_run)
# Perform deletion
deleted_count = 0
total_size_freed = 0
if not dry_run:
for metadata in to_delete:
# Delete the backup file and its metadata
backup_file = self._find_backup_file(backup_dir, metadata.backup_id)
if backup_file and backup_file.exists():
total_size_freed += backup_file.stat().st_size
backup_file.unlink()
deleted_count += 1
# Delete the metadata file
metadata_file = backup_file.with_suffix(backup_file.suffix + ".json")
if metadata_file.exists():
metadata_file.unlink()
return {
"kept": len(to_keep),
"deleted": len(to_delete) if not dry_run else 0,
"would_delete": len(to_delete) if dry_run else 0,
"total_size_freed": total_size_freed,
}
</code_context>
<issue_to_address>
**issue (code-quality):** We've found these issues:
- Move assignments closer to their usage ([`move-assign`](https://docs.sourcery.ai/Reference/Default-Rules/refactorings/move-assign/))
- Use f-string instead of string concatenation ([`use-fstring-for-concatenation`](https://docs.sourcery.ai/Reference/Default-Rules/refactorings/use-fstring-for-concatenation/))
</issue_to_address>
### Comment 11
<location> `5-backup-automation/backup_manager/verifier.py:55` </location>
<code_context>
def verify_backup(self, backup_file: Path) -> VerificationResult:
"""
Verify a backup.
Args:
backup_file: Path to the backup file.
Returns:
VerificationResult with the result.
"""
errors = []
is_valid = True
can_extract = False
checksum_match = None
# Check that the file exists
if not backup_file.exists():
errors.append("Backup file does not exist")
return VerificationResult(
backup_file=backup_file,
is_valid=False,
can_extract=False,
size_bytes=0,
verified_at=datetime.now(),
errors=errors,
)
size_bytes = backup_file.stat().st_size
# Validate the checksum
metadata_file = backup_file.with_suffix(backup_file.suffix + ".json")
if metadata_file.exists():
try:
metadata = BackupMetadata.model_validate_json(
metadata_file.read_text()
)
actual_checksum = self._calculate_checksum(backup_file)
checksum_match = actual_checksum == metadata.checksum
if not checksum_match:
errors.append(
f"Checksum mismatch: expected {metadata.checksum}, got {actual_checksum}"
)
is_valid = False
except Exception as e:
errors.append(f"Failed to verify checksum: {e}")
# Check tar archive integrity
try:
with tarfile.open(backup_file, "r") as tar:
# Read the member list
members = tar.getmembers()
if len(members) == 0:
errors.append("Archive is empty")
is_valid = False
except tarfile.TarError as e:
errors.append(f"Invalid tar archive: {e}")
is_valid = False
# Test extractability
if is_valid:
can_extract = self._test_extraction(backup_file)
if not can_extract:
errors.append("Failed to extract archive")
is_valid = False
return VerificationResult(
backup_file=backup_file,
is_valid=is_valid,
can_extract=can_extract,
checksum_match=checksum_match,
size_bytes=size_bytes,
verified_at=datetime.now(),
errors=errors,
)
</code_context>
<issue_to_address>
**suggestion (code-quality):** Use f-string instead of string concatenation ([`use-fstring-for-concatenation`](https://docs.sourcery.ai/Reference/Default-Rules/refactorings/use-fstring-for-concatenation/))
```suggestion
metadata_file = backup_file.with_suffix(f"{backup_file.suffix}.json")
```
</issue_to_address>
### Comment 12
<location> `5-backup-automation/backup_manager/verifier.py:157` </location>
<code_context>
def verify_all_backups(self, backup_dir: Path) -> dict[str, VerificationResult]:
"""
Verify all backups in a directory.
Args:
backup_dir: Backups directory.
Returns:
Dict backup_id -> VerificationResult.
"""
results = {}
# Find tar files
for backup_file in backup_dir.rglob("*.tar*"):
if backup_file.suffix == ".json":
continue
try:
result = self.verify_backup(backup_file)
# Extract the backup ID from metadata
metadata_file = backup_file.with_suffix(backup_file.suffix + ".json")
if metadata_file.exists():
metadata = BackupMetadata.model_validate_json(
metadata_file.read_text()
)
results[metadata.backup_id] = result
else:
results[str(backup_file)] = result
except Exception as e:
# Return a result even on error
results[str(backup_file)] = VerificationResult(
backup_file=backup_file,
is_valid=False,
can_extract=False,
size_bytes=0,
verified_at=datetime.now(),
errors=[f"Verification failed: {e}"],
)
return results
</code_context>
<issue_to_address>
**suggestion (code-quality):** Use f-string instead of string concatenation ([`use-fstring-for-concatenation`](https://docs.sourcery.ai/Reference/Default-Rules/refactorings/use-fstring-for-concatenation/))
```suggestion
metadata_file = backup_file.with_suffix(f"{backup_file.suffix}.json")
```
</issue_to_address>

Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
Summary
🇬🇧 English
Complete backup automation framework with retention policies and verification:
Python Backup Manager:
BackupManager: Create full/incremental/differential backups
RetentionManager: Automated backup cleanup
BackupVerifier: Integrity verification
Bash Scripts:
Test Plan
🤖 Generated with Claude Code
Summary by Sourcery
Add a backup automation module providing backup creation, retention management, verification, and a Bash wrapper script.
New Features:
Enhancements:
Documentation:
Tests:
Summary by CodeRabbit
New Features
Documentation
Tests