sources(migrator): add logic to remove orphaned datastores#812
sources(migrator): add logic to remove orphaned datastores#812vigh-m wants to merge 1 commit intobottlerocket-os:developfrom
Conversation
982d15e to
4764498
Compare
|
⬆️ fixes for unit-tests, fmt, and clippy |
| datastore_dir.join("current"), | ||
| datastore_dir.join(format!("v{}", cv.major)), | ||
| datastore_dir.join(format!("v{}.{}", cv.major, cv.minor)), | ||
| datastore_dir.join(format!("v{}", cv)), | ||
| datastore_dir.join(format!("v{}", tv)), | ||
| datastore_dir.join(format!("v{}", tv.major)), | ||
| datastore_dir.join(format!("v{}.{}", tv.major, tv.minor)), |
There was a problem hiding this comment.
I don't love this because there's no mechanism to keep symlink creation and symlink cleanup in sync.
We could potentially model it as a graph traversal problem, i.e. populate graph with vertices for symlinks and the final datastore directory; add edges for each symlink's target; find all vertices without a path to the final datastore; remove them.
4764498 to
dd585dd
Compare
|
🆙 Reworked the PR following comments. The description is also up to date with the new testing results |
dd585dd to
57a3e6b
Compare
|
🆙 Minor fix for an unused change |
Signed-off-by: Vighnesh Maheshwari <vighmah@amazon.com>
57a3e6b to
909f25e
Compare
| } | ||
| let mut seen: HashSet<PathBuf> = HashSet::from([path.clone()]); | ||
| let mut current = path; | ||
| while let Ok(next) = fs::read_link(¤t).await { |
There was a problem hiding this comment.
May want to protect against infinite symlink loops. Could maybe leverage seen?
Might be a good unit test too.
| ) -> HashSet<PathBuf> { | ||
| let mut reachable: HashSet<PathBuf> = HashSet::new(); | ||
| let mut unreachable: HashSet<PathBuf> = HashSet::new(); | ||
| let top_dir = final_vertex.parent().unwrap(); |
There was a problem hiding this comment.
Shouldn't use unwrap() - rather avoid the situation for panic
| let top_dir = final_vertex.parent().unwrap(); | ||
| 'outer: while let Ok(Some(entry)) = all_vertices.next_entry().await { | ||
| let path = entry.path(); | ||
| if reachable.contains(&path) { |
There was a problem hiding this comment.
Is it worth also checking unreachable here?
Seems if it is in unreachable, we would have already traversed the path.
| let mut reachable: HashSet<PathBuf> = HashSet::new(); | ||
| let mut unreachable: HashSet<PathBuf> = HashSet::new(); | ||
| let top_dir = final_vertex.parent().unwrap(); | ||
| 'outer: while let Ok(Some(entry)) = all_vertices.next_entry().await { |
There was a problem hiding this comment.
nit: while let Ok(Some(entry)) silently exits on I/O errors from next_entry(), which means cleanup could be incomplete without any log message. Consider logging the error before breaking
| // If an intermediate datastore exists from a previous loop, delete it. | ||
| if let Some(path) = &intermediate_datastore { | ||
| delete_intermediate_datastore(path).await; | ||
| delete_datastore(path.as_path()).await; |
There was a problem hiding this comment.
nit: Name - comment still says intermediate but function renamed delete_datastore. I think I prefer delete_datastore_entryor clean_datastore
| } | ||
|
|
||
| // Traverse all files in all_vertices and return those that do not resolve to the final_vertex | ||
| async fn find_all_files_to_delete( |
There was a problem hiding this comment.
nit: find_orphaned_entries ?
Issue number:
Related bottlerocket-os/bottlerocket#2589
Description of changes:
The disk space was filling up because, overtime, repeated updates to the latest released version of Bottlerocket were causing datastore from older versions to be left behind. I was able to reproduce this by setting up Brupop to update an instance from v1.15.0 incrementally using the
settings.updates.version-lockto point to the next version.The datastore ends up like below:
Details
This change adds a new method
cleanup_orphaned_datastoreswhich locates all datastore symlinks that do not resolvetargetdatastore and deletes everything else.There is a helper function named
find_all_files_to_deletewhich does the actual symlink traversal.I also extended the unit tests to add multi-stage migrations to verify this behaviour.
Testing done:
Migrate from 1.53.0 to 1.54.0 (with settings migrations):
Details
Initial state
bash-5.2# ls -la /var/lib/bottlerocket/datastore/ total 12 drwxr-xr-x 3 root root 4096 Feb 17 04:59 . drwxr-xr-x. 5 root root 4096 Feb 17 04:59 .. lrwxrwxrwx 1 root root 2 Feb 17 04:59 current -> v1 lrwxrwxrwx 1 root root 5 Feb 17 04:59 v1 -> v1.53 lrwxrwxrwx 1 root root 7 Feb 17 04:59 v1.53 -> v1.53.0 lrwxrwxrwx 1 root root 24 Feb 17 04:59 v1.53.0 -> v1.53.0_MBb5IUZ2jNtloSZB drwxr-xr-x 4 root root 4096 Feb 17 04:59 v1.53.0_MBb5IUZ2jNtloSZBForward migration
bash-5.2# ls -la /var/lib/bottlerocket/datastore/ total 12 drwxr-xr-x 3 root root 4096 Feb 17 05:09 . drwxr-xr-x. 5 root root 4096 Feb 17 04:59 .. lrwxrwxrwx 1 root root 2 Feb 17 05:09 current -> v1 lrwxrwxrwx 1 root root 5 Feb 17 05:09 v1 -> v1.54 lrwxrwxrwx 1 root root 7 Feb 17 05:09 v1.54 -> v1.54.0 lrwxrwxrwx 1 root root 24 Feb 17 05:09 v1.54.0 -> v1.54.0_pzujidtVjBccrhUX drwxr-xr-x 4 root root 4096 Feb 17 05:09 v1.54.0_pzujidtVjBccrhUXRollback
bash-5.2# ls -la /var/lib/bottlerocket/datastore/ total 12 drwxr-xr-x 3 root root 4096 Feb 17 17:54 . drwxr-xr-x. 5 root root 4096 Feb 17 04:59 .. lrwxrwxrwx 1 root root 2 Feb 17 17:54 current -> v1 lrwxrwxrwx 1 root root 5 Feb 17 17:54 v1 -> v1.53 lrwxrwxrwx 1 root root 7 Feb 17 17:54 v1.53 -> v1.53.0 lrwxrwxrwx 1 root root 24 Feb 17 17:54 v1.53.0 -> v1.53.0_9GTssIfOT9f1FkFX drwxr-xr-x 4 root root 4096 Feb 17 17:54 v1.53.0_9GTssIfOT9f1FkFXMigrate 1.53.0 to 1.54.0 then to 1.55.0 then to 1.55.1
Details
Initial State
bash-5.2# ls -la /var/lib/bottlerocket/datastore/ total 12 drwxr-xr-x 3 root root 4096 Feb 17 17:54 . drwxr-xr-x. 5 root root 4096 Feb 17 04:59 .. lrwxrwxrwx 1 root root 2 Feb 17 17:54 current -> v1 lrwxrwxrwx 1 root root 5 Feb 17 17:54 v1 -> v1.53 lrwxrwxrwx 1 root root 7 Feb 17 17:54 v1.53 -> v1.53.0 lrwxrwxrwx 1 root root 24 Feb 17 17:54 v1.53.0 -> v1.53.0_9GTssIfOT9f1FkFX drwxr-xr-x 4 root root 4096 Feb 17 17:54 v1.53.0_9GTssIfOT9f1FkFXForward migration
Rollback
total 12 drwxr-xr-x 3 root root 4096 Feb 17 18:27 . drwxr-xr-x. 5 root root 4096 Feb 17 04:59 .. lrwxrwxrwx 1 root root 2 Feb 17 18:27 current -> v1 lrwxrwxrwx 1 root root 5 Feb 17 18:27 v1 -> v1.55 lrwxrwxrwx 1 root root 7 Feb 17 18:27 v1.55 -> v1.55.0 lrwxrwxrwx 1 root root 24 Feb 17 18:27 v1.55.0 -> v1.55.0_oT7OTfJqxzvd1h2G drwxr-xr-x 4 root root 4096 Feb 17 18:27 v1.55.0_oT7OTfJqxzvd1h2GBy submitting this pull request, I agree that this contribution is dual-licensed under the terms of both the Apache License, version 2.0, and the MIT license.