Add two repair tests to agent-antagonist #1793
jmpesp wants to merge 1 commit into oxidecomputer:main
While discussing possible causes for the zeroes appearing at the beginning of an extent file (oxidecomputer#1788), one theory that came up was that the Extent was being repaired, and that those zeroes came from reading the Extent through the repair API, not from the disk itself: only _one_ Region's Extents were bad, not all of them. To test this theory, add two new tests. Both contact an Agent to get a Region's details, then use the repair API to read Extent data files. One test looks for a block of zeroes at the beginning of the data; the other writes the data to a temporary file and checks it with the new `Extent::validate` routine.
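As a rough illustration of the "block of zeroes at the beginning" check described above, here is a minimal sketch in Rust. It is not the code from this PR: the helper name `starts_with_zero_block` and the 512-byte block size are assumptions made for the example, and the bytes are built in memory rather than fetched through the repair API.

```rust
/// Hypothetical helper (not from the PR): returns true if `data` is at
/// least one block long and its first `block_size` bytes are all zero.
fn starts_with_zero_block(data: &[u8], block_size: usize) -> bool {
    block_size > 0
        && data.len() >= block_size
        && data[..block_size].iter().all(|&b| b == 0)
}

fn main() {
    // Assumed block size, for illustration only.
    let block_size = 512;

    // Stand-in for extent data that would be read via the repair API.
    let good = vec![0xAAu8; 2 * block_size];

    // Same data, but with the first block overwritten by zeroes.
    let mut bad = good.clone();
    bad[..block_size].fill(0);

    assert!(!starts_with_zero_block(&good, block_size));
    assert!(starts_with_zero_block(&bad, block_size));
}
```

A leading block of zeroes is only a hint: a block that was never written would presumably also read back as zeroes, so a match here would still need to be compared against the other Regions' copies of the same Extent.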
leftwo left a comment:
These changes are really just about looking for and checking the #1788 all-zeroes situation, right? We might want to document somewhere how to do a setup where someone could run this, so that information is not lost to time.
    #[clap(short, long)]
    region_id: Uuid,

    /// Allow reading extents from a read-write downstairs that may not be
This seems fraught with peril. Can we expect this data to be good?
Or is this more about allowing us to connect to a downstairs that is serving a RW region, but where that downstairs is not actually taking live IO from an upstairs?
Right, we can't expect it to be good if it's accepting IO, but in cases where it's a RW downstairs that's not currently doing anything, we should expect the data to be good.
    .build()
    .unwrap();

    loop {
This will just run forever, or until you control-C it?
leftwo left a comment:
If you add some notes or documentation (somewhere) that can be used to run these, then I'm fine with the changes going back.