fix(buffers): disk buffers v2 reopen buffer after restart crash loop #25570
base: graphite-base/25570
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,10 @@ | ||
| Fixed a `disk` buffer (v2) crash loop where, after a crash or forced restart, Vector | ||
| could fail to reopen the buffer with `failed to seek to position where reader left off: | ||
| No such file or directory` and exit with a configuration error on every restart. The | ||
| reader now advances past a fully acknowledged data file that was already deleted instead | ||
| of failing the buffer build, so the buffer always reopens and continues delivering. | ||
|
|
||
| Disk buffer (v2) durability was also hardened: the directory holding the buffer is now | ||
| `fsync`ed after a data file is created. Previously only file contents were synced, so a | ||
| crash could lose a freshly created data file's directory entry and drop data that had | ||
| been reported as synced to disk. |
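The durability point above can be illustrated outside Vector. The following is a minimal sketch, not the code from this PR: `create_durable` and `sync_dir` are hypothetical names, and the sketch assumes a Unix-like platform where a directory can be opened read-only and fsynced through `File::sync_all`. On other platforms the directory sync is a no-op here.

```rust
use std::fs::{self, OpenOptions};
use std::io::{self, Write};
use std::path::Path;

/// Hypothetical helper: fsync the directory so a newly created entry survives a crash.
#[cfg(unix)]
fn sync_dir(dir: &Path) -> io::Result<()> {
    // On Unix, opening a directory read-only and calling sync_all issues fsync on it.
    fs::File::open(dir)?.sync_all()
}

/// Assumption for this sketch: no portable directory sync elsewhere, so do nothing.
#[cfg(not(unix))]
fn sync_dir(_dir: &Path) -> io::Result<()> {
    Ok(())
}

/// Hypothetical helper: create `name` in `dir`, write `contents`,
/// then sync the file and the directory that holds it.
fn create_durable(dir: &Path, name: &str, contents: &[u8]) -> io::Result<()> {
    let path = dir.join(name);
    let mut file = OpenOptions::new()
        .write(true)
        .create_new(true) // fail instead of overwriting an existing data file
        .open(&path)?;
    file.write_all(contents)?;
    // Syncs the file's data and metadata, but not necessarily its directory entry.
    file.sync_all()?;
    // Syncs the directory so the entry for the new file is durable as well.
    sync_dir(dir)
}

fn main() -> io::Result<()> {
    let dir = std::env::temp_dir().join(format!("durable_create_demo_{}", std::process::id()));
    fs::create_dir_all(&dir)?;
    create_durable(&dir, "demo.dat", b"record")?;
    assert_eq!(fs::read(dir.join("demo.dat"))?, b"record".to_vec());
    fs::remove_dir_all(&dir)?;
    Ok(())
}
```

Syncing only the file leaves a window in which the contents are on disk but the name that points to them is not, which is the failure mode the changelog entry describes.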
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -800,6 +800,17 @@ where | |
| ); | ||
| self.ledger.wait_for_writer().await; | ||
| } else { | ||
| // The ledger names a data file that is gone while | ||
| // the writer has moved on past it. A data file is | ||
| // only unlinked after every record in it is acked, | ||
| // so a missing file below the writer was fully | ||
| // delivered. Advancing past it loses nothing. | ||
| warn!( | ||
| skipped_file_id = reader_file_id, | ||
| writer_file_id, | ||
| data_file_path = data_file_path.to_string_lossy().as_ref(), | ||
| "Reader resume data file is missing; it was fully acknowledged before deletion. Advancing past it." | ||
| ); | ||
| self.ledger.increment_acked_reader_file_id(); | ||
|
This branch also matches buffers affected by the creation-side durability bug this patch calls out: before this change (and still on non-Unix, where …
||
| } | ||
| continue; | ||
|
|
@@ -819,6 +830,53 @@ where | |
| } | ||
| } | ||
|
|
||
| /// Reconciles the reader's resume position on reopen. | ||
| /// | ||
| /// A crash between unlinking a fully-acked data file and the durable ledger flush can leave | ||
| /// the ledger naming a file that is already gone. Walk the reader file id forward to the | ||
| /// lowest file that still exists so `seek_to_next_record` opens a real file instead of failing | ||
| /// on a missing one. A missing file strictly below the writer was fully delivered, so skipping | ||
| /// it loses nothing. `total_buffer_size` is reseeded on reopen from files that exist, so an | ||
| /// absent file already contributes zero and is deliberately not adjusted here. | ||
| pub(super) async fn reconcile_reader_position(&mut self) -> Result<(), ReaderError<T>> { | ||
| let mut advanced = false; | ||
| loop { | ||
| let (reader_file_id, writer_file_id) = self.ledger.get_current_reader_writer_file_id(); | ||
| // Equality guards, not a numeric `<`, so ring wraparound never skips the live writer file. | ||
| if reader_file_id == writer_file_id | ||
| || reader_file_id == self.ledger.get_next_writer_file_id() | ||
| { | ||
| break; | ||
| } | ||
| let data_file_path = self.ledger.get_data_file_path(reader_file_id); | ||
| match self | ||
| .ledger | ||
| .filesystem() | ||
| .open_file_readable(&data_file_path) | ||
| .await | ||
| { | ||
| Ok(_) => break, | ||
| Err(e) if e.kind() == ErrorKind::NotFound => { | ||
| warn!( | ||
| skipped_file_id = reader_file_id, | ||
| writer_file_id, | ||
| data_file_path = data_file_path.to_string_lossy().as_ref(), | ||
| "Reader resume data file missing on reopen; fully acknowledged before deletion. Advancing past it." | ||
| ); | ||
| self.ledger.increment_acked_reader_file_id(); | ||
| advanced = true; | ||
| } | ||
| Err(source) => return Err(ReaderError::Io { source }), | ||
| } | ||
| } | ||
| if advanced { | ||
| self.ledger | ||
| .flush() | ||
| .map_err(|source| ReaderError::Io { source })?; | ||
| } | ||
| Ok(()) | ||
| } | ||
|
|
||
| /// Seeks to where this reader previously left off. | ||
| /// | ||
| /// In cases where Vector has restarted, but the reader hasn't yet finished a file, we would | ||
|
|
@@ -864,8 +922,14 @@ where | |
| // | ||
| // Once the reader/writer file IDs are identical, we fall back to the slow path. | ||
| while self.ledger.get_current_reader_file_id() != self.ledger.get_current_writer_file_id() { | ||
| let data_file_path = self.ledger.get_current_reader_data_file_path(); | ||
| self.ensure_ready_for_read().await.context(IoSnafu)?; | ||
| // NOTE we intentionally read the resume path after | ||
| // `ensure_ready_for_read` to avoid crash-looping the buffer. If the | ||
| // ledger is out of date -- a hard-crash will cause its sync to be | ||
| // missed after dat files are unlinked -- we may be pointed to a | ||
| // missing dat file. Skipping is harmless as, by construction, the | ||
| // file was previously read entirely. | ||
| let data_file_path = self.ledger.get_current_reader_data_file_path(); | ||
| let data_file_mmap = self | ||
| .ledger | ||
| .filesystem() | ||
|
Comment on lines 933 to 935
When recovery skips a missing reader file here, …
||
|
|
||
When a crash happens after `delete_completed_data_file` unlinks a data file but before its mmap changes are durably flushed, the restored ledger can still include that file's bytes and last-read record state even though the file is gone. This branch only advances the reader file id, so the skipped file's size remains in `total_buffer_size` and its record ids may be treated as gaps; after the remaining files drain, the buffer can still appear non-empty/full and the reader can wait for a next writer file instead of reaching an empty state. The recovery path needs to make the same accounting durable before unlink, or explicitly repair the skipped file's ledger state here.
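For readers following the discussion, the walk-forward step in `reconcile_reader_position` can be modeled in isolation. This is a toy sketch under stated assumptions, not Vector's ledger: `ToyLedger`, its fields, and the wrap limit `max_file_id` are hypothetical, and the model tracks only file ids. It does not represent `total_buffer_size` or record ids, which are the subject of the comment above.

```rust
use std::collections::BTreeSet;

/// Toy stand-in for the ledger (all names here are hypothetical).
struct ToyLedger {
    reader_file_id: u16,
    writer_file_id: u16,
    /// File ids wrap around to 0 once they reach this value.
    max_file_id: u16,
    /// Ids of data files that still exist on disk.
    existing: BTreeSet<u16>,
}

impl ToyLedger {
    fn next_writer_file_id(&self) -> u16 {
        (self.writer_file_id + 1) % self.max_file_id
    }

    /// Advances the reader past missing files and returns how many were skipped.
    fn reconcile(&mut self) -> u16 {
        let mut skipped = 0;
        loop {
            // Equality guards rather than a numeric `<`: after wraparound the
            // writer id can be numerically smaller than the reader id.
            if self.reader_file_id == self.writer_file_id
                || self.reader_file_id == self.next_writer_file_id()
            {
                break;
            }
            // Stop at the first file that is still present.
            if self.existing.contains(&self.reader_file_id) {
                break;
            }
            self.reader_file_id = (self.reader_file_id + 1) % self.max_file_id;
            skipped += 1;
        }
        skipped
    }
}

fn main() {
    // Reader at 30, writer already wrapped to 1; files 30 and 31 are gone.
    let mut ledger = ToyLedger {
        reader_file_id: 30,
        writer_file_id: 1,
        max_file_id: 32,
        existing: [0u16, 1].into_iter().collect(),
    };
    let skipped = ledger.reconcile();
    println!("skipped {skipped}, reader now at {}", ledger.reader_file_id);
    // prints "skipped 2, reader now at 0"
}
```

In this model a numeric `reader < writer` check would stop immediately at reader id 30, because the writer id 1 is smaller, and the missing files would never be skipped. The equality guards let the walk continue through the wraparound and still stop at the writer's file.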