Gate block-manager historical storage behind PROMPP feature flag #389
Open
vporoshok wants to merge 21 commits into
* chore(deps): update snappy digest to 27ab5f7
* removed patches for snappy

---------

Co-authored-by: Renovate Bot <renovate@whitesourcesoftware.com>
Co-authored-by: Vladimir Pustovalov <cherep@sura.ru>
* created metrics for DataStorage
* added ability to store metadata in metrics page
* added encoder type count metrics for DataStorage
* added DataStorage finalized_chunks_count metric
* added DataStorage timestamp_states_count metric
* review fixes
* changed calculation logic of finalized_chunks metric
* removed DataStorage timestamp_states_count metric
* added ability to refresh metrics in metrics page
* added DataStorage timestamp_states_count metric
* fixed chunk_count_metric calculating
* created unit test for DataStorage metrics
* optimized encoding speed
* fixed clang-format
* fixed compilation error
* fixed comment
* fixed clang-tidy warning
Co-authored-by: Renovate Bot <renovate@whitesourcesoftware.com>
…ity] (#384) Co-authored-by: Renovate Bot <renovate@whitesourcesoftware.com>
* Add block.Manager with reload, retention and queryable support

  Port reloadBlocks into a standalone pp/go/storage/block.Manager that reloads persisted blocks, applies retention via an injected tsdb.BlocksToDeleteFunc, and implements storage.Queryable/ChunkQueryable. Refactor pp-pkg/tsdb to a DB-free NewBlocksToDelete constructor that owns its retention metrics and limit gauges, and expose CatalogHeadsSize / CatalogHeadsExtraSize helpers. Add a tsdb.OpenBlocks wrapper.

  Co-authored-by: Cursor <cursoragent@cursor.com>

* Implement Blocks method in block.Manager to return currently loaded blocks

  This update adds the Blocks method to the Manager struct, which provides a snapshot of the currently loaded blocks, implementing the BlockSource interface. The method ensures thread-safe access to the blocks using read locks.

* Wire block.Manager and block.Compactor into main, disable tsdb

  In server mode, stop opening tsdb.DB and instead run block.Manager (persisted block reads + retention) and block.Compactor (compaction). block.Manager is plugged into the fanout via a querier-only storage.Storage adapter; localStorage stays an empty stub. Replace the TSDB run-group actor with a lifecycle actor and drop the dead openDBWithMetrics and its obsolete TestTimeMetrics.

  Co-authored-by: Cursor <cursoragent@cursor.com>

* review fix

* Add block manager coverage and fail-fast startup behavior.

  Ensure server startup aborts when the initial block reload fails, and add manager/compactor tests to cover startup loading, retention-driven deletion, and compaction loop triggering.

  Co-authored-by: Cursor <cursoragent@cursor.com>

---------

Co-authored-by: Vladimir Kavlakan <vladimir.kavlakan@flant.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
Co-authored-by: Bastrykov Evgeniy <vporoshok@gmail.com>
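The "snapshot under a read lock" behavior described for the Blocks method can be illustrated with a minimal sketch. The `Block` and `Manager` types and the `setBlocks` helper here are hypothetical stand-ins, not the PR's actual definitions; the only point is that readers get a copy of the slice, so a later reload cannot change what they already hold.

```go
package main

import (
	"fmt"
	"sync"
)

// Block is a hypothetical stand-in for a persisted block handle.
type Block struct {
	ULID string
	MinT int64
	MaxT int64
}

// Manager is a simplified, hypothetical manager that guards its
// loaded blocks with a read-write mutex.
type Manager struct {
	mtx    sync.RWMutex
	blocks []*Block
}

// Blocks returns a copy of the currently loaded block slice.
// Only the read lock is taken, so concurrent readers do not block each other.
func (m *Manager) Blocks() []*Block {
	m.mtx.RLock()
	defer m.mtx.RUnlock()
	out := make([]*Block, len(m.blocks))
	copy(out, m.blocks)
	return out
}

// setBlocks swaps in a freshly reloaded block set under the write lock.
func (m *Manager) setBlocks(bs []*Block) {
	m.mtx.Lock()
	defer m.mtx.Unlock()
	m.blocks = bs
}

func main() {
	m := &Manager{}
	m.setBlocks([]*Block{{ULID: "a", MinT: 0, MaxT: 7200000}})
	snap := m.Blocks()
	m.setBlocks(nil) // a later reload does not affect the earlier snapshot
	fmt.Println(len(snap), len(m.Blocks()))
}
```

Copying the slice costs one allocation per call, which is usually acceptable because the number of loaded blocks is small compared with the work done per query.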
…curity] (#385) Co-authored-by: Renovate Bot <renovate@whitesourcesoftware.com>
Add compaction plan/result logs, restore missing block-manager TSDB gauges, and cap compaction ranges by max block duration so 2h block setups follow configured bounds. Co-authored-by: Cursor <cursoragent@cursor.com>
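One way to read "cap compaction ranges by max block duration" is to drop every candidate range longer than the configured maximum. The sketch below assumes that interpretation; `exponentialRanges` and `capRanges` are hypothetical helpers written for illustration, and the PR's actual logic may differ.

```go
package main

import "fmt"

// exponentialRanges builds `steps` block ranges starting at minSize,
// each stepSize times larger than the previous one (hypothetical helper).
func exponentialRanges(minSize int64, steps int, stepSize int64) []int64 {
	ranges := make([]int64, 0, steps)
	cur := minSize
	for i := 0; i < steps; i++ {
		ranges = append(ranges, cur)
		cur *= stepSize
	}
	return ranges
}

// capRanges keeps only the ranges that do not exceed maxBlockDuration.
// Assumption: ranges and maxBlockDuration use the same unit (milliseconds here).
func capRanges(ranges []int64, maxBlockDuration int64) []int64 {
	out := make([]int64, 0, len(ranges))
	for _, r := range ranges {
		if r <= maxBlockDuration {
			out = append(out, r)
		}
	}
	return out
}

func main() {
	const twoHoursMs = int64(2 * 60 * 60 * 1000)
	ranges := exponentialRanges(twoHoursMs, 3, 3) // 2h, 6h, 18h
	fmt.Println(capRanges(ranges, twoHoursMs))    // only the 2h range survives
}
```

With a 2h maximum, only the 2h range remains, so the planner in this sketch would never produce blocks longer than the configured bound.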
Default to the pre-PR-377 historical TSDB path and enable block-manager only with PROMPP_FEATURES=enable_block_manager, keeping PP head+adapter as the write path in both modes. Co-authored-by: Cursor <cursoragent@cursor.com>
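The gating described above can be sketched as a small lookup on the environment variable. This assumes `PROMPP_FEATURES` holds a comma-separated list of flag names, which the commit message does not confirm; `featureEnabled` and `historicalStorage` are hypothetical names used only for illustration.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// featureEnabled reports whether name appears in raw.
// Assumption: raw is a comma-separated list of feature names.
func featureEnabled(raw, name string) bool {
	for _, f := range strings.Split(raw, ",") {
		if strings.TrimSpace(f) == name {
			return true
		}
	}
	return false
}

// historicalStorage picks the historical storage path.
// The legacy TSDB path is the default; block-manager is opt-in.
func historicalStorage(raw string) string {
	if featureEnabled(raw, "enable_block_manager") {
		return "block-manager"
	}
	return "tsdb"
}

func main() {
	fmt.Println(historicalStorage(os.Getenv("PROMPP_FEATURES")))
}
```

Matching whole names rather than substrings keeps a differently named flag from enabling the new path by accident.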
Expose prometheus_tsdb_blocks_loaded_by_size to track loaded block size buckets after reload and help diagnose startup spikes caused by unexpected compaction output. Co-authored-by: Cursor <cursoragent@cursor.com>
Replace the size-based gauge with prometheus_tsdb_blocks_loaded_by_duration so legacy TSDB exposes the same block-duration view as block-manager during startup diagnostics. Co-authored-by: Cursor <cursoragent@cursor.com>
Add Grafana panels for loaded block layout and normalize loaded-block duration buckets to 5 minutes in both block-manager and legacy TSDB so duration heatmaps are stable and easy to read. Co-authored-by: Cursor <cursoragent@cursor.com>
Expose loaded-block duration labels in minutes with 1-minute rounding for both block-manager and legacy TSDB paths, and update Grafana panels to query duration_minutes for clearer heatmap grouping. Co-authored-by: Cursor <cursoragent@cursor.com>
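A possible shape for the minute-rounded label is shown below. It assumes block bounds are millisecond timestamps (`minT`, `maxT`); `durationMinutesLabel` is a hypothetical helper, not the function used in the PR.

```go
package main

import (
	"fmt"
	"strconv"
	"time"
)

// durationMinutesLabel converts a block's time span to a label value in
// whole minutes, rounded to the nearest minute.
// Assumption: minT and maxT are Unix timestamps in milliseconds.
func durationMinutesLabel(minT, maxT int64) string {
	d := time.Duration(maxT-minT) * time.Millisecond
	minutes := int64(d.Round(time.Minute) / time.Minute)
	return strconv.FormatInt(minutes, 10)
}

func main() {
	// A 2h block whose maxT is one millisecond short still lands on 120.
	fmt.Println(durationMinutesLabel(0, 7199999))
}
```

Rounding before labeling keeps blocks that differ by a few milliseconds in the same series, which bounds label cardinality and keeps heatmap rows stable.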
Mirror legacy tsdb "Found healthy block" output so operators can see the on-disk block layout when the block manager starts, including each block's normalized duration in minutes. Co-authored-by: Cursor <cursoragent@cursor.com>
Run reload (with deletion) and a single compaction pass sequentially in one goroutine, mirroring tsdb's compact/reload loop. This removes the race where the compactor's independent loop could plan/compact blocks that the manager's reload was concurrently deleting (open meta.json: no such file or directory) and the resulting repeated re-compaction of overlapping blocks (CPU/mmap churn). The compactor no longer runs its own goroutine, ticker or shared mutex: it exposes a one-shot Compact() called by Manager after each reload. Also render the compaction plan as a string in logs (go-kit cannot encode []string). Co-authored-by: Cursor <cursoragent@cursor.com>
When the C++ scrape parser rejects a buffer with invalid UTF-8 (in a HELP text or a label value), print the containing line, the buffer size and the line start offset to stdout. The Go-side buffer is mutated in place during parsing, so the offending bytes can only be inspected reliably here. Co-authored-by: Cursor <cursoragent@cursor.com>
After a successful compaction, immediately reload and compact again instead of waiting for the next ticker interval, so multiple pending compactions converge in one tick (mirroring tsdb's compact/reload loop). Compact now reports whether it did any work to drive the loop. Co-authored-by: Cursor <cursoragent@cursor.com>
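The sequential reload-then-compact cycle, repeated while compaction keeps doing work, can be sketched as follows. The `reloader` and `compactor` interfaces, `reloadAndCompact` and `fakeStore` are hypothetical; the sketch only assumes, as the commit message states, that Compact reports whether it did any work.

```go
package main

import "fmt"

// reloader and compactor are hypothetical interfaces for this sketch.
type reloader interface {
	Reload() error
}

type compactor interface {
	// Compact runs one compaction pass and reports whether it did any work.
	Compact() (bool, error)
}

// reloadAndCompact runs reload and compaction sequentially in the calling
// goroutine and repeats until a pass finds nothing to compact.
// It returns the number of passes that performed a compaction.
func reloadAndCompact(r reloader, c compactor) (int, error) {
	passes := 0
	for {
		if err := r.Reload(); err != nil {
			return passes, err
		}
		did, err := c.Compact()
		if err != nil {
			return passes, err
		}
		if !did {
			return passes, nil
		}
		passes++
	}
}

// fakeStore is a test double: each Compact call consumes one pending compaction.
type fakeStore struct {
	pending int
	reloads int
}

func (f *fakeStore) Reload() error { f.reloads++; return nil }

func (f *fakeStore) Compact() (bool, error) {
	if f.pending == 0 {
		return false, nil
	}
	f.pending--
	return true, nil
}

func main() {
	s := &fakeStore{pending: 3}
	n, err := reloadAndCompact(s, s)
	fmt.Println(n, s.reloads, err)
}
```

In a real manager this function would be invoked once per ticker tick. Because reload and compaction never overlap, a compaction cannot plan against blocks that a concurrent reload is deleting.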
Summary
PROMPP_FEATURES=enable_block_manager
Why
A stage rollout showed a sharp CPU and mmap spike after switching storage behavior, so we need a safer migration path and better visibility before enabling the new scheme by default.
Test plan
devcontainer exec --workspace-folder . --config .devcontainer/arm/devcontainer.json go test -tags stringlabels ./pp/go/storage/block/...
devcontainer exec --workspace-folder . --config .devcontainer/arm/devcontainer.json go test -tags stringlabels ./cmd/prometheus/...
Made with Cursor