Fix ingester panic on float histogram with zero count #7645
Merged
danielblando merged 3 commits on Jun 25, 2026
The ingester Push path chose the float vs integer native-histogram decoder by checking `hp.GetCountFloat() > 0`. A float histogram with a count of 0 (e.g. a staleness marker or an empty histogram) still has the CountFloat oneof set, so `IsFloatHistogram()` reports true while the value check is false. Such samples were routed to `HistogramProtoToHistogram`, which panics with "HistogramProtoToHistogram called with a float histogram", crashing the whole ingester process in the synchronous Push handler.

Select the decoder by the proto type via `IsFloatHistogram()`, matching the discriminator already used in `util/validation` and upstream Prometheus.

Add a regression test covering count-0 float histograms (staleness marker and empty histogram).

Signed-off-by: Ben Ye <benye@amazon.com>
…texproject#7645) Signed-off-by: Ben Ye <benye@amazon.com>
Signed-off-by: Ben Ye <benye@amazon.com>
danielblando approved these changes on Jun 25, 2026
What this PR does
The ingester `Push` path selected the float vs integer native-histogram decoder using a value check, `hp.GetCountFloat() > 0`.

A float histogram with a count of 0 (for example a staleness marker, `FloatHistogram{Sum: StaleNaN}`, or an empty histogram) still has the `CountFloat` oneof set, so `IsFloatHistogram()` reports `true` while `GetCountFloat() > 0` is `false`. Such samples were routed to `HistogramProtoToHistogram`, which panics with "HistogramProtoToHistogram called with a float histogram".

Because this runs synchronously in the gRPC `Push` handler with no `recover`, the panic crashes the whole ingester process. With many ingesters receiving the same input, this can break write/read quorum across a cell.

Fix
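The fix replaces a value check with a type check. A minimal sketch of why the two disagree for count-0 float histograms, using a simplified stand-in for the generated oneof types rather than the real protobuf-generated code. The names `CountInt`, `CountFloat`, `staleNaN`, and `describe` are assumptions made for illustration.

```go
package main

import (
	"fmt"
	"math"
)

// isHistogramCount models a generated oneof interface: exactly one of the
// wrapper types below is stored in Histogram.Count.
type isHistogramCount interface{ isHistogramCount() }

// CountInt and CountFloat are hypothetical wrappers for the two oneof members.
type CountInt struct{ Value uint64 }
type CountFloat struct{ Value float64 }

func (*CountInt) isHistogramCount()   {}
func (*CountFloat) isHistogramCount() {}

// Histogram is a simplified stand-in for the protobuf message.
type Histogram struct {
	Count isHistogramCount
	Sum   float64
}

// GetCountFloat returns the float count, or 0 when the oneof holds an
// integer count. A set-but-zero float count is indistinguishable here.
func (h *Histogram) GetCountFloat() float64 {
	if c, ok := h.Count.(*CountFloat); ok {
		return c.Value
	}
	return 0
}

// IsFloatHistogram reports which oneof member is set, regardless of value.
func (h *Histogram) IsFloatHistogram() bool {
	_, ok := h.Count.(*CountFloat)
	return ok
}

// staleNaN builds the NaN bit pattern Prometheus uses as a staleness marker.
func staleNaN() float64 { return math.Float64frombits(0x7ff0000000000002) }

// describe reports what each discriminator says about h.
func describe(name string, h *Histogram) string {
	byValue := h.GetCountFloat() > 0 // value check (pre-fix)
	byType := h.IsFloatHistogram()   // type check (post-fix)
	return fmt.Sprintf("%s: byValue=%t byType=%t", name, byValue, byType)
}

func main() {
	cases := []struct {
		name string
		h    *Histogram
	}{
		{"staleness marker", &Histogram{Count: &CountFloat{Value: 0}, Sum: staleNaN()}},
		{"empty float", &Histogram{Count: &CountFloat{Value: 0}}},
		{"populated float", &Histogram{Count: &CountFloat{Value: 3}, Sum: 1.5}},
		{"integer", &Histogram{Count: &CountInt{Value: 3}, Sum: 1.5}},
	}
	for _, c := range cases {
		fmt.Println(describe(c.name, c.h))
	}
}
```

In this model the two checks agree for the populated float and integer cases and disagree only for the two count-0 float cases, which are the inputs that reached the integer decoder.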
Select the decoder by the proto type via `IsFloatHistogram()`, matching the discriminator already used in `pkg/util/validation` and upstream Prometheus.

Testing
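The behavior a regression test for this bug needs to pin down can be modeled outside the ingester. The sketch below is not the test added in this PR. It uses hypothetical stand-ins (`decodeInt`, `decodeFloat`, `routeByValue`, `routeByType`, `try`) to show a decoder that panics on float input, and it assumes only that the integer decoder rejects float histograms by panicking.

```go
package main

import "fmt"

// Simplified stand-ins for the generated oneof types (assumed names).
type isHistogramCount interface{ isHistogramCount() }

type CountInt struct{ Value uint64 }
type CountFloat struct{ Value float64 }

func (*CountInt) isHistogramCount()   {}
func (*CountFloat) isHistogramCount() {}

type Histogram struct{ Count isHistogramCount }

func (h *Histogram) GetCountFloat() float64 {
	if c, ok := h.Count.(*CountFloat); ok {
		return c.Value
	}
	return 0
}

func (h *Histogram) IsFloatHistogram() bool {
	_, ok := h.Count.(*CountFloat)
	return ok
}

// decodeInt stands in for the integer decoder, which rejects float
// histograms by panicking.
func decodeInt(h *Histogram) string {
	if h.IsFloatHistogram() {
		panic("integer decoder called with a float histogram")
	}
	return "int"
}

// decodeFloat stands in for the float decoder.
func decodeFloat(h *Histogram) string { return "float" }

// routeByValue models the pre-fix selection: a value check on the count.
func routeByValue(h *Histogram) string {
	if h.GetCountFloat() > 0 {
		return decodeFloat(h)
	}
	return decodeInt(h)
}

// routeByType models the post-fix selection: a check on the oneof member.
func routeByType(h *Histogram) string {
	if h.IsFloatHistogram() {
		return decodeFloat(h)
	}
	return decodeInt(h)
}

// try runs route and converts a panic into an error so that both outcomes
// can be observed without terminating the program.
func try(route func(*Histogram) string, h *Histogram) (out string, err error) {
	defer func() {
		if r := recover(); r != nil {
			err = fmt.Errorf("panic: %v", r)
		}
	}()
	return route(h), nil
}

func main() {
	emptyFloat := &Histogram{Count: &CountFloat{Value: 0}}

	_, err := try(routeByValue, emptyFloat)
	fmt.Println("by value:", err)

	out, err := try(routeByType, emptyFloat)
	fmt.Println("by type:", out, err)
}
```

The `recover` in `try` exists only so the sketch can report the panic. It is not part of the fix, which avoids the panic by routing on the oneof type.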
Added `TestIngester_Push_FloatHistogramWithZeroCount` covering a staleness-marker float histogram and an empty float histogram (both count 0). Verified the test panics on master without the fix and passes with it. Full `pkg/ingester` suite passes with `-tags "netgo slicelabels"`.

Which issue(s) this PR fixes
N/A
Checklist
- `CHANGELOG.md` updated (will add with PR number)
- `docs/configuration/config-file-reference.md` updated (regenerated; no change, since no config flags were added)