Fix ingester panic on float histogram with zero count #7645

Merged
danielblando merged 3 commits into cortexproject:master from yeya24:fix/ingester-float-histogram-panic
Jun 25, 2026
@yeya24 yeya24 commented Jun 25, 2026

Contributor

What this PR does

The ingester Push path selected the float vs integer native-histogram decoder using a value check, hp.GetCountFloat() > 0:

if hp.GetCountFloat() > 0 {
    fh = cortexpb.FloatHistogramProtoToFloatHistogram(hp.Histogram)
} else {
    h = cortexpb.HistogramProtoToHistogram(hp.Histogram)
}

A float histogram with a count of 0 (for example a staleness marker — FloatHistogram{Sum: StaleNaN} — or an empty histogram) still has the CountFloat oneof set, so IsFloatHistogram() reports true while GetCountFloat() > 0 is false. Such samples were routed to HistogramProtoToHistogram, which panics:

panic: HistogramProtoToHistogram called with a float histogram

Because this runs synchronously in the gRPC Push handler with no recover, the panic crashes the whole ingester process. With many ingesters receiving the same input this can break write/read quorum across a cell.

Fix

Select the decoder by the proto type via IsFloatHistogram(), matching the discriminator already used in pkg/util/validation and upstream Prometheus.
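
To make the difference between the two discriminators concrete, here is a minimal, self-contained sketch. It does not use the real `cortexpb` types: `histogram`, `countInt`, `countFloat`, `decoderByValue`, and `decoderByType` are hypothetical stand-ins that only model a oneof count field, under the assumption that `GetCountFloat()` returns the zero value when the float variant is unset.

```go
package main

import "fmt"

// isCount models a protobuf oneof: exactly one variant is set.
// These types are illustrative stand-ins, not the generated cortexpb code.
type isCount interface{ isCount() }

type countInt struct{ v uint64 }
type countFloat struct{ v float64 }

func (countInt) isCount()   {}
func (countFloat) isCount() {}

type histogram struct{ count isCount }

// GetCountFloat returns the float count, or 0 when the float variant is unset.
func (h histogram) GetCountFloat() float64 {
	if c, ok := h.count.(countFloat); ok {
		return c.v
	}
	return 0
}

// IsFloatHistogram reports which oneof variant is set, regardless of its value.
func (h histogram) IsFloatHistogram() bool {
	_, ok := h.count.(countFloat)
	return ok
}

// decoderByValue mirrors the old check: a float histogram with count 0
// is indistinguishable from an integer histogram.
func decoderByValue(h histogram) string {
	if h.GetCountFloat() > 0 {
		return "float"
	}
	return "integer"
}

// decoderByType mirrors the fixed check: the set variant decides.
func decoderByType(h histogram) string {
	if h.IsFloatHistogram() {
		return "float"
	}
	return "integer"
}

func main() {
	// A float histogram whose count is 0, e.g. a staleness marker.
	zeroFloat := histogram{count: countFloat{v: 0}}
	fmt.Printf("by value: %s, by type: %s\n",
		decoderByValue(zeroFloat), decoderByType(zeroFloat))
	// prints "by value: integer, by type: float"
}
```

Both selectors agree for integer histograms and for float histograms with a positive count; they only diverge for the count-0 float case this PR targets.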

Testing

Added TestIngester_Push_FloatHistogramWithZeroCount covering a staleness-marker float histogram and an empty float histogram (both count 0). Verified the test panics on master without the fix and passes with it. Full pkg/ingester suite passes with -tags "netgo slicelabels".

Which issue(s) this PR fixes

N/A

Checklist

  • Tests updated
  • CHANGELOG.md updated - (will add with PR number)
  • docs/configuration/config-file-reference.md updated (regenerated; no change — no config flags added)

The ingester Push path chose the float vs integer native-histogram
decoder by checking `hp.GetCountFloat() > 0`. A float histogram with a
count of 0 (e.g. a staleness marker or an empty histogram) still has the
CountFloat oneof set, so `IsFloatHistogram()` reports true while the
value check is false. Such samples were routed to
`HistogramProtoToHistogram`, which panics with "HistogramProtoToHistogram
called with a float histogram", crashing the whole ingester process in
the synchronous Push handler.

Select the decoder by the proto type via `IsFloatHistogram()`, matching
the discriminator already used in `util/validation` and upstream
Prometheus. Add a regression test covering count-0 float histograms
(staleness marker and empty histogram).

Signed-off-by: Ben Ye <benye@amazon.com>

@SungJin1212 SungJin1212 left a comment

Member

lgtm

@dosubot dosubot (bot) added the lgtm label (this PR has been approved by a maintainer) Jun 25, 2026
@danielblando danielblando merged commit c5c2de0 into cortexproject:master Jun 25, 2026
37 checks passed
@yeya24 yeya24 deleted the fix/ingester-float-histogram-panic branch June 25, 2026 15:56
Labels

component/ingester, lgtm, size/M, type/bug
