In orgs with high overall metrics ingest (~10 TB/month), queries fall back to down-sampled storage tiers (1/8, 1/16, 1/64). The down-sampling decision is scoped at the project/dataset level, so low-volume but business-critical series lose accuracy simply because they share a project with high-volume series. At least two orgs have surfaced this independently; the affected volume tier is reachable by dozens of customers as the metrics product grows.
Observed symptoms:
- Dashboard charts grouped by stable, low-cardinality attributes (e.g. route name) return visibly inaccurate values
- Low-volume series are unreliable at down-sampled tiers even though their own data volume is tiny
- Higher-cardinality groupings are only accurate over short time ranges that still hit the 1:1 tier
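Why low-volume series suffer most can be sketched with a short calculation. This assumes the down-sampled tiers are uniform random samples whose counts are scaled back up by the inverse sampling rate; the source does not confirm that mechanism. The function name and event counts below are hypothetical, chosen only for illustration.

```python
import math


def relative_standard_error(n_events: int, rate: float) -> float:
    """Relative standard error of a scaled-up count under uniform sampling.

    Assumption (not confirmed by the source): each event is kept
    independently with probability `rate`, and the stored count is
    multiplied by 1 / rate at query time. The kept count is then
    Binomial(n_events, rate), which gives sqrt((1 - rate) / (rate * n_events)).
    """
    if n_events <= 0 or not 0 < rate <= 1:
        raise ValueError("n_events must be positive and rate in (0, 1]")
    return math.sqrt((1 - rate) / (rate * n_events))


# Hypothetical per-bucket event counts for a small and a large series.
low_volume = 200
high_volume = 10_000_000

for rate in (1 / 8, 1 / 16, 1 / 64):
    low = relative_standard_error(low_volume, rate)
    high = relative_standard_error(high_volume, rate)
    print(f"rate 1/{round(1 / rate)}: low-volume {low:.1%}, high-volume {high:.3%}")
```

Under these assumptions, a 200-event bucket read at the 1/64 tier carries a relative error of roughly 56%, while a 10-million-event bucket at the same tier stays near 0.25%. The tier is the same; only the series' own volume differs.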
Why the current workaround doesn't scale:
The only available mitigation is splitting high- and low-volume metrics into separate projects via SDK transport config. This is fragile, customer-maintained, and abuses project as a de facto sharding key.
Proposed approaches:
- Make metric series name a first-class index/sort-key in the storage layer (preferred). Series names are low-cardinality and stable; queries almost always filter on a single series name. Treating series name like project or environment as a storage scoping key would let low-volume series be served at full fidelity regardless of org-level volume.
- Per-series down-sampling decisions. Make the tier selection sensitive to the volume of the queried series, not the aggregate project volume.
- Materialized views / pre-aggregations for known dashboard queries — less general, listed for completeness.
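The difference between project-scoped and series-scoped tier selection can be sketched as follows. This is a minimal illustration, not the storage layer's actual logic: the `pick_tier` function, the row budget, and the row estimates are all hypothetical, and only the tier rates (1:1, 1/8, 1/16, 1/64) come from the description above.

```python
# Sampling rates named in the text, ordered from full fidelity down to 1/64.
TIERS = (1.0, 1 / 8, 1 / 16, 1 / 64)


def pick_tier(estimated_rows: int, row_budget: int) -> float:
    """Return the highest-fidelity tier whose expected scan fits the budget.

    Hypothetical rule: a tier is acceptable when the number of rows it
    would scan (estimated_rows * rate) does not exceed row_budget.
    Falls back to the coarsest tier if nothing fits.
    """
    for rate in TIERS:
        if estimated_rows * rate <= row_budget:
            return rate
    return TIERS[-1]


# Hypothetical numbers for one query window.
row_budget = 100_000_000
project_rows = 5_000_000_000  # all series in the project combined
series_rows = 20_000          # only the series the query filters on

# Project-scoped decision: the small series inherits the project's tier.
print(pick_tier(project_rows, row_budget))  # 0.015625, i.e. the 1/64 tier

# Series-scoped decision: the same query is served at full fidelity.
print(pick_tier(series_rows, row_budget))   # 1.0
```

The rule itself is unchanged between the two calls; only the volume estimate fed into it differs. That is the core of the per-series proposal, and it depends on the storage layer being able to estimate per-series row counts cheaply, which ties into the schema questions below.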
Open questions for the owning team:
- Schema viability (index/projection on series name) and ingest-time maintenance cost
- Whether existing data could be backfilled or only new ingest would benefit
- Confirmation that the dominant query pattern always filters on a single series name