Conversation
```go
	if err != nil {
		m.RecordDatabaseQueryError("count_refresh_tokens")
		gaugeErrorLogger.logIfNeeded("count_refresh_tokens", err)
	} else {
```
Check failure: Code scanning / CodeQL
Incorrect conversion between integer types (High)
Copilot Autofix (AI, 2 days ago)
In general, to fix this class of problems you should avoid converting from a wider integer type (here int64) to a narrower type (here potentially 32‑bit int) without enforcing bounds that guarantee no overflow. The cleanest approach is to keep the value in the wide type all the way through, or to change the API to accept the wider type, rather than down‑casting.
In this codebase, the best minimal fix is to avoid the int(...) casts when recording metrics and pass the int64 counts directly to the metrics recorder, assuming its interface accepts or can be changed to accept int64. However, the snippet shows only the call sites, not the MetricsRecorder interface. Since we are constrained to edit only the provided snippets and must not change other files, we cannot safely change the metrics interface signature. Therefore, the next best fix that preserves existing functionality is to explicitly clamp the int64 values into the valid range of int before casting. On platforms where int is 64‑bit, this clamp will be a no‑op; on 32‑bit platforms, it prevents overflow by capping large counts at math.MaxInt (and similarly handling negative values if they ever occurred).
Concretely, in main.go we will import math and replace each int(...) cast on values that originated as int64 (activeAccessTokens, activeRefreshTokens, totalDeviceCodes, pendingDeviceCodes) with calls to a small helper safeIntFromInt64 that clamps to the int range. This helper will live in main.go below the imports so we don’t modify any other packages. We will then use safeIntFromInt64 in both updateGaugeMetrics and updateGaugeMetricsWithCache when calling m.SetActiveTokensCount and m.SetActiveDeviceCodesCount. This preserves behavior on 64‑bit systems and removes overflow risk on 32‑bit systems, addressing all three CodeQL variants because all of them trace back to these int(...) conversions.
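A minimal sketch of the clamping helper described above, assuming it lives in main.go as a fragment alongside the existing code; the name `safeIntFromInt64` comes from this autofix description, and everything else is illustrative rather than the PR's exact code:

```go
package main

import "math"

// safeIntFromInt64 clamps v into the valid range of int before converting,
// so the narrowing conversion cannot overflow on 32-bit platforms.
// On 64-bit platforms, where int is 64 bits wide, the clamp is a no-op.
func safeIntFromInt64(v int64) int {
	if v > int64(math.MaxInt) {
		return math.MaxInt
	}
	if v < int64(math.MinInt) {
		return math.MinInt
	}
	return int(v)
}
```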
main.go (Outdated)
Check failure: Code scanning / CodeQL
Incorrect conversion between integer types (High)
Copilot Autofix (AI, 2 days ago)
In general, to fix this type of issue you should avoid narrowing conversions from a larger integer type (here int64) to a smaller one (here potentially 32‑bit int) without bounds checking, or you should keep using the wider type end‑to‑end. Since the MetricsRecorder interface uses int parameters, the simplest safe change is to clamp the int64 values into the valid int range before converting, using math.MinInt/math.MaxInt (which reflect the platform’s int size).
Concretely, the problematic conversions are in `main.go` in `updateGaugeMetricsWithCache`:

- `int(activeAccessTokens)` on line 928
- `int(activeRefreshTokens)` on line 937
- `int(totalDeviceCodes)` and `int(pendingDeviceCodes)` on line 955
We should:

- Import the `math` package in `main.go` (no new external dependency).
- Introduce a small helper function in `main.go` that takes an `int64` and returns an `int` by clamping to the platform's `int` range, e.g.:

  ```go
  func clampInt64ToInt(v int64) int {
  	if v > int64(math.MaxInt) {
  		return math.MaxInt
  	}
  	if v < int64(math.MinInt) {
  		return math.MinInt
  	}
  	return int(v)
  }
  ```

- Replace all direct `int(...)` casts on the metric counts in `updateGaugeMetricsWithCache` with calls to this helper:
  - `m.SetActiveTokensCount("access", clampInt64ToInt(activeAccessTokens))`
  - `m.SetActiveTokensCount("refresh", clampInt64ToInt(activeRefreshTokens))`
  - `m.SetActiveDeviceCodesCount(clampInt64ToInt(totalDeviceCodes), clampInt64ToInt(pendingDeviceCodes))`
This preserves existing functionality for values within the valid int range and prevents overflow for out‑of‑range values by saturating at the bounds.
No changes are needed in internal/cache/rueidis.go or internal/cache/rueidis_aside.go, because they already operate purely on int64. The only narrowing occurs in main.go.
```diff
@@ -33,6 +33,7 @@
 	"fmt"
 	"io/fs"
 	"log"
+	"math"
 	"net/http"
 	"os"
 	"time"
@@ -913,6 +914,16 @@
 // updateGaugeMetricsWithCache updates gauge metrics using a cache-backed store.
 // This reduces database load in multi-instance deployments by caching query results.
 // The cache TTL should match the update interval to ensure consistent behavior.
+func clampInt64ToInt(v int64) int {
+	if v > int64(math.MaxInt) {
+		return math.MaxInt
+	}
+	if v < int64(math.MinInt) {
+		return math.MinInt
+	}
+	return int(v)
+}
+
 func updateGaugeMetricsWithCache(
 	ctx context.Context,
 	cacheWrapper *metrics.MetricsCacheWrapper,
@@ -925,7 +936,7 @@
 		m.RecordDatabaseQueryError("count_access_tokens")
 		gaugeErrorLogger.logIfNeeded("count_access_tokens", err)
 	} else {
-		m.SetActiveTokensCount("access", int(activeAccessTokens))
+		m.SetActiveTokensCount("access", clampInt64ToInt(activeAccessTokens))
 	}

 	// Update active refresh tokens count
@@ -934,7 +945,7 @@
 		m.RecordDatabaseQueryError("count_refresh_tokens")
 		gaugeErrorLogger.logIfNeeded("count_refresh_tokens", err)
 	} else {
-		m.SetActiveTokensCount("refresh", int(activeRefreshTokens))
+		m.SetActiveTokensCount("refresh", clampInt64ToInt(activeRefreshTokens))
 	}

 	// Update active device codes count
@@ -952,5 +963,5 @@
 		pendingDeviceCodes = 0
 	}

-	m.SetActiveDeviceCodesCount(int(totalDeviceCodes), int(pendingDeviceCodes))
+	m.SetActiveDeviceCodesCount(clampInt64ToInt(totalDeviceCodes), clampInt64ToInt(pendingDeviceCodes))
 }
```
Pull request overview
Adds configurable caching to reduce database load from Prometheus gauge metric updates, supporting single- and multi-replica deployments with memory/Redis/rueidisaside backends and configurable update intervals.
Changes:
- Introduces a generic cache interface plus memory, Redis (rueidis), and Redis-aside (rueidisaside) implementations.
- Adds a cache-backed metrics wrapper and integrates it into the gauge update loop with configurable interval/enablement (a rough sketch of the wrapper follows this list).
- Extends configuration, tests, and documentation for the new metrics cache and gauge update behavior.
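To make the wrapper item above more concrete, here is a hypothetical sketch of the read-through (cache-aside) count query it would perform. Only the `MetricsCacheWrapper` name appears in the PR; `countStore`, `cacheBackend`, `CountTokens`, and the key format are assumptions for illustration, and the real method set in internal/metrics/cache.go may differ:

```go
package metrics

import (
	"context"
	"fmt"
	"time"
)

// countStore and cacheBackend are illustrative stand-ins for the PR's real
// database store and internal/cache interface.
type countStore interface {
	CountTokens(ctx context.Context, category string) (int64, error)
}

type cacheBackend interface {
	Get(ctx context.Context, key string) (int64, error)
	Set(ctx context.Context, key string, value int64, ttl time.Duration) error
}

// MetricsCacheWrapper answers count queries from the cache first and falls
// back to the database on a miss, writing the fresh value back with a TTL.
type MetricsCacheWrapper struct {
	store countStore
	cache cacheBackend
	ttl   time.Duration
}

// CountTokens is the read-through path used by the gauge update loop.
func (w *MetricsCacheWrapper) CountTokens(ctx context.Context, category string) (int64, error) {
	key := fmt.Sprintf("tokens:%s", category)
	if v, err := w.cache.Get(ctx, key); err == nil {
		return v, nil // cache hit: no database query
	}
	v, err := w.store.CountTokens(ctx, category)
	if err != nil {
		return 0, err
	}
	// Best effort: a failed cache write should not fail the metrics update.
	_ = w.cache.Set(ctx, key, v, w.ttl)
	return v, nil
}
```

On a cache hit the database is never queried, which is how the change reduces load when gauges are refreshed on every update interval.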
Reviewed changes
Copilot reviewed 14 out of 15 changed files in this pull request and generated 9 comments.
| File | Description |
|---|---|
| main.go | Initializes selected metrics cache backend, uses cache-backed gauge updates, and closes cache on shutdown |
| internal/metrics/cache.go | Adds read-through caching wrapper for DB-backed gauge metrics queries |
| internal/metrics/cache_test.go | Tests cache wrapper behavior (hit/miss/expiry/partial hits and aside support) |
| internal/cache/interface.go | Defines cache interface used by metrics caching |
| internal/cache/errors.go | Defines cache error types (miss/unavailable/invalid) |
| internal/cache/memory.go | Adds in-memory cache backend implementation |
| internal/cache/rueidis.go | Adds Redis cache backend using rueidis |
| internal/cache/rueidis_aside.go | Adds Redis-aside backend using rueidisaside with client-side caching |
| internal/config/config.go | Adds metrics cache + gauge update configuration and validation |
| internal/config/config_test.go | Adds validation tests for metrics cache configuration |
| .env.example | Documents new metrics gauge update + cache environment variables |
| CLAUDE.md | Documents multi-replica metrics guidance and new env vars |
| go.mod / go.sum | Adds rueidis + rueidisaside dependencies (and updated sums) |
```bash
# Gauge Metrics Update Configuration
# METRICS_GAUGE_UPDATE_ENABLED=true # Enable gauge metric updates (default: true)
# # In multi-replica deployments, enable on only ONE instance or use max()/avg() aggregation
# METRICS_GAUGE_UPDATE_INTERVAL=5m # Update interval for gauge metrics (default: 5m, reduced from 30s)
```
The comment says the 5m default interval is "reduced from 30s", but this change increases the interval. Please update the wording to avoid misleading operators.
```diff
-# METRICS_GAUGE_UPDATE_INTERVAL=5m # Update interval for gauge metrics (default: 5m, reduced from 30s)
+# METRICS_GAUGE_UPDATE_INTERVAL=5m # Update interval for gauge metrics (default: 5m, increased from 30s)
```
| **METRICS_ENABLED** | false | Enable Prometheus metrics endpoint |
| METRICS_TOKEN | (empty) | Bearer token for /metrics endpoint (empty = no auth) |
| METRICS_GAUGE_UPDATE_ENABLED | true | Enable gauge updates (disable on all but one replica) |
| METRICS_GAUGE_UPDATE_INTERVAL | 5m | Gauge update interval (default: 5m, reduced from 30s) |
The environment variable table describes METRICS_GAUGE_UPDATE_INTERVAL as "reduced from 30s" even though the default moved from 30s to 5m (an increase). Please correct this wording for accuracy.
```diff
-| METRICS_GAUGE_UPDATE_INTERVAL | 5m | Gauge update interval (default: 5m, reduced from 30s) |
+| METRICS_GAUGE_UPDATE_INTERVAL | 5m | Gauge update interval (default: 5m, increased from 30s) |
```
internal/metrics/cache.go (Outdated)
The wrapper keys already include a "metrics:" prefix (e.g., "metrics:tokens:..."). Since main.go also configures Redis caches with keyPrefix "metrics:", Redis-backed caches will end up storing keys like "metrics:metrics:tokens:...". Consider either removing the hardcoded "metrics:" prefix from wrapper keys or passing an empty keyPrefix to the cache backends so key naming is consistent and not duplicated.
```diff
-	key := fmt.Sprintf("metrics:tokens:%s", category)
+	key := fmt.Sprintf("tokens:%s", category)
```
```go
	case config.MetricsCacheTypeRedisAside:
		metricsCache, err = cache.NewRueidisAsideCache(
			cfg.RedisAddr,
			cfg.RedisPassword,
			cfg.RedisDB,
			"metrics:",
			cfg.MetricsCacheClientTTL,
		)
		if err != nil {
			log.Fatalf("Failed to initialize redis-aside metrics cache: %v", err)
		}
		log.Printf(
			"Metrics cache: redis-aside (addr=%s, db=%d, client_ttl=%s)",
			cfg.RedisAddr,
			cfg.RedisDB,
			cfg.MetricsCacheClientTTL,
		)
	case config.MetricsCacheTypeRedis:
		metricsCache, err = cache.NewRueidisCache(
			cfg.RedisAddr,
			cfg.RedisPassword,
			cfg.RedisDB,
			"metrics:",
		)
```
Both Redis cache constructors are passed keyPrefix "metrics:", but the MetricsCacheWrapper already prefixes all keys with "metrics:". This results in duplicated Redis keys ("metrics:metrics:..."). Consider passing an empty prefix here (and letting the wrapper own the namespace) or changing the wrapper to use un-prefixed keys and keep the prefix solely in the cache backend.
| name: "redis cache type without redis address (allowed)", | ||
| config: &Config{ | ||
| RateLimitStore: RateLimitStoreMemory, | ||
| MetricsCacheType: MetricsCacheTypeRedis, | ||
| RedisAddr: "", | ||
| }, | ||
| expectError: false, |
This test case asserts that METRICS_CACHE_TYPE="redis" is valid when RedisAddr is empty, but runServer will still try to initialize a rueidis client and likely fail/fatal. If RedisAddr is required for redis metrics caching, update this test to expect a validation error and ensure the error message is clear.
| name: "redis cache type without redis address (allowed)", | |
| config: &Config{ | |
| RateLimitStore: RateLimitStoreMemory, | |
| MetricsCacheType: MetricsCacheTypeRedis, | |
| RedisAddr: "", | |
| }, | |
| expectError: false, | |
| name: "redis cache type without redis address", | |
| config: &Config{ | |
| RateLimitStore: RateLimitStoreMemory, | |
| MetricsCacheType: MetricsCacheTypeRedis, | |
| RedisAddr: "", | |
| }, | |
| expectError: true, | |
| errorMsg: `METRICS_CACHE_TYPE="redis" requires REDIS_ADDR`, |
```go
			DisableCache: false, // Enable client-side caching
			// Client-side cache configuration
			CacheSizeEachConn: 128 * 1024 * 1024, // 128MB per connection
		},
	})
```
CacheSizeEachConn is hard-coded to 128MB per connection. This can materially increase memory usage (especially with multiple connections) and may not be appropriate for all deployments. Consider making this configurable and/or documenting the expected memory impact, or choosing a more conservative default.
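As one possible direction (purely illustrative; the variable name METRICS_CACHE_CLIENT_SIZE_BYTES and the helper below are not part of this PR), the per-connection cache size could be read from the environment with a conservative fallback:

```go
package main

import (
	"os"
	"strconv"
)

// defaultCacheSizeEachConn mirrors the hard-coded value in the snippet above.
const defaultCacheSizeEachConn = 128 * 1024 * 1024 // 128MB

// clientCacheSizeFromEnv returns the per-connection client-side cache size,
// falling back to the current default when the variable is unset or invalid.
func clientCacheSizeFromEnv() int {
	raw := os.Getenv("METRICS_CACHE_CLIENT_SIZE_BYTES")
	if raw == "" {
		return defaultCacheSizeEachConn
	}
	n, err := strconv.Atoi(raw)
	if err != nil || n <= 0 {
		return defaultCacheSizeEachConn // ignore malformed or non-positive values
	}
	return n
}
```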
```go
			MetricsCacheTypeRedisAside,
		)
	}
```
METRICS_GAUGE_UPDATE_INTERVAL is used to create a time.Ticker in main.go; if it's set to 0 or a negative duration, time.NewTicker will panic. Consider adding validation (e.g., require >0 when METRICS_ENABLED && METRICS_GAUGE_UPDATE_ENABLED) to prevent a configuration that can crash the process.
```go
	// Validate metrics gauge update interval to prevent time.NewTicker panics
	if c.MetricsEnabled && c.MetricsGaugeUpdateEnabled && c.MetricsGaugeUpdateInterval <= 0 {
		return fmt.Errorf(
			"invalid METRICS_GAUGE_UPDATE_INTERVAL value: %s (must be > 0 when METRICS_ENABLED and METRICS_GAUGE_UPDATE_ENABLED are true)",
			c.MetricsGaugeUpdateInterval,
		)
	}
```
- Add comprehensive configuration options and documentation for gauge metrics updates and metrics cache, including support for memory, redis, and redis-aside cache types
- Introduce the rueidis and rueidisaside libraries for specialized Redis caching strategies
- Implement cache interfaces and concrete cache backends (memory, redis, redis-aside), including test coverage for all behaviors
- Add environment variable support and validation for metrics cache type and TTL
- Create a metrics cache wrapper that gives unified cache-aside support for database count queries (tokens, device codes)
- Switch gauge metrics update to use the cache wrapper for count queries, reducing database load in production, and allowing cache-backed metrics
- Set the default gauge update interval to 5 minutes for improved database performance; update gauge job logic and interval management accordingly
- Add tests for metrics cache validation, cache interface, and cache-assisted metrics updating
- Extend the SQLite store with count methods for device codes and pending codes
- Ensure all cache resources are cleaned up gracefully during shutdown
- Correct minor doc/formatting issues in audit feature description

Signed-off-by: appleboy <appleboy.tw@gmail.com>