feat: add unified metrics caching with configurable backends#33

Closed
appleboy wants to merge 1 commit into main from cache

Conversation


@appleboy appleboy commented Feb 17, 2026

  • Add comprehensive configuration options and documentation for gauge metrics updates and metrics cache, including support for memory, redis, and redis-aside cache types
  • Introduce the rueidis and rueidisaside libraries for specialized Redis caching strategies
  • Implement cache interfaces and concrete cache backends (memory, redis, redis-aside), including test coverage for all behaviors
  • Add environment variable support and validation for metrics cache type and TTL
  • Create a metrics cache wrapper that provides unified cache-aside support for database count queries (tokens, device codes)
  • Switch gauge metrics updates to use the cache wrapper for count queries, reducing database load in production and enabling cache-backed metrics
  • Set the default gauge update interval to 5 minutes for improved database performance; update gauge job logic and interval management accordingly
  • Add tests for metrics cache validation, cache interface, and cache-assisted metrics updating
  • Extend the SQLite store with count methods for device codes and pending codes
  • Ensure all cache resources are cleaned up gracefully during shutdown
  • Correct minor doc/formatting issues in audit feature description

Copilot AI review requested due to automatic review settings February 17, 2026 08:22
if err != nil {
	m.RecordDatabaseQueryError("count_refresh_tokens")
	gaugeErrorLogger.logIfNeeded("count_refresh_tokens", err)
} else {

Check failure

Code scanning / CodeQL

Incorrect conversion between integer types (High)

Incorrect conversion of a signed 64-bit integer from strconv.ParseInt to a lower bit size type int without an upper bound check. (Reported at three call sites.)

Copilot Autofix

AI 2 days ago

In general, to fix this class of problems you should avoid converting from a wider integer type (here int64) to a narrower type (here potentially 32‑bit int) without enforcing bounds that guarantee no overflow. The cleanest approach is to keep the value in the wide type all the way through, or to change the API to accept the wider type, rather than down‑casting.

In this codebase, the best minimal fix is to avoid the int(...) casts when recording metrics and pass the int64 counts directly to the metrics recorder, assuming its interface accepts or can be changed to accept int64. However, the snippet shows only the call sites, not the MetricsRecorder interface. Since we are constrained to edit only the provided snippets and must not change other files, we cannot safely change the metrics interface signature. Therefore, the next best fix that preserves existing functionality is to explicitly clamp the int64 values into the valid range of int before casting. On platforms where int is 64‑bit, this clamp will be a no‑op; on 32‑bit platforms, it prevents overflow by capping large counts at math.MaxInt (and similarly handling negative values if they ever occurred).

Concretely, in main.go we will import math and replace each int(...) cast on values that originated as int64 (activeAccessTokens, activeRefreshTokens, totalDeviceCodes, pendingDeviceCodes) with calls to a small helper safeIntFromInt64 that clamps to the int range. This helper will live in main.go below the imports so we don’t modify any other packages. We will then use safeIntFromInt64 in both updateGaugeMetrics and updateGaugeMetricsWithCache when calling m.SetActiveTokensCount and m.SetActiveDeviceCodesCount. This preserves behavior on 64‑bit systems and removes overflow risk on 32‑bit systems, addressing all three CodeQL variants because all of them trace back to these int(...) conversions.

Suggested changeset 1
main.go

Autofix patch
Run the following command in your local git repository to apply this patch
cat << 'EOF' | git apply
diff --git a/main.go b/main.go
--- a/main.go
+++ b/main.go
@@ -33,6 +33,7 @@
 	"fmt"
 	"io/fs"
 	"log"
+	"math"
 	"net/http"
 	"os"
 	"time"
@@ -63,6 +64,17 @@
 	_ "github.com/appleboy/authgate/api" // swagger docs
 )
 
+// safeIntFromInt64 safely converts an int64 to int, clamping to the valid int range.
+func safeIntFromInt64(v int64) int {
+	if v > int64(math.MaxInt) {
+		return math.MaxInt
+	}
+	if v < int64(math.MinInt) {
+		return math.MinInt
+	}
+	return int(v)
+}
+
 //go:embed internal/templates/*
 var templatesFS embed.FS
 
@@ -888,7 +900,7 @@
 		m.RecordDatabaseQueryError("count_access_tokens")
 		gaugeErrorLogger.logIfNeeded("count_access_tokens", err)
 	} else {
-		m.SetActiveTokensCount("access", int(activeAccessTokens))
+		m.SetActiveTokensCount("access", safeIntFromInt64(activeAccessTokens))
 	}
 
 	// Update active refresh tokens count
@@ -897,7 +909,7 @@
 		m.RecordDatabaseQueryError("count_refresh_tokens")
 		gaugeErrorLogger.logIfNeeded("count_refresh_tokens", err)
 	} else {
-		m.SetActiveTokensCount("refresh", int(activeRefreshTokens))
+		m.SetActiveTokensCount("refresh", safeIntFromInt64(activeRefreshTokens))
 	}
 
 	// Update active device codes count
@@ -906,7 +918,7 @@
 		m.RecordDatabaseQueryError("count_device_codes")
 		gaugeErrorLogger.logIfNeeded("count_device_codes", err)
 	} else {
-		m.SetActiveDeviceCodesCount(int(totalDeviceCodes), int(pendingDeviceCodes))
+		m.SetActiveDeviceCodesCount(safeIntFromInt64(totalDeviceCodes), safeIntFromInt64(pendingDeviceCodes))
 	}
 }
 
@@ -925,7 +937,7 @@
 		m.RecordDatabaseQueryError("count_access_tokens")
 		gaugeErrorLogger.logIfNeeded("count_access_tokens", err)
 	} else {
-		m.SetActiveTokensCount("access", int(activeAccessTokens))
+		m.SetActiveTokensCount("access", safeIntFromInt64(activeAccessTokens))
 	}
 
 	// Update active refresh tokens count
@@ -934,7 +946,7 @@
 		m.RecordDatabaseQueryError("count_refresh_tokens")
 		gaugeErrorLogger.logIfNeeded("count_refresh_tokens", err)
 	} else {
-		m.SetActiveTokensCount("refresh", int(activeRefreshTokens))
+		m.SetActiveTokensCount("refresh", safeIntFromInt64(activeRefreshTokens))
 	}
 
 	// Update active device codes count
@@ -952,5 +964,5 @@
 		pendingDeviceCodes = 0
 	}
 
-	m.SetActiveDeviceCodesCount(int(totalDeviceCodes), int(pendingDeviceCodes))
+	m.SetActiveDeviceCodesCount(safeIntFromInt64(totalDeviceCodes), safeIntFromInt64(pendingDeviceCodes))
 }
EOF
Copilot is powered by AI and may make mistakes. Always verify output.
main.go Outdated

Check failure

Code scanning / CodeQL

Incorrect conversion between integer types (High)

Incorrect conversion of a signed 64-bit integer from strconv.ParseInt to a lower bit size type int without an upper bound check. (Reported at three call sites.)

Copilot Autofix

AI 2 days ago

In general, to fix this type of issue you should avoid narrowing conversions from a larger integer type (here int64) to a smaller one (here potentially 32‑bit int) without bounds checking, or you should keep using the wider type end‑to‑end. Since the MetricsRecorder interface uses int parameters, the simplest safe change is to clamp the int64 values into the valid int range before converting, using math.MinInt/math.MaxInt (which reflect the platform’s int size).

Concretely, the problematic conversions are in main.go in updateGaugeMetricsWithCache:

  • int(activeAccessTokens) on line 928
  • int(activeRefreshTokens) on line 937
  • int(totalDeviceCodes) and int(pendingDeviceCodes) on line 955

We should:

  1. Import the math package in main.go (no new external dependency).

  2. Introduce a small helper function in main.go that takes an int64 and returns an int by clamping to the platform’s int range, e.g.:

    func clampInt64ToInt(v int64) int {
        if v > int64(math.MaxInt) {
            return math.MaxInt
        }
        if v < int64(math.MinInt) {
            return math.MinInt
        }
        return int(v)
    }
  3. Replace all direct int(...) casts on the metric counts in updateGaugeMetricsWithCache with calls to this helper:

    • m.SetActiveTokensCount("access", clampInt64ToInt(activeAccessTokens))
    • m.SetActiveTokensCount("refresh", clampInt64ToInt(activeRefreshTokens))
    • m.SetActiveDeviceCodesCount(clampInt64ToInt(totalDeviceCodes), clampInt64ToInt(pendingDeviceCodes))

This preserves existing functionality for values within the valid int range and prevents overflow for out‑of‑range values by saturating at the bounds.

No changes are needed in internal/cache/rueidis.go or internal/cache/rueidis_aside.go, because they already operate purely on int64. The only narrowing occurs in main.go.


Suggested changeset 1
main.go

Autofix patch
Run the following command in your local git repository to apply this patch
cat << 'EOF' | git apply
diff --git a/main.go b/main.go
--- a/main.go
+++ b/main.go
@@ -33,6 +33,7 @@
 	"fmt"
 	"io/fs"
 	"log"
+	"math"
 	"net/http"
 	"os"
 	"time"
@@ -913,6 +914,16 @@
 // updateGaugeMetricsWithCache updates gauge metrics using a cache-backed store.
 // This reduces database load in multi-instance deployments by caching query results.
 // The cache TTL should match the update interval to ensure consistent behavior.
+func clampInt64ToInt(v int64) int {
+	if v > int64(math.MaxInt) {
+		return math.MaxInt
+	}
+	if v < int64(math.MinInt) {
+		return math.MinInt
+	}
+	return int(v)
+}
+
 func updateGaugeMetricsWithCache(
 	ctx context.Context,
 	cacheWrapper *metrics.MetricsCacheWrapper,
@@ -925,7 +936,7 @@
 		m.RecordDatabaseQueryError("count_access_tokens")
 		gaugeErrorLogger.logIfNeeded("count_access_tokens", err)
 	} else {
-		m.SetActiveTokensCount("access", int(activeAccessTokens))
+		m.SetActiveTokensCount("access", clampInt64ToInt(activeAccessTokens))
 	}
 
 	// Update active refresh tokens count
@@ -934,7 +945,7 @@
 		m.RecordDatabaseQueryError("count_refresh_tokens")
 		gaugeErrorLogger.logIfNeeded("count_refresh_tokens", err)
 	} else {
-		m.SetActiveTokensCount("refresh", int(activeRefreshTokens))
+		m.SetActiveTokensCount("refresh", clampInt64ToInt(activeRefreshTokens))
 	}
 
 	// Update active device codes count
@@ -952,5 +963,5 @@
 		pendingDeviceCodes = 0
 	}
 
-	m.SetActiveDeviceCodesCount(int(totalDeviceCodes), int(pendingDeviceCodes))
+	m.SetActiveDeviceCodesCount(clampInt64ToInt(totalDeviceCodes), clampInt64ToInt(pendingDeviceCodes))
 }
EOF

Copilot AI left a comment


Pull request overview

Adds configurable caching to reduce database load from Prometheus gauge metric updates, supporting single- and multi-replica deployments with memory/Redis/rueidisaside backends and configurable update intervals.

Changes:

  • Introduces a generic cache interface plus memory, Redis (rueidis), and Redis-aside (rueidisaside) implementations.
  • Adds a cache-backed metrics wrapper and integrates it into the gauge update loop with configurable interval/enablement.
  • Extends configuration, tests, and documentation for the new metrics cache and gauge update behavior.

Reviewed changes

Copilot reviewed 14 out of 15 changed files in this pull request and generated 9 comments.

| File | Description |
| --- | --- |
| main.go | Initializes the selected metrics cache backend, uses cache-backed gauge updates, and closes the cache on shutdown |
| internal/metrics/cache.go | Adds a read-through caching wrapper for DB-backed gauge metrics queries |
| internal/metrics/cache_test.go | Tests cache wrapper behavior (hit/miss/expiry/partial hits and aside support) |
| internal/cache/interface.go | Defines the cache interface used by metrics caching |
| internal/cache/errors.go | Defines cache error types (miss/unavailable/invalid) |
| internal/cache/memory.go | Adds an in-memory cache backend implementation |
| internal/cache/rueidis.go | Adds a Redis cache backend using rueidis |
| internal/cache/rueidis_aside.go | Adds a Redis-aside backend using rueidisaside with client-side caching |
| internal/config/config.go | Adds metrics cache + gauge update configuration and validation |
| internal/config/config_test.go | Adds validation tests for metrics cache configuration |
| .env.example | Documents new metrics gauge update + cache environment variables |
| CLAUDE.md | Documents multi-replica metrics guidance and new env vars |
| go.mod / go.sum | Adds rueidis + rueidisaside dependencies (and updated sums) |


# Gauge Metrics Update Configuration
# METRICS_GAUGE_UPDATE_ENABLED=true # Enable gauge metric updates (default: true)
# # In multi-replica deployments, enable on only ONE instance or use max()/avg() aggregation
# METRICS_GAUGE_UPDATE_INTERVAL=5m # Update interval for gauge metrics (default: 5m, reduced from 30s)

Copilot AI Feb 17, 2026


The comment says the 5m default interval is "reduced from 30s", but this change increases the interval. Please update the wording to avoid misleading operators.

Suggested change
# METRICS_GAUGE_UPDATE_INTERVAL=5m # Update interval for gauge metrics (default: 5m, reduced from 30s)
# METRICS_GAUGE_UPDATE_INTERVAL=5m # Update interval for gauge metrics (default: 5m, increased from 30s)

Copilot uses AI. Check for mistakes.
| **METRICS_ENABLED** | false | Enable Prometheus metrics endpoint |
| METRICS_TOKEN | (empty) | Bearer token for /metrics endpoint (empty = no auth) |
| METRICS_GAUGE_UPDATE_ENABLED | true | Enable gauge updates (disable on all but one replica) |
| METRICS_GAUGE_UPDATE_INTERVAL | 5m | Gauge update interval (default: 5m, reduced from 30s) |

Copilot AI Feb 17, 2026


The environment variable table describes METRICS_GAUGE_UPDATE_INTERVAL as "reduced from 30s" even though the default moved from 30s to 5m (an increase). Please correct this wording for accuracy.

Suggested change
| METRICS_GAUGE_UPDATE_INTERVAL | 5m | Gauge update interval (default: 5m, reduced from 30s) |
| METRICS_GAUGE_UPDATE_INTERVAL | 5m | Gauge update interval (default: 5m, increased from 30s) |


Copilot AI Feb 17, 2026


The wrapper keys already include a "metrics:" prefix (e.g., "metrics:tokens:..."). Since main.go also configures Redis caches with keyPrefix "metrics:", Redis-backed caches will end up storing keys like "metrics:metrics:tokens:...". Consider either removing the hardcoded "metrics:" prefix from wrapper keys or passing an empty keyPrefix to the cache backends so key naming is consistent and not duplicated.

Suggested change
key := fmt.Sprintf("metrics:tokens:%s", category)
key := fmt.Sprintf("tokens:%s", category)

Comment on lines +149 to +172
case config.MetricsCacheTypeRedisAside:
	metricsCache, err = cache.NewRueidisAsideCache(
		cfg.RedisAddr,
		cfg.RedisPassword,
		cfg.RedisDB,
		"metrics:",
		cfg.MetricsCacheClientTTL,
	)
	if err != nil {
		log.Fatalf("Failed to initialize redis-aside metrics cache: %v", err)
	}
	log.Printf(
		"Metrics cache: redis-aside (addr=%s, db=%d, client_ttl=%s)",
		cfg.RedisAddr,
		cfg.RedisDB,
		cfg.MetricsCacheClientTTL,
	)
case config.MetricsCacheTypeRedis:
	metricsCache, err = cache.NewRueidisCache(
		cfg.RedisAddr,
		cfg.RedisPassword,
		cfg.RedisDB,
		"metrics:",
	)

Copilot AI Feb 17, 2026


Both Redis cache constructors are passed keyPrefix "metrics:", but the MetricsCacheWrapper already prefixes all keys with "metrics:". This results in duplicated Redis keys ("metrics:metrics:..."). Consider passing an empty prefix here (and letting the wrapper own the namespace) or changing the wrapper to use un-prefixed keys and keep the prefix solely in the cache backend.

Comment on lines +152 to +158
name: "redis cache type without redis address (allowed)",
config: &Config{
RateLimitStore: RateLimitStoreMemory,
MetricsCacheType: MetricsCacheTypeRedis,
RedisAddr: "",
},
expectError: false,

Copilot AI Feb 17, 2026


This test case asserts that METRICS_CACHE_TYPE="redis" is valid when RedisAddr is empty, but runServer will still try to initialize a rueidis client and likely fail/fatal. If RedisAddr is required for redis metrics caching, update this test to expect a validation error and ensure the error message is clear.

Suggested change
name: "redis cache type without redis address (allowed)",
config: &Config{
RateLimitStore: RateLimitStoreMemory,
MetricsCacheType: MetricsCacheTypeRedis,
RedisAddr: "",
},
expectError: false,
name: "redis cache type without redis address",
config: &Config{
RateLimitStore: RateLimitStoreMemory,
MetricsCacheType: MetricsCacheTypeRedis,
RedisAddr: "",
},
expectError: true,
errorMsg: `METRICS_CACHE_TYPE="redis" requires REDIS_ADDR`,

Comment on lines +36 to +40
DisableCache: false, // Enable client-side caching
// Client-side cache configuration
CacheSizeEachConn: 128 * 1024 * 1024, // 128MB per connection
},
})

Copilot AI Feb 17, 2026


CacheSizeEachConn is hard-coded to 128MB per connection. This can materially increase memory usage (especially with multiple connections) and may not be appropriate for all deployments. Consider making this configurable and/or documenting the expected memory impact, or choosing a more conservative default.

MetricsCacheTypeRedisAside,
)
}


Copilot AI Feb 17, 2026


METRICS_GAUGE_UPDATE_INTERVAL is used to create a time.Ticker in main.go; if it's set to 0 or a negative duration, time.NewTicker will panic. Consider adding validation (e.g., require >0 when METRICS_ENABLED && METRICS_GAUGE_UPDATE_ENABLED) to prevent a configuration that can crash the process.

Suggested change
// Validate metrics gauge update interval to prevent time.NewTicker panics
if c.MetricsEnabled && c.MetricsGaugeUpdateEnabled && c.MetricsGaugeUpdateInterval <= 0 {
	return fmt.Errorf(
		"invalid METRICS_GAUGE_UPDATE_INTERVAL value: %s (must be > 0 when METRICS_ENABLED and METRICS_GAUGE_UPDATE_ENABLED are true)",
		c.MetricsGaugeUpdateInterval,
	)
}


Signed-off-by: appleboy <appleboy.tw@gmail.com>
@appleboy changed the title from "feat: add configurable caching for Prometheus metrics updates" to "feat: add unified metrics caching with configurable backends" on Feb 17, 2026
@appleboy appleboy closed this Feb 17, 2026