feat: ai-cache plugin #13308
Pull request overview
This PR adds a new ai-cache plugin to APISIX to cache LLM responses in Redis, supporting an L1 exact-match cache and an L2 semantic (embedding/vector-search) cache, plus associated Prometheus metrics, docs, and test coverage.
Changes:
- Implement `ai-cache` plugin with exact + semantic caching, embedding providers (OpenAI / Azure OpenAI), and cache scoping controls.
- Add Prometheus exporter metrics for ai-cache hits/misses and embedding latency/failures, with E2E tests.
- Add user documentation and register the plugin in docs/config and example config.
Reviewed changes
Copilot reviewed 14 out of 14 changed files in this pull request and generated 8 comments.
| File | Description |
|---|---|
| apisix/plugins/ai-cache.lua | Core plugin logic (cache lookup in access phase, cache write in log phase, response headers). |
| apisix/plugins/ai-cache/schema.lua | Plugin schema, defaults, and encrypted fields. |
| apisix/plugins/ai-cache/exact.lua | L1 Redis exact-match cache implementation. |
| apisix/plugins/ai-cache/semantic.lua | L2 Redis Stack (RediSearch) vector index/search/store implementation. |
| apisix/plugins/ai-cache/embeddings/openai.lua | OpenAI embeddings client. |
| apisix/plugins/ai-cache/embeddings/azure_openai.lua | Azure OpenAI embeddings client. |
| apisix/plugins/prometheus/exporter.lua | Adds ai-cache metrics (hits/misses, embedding latency/failures). |
| t/plugin/ai-cache.t | Functional tests for schema, L1/L2 behavior, bypass, non-2xx, streaming, drivers. |
| t/plugin/ai-cache-scope.t | Tests cache key scoping via vars and consumer isolation. |
| t/plugin/prometheus-ai-cache.t | Prometheus metric assertions for ai-cache behavior. |
| docs/en/latest/plugins/ai-cache.md | New plugin documentation and configuration examples. |
| docs/en/latest/config.json | Adds ai-cache to docs navigation. |
| conf/config.yaml.example | Adds plugin to example plugin list with priority. |
| apisix/cli/config.lua | Registers plugin in CLI plugin list. |
```lua
if is_stream then
    core.response.set_header("Content-Type", "text/event-stream")
else
    core.response.set_header("Content-Type", "application/json")
end
return core.response.exit(200, proto.build_deny_response({
    stream = is_stream,
    text = cached_text,
}))
end
```
Cache hits are returned using `proto.build_deny_response(...)`. That helper is intended for policy denials and produces protocol-specific error/deny shapes for some protocols (e.g. openai-embeddings returns an error object), which can make cached responses invalid if this plugin is ever applied to non-chat endpoints. It would be safer to either (1) explicitly restrict ai-cache to chat-style protocols only, or (2) add a protocol method dedicated to building a successful cached response (and include required fields like model/usage where applicable).
`proto.build_deny_response` could be reused here to simulate the LLM response, and could be renamed to `proto.build_response_from_text` to prevent confusion. Added a TODO comment to be addressed in a later PR, to keep this one focused on the ai-cache implementation.
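For context, a dedicated success-shaped builder along the lines suggested here might look roughly like the sketch below. The function name and the OpenAI-style field layout are assumptions for illustration, not what `ai-protocols` actually implements.

```lua
local core = require("apisix.core")

-- Hypothetical sketch only: build a success-shaped body from cached text,
-- as an alternative to reusing build_deny_response. The OpenAI-style field
-- layout is an assumption; the real ai-protocols shapes may differ.
local function build_response_from_text(opts)
    if opts.stream then
        -- one SSE chunk carrying the cached text, followed by the stream terminator
        return "data: " .. core.json.encode({
            object = "chat.completion.chunk",
            choices = { { index = 0, delta = { content = opts.text } } },
        }) .. "\n\ndata: [DONE]\n\n"
    end

    return core.json.encode({
        object = "chat.completion",
        choices = { {
            index = 0,
            message = { role = "assistant", content = opts.text },
            finish_reason = "stop",
        } },
    })
end
```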
Pull request overview
Copilot reviewed 16 out of 16 changed files in this pull request and generated 4 comments.
Description
Implements the `ai-cache` plugin, a two-layer response cache for LLM/AI traffic that reduces upstream cost and latency by reusing prior completions for matching prompts.

Architecture:
The plugin sits in the request flow alongside the existing `ai-proxy`/`ai-protocols` stack. On every chat-completion request it:

- L1 (exact): hashes the prompt together with its scope (derived from the consumer and configured `ctx.var` keys) and looks up the result in Redis as a JSON blob keyed by `ai-cache:l1:<scope>:<prompt_hash>`. Hit ⇒ short-circuit with the cached response.
- L2 (semantic): on an L1 miss, computes an embedding for the prompt and runs a vector search against a RediSearch index (`ai-cache-idx-<dim>`, dimension-segregated so different embedding models coexist safely), returning the nearest candidate above `semantic.similarity_threshold`. Hit ⇒ also writes the entry into L1 so the next exact match skips the embedding cost entirely.
- Miss: lets the request through to the upstream, captures the response in `body_filter`, and (in `log`) writes it back into both L1 and L2.

Layers can be enabled independently via the `layers` array (`["exact"]`, `["semantic"]`, or both); a minimal lookup sketch follows below.
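To make the L1 flow concrete, here is a small Lua sketch of the exact-match lookup. It only illustrates the key layout described above; the helper name, the `conf.redis_*` fields, and the choice of hash function are illustrative assumptions rather than the plugin's actual code.

```lua
local core  = require("apisix.core")
local redis = require("resty.redis")

-- Illustrative only: look up a cached response under the key layout
-- ai-cache:l1:<scope>:<prompt_hash> described in this PR.
local function l1_lookup(conf, scope, prompt)
    local key = "ai-cache:l1:" .. scope .. ":" .. ngx.md5(prompt)

    local red = redis:new()
    red:set_timeout(conf.redis_timeout or 1000)
    local ok, err = red:connect(conf.redis_host, conf.redis_port or 6379)
    if not ok then
        return nil, "failed to connect to redis: " .. err
    end

    local blob, err = red:get(key)
    if not blob or blob == ngx.null then
        return nil, err              -- miss: fall through to the semantic layer
    end
    return core.json.decode(blob)    -- hit: decoded cached response
end
```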
Scope / cache key: `cache_key.include_consumer` and `cache_key.include_vars` produce a stable scope hash so caches don't leak across consumers or per-tenant variables.
Bypass: `bypass_on` lets operators skip caching for specific request headers (e.g. `Cache-Control: no-cache`, debug headers).
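For illustration, the scoping and bypass knobs above might combine like this, expressed as a Lua table. The field names follow the description, but the value shapes (how `include_vars` and `bypass_on` entries are spelled) are assumptions; the schema in `apisix/plugins/ai-cache/schema.lua` is authoritative.

```lua
-- Illustrative ai-cache configuration; value shapes are assumptions.
local conf = {
    layers = { "exact", "semantic" },              -- enable L1 and L2 independently
    cache_key = {
        include_consumer = true,                   -- keep caches isolated per consumer
        include_vars = { "http_x_tenant_id" },     -- extra ctx.var keys folded into the scope hash
    },
    bypass_on = {
        { header = "Cache-Control", value = "no-cache" },  -- skip caching when this header matches
    },
}
```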
Observability: emits `X-AI-Cache-Status` (HIT-L1/HIT-L2/MISS/BYPASS), `X-AI-Cache-Similarity` (L2 only), and `X-AI-Cache-Age` response headers. Header names are configurable.
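A rough sketch of how these headers could be emitted; the `conf.headers.*` layout used for the configurable names is an assumption.

```lua
local core = require("apisix.core")

-- Illustrative only: emit the cache-status headers described above,
-- honouring configurable header names (conf.headers.* is an assumed layout).
local function set_cache_headers(conf, status, similarity, age)
    local names = conf.headers or {}
    core.response.set_header(names.status or "X-AI-Cache-Status", status)  -- HIT-L1 / HIT-L2 / MISS / BYPASS
    if similarity then
        core.response.set_header(names.similarity or "X-AI-Cache-Similarity", similarity)  -- L2 hits only
    end
    if age then
        core.response.set_header(names.age or "X-AI-Cache-Age", age)
    end
end
```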
Safety:

- Switching embedding models (e.g. `text-embedding-3-small` 1536-dim → `text-embedding-3-large` 3072-dim) does not corrupt or wedge the index, since indexes are segregated by dimension; old entries simply expire via `semantic.ttl`.
- Sensitive fields (`semantic.embedding.api_key`, `redis_password`) are encrypted via APISIX's `encrypt_fields`.
- `max_cache_body_size` (default 1 MiB) prevents the plugin from caching unbounded payloads.
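As a small illustration of the dimension segregation mentioned above (the helper name is hypothetical):

```lua
-- Hypothetical helper: each embedding dimension gets its own RediSearch index,
-- so switching models writes to a fresh index instead of corrupting the old one.
local function index_name(dim)
    return "ai-cache-idx-" .. tostring(dim)
end

-- index_name(1536) --> "ai-cache-idx-1536"  (e.g. text-embedding-3-small)
-- index_name(3072) --> "ai-cache-idx-3072"  (e.g. text-embedding-3-large)
```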
Compatibility: zero changes to existing plugins; ai-cache piggybacks on `ai-proxy`/`ai-protocols` and does not interfere when the route lacks an AI plugin chain.

Which issue(s) this PR fixes:
Fixes #13290
Checklist