feat(ai-cache): add semantic (L2) cache layer by janiussyafiq · Pull Request #13632 · apache/apisix

janiussyafiq · 2026-06-30T07:16:31Z

Description

Adds the optional semantic (L2) cache layer to ai-cache. On an exact (L1) miss, the request prompt is embedded and matched against a RediSearch cosine vector index; a hit above similarity_threshold is served and backfilled into L1. The layer is strictly opt-in via layers: ["exact", "semantic"], so the default exact-only path is unchanged.

Embedding providers: openai and azure_openai.
Per effective-model and per-tenant (cache_key) partition isolation of the vector index.
Fail-open: any embedding or vector-store error degrades to a MISS, never a 5xx.
Response status reported as HIT-L1 / HIT-L2 (with X-AI-Cache-Similarity).
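The flow described above (exact lookup first, then embed and vector-match on a miss, backfill L1 on a semantic hit, and degrade to a miss on any error) can be sketched roughly as follows. This is an illustrative Python model, not the plugin's Lua code: `TwoLayerCache`, `cosine_similarity`, the in-memory stores, and the inclusive threshold comparison are all assumptions made for the example.

```python
import math
from typing import Callable, Dict, List, Optional, Sequence, Tuple

Vector = Sequence[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine similarity of two vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class TwoLayerCache:
    """Hypothetical in-memory stand-in for the exact (L1) + semantic (L2) layers."""

    def __init__(self, embed: Callable[[str], Vector], similarity_threshold: float):
        self.embed = embed  # assumed embedding function; may raise on failure
        self.similarity_threshold = similarity_threshold
        self.l1: Dict[str, str] = {}
        self.l2: List[Tuple[Vector, str]] = []

    def lookup(self, prompt: str) -> Tuple[Optional[str], Optional[str], Optional[float]]:
        """Return (layer, response, similarity); layer is None on a miss."""
        if prompt in self.l1:
            return "L1", self.l1[prompt], None
        try:
            query = self.embed(prompt)
            scored = [(cosine_similarity(query, vec), resp) for vec, resp in self.l2]
        except Exception:
            # Fail-open: an embedding or vector-store error is treated as a miss.
            return None, None, None
        if not scored:
            return None, None, None
        similarity, response = max(scored, key=lambda item: item[0])
        if similarity < self.similarity_threshold:  # inclusive threshold is an assumption
            return None, None, None
        self.l1[prompt] = response  # backfill L1 so the next identical request is exact
        return "L2", response, similarity

    def store(self, prompt: str, response: str) -> None:
        """Write to L1 always; write to L2 only if embedding succeeds."""
        self.l1[prompt] = response
        try:
            self.l2.append((self.embed(prompt), response))
        except Exception:
            pass  # fail-open on the write path as well
```

With a working `embed`, a near-duplicate prompt is served from L2 once and from L1 afterwards; with a broken `embed`, only the exact layer is effective and no error reaches the caller.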

Which issue(s) this PR fixes:

Part of #13290

Checklist

I have explained the need for this PR and the problem it solves
I have explained the changes or the new features added to this PR
I have added tests corresponding to this change
I have updated the documentation to reflect this change
I have verified that this change is backward compatible (If not, please discuss on the APISIX mailing list first)


Add TEST 16/17/18 to t/plugin/ai-cache-semantic.t:
- TEST 16: embedding-provider returns 500 → semantic skips fail-open, L1 exact cache unaffected; MISS then HIT on identical request proves the broken embeddings endpoint never breaks L1 or upstream traffic.
- TEST 17: include_vars=["http_x_tenant"] partitions L2; same prompt with same fixed vector under tenant "globex" gets a MISS after tenant "acme" populated L2, so there is no cross-tenant semantic leakage. Positive control (acme again) confirms the acme entry is intact (L1 HIT).
- TEST 18: azure_openai driver unit test mirroring TEST 7/8 style. Mock server at 7745 rejects requests without the api-key header (proving the driver does NOT send Authorization), accepts with it, and returns data[1].embedding; mock at 7746 returns 500 → nil+err.
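The tenant isolation exercised by TEST 17 can be pictured as deriving a partition identifier from the effective model plus the configured tenant variables, so that identical prompts under different tenants never share vector entries. The sketch below is a guess at the general shape only: `partition_id`, the choice of SHA-256, the separator, and the 16-character truncation are assumptions for illustration, not the behavior of key.lua.

```python
import hashlib
from typing import Mapping


def partition_id(effective_model: str, tenant_vars: Mapping[str, str]) -> str:
    """Derive a stable hex partition id from the model and tenant variables."""
    # Sort variable names so the id does not depend on dict ordering.
    parts = [f"model={effective_model}"]
    parts += [f"{name}={tenant_vars[name]}" for name in sorted(tenant_vars)]
    digest = hashlib.sha256("\x00".join(parts).encode("utf-8")).hexdigest()
    return digest[:16]  # truncation length is arbitrary in this sketch


acme = partition_id("gpt-4o", {"http_x_tenant": "acme"})
globex = partition_id("gpt-4o", {"http_x_tenant": "globex"})
```

Here `acme` and `globex` differ, so a vector written under one tenant is never a candidate for a lookup under the other, while repeated requests from the same tenant and model map to the same partition.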

- Add `layers` attribute row (array, default ["exact"], values exact/semantic)
- Add Semantic (L2) Attributes subsection with full schema table for all semantic.* fields matching schema.lua verbatim (thresholds, top_k, distance_metric cosine-only, ttl 86400, match.*, embedding oneOf, vector_search)
- Add Redis Stack (RediSearch) caution callout: required when semantic layer is enabled; L1 and L2 share the same connection
- Update cache_headers description to include X-AI-Cache-Similarity header (emitted on semantic hit; value = cosine similarity = 1 − distance)
- Add exact+semantic worked example (Admin API JSON + ADC YAML forms)
- Add security note on multi-tenant deployments (include_consumer/include_vars)
- Update description to reflect both layers are now available
- Apply identical changes to zh doc with translated prose
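The relationship stated for the X-AI-Cache-Similarity header (cosine similarity = 1 − distance) is simple enough to show directly. The helper names below are hypothetical, and the four-decimal formatting follows the `string.format("%.4f", ...)` detail mentioned in the commit notes on this page; treat this as a sketch rather than the plugin's code.

```python
def similarity_from_distance(distance: float) -> float:
    """Convert a cosine distance (as returned by a vector search) to a similarity."""
    return 1.0 - distance


def similarity_header_value(distance: float) -> str:
    """Format the similarity to four decimal places for a response header."""
    return "%.4f" % similarity_from_distance(distance)


def is_semantic_hit(distance: float, similarity_threshold: float) -> bool:
    """A candidate counts as a hit when its similarity reaches the threshold.

    Whether the real comparison is inclusive is an assumption here.
    """
    return similarity_from_distance(distance) >= similarity_threshold
```

For example, a distance of 0.25 corresponds to a similarity of 0.75, which would be rendered as `0.7500`.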

…polish
- schema: wrap similarity_threshold line to ≤100 chars (luacheck)
- schema: add contains={const="exact"} to layers, so ["semantic"]-only is now invalid
- embeddings/openai + azure_openai: drop unused err assignment from json.decode (luacheck); add local HTTP_OK; guard not res.body before decode (mirrors ai-rag pattern)
- vector-search/redis: ensure_index gains a prefix param; FT.CREATE PREFIX uses it instead of the hardcoded "ai-cache:l2:" so the caller controls the key namespace
- semantic: add l2_base(conf) helper; index_name, lookup, and write all read conf.semantic.vector_search.redis.index (default "ai-cache"), which wires the config knob
- key: fix wrong context_fingerprint doc comment; restore PR-1 "why" comments in build_repr's effective block (model precedence + endpoint significance)
- ai-cache: format X-AI-Cache-Similarity to 4 d.p. via string.format("%.4f", ...)
- docs (en + zh): note that "exact" is always active and must be in layers
- tests: update TEST 9+10 for new ensure_index signature; add TEST 19 (schema rejects ["semantic"]-only); add TEST 20 (custom index="myidx" MISS→HIT + L2 key prefix check)
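The `layers` rule described above (values limited to exact/semantic, default ["exact"], and "exact" always required) amounts to a JSON Schema `enum` on items plus a `contains` constraint. A rough Python equivalent is sketched below; `validate_layers` and its error strings are made up for illustration and do not reproduce the validator's actual messages.

```python
from typing import List, Optional, Tuple

ALLOWED_LAYERS = {"exact", "semantic"}
DEFAULT_LAYERS = ["exact"]


def validate_layers(layers: Optional[List[str]]) -> Tuple[bool, Optional[str]]:
    """Check a `layers` value; None stands for "not configured" and uses the default."""
    effective = DEFAULT_LAYERS if layers is None else layers
    unknown = [layer for layer in effective if layer not in ALLOWED_LAYERS]
    if unknown:
        return False, "unknown layer(s): " + ", ".join(unknown)  # hypothetical message
    if "exact" not in effective:
        # Mirrors the `contains` constraint: a semantic-only list is rejected.
        return False, 'layers must contain "exact"'  # hypothetical message
    return True, None
```

Under this sketch `["exact", "semantic"]` and an omitted value both pass, while `["semantic"]` alone is rejected, which is the case TEST 19 is described as covering.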

Cache hits now report which layer served them via X-AI-Cache-Status: HIT-L1 (exact) or HIT-L2 (semantic). The en/zh docs and the exact-cache suite (t/plugin/ai-cache.t) are updated to match. Rework t/plugin/ai-cache-semantic.t into the codebase's declarative test-nginx idiom: granular schema/unit blocks, and end-to-end flows split into route-setup + request blocks (--- request / --- response_headers / --- wait) over the shared :1980 X-AI-Fixture upstream. Drive the embedding mock from real text-embedding-3-small responses (captured at dimensions=64) stored as t/fixtures/openai/embeddings-*.json; the mock logic lives in t/lib/ai_cache_mock.lua. Schema-rejection tests assert the exact validation error via --- response_body eval.


Copilot

Pull request overview

Adds an opt-in semantic (L2) caching layer to the existing ai-cache plugin. On an L1 miss, the plugin can embed the prompt and query a Redis Stack (RediSearch) vector index for a sufficiently similar prior response, serving it as an L2 hit and backfilling L1; defaults remain exact-only.

Changes:

Implement semantic (L2) cache lookup/write path backed by Redis Stack RediSearch and embedding providers (OpenAI, Azure OpenAI).
Update ai-cache runtime behavior and headers to report HIT-L1 / HIT-L2 (plus similarity header on L2 hits).
Add comprehensive unit/e2e tests, fixtures, and docs for semantic caching configuration and behavior.

Reviewed changes

Copilot reviewed 18 out of 18 changed files in this pull request and generated 2 comments.


| File | Description |
| --- | --- |
| t/plugin/ai-cache.t | Updates existing tests to expect `HIT-L1` status on exact cache hits. |
| t/plugin/ai-cache-semantic.t | Adds new test suite covering schema validation, embedding drivers, vector search, isolation behavior, and fail-open semantics. |
| t/lib/ai_cache_mock.lua | Adds a mock embeddings upstream that replays captured embedding fixtures and validates auth behavior. |
| t/fixtures/openai/embeddings-capital.json | Adds captured embedding fixture used by semantic cache tests. |
| t/fixtures/openai/embeddings-capital-city.json | Adds captured embedding fixture used by semantic cache tests. |
| t/fixtures/openai/embeddings-largest-city.json | Adds captured embedding fixture used by semantic cache tests. |
| t/fixtures/openai/embeddings-tire.json | Adds captured embedding fixture used by semantic cache tests. |
| Makefile | Installs new `ai-cache` submodules for embeddings and vector-search at build/install time. |
| docs/en/latest/plugins/ai-cache.md | Documents semantic caching (Redis Stack requirement, config fields, headers, and examples). |
| docs/zh/latest/plugins/ai-cache.md | Chinese documentation updates for semantic caching configuration and examples. |
| apisix/plugins/ai-cache/vector-search/redis.lua | Adds Redis Stack (RediSearch) helpers: index creation, upsert, and KNN search with float32 packing. |
| apisix/plugins/ai-cache/semantic.lua | Implements L2 semantic cache: prompt extraction, embedding, partitioning, lookup, L1 backfill, and L2 write. |
| apisix/plugins/ai-cache/schema.lua | Extends schema with `layers` and `semantic` configuration; adds encryption paths for embedding API keys. |
| apisix/plugins/ai-cache/key.lua | Extends key logic to support semantic partitioning and context fingerprinting utilities. |
| apisix/plugins/ai-cache/embeddings/base.lua | Shared HTTP fetch + response parsing for embeddings providers. |
| apisix/plugins/ai-cache/embeddings/openai.lua | OpenAI embeddings driver (Bearer auth, optional endpoint/model/dimensions). |
| apisix/plugins/ai-cache/embeddings/azure_openai.lua | Azure OpenAI embeddings driver (api-key header, endpoint required). |
| apisix/plugins/ai-cache.lua | Integrates L2 lookup on L1 miss, adds new headers (`HIT-L1`/`HIT-L2`, similarity), and schedules L2 writes. |
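The "float32 packing" mentioned for the vector-search helper refers to serializing an embedding as a flat byte string of 32-bit floats before it is stored or used as a KNN query parameter. The Python sketch below shows the idea with the standard `struct` module; little-endian byte order and the helper names are assumptions for illustration and are not taken from redis.lua.

```python
import struct
from typing import List, Sequence


def pack_float32(vector: Sequence[float]) -> bytes:
    """Serialize a vector as consecutive little-endian float32 values."""
    return struct.pack(f"<{len(vector)}f", *vector)


def unpack_float32(blob: bytes) -> List[float]:
    """Inverse of pack_float32; the blob length must be a multiple of 4."""
    if len(blob) % 4 != 0:
        raise ValueError("blob length is not a multiple of 4 bytes")
    return list(struct.unpack(f"<{len(blob) // 4}f", blob))


blob = pack_float32([1.0, 0.5, -2.0])
```

A 64-dimensional embedding, as used for the captured fixtures, would occupy 256 bytes in this encoding. Values that are not exactly representable in float32 lose some precision on the round trip, which is normally acceptable for similarity search.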



membphis · 2026-07-01T05:42:14Z

One compatibility concern: this PR changes the existing exact-cache hit header from X-AI-Cache-Status: HIT to X-AI-Cache-Status: HIT-L1, even when the semantic layer is not enabled.

The -L1 suffix also feels a bit odd as a user-facing status, and this may break existing clients, tests, dashboards, or log parsing that already check for HIT. Could we keep the exact-cache status as HIT for backward compatibility, and only use a distinct value such as HIT-L2 when the semantic layer serves the response? Another option would be to add a separate cache-layer header if we need to expose the serving layer explicitly.

janiussyafiq · 2026-07-01T06:01:23Z

> One compatibility concern: this PR changes the existing exact-cache hit header from X-AI-Cache-Status: HIT to X-AI-Cache-Status: HIT-L1, even when the semantic layer is not enabled.
>
> The -L1 suffix also feels a bit odd as a user-facing status, and this may break existing clients, tests, dashboards, or log parsing that already check for HIT. Could we keep the exact-cache status as HIT for backward compatibility, and only use a distinct value such as HIT-L2 when the semantic layer serves the response? Another option would be to add a separate cache-layer header if we need to expose the serving layer explicitly.

@membphis The initial plan was to keep using HIT for both cases (L1 and L2). However, I realized that distinct values make debugging and testing easier and improve readability, because we can tell exactly which layer was hit. One thing to note is that an L2 hit also carries a header that an L1 hit does not: X-AI-Cache-Similarity. As for your concern about breaking existing clients, tests, and so on: the current ai-cache version hasn't been released yet, so I think the change is safe. WDYT?

membphis · 2026-07-01T06:07:55Z

Thanks, that makes sense if ai-cache has not been released yet.

One extra question: what is the expected user-facing distinction between L1 and L2 here? My understanding is that the two-layer design is mainly an internal optimization/correctness model: L1 is deterministic and cheap, while L2 is approximate and more expensive because it requires embedding plus vector search. Since an L2 hit already has X-AI-Cache-Similarity, do we need to expose HIT-L1 in X-AI-Cache-Status, or could we keep the simpler HIT for exact hits and use either HIT-L2 or a separate layer/similarity header only for semantic hits?

janiussyafiq · 2026-07-01T06:14:26Z

> Thanks, that makes sense if ai-cache has not been released yet.
>
> One extra question: what is the expected user-facing distinction between L1 and L2 here? My understanding is that the two-layer design is mainly an internal optimization/correctness model: L1 is deterministic and cheap, while L2 is approximate and more expensive because it requires embedding plus vector search. Since an L2 hit already has X-AI-Cache-Similarity, do we need to expose HIT-L1 in X-AI-Cache-Status, or could we keep the simpler HIT for exact hits and use either HIT-L2 or a separate layer/similarity header only for semantic hits?

I believe it is a matter of design; both of your points are legitimate. I simply reverted to the design choice shown in the picture here (ref #13290):

I guess we can keep HIT for both to remain consistent with previous test cases.
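The outcome of this thread (a single HIT status for both layers, with X-AI-Cache-Similarity present only when the semantic layer served the response) can be summarized in a small sketch. The function name, the `layer` argument values, and the MISS fallback are illustrative assumptions, not the plugin's internal API.

```python
from typing import Dict, Optional


def cache_response_headers(layer: Optional[str],
                           similarity: Optional[float] = None) -> Dict[str, str]:
    """Build cache headers for a response.

    `layer` is "exact", "semantic", or None for a miss (hypothetical values).
    """
    if layer is None:
        return {"X-AI-Cache-Status": "MISS"}
    headers = {"X-AI-Cache-Status": "HIT"}  # same status for either layer
    if layer == "semantic" and similarity is not None:
        # Only a semantic hit exposes how close the match was.
        headers["X-AI-Cache-Similarity"] = "%.4f" % similarity
    return headers
```

A client that only checks for `HIT` keeps working unchanged, while a client that wants to know whether the match was approximate can look for the similarity header.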


membphis

Reviewed, no blocking issues found.

janiussyafiq added 15 commits June 29, 2026 12:10

feat(ai-cache): add layers enum and semantic schema (cosine-only)

7a57832

refactor(ai-cache): simplify check_schema to single unconditional che…

1b51980

…ck_https call

feat(ai-cache): add context_fingerprint and partition keying for L2

bd76fc7

feat(ai-cache): add openai and azure_openai embeddings drivers

01b1e25

feat(ai-cache): add RediSearch vector-search driver

231b069

refactor(ai-cache): drop unused locals; use hex partition in TEST 10

3ccc8e0

feat(ai-cache): add semantic orchestration (embed-text, lookup, write)

907d477

fix(ai-cache): harden semantic.lua fail-open boundary (pre-require dr…

74967c0

…ivers, guard encode+write)

feat(ai-cache): wire semantic layer into access/log with L1 backfill

b00f15d

fix(ai-cache): pcall guard semantic boundary; tighten TEST 14 model i…

70c9649

…solation

feat(ai-cache): refactor embedding drivers to use shared base functio…

12800a4

…nality

janiussyafiq marked this pull request as ready for review June 30, 2026 07:21

dosubot Bot added the size:XXL (This PR changes 1000+ lines, ignoring generated files.) and enhancement (New feature or request) labels Jun 30, 2026

janiussyafiq added 3 commits June 30, 2026 16:04

feat(ai-cache): remove schema definitions from Azure and OpenAI embed…

a4c65c0

…ding drivers, add ssl_verify and timeout to schema

feat(ai-cache): add ssl_verify and timeout options for OpenAI and Azu…

1f519b1

…re embedding configurations

feat(ai-cache): enhance semantic layer functionality with context mes…

4d45571

…sage handling and protocol checks

shreemaan-abhishek requested a review from Copilot June 30, 2026 15:32

Copilot started reviewing on behalf of shreemaan-abhishek June 30, 2026 15:33

Copilot AI reviewed Jun 30, 2026


Comment thread apisix/plugins/ai-cache/semantic.lua Outdated

Comment thread apisix/plugins/ai-cache/semantic.lua Outdated

janiussyafiq added 2 commits July 1, 2026 09:44

feat(ai-cache): update endpoint configurations and improve embedding …

007664e

…handling

feat(ai-cache): add window_has_nontext function to handle non-text co…

29b8ec4

…ntent and update error handling in Redis upsert

nic-6443 reviewed Jul 1, 2026


Comment thread apisix/plugins/ai-cache.lua Outdated

nic-6443 reviewed Jul 1, 2026


Comment thread apisix/plugins/ai-cache/vector-search/redis.lua Outdated

feat(ai-cache): refactor semantic layer for improved embedding handli…

98bfeb8

…ng and Redis index management

nic-6443 previously approved these changes Jul 1, 2026


feat(ai-cache): unify cache hit status to 'HIT' across responses and …

b2e15aa

…documentation

janiussyafiq dismissed nic-6443’s stale review via b2e15aa July 1, 2026 06:24

nic-6443 approved these changes Jul 2, 2026


AlinsRan approved these changes Jul 2, 2026


membphis approved these changes Jul 2, 2026


nic-6443 merged commit 7fd4515 into apache:master Jul 2, 2026
25 of 26 checks passed

janiussyafiq deleted the feat/ai-cache-semantic branch July 2, 2026 04:29

janiussyafiq mentioned this pull request Jul 2, 2026

feat(ai-cache): add streaming support with format tagging #13644

Open

5 tasks

nic-6443 merged 22 commits into apache:master from janiussyafiq:feat/ai-cache-semantic

5 participants
