feat(cost): semantic caching engine with pgvector integration (#396) #440

Open
Mohammedsami001 wants to merge 5 commits into sreerevanth:main from Mohammedsami001:feature/issue-396-semantic-caching

Conversation

@Mohammedsami001 Mohammedsami001 commented Jun 19, 2026

Description

Closes #396

This PR introduces the Semantic Caching Engine, which reduces LLM cost and latency by intercepting OpenAI requests and returning cached responses for prompts that match exactly or are semantically similar to an earlier prompt.

The feature was implemented using strict Test-Driven Development (TDD) to ensure robust handling of caching edge cases, cache-eviction, and network abstractions.

What Was Accomplished

1. SemanticCache Engine (agentwatch/cost/semantic_cache.py)

  • Built an in-memory & database-agnostic caching engine.
  • Implements both exact hashing (SHA-256) and fuzzy semantic matching using _cosine_similarity.
  • Contains integrated TTL validation to automatically ignore stale entries based on a configurable timeframe (ttl_days).
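
The lookup order described above (exact hash first, then cosine similarity, with stale entries ignored) can be sketched roughly as follows. This is a minimal illustration, not the PR's code: `TinySemanticCache`, `Entry`, the 0.9 threshold, and the 7-day default are all assumptions, and embeddings are passed in by the caller rather than computed.

```python
import hashlib
import math
from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


@dataclass
class Entry:  # hypothetical stand-in for the PR's CacheEntry
    embedding: list
    response: str
    created_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))


class TinySemanticCache:  # hypothetical name, in-memory only
    def __init__(self, similarity_threshold=0.9, ttl_days=7):
        self.similarity_threshold = similarity_threshold
        self.ttl_days = ttl_days
        self._entries = {}  # SHA-256 prompt hash -> Entry

    @staticmethod
    def _key(prompt):
        return hashlib.sha256(prompt.encode("utf-8")).hexdigest()

    def set(self, prompt, embedding, response):
        self._entries[self._key(prompt)] = Entry(embedding, response)

    def get(self, prompt, embedding):
        # Ignore anything older than the TTL cutoff.
        cutoff = datetime.now(timezone.utc) - timedelta(days=self.ttl_days)
        live = {k: e for k, e in self._entries.items() if e.created_at >= cutoff}

        # 1) Exact match on the prompt hash.
        exact = live.get(self._key(prompt))
        if exact is not None:
            return exact.response

        # 2) Fuzzy match: best cosine similarity, accepted only above the threshold.
        best = max(
            live.values(),
            key=lambda e: cosine_similarity(embedding, e.embedding),
            default=None,
        )
        if best is not None and (
            cosine_similarity(embedding, best.embedding) >= self.similarity_threshold
        ):
            return best.response
        return None
```

A linear scan like this is only reasonable for small in-memory caches; it is shown here because it makes the threshold and TTL checks easy to see.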

2. OpenAI Network Interception (agentwatch/adapters/interception.py)

  • Dynamically monkey-patches openai.AsyncClient.chat.completions.create via patch_openai().
  • Defensively intercepts requests, routing them to the semantic cache before hitting the network.
  • Safely reconstructs standard ChatCompletion objects upon cache hits, ensuring downstream applications (e.g., streaming and non-streaming) are completely unaware the response was fetched locally.
  • Robustness: Validates hasattr(response, "choices") before reading the response, so AsyncStream objects returned for streaming calls do not crash the wrapper.

3. PostgreSQL / pgvector Integration (agentwatch/models/cache.py)

  • Added the SemanticCacheEntry SQLAlchemy model leveraging the Vector column type from pgvector.
  • When an AsyncSession is provided to the cache manager, lookups move from local memory to the database and use pgvector's native .cosine_distance to order entries by vector similarity.

4. Per-Session Overrides

  • Global caching constraints (e.g., AGENTWATCH_CACHE_TTL_DAYS) can be dynamically bypassed or overridden via extra_body={"agentwatch_metadata": {"cache_ttl_days": X}} in the LLM payload, giving granular control back to the specific execution session.
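
One way the precedence between the environment variable and the per-request override could be resolved is sketched below. The helper names, the fallback default of 7 days, and the choice to treat negative or non-numeric values as "not set" are assumptions made for illustration; only the variable name and the extra_body shape come from the description above.

```python
import os


def _parse_ttl(value):
    """Return value as a non-negative int, or None if missing or malformed."""
    try:
        ttl = int(value)
    except (TypeError, ValueError):
        return None
    return ttl if ttl >= 0 else None


def resolve_ttl_days(extra_body=None, default=7, environ=os.environ):
    """Per-request override wins, then the env var, then the default."""
    ttl = _parse_ttl(environ.get("AGENTWATCH_CACHE_TTL_DAYS"))
    if ttl is None:
        ttl = default

    metadata = (extra_body or {}).get("agentwatch_metadata")
    if isinstance(metadata, dict):
        override = _parse_ttl(metadata.get("cache_ttl_days"))
        if override is not None:
            ttl = override
    return ttl
```

For example, `resolve_ttl_days({"agentwatch_metadata": {"cache_ttl_days": 1}})` yields 1 regardless of the environment, while a malformed override falls back to the global value.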

Testing & Verification

Comprehensive TDD testing is included (tests/test_caching.py). All 6 behavioral suites successfully pass:

  • Exact matching retrieval.
  • Semantic fuzzy matching retrieval.
  • TTL expiration and cache invalidation.
  • Network isolation and monkey-patch interception.
  • Session-level override precedence.
  • Mocked database backend operations.

Checklist

  • My code follows the style guidelines of this project
  • I have performed a self-review of my own code
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes

Summary by CodeRabbit

Release Notes

  • New Features

    • Added semantic caching for model responses with TTL-based expiration (in-memory and optional database-backed storage).
    • Cache retrieval uses vector similarity matching, and cached results can be served for both standard and streamed requests.
    • Integrated OpenAI request interception to serve cached completions automatically.
  • Chores

    • Added pgvector dependency and openai dev extra for cache/interception support.
  • Tests

    • Expanded async test coverage for semantic caching, TTL behavior, OpenAI patching/unpatching, and database read/write paths.

coderabbitai Bot commented Jun 19, 2026

Contributor

Warning

Review limit reached

@Mohammedsami001, we couldn't start this review because you've reached your PR review rate limit.

More reviews will be available in 41 minutes and 13 seconds. Learn how PR review limits work.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: 8f8f67e2-c9fa-4e53-8b69-60fc5fac1521

📥 Commits

Reviewing files that changed from the base of the PR and between 4da6c02 and 4609575.

📒 Files selected for processing (1)
  • tests/test_caching.py

Walkthrough

Adds a semantic caching engine with in-memory hash-based and pgvector-backed storage, TTL support, and OpenAI request interception. SemanticCacheManager provides exact-match lookup; SemanticCache extends the existing implementation with configurable TTL and optional database persistence via SemanticCacheEntry SQLAlchemy model. patch_openai monkey-patches AsyncCompletions.create to check the cache and serve or populate responses. Comprehensive async tests validate all paths including TTL expiry, DB write/read, streaming, and exception handling.

Changes

Semantic Caching Engine

| Layer | File(s) | Summary |
| --- | --- | --- |
| Cache data shapes: CacheHit, CacheEntry, and SemanticCacheEntry model | agentwatch/cost/caching.py, agentwatch/cost/semantic_cache.py, agentwatch/models/cache.py | CacheHit dataclass holds prompt hash, response text, and framework; CacheEntry gains a created_at UTC timestamp field for TTL tracking; SemanticCacheEntry SQLAlchemy model defines the semantic_cache table with a 384-dimension pgvector column, framework, and server-default created_at timestamp. |
| SemanticCacheManager: in-memory hash-based store and search | agentwatch/cost/caching.py | SemanticCacheManager maintains an internal dictionary using SHA-256 prompt hashes combined with framework to form composite keys; store method hashes and caches CacheHit entries, and search method returns cached CacheHit via dictionary lookup. |
| SemanticCache: TTL handling and optional DB-backed persistence | agentwatch/cost/semantic_cache.py | SemanticCache.__init__ stores ttl_days and db_session parameters; get method accepts optional ttl_days_override, queries the DB with cosine-distance filtering and TTL cutoff when a session is present, falls back to in-memory cosine similarity with expired-entry pruning; set method persists SemanticCacheEntry rows to the database when a session is present and skips in-memory storage. |
| OpenAI AsyncCompletions monkey-patching adapter | agentwatch/adapters/interception.py | patch_openai replaces AsyncCompletions.create with an async wrapper that reads TTL from AGENTWATCH_CACHE_TTL_DAYS env var and/or extra_body["agentwatch_metadata"]["cache_ttl_days"], checks the semantic cache using the last message content as the key, returns a synthetic ChatCompletion (or async stream of ChatCompletionChunk) on cache hits, and stores real responses on misses; unpatch_openai restores the original method. |
| Async test suite | tests/test_caching.py | 17 tests covering SemanticCacheManager exact-match store/search, semantic matching via mocked embeddings, TTL expiry via backdated timestamps, OpenAI interception with patch/unpatch, per-request extra_body TTL override, mocked DB session write/read and exception handling, streaming responses, cache eviction, empty embeddings/memory fallback, and robustness when cache population fails. |
| Dependencies and CLI help text | pyproject.toml, agentwatch/cli/demo.py | pgvector>=0.2.0 added to runtime dependencies; openai>=1.0.0 added to dev dependencies; CLI demo help text updated to use backtick-wrapped placeholders. |

Sequence Diagram(s)

sequenceDiagram
  participant Caller
  participant AsyncCompletions as Patched<br/>AsyncCompletions.create
  participant SemanticCache
  participant DB as PostgreSQL<br/>with pgvector
  participant OpenAI as OpenAI API

  Caller->>AsyncCompletions: create(messages, extra_body, stream, ...)
  AsyncCompletions->>AsyncCompletions: Parse TTL from AGENTWATCH_CACHE_TTL_DAYS<br/>and extra_body.agentwatch_metadata.cache_ttl_days
  AsyncCompletions->>SemanticCache: get(last_message_content, ttl_override)
  
  alt DB session configured
    SemanticCache->>DB: SELECT by cosine_distance with TTL cutoff
    alt DB match found
      DB-->>SemanticCache: SemanticCacheEntry.response_text
    else No DB match
      SemanticCache->>SemanticCache: prune expired in-memory entries
      SemanticCache->>SemanticCache: cosine similarity search
    end
  else No DB session
    SemanticCache->>SemanticCache: prune expired in-memory entries
    SemanticCache->>SemanticCache: cosine similarity search
  end
  
  alt Cache hit and TTL enabled
    SemanticCache-->>AsyncCompletions: cached response_text
    alt Stream requested
      AsyncCompletions-->>Caller: async generator of ChatCompletionChunk
    else Non-stream
      AsyncCompletions-->>Caller: synthetic ChatCompletion
    end
  else Cache miss
    AsyncCompletions->>OpenAI: original create(messages, ...)
    OpenAI-->>AsyncCompletions: ChatCompletion response
    AsyncCompletions->>SemanticCache: set(prompt, response_text, framework=openai)
    alt DB session configured
      SemanticCache->>DB: INSERT SemanticCacheEntry (hashed prompt, embedding, response, framework, created_at)
      DB-->>SemanticCache: commit
    else In-memory storage
      SemanticCache->>SemanticCache: store CacheEntry with embedding
    end
    AsyncCompletions-->>Caller: real ChatCompletion
  end

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

  • sreerevanth/AgentWatch#412: Directly extends the same SemanticCache/CacheEntry implementation touched by that PR, adding TTL handling, DB persistence via SemanticCacheEntry, and OpenAI AsyncCompletions interception on top of its in-memory foundation.

Poem

🐇 Hop, hop—a cache so bright,
No more asking OpenAI twice tonight!
With pgvector humming and TTLs set,
Semantic savings we'll never forget.
Each prompt hashed, each response stored—
The most economical bunny reward! 🥕✨

🚥 Pre-merge checks | ✅ 3 | ❌ 2

❌ Failed checks (1 warning, 1 inconclusive)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 22.64% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |
| Out of Scope Changes check | ❓ Inconclusive | Changes to agentwatch/cli/demo.py update help text formatting from quoted to backtick placeholders, which is outside the semantic caching feature scope and may need justification. | Clarify whether the demo.py formatting changes are intentional or should be addressed in a separate PR focused on documentation improvements. |

✅ Passed checks (3 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title 'feat(cost): semantic caching engine with pgvector integration' accurately reflects the main objective of introducing a semantic caching engine with database integration. |
| Linked Issues check | ✅ Passed | All key technical requirements from issue #396 are met: vector store integration with pgvector, configurable similarity thresholding and TTL, OpenAI adapter interception, and multiple acceptance criteria are satisfied by the implementation. |

@coderabbitai coderabbitai Bot left a comment

Contributor

Actionable comments posted: 8

🧹 Nitpick comments (2)
tests/test_caching.py (2)

95-203: ⚡ Quick win

Add interception regression tests for malformed TTL and streaming mode.

Current suite doesn’t assert behavior for invalid AGENTWATCH_CACHE_TTL_DAYS / cache_ttl_days values or stream=True, which are high-risk paths in the wrapper.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/test_caching.py` around lines 95 - 203, Add two new test functions to
cover high-risk paths not currently tested. First, create a test for malformed
TTL values by testing invalid inputs for both the AGENTWATCH_CACHE_TTL_DAYS
environment variable (e.g., non-numeric strings) and the cache_ttl_days override
in extra_body metadata to ensure the semantic cache gracefully handles these
invalid values. Second, create a test for streaming mode by calling
client.chat.completions.create with stream=True parameter and verify that the
cache behaves correctly when streaming is enabled, checking whether streaming
responses are properly cached or handled appropriately. Both tests should follow
the same pattern as test_semantic_cache_manager_interception and
test_semantic_cache_manager_config_override by mocking the embedding provider,
using patch_openai and unpatch_openai context, and asserting expected behavior.

205-241: ⚡ Quick win

Add a DB-path negative test that enforces similarity threshold.

This test currently mocks a DB hit unconditionally; it won’t catch regressions where low-similarity nearest neighbors are incorrectly returned.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/test_caching.py` around lines 205 - 241, The
test_semantic_cache_manager_db_backend function currently mocks unconditional
database hits without verifying that the similarity threshold is enforced. Add a
negative test case after the existing assertions that sets up a second
mock_entry with embedding vectors that produce low similarity (e.g., orthogonal
vectors like [1.0, 0.0] vs [0.0, 1.0]) and verifies that when cache.get() is
called with a query that has low similarity to the cached entry, it returns None
or no hit instead of the cached response, confirming the similarity threshold
blocks incorrect matches.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@agentwatch/adapters/interception.py`:
- Around line 79-83: The await semantic_cache.set() call in the response
handling block is not protected from exceptions, which means any failure in the
cache population operation will raise and mask the successful upstream response.
Wrap the await semantic_cache.set() call in a try-except block to catch and
handle any exceptions gracefully, ensuring that cache failures do not prevent
the successful response from being returned to the caller.
- Around line 28-37: The code directly converts untrusted values using int()
without error handling, which can raise ValueError and crash request handling.
Add try-except blocks around both the int(global_ttl_env) conversion on line 29
and the int(override_ttl) conversion on line 37 to catch ValueError exceptions.
For each conversion, handle the exception gracefully by either logging the
invalid value, falling back to a default TTL, or skipping the override, ensuring
the system continues operating safely with malformed input.
- Around line 45-50: The current code mutates the shared
`semantic_cache.ttl_days` instance variable across an await point, which causes
concurrent requests to interfere with each other's TTL settings. Remove the
lines that save the original ttl_days value and restore it after the await in
the semantic_cache.get call. Instead, modify the SemanticCache.get method
signature to accept an optional ttl_days_override parameter, and pass the ttl
value as an argument to that method rather than mutating the instance state
directly. This way, the TTL override is scoped to the specific request without
affecting shared state.
- Around line 40-74: The code currently returns a ChatCompletion object on all
cache hits, but this breaks the API contract when stream=True is passed in
kwargs. Before returning the cached_response in the semantic cache hit block,
check if kwargs.get("stream") is True. If streaming is enabled, construct and
return an AsyncStream containing ChatCompletionChunk objects with the
appropriate delta fields instead of the ChatCompletion object. Only return the
ChatCompletion object when streaming is disabled or not specified.
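
The streaming point in the last item can be illustrated with a small sketch: on a cache hit with stream=True, the caller should receive something it can iterate with `async for`, where each item carries a `delta`. The `Delta`, `Choice`, and `Chunk` dataclasses below are simplified hypothetical stand-ins for the SDK's chunk types, and `respond` and `stream_cached` are made-up names; this is not the adapter's actual code.

```python
import asyncio
from dataclasses import dataclass
from typing import Optional


@dataclass
class Delta:  # hypothetical, mirrors the shape of a streamed delta
    content: Optional[str] = None


@dataclass
class Choice:
    delta: Delta
    finish_reason: Optional[str] = None


@dataclass
class Chunk:
    choices: list


async def stream_cached(text, piece_size=16):
    """Yield the cached text as content chunks, then a final 'stop' chunk."""
    for start in range(0, len(text), piece_size):
        yield Chunk([Choice(Delta(text[start:start + piece_size]))])
    yield Chunk([Choice(Delta(None), finish_reason="stop")])


async def respond(cached_text, stream=False):
    """Return an async iterator when streaming was requested, else the full text."""
    if stream:
        return stream_cached(cached_text)
    return cached_text
```

A caller that wrote `async for chunk in await create(..., stream=True)` against the real API keeps working unchanged against a response shaped like this.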

In `@agentwatch/cost/caching.py`:
- Around line 23-35: The cache dictionary (self._cache) is keyed only by
prompt_hash in both the storage operation where CacheHit is assigned and the
retrieval operation in the search method, which causes identical prompts from
different frameworks to overwrite each other. Modify the cache key generation to
include both the prompt_hash and the framework parameter so that the same prompt
with different frameworks are stored as separate cache entries. Update the key
construction logic in the cache assignment (around the CacheHit instantiation)
and in the search method's cache retrieval call to use a composite key that
combines prompt_hash and framework.
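
A composite key of the kind requested here might look like the sketch below. `cache_key` is a hypothetical helper and "anthropic" is just an example second framework name; the point is only that the framework is part of the dictionary key.

```python
import hashlib


def cache_key(prompt, framework):
    """Key entries by (framework, SHA-256 of prompt) so frameworks never collide."""
    prompt_hash = hashlib.sha256(prompt.encode("utf-8")).hexdigest()
    return (framework, prompt_hash)


store = {}
store[cache_key("hi", "openai")] = "from openai"
store[cache_key("hi", "anthropic")] = "from anthropic"  # same prompt, separate entry
```

Using a tuple rather than a concatenated string avoids ambiguity if a framework name ever contains the separator character.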

In `@agentwatch/cost/semantic_cache.py`:
- Around line 79-85: The return statement in the semantic cache lookup is
returning best_match_db.response_text without verifying that the match meets the
similarity threshold. Add a check after retrieving best_match_db to calculate
the cosine distance between the query_vec and the best match's prompt_vector,
then only return best_match_db.response_text if the distance satisfies the
threshold condition (distance <= 1 - similarity_threshold). If the threshold is
not met, allow the function to continue or return None to indicate no suitable
match was found.
- Around line 131-140: The database commit operation in the SemanticCacheEntry
creation block can raise exceptions and cause caller failures even when the
upstream operation succeeds. Wrap the self.db_session.add(db_entry) and await
self.db_session.commit() calls in a try-except block that logs any errors
without re-raising them, ensuring cache persistence failures do not propagate to
callers and break successful operations.
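
The threshold condition in the first item relies on cosine distance being one minus cosine similarity, so "similarity at least t" is the same test as "distance at most 1 - t". A pure-Python sketch of that check follows; `cosine_distance`, `accept`, and the 0.9 default are illustrative assumptions, and a zero-length vector is treated as maximally distant by choice.

```python
import math


def cosine_distance(a, b):
    """1 - cosine similarity; returns 1.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    if not norm:
        return 1.0
    return 1.0 - dot / norm


def accept(query_vec, candidate_vec, similarity_threshold=0.9):
    """Accept the nearest neighbour only if it is close enough to the query."""
    return cosine_distance(query_vec, candidate_vec) <= 1.0 - similarity_threshold
```

Ordering rows by distance finds the nearest neighbour, but only a check like `accept` stops an unrelated nearest neighbour from being returned as a hit.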

In `@pyproject.toml`:
- Line 11: The requires-python setting in pyproject.toml specifies >=3.10, but
the code uses datetime.UTC in agentwatch/cost/semantic_cache.py (lines 25, 67,
93) which is only available in Python 3.11+. Either update the requires-python
constraint to >=3.11 to match the actual minimum version required by the
codebase, or replace all occurrences of datetime.UTC with datetime.timezone.utc
throughout semantic_cache.py, which is compatible with Python 3.10.
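
For the second option, `datetime.timezone.utc` is the same UTC object that `datetime.UTC` aliases on Python 3.11+, so a TTL cutoff can be written in a way that also runs on 3.10. The `ttl_cutoff` helper below is a hypothetical illustration, not a function from the PR.

```python
from datetime import datetime, timedelta, timezone


def ttl_cutoff(ttl_days, now=None):
    """Entries created before the returned timestamp count as expired."""
    if now is None:
        now = datetime.now(timezone.utc)  # works on Python 3.10, unlike datetime.UTC
    return now - timedelta(days=ttl_days)
```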

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: b5c8e4c1-4e8d-4146-b7d4-e1526c4e76e0

📥 Commits

Reviewing files that changed from the base of the PR and between 3b1f4b5 and 383c9dd.

📒 Files selected for processing (6)
  • agentwatch/adapters/interception.py
  • agentwatch/cost/caching.py
  • agentwatch/cost/semantic_cache.py
  • agentwatch/models/cache.py
  • pyproject.toml
  • tests/test_caching.py

Comment thread agentwatch/adapters/interception.py Outdated
Comment thread agentwatch/adapters/interception.py
Comment thread agentwatch/adapters/interception.py Outdated
Comment thread agentwatch/adapters/interception.py Outdated
Comment thread agentwatch/cost/caching.py Outdated
Comment thread agentwatch/cost/semantic_cache.py
Comment thread agentwatch/cost/semantic_cache.py
Comment thread pyproject.toml Outdated
@Mohammedsami001

Author

Hi @sreerevanth 👋

I've pushed a commit addressing the CodeRabbit feedback. The PR is now ready to merge!

Fixes include:

  • Robust Parsing: Guarded TTL parsing with try-except blocks.
  • State Mutation: Passed ttl_days_override directly to avoid mutating shared state across await.
  • Cache Isolation: Namespaced cache keys by framework to prevent cross-provider collisions.
  • Streaming Support: Returns an AsyncStream of chunks on cache hits when stream=True to maintain the API contract.
  • Version targeting: Reverted the pyproject.toml bump to correctly stay on Python 3.12.

Tests are passing locally. Let me know if anything else is needed! 🚀

@github-actions

github-actions Bot commented Jun 20, 2026

Contributor

🧪 PR Test Results

| Check | Result |
| --- | --- |
| Tests (pytest tests/) | ✅ success |
| Lint (ruff check .) | ❌ failure |
| Coverage (agentwatch) | 74.20% |

Python 3.12 · commit 4609575

@coderabbitai coderabbitai Bot left a comment

Contributor

🧹 Nitpick comments (2)
tests/test_caching.py (2)

382-429: ⚡ Quick win

Add assertions to verify streaming miss behavior.

The test successfully exercises the streaming cache-miss code path but doesn't include any assertions about the collected chunks (lines 423-425). Without verification, the test won't catch regressions in chunk structure, count, or content propagation from the network.

🧪 Suggested assertions to validate streaming behavior
     chunks = []
     async for chunk in gen:
         chunks.append(chunk)
+
+    # Verify network chunks are properly returned
+    assert len(chunks) > 0
+    assert chunks[0].choices[0].message.content == "chunk1"
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/test_caching.py` around lines 382 - 429, The test collects chunks from
the streaming response into the chunks variable but does not include any
assertions to verify the collected data. After the async for loop that collects
chunks, add assertions to validate that the chunks list is not empty, contains
the expected chunk count, and that the chunk structure and content match what
was mocked in the mock_network function (verify the MockChunk objects contain
the correct content like "chunk1"). This will ensure regressions in chunk
propagation are caught.

34-61: ⚡ Quick win

Consider testing true semantic similarity rather than exact embedding match.

The mock on line 43 returns identical embeddings [1.0, 0.0] for both prompts since they both contain "reverse a string". This results in cosine similarity of 1.0 (exact match) rather than a realistic semantic similarity value like 0.96. While the test correctly exercises the semantic matching code path, it doesn't validate the fuzzy matching behavior the test name implies.

♻️ Suggested enhancement for realistic semantic similarity testing
 async def mock_embed(self, texts):
-    return [[1.0, 0.0] if "reverse a string" in t else [0.0, 1.0] for t in texts]
+    # Return similar but distinct embeddings to test fuzzy matching
+    return [[1.0, 0.0] if "Can you write" in t else [0.98, 0.2] for t in texts]

This would produce embeddings with cosine similarity ≈ 0.98, genuinely testing the semantic threshold logic.
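
Working the cosine similarity of the two vectors in the suggested mock, [1.0, 0.0] and [0.98, 0.2], out by hand gives a value just under 0.98, which is above the 0.90 threshold but no longer an exact match:

```latex
\cos\theta
  = \frac{(1.0)(0.98) + (0.0)(0.2)}
         {\sqrt{1.0^2 + 0.0^2}\,\sqrt{0.98^2 + 0.2^2}}
  = \frac{0.98}{\sqrt{0.9604 + 0.04}}
  = \frac{0.98}{\sqrt{1.0004}}
  \approx 0.9798
```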

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/test_caching.py` around lines 34 - 61, The mock_embed function in
test_semantic_cache_manager_semantic_match currently returns identical
embeddings [1.0, 0.0] for both the original and altered prompts since they both
contain "reverse a string", resulting in a cosine similarity of 1.0 (exact
match). Modify the mock_embed function to return different but semantically
similar embeddings for the two prompts such that their cosine similarity is
approximately 0.96, which will properly test the fuzzy matching behavior against
the similarity_threshold of 0.90 rather than testing an exact embedding match.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: 95eb1102-c239-404c-84b9-5662cbf35eb9

📥 Commits

Reviewing files that changed from the base of the PR and between 383c9dd and 4da6c02.

📒 Files selected for processing (7)
  • agentwatch/adapters/interception.py
  • agentwatch/cli/demo.py
  • agentwatch/cost/caching.py
  • agentwatch/cost/semantic_cache.py
  • agentwatch/models/cache.py
  • pyproject.toml
  • tests/test_caching.py
✅ Files skipped from review due to trivial changes (1)
  • agentwatch/cli/demo.py
🚧 Files skipped from review as they are similar to previous changes (2)
  • agentwatch/models/cache.py
  • agentwatch/adapters/interception.py

Development

Successfully merging this pull request may close these issues.

[Premium] CST-005: Semantic Caching Engine

2 participants