
Commit a4636dd

uzunenes and cursoragent committed

feat: provider_cost_avoided_micro in Image/VideoResult; cache scope in API docs

- ImageResult and VideoResult: add provider_cost_avoided_micro (micro-USD saved on cache hit)
- docs/API.md: document cache scope (org default, global), TTL=0, provider_cost_avoided_micro
- examples 07_cache_demo, 08_semantic_cache_demo: print provider_cost_avoided_micro
- examples/README: cache section with org-scope and new field

Co-authored-by: Cursor <cursoragent@cursor.com>

1 parent 8b6be40 commit a4636dd

6 files changed

Lines changed: 190 additions & 3 deletions

docs/API.md

Lines changed: 37 additions & 1 deletion
@@ -2,6 +2,29 @@
 
 Short reference. Full OpenAPI spec: your API base URL + `/docs`.
 
+## Testing against the live API
+
+Use the **visgate-python** SDK and examples to hit the live API:
+
+```bash
+pip install visgate-sdk
+export VISGATE_API_KEY=vg-...
+
+# Health and models (minimal check)
+python examples/01_live_api_smoke.py
+
+# Exact cache: identical requests; second = cache hit
+python examples/07_cache_demo.py
+
+# Semantic cache: similar prompts; second may = cache hit (Vertex AI + Firestore)
+python examples/08_semantic_cache_demo.py
+
+# Run all capability examples
+python examples/run_all_capabilities.py
+```
+
+Base URL defaults to `https://visgateai.com/api/v1`. Override with `VISGATE_BASE_URL` for staging or local.
+
 ## Installation
 
 ```bash
@@ -125,6 +148,7 @@ result = client.images.generate(
     num_images=1,         # Optional, default 1
     seed=None,            # Optional, for reproducibility
     params=None,          # Optional, additional model-specific parameters
+    include_steps=False,  # Optional; if True the response includes step timing (cache/provider/storage)
 )
 ```
 
@@ -137,6 +161,7 @@ result = await client.images.generate(
     width=1024,
     height=1024,
     num_images=1,
+    include_steps=False,  # Set True to get step timing in result.steps
 )
 ```
 
@@ -147,9 +172,19 @@ result = await client.images.generate(
 - `model` (str): Model identifier used
 - `provider` (str): Provider name
 - `cost` (float): Cost in USD
-- `cache_hit` (bool): Whether the result was served from cache
+- `cache_hit` (bool): Whether the result was served from cache (the same model + prompt + size returns the cached result with lower latency).
+- `provider_cost_avoided_micro` (int | None): When `cache_hit` is true, the provider cost avoided, in micro-USD (1e-6 USD). Omitted on cache miss.
 - `latency_ms` (int | None): Request latency in milliseconds
 - `created_at` (datetime): Creation timestamp
+- `output_storage` (str | None): Host/domain where the output is stored (e.g. `storage.googleapis.com` or a provider CDN). Present when the API returns it.
+- `output_size_bytes` (int | None): Size of the primary output in bytes, when available.
+- `steps` (list | None): Per-step timing and metadata (e.g. cache lookup, provider call, storage). Present only when `include_steps=True` was passed.
+
+## Cache
+
+- **Scope:** The cache is **org-scoped** by default (`VISGATE_CACHE_SCOPE=org`). Keys include the organization ID, so different orgs never share entries. Set `VISGATE_CACHE_SCOPE=global` to share the cache across organizations. TTL is configurable; `VISGATE_CACHE_TTL_SECONDS=0` means entries never expire.
+- **Exact cache:** The same model + prompt + size maps to the same cache key. A second identical request is served from cache with lower latency; the response includes `cache_hit: true` and `provider_cost_avoided_micro` (cost saved, in micro-USD). Example: `examples/07_cache_demo.py`.
+- **Semantic cache:** Same meaning, different wording. The API matches prompts using Vertex AI embeddings and Firestore; when similarity exceeds the threshold, the result is served from cache and the provider is not called. The response includes `cache_hit: true` and `provider_cost_avoided_micro`. Results from different models can be reused (there is no model filter). Example: `examples/08_semantic_cache_demo.py`.
 
 ## Videos Resource
 
@@ -185,6 +220,7 @@ result = await client.videos.generate(
 - `provider` (str): Provider name
 - `cost` (float): Cost in USD
 - `cache_hit` (bool): Whether the result was served from cache
+- `provider_cost_avoided_micro` (int | None): When `cache_hit` is true, the provider cost avoided, in micro-USD (1e-6 USD). Omitted on cache miss.
 - `latency_ms` (int | None): Request latency in milliseconds
 - `created_at` (datetime): Creation timestamp
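The micro-USD fields documented above are integers in units of 1e-6 USD. A minimal conversion sketch for client code (the helper name is ours, not part of the SDK):

```python
from typing import Optional


def micro_usd_to_usd(micro: Optional[int]) -> float:
    """Convert a micro-USD amount (1e-6 USD) to USD; treat None (cache miss) as 0."""
    return (micro or 0) / 1_000_000


# A cache hit that avoided a $0.003 provider call reports 3000 micro-USD.
print(micro_usd_to_usd(3000))  # 0.003
print(micro_usd_to_usd(None))  # 0.0 (field omitted on cache miss)
```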

examples/07_cache_demo.py

Lines changed: 58 additions & 0 deletions
@@ -0,0 +1,58 @@
+#!/usr/bin/env python3
+"""Cache demo: two identical image requests — the second should be a cache hit.
+
+Run against the live API. The first request fills the cache; the second returns
+from cache (cache_hit=True, lower latency).
+
+VISGATE_API_KEY=vg-... python examples/07_cache_demo.py
+"""
+from __future__ import annotations
+
+from _common import create_client
+
+
+def main() -> int:
+    prompt = "a red apple on a wooden table, studio lighting"
+    model = "fal-ai/flux/schnell"
+    width, height = 1024, 1024
+
+    with create_client() as client:
+        # 1) First request — cache miss
+        r1 = client.images.generate(
+            model=model,
+            prompt=prompt,
+            width=width,
+            height=height,
+            num_images=1,
+        )
+        print(f"Request 1: cache_hit={r1.cache_hit}, latency_ms={r1.latency_ms}, cost={r1.cost}")
+
+        # 2) Second request — same params, expect a cache hit
+        r2 = client.images.generate(
+            model=model,
+            prompt=prompt,
+            width=width,
+            height=height,
+            num_images=1,
+        )
+        print(
+            f"Request 2: cache_hit={r2.cache_hit}, latency_ms={r2.latency_ms}, cost={r2.cost}, "
+            f"provider_cost_avoided_micro={r2.provider_cost_avoided_micro}"
+        )
+
+        if r2.cache_hit and r2.latency_ms is not None and r1.latency_ms is not None:
+            if r2.latency_ms < r1.latency_ms:
+                print("OK: Second request was faster (cache hit).")
+            else:
+                print("OK: Second request was a cache hit (latency may vary).")
+        elif r2.cache_hit:
+            print("OK: Second request was a cache hit.")
+        else:
+            print("Note: Second request was not a cache hit (TTL or key may differ).")
+
+    return 0
+
+
+if __name__ == "__main__":
+    import sys
+    sys.exit(main())
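The exact-cache behavior this demo relies on can be pictured as hashing the request parameters, plus the organization ID, into a deterministic key. This is only an illustration of the idea; the real key derivation happens server-side and is not specified in this commit:

```python
import hashlib
import json


def exact_cache_key(org_id: str, model: str, prompt: str, width: int, height: int) -> str:
    """Illustrative org-scoped exact-cache key: identical inputs yield identical keys."""
    # Canonical JSON (sorted keys) so field ordering cannot change the hash.
    payload = json.dumps(
        {"org": org_id, "model": model, "prompt": prompt, "w": width, "h": height},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


k1 = exact_cache_key("org-1", "fal-ai/flux/schnell", "a red apple", 1024, 1024)
k2 = exact_cache_key("org-1", "fal-ai/flux/schnell", "a red apple", 1024, 1024)
k3 = exact_cache_key("org-2", "fal-ai/flux/schnell", "a red apple", 1024, 1024)
print(k1 == k2)  # True: identical request -> same key -> cache hit
print(k1 == k3)  # False: org-scoped keys differ across organizations
```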

examples/08_semantic_cache_demo.py

Lines changed: 61 additions & 0 deletions
@@ -0,0 +1,61 @@
+#!/usr/bin/env python3
+"""Semantic cache demo: similar (not identical) prompts — the second may be a cache hit.
+
+The API uses Vertex AI embeddings and Firestore to match semantically similar
+prompts. When a match is found above the similarity threshold, the result is
+returned from cache without calling the provider, reducing cost significantly.
+
+Run against the live API. The first request fills the cache; the second uses
+different wording with the same meaning — it may return cache_hit=True if the
+API has semantic search enabled and embeddings are available.
+
+VISGATE_API_KEY=vg-... python examples/08_semantic_cache_demo.py
+"""
+from __future__ import annotations
+
+from _common import create_client
+
+
+def main() -> int:
+    prompt1 = "a red apple on a wooden table, studio lighting"
+    prompt2 = "red apple on wooden table with studio lights"
+    model = "fal-ai/flux/schnell"
+    width, height = 1024, 1024
+
+    with create_client() as client:
+        # 1) First request — cache miss, result and embedding stored
+        r1 = client.images.generate(
+            model=model,
+            prompt=prompt1,
+            width=width,
+            height=height,
+            num_images=1,
+            include_steps=True,
+        )
+        print(f"Request 1 (exact): cache_hit={r1.cache_hit}, latency_ms={r1.latency_ms}, cost={r1.cost}")
+
+        # 2) Second request — semantically similar prompt; may hit the semantic cache
+        r2 = client.images.generate(
+            model=model,
+            prompt=prompt2,
+            width=width,
+            height=height,
+            num_images=1,
+            include_steps=True,
+        )
+        print(
+            f"Request 2 (similar): cache_hit={r2.cache_hit}, latency_ms={r2.latency_ms}, cost={r2.cost}, "
+            f"provider_cost_avoided_micro={r2.provider_cost_avoided_micro}"
+        )
+
+        if r2.cache_hit:
+            print("OK: Second request was a cache hit (semantic match). Provider cost avoided.")
+        else:
+            print("Note: Second request was not a cache hit (semantic search may need embeddings/Vertex AI enabled).")
+
+    return 0
+
+
+if __name__ == "__main__":
+    import sys
+    sys.exit(main())
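The semantic match this demo hopes for reduces to comparing prompt embeddings against a similarity threshold. A toy illustration with hand-made vectors (the real service uses Vertex AI embeddings; the threshold value here is invented for the sketch):

```python
import math


def cosine_similarity(a: list, b: list) -> float:
    """Cosine similarity between two equal-length embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)


THRESHOLD = 0.9  # illustrative only; the API's actual threshold is not documented in this commit

cached = [0.8, 0.6, 0.0]       # embedding stored when prompt1 was first generated
incoming = [0.79, 0.61, 0.02]  # embedding of the reworded prompt2

sim = cosine_similarity(cached, incoming)
if sim >= THRESHOLD:
    print(f"semantic cache hit (similarity={sim:.3f}); provider call skipped")
else:
    print(f"cache miss (similarity={sim:.3f}); provider called")
```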

examples/README.md

Lines changed: 8 additions & 0 deletions
@@ -29,6 +29,8 @@ export VISGATE_RUNWAY_API_KEY="..."
 | `04_videos_all_providers.py` | Video generation | Yes |
 | `05_usage_history_verify.py` | Usage, logs, dashboard | Yes |
 | `06_provider_balances.py` | Provider balance and limits | Yes |
+| `07_cache_demo.py` | Exact cache: two identical image requests, second = cache hit | Yes |
+| `08_semantic_cache_demo.py` | Semantic cache: similar prompt, cache hit, lower cost (Vertex AI + Firestore) | Yes |
 
 ## Run All
 
@@ -37,3 +39,9 @@ VISGATE_API_KEY=vg-... python examples/run_all_capabilities.py
 ```
 
 The first two steps run without an API key. `VISGATE_API_KEY` is required from step 3 onward.
+
+## Testing cache
+
+- **Exact cache:** Same model + prompt + size. The second request returns from cache; the response includes `cache_hit=True` and `provider_cost_avoided_micro` (cost saved, in micro-USD). Run `python examples/07_cache_demo.py`.
+- **Semantic cache:** Similar but different wording; the API matches via Vertex AI embeddings + Firestore. The second request may return `cache_hit=True` and `provider_cost_avoided_micro`. Run `python examples/08_semantic_cache_demo.py`.
+- The cache is org-scoped by default; see the API docs for `VISGATE_CACHE_SCOPE` and TTL.

src/visgate_sdk/resources/images.py

Lines changed: 23 additions & 2 deletions
@@ -23,8 +23,12 @@ class ImageResult:
         provider: Provider name (e.g. ``"fal"``).
         cost: Cost in USD.
         cache_hit: Whether the result was served from cache.
+        provider_cost_avoided_micro: When cache_hit is True, provider cost avoided in micro-USD (1e-6 USD).
         latency_ms: Server-side latency in milliseconds.
         created_at: Timestamp of the request.
+        output_storage: Host/domain where the output is stored (e.g. a provider CDN). Present when the API returns it.
+        output_size_bytes: Size of the primary output in bytes, when available.
+        steps: Per-step timing/metadata (cache, provider, storage). Present when include_steps=True.
     """
 
     id: str
@@ -33,8 +37,12 @@ class ImageResult:
     provider: str
     cost: float
     cache_hit: bool = False
+    provider_cost_avoided_micro: Optional[int] = None
     latency_ms: Optional[int] = None
     created_at: Optional[datetime] = None
+    output_storage: Optional[str] = None
+    output_size_bytes: Optional[int] = None
+    steps: Optional[List[Dict[str, Any]]] = None
 
     @classmethod
     def from_dict(cls, data: Dict[str, Any]) -> ImageResult:
@@ -45,8 +53,12 @@ def from_dict(cls, data: Dict[str, Any]) -> ImageResult:
             provider=data["provider"],
             cost=data.get("cost", 0.0),
             cache_hit=data.get("cache_hit", False),
+            provider_cost_avoided_micro=data.get("provider_cost_avoided_micro"),
             latency_ms=data.get("latency_ms"),
             created_at=parse_datetime(data.get("created_at")),
+            output_storage=data.get("output_storage"),
+            output_size_bytes=data.get("output_size_bytes"),
+            steps=data.get("steps"),
         )
 
     def __repr__(self) -> str:
@@ -73,6 +85,7 @@ def generate(
         num_images: int = 1,
         seed: Optional[int] = None,
         params: Optional[Dict[str, Any]] = None,
+        include_steps: bool = False,
     ) -> ImageResult:
         """Generate image(s).
@@ -85,6 +98,7 @@ def generate(
             num_images: Number of images to generate. Defaults to 1.
             seed: Random seed for reproducibility.
             params: Additional model-specific parameters.
+            include_steps: If True, the response includes step timing (cache/provider/storage) in result.steps.
 
         Returns:
             ImageResult with generated image URLs and metadata.
@@ -103,7 +117,10 @@ def generate(
         if params:
             payload.update(params)
 
-        data = self._client._request("POST", "/images/generate", json=payload)
+        query_params = {"include_steps": str(include_steps).lower()} if include_steps else None
+        data = self._client._request(
+            "POST", "/images/generate", json=payload, params=query_params
+        )
         return ImageResult.from_dict(data)
 
 
@@ -124,6 +141,7 @@ async def generate(
         num_images: int = 1,
         seed: Optional[int] = None,
         params: Optional[Dict[str, Any]] = None,
+        include_steps: bool = False,
     ) -> ImageResult:
         """Generate image(s). See :meth:`Images.generate` for details."""
         payload: Dict[str, Any] = {
@@ -140,5 +158,8 @@ async def generate(
         if params:
             payload.update(params)
 
-        data = await self._client._request("POST", "/images/generate", json=payload)
+        query_params = {"include_steps": str(include_steps).lower()} if include_steps else None
+        data = await self._client._request(
+            "POST", "/images/generate", json=payload, params=query_params
+        )
         return ImageResult.from_dict(data)
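The `data.get(...)` pattern in `from_dict` is what keeps the new fields backward compatible: a response from an older server simply lacks the keys and the fields fall back to their defaults. A standalone mirror of that pattern for the two cache fields (the class here is ours, for illustration only):

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional


@dataclass
class CacheInfo:
    """Toy mirror of the new cache fields, showing the tolerant-parsing pattern."""

    cache_hit: bool = False
    provider_cost_avoided_micro: Optional[int] = None

    @classmethod
    def from_dict(cls, data: Dict[str, Any]) -> "CacheInfo":
        # .get() with defaults: missing keys (old servers, cache misses) never raise.
        return cls(
            cache_hit=data.get("cache_hit", False),
            provider_cost_avoided_micro=data.get("provider_cost_avoided_micro"),
        )


hit = CacheInfo.from_dict({"cache_hit": True, "provider_cost_avoided_micro": 2500})
miss = CacheInfo.from_dict({})  # field omitted on cache miss
print(hit.provider_cost_avoided_micro)   # 2500
print(miss.provider_cost_avoided_micro)  # None
```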

src/visgate_sdk/resources/videos.py

Lines changed: 3 additions & 0 deletions
@@ -23,6 +23,7 @@ class VideoResult:
         provider: Provider name (e.g. ``"runway"``).
         cost: Cost in USD.
         cache_hit: Whether the result was served from cache.
+        provider_cost_avoided_micro: When cache_hit is True, provider cost avoided in micro-USD (1e-6 USD).
         latency_ms: Server-side latency in milliseconds.
         created_at: Timestamp of the request.
     """
@@ -33,6 +34,7 @@ class VideoResult:
     provider: str
     cost: float
     cache_hit: bool = False
+    provider_cost_avoided_micro: Optional[int] = None
     latency_ms: Optional[int] = None
     created_at: Optional[datetime] = None
 
@@ -45,6 +47,7 @@ def from_dict(cls, data: Dict[str, Any]) -> VideoResult:
             provider=data["provider"],
             cost=data.get("cost", 0.0),
             cache_hit=data.get("cache_hit", False),
+            provider_cost_avoided_micro=data.get("provider_cost_avoided_micro"),
             latency_ms=data.get("latency_ms"),
             created_at=parse_datetime(data.get("created_at")),
         )
