include_steps=False, # Optional, if True response includes step timing (cache/provider/storage)
 )
 ```
 
@@ -137,6 +161,7 @@ result = await client.images.generate(
     width=1024,
     height=1024,
     num_images=1,
+    include_steps=False,  # Set True to get step timing in result.steps
 )
 ```
 
@@ -147,9 +172,19 @@ result = await client.images.generate(
 - `model` (str): Model identifier used
 - `provider` (str): Provider name
 - `cost` (float): Cost in USD
-- `cache_hit` (bool): Whether the result was served from cache
+- `cache_hit` (bool): Whether the result was served from cache (same model+prompt+size returns cached result with lower latency).
+- `provider_cost_avoided_micro` (int | None): When `cache_hit` is true, provider cost avoided in micro-USD (1e-6 USD). Omitted on cache miss.
 - `latency_ms` (int | None): Request latency in milliseconds
 - `created_at` (datetime): Creation timestamp
+- `output_storage` (str | None): Host/domain where the output is stored (e.g. `storage.googleapis.com` or provider CDN). Present when the API returns it.
+- `output_size_bytes` (int | None): Size of the primary output in bytes, when available.
+- `steps` (list | None): Per-step timing and metadata (e.g. cache lookup, provider call, storage). Only present when `include_steps=True` was passed.
+
+## Cache
+
+- **Scope:** Cache is **org-scoped** by default (`VISGATE_CACHE_SCOPE=org`). Keys include organization ID so different orgs do not share cache. Set `VISGATE_CACHE_SCOPE=global` to share cache across organizations. TTL is configurable; `VISGATE_CACHE_TTL_SECONDS=0` means never expire.
+- **Exact cache:** Same model + prompt + size → same cache key. Second request returns from cache (lower latency); response includes `cache_hit: true` and `provider_cost_avoided_micro` (cost saved in micro-USD). Example: `examples/07_cache_demo.py`.
+- **Semantic cache:** Similar wording, different text. The API uses Vertex AI embeddings and Firestore to match prompts; when similarity is above threshold, the result is served from cache and the provider is not called. Response includes `cache_hit: true` and `provider_cost_avoided_micro`. Different models’ results can be reused (no model filter). Example: `examples/08_semantic_cache_demo.py`.
 
 ## Videos Resource
 
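Reviewer sketch for the Cache section in the hunk above: one way the exact-cache key and the scope/TTL rules could fit together. This is not the service's actual implementation. The function names `cache_key` and `is_expired`, the SHA-256/JSON encoding, and the `org_id` parameter are all assumptions made for illustration; only the inputs (model, prompt, size, organization ID) and the "TTL of 0 never expires" rule come from the documented text.

```python
import hashlib
import json


def cache_key(model, prompt, width, height, org_id=None, scope="org"):
    """Hypothetical exact-cache key: same model + prompt + size -> same key."""
    parts = {"model": model, "prompt": prompt, "size": f"{width}x{height}"}
    if scope == "org":
        # Org-scoped (the documented default): the organization ID is part
        # of the key, so different orgs never share an entry.
        parts["org"] = org_id
    # sort_keys makes the serialization, and therefore the hash, deterministic.
    payload = json.dumps(parts, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def is_expired(stored_at, now, ttl_seconds):
    """A TTL of 0 means 'never expire', mirroring VISGATE_CACHE_TTL_SECONDS=0."""
    return ttl_seconds > 0 and (now - stored_at) >= ttl_seconds
```

With `scope="global"` the organization ID is left out of the key, which is what lets two organizations hit the same entry.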
@@ -185,6 +220,7 @@ result = await client.videos.generate(
 - `provider` (str): Provider name
 - `cost` (float): Cost in USD
 - `cache_hit` (bool): Whether the result was served from cache
+- `provider_cost_avoided_micro` (int | None): When `cache_hit` is true, provider cost avoided in micro-USD (1e-6 USD). Omitted on cache miss.
 - `latency_ms` (int | None): Request latency in milliseconds
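Reviewer sketch for `provider_cost_avoided_micro`: since the field is documented as an integer in micro-USD (1e-6 USD) that is absent on a cache miss, callers will likely want a small conversion helper. The helper names below and the use of plain dicts to stand in for responses are assumptions for illustration, not part of the SDK.

```python
from decimal import Decimal


def micro_usd_to_usd(micro):
    """Convert micro-USD (1e-6 USD) to USD; None (cache miss) counts as zero."""
    if micro is None:
        return Decimal("0")
    # Decimal avoids binary floating-point rounding when summing many small costs.
    return Decimal(micro) / Decimal(1_000_000)


def total_avoided_usd(results):
    """Sum avoided provider cost over hypothetical dict-shaped responses."""
    return sum(
        (micro_usd_to_usd(r.get("provider_cost_avoided_micro")) for r in results),
        Decimal("0"),
    )
```

For example, a value of `2500` corresponds to 0.0025 USD.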
The first two steps run without an API key. `VISGATE_API_KEY` is required from step 3 onward.
+
+## Testing cache
+
+- **Exact cache:** Same model + prompt + size. Second request returns from cache; response includes `cache_hit=True` and `provider_cost_avoided_micro` (cost saved in micro-USD). Run `python examples/07_cache_demo.py`.
+- **Semantic cache:** Similar but different wording; API matches via Vertex AI embedding + Firestore. Second request may return `cache_hit=True` and `provider_cost_avoided_micro`. Run `python examples/08_semantic_cache_demo.py`.
+- Cache is org-scoped by default; see API docs for `VISGATE_CACHE_SCOPE` and TTL.
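Reviewer sketch of the "similarity above threshold" rule behind the semantic cache, to make clear why the second request only *may* hit. The diff does not say which similarity measure or threshold the API uses; cosine similarity, the `0.9` default, and the in-memory list of `(embedding, result)` pairs are assumptions standing in for the Vertex AI embeddings and Firestore lookup.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def semantic_lookup(query_vec, entries, threshold=0.9):
    """Return the most similar cached result strictly above threshold, else None.

    entries is a list of (embedding, result) pairs; the threshold is hypothetical.
    """
    best_score, best_result = threshold, None
    for vec, result in entries:
        score = cosine_similarity(query_vec, vec)
        if score > best_score:
            best_score, best_result = score, result
    return best_result
```

A `None` return corresponds to a cache miss, in which case the provider would be called as usual.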