A measured benchmark of how CData Connect AI features reduce Claude token usage on a multi-source enterprise query. Full methodology, reproducible code, and the locked numbers.
| Scenario | Total tokens | Reduction | $ / query |
|---|---|---|---|
| Raw baseline | 183,541 | -- | $0.596 |
| Custom Tools | 4,427 | 97.6% | $0.027 |
Full breakdown in output/results.md (regenerate via python render_final.py).
flowchart LR
U[Same user question]:::mono
C[Claude]:::mono
D["Discovery chain<br/>183K tokens · 22 calls"]:::tool
T["get_incidents Custom Tool<br/>4.4K tokens · 1 call"]:::tool
A1[Answer]:::mono
A2[Answer]:::mono
U --> C
C -->|"Raw — 11 universal tools"| D
D --> A1
C -.->|"Connect AI — 1 Custom Tool"| T
T --> A2
classDef mono fill:#FFFFFF,stroke:#002660,stroke-width:1.5px,color:#15151C;
classDef tool fill:#002660,stroke:#15151C,stroke-width:0.5px,color:#F1EEE9,font-weight:bold;
Same question, two paths. Raw baseline walks the universal-tool discovery chain (getCatalogs → getSchemas → getTables → getColumns → queryData → ...). With Connect AI, one scoped Custom Tool collapses that to a single call.
Static asset:
output/token-flow-simple.webp(for use in HTML / docs). Full elaborate hero diagram:output/token-flow.webp.
- Real multi-turn execution against live CData Connect AI MCP
POST /v1/messageson the Anthropic API per turn- Token counts read live from
usage.input_tokensandusage.output_tokens - 4 to 16 sequential runs per scenario (60 to 75s cooldowns between runs)
- Temperature=0, max_tokens=1024, max_turns=6 (10 for Raw baseline)
- 56 total locked runs across 8 scenarios
- Model:
claude-sonnet-4-6; pricing: $3.00 / MTok input, $15.00 / MTok output
The benchmark measures discovery cost compression. Connect AI's Custom Tools, Derived Views, and Workspaces work by replacing Claude's runtime schema discovery with admin-curated tool surfaces. A 97.6% reduction means the discovery work moved from Claude's runtime (input tokens) to the admin's design-time (one-time setup of the Custom Tool).
The reduction is real and reproducible, but its size depends on three factors that vary per deployment:
- Tool coverage. If your admin has built Custom Tools that match the queries your users actually send, you will see numbers in this range. Queries that fall outside the tool library hit Raw-like discovery costs.
- Query shape. A 3-source cross-system query (this benchmark) maximises the discovery surface that Connect AI compresses. A single-source lookup shows smaller wins because there is less discovery to skip in the first place.
- Caching scope. Connect AI's Caching feature is optimised for warm single-source reuse. The number you see for the Caching scenario here reflects the specific cross-source query and the relational-source constraint (Snowflake cannot be cached); single-source caching produces different (typically better) numbers.
The headline 97.6% is the measured best-case payoff when a Custom Tool is well-aligned to the query -- not an average across an unknown query distribution.
connectai-token-benchmark/
+-- README.md this file
+-- BRIEF.pdf original content brief (context)
+-- BENCHMARK-PROMPTS-AND-QUERIES.md every system prompt, user prompt, SQL
+-- LOCKED_RESULTS.py final published numbers (source of truth)
+-- render_final.py regenerate results.md + charts (no API calls)
+-- run_benchmark.py re-fire the benchmark from scratch
+-- config.py endpoints, pricing, query, model
+-- requirements.txt
+-- .env.example credentials + source-name template
+-- benchmark/ core package
| +-- mcp_client.py SSE-aware MCP JSON-RPC client
| +-- normalizer.py MCP to Anthropic tool schema converter
| +-- scenarios.py 8 scenario definitions
| +-- runner.py count_tokens + messages.create
| +-- pricing.py cost math (Sonnet 4.6 list price)
| +-- reporter.py JSON + CSV + Markdown + per-run ledger
| +-- charts.py matplotlib charts + Mermaid render
+-- seed/ data + per-source loading docs
| +-- accounts.csv, incidents.csv, opportunities.csv (50 rows each, BMK prefix)
| +-- derived_view.sql CREATE statement for the Derived View
| +-- cleanup.sql DELETE WHERE ... LIKE 'BMK%' templates
| +-- load-google-sheets.md easy mode: three Google Sheets connections
| +-- load-salesforce.md, load-servicenow.md, load-snowflake.md authentic mode
+-- output/
+-- results.md, .json, .csv regenerated by render_final.py
+-- run_ledger.jsonl one line per individual run (audit trail)
+-- chart-*.webp 4 brand-palette charts
+-- token-flow.webp/.png/.mmd elaborate hero diagram (full discovery chain)
+-- token-flow-simple.webp/.png/.mmd simple two-path illustration (README + KB inline)
+-- _fixtures.json cached MCP tool lists (offline replay)
Three ways to run this benchmark, in order of setup effort. All three use the same scenario set and code -- only the underlying data sources change.
| Path | Setup time | API spend | Reproduces |
|---|---|---|---|
| A. Offline render | 0 min | $0 | Locked headline numbers, charts, tables straight from LOCKED_RESULTS.py |
| B. Google Sheets (easy) | ~10-15 min | ~$6 (n=3 runs) | Methodology + reduction trend; absolute numbers land near locked (Raw baseline slightly lower) |
| C. Salesforce + ServiceNow + Snowflake (authentic) | ~45-60 min | ~$6 (n=3 runs) | Locked headline numbers exactly (within multi-turn variance) |
- Python 3.10+ and
pip install -r requirements.txt - An Anthropic API key (B / C only) -- ~$6 for
--runs 3, ~$18 for--runs 10 - A CData Connect AI account (B / C only) -- free trial
Regenerate the published charts and tables from the locked numbers. No API calls, no Connect AI account needed.
pip install -r requirements.txt
python render_final.pyOutput lands in output/results.md plus the four chart WebPs and the two
Mermaid flow diagrams.
Easiest live reproduction. Copy three public Google Sheets, create three Google Sheets connections in Connect AI, run.
-
Follow
seed/load-google-sheets.md-- it walks through the sheet copies, the three CData connections, and the Derived View create. -
Set
CDATA_ITSM_CATALOG,CDATA_CRM_CATALOG,CDATA_WH_CATALOGin.envto the connection names you chose (BMK_ITSM,BMK_CRM,BMK_WH). -
Run:
python run_benchmark.py --full --runs 3
Your numbers will land near the locked headline. The Raw baseline drops a bit (Google Sheets has a leaner schema than enterprise Salesforce / SN / Snowflake), so the 97.6% reduction will read as somewhere in the 93-97% range. The trend reproduces; the absolute values shift slightly.
For an authentic reproduction of the exact published numbers.
-
Load the seed data per source:
seed/load-salesforce.md-- accounts (50 rows,BMK_prefix)seed/load-servicenow.md-- incidents (50 rows,BMK-prefix)seed/load-snowflake.md-- opportunities (50 rows, joined by account_id)
-
Create three connections in Connect AI (one per source).
-
Create the Derived View using
seed/derived_view.sql. -
Set
CDATA_ITSM_CATALOG,CDATA_CRM_CATALOG,CDATA_WH_CATALOGin.envto the connection names you chose. -
Run:
python run_benchmark.py --full --runs 3 # ~$6, ~1.5 hrs python run_benchmark.py --full --runs 10 # ~$18, ~5 hrs (publishable CIs)
Copy-Item .env.example .env
# Edit .env:
# - ANTHROPIC_API_KEY, CDATA_EMAIL, CDATA_ACCESS_TOKEN
# - CDATA_ITSM_CATALOG / CRM_CATALOG / WH_CATALOG (your connection names)
# - DERIVED_VIEW_NAME (default BMK_Incident_Account_Revenue)
# - MCP_WORKSPACE_URL, MCP_TOOLKIT_URL (optional)For each of the three sources (Path B or C), in Connect AI:
- Click Sources in the left nav -> + Add Connection.
- Search for the data source type (Google Sheets, Salesforce, ServiceNow, Snowflake, ...).
- Fill in Connection Name (e.g.
BMK_CRM,BMK_ITSM,BMK_WH) and the source-specific connection fields:- Google Sheets --
Spreadsheet Id(workbook file ID), optionallySpreadsheet(tab name) - Salesforce -- OAuth against your dev org
- ServiceNow -- instance URL + Basic/OAuth credentials
- Snowflake -- account URL, warehouse, database, schema, user
- Google Sheets --
- Click Save & Test. Status should change to Authenticated.
- Create a Personal Access Token under your user icon -> Settings ->
Personal Access Tokens -> Create PAT. Paste it into
CDATA_ACCESS_TOKENin your.env.
Full per-source walkthroughs in seed/load-*.md.
Each benchmark scenario measures a different Connect AI feature. Set up all five in the order the scenarios run:
Explorer -> SQL Editor -> paste the cross-source join from
seed/derived_view.sql (or the Google Sheets variant in
seed/load-google-sheets.md) -> Execute -> Save ->
Save as Derived View -> name BMK_Incident_Account_Revenue.
Docs: https://docs.cloud.cdata.com/en/Data-Explorer.md
Workspaces -> + Add -> name BMK_Workspace -> + Add Asset ->
assign your three tables + the Derived View -> View Endpoints -> copy
the AI (MCP) URL into MCP_WORKSPACE_URL.
Docs: https://docs.cloud.cdata.com/en/Workspaces.md
AI -> Toolkits -> + Add -> name BMK_Toolkit -> assign your
connections -> Custom Tools tab -> + Add -> SQL
SELECT TOP 50 * FROM [CData].[DerivedViews].[BMK_Incident_Account_Revenue]
-> Validate SQL -> Save Changes -> enable. Copy the toolkit's
Remote MCP Server URL into MCP_TOOLKIT_URL.
Docs: https://docs.cloud.cdata.com/en/Toolkits.md
Jobs -> Setup Cache Connection (PostgreSQL) -> Save & Test -> + Add Job -> Cache -> pick tables -> frequency -> scheme -> Confirm -> Run Now. Caching is not available for relational sources (Snowflake / Postgres / MySQL). Docs: https://docs.cloud.cdata.com/en/Caching.md
No UI step. Tightly scripted system prompt that pre-specifies the SQL and
forbids discovery. Defined as SYSTEM_PROMPT_SKILL in config.py.
Full Connect AI feature documentation: https://docs.cloud.cdata.com/en/docs
If you have an Anthropic API key but no CData connections, pass --skip-mcp:
python run_benchmark.py --skip-mcp --full --runs 3The runner uses the cached tool definitions from output/_fixtures.json
and routes every tool_use to synthetic payloads. The dispatch is
short-circuited, so numbers will be smaller than the live paths -- but the
harness, scenarios, and aggregation all run end-to-end. Useful for
validating that the kit works before committing to Path C.
- Connect AI runs as the user who issued the PAT. Use a read-only PAT-bound user, ideally in a dev org. The benchmark only reads; a misconfigured external job that shares the same credentials could write.
- Every seeded row is prefixed
BMK_(accounts) orBMK-(tickets). Every WHERE clause is scoped to that prefix. The benchmark touches its own rows only. seed/cleanup.sqlremoves BMK-prefixed rows when you are done.
Reproductions are welcome. If a link back to this repo fits naturally in what you publish, it helps others trace the methodology and the underlying numbers.
Your results will differ from ours; that's expected. The reduction trend across scenarios is what reproduces.
MIT — see LICENSE.