fix(cll): include adapter_type in cache key to prevent cross-database poisoning by even-wei · Pull Request #1285 · DataRecce/recce

even-wei · 2026-04-09T22:57:48Z

PR checklist

Ensure you have added or ran the appropriate tests for your PR.
DCO signed

What type of PR is this?

Bug fix + refactor

What this PR does / why we need it:

Fixes cross-database cache poisoning in the CLL SQLite cache and redesigns the cache key for better invalidation.

Bug: The CLL cache key did not include the database adapter type. When a user switched adapters (e.g. Snowflake → DuckDB), the cache silently returned lineage computed under the wrong dialect — Snowflake uppercases all identifiers while DuckDB preserves case, causing column mapping failures.

Redesign: While investigating, we found the cache key used raw SQL strings and parent node IDs as proxies. Replaced with a more principled design using dbt-computed checksums:

Before	After
`sha256(node_id + raw_code + parent_ids + column_names)`	`sha256(adapter_type + checksum + parent_checksums + column_names)`

Key improvements:

adapter_type — prevents cross-database poisoning (the original bug)
checksum — uses dbt's pre-computed sha256(raw_code) instead of re-hashing large SQL strings
parent_checksums — cascading invalidation: if any parent's SQL changes, children recompute (previously only tracked parent IDs, not content)
node_id removed from content key — already included in the DB lookup key (CllCache.make_node_key), was redundant
Jinja fallback warning — logger.warning when compiled_code is missing and the fragile Jinja rendering path is used

Verified with jaffle-shop-expand (1060 models):

DuckDB cold cache → 1060 computed (194s)
DuckDB warm cache → All 1060 cached (0.2s)
Switched manifest to adapter_type: snowflake → 0 cache hits, 973 recomputed (fix confirmed)

Which issue(s) this PR fixes:

Resolves DRC-3199

Special notes for your reviewer:

Backward compatible — old cache entries (hashed with the old key format) become natural cache misses and recompute on first access. No migration or recce cache clear needed.

Sources, exposures, and metrics have no checksum in the manifest (no SQL), so their node ID is used as a stable placeholder.

Does this PR introduce a user-facing change?:

Users who switch between database adapters (e.g. DuckDB ↔ Snowflake) with CLL cache enabled will no longer get silently incorrect lineage from stale cache entries. Additionally, changes to parent model SQL now correctly invalidate downstream cache entries.

🤖 Generated with Claude Code

… poisoning The CLL cache key was sha256(node_id + raw_code + parents + columns) which did not include the database adapter type. When a user switched between adapters (e.g. Snowflake → DuckDB), the cache would return lineage computed under the wrong dialect — Snowflake uppercases all identifiers while DuckDB preserves case, causing silent column mapping failures. Add adapter_type as the first component of the content hash so that lineage computed under one dialect is never returned for another. Also add a warning log when compiled_code is missing and the fragile Jinja fallback path is used. Resolves DRC-3199 Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Signed-off-by: even-wei <evenwei@infuseai.io>

The warning said "has no compiled_code" but the fallback also fires for alias collisions where compiled_code exists. Use a generic message. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Signed-off-by: even-wei <evenwei@infuseai.io>

codecov · 2026-04-09T23:06:35Z

Codecov Report

❌ Patch coverage is 98.61111% with 1 line in your changes missing coverage. Please review.

Files with missing lines	Patch %	Lines
recce/cli.py	90.00%	1 Missing ⚠️

Files with missing lines	Coverage Δ
recce/adapter/dbt_adapter/__init__.py	`78.92% <100.00%> (+0.13%)`	⬆️
recce/util/cll.py	`93.65% <ø> (ø)`
tests/test_cli_cache.py	`100.00% <100.00%> (ø)`
tests/util/test_cll_cache.py	`99.55% <100.00%> (+<0.01%)`	⬆️
recce/cli.py	`61.79% <90.00%> (+0.15%)`	⬆️

... and 4 files with indirect coverage changes

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

Improve the cache key to use dbt-computed checksums instead of raw content: - Remove node_id from content key (already in DB lookup key via make_node_key) - Replace raw_code with manifest checksum (sha256 already computed by dbt) - Replace parent ID list with parent checksums — cascading invalidation so if any parent's SQL changes, children recompute Sources/exposures/metrics have no checksum (no SQL), so their node ID is used as a stable placeholder. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Signed-off-by: even-wei <evenwei@infuseai.io>

The test helper uses `hash(raw_code)` which returns an int, and some older dbt versions may also store non-string checksums. Wrap with `str()` to ensure the content key always receives a string. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Signed-off-by: even-wei <evenwei@infuseai.io>

even-wei and others added 2 commits April 10, 2026 06:57

even-wei self-assigned this Apr 10, 2026

even-wei and others added 2 commits April 10, 2026 09:23

even-wei force-pushed the feature/drc-3199-stale-cache-can-silently-return-empty-lineage branch from 288611f to df05c81 Compare April 10, 2026 02:05

even-wei marked this pull request as ready for review April 10, 2026 02:17

even-wei requested a review from wcchang1115 April 10, 2026 02:17

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(cll): include adapter_type in cache key to prevent cross-database poisoning#1285

fix(cll): include adapter_type in cache key to prevent cross-database poisoning#1285
even-wei wants to merge 4 commits intomainfrom
feature/drc-3199-stale-cache-can-silently-return-empty-lineage

even-wei commented Apr 9, 2026 •

edited

Loading

Uh oh!

codecov bot commented Apr 9, 2026 •

edited

Loading

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

even-wei commented Apr 9, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

codecov bot commented Apr 9, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

even-wei commented Apr 9, 2026 •

edited

Loading

codecov bot commented Apr 9, 2026 •

edited

Loading