feat(cli): add --cloud mode to recce init for CLL pre-computation by even-wei · Pull Request #1284 · DataRecce/recce

even-wei · 2026-04-09T08:40:54Z

PR checklist

Ensure you have added or ran the appropriate tests for your PR.
DCO signed

What type of PR is this?

Feature -- adds recce init --cloud for CLL pre-computation

What this PR does / why we need it:

Adds --cloud mode to recce init so it can download artifacts from Recce Cloud, compute CLL, and upload results (cll_map.json + cll_cache.db) back to the session S3 bucket.

This enables the Cloud server to serve /cll data before a Recce instance is available (DRC-3183).

Key behaviors:

Downloads manifests + catalogs for both current and base sessions
CLL cache warm-start fallback: current session -> base (production) session -> compute from scratch
Builds full CLL map and serializes to cll_map.json
Uploads both cache and map to Cloud via presigned URLs
Also adds cll_map.json generation to local (non-cloud) recce init
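
The warm-start fallback listed above (current session -> base session -> scratch) could be sketched roughly as follows. This is an illustration only, not the code in recce/cli.py: the function name `warm_start_cache`, the `fetch` callable, and the use of a `cll_cache_url` key in both URL mappings are assumptions made for the example.

```python
from typing import Callable, Mapping, Optional, Tuple


def warm_start_cache(
    session_urls: Mapping[str, str],
    base_urls: Mapping[str, str],
    fetch: Callable[[str], Optional[bytes]],
) -> Tuple[str, Optional[bytes]]:
    """Return (source, cache_bytes) following current -> base -> scratch.

    `fetch` is a hypothetical downloader that returns the response body,
    or None when the object is missing or the request fails.
    """
    candidates = (
        ("session", session_urls.get("cll_cache_url")),
        ("base", base_urls.get("cll_cache_url")),
    )
    for source, url in candidates:
        if not url:
            continue
        body = fetch(url)
        # An empty body counts as "no cache available".
        if body:
            return source, body
    # Nothing usable was found: the caller computes CLL from scratch.
    return "scratch", None


# Usage with an in-memory stand-in for the object store (hypothetical URLs).
store = {"https://example.invalid/base.db": b"sqlite-bytes"}
source, _ = warm_start_cache(
    {"cll_cache_url": "https://example.invalid/current.db"},
    {"cll_cache_url": "https://example.invalid/base.db"},
    store.get,
)
print(source)  # base
```

Keeping the chain as an ordered list of candidates makes the priority explicit and leaves the "scratch" case as the single fall-through.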

Which issue(s) this PR fixes:

Resolves DRC-3181

Special notes for your reviewer:

Tested locally with jaffle-shop-expand (1058 models): cold start 207s, warm cache 0.4s, CLL map build 13s
Cloud upload not yet testable until the companion cloud-infra PR is deployed
The --cloud flag reuses the same recce_cloud_options as recce server --cloud

Does this PR introduce a user-facing change?:

Added `recce init --cloud --session-id <id>` for pre-computing CLL data in Recce Cloud.

Generated with Claude Code

Add `recce init --cloud --session-id <id>` that:

1. Downloads manifests + catalogs from Recce Cloud session
2. Downloads existing CLL cache (current session → base session fallback)
3. Computes per-node CLL and builds full CLL map
4. Uploads cll_map.json + cll_cache.db back to session S3

This enables the Cloud instance to pre-compute CLL data so the /cll endpoint can serve it without a running Recce instance (DRC-3183).

The cache fallback chain (current → base → scratch) means the 200s+ cold-start only happens once per project on production metadata upload. Subsequent PR sessions reuse the warm cache.

Also adds cll_map.json generation to local `recce init` (non-cloud), saved alongside the SQLite cache for local development use.

Resolves DRC-3181

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: even-wei <evenwei@infuseai.io>

- Handle get_session 403 error explicitly (was producing misleading "missing org_id" error instead of "access denied")
- Fix state_file_host override (was setting nonexistent .host attribute; now correctly overrides base_url and base_url_v2)
- Wrap get_download_urls and get_base_session_download_urls in try/except for graceful error handling
- Remove duplicate import requests

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: even-wei <evenwei@infuseai.io>

codecov · 2026-04-09T08:51:49Z

Codecov Report

❌ Patch coverage is 91.01124% with 24 lines in your changes missing coverage. Please review.

Files with missing lines	Patch %	Lines
recce/cli.py	82.70%	23 Missing ⚠️
tests/test_cli_cache.py	99.25%	1 Missing ⚠️

Files with missing lines	Coverage Δ
tests/test_cli_cache.py	`99.77% <99.25%> (-0.23%)`	⬇️
recce/cli.py	`63.83% <82.70%> (+2.18%)`	⬆️

... and 4 files with indirect coverage changes

Copilot

Pull request overview

Adds a new cloud-aware initialization workflow to the Recce CLI so CLL (column-level lineage) artifacts can be precomputed and made available in Recce Cloud before a Recce server instance is running.

Changes:

Extend recce init with --cloud + --session-id to download dbt artifacts from Recce Cloud, warm-start the CLL cache, and upload results back via presigned URLs.
Generate and persist a full cll_map.json during recce init (local and cloud modes).
Add CLI test coverage for recce init --cloud flows (missing args, download/upload failures, map build failures).

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 6 comments.

File	Description
`recce/cli.py`	Implements `recce init --cloud` artifact download/cache warm-start, full CLL map generation, and optional Cloud upload.
`tests/test_cli_cache.py`	Adds tests covering cloud-mode init argument validation and common failure/edge scenarios.

Copilot · 2026-04-10T03:03:23Z

recce/cli.py

+        console.print(f"[bold]Cloud mode[/bold]: session {session_id}")
+
+        # Get session info
+        session_info = cloud_client.get_session(session_id)


cloud_client.get_session(session_id) can raise RecceCloudException (e.g., non-200/non-403 responses, network issues). Currently this call is not wrapped, so recce init --cloud may crash with an unhandled exception instead of producing a clean CLI error and exit code 1. Catch RecceCloudException here and print a user-friendly error before exiting.

Suggested change

-        session_info = cloud_client.get_session(session_id)
+        try:
+            session_info = cloud_client.get_session(session_id)
+        except RecceCloudException as e:
+            console.print(f"[[red]Error[/red]] Failed to get session: {e}")
+            exit(1)

Copilot · 2026-04-10T03:03:24Z

recce/cli.py

+            url = download_urls.get(artifact_key)
+            if url:
+                resp = requests.get(url)
+                if resp.status_code == 200:
+                    (target_path / filename).write_bytes(resp.content)


Artifact downloads use requests.get(url) and immediately access resp.content. This is workable for small manifests/catalogs, but for large artifacts (and especially when similar logic is used for the cache) it can cause high memory usage, and the call can hang because no timeout is set. Use a reasonable timeout and stream responses to disk (e.g., stream=True + chunked writes), and handle requests.RequestException explicitly so failures are reported cleanly.
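
A streamed download with a timeout, along the lines suggested in this comment, might look like the sketch below. The helper name `download_artifact`, its `http_get` parameter (defaulting to `requests.get`), and the default timeout and chunk size are assumptions for illustration, not part of the PR.

```python
from pathlib import Path
from typing import Callable, Optional


def download_artifact(
    url: str,
    dest: Path,
    http_get: Optional[Callable] = None,
    timeout: float = 30.0,
    chunk_size: int = 1 << 20,
) -> bool:
    """Stream `url` to `dest`; return True on success, False on any failure."""
    if http_get is None:
        import requests  # imported lazily so a fake can be injected in tests

        http_get = requests.get
    try:
        # stream=True defers reading the body; timeout bounds connect and read waits.
        with http_get(url, stream=True, timeout=timeout) as resp:
            if resp.status_code != 200:
                return False
            with open(dest, "wb") as f:
                for chunk in resp.iter_content(chunk_size=chunk_size):
                    f.write(chunk)
        return True
    except OSError:
        # requests.RequestException subclasses IOError (an alias of OSError),
        # so timeouts and connection errors land here, as do file write errors.
        Path(dest).unlink(missing_ok=True)  # drop any partial file
        return False
```

Returning a boolean instead of raising lets the caller print one clean CLI error per artifact and decide whether a missing artifact is fatal.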

Copilot · 2026-04-10T03:03:24Z

recce/cli.py

+        if cll_cache_url:
+            resp = requests.get(cll_cache_url)
+            if resp.status_code == 200 and len(resp.content) > 0:
+                Path(cache_db).write_bytes(resp.content)
+                console.print(f"  Downloaded CLL cache from session ({len(resp.content) / 1024 / 1024:.1f} MB)")


Downloading the existing cll_cache.db reads the entire body into memory via resp.content before writing it. Since the cache DB can be large, this can spike memory usage in CI/containers. Stream the response to a temporary file and move it into place on success (and include a timeout / exception handling).
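
The "temporary file, then move into place" step could be sketched as below, using only the standard library. `write_atomically` is a hypothetical helper; in practice the `chunks` iterable would come from a streamed response such as `resp.iter_content(...)`.

```python
import os
import tempfile
from pathlib import Path
from typing import Iterable


def write_atomically(chunks: Iterable[bytes], dest: Path) -> int:
    """Write `chunks` to a temp file beside `dest`, then move it into place.

    Returns the number of bytes written. Raises ValueError on an empty body,
    leaving any existing file at `dest` untouched.
    """
    dest = Path(dest)
    fd, tmp_name = tempfile.mkstemp(dir=dest.parent, suffix=".part")
    written = 0
    try:
        with os.fdopen(fd, "wb") as f:
            for chunk in chunks:
                f.write(chunk)
                written += len(chunk)
        if written == 0:
            raise ValueError("empty cache body")
        # os.replace is atomic when source and target share a filesystem,
        # which is why the temp file is created in dest's directory.
        os.replace(tmp_name, dest)
        return written
    except BaseException:
        # Remove the partial file before re-raising.
        if os.path.exists(tmp_name):
            os.unlink(tmp_name)
        raise
```

With this shape, an interrupted or empty download never leaves a truncated cll_cache.db behind for the later warm-start step to pick up.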

Copilot · 2026-04-10T03:03:24Z

recce/cli.py

+                with open(cll_map_path, "rb") as f:
+                    resp = requests.put(cll_map_upload_url, data=f, headers={"Content-Type": "application/json"})
+                if resp.status_code in (200, 204):
+                    console.print(f"  Uploaded cll_map.json ({cll_map_path.stat().st_size / 1024 / 1024:.1f} MB)")
+                else:


The upload path uses requests.put(...) without a timeout and doesn't catch requests.RequestException, so transient network issues can hang the command or produce noisy tracebacks. Add a timeout and handle request exceptions similarly to other cloud interactions so recce init --cloud fails/warns deterministically.
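
One way to bound the upload and report failures without a traceback is sketched here. The helper name `upload_file`, the injectable `http_put` parameter (defaulting to `requests.put`), and the 60-second default are assumptions; the accepted status codes mirror the `(200, 204)` check in the diff above.

```python
from pathlib import Path
from typing import Callable, Optional


def upload_file(
    url: str,
    path: Path,
    content_type: str,
    http_put: Optional[Callable] = None,
    timeout: float = 60.0,
) -> Optional[str]:
    """PUT `path` to a presigned `url`; return None on success, else an error string."""
    if http_put is None:
        import requests  # imported lazily so a fake can be injected in tests

        http_put = requests.put
    try:
        with open(path, "rb") as f:
            resp = http_put(
                url,
                data=f,
                headers={"Content-Type": content_type},
                timeout=timeout,
            )
    except OSError as e:
        # Covers requests.RequestException (an IOError subclass) and file errors.
        return f"request failed: {e}"
    if resp.status_code not in (200, 204):
        return f"HTTP {resp.status_code}"
    return None
```

Returning an error string keeps the decision about warning versus exiting in the caller, which already owns the console output.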

Copilot · 2026-04-10T03:03:24Z

recce/cli.py

+            elif not cll_cache_upload_url:
+                logger.debug("No cll_cache_url in upload URLs — cache upload not supported yet")
+
+            console.print("[bold green]Cloud upload complete.[/bold green]")


"Cloud upload complete." is printed unconditionally, even if one or both uploads fail (for example with an HTTP 5xx response). This is misleading for automation/CI logs. Track whether each upload succeeded and print the success message only when everything succeeded; otherwise print a completion-with-warnings message (or consider a non-zero exit code if uploads are required).
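
Tracking per-upload results and choosing the final message from them could be as small as the sketch below. `summarize_uploads` and the wording of the warning message are hypothetical; only the success message is taken from the diff.

```python
from typing import Mapping, Tuple


def summarize_uploads(results: Mapping[str, bool]) -> Tuple[str, bool]:
    """Return (message, all_ok) for a mapping of artifact name -> upload succeeded."""
    failed = sorted(name for name, ok in results.items() if not ok)
    if not failed:
        return "[bold green]Cloud upload complete.[/bold green]", True
    message = (
        "[yellow]Cloud upload finished with warnings[/yellow]: failed to upload "
        + ", ".join(failed)
    )
    return message, False
```

The caller would record one entry per attempted upload, print the returned message, and use the boolean to pick the exit code if uploads are treated as required.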

Copilot · 2026-04-10T03:03:24Z

recce/cli.py

+    console.print("\n[bold]Building full CLL map...[/bold]")
+    t_map_start = time.perf_counter()
+    try:
+        full_cll_map = dbt_adapter.build_full_cll_map()
+        cll_map_path = Path(cache_db).parent / "cll_map.json"


recce init now builds and writes cll_map.json, but there are no assertions in the existing init tests verifying that this file is produced (and has expected structure) for the non-cloud path. Add/extend a unit test in tests/test_cli_cache.py to assert cll_map.json is created next to --cache-db and is valid JSON, so this new behavior is protected against regressions.
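
A regression check of the kind requested here might be built around a small helper like the one below. It is a sketch under assumptions: `load_cll_map` is a hypothetical name, the structure of cll_map.json is not specified in this PR so only JSON validity is checked, and the test writes a placeholder file instead of running `recce init`.

```python
import json
from pathlib import Path


def load_cll_map(cache_db: Path):
    """Load cll_map.json from the directory containing `cache_db`.

    Raises AssertionError if the file is missing and json.JSONDecodeError
    if its content is not valid JSON.
    """
    cll_map_path = Path(cache_db).parent / "cll_map.json"
    assert cll_map_path.is_file(), f"{cll_map_path} was not created"
    return json.loads(cll_map_path.read_text(encoding="utf-8"))


def test_cll_map_is_written_next_to_cache_db(tmp_path):
    cache_db = tmp_path / "cll_cache.db"
    # Placeholder for invoking `recce init --cache-db <cache_db>` through the
    # project's CLI test harness; here the file is written by hand.
    (tmp_path / "cll_map.json").write_text("{}", encoding="utf-8")
    assert isinstance(load_cll_map(cache_db), dict)
```

Once the real map structure is settled, the final assertion can be tightened to check for the expected top-level keys.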

even-wei self-assigned this Apr 9, 2026

even-wei requested a review from wcchang1115 April 9, 2026 09:34

even-wei marked this pull request as ready for review April 9, 2026 09:34

even-wei and others added 3 commits April 10, 2026 09:56

Merge branch 'main' into feature/drc-3181-session-cll-computation-aft…

bfb6b6a

…er-artifact-upload

test(cli): add coverage for recce init --cloud mode

1a6b170

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Signed-off-by: even-wei <evenwei@infuseai.io>

Merge branch 'main' into feature/drc-3181-session-cll-computation-aft…

6d23eaa

…er-artifact-upload

gcko requested a review from Copilot April 10, 2026 02:59

Copilot started reviewing on behalf of gcko April 10, 2026 03:00

Copilot AI reviewed Apr 10, 2026

gcko requested review from gcko and removed request for wcchang1115 April 10, 2026 03:08

gcko added the enhancement New feature or request label Apr 10, 2026

even-wei wants to merge 5 commits into main from feature/drc-3181-session-cll-computation-after-artifact-upload

3 participants
