feat(cli): add --cloud mode to recce init for CLL pre-computation #1284

Open
even-wei wants to merge 5 commits into main from feature/drc-3181-session-cll-computation-after-artifact-upload

Conversation

Contributor

@even-wei even-wei commented Apr 9, 2026

PR checklist

  • Ensure you have added or run the appropriate tests for your PR.
  • DCO signed

What type of PR is this?

Feature -- adds recce init --cloud for CLL pre-computation

What this PR does / why we need it:

Adds --cloud mode to recce init so it can download artifacts from Recce Cloud, compute CLL, and upload results (cll_map.json + cll_cache.db) back to the session S3 bucket.

This enables the Cloud server to serve /cll data before a Recce instance is available (DRC-3183).

Key behaviors:

  • Downloads manifests + catalogs for both current and base sessions
  • CLL cache warm-start fallback: current session -> base (production) session -> compute from scratch
  • Builds full CLL map and serializes to cll_map.json
  • Uploads both cache and map to Cloud via presigned URLs
  • Also adds cll_map.json generation to local (non-cloud) recce init
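
The warm-start fallback chain described above can be sketched roughly as follows. This is an illustration only, not the code in this PR: `warm_start_cache` and the `CacheFetcher` callables are hypothetical names, and each fetcher is assumed to download one candidate cache to the given path and report whether it succeeded.

```python
from pathlib import Path
from typing import Callable, Optional, Sequence, Tuple

# Hypothetical type: a fetcher tries to place a cache file at the given path
# and returns True only if it believes a usable file was written.
CacheFetcher = Callable[[Path], bool]


def warm_start_cache(dest: Path, sources: Sequence[Tuple[str, CacheFetcher]]) -> Optional[str]:
    """Try each cache source in order and return the label of the first hit.

    Returns None when every source fails, which means CLL has to be
    computed from scratch.
    """
    for label, fetch in sources:
        try:
            if fetch(dest) and dest.exists() and dest.stat().st_size > 0:
                return label
        except OSError:
            # A failed download or write should not abort the chain.
            pass
        # Discard any partial or empty file before trying the next source.
        dest.unlink(missing_ok=True)
    return None
```

A caller would pass the sources in priority order, for example `[("current", fetch_current), ("base", fetch_base)]`, and fall back to a full computation when the result is `None`.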

Which issue(s) this PR fixes:

Resolves DRC-3181

Special notes for your reviewer:

  • Tested locally with jaffle-shop-expand (1058 models): cold start 207s, warm cache 0.4s, CLL map build 13s
  • Cloud upload not yet testable until the companion cloud-infra PR is deployed
  • The --cloud flag reuses the same recce_cloud_options as recce server --cloud

Does this PR introduce a user-facing change?:

Added `recce init --cloud --session-id <id>` for pre-computing CLL data in Recce Cloud.

Generated with Claude Code

Add `recce init --cloud --session-id <id>` that:
1. Downloads manifests + catalogs from Recce Cloud session
2. Downloads existing CLL cache (current session → base session fallback)
3. Computes per-node CLL and builds full CLL map
4. Uploads cll_map.json + cll_cache.db back to session S3

This enables the Cloud instance to pre-compute CLL data so the /cll
endpoint can serve it without a running Recce instance (DRC-3183).

The cache fallback chain (current → base → scratch) means the 200s+
cold-start only happens once per project on production metadata upload.
Subsequent PR sessions reuse the warm cache.

Also adds cll_map.json generation to local `recce init` (non-cloud),
saved alongside the SQLite cache for local development use.

Resolves DRC-3181

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: even-wei <evenwei@infuseai.io>
@even-wei even-wei self-assigned this Apr 9, 2026
- Handle get_session 403 error explicitly (was producing misleading
  "missing org_id" error instead of "access denied")
- Fix state_file_host override (was setting nonexistent .host attribute;
  now correctly overrides base_url and base_url_v2)
- Wrap get_download_urls and get_base_session_download_urls in
  try/except for graceful error handling
- Remove duplicate `import requests`

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: even-wei <evenwei@infuseai.io>

codecov bot commented Apr 9, 2026

Codecov Report

❌ Patch coverage is 91.01124% with 24 lines in your changes missing coverage. Please review.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| recce/cli.py | 82.70% | 23 Missing ⚠️ |
| tests/test_cli_cache.py | 99.25% | 1 Missing ⚠️ |

| Files with missing lines | Coverage Δ |
|---|---|
| tests/test_cli_cache.py | 99.77% <99.25%> (-0.23%) ⬇️ |
| recce/cli.py | 63.83% <82.70%> (+2.18%) ⬆️ |

... and 4 files with indirect coverage changes


@even-wei even-wei requested a review from wcchang1115 April 9, 2026 09:34
@even-wei even-wei marked this pull request as ready for review April 9, 2026 09:34
even-wei and others added 3 commits April 10, 2026 09:56
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: even-wei <evenwei@infuseai.io>
Contributor

Copilot AI left a comment

Pull request overview

Adds a new cloud-aware initialization workflow to the Recce CLI so CLL (column-level lineage) artifacts can be precomputed and made available in Recce Cloud before a Recce server instance is running.

Changes:

  • Extend recce init with --cloud + --session-id to download dbt artifacts from Recce Cloud, warm-start the CLL cache, and upload results back via presigned URLs.
  • Generate and persist a full cll_map.json during recce init (local and cloud modes).
  • Add CLI test coverage for recce init --cloud flows (missing args, download/upload failures, map build failures).

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 6 comments.

| File | Description |
|---|---|
| recce/cli.py | Implements recce init --cloud artifact download/cache warm-start, full CLL map generation, and optional Cloud upload. |
| tests/test_cli_cache.py | Adds tests covering cloud-mode init argument validation and common failure/edge scenarios. |

console.print(f"[bold]Cloud mode[/bold]: session {session_id}")

# Get session info
session_info = cloud_client.get_session(session_id)

Copilot AI Apr 10, 2026

cloud_client.get_session(session_id) can raise RecceCloudException (e.g., non-200/non-403 responses, network issues). Currently this call is not wrapped, so recce init --cloud may crash with an unhandled exception instead of producing a clean CLI error and exit code 1. Catch RecceCloudException here and print a user-friendly error before exiting.

Suggested change
- session_info = cloud_client.get_session(session_id)
+ try:
+     session_info = cloud_client.get_session(session_id)
+ except RecceCloudException as e:
+     console.print(f"[[red]Error[/red]] Failed to get session: {e}")
+     exit(1)

Comment on lines +388 to +392
url = download_urls.get(artifact_key)
if url:
    resp = requests.get(url)
    if resp.status_code == 200:
        (target_path / filename).write_bytes(resp.content)

Copilot AI Apr 10, 2026

Artifact downloads use requests.get(url) and immediately access resp.content. This is workable for typical manifests/catalogs, but for large artifacts (and especially when similar logic is used for the cache) it can cause high memory usage, and the call can hang because no timeout is set. Use a reasonable timeout and stream responses to disk (e.g., stream=True + chunked writes), and handle requests.RequestException explicitly so failures are reported cleanly.
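
A minimal sketch of the streamed, timeout-bounded download suggested here. The helper name `download_to_file` and its defaults are hypothetical. To keep the block self-contained, the HTTP call is injected: `http_get` is assumed to behave like `requests.get`, and the returned object is assumed to offer `status_code`, `iter_content()` and `close()` as `requests.Response` does.

```python
from pathlib import Path
from typing import Callable


def download_to_file(http_get: Callable, url: str, dest: Path,
                     timeout: float = 30.0, chunk_size: int = 1 << 20) -> bool:
    """Stream `url` to `dest` in chunks; return True on success, False otherwise."""
    try:
        resp = http_get(url, stream=True, timeout=timeout)
    except OSError:
        # requests.RequestException subclasses IOError (an alias of OSError),
        # so timeouts and connection errors raised by requests land here too.
        return False
    try:
        if resp.status_code != 200:
            return False
        with open(dest, "wb") as f:
            for chunk in resp.iter_content(chunk_size=chunk_size):
                f.write(chunk)
        return True
    except OSError:
        # Drop a partially written file so callers never see truncated data.
        dest.unlink(missing_ok=True)
        return False
    finally:
        resp.close()
```

In the CLI this would presumably be called as `download_to_file(requests.get, url, target_path / filename)`, with a warning printed when it returns `False`. Catching `requests.RequestException` by name, as the comment asks, would be the narrower choice in code that already imports `requests`.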

Comment on lines +425 to +429
if cll_cache_url:
    resp = requests.get(cll_cache_url)
    if resp.status_code == 200 and len(resp.content) > 0:
        Path(cache_db).write_bytes(resp.content)
        console.print(f"  Downloaded CLL cache from session ({len(resp.content) / 1024 / 1024:.1f} MB)")

Copilot AI Apr 10, 2026

Downloading the existing cll_cache.db reads the entire body into memory via resp.content before writing it. Since the cache DB can be large, this can spike memory usage in CI/containers. Stream the response to a temporary file and move it into place on success (and include a timeout / exception handling).
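
One way to realize the "temporary file, then move into place" part of this suggestion, as a sketch using only the standard library. `write_atomically` is a hypothetical helper; the chunks would presumably come from something like `resp.iter_content(...)` on a streamed response. Rejecting an empty body mirrors the existing `len(resp.content) > 0` check.

```python
import os
import tempfile
from pathlib import Path
from typing import Iterable


def write_atomically(chunks: Iterable[bytes], dest: Path) -> int:
    """Write chunks to a temp file beside `dest`, then move it into place.

    Returns the number of bytes written. `dest` is left untouched if the
    stream fails midway or turns out to be empty.
    """
    # Create the temp file in the same directory so the final rename
    # stays on one filesystem.
    fd, tmp_name = tempfile.mkstemp(dir=dest.parent, suffix=".part")
    total = 0
    try:
        with os.fdopen(fd, "wb") as tmp:
            for chunk in chunks:
                tmp.write(chunk)
                total += len(chunk)
        if total == 0:
            raise ValueError("empty response body")
        os.replace(tmp_name, dest)
        return total
    except BaseException:
        # Remove the partial temp file, then let the caller handle the error.
        if os.path.exists(tmp_name):
            os.unlink(tmp_name)
        raise
```

With this shape, an interrupted download cannot leave a truncated `cll_cache.db` behind for a later warm start to pick up.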

Comment on lines +634 to +638
with open(cll_map_path, "rb") as f:
    resp = requests.put(cll_map_upload_url, data=f, headers={"Content-Type": "application/json"})
if resp.status_code in (200, 204):
    console.print(f"  Uploaded cll_map.json ({cll_map_path.stat().st_size / 1024 / 1024:.1f} MB)")
else:

Copilot AI Apr 10, 2026

The upload path uses requests.put(...) without a timeout and doesn't catch requests.RequestException, so transient network issues can hang the command or produce noisy tracebacks. Add a timeout and handle request exceptions similarly to other cloud interactions so recce init --cloud fails/warns deterministically.

elif not cll_cache_upload_url:
logger.debug("No cll_cache_url in upload URLs — cache upload not supported yet")

console.print("[bold green]Cloud upload complete.[/bold green]")

Copilot AI Apr 10, 2026

"Cloud upload complete." is printed unconditionally, even if one or both uploads fail (for example, with an HTTP 5xx response). This is misleading for automation/CI logs. Track whether each upload succeeded and print the success message only when everything succeeded; otherwise print a completion-with-warnings message (or consider a non-zero exit code if uploads are required).
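
The tracking could look something like this sketch. `summarize_uploads` is a hypothetical helper, and mapping any failure to exit code 1 is an assumption: the comment leaves open whether failed uploads should only warn.

```python
from typing import Dict, Tuple


def summarize_uploads(results: Dict[str, bool]) -> Tuple[str, int]:
    """Return (message, exit_code) for a mapping of artifact name -> upload success."""
    if not results:
        return "No uploads attempted.", 0
    failed = sorted(name for name, ok in results.items() if not ok)
    if not failed:
        return "Cloud upload complete.", 0
    # Assumption: treat any failed upload as an error for CI purposes.
    return f"Cloud upload finished with failures: {', '.join(failed)}", 1
```

Each upload branch would record its outcome, e.g. `results["cll_map.json"] = resp.status_code in (200, 204)`, and the final message would be printed from the returned tuple.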

Comment on lines +602 to +606
console.print("\n[bold]Building full CLL map...[/bold]")
t_map_start = time.perf_counter()
try:
    full_cll_map = dbt_adapter.build_full_cll_map()
    cll_map_path = Path(cache_db).parent / "cll_map.json"

Copilot AI Apr 10, 2026

recce init now builds and writes cll_map.json, but there are no assertions in the existing init tests verifying that this file is produced (and has expected structure) for the non-cloud path. Add/extend a unit test in tests/test_cli_cache.py to assert cll_map.json is created next to --cache-db and is valid JSON, so this new behavior is protected against regressions.
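
A small helper along these lines could back such a test. It is a sketch: `load_cll_map_beside` is a hypothetical name, and the only detail taken from the PR is that the map is written to `Path(cache_db).parent / "cll_map.json"`. Nothing is assumed about the map's internal structure.

```python
import json
from pathlib import Path
from typing import Any


def load_cll_map_beside(cache_db: str) -> Any:
    """Load cll_map.json from the directory that contains `cache_db`.

    Raises FileNotFoundError if the file is missing and
    json.JSONDecodeError if it is not valid JSON.
    """
    cll_map_path = Path(cache_db).parent / "cll_map.json"
    with open(cll_map_path, encoding="utf-8") as f:
        return json.load(f)
```

A test in tests/test_cli_cache.py would presumably run `recce init` with `--cache-db` pointing into a temporary directory, then call this helper and assert on whatever shape the map is expected to have.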

@gcko gcko requested review from gcko and removed request for wcchang1115 April 10, 2026 03:08
@gcko gcko added the enhancement New feature or request label Apr 10, 2026

Labels

enhancement New feature or request


3 participants