Add byok migration script #3584

Merged
TheodoreSpeaks merged 5 commits into staging from feat/migrate-byok-script
Mar 14, 2026

Conversation

@TheodoreSpeaks
Collaborator

@TheodoreSpeaks TheodoreSpeaks commented Mar 14, 2026

Summary

Adding hosted keys will overwrite users' existing API keys. This PR adds a migration script that moves existing API keys to BYOK so users are not automatically switched over to hosted keys.
Fixes #(issue)

Type of Change

  • Bug fix
  • New feature
  • Breaking change
  • Documentation
  • Other: ___________

Testing

Ran in staging and locally; validated that existing keys are migrated to BYOK.

Checklist

  • Code follows project style guidelines
  • Self-reviewed my changes
  • Tests added/updated and passing
  • No new warnings introduced
  • I confirm that I have read and agree to the terms outlined in the Contributor License Agreement (CLA)

Screenshots/Videos

@cursor

cursor bot commented Mar 14, 2026

PR Summary

Medium Risk
Adds a one-off migration script that reads/decrypts stored env vars and inserts encrypted BYOK keys into the database; incorrect mappings or resolution rules could migrate the wrong key per workspace/provider.

Overview
Adds a new Bun-based migration script (packages/db/scripts/migrate-block-api-keys-to-byok.ts) to copy existing block-level API keys into workspace_byok_keys per workspace/provider without modifying the original block data.

The script supports a dry-run audit (detects conflicts, previews inserts, writes a workspace-id list for a follow-up live run) and a live mode that encrypts selected keys (including resolving {{ENV_VAR}} references from workspace/personal env vars) and inserts them with ON CONFLICT DO NOTHING, plus batching/concurrency throttling and summary stats.
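The batching and throttling described above could look roughly like the sketch below. The batch size of 1000 and concurrency of 100 come from the review flowchart, and the 30-second pause from the reported `SLEEP_MS` value; the helper names (`chunk`, `runInBatches`) and the use of `Promise.all` over fixed-size groups are assumptions for illustration, not the script's actual code.

```typescript
// Split a list into consecutive slices of at most `size` items.
function chunk<T>(items: T[], size: number): T[][] {
  const out: T[][] = []
  for (let i = 0; i < items.length; i += size) {
    out.push(items.slice(i, i + size))
  }
  return out
}

const sleep = (ms: number): Promise<void> =>
  new Promise<void>((resolve) => setTimeout(resolve, ms))

// Hypothetical runner: processes IDs batch by batch, running at most
// `concurrency` workers at once and pausing between batches.
async function runInBatches<T>(
  ids: T[],
  worker: (id: T, position: number) => Promise<void>,
  opts = { batchSize: 1000, concurrency: 100, sleepMs: 30_000 }
): Promise<void> {
  const batches = chunk(ids, opts.batchSize)
  let position = 0
  let batchNumber = 0
  for (const batch of batches) {
    for (const group of chunk(batch, opts.concurrency)) {
      const start = position
      await Promise.all(group.map((id, i) => worker(id, start + i)))
      position += group.length
    }
    batchNumber++
    // No pause is needed after the final batch.
    if (batchNumber < batches.length) await sleep(opts.sleepMs)
  }
}
```

Grouping with `Promise.all` waits for the slowest worker in each group before starting the next; a worker pool would keep all slots busy but is more code than a one-off script may need.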

Written by Cursor Bugbot for commit 7f6eb10. This will update automatically on new commits.

@vercel

vercel bot commented Mar 14, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

1 Skipped Deployment
Project | Deployment | Actions | Updated (UTC)
docs | Skipped | Skipped | Mar 14, 2026 8:04pm

@greptile-apps
Contributor

greptile-apps bot commented Mar 14, 2026

Greptile Summary

This PR adds a self-contained Bun migration script (migrate-block-api-keys-to-byok.ts) that backfills the workspace_byok_keys table by scanning existing workflow_blocks for block-level API keys and promoting them to workspace-scoped BYOK entries. The script handles both plaintext keys and {{ENV_VAR}} references (decrypting them from the environment/workspace_environment tables), supports a safe two-phase dry-run → live-run flow, processes workspaces concurrently in batches, and skips inserting a key if a BYOK entry for that provider already exists.
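As a rough illustration of how a block value might be classified as plaintext or as a `{{ENV_VAR}}` reference, consider the sketch below. The regular expression, the `resolveKey` helper, the `ResolvedKey` type, and the workspace-before-personal lookup order are all assumptions; the real script also decrypts values read from the environment tables, which is omitted here by passing already-decrypted maps.

```typescript
// Assumed pattern: the whole value is a single {{NAME}} reference.
const ENV_VAR_PATTERN = /^\{\{([^}]+)\}\}$/

type KeySource = 'plaintext' | 'workspace' | 'personal'

interface ResolvedKey {
  key: string
  source: KeySource
}

function extractEnvVarName(value: string): string | null {
  const match = ENV_VAR_PATTERN.exec(value.trim())
  const name = match?.[1]?.trim()
  return name ? name : null
}

// Hypothetical resolver: returns null when a reference cannot be resolved.
function resolveKey(
  raw: string,
  workspaceEnv: Map<string, string>,
  personalEnv: Map<string, string>
): ResolvedKey | null {
  const name = extractEnvVarName(raw)
  // Not a reference: treat the block value itself as the key.
  if (name === null) return { key: raw.trim(), source: 'plaintext' }
  const fromWorkspace = workspaceEnv.get(name)
  if (fromWorkspace) return { key: fromWorkspace, source: 'workspace' }
  const fromPersonal = personalEnv.get(name)
  if (fromPersonal) return { key: fromPersonal, source: 'personal' }
  return null
}
```

Returning `null` for an unresolved reference lets the caller count failures separately from successful resolutions, which matters for the dry-run file issue raised further down.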

Key observations:

  • The encryption format produced by the script's local encryptSecret (iv:encrypted:authTag) correctly matches what the application's decryptSecret in byok.ts expects — the migrated keys will decrypt successfully at runtime.
  • The dry-run output file (migrate-byok-workspace-ids.txt) can include workspace IDs where all key-resolution attempts failed, causing the live run to silently process those workspaces with no inserts.
  • KEY_SOURCE_PRIORITY sorts ascending so plaintext (0) is chosen first, but the conflict log message says "Using highest-priority key" — either the log or the sort order is inverted relative to the intended semantics.
  • SLEEP_MS is 30_000 ms but the in-code comment describes it as a 60-second pause.
  • The index function parameter in processWorkspace shadows the index named import from drizzle-orm/pg-core.
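To make the first observation concrete, here is a minimal round trip that produces and consumes an `iv:encrypted:authTag` string. The PR page does not state the cipher, key handling, or encoding, so AES-256-GCM, hex encoding, a 12-byte IV, and the function names `encryptSketch` and `decryptSketch` are assumptions chosen only to show the three-part layout.

```typescript
import { createCipheriv, createDecipheriv, randomBytes } from 'node:crypto'

// Assumed cipher and encoding; only the iv:encrypted:authTag layout
// is taken from the review comment.
function encryptSketch(plaintext: string, key: Buffer): string {
  const iv = randomBytes(12)
  const cipher = createCipheriv('aes-256-gcm', key, iv)
  const encrypted = Buffer.concat([cipher.update(plaintext, 'utf8'), cipher.final()])
  const authTag = cipher.getAuthTag()
  return [iv, encrypted, authTag].map((part) => part.toString('hex')).join(':')
}

function decryptSketch(payload: string, key: Buffer): string {
  const parts = payload.split(':')
  if (parts.length !== 3) throw new Error('Expected iv:encrypted:authTag')
  const [ivHex = '', encryptedHex = '', authTagHex = ''] = parts
  const decipher = createDecipheriv('aes-256-gcm', key, Buffer.from(ivHex, 'hex'))
  // The auth tag must be set before final() so tampering is detected.
  decipher.setAuthTag(Buffer.from(authTagHex, 'hex'))
  const decrypted = Buffer.concat([
    decipher.update(Buffer.from(encryptedHex, 'hex')),
    decipher.final(),
  ])
  return decrypted.toString('utf8')
}

const demoKey = randomBytes(32) // 256-bit key, generated here for the demo only
const payload = encryptSketch('sk-example', demoKey)
console.log(payload.split(':').length) // 3
console.log(decryptSketch(payload, demoKey)) // sk-example
```

If the writer and reader disagree on part order or encoding, decryption fails at `final()`, which is why the review calls out that the two formats match.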

Confidence Score: 3/5

  • The script is non-destructive (original block values are untouched, live run requires an explicit --from-file flag), but the priority conflict-resolution logic and the dry-run output accuracy issues should be clarified before running in production.
  • The overall design is sound — two-phase dry/live, onConflictDoNothing, encryption format is compatible with the app. However, the shouldWriteWorkspaceId logic can write workspace IDs to the file even when no keys are insertable, the KEY_SOURCE_PRIORITY sort order vs. log message inconsistency could silently pick the wrong key in a conflict, and the WIP PR description notes this is not yet fully reviewed/tested.
  • packages/db/scripts/migrate-block-api-keys-to-byok.ts — particularly the conflict resolution priority logic and dry-run file-writing conditions.

Important Files Changed

Filename Overview
packages/db/scripts/migrate-block-api-keys-to-byok.ts New one-shot migration script that reads block-level API keys from workflow_blocks, decrypts env-var references, re-encrypts the resolved keys, and upserts them into workspace_byok_keys. Several issues: dry-run workspace-ID file can include workspaces with no insertable keys, the KEY_SOURCE_PRIORITY sort order contradicts the "highest-priority" log message, SLEEP_MS (30 s) contradicts the code comment (60 s), and the index parameter name shadows the drizzle-orm index import.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A([Start]) --> B{dry-run flag?}
    B -- Yes --> C[Query distinct workspace IDs from DB]
    B -- No --> D[Read workspace IDs from --from-file]
    C --> E[Init migrate-byok-workspace-ids.txt]
    E --> F[Process workspaces in batches of 1000 concurrency 100]
    D --> F

    F --> G[Fetch matching blocks and workspace owner]
    G --> H{providerKeys found?}
    H -- No --> I[Skip workspace]
    H -- Yes --> J{Env var references present?}
    J -- Yes --> K[Fetch workspace_environment and personal environment]
    J -- No --> L[Resolve keys]
    K --> L
    L --> M{resolved.length > 0?}
    M -- No --> N[Continue to next provider]
    M -- Yes --> O[Sort by KEY_SOURCE_PRIORITY plaintext=0 workspace=1 personal=2]
    O --> P{distinct keys > 1?}
    P -- Yes --> Q[Log CONFLICT and pick resolved index 0]
    P -- No --> R[Pick resolved index 0]
    Q --> S{DRY_RUN?}
    R --> S
    S -- Yes --> T[Log preview and write workspace ID to file]
    S -- No --> U[Encrypt chosen key and INSERT with onConflictDoNothing]
    U --> V{Rows returned?}
    V -- Yes --> W[stats.inserted++]
    V -- No --> X[stats.skippedExisting++]
    T --> Y[Next provider]
    W --> Y
    X --> Y
    I --> Z[Next workspace]
    N --> Y
    Y --> Z
    Z --> AA{More batches?}
    AA -- Yes --> AB[Sleep SLEEP_MS between batches]
    AB --> F
    AA -- No --> AC([Print summary and exit])

Last reviewed commit: 399db36

Comment on lines +418 to +424
const subBlocks = block.subBlocks as Record<string, { value?: any }>

const providerId = BLOCK_TYPE_TO_PROVIDER[block.blockType]
if (providerId) {
  const val = subBlocks?.apiKey?.value
  if (typeof val === 'string' && val.trim()) {
    const refs = providerKeys.get(providerId) ?? []

index parameter shadows drizzle-orm import

The processWorkspace function's index parameter shadows the index named import from drizzle-orm/pg-core (used in the table definitions at module scope). While this doesn't cause a runtime error (the module-level index is only needed at definition time), it creates a confusing naming collision. Consider renaming the parameter to avoid the shadow:

Suggested change
- const subBlocks = block.subBlocks as Record<string, { value?: any }>
- const providerId = BLOCK_TYPE_TO_PROVIDER[block.blockType]
- if (providerId) {
-   const val = subBlocks?.apiKey?.value
-   if (typeof val === 'string' && val.trim()) {
-     const refs = providerKeys.get(providerId) ?? []
+ async function processWorkspace(
+   workspaceId: string,
+   allBlockTypes: string[],
+   userFilter: ReturnType<typeof sql>,
+   total: number,
+   workspaceIndex: number
+ ): Promise<WorkspaceResult> {

Then update all references to index inside the function to workspaceIndex.
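The shadowing described in this comment can be reproduced in isolation. In the sketch below, the module-level `index` function is a hypothetical stand-in for the `index` helper imported from drizzle-orm/pg-core, and both `describeProgress` functions are invented for illustration.

```typescript
// Hypothetical stub standing in for the imported `index` helper.
function index(name: string): string {
  return `index:${name}`
}

// Here the parameter hides the module-level `index`: inside the body,
// `index` is a number and the helper above cannot be called by name.
function describeProgress(total: number, index: number): string {
  return `[${index + 1}/${total}]`
}

// After renaming the parameter, both names are usable side by side.
function describeProgressRenamed(total: number, workspaceIndex: number): string {
  return `[${workspaceIndex + 1}/${total}] ${index('workspace_id')}`
}

console.log(describeProgress(10, 0)) // [1/10]
console.log(describeProgressRenamed(10, 0)) // [1/10] index:workspace_id
```

Nothing breaks at runtime in the shadowed version; the cost is that a later edit that needs the helper inside the function would silently refer to the number instead.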

Comment on lines +221 to +223

// ---------- DB ----------
const postgresClient = postgres(CONNECTION_STRING, {

SLEEP_MS value contradicts the code comment

SLEEP_MS is set to 30_000 (30 seconds), but the processing loop (line ~620) has a comment that says "pausing for 60s after each 1000". One of the two needs to be corrected to avoid confusion for operators running this script.

Suggested change
- // ---------- DB ----------
- const postgresClient = postgres(CONNECTION_STRING, {
+ const SLEEP_MS = 60_000

Comment on lines +547 to +551
continue
}

try {
const encrypted = await encryptSecret(chosen.key)

Dry-run writes workspace IDs even when all keys fail to resolve

shouldWriteWorkspaceId is set to DRY_RUN (i.e., always true in dry-run mode) for any workspace that has at least one matching block. However, if every API key reference for a workspace fails to resolve — for example because all values are {{ENV_VAR}} references that don't exist in the environment tables — resolved.length === 0 for every provider and the continue is hit each time. The workspace ID is still written to migrate-byok-workspace-ids.txt.

When the live run later reads that file, it processes the workspace, finds no insertable keys, and does nothing — wasting a DB round-trip per failed workspace. More importantly, the dry-run summary message ("Wrote ${stats.workspacesProcessed} workspace IDs (with keys)") will report a higher count than the number of IDs that will actually produce inserts.

Consider tracking a per-workspace boolean that is only set to true when at least one resolved.length > 0 path is reached:

let hasInsertableKey = false
// ... inside the provider loop, before `continue`:
if (resolved.length === 0) continue
hasInsertableKey = true
// ...
return { stats, shouldWriteWorkspaceId: DRY_RUN && hasInsertableKey }

Comment on lines +243 to +248

function extractEnvVarName(value: string): string | null {
  const match = ENV_VAR_PATTERN.exec(value)
  return match ? match[1].trim() : null
}


Priority ordering and log message are inconsistent

KEY_SOURCE_PRIORITY assigns plaintext = 0, workspace = 1, personal = 2, and the sort at line ~503 is ascending — so resolved[0] is the entry with the lowest priority number (i.e., a bare plaintext key in a block wins over a workspace env-var reference). The conflict log on line ~514 says "Using highest-priority key", which contradicts the ascending sort (it actually picks the lowest-numbered entry).

This is at minimum a confusing log message. More importantly, it's worth verifying the intended precedence: should a workspace-level env var key take precedence over a plaintext block key, or the other way around? If a workspace env var should win, the priority values should be inverted (plaintext: 2, workspace: 1, personal: 0) or the sort order should be descending.
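The effect of the ascending sort can be checked with a small sketch. The priority values are the ones quoted in this comment; the `ResolvedKey` type and `pickKey` helper are hypothetical names, and the conflict check (more than one distinct key) follows the review flowchart.

```typescript
type KeySource = 'plaintext' | 'workspace' | 'personal'

interface ResolvedKey {
  key: string
  source: KeySource
}

// Values as reported in the review: the lowest number sorts first.
const KEY_SOURCE_PRIORITY: Record<KeySource, number> = {
  plaintext: 0,
  workspace: 1,
  personal: 2,
}

function pickKey(resolved: ResolvedKey[]): { chosen: ResolvedKey | null; conflict: boolean } {
  // Copy before sorting so the caller's array is left untouched.
  const sorted = [...resolved].sort(
    (a, b) => KEY_SOURCE_PRIORITY[a.source] - KEY_SOURCE_PRIORITY[b.source]
  )
  const distinctKeys = new Set(sorted.map((r) => r.key))
  return { chosen: sorted[0] ?? null, conflict: distinctKeys.size > 1 }
}

const result = pickKey([
  { key: 'sk-personal', source: 'personal' },
  { key: 'sk-plain', source: 'plaintext' },
  { key: 'sk-workspace', source: 'workspace' },
])
console.log(result.chosen?.source, result.conflict) // plaintext true
```

With these values the plaintext block key wins any conflict. If the workspace env var is meant to win, swapping the numbers for `plaintext` and `workspace` is enough; the sort itself does not need to change.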

@cursor cursor bot left a comment

Cursor Bugbot has reviewed your changes and found 1 potential issue.

There are 2 total unresolved issues (including 1 from previous review).

@TheodoreSpeaks TheodoreSpeaks merged commit 5ba3118 into staging Mar 14, 2026
12 checks passed
@TheodoreSpeaks TheodoreSpeaks deleted the feat/migrate-byok-script branch March 14, 2026 20:11