refactor: separate .env from deployed endpoint env vars #257

Open
deanq wants to merge 7 commits into main from deanq/ae-1549-explicit-env-vars

deanq (Member) commented Mar 5, 2026

Summary

  • Stop auto-carrying .env file contents to deployed endpoints
  • ServerlessResource.env defaults to None instead of reading .env via get_env_vars()
  • Delete EnvironmentVars class and environment.py
  • Add deploy-time env preview table with secret masking
  • Stop stripping RUNPOD_API_KEY from explicit resource env in manifest

Motivation

.env files were implicitly dumped into every deployed endpoint, causing:

  • Platform-injected vars (PORT, PORT_HEALTH) overwritten on template updates
  • False config drift from runtime var injection into self.env
  • User confusion about what actually reaches deployed workers

Changes

Commit   Description
bd92886  Delete environment.py, remove get_env_vars(), change env default to None
50ffc7e  Fix broken tests referencing deleted code
d6545f7  Stop stripping RUNPOD_API_KEY from explicit resource env in manifest
4d24808  Deploy-time env preview with secret masking, wired into flash deploy
5b765f6  Doc updates (5 docs + design plan)

How it works now

  • .env still populates os.environ via load_dotenv() for CLI and get_api_key() usage
  • To pass env vars to deployed endpoints, declare them explicitly: env={"HF_TOKEN": os.environ["HF_TOKEN"]}
  • Flash injects runtime vars (RUNPOD_API_KEY, FLASH_MODULE_PATH) into template.env automatically
  • flash deploy renders a preview table showing all env vars per resource before provisioning
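
The rules in this list can be sketched as a small merge function. This is a minimal illustration only: `ResourceSketch` and `effective_env` are hypothetical names, not the real ServerlessResource or any Flash API, and the injection conditions (no API key for load-balanced resources, user-declared values winning over injected ones) are assumptions drawn from the commit notes in this PR.

```python
from dataclasses import dataclass
from typing import Optional


# Hypothetical stand-in for a resource config; not the real ServerlessResource.
@dataclass
class ResourceSketch:
    name: str
    env: Optional[dict[str, str]] = None  # defaults to None, not to .env contents
    makes_remote_calls: bool = True
    is_load_balanced: bool = False


def effective_env(resource: ResourceSketch, api_key: str, module_path: str) -> dict[str, str]:
    """Merge user-declared vars with the runtime vars Flash is described as injecting."""
    env = dict(resource.env or {})  # explicit vars only; None becomes {}
    # Assumption: a user-declared value is kept if the same name is also injected.
    env.setdefault("FLASH_MODULE_PATH", module_path)
    # Assumption: the API key is injected only for non-LB resources that make remote calls.
    if resource.makes_remote_calls and not resource.is_load_balanced:
        env.setdefault("RUNPOD_API_KEY", api_key)
    return env
```

Nothing from the .env file appears in the result unless the user copied it into `env` explicitly, which is the behavior the preview table is meant to make visible.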

Breaking change

ServerlessResource.env no longer defaults to .env file contents. Users relying on implicit carryover must add explicit env={} to their resource configs.
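
For configs that relied on implicit carryover, one possible migration aid is a helper that builds the explicit dict from names the user chooses. `explicit_env` below is a hypothetical helper, not part of Flash; it only wraps `os.environ`, which .env still populates via load_dotenv().

```python
import os


def explicit_env(*names: str) -> dict[str, str]:
    """Build an explicit env dict from os.environ, failing loudly on missing names."""
    missing = [name for name in names if name not in os.environ]
    if missing:
        raise KeyError(f"missing environment variables: {', '.join(missing)}")
    return {name: os.environ[name] for name in names}
```

A resource config could then pass `env=explicit_env("HF_TOKEN")`, so the list of forwarded variables stays visible at the declaration site.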

Companion PR

Test plan

  • All 2259 tests pass (84.56% coverage)
  • Format and lint clean
  • No remaining references to deleted EnvironmentVars or get_env_vars()
  • Manual test: flash deploy with explicit env={} on resource
  • Manual test: flash deploy with no env (verify preview shows only flash-injected vars)

deanq requested a review from Copilot on March 5, 2026 19:13
deanq added 5 commits March 5, 2026 11:19
Change ServerlessResource.env default from get_env_vars() (reads .env)
to None. Delete EnvironmentVars class and get_env_vars(). Template
creation now uses self.env or {} instead of falling back to .env file.

.env still populates os.environ via load_dotenv() in __init__.py for
CLI and get_api_key() usage. This change only affects what gets sent
to deployed endpoints.

Replace TestServerlessResourceEnvLoading tests that imported deleted
get_env_vars/EnvironmentVars with tests that verify env defaults to
None and preserves explicit dicts.

With env separation, resource.env only contains user-declared vars.
If a user explicitly sets RUNPOD_API_KEY in their resource env, it
should be preserved. Runtime injection via _inject_runtime_template_vars()
handles the automatic case.

New module renders a Rich table per resource showing all env vars
that will be sent to deployed endpoints. User-declared vars shown
directly; flash-injected vars (RUNPOD_API_KEY, FLASH_MODULE_PATH)
labeled as 'injected by flash'. Secret values masked based on key
pattern matching (KEY, TOKEN, SECRET, PASSWORD, CREDENTIAL).

Wired into flash deploy: renders before provisioning so users see
exactly what goes to each endpoint.

Clarify that .env populates os.environ for CLI/local dev only.
Resource env vars must be explicit via env={} on resource config.
RUNPOD_API_KEY injected automatically for makes_remote_calls=True.
deanq force-pushed the deanq/ae-1549-explicit-env-vars branch from 5b765f6 to be4269a on March 5, 2026 19:20
Copilot AI (Contributor) left a comment

Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

deanq requested a review from Copilot on March 5, 2026 22:53
Copilot AI (Contributor) left a comment

Pull request overview

Copilot reviewed 17 out of 17 changed files in this pull request and generated 3 comments.

Comments suppressed due to low confidence (1)

src/runpod_flash/cli/utils/env_preview.py:1

  • The _SECRET_PATTERN regex matches any key that contains TOKEN as a substring, so TOKENIZER_PATH gets masked even though it is not a secret. The test at lines 52-55 of test_env_preview.py confirms this behavior, and the docstring ("TOKEN in TOKENIZER should still match") suggests it was a deliberate choice. It will still surprise users who have non-secret vars such as TOKENIZER_PATH or TOKENIZER_CONFIG. Consider matching whole underscore-delimited segments instead, e.g. r"(^|_)(KEY|TOKEN|SECRET|PASSWORD|CREDENTIAL)($|_)". A plain r"\b(KEY|TOKEN|SECRET|PASSWORD|CREDENTIAL)\b" would not help here: _ counts as a word character, so \b finds no boundary inside API_KEY and real secrets would go unmasked.
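
The matching strategies discussed in this comment can be compared directly. The sketch below is an assumption about their shape, not the actual _SECRET_PATTERN from env_preview.py (which may, for example, be case-insensitive); `MATCHERS` and `is_masked` are illustrative names.

```python
import re

NAMES = "KEY|TOKEN|SECRET|PASSWORD|CREDENTIAL"

# Three candidate matchers: plain substring, regex word boundary, underscore segment.
MATCHERS = {
    "substring": re.compile(NAMES),
    "word_boundary": re.compile(rf"\b({NAMES})\b"),
    "segment": re.compile(rf"(^|_)({NAMES})($|_)"),
}


def is_masked(var_name: str) -> dict[str, bool]:
    """Report, per strategy, whether the variable name would be treated as secret."""
    return {label: bool(pattern.search(var_name)) for label, pattern in MATCHERS.items()}
```

With these patterns, RUNPOD_API_KEY and HF_TOKEN are caught by the substring and segment matchers but not by the word-boundary one, and TOKENIZER_PATH is caught only by the substring matcher. TOKEN_COUNT still matches the segment pattern, so segment matching narrows the false positives rather than eliminating them.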
"""Deploy-time env preview: show what env vars go to each endpoint."""

deanq added 2 commits March 5, 2026 16:26
- Fix preview/deploy mismatch: LB endpoints no longer show
  RUNPOD_API_KEY injection in preview (matches _do_deploy behavior)
- Wrap render_env_preview in try-except so preview failures
  don't abort deployment
- Fix stale comment referencing .env file in serverless_cpu.py
- Correct drift detection doc: env is conditionally included
  in hash, not always excluded
- Fix LB architecture doc: LB endpoints get FLASH_MODULE_PATH,
  not RUNPOD_API_KEY
- Update config_hash docstring to reflect exclude_none behavior
- Add tests for render_env_preview, LB API key exclusion,
  and _configure_existing_template with env=None

- Compact env preview into single table with resource column to
  reduce terminal clutter
- Show "user" source label for user-declared vars instead of
  empty string
- Remove local filesystem paths from plan docs
- Fix design doc: env=None omits the key, not stores empty dict
- Update render tests for new table format
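
The notes above on drift detection (env conditionally included in the hash, env=None omitting the key) can be illustrated with a plain-dict sketch. This is not the project's config_hash: the function below is a hypothetical stand-in that assumes the hash is computed over a serialized config with None-valued keys dropped.

```python
import hashlib
import json
from typing import Any


def config_hash_sketch(config: dict[str, Any]) -> str:
    """Hash a config dict after dropping None-valued keys (exclude_none-style)."""
    cleaned = {key: value for key, value in config.items() if value is not None}
    payload = json.dumps(cleaned, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


base = {"name": "worker", "gpu_count": 1}  # illustrative fields only
unset = config_hash_sketch({**base, "env": None})       # key omitted, same as base
declared = config_hash_sketch({**base, "env": {"A": "1"}})  # key present, hash changes
```

Under this assumption, a resource that never declares env hashes the same as one with env=None, while any explicit dict, including an empty one, participates in the hash.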
deanq force-pushed the deanq/ae-1549-explicit-env-vars branch from bab2b46 to 6782404 on March 6, 2026 06:05
runpod-Henrik (Contributor) left a comment

Review

Clean, well-motivated change. The core fix (env defaults to None instead of implicitly loading .env) removes a real footgun. Good test coverage in test_env_separation.py. One confirmed bug and two nits below.


Bug: preview/deploy parity broken — makes_remote_calls default mismatch

_do_deploy calls _check_makes_remote_calls() which defaults to True in all failure cases:

# serverless.py:583
makes_remote_calls = resource_config.get("makes_remote_calls", True)
# also: "Manifest not found, assuming makes_remote_calls=True"
# also: "Resource not in manifest, assuming makes_remote_calls=True"

collect_env_for_preview uses the opposite default:

# env_preview.py
config.get("makes_remote_calls", False)  # ← should be True

On first deploy (no prior build/manifest), the preview shows no RUNPOD_API_KEY for QB endpoints but deploy injects it. The preview is wrong exactly when it matters most — the first time a user runs flash deploy.

Fix: config.get("makes_remote_calls", True) in env_preview.py to match deploy behaviour.
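
The mismatch is easy to reproduce in isolation. The two functions below are simplified stand-ins for the deploy and preview lookups quoted above, not the real code paths; only the `dict.get` defaults are taken from the review.

```python
def deploy_injects_api_key(resource_config: dict) -> bool:
    # Mirrors the deploy-side default quoted above: a missing flag means True.
    return resource_config.get("makes_remote_calls", True)


def preview_shows_api_key(resource_config: dict, default: bool) -> bool:
    # The preview-side lookup, parameterized on the default under discussion.
    return resource_config.get("makes_remote_calls", default)


first_deploy_config = {}  # no manifest entry yet, so the flag is absent

mismatch_before_fix = (
    deploy_injects_api_key(first_deploy_config)
    != preview_shows_api_key(first_deploy_config, default=False)
)
parity_after_fix = (
    deploy_injects_api_key(first_deploy_config)
    == preview_shows_api_key(first_deploy_config, default=True)
)
```

Sharing a single helper (or a single default constant) between preview and deploy would keep the two from drifting apart again.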


Nit: env={} explicit empty dict won't appear in manifest

manifest.py uses:

if hasattr(resource_config, "env") and resource_config.env:

An explicit env={} is falsy so it's silently skipped. Probably the right behaviour (nothing to store), but test_env_separation.py::test_serverless_resource_env_explicit_empty_dict_preserved only checks the resource object — not the manifest path. Worth a test or a comment to make the intent explicit.
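
A small sketch of the truthiness issue, using `SimpleNamespace` as a stand-in for the resource config. Both function names are hypothetical; the first mirrors the quoted condition, the second shows what an explicit None check would do if preserving `env={}` were the intent.

```python
from types import SimpleNamespace


def env_for_manifest_truthy(resource_config):
    """Mirrors the quoted check: empty dicts are skipped along with None."""
    if hasattr(resource_config, "env") and resource_config.env:
        return dict(resource_config.env)
    return None


def env_for_manifest_explicit(resource_config):
    """Alternative: only None (or a missing attribute) counts as 'not declared'."""
    env = getattr(resource_config, "env", None)
    return dict(env) if env is not None else None


empty = SimpleNamespace(env={})
```

With `empty`, the truthy variant returns None while the explicit variant returns `{}`; a manifest-level test could pin down whichever outcome is intended.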


Nit: _MASK_VISIBLE_CHARS = 6 exposes key type prefix

RUNPOD_API_KEY values start with rp_ — 6 visible chars reveals the full type prefix (rp_xxx). Not a security issue since the prefix is public, but worth being intentional about if multiple key formats are ever supported.
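
To make the trade-off concrete, here is a minimal masking sketch. Only the value 6 comes from the review; the masking format (leading characters kept, the rest replaced by asterisks) and the function name `mask_value` are assumptions, not the implementation in env_preview.py.

```python
MASK_VISIBLE_CHARS = 6  # value quoted in the review; the masking format is assumed


def mask_value(value: str, visible: int = MASK_VISIBLE_CHARS) -> str:
    """Keep the first `visible` characters and replace the rest with asterisks."""
    if len(value) <= visible:
        return "*" * len(value)  # too short to reveal any part of it
    return value[:visible] + "*" * (len(value) - visible)
```

With a made-up key such as `rp_abcdef123456`, six visible characters show `rp_abc`, while `visible=3` would show only the `rp_` prefix.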


Positives

  • Removing env_dict.pop("RUNPOD_API_KEY", None) in manifest.py is correct — user-explicit keys should be preserved, not silently stripped
  • except Exception: logger.debug(...) around render_env_preview in deploy.py is the right call — preview failure must not block deploy
  • test_env_separation.py covers the _configure_existing_template None case, which was the trickiest correctness question
  • Companion flash-examples PR (#39) is the right process for a breaking change
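
The best-effort preview pattern praised in the second bullet can be sketched as follows. `deploy_with_preview` and its two callables are hypothetical; the real deploy.py wiring is not shown in this conversation.

```python
import logging

logger = logging.getLogger("deploy_sketch")


def deploy_with_preview(render_preview, provision):
    """Run the preview best-effort, then provision regardless of the preview outcome."""
    try:
        render_preview()
    except Exception as exc:  # deliberately broad: the preview is purely informational
        logger.debug("env preview failed, continuing with deploy: %s", exc)
    return provision()


def broken_preview():
    raise RuntimeError("table rendering failed")


result = deploy_with_preview(broken_preview, lambda: "deployed")
```

Logging at debug level keeps a failed preview out of normal output; whether users should instead see a warning is a separate design choice.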

Verdict: PASS with fix — the makes_remote_calls default is a one-line fix but should land before merge to avoid shipping a misleading preview.

🤖 Reviewed by Henrik's AI-Powered Bug Finder
