chore: add flash agent skill co-located with source code#254

Closed
TimPietruskyRunPod wants to merge 24 commits into main from chore/add-flash-skill

@TimPietruskyRunPod
Member

Summary

  • Adds flash/SKILL.md (588 lines) rewritten around the unified Endpoint class API
  • Replaces the old skill in runpod/skills which documented the deprecated 8-class resource hierarchy (LiveServerless, CpuLiveServerless, etc.)
  • Co-locating the skill with source code ensures it stays in sync with the codebase
  • Discoverable via npx skills add runpod/flash

Skill contents

  1. Quick reference and getting started (flash login, flash init, flash run, flash deploy)
  2. Four Endpoint modes: QB decorator, LB decorator, external image client, existing endpoint client
  3. Full constructor parameter table (verified against endpoint.py)
  4. EndpointJob API
  5. GPU groups & types, CPU instance types (verified against enums)
  6. Cloudpickle scoping rules
  7. All CLI commands (verified against cli/main.py)
  8. Common patterns matching skeleton templates
  9. Architecture overview and common gotchas

Test plan

  • Verify npx skills add runpod/flash discovers the skill
  • Verify code examples match skeleton templates (gpu_worker.py, cpu_worker.py, lb_worker.py)
  • Verify all constructor parameters match Endpoint.__init__ in endpoint.py
  • Verify CLI commands match registrations in cli/main.py
  • Verify GPU/CPU enums match current definitions

@TimPietruskyRunPod TimPietruskyRunPod marked this pull request as draft March 5, 2026 17:26
Adds flash/SKILL.md rewritten around the unified Endpoint class API.
Replaces the old skill in runpod/skills which documented the deprecated
8-class resource hierarchy. Co-locating the skill ensures it stays in
sync with the codebase. Discoverable via `npx skills add runpod/flash`.

Remove content an agent doesn't need: architecture internals, full
enum listings, verbose CLI option tables, redundant code patterns.
Keep: constructor params, four modes, cloudpickle rules, gotchas.
Point agents to source files for enum details they can read themselves.

- replace v1.6.0 content with eval-tested v1.7.0 skill
- remove non-existent flash login command
- fix GpuType.ANY to GpuGroup (GpuType has no ANY member)
- consolidate from four modes to three (QB, LB, client)
- all examples use Endpoint class exclusively
- scored 18/18 on eval assertions across 3 test prompts

@TimPietruskyRunPod TimPietruskyRunPod marked this pull request as ready for review March 5, 2026 22:21
Contributor

@runpod-Henrik runpod-Henrik left a comment

PR #254 Review: chore: add flash agent skill co-located with source code

1. Bug: Fabricated CLI subcommands (high confidence — verified against cli/main.py and flash deploy --help)

The CLI section documents subcommands that don't exist:

flash deploy new staging      # ❌ not a real command
flash deploy send staging     # ❌ not a real command
flash deploy list staging     # ❌ not a real command
flash deploy info staging     # ❌ not a real command
flash deploy delete staging   # ❌ not a real command

The actual flash deploy is a single command with --env:

flash deploy --env staging           # build + deploy to staging
flash deploy --exclude torch,pkg2    # exclude packages

Environment management is via flash env:

flash env list
flash env create
flash env get
flash env delete
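
A check along these lines could catch fabricated subcommands before a skill is merged. This is a minimal sketch, not part of flash: the function name `find_bad_deploy_lines` is hypothetical, and the only rule it encodes is the one stated above, that `flash deploy` takes options such as `--env` rather than positional subcommands.

```python
import shlex


def find_bad_deploy_lines(text: str) -> list[str]:
    """Return lines that call `flash deploy` with a positional subcommand.

    Assumption (taken from the review above): `flash deploy` is a single
    command configured through options, so the first token after `deploy`
    must be absent or start with `-`.
    """
    bad = []
    for raw in text.splitlines():
        # Drop trailing shell comments before tokenizing.
        command = raw.split("#", 1)[0].strip()
        if not command:
            continue
        tokens = shlex.split(command)
        if tokens[:2] != ["flash", "deploy"]:
            continue
        if len(tokens) > 2 and not tokens[2].startswith("-"):
            bad.append(command)
    return bad


skill_snippet = """
flash deploy new staging      # fabricated
flash deploy --env staging    # real
flash env list
"""
print(find_bad_deploy_lines(skill_snippet))  # ['flash deploy new staging']
```

Run against the skill's CLI section, a non-empty result would flag lines an agent is likely to copy and fail on.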

2. Issue: execution_timeout_ms missing from the constructor table

Endpoint.__init__ has execution_timeout_ms: int = 0, but it's absent from the skill's parameter table. This is user-facing (needed for long-running jobs) and was the subject of a known fix (the executionTimeoutMs → execution_timeout_ms snake_case rename in 1.7.0).

3. Question: "Auto GPU switching requires workers >= 5"

Gotcha #8 states this as a rule. Is this a documented platform policy? If not verified, it could mislead users into setting unnecessarily high worker counts.

4. Nit: ADA_80_PRO VRAM labeled "80GB" but includes H100 NVL

The source docstring says: "NVIDIA H100 PCIe, NVIDIA H100 80GB HBM3, NVIDIA H100 NVL". H100 NVL is 94GB, not 80GB. Worth noting the label is approximate.
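
To make the approximation concrete, the sketch below records per-device VRAM for the three devices named in the docstring and derives a label from the range. The dictionary and the `vram_range` helper are illustrative and not taken from flash; the 80 GB and 94 GB figures are the memory sizes cited in this comment.

```python
# Illustrative VRAM figures (GB) for the devices listed in the ADA_80_PRO
# docstring. This mapping is an assumption for the example, not flash source.
ADA_80_PRO_VRAM_GB = {
    "NVIDIA H100 PCIe": 80,
    "NVIDIA H100 80GB HBM3": 80,
    "NVIDIA H100 NVL": 94,
}


def vram_range(devices: dict[str, int]) -> tuple[int, int]:
    """Return the (minimum, maximum) VRAM across a group's devices."""
    sizes = devices.values()
    return min(sizes), max(sizes)


low, high = vram_range(ADA_80_PRO_VRAM_GB)
# Use a single figure only when every device in the group matches it.
label = f"{low}GB" if low == high else f"{low}GB+ (up to {high}GB)"
print(label)  # 80GB+ (up to 94GB)
```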

5. Nit: Missing less-common constructor params

accelerate_downloads, datacenter, scaler_type, scaler_value are in __init__ but absent from the table. Fine to omit as advanced, but accelerate_downloads=False is a useful workaround for slow dep installs.
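
Gaps like the ones in items 2 and 5 could be found mechanically by diffing the constructor signature against the documented names. The sketch below uses a stand-in `Endpoint` class so it runs on its own; the `name` parameter, the defaults other than `execution_timeout_ms: int = 0`, and the `documented_in_skill` set are assumptions for illustration, not the real definitions in endpoint.py.

```python
import inspect


class Endpoint:
    """Stand-in for the real class; lists only parameters named in this review."""

    def __init__(
        self,
        name: str,  # hypothetical placeholder parameter
        execution_timeout_ms: int = 0,
        accelerate_downloads: bool = True,  # default is an assumption
        datacenter=None,  # type and default are assumptions
        scaler_type=None,  # type and default are assumptions
        scaler_value=None,  # type and default are assumptions
    ):
        pass


def undocumented_params(cls: type, documented: set[str]) -> list[str]:
    """List __init__ parameters (excluding self) that the docs do not mention."""
    params = inspect.signature(cls.__init__).parameters
    return [p for p in params if p != "self" and p not in documented]


documented_in_skill = {"name"}  # hypothetical contents of the skill's table
print(undocumented_params(Endpoint, documented_in_skill))
```

In practice the stand-in would be replaced by an import of the real class, and the documented set would be parsed from the parameter table in SKILL.md.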


Verdict: NEEDS WORK — The CLI section needs to be corrected before merge. An agent using this skill will generate flash deploy new/send/list commands that fail immediately.

🤖 Reviewed by Henrik's AI-Powered Bug Finder

- Replace made-up `flash deploy new/send/list/info/delete` with actual
  `flash deploy --env`, `flash env list/create/get/delete` commands
- Add `flash deploy --preview` for local Docker preview
- Add `execution_timeout_ms` to Endpoint constructor
@TimPietruskyRunPod
Member Author

we continue in runpod/skills#7
