Flash is a Python SDK for developing cloud-native AI apps where you define everything—hardware, remote functions, and dependencies—using local code.
```python
import asyncio

from runpod_flash import Endpoint, GpuType

# Mark the function below for remote execution
@Endpoint(name="hello-gpu", gpu=GpuType.NVIDIA_GEFORCE_RTX_4090, dependencies=["torch"])
async def hello():  # This function runs on Runpod
    import torch

    gpu_name = torch.cuda.get_device_name(0)
    print(f"Hello from your GPU! ({gpu_name})")
    return {"gpu": gpu_name}

asyncio.run(hello())
print("Done!")  # This runs locally
```

Write `@Endpoint`-decorated Python functions on your local machine. Run them, and Flash automatically handles GPU/CPU provisioning and worker scaling on Runpod Serverless.
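To see why the decorator style works so naturally here, it helps to look at the general pattern: a decorator can wrap a plain function into an awaitable call that dispatches the work elsewhere. The toy sketch below is NOT Flash's actual implementation — just a minimal local illustration of the idea, with the remote dispatch replaced by a local call:

```python
import asyncio
import functools

def endpoint(name):
    """Toy stand-in for a remote-execution decorator (not Flash's real code)."""
    def wrap(fn):
        @functools.wraps(fn)
        async def call(*args, **kwargs):
            # A real SDK would serialize the call, provision a worker,
            # and execute fn remotely; here we just run it locally.
            print(f"[{name}] dispatching {fn.__name__}...")
            return fn(*args, **kwargs)
        return call
    return wrap

@endpoint(name="demo")
def add(a, b):
    return a + b

print(asyncio.run(add(2, 3)))  # 5
```

The key point is that the decorated function keeps its original signature while the caller simply `await`s it, which is exactly the calling convention the Flash examples use.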
Install Flash using pip or uv:
```bash
# Install with pip
pip install runpod-flash

# Or with uv
uv add runpod-flash
```

Flash requires Python 3.10+ and is currently available for macOS and Linux. Windows support is in development.
Before you can use Flash, you need to authenticate with your Runpod account:
```bash
flash login
```

This saves your API key securely and allows you to use the Flash CLI and run `@Endpoint` functions.
Install the Flash skill package for AI coding agents like Claude Code, Cline, and Cursor:
```bash
npx skills add runpod/skills
```

You can review the SKILL.md file in the runpod/skills repository.
Create `gpu_demo.py`:

```python
import asyncio

from runpod_flash import Endpoint, GpuType

@Endpoint(
    name="flash-quickstart",
    gpu=GpuType.NVIDIA_GEFORCE_RTX_4090,
    workers=3,
    dependencies=["numpy", "torch"],
)
def gpu_matrix_multiply(size):
    # IMPORTANT: Import packages INSIDE the function
    import numpy as np
    import torch

    # Get GPU name
    device_name = torch.cuda.get_device_name(0)

    # Create random matrices
    A = np.random.rand(size, size)
    B = np.random.rand(size, size)

    # Multiply matrices
    C = np.dot(A, B)

    return {
        "matrix_size": size,
        "result_mean": float(np.mean(C)),
        "gpu": device_name,
    }

# Call the function
async def main():
    print("Running matrix multiplication on Runpod GPU...")
    result = await gpu_matrix_multiply(1000)
    print(f"\n✓ Matrix size: {result['matrix_size']}x{result['matrix_size']}")
    print(f"✓ Result mean: {result['result_mean']:.4f}")
    print(f"✓ GPU used: {result['gpu']}")

if __name__ == "__main__":
    asyncio.run(main())
```

Run it:

```bash
python gpu_demo.py
```

The first run takes 30-60 seconds (provisioning). Subsequent runs take 2-3 seconds.
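Before spending GPU time, you can sanity-check the expected result locally. Each entry of C = A·B is a sum of `size` products of independent uniform [0, 1) values, and each such product has expected value 0.25, so `result_mean` should land near `size / 4` (about 250 for size=1000). A pure-Python check at a smaller size:

```python
import random

random.seed(0)
size = 50

# Build two size x size matrices of uniform [0, 1) values
A = [[random.random() for _ in range(size)] for _ in range(size)]
B = [[random.random() for _ in range(size)] for _ in range(size)]

# Plain-Python matrix multiply
C = [[sum(A[i][k] * B[k][j] for k in range(size)) for j in range(size)]
     for i in range(size)]

mean = sum(sum(row) for row in C) / (size * size)
print(f"mean = {mean:.2f}, expected ~ {size / 4}")  # close to 12.5
```

The same reasoning gives you a quick plausibility check on whatever `result_mean` the GPU run prints.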
- Remote execution: `@Endpoint` functions run on Runpod Serverless GPUs/CPUs
- Auto-scaling: Workers scale from 0 to N based on demand
- Dependency management: Packages install automatically on remote workers
- Two patterns: Queue-based (`@Endpoint`) for batch work, load-balanced (`Endpoint()` + routes) for REST APIs
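The queue-based pattern is essentially a worker pool: jobs accumulate in a queue and N workers drain it, which is why scaling is just a `workers=N` knob. A generic asyncio sketch of that idea (illustrative only, not Flash internals):

```python
import asyncio

async def worker(name, queue, results):
    # Each worker pulls jobs off the shared queue until cancelled
    while True:
        job = await queue.get()
        results.append((name, job * 2))  # stand-in for real work
        queue.task_done()

async def main():
    queue = asyncio.Queue()
    results = []
    for job in range(6):
        queue.put_nowait(job)
    # Scale to 3 workers, analogous to workers=3 in the decorator
    workers = [asyncio.create_task(worker(f"w{i}", queue, results))
               for i in range(3)]
    await queue.join()  # wait until every job is processed
    for w in workers:
        w.cancel()
    await asyncio.gather(*workers, return_exceptions=True)
    return results

results = asyncio.run(main())
print(sorted(r for _, r in results))  # [0, 2, 4, 6, 8, 10]
```

The load-balanced pattern skips the queue: each request is routed straight to a worker, which is the better fit for latency-sensitive REST APIs.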
Full documentation: docs.runpod.io/flash
- Quickstart - First GPU workload in 5 minutes
- Create endpoints - Queue-based, load-balancing, and custom Docker endpoints
- CLI reference - `flash run`, `flash deploy`, `flash build`
- Configuration - All endpoint parameters
When you're ready to move beyond scripts and build a production-ready API, you can create a Flash app (a collection of interconnected endpoints with diverse hardware configurations) and deploy it to Runpod.
Follow this tutorial to build your first Flash app.
The Flash CLI provides a set of commands for managing your Flash apps and endpoints.
```bash
flash --help
```

Learn more about the Flash CLI.
Browse working examples: github.com/runpod/flash-examples
- Python 3.10+
- macOS or Linux (Windows support in development)
- Runpod account with API key
We welcome contributions! See RELEASE_SYSTEM.md for development workflow.
```bash
# Clone and install
git clone https://github.com/runpod/flash.git
cd flash
pip install -e ".[dev]"

# Use conventional commits
git commit -m "feat: add new feature"
git commit -m "fix: resolve issue"
```

- Discord - Community support
- GitHub Issues - Bug reports
MIT License - see LICENSE for details.