diff --git a/flash/apps/local-testing.mdx b/flash/apps/local-testing.mdx
index a0f12cf3..15463e7a 100644
--- a/flash/apps/local-testing.mdx
+++ b/flash/apps/local-testing.mdx
@@ -36,18 +36,34 @@ flash run --host 0.0.0.0
# Call a queue-based endpoint (gpu_worker.py)
curl -X POST http://localhost:8888/gpu_worker/runsync \
-H "Content-Type: application/json" \
- -d '{"message": "Hello from Flash"}'
+ -d '{"input": {"input_data": {"message": "Hello from Flash"}}}'
# Call a load-balanced endpoint (lb_worker.py)
curl -X POST http://localhost:8888/lb_worker/process \
-H "Content-Type: application/json" \
- -d '{"data": "test"}'
+ -d '{"input_data": {"message": "Hello from Flash"}}'
```
+
+Queue-based endpoints require the `{"input": {...}}` wrapper to match the deployed endpoint behavior. The inner payload structure maps to your function's parameter names: the skeleton template uses `input_data: dict`, so the payload is `{"input_data": {...}}`. Load-balanced endpoints accept the payload directly, without the `input` wrapper.
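+
+As a rough sketch of that mapping (hypothetical; `process_data` stands in for your endpoint function, and the exact dispatch mechanics are an assumption):
+
+```python
+# Hypothetical sketch: how the queue wrapper maps onto function parameters
+payload = {"input": {"input_data": {"message": "Hello from Flash"}}}
+kwargs = payload["input"]  # {"input_data": {"message": "Hello from Flash"}}
+# Flash effectively calls process_data(**kwargs),
+# i.e. process_data(input_data={"message": "Hello from Flash"})
+```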
+
+
### Using the API explorer
Open [http://localhost:8888/docs](http://localhost:8888/docs) in your browser to access the interactive Swagger UI. You can test all endpoints directly from the browser.
+
+Flash extracts the first line of each function's docstring and displays it as the endpoint description in the API explorer. Add docstrings to your `@Endpoint` functions to make your API self-documenting:
+
+```python
+@Endpoint(name="gpu-worker", gpu=GpuGroup.ANY)
+def process_data(input_data: dict) -> dict:
+ """Process input data and return computed results."""
+ # Function implementation
+ return {"result": "processed"}
+```
+
+The docstring "Process input data and return computed results" appears in the Swagger UI, making it easier to understand what each endpoint does.
+
### Using Python
```python
@@ -56,14 +72,14 @@ import requests
# Call queue-based endpoint
response = requests.post(
"http://localhost:8888/gpu_worker/runsync",
- json={"message": "Hello from Flash"}
+ json={"input": {"input_data": {"message": "Hello from Flash"}}}
)
print(response.json())
# Call load-balanced endpoint
response = requests.post(
"http://localhost:8888/lb_worker/process",
- json={"data": "test"}
+ json={"input_data": {"message": "Hello from Flash"}}
)
print(response.json())
```
diff --git a/flash/cli/run.mdx b/flash/cli/run.mdx
index 45b23380..78d7fe6f 100644
--- a/flash/cli/run.mdx
+++ b/flash/cli/run.mdx
@@ -47,6 +47,25 @@ Enable or disable auto-reload on code changes. Enabled by default.
Auto-provision all Serverless endpoints on startup instead of lazily on first call. Eliminates cold-start delays during development.
+## Endpoint descriptions from docstrings
+
+Flash extracts the first line of each function's docstring and uses it in two places:
+
+- **Startup table**: The "Description" column shows the docstring when the server starts.
+- **Swagger UI**: The endpoint summary in the API explorer at `/docs`.
+
+Add docstrings to your `@Endpoint` functions to make your API self-documenting:
+
+```python
+@Endpoint(name="text-processor", gpu=GpuGroup.ANY)
+def analyze_text(text: str) -> dict:
+ """Analyze text and return sentiment scores."""
+ # Implementation here
+ return {"sentiment": "positive"}
+```
+
+When you run `flash run`, the startup table displays "Analyze text and return sentiment scores" as the description for this endpoint, and the same text appears in the Swagger UI summary.
+
## Architecture
With `flash run`, Flash starts a local development server alongside remote Serverless endpoints:
@@ -136,7 +155,7 @@ curl http://localhost:8888/
# Call a queue-based GPU endpoint (gpu_worker.py)
curl -X POST http://localhost:8888/gpu_worker/runsync \
-H "Content-Type: application/json" \
- -d '{"message": "Hello from GPU!"}'
+  -d '{"input": {"input_data": {"message": "Hello from GPU!"}}}'
# Call a load-balanced endpoint (lb_worker.py)
curl -X POST http://localhost:8888/lb_worker/process \
@@ -144,6 +163,10 @@ curl -X POST http://localhost:8888/lb_worker/process \
-  -d '{"data": "test"}'
+  -d '{"input_data": {"message": "Hello from Flash"}}'
```
+
+Queue-based endpoints require the `{"input": {...}}` wrapper format to match deployed endpoint behavior. Load-balanced endpoints accept direct JSON payloads.
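+
+The same queue-based call from Python (a sketch; assumes the dev server is running on the default port and that `gpu_worker.py` uses the skeleton's `input_data: dict` parameter):
+
+```python
+import requests
+
+# Queue-based: wrap the payload in {"input": {...}}; inner keys map to
+# the function's parameter names
+resp = requests.post(
+    "http://localhost:8888/gpu_worker/runsync",
+    json={"input": {"input_data": {"message": "Hello from GPU!"}}},
+)
+print(resp.json())
+```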
+
+
Open http://localhost:8888/docs for the interactive API explorer.
## Requirements