From 07d9b844b0e896167dc4dcb2e86492e642742869 Mon Sep 17 00:00:00 2001
From: "promptless[bot]" <179508745+promptless[bot]@users.noreply.github.com>
Date: Mon, 23 Mar 2026 14:39:03 +0000
Subject: [PATCH 1/2] Update Flash local testing docs with request format and
docstring feature
---
flash/apps/local-testing.mdx | 20 ++++++++++++++++++--
flash/cli/run.mdx | 25 ++++++++++++++++++++++++-
2 files changed, 42 insertions(+), 3 deletions(-)
diff --git a/flash/apps/local-testing.mdx b/flash/apps/local-testing.mdx
index a0f12cf3..8ce1a3b9 100644
--- a/flash/apps/local-testing.mdx
+++ b/flash/apps/local-testing.mdx
@@ -36,7 +36,7 @@ flash run --host 0.0.0.0
# Call a queue-based endpoint (gpu_worker.py)
curl -X POST http://localhost:8888/gpu_worker/runsync \
-H "Content-Type: application/json" \
- -d '{"message": "Hello from Flash"}'
+ -d '{"input": {"message": "Hello from Flash"}}'
# Call a load-balanced endpoint (lb_worker.py)
curl -X POST http://localhost:8888/lb_worker/process \
@@ -44,10 +44,26 @@ curl -X POST http://localhost:8888/lb_worker/process \
-d '{"data": "test"}'
```
+
+Queue-based endpoints require the `{"input": {...}}` wrapper format to match the deployed endpoint behavior. Load-balanced endpoints accept direct JSON payloads.
+
+
### Using the API explorer
Open [http://localhost:8888/docs](http://localhost:8888/docs) in your browser to access the interactive Swagger UI. You can test all endpoints directly from the browser.
+Flash extracts the first line of each function's docstring and displays it as the endpoint description in the API explorer. Add docstrings to your `@Endpoint` functions to make your API self-documenting:
+
+```python
+@Endpoint(name="gpu-worker", gpu=GpuGroup.ANY)
+def process_data(data: dict) -> dict:
+ """Process input data and return computed results."""
+ # Function implementation
+ return {"result": "processed"}
+```
+
+The docstring "Process input data and return computed results" appears in the Swagger UI, making it easier to understand what each endpoint does.
+
### Using Python
```python
@@ -56,7 +72,7 @@ import requests
# Call queue-based endpoint
response = requests.post(
"http://localhost:8888/gpu_worker/runsync",
- json={"message": "Hello from Flash"}
+ json={"input": {"message": "Hello from Flash"}}
)
print(response.json())
diff --git a/flash/cli/run.mdx b/flash/cli/run.mdx
index 45b23380..78d7fe6f 100644
--- a/flash/cli/run.mdx
+++ b/flash/cli/run.mdx
@@ -47,6 +47,25 @@ Enable or disable auto-reload on code changes. Enabled by default.
Auto-provision all Serverless endpoints on startup instead of lazily on first call. Eliminates cold-start delays during development.
+## Endpoint descriptions from docstrings
+
+Flash extracts the first line of each function's docstring and uses it in two places:
+
+- **Startup table**: The "Description" column shows the docstring when the server starts.
+- **Swagger UI**: The endpoint summary in the API explorer at `/docs`.
+
+Add docstrings to your `@Endpoint` functions to make your API self-documenting:
+
+```python
+@Endpoint(name="text-processor", gpu=GpuGroup.ANY)
+def analyze_text(text: str) -> dict:
+ """Analyze text and return sentiment scores."""
+ # Implementation here
+ return {"sentiment": "positive"}
+```
+
+When you run `flash run`, the startup table displays "Analyze text and return sentiment scores" as the description for this endpoint, and the same text appears in the Swagger UI summary.
+
## Architecture
With `flash run`, Flash starts a local development server alongside remote Serverless endpoints:
@@ -136,7 +155,7 @@ curl http://localhost:8888/
# Call a queue-based GPU endpoint (gpu_worker.py)
curl -X POST http://localhost:8888/gpu_worker/runsync \
-H "Content-Type: application/json" \
- -d '{"message": "Hello from GPU!"}'
+ -d '{"input": {"message": "Hello from GPU!"}}'
# Call a load-balanced endpoint (lb_worker.py)
curl -X POST http://localhost:8888/lb_worker/process \
@@ -144,6 +163,10 @@ curl -X POST http://localhost:8888/lb_worker/process \
-d '{"data": "test"}'
```
+
+Queue-based endpoints require the `{"input": {...}}` wrapper format to match deployed endpoint behavior. Load-balanced endpoints accept direct JSON payloads.
+
+
Open http://localhost:8888/docs for the interactive API explorer.
## Requirements
From 66733843845054753734bb9f3c163fc5386a3be0 Mon Sep 17 00:00:00 2001
From: "promptless[bot]" <179508745+promptless[bot]@users.noreply.github.com>
Date: Mon, 23 Mar 2026 14:55:20 +0000
Subject: [PATCH 2/2] Sync documentation updates
---
flash/apps/local-testing.mdx | 12 ++++++------
1 file changed, 6 insertions(+), 6 deletions(-)
diff --git a/flash/apps/local-testing.mdx b/flash/apps/local-testing.mdx
index 8ce1a3b9..15463e7a 100644
--- a/flash/apps/local-testing.mdx
+++ b/flash/apps/local-testing.mdx
@@ -36,16 +36,16 @@ flash run --host 0.0.0.0
# Call a queue-based endpoint (gpu_worker.py)
curl -X POST http://localhost:8888/gpu_worker/runsync \
-H "Content-Type: application/json" \
- -d '{"input": {"message": "Hello from Flash"}}'
+ -d '{"input": {"input_data": {"message": "Hello from Flash"}}}'
# Call a load-balanced endpoint (lb_worker.py)
curl -X POST http://localhost:8888/lb_worker/process \
-H "Content-Type: application/json" \
- -d '{"data": "test"}'
+ -d '{"input_data": {"message": "Hello from Flash"}}'
```
-Queue-based endpoints require the `{"input": {...}}` wrapper format to match the deployed endpoint behavior. Load-balanced endpoints accept direct JSON payloads.
+Queue-based endpoints require the `{"input": {...}}` wrapper to match the deployed endpoint behavior. The inner payload structure maps to your function's parameter names: the skeleton template uses `input_data: dict`, so the payload is `{"input_data": {...}}`. Load-balanced endpoints accept the payload directly, without the `input` wrapper.
### Using the API explorer
@@ -56,7 +56,7 @@ Flash extracts the first line of each function's docstring and displays it as th
```python
@Endpoint(name="gpu-worker", gpu=GpuGroup.ANY)
-def process_data(data: dict) -> dict:
+def process_data(input_data: dict) -> dict:
"""Process input data and return computed results."""
# Function implementation
return {"result": "processed"}
@@ -72,14 +72,14 @@ import requests
# Call queue-based endpoint
response = requests.post(
"http://localhost:8888/gpu_worker/runsync",
- json={"input": {"message": "Hello from Flash"}}
+ json={"input": {"input_data": {"message": "Hello from Flash"}}}
)
print(response.json())
# Call load-balanced endpoint
response = requests.post(
"http://localhost:8888/lb_worker/process",
- json={"data": "test"}
+ json={"input_data": {"message": "Hello from Flash"}}
)
print(response.json())
```
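
For reviewers, the two documented behaviors (the `{"input": {...}}` unwrapping for queue-based endpoints, and the first-line docstring summaries shown in the startup table and Swagger UI) can be sketched as follows. This is a minimal illustration only, not Flash's actual implementation; `dispatch_queue_request` and `endpoint_summary` are hypothetical helper names:

```python
# Sketch of the documented behavior; not Flash's real code.
import inspect


def endpoint_summary(fn):
    """Return the first docstring line, as shown in the startup table and API explorer."""
    doc = inspect.getdoc(fn)
    return doc.splitlines()[0] if doc else ""


def dispatch_queue_request(fn, body):
    """Unwrap {"input": {...}} and map inner keys onto the function's parameters."""
    payload = body["input"]  # queue-based endpoints require the wrapper
    params = inspect.signature(fn).parameters
    # Inner keys must match parameter names, e.g. input_data: dict in the skeleton template.
    kwargs = {name: payload[name] for name in params if name in payload}
    return fn(**kwargs)


def process_data(input_data: dict) -> dict:
    """Process input data and return computed results."""
    return {"result": "processed", "echo": input_data}
```

With these helpers, `endpoint_summary(process_data)` yields "Process input data and return computed results.", and `dispatch_queue_request(process_data, {"input": {"input_data": {"message": "Hello"}}})` routes the inner object to the `input_data` parameter, matching the curl examples in the docs.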