24 changes: 20 additions & 4 deletions flash/apps/local-testing.mdx
@@ -36,18 +36,34 @@ flash run --host 0.0.0.0
# Call a queue-based endpoint (gpu_worker.py)
curl -X POST http://localhost:8888/gpu_worker/runsync \
-H "Content-Type: application/json" \
-d '{"message": "Hello from Flash"}'
-d '{"input": {"input_data": {"message": "Hello from Flash"}}}'

# Call a load-balanced endpoint (lb_worker.py)
curl -X POST http://localhost:8888/lb_worker/process \
-H "Content-Type: application/json" \
-d '{"data": "test"}'
-d '{"input_data": {"message": "Hello from Flash"}}'
```

<Note>
Queue-based endpoints require the `{"input": {...}}` wrapper to match the deployed endpoint behavior. The inner payload structure maps to your function's parameter names—the skeleton template uses `input_data: dict`, so the payload is `{"input_data": {...}}`. Load-balanced endpoints accept the payload directly without the `input` wrapper.
</Note>
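
For a concrete picture of the parameter mapping the note describes, here is a minimal sketch. The handler name and payload are illustrative only, not Flash's API; the `input_data: dict` signature comes from the skeleton template mentioned above.

```python
# Hypothetical queue-based handler matching the skeleton template's
# `input_data: dict` signature (names are illustrative, not Flash's API).
def gpu_worker_handler(input_data: dict) -> dict:
    """Echo the payload back."""
    return {"echo": input_data}

# The request body sent to a queue-based endpoint:
body = {"input": {"input_data": {"message": "Hello from Flash"}}}

# Keys of the inner dict map onto the function's parameter names:
result = gpu_worker_handler(**body["input"])
print(result)  # → {'echo': {'message': 'Hello from Flash'}}
```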

### Using the API explorer

Open [http://localhost:8888/docs](http://localhost:8888/docs) in your browser to access the interactive Swagger UI. You can test all endpoints directly from the browser.

Flash extracts the first line of each function's docstring and displays it as the endpoint description in the API explorer. Add docstrings to your `@Endpoint` functions to make your API self-documenting:
Citation: PR #215 adds docstring extraction via RemoteFunctionMetadata.docstring in scanner.py and displays it in run.py's startup table and Swagger UI summary using _escape_summary() helper.

```python
@Endpoint(name="gpu-worker", gpu=GpuGroup.ANY)
def process_data(input_data: dict) -> dict:
"""Process input data and return computed results."""
# Function implementation
return {"result": "processed"}
```

The docstring "Process input data and return computed results" appears in the Swagger UI, making it easier to understand what each endpoint does.
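
The first-line extraction can be sketched with the standard library. Note that `first_docstring_line` is a hypothetical helper for illustration, not part of Flash:

```python
import inspect

def first_docstring_line(fn) -> str:
    """Return the first line of a function's docstring, or '' if it has none."""
    doc = inspect.getdoc(fn)  # dedents and strips the raw docstring
    return doc.splitlines()[0] if doc else ""

def process_data(input_data: dict) -> dict:
    """Process input data and return computed results."""
    return {"result": "processed"}

print(first_docstring_line(process_data))
# → Process input data and return computed results.
```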

### Using Python

```python
@@ -56,14 +72,14 @@ import requests
# Call queue-based endpoint
response = requests.post(
"http://localhost:8888/gpu_worker/runsync",
json={"message": "Hello from Flash"}
json={"input": {"input_data": {"message": "Hello from Flash"}}}
)
print(response.json())

# Call load-balanced endpoint
response = requests.post(
"http://localhost:8888/lb_worker/process",
json={"data": "test"}
json={"input_data": {"message": "Hello from Flash"}}
)
print(response.json())
```
25 changes: 24 additions & 1 deletion flash/cli/run.mdx
@@ -47,6 +47,25 @@ Enable or disable auto-reload on code changes. Enabled by default.
Auto-provision all Serverless endpoints on startup instead of lazily on first call. Eliminates cold-start delays during development.
</ResponseField>

## Endpoint descriptions from docstrings
Citation: PR #215 introduces function_docstrings in WorkerInfo dataclass and displays docstrings in the startup table's "Description" column and as Swagger UI summary via the summary= parameter.

Flash extracts the first line of each function's docstring and uses it in two places:

- **Startup table**: The "Description" column shows the docstring when the server starts.
- **Swagger UI**: The endpoint summary in the API explorer at `/docs`.

Add docstrings to your `@Endpoint` functions to make your API self-documenting:

```python
@Endpoint(name="text-processor", gpu=GpuGroup.ANY)
def analyze_text(text: str) -> dict:
"""Analyze text and return sentiment scores."""
# Implementation here
return {"sentiment": "positive"}
```

When you run `flash run`, the startup table displays "Analyze text and return sentiment scores" as the description for this endpoint, and the same text appears in the Swagger UI summary.
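
Per the citation above, the summary reaches Swagger through the route's `summary=` parameter. A rough stdlib-only sketch of what would land in the generated OpenAPI schema (the route path is illustrative; Flash's actual routing may differ):

```python
import inspect

def analyze_text(text: str) -> dict:
    """Analyze text and return sentiment scores."""
    return {"sentiment": "positive"}

# Take the first line of the docstring as the endpoint summary.
doc = inspect.getdoc(analyze_text)
summary = doc.splitlines()[0] if doc else ""

# Roughly the fragment of the OpenAPI schema this would produce
# (path and structure are illustrative, not Flash's output):
openapi_fragment = {
    "paths": {"/text-processor/runsync": {"post": {"summary": summary}}}
}
print(summary)  # → Analyze text and return sentiment scores.
```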

## Architecture

With `flash run`, Flash starts a local development server alongside remote Serverless endpoints:
@@ -136,14 +155,18 @@ curl http://localhost:8888/
# Call a queue-based GPU endpoint (gpu_worker.py)
curl -X POST http://localhost:8888/gpu_worker/runsync \
-H "Content-Type: application/json" \
-d '{"message": "Hello from GPU!"}'
-d '{"input": {"message": "Hello from GPU!"}}'

Citation: PR #215 introduces make_wrapped_model() helper that wraps request body in an input envelope for queue-based endpoints, changing the request format from direct JSON to {"input": {...}}.

# Call a load-balanced endpoint (lb_worker.py)
curl -X POST http://localhost:8888/lb_worker/process \
-H "Content-Type: application/json" \
-d '{"data": "test"}'
```

<Note>
Queue-based endpoints require the `{"input": {...}}` wrapper format to match deployed endpoint behavior. Load-balanced endpoints accept direct JSON payloads.
</Note>
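
The wrapper requirement can be sketched as a tiny unwrap step. This is a hypothetical helper for illustration, not the actual `make_wrapped_model()` implementation:

```python
def unwrap_queue_payload(body: dict) -> dict:
    """Extract the inner payload from the {"input": {...}} envelope."""
    if "input" not in body:
        raise ValueError('queue-based endpoints expect an {"input": {...}} wrapper')
    return body["input"]

print(unwrap_queue_payload({"input": {"message": "Hello from GPU!"}}))
# → {'message': 'Hello from GPU!'}
```

Load-balanced endpoints skip this step entirely and receive the JSON body as-is.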

Open http://localhost:8888/docs for the interactive API explorer.

## Requirements