From 3e3267cfeef974e2629105ec8127810675188175 Mon Sep 17 00:00:00 2001
From: "promptless[bot]" <179508745+promptless[bot]@users.noreply.github.com>
Date: Mon, 23 Mar 2026 15:11:52 +0000
Subject: [PATCH] Add code example for cross-endpoint dispatch

---
 flash/apps/deploy-apps.mdx | 35 +++++++++++++++++++++++++++++++++++
 1 file changed, 35 insertions(+)

diff --git a/flash/apps/deploy-apps.mdx b/flash/apps/deploy-apps.mdx
index ca19b9dd..e9e57000 100644
--- a/flash/apps/deploy-apps.mdx
+++ b/flash/apps/deploy-apps.mdx
@@ -316,6 +316,41 @@ When one endpoint needs to call a function on another endpoint:
 
 Each endpoint maintains its own connection to the state manager, querying for peer endpoint URLs as needed and caching results for 300 seconds to minimize API calls.
 
+#### Calling another endpoint from your code
+
+To call one endpoint from another, import the target endpoint function **inside** your function body. Flash automatically detects these imports and generates the necessary dispatch stubs.
+
+For example, if you have a GPU worker for inference:
+
+```python gpu_worker.py
+from runpod_flash import Endpoint, GpuType
+
+@Endpoint(
+    name="gpu-inference",
+    gpu=GpuType.NVIDIA_GEFORCE_RTX_4090,
+    dependencies=["torch"]
+)
+async def gpu_inference(payload: dict) -> dict:
+    import torch
+    # GPU inference logic
+    return {"result": "processed"}
+```
+
+You can call it from a CPU-based pipeline endpoint:
+
+```python cpu_worker.py
+from runpod_flash import Endpoint
+
+@Endpoint(name="pipeline", cpu="cpu5c-4-8")
+async def classify(text: str) -> dict:
+    # Import the GPU endpoint inside the function body
+    from gpu_worker import gpu_inference
+
+    # Flash routes this call to the gpu-inference endpoint
+    result = await gpu_inference({"text": text})
+    return {"classification": result}
+```
+
 ## Troubleshooting
 
 ### No @Endpoint functions found