Commit 59a7fcc

deanq and Copilot authored
feat: fully deployed environment with generated handlers (#68)
* refactor: rename FLASH_IS_MOTHERSHIP to FLASH_ENDPOINT_TYPE in lb_handler

  Replace is_mothership with is_lb_endpoint throughout lb_handler.py. Add _is_lb_endpoint() helper that checks FLASH_ENDPOINT_TYPE=lb first, with backward compat for legacy FLASH_IS_MOTHERSHIP=true (logs deprecation warning). Update all log messages and ping endpoint from "mothership" to "LB endpoint" / "QB endpoint" terminology.

* docs: add sync warnings between lb_handler and test copy of _is_lb_endpoint

* refactor: recognize FLASH_ENDPOINT_TYPE in flash deployment detection

  - Add FLASH_ENDPOINT_TYPE check (lb/qb) to is_flash_deployment()
  - Retain FLASH_IS_MOTHERSHIP for backward compatibility
  - Update constants.py comments to use LB endpoint terminology
  - Add tests for new env var, legacy compat, and missing endpoint ID

* test: add FLASH_ENDPOINT_TYPE tests alongside legacy mothership tests

  Add parallel tests using FLASH_ENDPOINT_TYPE=lb for all existing FLASH_IS_MOTHERSHIP tests. Legacy tests preserved as backward compatibility regression tests. Update docstring comment in unpack_volume.py to reflect new env var priority.

* test: verify cross-endpoint routing preserves serialization format

* refactor: remove FLASH_IS_MOTHERSHIP backward compatibility

  Drop legacy FLASH_IS_MOTHERSHIP env var support and deprecation warnings. All detection now uses FLASH_ENDPOINT_TYPE exclusively. Remove duplicate test classes that existed for parallel coverage of both old and new env vars. Remove dead serialization_format/json_result code path in _execute_flash_function and corresponding tests -- these fields do not exist on the current FunctionRequest/FunctionResponse protocol models.

* feat(handler): delegate to generated handlers for deployed QB endpoints

  When FLASH_RESOURCE_NAME is set and a generated handler_<name>.py exists at /app (extracted from the build tarball), use it instead of the FunctionRequest handler.

  This enables deployed QB endpoints to accept plain JSON input without cloudpickle serialization. Falls back gracefully to FunctionRequest handler when:
  - FLASH_RESOURCE_NAME not set (Live Serverless mode)
  - No generated handler file found
  - Generated handler fails to import (with actionable log message)

* refactor(lb): auto-discover handler from FLASH_RESOURCE_NAME

  Replace FLASH_MAIN_FILE/FLASH_APP_VARIABLE env vars with FLASH_RESOURCE_NAME-based handler discovery. The LB handler now derives the handler file path as handler_{resource_name}.py, matching the convention used by flash build's codegen.

* docs: regenerate CLAUDE.md via /analyze-repos

* fix: harden error handling across handler discovery and manifest refresh

  - handler.py: deployed QB endpoints (FLASH_ENDPOINT_TYPE=qb) now fail fast instead of silently falling back to FunctionRequest mode. Split broad except into ImportError/SyntaxError/Exception with appropriate log levels. Add warnings for missing handler file and missing handler attribute.
  - lb_handler.py: extract _discover_lb_app() as testable function with handler_dir parameter. Fix f-string in logger.error to %s-style.
  - unpack_volume.py: remove premature _UNPACKED=True before extraction succeeds. Fix off-by-one in retry sleep that caused 30s delay after final failed attempt.
  - manifest_reconciliation.py: split broad except in _fetch_and_save into expected network errors (warning) vs unexpected errors (error with traceback). Add debug logging to OSError in _is_manifest_stale.
  - test_lb_handler.py: eliminate standalone function copies by importing production code directly with mocked module-level side effects.
  - test_handler.py: add tests for SyntaxError branch, generic Exception branch, deployed QB hard failures, and missing handler attr warning.

* fix(handler): remove hard failure for deployed QB endpoints

  Generated handler files are a new capability introduced in this PR.

  Existing deployed QB endpoints have FLASH_ENDPOINT_TYPE=qb and FLASH_RESOURCE_NAME set but no generated handler files yet. The hard failure (FileNotFoundError/RuntimeError) killed the module import at load time, preventing runpod.serverless.start() from executing and leaving workers unhealthy. Revert to warning-and-fallback for all discovery failures. Keep the improved logging (SyntaxError/Exception split, missing-attr warning) but always fall back to the FunctionRequest handler gracefully.

* fix: add path traversal defense and callable guard for handler discovery

  - Validate FLASH_RESOURCE_NAME cannot resolve outside /app (handler.py) or handler_dir (lb_handler.py) using Path.resolve().is_relative_to()
  - Add callable() check on loaded handler attribute in handler.py
  - Fix docstrings: ImportError -> FileNotFoundError for missing file, remove RUNPOD_POD_ID reference, correct "LB endpoint" terminology
  - Add tests for path traversal and non-callable handler scenarios

* docs: Update CLAUDE.md

  Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* fix: gate ServiceRegistry init on manifest existence

  Live Serverless workers never have flash_manifest.json on disk, so eagerly constructing ServiceRegistry caused unnecessary errors. Guard with Path.exists() and only initialize in Flash Deployed mode.

* chore(deps): bump runpod-flash to >=1.4.0

  Pin minimum version and update lock file from 1.1.1 to 1.4.0.

---------

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
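The unpack_volume.py fix mentioned in the commit message (no sleep after the final failed attempt) is not shown in the rendered diff. A minimal sketch of that retry pattern follows, assuming the failure surfaces as an OSError. `unpack_with_retries` and its `unpack`/`sleep` parameters are hypothetical names for illustration; only the defaults 3 and 30 come from src/constants.py.

```python
import time
from typing import Callable


def unpack_with_retries(
    unpack: Callable[[], None],
    attempts: int = 3,  # mirrors DEFAULT_TARBALL_UNPACK_ATTEMPTS
    interval: float = 30.0,  # mirrors DEFAULT_TARBALL_UNPACK_INTERVAL
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Call unpack() up to `attempts` times, sleeping only between attempts."""
    for attempt in range(1, attempts + 1):
        try:
            unpack()
            return True
        except OSError:  # assumption: unpack failures are OSError subclasses
            # Skip the sleep after the last attempt; waiting there only
            # delays reporting the failure.
            if attempt < attempts:
                sleep(interval)
    return False
```

Passing `sleep` as a parameter keeps the loop testable without real delays: a test can record the requested sleeps and check that exactly `attempts - 1` of them happen on total failure.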
1 parent 8699020 commit 59a7fcc

16 files changed

Lines changed: 757 additions & 361 deletions

CLAUDE.md

Lines changed: 169 additions & 250 deletions
Large diffs are not rendered by default.

docs/Runtime_Execution_Paths.md

Lines changed: 6 additions & 4 deletions
@@ -71,15 +71,17 @@ graph TB

 The handler automatically detects the deployment mode using environment variables:

-| Environment | RUNPOD_POD_ID | FLASH_* vars | Mode Detected |
-|-------------|---------------|--------------|---------------|
+| Environment | RUNPOD_ENDPOINT_ID | FLASH_* vars | Mode Detected |
+|-------------|-------------------|--------------|---------------|
 | Local dev | ❌ Not set | ❌ Not set | Live Serverless only |
 | Live Serverless | ✅ Set | ❌ Not set | Live Serverless |
-| Flash Mothership | ✅ Set | ✅ FLASH_IS_MOTHERSHIP=true | Flash Deployed |
+| Flash LB Endpoint | ✅ Set | ✅ FLASH_ENDPOINT_TYPE=lb | Flash Deployed |
+| Flash QB Endpoint | ✅ Set | ✅ FLASH_ENDPOINT_TYPE=qb | Flash Deployed |
 | Flash Child | ✅ Set | ✅ FLASH_RESOURCE_NAME | Flash Deployed |

 Flash-specific environment variables:
-- `FLASH_IS_MOTHERSHIP=true` - Set for mothership endpoints
+- `FLASH_ENDPOINT_TYPE=lb` - Set for load-balanced endpoints
+- `FLASH_ENDPOINT_TYPE=qb` - Set for queue-based endpoints
 - `FLASH_RESOURCE_NAME` - Specifies resource config name

 ## Request Format Differences
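The detection table in the hunk above can be read as a small decision function. The sketch below is illustrative only: `detect_mode` is a hypothetical name, not a function in this repository, and it assumes that Flash variables without RUNPOD_ENDPOINT_ID are treated like local development (the table does not cover that combination).

```python
import os
from typing import Mapping, Optional


def detect_mode(env: Optional[Mapping[str, str]] = None) -> str:
    """Map environment variables to the 'Mode Detected' column of the table."""
    env = os.environ if env is None else env
    has_endpoint = bool(env.get("RUNPOD_ENDPOINT_ID"))
    has_flash = env.get("FLASH_ENDPOINT_TYPE") in ("lb", "qb") or bool(
        env.get("FLASH_RESOURCE_NAME")
    )
    if has_endpoint and has_flash:
        return "Flash Deployed"
    if has_endpoint:
        return "Live Serverless"
    # Assumption: anything without an endpoint ID behaves like local dev.
    return "Live Serverless only"
```

Taking the environment as a mapping rather than reading `os.environ` directly makes each row of the table easy to check in isolation.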

pyproject.toml

Lines changed: 1 addition & 1 deletion
@@ -14,7 +14,7 @@ dependencies = [
     "huggingface_hub>=0.32.0",
     "fastapi>=0.115.0",
     "uvicorn[standard]>=0.34.0",
-    "runpod-flash",
+    "runpod-flash>=1.4.0",
 ]

 [dependency-groups]

src/constants.py

Lines changed: 2 additions & 2 deletions
@@ -47,6 +47,6 @@
 """Default timeout in seconds for cross-endpoint HTTP requests."""

 DEFAULT_TARBALL_UNPACK_ATTEMPTS = 3
-"""Number of times the mothership CPU will attempt to unpack the worker-flash tarball from mounted volume"""
+"""Number of times the Flash-deployed endpoint will attempt to unpack the worker-flash tarball from mounted volume."""
 DEFAULT_TARBALL_UNPACK_INTERVAL = 30
-"""Time in seconds mothership CPU endpoint will wait between tarball unpack attempts"""
+"""Time in seconds the Flash-deployed endpoint will wait between tarball unpack attempts."""

src/handler.py

Lines changed: 122 additions & 16 deletions
@@ -1,7 +1,9 @@
-from typing import Dict, Any
+import importlib.util
+import logging
+import os
+from pathlib import Path
+from typing import Any, Dict, Optional

-from runpod_flash.protos.remote_execution import FunctionRequest, FunctionResponse
-from remote_executor import RemoteExecutor
 from logger import setup_logging
 from unpack_volume import maybe_unpack

@@ -12,25 +14,129 @@
 # This is a no-op for Live Serverless and local development
 maybe_unpack()

+logger = logging.getLogger(__name__)

-async def handler(event: Dict[str, Any]) -> Dict[str, Any]:
-    """
-    RunPod serverless function handler with dependency installation.
+
+def _load_generated_handler() -> Optional[Any]:
+    """Load Flash-generated handler if available (deployed QB mode).
+
+    Checks for a handler_<resource_name>.py file generated by the flash
+    build pipeline. These handlers accept plain JSON input without
+    FunctionRequest/cloudpickle serialization.
+
+    Returns:
+        Handler function if generated handler found, None otherwise.
     """
-    output: FunctionResponse
+    resource_name = os.getenv("FLASH_RESOURCE_NAME")
+    if not resource_name:
+        return None
+
+    handler_file = Path(f"/app/handler_{resource_name}.py")
+
+    if not handler_file.resolve().is_relative_to(Path("/app").resolve()):
+        logger.warning(
+            "FLASH_RESOURCE_NAME '%s' resolves outside /app. "
+            "Falling back to FunctionRequest handler.",
+            resource_name,
+        )
+        return None
+
+    if not handler_file.exists():
+        logger.warning(
+            "Generated handler file %s not found for resource '%s'. "
+            "The build artifact may be incomplete. "
+            "Falling back to FunctionRequest handler.",
+            handler_file,
+            resource_name,
+        )
+        return None

+    spec = importlib.util.spec_from_file_location(f"handler_{resource_name}", handler_file)
+    if not spec or not spec.loader:
+        logger.warning("Failed to create module spec for %s", handler_file)
+        return None
+
+    mod = importlib.util.module_from_spec(spec)
     try:
-        executor = RemoteExecutor()
-        input_data = FunctionRequest(**event.get("input", {}))
-        output = await executor.ExecuteFunction(input_data)
-
-    except Exception as error:
-        output = FunctionResponse(
-            success=False,
-            error=f"Error in handler: {str(error)}",
+        spec.loader.exec_module(mod)
+    except ImportError as e:
+        logger.warning(
+            "Generated handler %s failed to import (missing dependency: %s). "
+            "Deploy with --use-local-flash to include latest runpod_flash. "
+            "Falling back to FunctionRequest handler.",
+            handler_file,
+            e,
+        )
+        return None
+    except SyntaxError as e:
+        logger.error(
+            "Generated handler %s has a syntax error: %s. "
+            "This indicates a bug in the flash build pipeline. "
+            "Falling back to FunctionRequest handler.",
+            handler_file,
+            e,
+        )
+        return None
+    except Exception as e:
+        logger.error(
+            "Generated handler %s failed to load unexpectedly: %s (%s). "
+            "Falling back to FunctionRequest handler.",
+            handler_file,
+            e,
+            type(e).__name__,
+            exc_info=True,
         )
+        return None
+
+    generated = getattr(mod, "handler", None)
+    if generated is None:
+        logger.warning(
+            "Generated handler %s loaded but has no 'handler' attribute. "
+            "Ensure the flash build pipeline generates a 'handler' function. "
+            "Falling back to FunctionRequest handler.",
+            handler_file,
+        )
+        return None
+
+    if not callable(generated):
+        logger.warning(
+            "Generated handler %s has a 'handler' attribute but it is not callable (%s). "
+            "Falling back to FunctionRequest handler.",
+            handler_file,
+            type(generated).__name__,
+        )
+        return None
+
+    logger.info("Loaded generated handler from %s", handler_file)
+    return generated
+
+
+# Try generated handler first (plain JSON mode for deployed QB endpoints)
+_generated = _load_generated_handler()
+
+if _generated:
+    handler = _generated
+else:
+    # Fallback: original FunctionRequest handler (backward compatible)
+    from runpod_flash.protos.remote_execution import FunctionRequest, FunctionResponse
+    from remote_executor import RemoteExecutor
+
+    async def handler(event: Dict[str, Any]) -> Dict[str, Any]:
+        """RunPod serverless function handler with dependency installation."""
+        output: FunctionResponse
+
+        try:
+            executor = RemoteExecutor()
+            input_data = FunctionRequest(**event.get("input", {}))
+            output = await executor.ExecuteFunction(input_data)
+
+        except Exception as error:
+            output = FunctionResponse(
+                success=False,
+                error=f"Error in handler: {str(error)}",
+            )

-    return output.model_dump()  # type: ignore[no-any-return]
+        return output.model_dump()  # type: ignore[no-any-return]


 # Start the RunPod serverless handler (only available on RunPod platform)
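The path traversal guard in `_load_generated_handler()` relies on `Path.resolve()` followed by `is_relative_to()` (Python 3.9+). The standalone sketch below isolates that check so its behavior is easier to see. `resolve_handler_path` is a hypothetical helper, not part of this commit; the `handler_{name}.py` naming follows the diff.

```python
from pathlib import Path
from typing import Optional


def resolve_handler_path(resource_name: str, base_dir: str = "/app") -> Optional[Path]:
    """Return the resolved handler path, or None if it escapes base_dir."""
    base = Path(base_dir).resolve()
    # resolve() collapses ".." segments (and follows symlinks where they exist),
    # so the containment check runs against the real target path.
    candidate = (base / f"handler_{resource_name}.py").resolve()
    if not candidate.is_relative_to(base):
        return None
    return candidate
```

Because the `handler_` prefix fuses with the first path segment of the name, a value such as `x/../../outside` is needed to climb out of the base directory; the check rejects it regardless of how many `..` segments are used.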

src/lb_handler.py

Lines changed: 68 additions & 36 deletions
@@ -3,25 +3,26 @@
 This handler provides a FastAPI application for the Load Balancer runtime.
 It supports:
 - /ping: Health check endpoint (required by RunPod Load Balancer)
-- /execute: Remote function execution via HTTP POST (queue-based mode)
-- User's FastAPI app routes (mothership mode)
+- /execute: Remote function execution via HTTP POST (QB endpoint mode)
+- User's FastAPI app routes (LB endpoint mode)

 The handler uses worker-flash's RemoteExecutor for function execution.

-Mothership Mode (FLASH_IS_MOTHERSHIP=true):
-- Imports user's FastAPI application from FLASH_MAIN_FILE
-- Loads the app object from FLASH_APP_VARIABLE
+LB Endpoint Mode (FLASH_ENDPOINT_TYPE=lb):
+- Auto-discovers generated handler from FLASH_RESOURCE_NAME
+- Loads handler_{resource_name}.py with FastAPI app
 - Preserves all user routes and middleware
 - Adds /ping health check endpoint

-Queue-Based Mode (FLASH_IS_MOTHERSHIP not set or false):
+QB Endpoint Mode (FLASH_ENDPOINT_TYPE not set or not "lb"):
 - Creates generic FastAPI app with /execute endpoint
 - Uses RemoteExecutor for function execution
 """

 import importlib.util
 import logging
 import os
+from pathlib import Path
 from typing import Any, Dict

 from fastapi import FastAPI
@@ -42,67 +43,98 @@
 from runpod_flash.protos.remote_execution import FunctionRequest, FunctionResponse  # noqa: E402
 from remote_executor import RemoteExecutor  # noqa: E402

-# Determine mode based on environment variables
-is_mothership = os.getenv("FLASH_IS_MOTHERSHIP") == "true"

-if is_mothership:
-    # Mothership mode: Import user's FastAPI application
-    try:
-        main_file = os.getenv("FLASH_MAIN_FILE", "main.py")
-        app_variable = os.getenv("FLASH_APP_VARIABLE", "app")
+def _is_lb_endpoint() -> bool:
+    """Determine if this endpoint runs in LB mode (serves user FastAPI routes)."""
+    return os.getenv("FLASH_ENDPOINT_TYPE") == "lb"

-        logger.info(f"Mothership mode: Importing {app_variable} from {main_file}")

-        # Dynamic import of user's module
-        spec = importlib.util.spec_from_file_location("user_main", main_file)
-        if spec is None or spec.loader is None:
-            raise ImportError(f"Cannot find or load {main_file}")
+def _discover_lb_app(handler_dir: str = "/app") -> FastAPI:
+    """Auto-discover and load the generated LB handler's FastAPI app.

-        user_module = importlib.util.module_from_spec(spec)
-        spec.loader.exec_module(user_module)
+    Derives handler path from FLASH_RESOURCE_NAME and imports the module.

-        # Get the FastAPI app from user's module
-        if not hasattr(user_module, app_variable):
-            raise AttributeError(f"Module {main_file} does not have '{app_variable}' attribute")
+    Args:
+        handler_dir: Base directory for handler files (default /app).

-        app = getattr(user_module, app_variable)
+    Returns:
+        FastAPI app from the generated handler.

-        if not isinstance(app, FastAPI):
-            raise TypeError(
-                f"Expected FastAPI instance, got {type(app).__name__} for {app_variable}"
-            )
+    Raises:
+        RuntimeError: If FLASH_RESOURCE_NAME is not set or resolves outside handler_dir.
+        FileNotFoundError: If the handler file does not exist.
+        ImportError: If the handler module cannot produce a valid spec.
+        AttributeError: If the handler module lacks an 'app' attribute.
+        TypeError: If the 'app' attribute is not a FastAPI instance.
+    """
+    resource_name = os.getenv("FLASH_RESOURCE_NAME")
+    if not resource_name:
+        raise RuntimeError("FLASH_RESOURCE_NAME not set. Cannot discover generated LB handler.")
+
+    handler_file = f"{handler_dir}/handler_{resource_name}.py"
+
+    handler_path = Path(handler_file)
+    if not handler_path.resolve().is_relative_to(Path(handler_dir).resolve()):
+        raise RuntimeError(f"FLASH_RESOURCE_NAME '{resource_name}' resolves outside {handler_dir}")
+
+    app_variable = "app"
+
+    spec = importlib.util.spec_from_file_location("user_main", handler_file)
+    if spec is None or spec.loader is None:
+        raise ImportError(f"Cannot find or load {handler_file}")
+
+    user_module = importlib.util.module_from_spec(spec)
+    spec.loader.exec_module(user_module)

-        logger.info(f"Successfully imported FastAPI app '{app_variable}' from {main_file}")
+    if not hasattr(user_module, app_variable):
+        raise AttributeError(f"Module {handler_file} does not have '{app_variable}' attribute")
+
+    discovered_app = getattr(user_module, app_variable)
+
+    if not isinstance(discovered_app, FastAPI):
+        raise TypeError(
+            f"Expected FastAPI instance, got {type(discovered_app).__name__} for {app_variable}"
+        )
+
+    return discovered_app
+
+
+is_lb_endpoint = _is_lb_endpoint()
+
+if is_lb_endpoint:
+    # LB endpoint mode: Auto-discover generated handler from FLASH_RESOURCE_NAME
+    try:
+        app = _discover_lb_app()
+        logger.info("Successfully imported FastAPI app for LB endpoint")

         # Add /ping endpoint if not already present
-        # Check if /ping route already exists to avoid adding a duplicate health check endpoint
         ping_exists = any(getattr(route, "path", None) == "/ping" for route in app.routes)

         if not ping_exists:

             @app.get("/ping")
-            async def ping_mothership() -> Dict[str, Any]:
-                """Health check endpoint for mothership (added by framework)."""
+            async def ping_lb() -> Dict[str, Any]:
+                """Health check endpoint for LB (added by framework)."""
                 return {
                     "status": "healthy",
-                    "endpoint": "mothership",
+                    "endpoint": "lb",
                     "id": os.getenv("RUNPOD_ENDPOINT_ID", "unknown"),
                 }

             logger.info("Added /ping endpoint to user's FastAPI app")

     except Exception as error:
-        logger.error(f"Failed to initialize mothership mode: {error}", exc_info=True)
+        logger.error("Failed to initialize LB endpoint mode: %s", error, exc_info=True)
         raise

 else:
     # Queue-based mode: Create generic Load Balancer handler app
     app = FastAPI(title="Load Balancer Handler")
-    logger.info("Queue-based mode: Using generic Load Balancer handler")
+    logger.info("QB endpoint mode: Using generic Load Balancer handler")


 # Queue-based mode endpoints
-if not is_mothership:
+if not is_lb_endpoint:

     @app.get("/ping")
     async def ping() -> Dict[str, Any]:
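The load-then-validate sequence in `_discover_lb_app()` (build a spec, execute the module, check the attribute exists, check its type) can be tried without FastAPI installed. The sketch below is a stdlib-only approximation: `load_attribute` and its `expected_type` parameter are hypothetical, standing in for the hard-coded `app` / `FastAPI` pair in the diff.

```python
import importlib.util
from pathlib import Path
from typing import Any


def load_attribute(module_file: Path, attr: str, expected_type: type) -> Any:
    """Import module_file by path and return a type-checked attribute from it."""
    spec = importlib.util.spec_from_file_location(module_file.stem, module_file)
    if spec is None or spec.loader is None:
        raise ImportError(f"Cannot find or load {module_file}")

    module = importlib.util.module_from_spec(spec)
    # Runs the module's top-level code; a missing file raises FileNotFoundError here.
    spec.loader.exec_module(module)

    if not hasattr(module, attr):
        raise AttributeError(f"Module {module_file} does not have '{attr}' attribute")

    value = getattr(module, attr)
    if not isinstance(value, expected_type):
        raise TypeError(f"Expected {expected_type.__name__}, got {type(value).__name__} for {attr}")
    return value
```

Raising distinct exception types for each failure, as the diff does, lets the caller log one message and re-raise while tests can still assert on the specific cause.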

src/manifest_reconciliation.py

Lines changed: 13 additions & 4 deletions
@@ -30,7 +30,7 @@ def is_flash_deployment() -> bool:
     endpoint_id = os.getenv("RUNPOD_ENDPOINT_ID")
     is_flash = any(
         [
-            os.getenv("FLASH_IS_MOTHERSHIP") == "true",
+            os.getenv("FLASH_ENDPOINT_TYPE") in ("lb", "qb"),
             os.getenv("FLASH_RESOURCE_NAME"),
         ]
     )
@@ -78,8 +78,9 @@ def _is_manifest_stale(
         if is_stale:
             logger.debug(f"Manifest is stale: {age_seconds:.0f}s old (TTL: {ttl_seconds}s)")
         return is_stale
-    except OSError:
-        return True  # Error reading file, consider stale
+    except OSError as e:
+        logger.debug("Cannot stat manifest file %s: %s. Treating as stale.", manifest_path, e)
+        return True


 async def _fetch_and_save_manifest(
@@ -112,8 +113,16 @@ async def _fetch_and_save_manifest(
         logger.info("Manifest refreshed from State Manager")
         return True

+    except (OSError, ConnectionError, TimeoutError) as e:
+        logger.warning("Failed to refresh manifest from State Manager: %s", e)
+        return False
     except Exception as e:
-        logger.warning(f"Failed to refresh manifest from State Manager: {e}")
+        logger.error(
+            "Unexpected error refreshing manifest from State Manager: %s (%s)",
+            e,
+            type(e).__name__,
+            exc_info=True,
+        )
         return False

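Only the tail of `_is_manifest_stale()` appears in the hunk above, so the following is a rough sketch of the "treat unreadable as stale" idea rather than the actual implementation. It assumes staleness is measured from the file's modification time; `is_stale` and its `now` parameter are hypothetical.

```python
import time
from pathlib import Path
from typing import Optional


def is_stale(manifest_path: Path, ttl_seconds: float, now: Optional[float] = None) -> bool:
    """Return True if the file is older than ttl_seconds or cannot be stat'ed."""
    try:
        mtime = manifest_path.stat().st_mtime
    except OSError:
        # A missing or unreadable manifest is treated as stale so the caller
        # attempts a refresh instead of trusting absent data.
        return True
    current = time.time() if now is None else now
    return (current - mtime) > ttl_seconds
```

Defaulting to "stale" on OSError is the conservative choice: the worst case is one extra refresh attempt, which `_fetch_and_save_manifest()` already handles by returning False on failure.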
