Environment
- EKS add-on: amazon-sagemaker-hyperpod-inference v1.3.0-eksbuild.1 (latest available in ap-northeast-1)
- Inference Operator image: hyperpod-inference-operator:v3.2.0
- Worker image: lmcache/vllm-openai:v0.4.7 (LMCache 0.4.7, vLLM 0.23.0)
- Instance type: ml.g7e.4xlarge, HyperPod EKS-orchestrated cluster
What happened
Setting kvCacheSpec.l2CacheSpec.l2CacheBackend: redis (with l2CacheLocalUrl: redis://...:6379) in the InferenceEndpointConfig has no effect. The operator instead injects into the worker pod:
LMCACHE_REMOTE_URL=sagemaker-hyperpod://$(NODE_IP):9200
LMCACHE_EXTRA_CONFIG={"sagemaker_hyperpod_shared_memory_name": "ai_toolkit_cache"}
The redis value from the CRD is silently dropped — the operator never emits LMCACHE_REMOTE_URL=redis://.... Attempting to override LMCACHE_REMOTE_URL via worker.environmentVariables does not work either: those LMCACHE_* entries do not appear in the rendered pod (the operator strips/overrides them).
The worker is therefore forced onto the sagemaker-hyperpod connector, which requires a host-side ai-toolkit daemon (POSIX shared memory /ai_toolkit_cache plus TCP :9200). That daemon is not present on the cluster, so LMCache enters degraded mode and the L2 cache never stores anything:
LMCache ERROR: Failed to initialize shared memory: [Errno 22] Invalid argument: '/ai_toolkit_cache'
LMCache WARNING: Health check failed: RemoteBackendHealthCheck(sagemaker-hyperpod://<NODE_IP>:9200)
LMCache WARNING: HealthMonitor: System unhealthy, entering degraded mode
LMCache WARNING: LMCache is unhealthy, skipping store operation
... LMCache hit tokens: 0 (External prefix cache hit rate: 0.0%)
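For anyone triaging the same symptom, here is a minimal sketch of a log check. It assumes the worker logs contain the exact warning strings quoted above; the function name lmcache_health is just an illustrative helper, not part of LMCache or the operator.

```shell
# Reads worker logs on stdin and prints "degraded" if either of the
# LMCache warnings quoted in this issue appears, otherwise "healthy".
# lmcache_health is a hypothetical helper name.
lmcache_health() {
  if grep -E -q 'entering degraded mode|LMCache is unhealthy, skipping store operation'; then
    echo degraded
  else
    echo healthy
  fi
}

# Example (pod name is a placeholder):
#   kubectl logs <worker> | lmcache_health
```

On the affected cluster this should print "degraded" for every worker pod, since the health check against sagemaker-hyperpod://<NODE_IP>:9200 never passes.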
Expected behavior
With l2CacheBackend: redis + l2CacheLocalUrl: redis://...:6379, the worker's LMCache should be configured with LMCACHE_REMOTE_URL=redis://..., connect to the specified Redis endpoint, report healthy, and perform L2 KV-cache store/lookup. redis is listed as a supported L2 backend in the docs (KV cache & intelligent routing), so the operator overriding it contradicts the documentation.
Reproduction
- Deploy an InferenceEndpointConfig with:
    kvCacheSpec:
      enableL1Cache: true
      enableL2Cache: true
      l2CacheSpec:
        l2CacheBackend: redis
        l2CacheLocalUrl: redis://<redis-svc>.<ns>.svc.cluster.local:6379
- (Optionally) add worker.environmentVariables entries for LMCACHE_REMOTE_URL to try to override.
- Inspect the rendered worker pod:
    kubectl get pod <worker> -o jsonpath='{range .spec.containers[*].env[*]}{.name}={.value}{"\n"}{end}' | grep LMCACHE
- Observe LMCACHE_REMOTE_URL=sagemaker-hyperpod://$(NODE_IP):9200 (not redis://...), and the worker logs showing the LMCache unhealthy/degraded loop above.
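To make the last two steps quicker to verify, the scheme of the injected URL can be pulled out of the env dump. This is a rough sketch that assumes the NAME=VALUE output format produced by the kubectl command in the steps above; lmcache_remote_scheme is a hypothetical helper name.

```shell
# Reads NAME=VALUE lines on stdin and prints the URL scheme of
# LMCACHE_REMOTE_URL (for example "redis" or "sagemaker-hyperpod").
# Prints nothing if the variable is absent.
lmcache_remote_scheme() {
  sed -n 's/^LMCACHE_REMOTE_URL=\([^:]*\):\/\/.*/\1/p'
}

# Example (pod name is a placeholder):
#   kubectl get pod <worker> \
#     -o jsonpath='{range .spec.containers[*].env[*]}{.name}={.value}{"\n"}{end}' \
#     | lmcache_remote_scheme
```

With l2CacheBackend: redis the expected result is redis; on the setup described here it comes back as sagemaker-hyperpod.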
Additional question (tieredstorage)
Separately: l2CacheBackend: tieredstorage requires the host-side ai-toolkit daemon (shm /ai_toolkit_cache + :9200). On our cluster this daemon is not installed as any DaemonSet, and the add-on configuration schema (aws eks describe-addon-configuration) exposes no toggle for it (only alb, enableCustomServiceAccounts, executionRoleArn, hyperpodClusterArn, jumpstartGatedModelDownloadRoleArn, keda, tlsCertificateS3Bucket). How is tiered storage meant to be enabled? Is it a cluster-creation flag, and can it be enabled on an existing cluster?
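For reference, the two checks behind the statement above can be sketched roughly as follows. The add-on name, version, and region are taken from the Environment section; the function names are illustrative, and "ai-toolkit" as a DaemonSet name fragment is only a guess, since the actual name of the daemon's workload (if any) is not known to us.

```shell
# Print the configuration schema for the add-on version in use.
# Requires AWS CLI credentials with eks:DescribeAddonConfiguration.
addon_schema() {
  aws eks describe-addon-configuration \
    --addon-name amazon-sagemaker-hyperpod-inference \
    --addon-version v1.3.0-eksbuild.1 \
    --region ap-northeast-1 \
    --query configurationSchema --output text
}

# Filter `kubectl get daemonsets --all-namespaces` output (stdin) for a
# name fragment. The default fragment "ai-toolkit" is an assumption.
find_daemonset() {
  grep -i -- "${1:-ai-toolkit}" || echo "no matching DaemonSet"
}

# Example:
#   addon_schema
#   kubectl get daemonsets --all-namespaces | find_daemonset
```

On our cluster the second check finds no matching DaemonSet, and the schema lists only the keys named above.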