Inference Operator v3.2.0 ignores l2CacheBackend: redis, hardcodes LMCACHE_REMOTE_URL=sagemaker-hyperpod://... #431

### Environment

- EKS add-on `amazon-sagemaker-hyperpod-inference` **v1.3.0-eksbuild.1** (latest available in `ap-northeast-1`)
- Inference Operator image `hyperpod-inference-operator:v3.2.0`
- Worker image `lmcache/vllm-openai:v0.4.7` (LMCache 0.4.7, vLLM 0.23.0)
- Instance type `ml.g7e.4xlarge`, HyperPod EKS-orchestrated cluster

### What happened

Setting `kvCacheSpec.l2CacheSpec.l2CacheBackend: redis` (with `l2CacheLocalUrl: redis://...:6379`) in the `InferenceEndpointConfig` has **no effect**. The operator instead injects into the worker pod:

```
LMCACHE_REMOTE_URL=sagemaker-hyperpod://$(NODE_IP):9200
LMCACHE_EXTRA_CONFIG={"sagemaker_hyperpod_shared_memory_name": "ai_toolkit_cache"}
```

The `redis` value from the CRD is silently dropped: the operator never emits `LMCACHE_REMOTE_URL=redis://...`. Overriding `LMCACHE_REMOTE_URL` through `worker.environmentVariables` does **not** work either, because those `LMCACHE_*` entries do not appear in the rendered pod (the operator strips or overrides them).
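A quick way to check whether a `worker.environmentVariables` override survived is to look for each requested `NAME=VALUE` pair, verbatim, in the rendered pod environment. If a pair is missing, the operator either stripped it or overwrote it. The sketch below is a hypothetical helper (`missing_env_pairs` is our own name, not part of `kubectl` or the operator). It reads `NAME=VALUE` lines on stdin and prints every requested pair that is absent. Entries populated through `valueFrom` render with an empty value and are out of scope.

```shell
# Hypothetical helper: print each requested NAME=VALUE pair (given as arguments)
# that does not appear verbatim in the rendered pod env (NAME=VALUE lines on stdin).
# A printed pair was either stripped or overwritten.
missing_env_pairs() {
  rendered=$(cat)
  for pair in "$@"; do
    # -x: whole-line match; -F: treat the pair as a fixed string, not a regex
    printf '%s\n' "$rendered" | grep -qxF -- "$pair" || printf '%s\n' "$pair"
  done
}
```

Pipe the `kubectl get pod ... -o jsonpath=...` output from the reproduction steps into `missing_env_pairs 'LMCACHE_REMOTE_URL=redis://<redis-svc>.<ns>.svc.cluster.local:6379'`. Once the override is honored, the command prints nothing. On the setup described here, it prints the pair back.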

Because the worker is forced onto the `sagemaker-hyperpod` connector, it depends on a host-side `ai-toolkit` daemon (POSIX shared memory `/ai_toolkit_cache` plus TCP `:9200`). That daemon is not present on the cluster, so LMCache enters degraded mode and the L2 cache never stores anything:

```
LMCache ERROR: Failed to initialize shared memory: [Errno 22] Invalid argument: '/ai_toolkit_cache'
LMCache WARNING: Health check failed: RemoteBackendHealthCheck(sagemaker-hyperpod://<NODE_IP>:9200)
LMCache WARNING: HealthMonitor: System unhealthy, entering degraded mode
LMCache WARNING: LMCache is unhealthy, skipping store operation
... LMCache hit tokens: 0     (External prefix cache hit rate: 0.0%)
```
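You can tally the messages above to see whether a given worker is stuck in this loop without reading the full log. This is a sketch that assumes the exact wording quoted here (LMCache 0.4.7); other versions may phrase these lines differently. `lmcache_log_summary` is a hypothetical helper name.

```shell
# Hypothetical helper: summarize LMCache health messages from worker logs on stdin.
# Assumes the message wording shown in the log excerpt above.
lmcache_log_summary() {
  awk '
    /Failed to initialize shared memory/ { shm++ }
    /Health check failed/                { hc++ }
    /entering degraded mode/             { degraded++ }
    /skipping store operation/           { skipped++ }
    /LMCache hit tokens:/ {
      # Keep the most recent hit-token count (the field after "tokens:").
      for (i = 1; i <= NF; i++) if ($i == "tokens:") hits = $(i + 1)
    }
    END {
      printf "shm_errors=%d health_failures=%d degraded=%d skipped_stores=%d last_hit_tokens=%s\n",
             shm, hc, degraded, skipped, (hits == "" ? "n/a" : hits)
    }'
}
```

Typical use is `kubectl logs <worker> | lmcache_log_summary`. A healthy L2 setup should show zero degraded entries and no skipped stores.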

### Expected behavior

With `l2CacheBackend: redis` + `l2CacheLocalUrl: redis://...:6379`, the worker's LMCache should be configured with `LMCACHE_REMOTE_URL=redis://...`, connect to the specified Redis endpoint, report healthy, and perform L2 KV-cache store/lookup. `redis` is listed as a supported L2 backend in the docs ([KV cache & intelligent routing](https://docs.aws.amazon.com/sagemaker/latest/dg/sagemaker-hyperpod-model-deployment-caching-routing.html)), so the operator overriding it contradicts the documentation.
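Before attributing the failure to the operator, confirm that the Redis endpoint is reachable from inside the cluster. This minimal sketch makes several assumptions. It assumes the public `redis:7` image, which ships `redis-cli`, can be pulled. It also assumes `l2CacheLocalUrl` carries no credentials or query string, only an optional `/db` suffix. The helper names and the `redis-ping` pod name are hypothetical.

```shell
# Hypothetical helper: split a redis://host[:port][/db] URL into "host port".
# Assumes no credentials or query string; the port defaults to 6379.
redis_host_port() {
  url=${1#redis://}   # drop the scheme
  url=${url%%/*}      # drop an optional /db suffix
  host=${url%%:*}     # everything before the first colon
  port=${url#*:}      # everything after it (unchanged if there is no colon)
  if [ "$port" = "$url" ]; then port=6379; fi
  printf '%s %s\n' "$host" "$port"
}

# Hypothetical helper: run redis-cli PING against the URL from a throwaway pod.
redis_ping_in_cluster() {
  set -- $(redis_host_port "$1")
  kubectl run redis-ping --rm -i --restart=Never --image=redis:7 -- \
    redis-cli -h "$1" -p "$2" ping
}
```

`redis_ping_in_cluster redis://<redis-svc>.<ns>.svc.cluster.local:6379` should print `PONG` when Redis is reachable. That isolates the problem to how the operator renders the worker env.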

### Reproduction

1. Deploy an `InferenceEndpointConfig` with:
   ```yaml
   kvCacheSpec:
     enableL1Cache: true
     enableL2Cache: true
     l2CacheSpec:
       l2CacheBackend: redis
       l2CacheLocalUrl: redis://<redis-svc>.<ns>.svc.cluster.local:6379
   ```
2. Optionally, add a `worker.environmentVariables` entry for `LMCACHE_REMOTE_URL` to attempt an override.
3. Inspect the rendered worker pod:
   ```
   kubectl get pod <worker> -o jsonpath='{range .spec.containers[*].env[*]}{.name}={.value}{"\n"}{end}' | grep LMCACHE
   ```
4. Observe `LMCACHE_REMOTE_URL=sagemaker-hyperpod://$(NODE_IP):9200` (not `redis://...`), and the worker logs showing the LMCache unhealthy/degraded loop above.
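You can turn step 4 into a pass/fail check, which is handy when retesting against a newer operator or add-on version. This is a sketch that assumes the `NAME=VALUE` output of the step 3 command. `lmcache_remote_scheme` and `assert_redis_backend` are hypothetical helper names.

```shell
# Hypothetical helper: print the URL scheme of LMCACHE_REMOTE_URL from
# NAME=VALUE lines on stdin, or "unset" if the variable is absent.
lmcache_remote_scheme() {
  awk -F= '
    $1 == "LMCACHE_REMOTE_URL" {
      sub(/^[^=]*=/, "")        # strip the name, keep the full value
      split($0, parts, "://")   # text before "://" is the scheme
      scheme = parts[1]
    }
    END { print (scheme == "" ? "unset" : scheme) }'
}

# Hypothetical helper: fail unless the operator honored l2CacheBackend: redis.
assert_redis_backend() {
  scheme=$(lmcache_remote_scheme)
  if [ "$scheme" != "redis" ]; then
    echo "LMCACHE_REMOTE_URL scheme is '$scheme', expected 'redis'" >&2
    return 1
  fi
}
```

Appending `| assert_redis_backend` to the step 3 pipeline fails on operator v3.2.0 with scheme `sagemaker-hyperpod`. It should pass once the `redis` backend is honored.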

### Additional question (tieredstorage)

Separately: `l2CacheBackend: tieredstorage` requires the host-side `ai-toolkit` daemon (shm `/ai_toolkit_cache` + `:9200`). On our cluster, no DaemonSet installs this daemon, and the add-on configuration schema (`aws eks describe-addon-configuration`) exposes no toggle for it. Its only options are `alb`, `enableCustomServiceAccounts`, `executionRoleArn`, `hyperpodClusterArn`, `jumpstartGatedModelDownloadRoleArn`, `keda`, and `tlsCertificateS3Bucket`. How is tiered storage meant to be enabled? Is it a cluster-creation flag, and if so, can it be enabled on an existing cluster?
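If you want to check whether your own nodes run the daemon, you can probe the two resources the connector expects from a node debug pod. `kubectl debug node/<node> -it --image=busybox` mounts the node root filesystem at `/host` and runs in the host network namespace. This sketch assumes the daemon creates its segment with `shm_open`, which on Linux appears under `/dev/shm`. It also assumes the daemon listens on TCP 9200 over IPv4 or IPv6. The function names are hypothetical.

```shell
# Hypothetical check: true if the POSIX shm segment exists.
# Default directory is the node /dev/shm as mounted in a kubectl debug node pod.
toolkit_shm_present() {
  [ -e "${1:-/host/dev/shm}/ai_toolkit_cache" ]
}

# Hypothetical check: true if a socket is listening on TCP :9200 according to
# /proc/net/tcp-style files. Column 2 is hex IP:PORT (9200 = 0x23F0) and
# column 4 is the state (0A = LISTEN).
toolkit_port_listening() {
  [ "$#" -gt 0 ] || set -- /proc/net/tcp /proc/net/tcp6
  cat "$@" 2>/dev/null |
    awk '$4 == "0A" && $2 ~ /:23F0$/ { found = 1 } END { exit (found ? 0 : 1) }'
}
```

On the cluster described above, both `toolkit_shm_present` and `toolkit_port_listening` would be expected to return false. That matches the `Errno 22` and health-check failures in the worker logs.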

### References

- LMCache SageMaker HyperPod connector (client expecting a pre-existing daemon): https://github.com/LMCache/LMCache/pull/1937
- LMCache backend docs: https://docs.lmcache.ai/kv_cache/storage_backends/sagemaker_hyperpod.html
- AWS KV cache & routing docs: https://docs.aws.amazon.com/sagemaker/latest/dg/sagemaker-hyperpod-model-deployment-caching-routing.html
