### Describe the feature

**Problem**

The transcription endpoint filtering logic is broken in K8s service discovery mode due to inconsistent labeling between the Helm chart and service discovery.

**Root cause**

- Transcription filtering (`src/vllm_router/services/request_service/request.py:594`) expects `ep.model_label == "transcription"`
- Service discovery (`src/vllm_router/service_discovery.py:619`) gets `model_label` from the pod label `model`
- The Helm chart (`helm/templates/deployment-vllm-multi.yaml:42`) automatically sets `model: {{ $modelSpec.name }}` (e.g., `"whisper-small"`)

**Result**

Transcription requests always fail with "No transcription backend available" because no endpoints ever have `model_label == "transcription"`; they all carry the actual model name instead.
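The mismatch can be illustrated with a minimal sketch. The `EndpointInfo` class below is a simplified stand-in for the router's real endpoint record; the field names are assumptions based on the references above, not the actual API.

```python
from dataclasses import dataclass

# Simplified stand-in for the router's endpoint record (the real one
# lives in src/vllm_router/service_discovery.py).
@dataclass
class EndpointInfo:
    url: str
    model_label: str  # populated from the pod's "model" label

# Service discovery picks up the Helm-set label, so model_label is always
# the model name (e.g. "whisper-small"), never the literal "transcription".
endpoints = [
    EndpointInfo(url="http://10.0.0.1:8000", model_label="whisper-small"),
    EndpointInfo(url="http://10.0.0.2:8000", model_label="llama-3-8b"),
]

# The filter looks for a label value that no pod ever carries:
transcription_backends = [
    ep for ep in endpoints if ep.model_label == "transcription"
]
print(transcription_backends)  # [] -> "No transcription backend available"
```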
**Suggested**

I see two ways to solve this problem.

**Option 1**

Add a `model` label override to the model specification in the Helm chart. I'm not sure about this one: is this label used for any other purpose?
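A rough sketch of what Option 1 might look like in the Helm values; the `labels` override key and the surrounding structure are hypothetical, since the chart does not necessarily support this today:

```yaml
# Hypothetical values.yaml fragment: let a modelSpec override the
# auto-set "model" label so the router's transcription filter matches.
# Key names here are assumptions, not the chart's current schema.
servingEngineSpec:
  modelSpec:
    - name: "whisper-small"
      modelURL: "openai/whisper-small"
      labels:
        model: "transcription"  # would override the default {{ $modelSpec.name }}
```

The downside, as noted above, is that anything else reading the `model` label (dashboards, selectors, metrics) would now see "transcription" instead of the model name.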
**Option 2**

Add a `task` parameter to the model specification in Helm values, then:

- Add a `task: {{ $modelSpec.task }}` label on the engine pod in the Helm chart
- Update service discovery to read the `task` label instead of (or alongside) the `model` label
- Filter transcription endpoints using `ep.task == "transcription"` or similar logic
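The router-side change for Option 2 could be sketched as follows. The `task` field and the fallback logic are assumptions about how the filter might be written, not the current code:

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class EndpointInfo:
    url: str
    model_label: str            # from the pod's "model" label, e.g. "whisper-small"
    task: Optional[str] = None  # new: from the proposed "task" pod label

def filter_transcription(endpoints: List[EndpointInfo]) -> List[EndpointInfo]:
    # Prefer the explicit task label; keep the legacy model_label check as a
    # fallback so deployments that already label model: "transcription" still work.
    return [
        ep for ep in endpoints
        if ep.task == "transcription" or ep.model_label == "transcription"
    ]

endpoints = [
    EndpointInfo("http://10.0.0.1:8000", "whisper-small", task="transcription"),
    EndpointInfo("http://10.0.0.2:8000", "llama-3-8b"),
]
print([ep.url for ep in filter_transcription(endpoints)])
# ["http://10.0.0.1:8000"]
```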
### Why do you need this feature?

I'm running a Kubernetes cluster that hosts multiple distinct models behind a single router. The router is then shared between several AWS accounts via a single ServiceEndpoint. The set of models, and how many of each, changes frequently for R&D reasons, which is why K8s service discovery is so useful here.
### Additional context

No response