
Commit 24f685c

Merge pull request #27 from tarilabs/tarilabs-20260327

docs: reflected latest inference simulator

2 parents 0d2be6a + a7450ac

1 file changed: docs/development/openshift-setup.md (12 additions, 15 deletions)
@@ -602,7 +602,7 @@ cat > eval-request.json <<EOF
 {
   "model": {
     "url": "http://vllm-server.models.svc.cluster.local:8000/v1",
-    "name": "meta-llama/Llama-3.2-1B-Instruct"
+    "name": "Qwen/Qwen3.5-397B-A17B"
   },
   "benchmarks": [
     {
@@ -634,29 +634,26 @@ curl -sS -k -X POST \
 
 For testing purposes where the quality of the model's answers is not relevant, you can deploy an OpenAI-compatible inference simulator instead of a real model server.
 
-Deploy the simulator with:
+Deploy the simulator in the currently active namespace with:
 
 ```bash
-curl -s https://raw.githubusercontent.com/tarilabs/llm-d-inference-sim/refs/heads/patch-1/manifests/deployment.yaml | yq '.spec.replicas = 1' | oc apply -f -
+curl -s https://raw.githubusercontent.com/llm-d/llm-d-inference-sim/refs/heads/main/manifests/deployment.yaml \
+  | yq 'select(.kind == "Deployment").spec.replicas = 1 | select(.kind == "Deployment").spec.template.spec.containers[0].image |= sub(":dev$", ":latest")' \
+  | oc apply -f -
 ```
 
-!!! note
-
-    The `patch-1` branch is used until [llm-d/llm-d-inference-sim#348](https://github.com/llm-d/llm-d-inference-sim/pull/348) is resolved to fix the standard example.
-    See the upstream [llm-d testing documentation](https://github.com/llm-d/llm-d-inference-sim?tab=readme-ov-file#kubernetes-testing) for more details.
-
 This makes the simulator available as an internal service at:
 
-- **Service:** `vllm-llama3-8b-instruct-svc.evalhub-test.svc.cluster.local` (accessible within the cluster and the namespace only)
+- **Service:** `vllm-sim-demo-svc.evalhub-test.svc.cluster.local` (accessible within the cluster and the namespace only)
 - **Port:** `8000`
-- **Model name:** `meta-llama/Llama-3.1-8B-Instruct`
+- **Model name:** `Qwen/Qwen3.5-397B-A17B`
 
 The service exposes an OpenAI-compatible endpoint (e.g. `/v1/chat/completions`).
 
 You can verify the simulator is working from within the namespace in the cluster (for example by opening a Terminal on the evalhub Pod) with:
 
 ```sh
-export SIM_URL=vllm-llama3-8b-instruct-svc.evalhub-test.svc.cluster.local:8000
+export SIM_URL=vllm-sim-demo-svc.evalhub-test.svc.cluster.local:8000
 
 # List available models
 curl -s "http://$SIM_URL/v1/models"
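The `yq` pipeline in the hunk above does two things to the Deployment: forces a single replica and rewrites a trailing `:dev` image tag to `:latest`. The tag rewrite can be sanity-checked locally without `yq` or a cluster; a minimal sketch, where the image reference is illustrative rather than taken from the actual manifest:

```shell
# Illustrative image ref; the real one comes from the upstream manifest.
IMAGE="ghcr.io/llm-d/llm-d-inference-sim:dev"

# Same effect as yq's sub(":dev$", ":latest"): only a *trailing* :dev tag is rewritten.
echo "$IMAGE" | sed 's/:dev$/:latest/'
# → ghcr.io/llm-d/llm-d-inference-sim:latest
```

Because the pattern is anchored with `$`, tags that merely contain `dev` elsewhere (e.g. `:devel`) are left untouched.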
@@ -665,7 +662,7 @@ curl -s "http://$SIM_URL/v1/models"
 curl -s "http://$SIM_URL/v1/chat/completions" \
   -H "Content-Type: application/json" \
   -d '{
-    "model": "meta-llama/Llama-3.1-8B-Instruct",
+    "model": "Qwen/Qwen3.5-397B-A17B",
     "messages": [{"role": "user", "content": "Hello"}]
   }'
 ```
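The inline request body above can also be built with `jq`, which avoids shell-quoting mistakes when the prompt itself contains quotes; a minimal sketch, assuming `jq` is available (the model name matches the one set in this diff):

```shell
# Build the chat-completions body programmatically instead of inlining JSON.
BODY=$(jq -n \
  --arg model "Qwen/Qwen3.5-397B-A17B" \
  --arg content "Hello" \
  '{model: $model, messages: [{role: "user", content: $content}]}')
echo "$BODY"
```

The result can then be passed to curl with `-d "$BODY"` in place of the literal JSON.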
@@ -675,7 +672,7 @@ If you need to invoke the simulator externally, create a Route in the OpenShift
 If you need to invoke the simulator externally, create a Route in the OpenShift console:
 
 1. Navigate to **Networking → Routes → Create Route**
-2. Select the **Service** `vllm-llama3-8b-instruct-svc`
+2. Select the **Service** `vllm-sim-demo-svc`
 3. Select the only available **Target Port**
 4. Check **Secure Route**
 5. Set **TLS Termination** to **Edge**
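The console steps above can equivalently be done from the CLI; a sketch assuming the service name and port from this diff (`oc create route edge` creates a secure Route with edge TLS termination, so steps 4 and 5 are implied):

```shell
# Route name is hypothetical; service name and target port are taken from the diff above.
oc create route edge vllm-sim-demo --service=vllm-sim-demo-svc --port=8000
```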
@@ -684,7 +681,7 @@ If you need to invoke the simulator externally, create a Route in the OpenShift
 Once the Route is created, you can test it:
 
 ```sh
-export SIM_URL=$(oc get route -n evalhub-test -o jsonpath='{.items[?(@.spec.to.name=="vllm-llama3-8b-instruct-svc")].spec.host}')
+export SIM_URL=$(oc get route -n evalhub-test -o jsonpath='{.items[?(@.spec.to.name=="vllm-sim-demo-svc")].spec.host}')
 
 # List available models
 curl -s "https://$SIM_URL/v1/models" | jq .
@@ -693,7 +690,7 @@ curl -s "https://$SIM_URL/v1/models" | jq .
 curl -s "https://$SIM_URL/v1/chat/completions" \
   -H "Content-Type: application/json" \
   -d '{
-    "model": "meta-llama/Llama-3.1-8B-Instruct",
+    "model": "Qwen/Qwen3.5-397B-A17B",
     "messages": [{"role": "user", "content": "Hello"}]
   }' | jq .
 ```
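The `/v1/models` call above returns an OpenAI-style model list; a hedged sketch of picking out just the model ids with `jq` (the response JSON here is illustrative of that shape, not captured from the simulator):

```shell
# Illustrative OpenAI-style /v1/models payload; real output comes from the Route.
RESP='{"object":"list","data":[{"id":"Qwen/Qwen3.5-397B-A17B","object":"model"}]}'

# Extract just the model ids from the list.
echo "$RESP" | jq -r '.data[].id'
# → Qwen/Qwen3.5-397B-A17B
```

Checking that the expected id appears here is a quick way to confirm the Route reaches the simulator and not some other backend.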
