Commit 171f0e8
docs: propose llm-d inference sim for OpenAI api testing (#11)
Signed-off-by: tarilabs <matteo.mortari@gmail.com>
1 parent c856b7b commit 171f0e8

1 file changed

Lines changed: 67 additions & 0 deletions

File tree

docs/development/openshift-setup.md

@@ -559,6 +559,73 @@ curl -k -X POST \
  "https://$EVALHUB_URL/api/v1/evaluations/jobs" | jq .
```

## Deploying an OpenAI-compatible simulator

For testing scenarios where the quality of the model's answers is not relevant, you can deploy an OpenAI-compatible inference simulator instead of a real model server.

Deploy the simulator with:

```bash
curl -s https://raw.githubusercontent.com/tarilabs/llm-d-inference-sim/refs/heads/patch-1/manifests/deployment.yaml | yq '.spec.replicas = 1' | kubectl apply -f -
```
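Before testing, you can confirm the simulator came up. This is a sketch, not part of the upstream instructions: it assumes the manifest was applied to the `evalhub-test` namespace used elsewhere in this guide, and the exact resource names come from the upstream manifest.

```shell
# List the simulator's Deployment and Pods (requires cluster access;
# resource names are defined by the upstream manifest)
kubectl get deployments,pods -n evalhub-test
```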

> [!NOTE]
> The `patch-1` branch is used until [llm-d/llm-d-inference-sim#348](https://github.com/llm-d/llm-d-inference-sim/pull/348) is resolved to fix the standard example.
> See the upstream [llm-d testing documentation](https://github.com/llm-d/llm-d-inference-sim?tab=readme-ov-file#kubernetes-testing) for more details.

This makes the simulator available as an internal service at:

- **Service:** `vllm-llama3-8b-instruct-svc.evalhub-test.svc.cluster.local` (accessible only from within the cluster and namespace)
- **Port:** `8000`
- **Model name:** `meta-llama/Llama-3.1-8B-Instruct`

The service exposes an OpenAI-compatible endpoint (e.g. `/v1/chat/completions`).

You can verify the simulator is working from within the namespace in the cluster (for example, by opening a Terminal on the evalhub Pod) with:
```sh
export SIM_URL=vllm-llama3-8b-instruct-svc.evalhub-test.svc.cluster.local:8000

# List available models
curl -s "http://$SIM_URL/v1/models"

# Chat completion
curl -s "http://$SIM_URL/v1/chat/completions" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "meta-llama/Llama-3.1-8B-Instruct",
    "messages": [{"role": "user", "content": "Hello"}]
  }'
```
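The curl calls above can also be scripted. The sketch below is an illustration, not part of the project: it only builds the HTTP request (so it runs anywhere), and the host, path, and model name match the values shown above.

```python
import json
import urllib.request

# Internal service host shown above (reachable only inside the namespace)
SIM_URL = "vllm-llama3-8b-instruct-svc.evalhub-test.svc.cluster.local:8000"

def build_chat_request(host: str, scheme: str = "http") -> urllib.request.Request:
    """Build the same chat-completion request the curl example sends."""
    payload = {
        "model": "meta-llama/Llama-3.1-8B-Instruct",
        "messages": [{"role": "user", "content": "Hello"}],
    }
    return urllib.request.Request(
        f"{scheme}://{host}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_chat_request(SIM_URL)
print(req.full_url)  # http://vllm-llama3-8b-instruct-svc.evalhub-test.svc.cluster.local:8000/v1/chat/completions
```

From a Pod in the namespace, `json.load(urllib.request.urlopen(req))` sends the request and parses the simulated reply; for the external Route described below, pass the Route host and `scheme="https"` instead.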

### Exposing the simulator externally via a Route

If you need to invoke the simulator externally, create a Route in the OpenShift console:

1. Navigate to **Networking → Routes → Create Route**
2. Select the **Service** `vllm-llama3-8b-instruct-svc`
3. Select the only available **Target Port**
4. Check **Secure Route**
5. Set **TLS Termination** to **Edge**
6. Set **Insecure Traffic** to **None**

Once the Route is created, you can test it:

```sh
export SIM_URL=$(oc get route -n evalhub-test -o jsonpath='{.items[?(@.spec.to.name=="vllm-llama3-8b-instruct-svc")].spec.host}')

# List available models
curl -s "https://$SIM_URL/v1/models" | jq .

# Chat completion
curl -s "https://$SIM_URL/v1/chat/completions" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "meta-llama/Llama-3.1-8B-Instruct",
    "messages": [{"role": "user", "content": "Hello"}]
  }' | jq .
```
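The console steps above can likely also be performed from the CLI. This is a sketch, assuming you are logged in with `oc` against the cluster; `vllm-sim` is a hypothetical route name chosen for this example.

```shell
# Hypothetical CLI equivalent of the console Route above:
# edge TLS termination, insecure traffic disallowed
oc create route edge vllm-sim \
  --service=vllm-llama3-8b-instruct-svc \
  --insecure-policy=None \
  -n evalhub-test
```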

## Troubleshooting

### EvalHub Pod Not Starting
