22 changes: 12 additions & 10 deletions README.md
@@ -1,21 +1,23 @@
# llm-load-test-exporter

### Overview
The purpose of this program is to run [llm-load-test](https://github.com/openshift-psap/llm-load-test) application and then serve the resulting metrics to a /metrics endpoint. This is meant to be run in a kubernetes/openshift environment.
## Overview

This application is split into 2:
1. run-llm-load-test: This is the application that actually runs llm-load-test against all models running in a cluster, and saves the output into a volume.
2. exporter: This application exports the results of running llm-load-test to the /metrics endpoint.
The purpose of this program is to run the [llm-load-test](https://github.com/openshift-psap/llm-load-test) application and then serve the resulting metrics at a `/metrics` endpoint. It is meant to be run in a Kubernetes/OpenShift environment.

### Deploying Application in OpenShift
This application is split into two parts:

1. `run-llm-load-test`: the application that actually runs `llm-load-test` against all models running in a cluster, and saves the output into a volume.
2. `exporter`: exports the results of running `llm-load-test` to the `/metrics` endpoint.
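
As a rough illustration of the second part, the sketch below renders load-test results as Prometheus text exposition lines like those in the README's example output. It is not the exporter's actual code: `format_sample` and `render_metric` are hypothetical helper names, and the `gauge` type is an assumption, since the diff does not show which metric type the exporter declares.

```python
def format_sample(name: str, labels: dict[str, str], value: float) -> str:
    """Render one sample line in the Prometheus text exposition format."""

    def escape(text: str) -> str:
        # Label values must escape backslash, double quote, and newline.
        return text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")

    label_text = ",".join(f'{key}="{escape(val)}"' for key, val in labels.items())
    return f"{name}{{{label_text}}} {value}"


def render_metric(
    name: str,
    help_text: str,
    samples: list[tuple[dict[str, str], float]],
) -> str:
    """Render a HELP line, a TYPE line (gauge is an assumption), and the samples."""
    lines = [f"# HELP {name} {help_text}", f"# TYPE {name} gauge"]
    for labels, value in samples:
        lines.append(format_sample(name, labels, value))
    return "\n".join(lines) + "\n"
```

Calling `render_metric("llm_performance_itl", "Inter-token Latency (ms)", [...])` with one `(labels, value)` pair per model would yield a block that a Prometheus scraper can read from the `/metrics` endpoint.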

## Deploying Application in OpenShift

In order to deploy this application in OpenShift do the following:

1. Modify the namespace this will be deployed to in base/kustomization.yaml
2. Modify how often the load test is run by modifying WAIT_TIME env variable in base/deployment.yaml
3. Run `oc create -k base/`
2. Modify how often the load test is run by changing the `WAIT_TIME` environment variable in `base/deployment.yaml`
3. Run `oc apply -k base`
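
To make the role of `WAIT_TIME` in step 2 concrete, here is a minimal sketch of how a runner might read it and pause between load-test runs. The diff does not show the runner's code, so several things here are assumptions: that the value is a number of seconds, the default of 300, and the helper names `read_wait_time` and `run_periodically`.

```python
import os
import time
from typing import Callable, Mapping


def read_wait_time(env: Mapping[str, str] = os.environ, default: float = 300.0) -> float:
    """Read WAIT_TIME from the environment (assumed to be seconds)."""
    raw = env.get("WAIT_TIME")
    if raw is None or raw.strip() == "":
        return default  # hypothetical fallback, not taken from the project
    value = float(raw)
    if value < 0:
        raise ValueError("WAIT_TIME must not be negative")
    return value


def run_periodically(
    task: Callable[[], None],
    wait_seconds: float,
    iterations: int,
    sleep: Callable[[float], None] = time.sleep,
) -> None:
    """Run the task a fixed number of times, sleeping after each run."""
    for _ in range(iterations):
        task()
        sleep(wait_seconds)
```

The `sleep` parameter is injectable so the loop can be exercised without real delays; a long-running service would loop indefinitely rather than for a fixed number of iterations.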

### Example output when querying the /metrics endpoint:
## Example output when querying the /metrics endpoint

```
# HELP llm_performance_itl Inter-token Latency (ms)
@@ -40,4 +42,4 @@ llm_performance_throughput{model="granite",namespace="granite-instruct"} 18.8040
llm_performance_latency{model="granite-internal",namespace="granite-instruct"} 277.4471044540405
llm_performance_latency{model="granite-test2",namespace="granite-instruct"} 285.0987911224365
llm_performance_latency{model="granite",namespace="granite-instruct"} 268.87357234954834
```
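
For consumers of this endpoint, the sample lines above can be split into a metric name, a label set, and a value. The following is a simplified sketch rather than a full parser for the exposition format: it ignores optional timestamps and does not unescape label values, and `parse_sample` is a hypothetical helper name.

```python
import re

# Metric name, optional {labels}, then the sample value.
SAMPLE_RE = re.compile(
    r"^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)"
    r"(?:\{(?P<labels>.*)\})?"
    r"\s+(?P<value>\S+)$"
)
# One key="value" pair; the value may contain backslash-escaped characters.
LABEL_RE = re.compile(r'(\w+)="((?:[^"\\]|\\.)*)"')


def parse_sample(line: str):
    """Parse one sample line; return None for blank lines and comments."""
    line = line.strip()
    if not line or line.startswith("#"):
        return None
    match = SAMPLE_RE.match(line)
    if match is None:
        raise ValueError(f"not a valid sample line: {line!r}")
    labels = dict(LABEL_RE.findall(match.group("labels") or ""))
    return match.group("name"), labels, float(match.group("value"))
```

Applied to the last latency line above, this returns the name `llm_performance_latency`, the labels `model` and `namespace`, and the value as a float.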
9 changes: 0 additions & 9 deletions base/configmap.yaml

This file was deleted.

9 changes: 5 additions & 4 deletions base/deployment.yaml
@@ -7,10 +7,10 @@ spec:
template:
spec:
initContainers:
- name: init-container
image: alpine/git:latest
- name: clone-repository
image: docker.io/alpine/git:latest
imagePullPolicy: Always
command: ['sh', '-c', 'git clone https://github.com/openshift-psap/llm-load-test.git /shared_data/llm-load-test']
command: ['git', 'clone', 'https://github.com/openshift-psap/llm-load-test.git', '/shared_data/llm-load-test']
volumeMounts:
- name: llm-load-test-dir
mountPath: /shared_data
@@ -19,7 +19,8 @@ spec:
image: quay.io/rh-ee-istaplet/nerc-tools:llm-load-test-exporter
imagePullPolicy: Always
ports:
- containerPort: 8080
- name: web
containerPort: 8080
volumeMounts:
- name: llm-load-test-dir
mountPath: /shared_data
3 changes: 3 additions & 0 deletions base/files/uwl_metrics_list.yaml
@@ -0,0 +1,3 @@
matches:
- __name__=~"(llm_performance_.*)"
- __name__=~"(vllm:.*)"
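
The two entries above select metrics by name. The sketch below shows which names they would admit, assuming the allowlist is evaluated with Prometheus-style regex matchers, which are anchored at both ends. `is_allowed` is a hypothetical helper, the sample metric names are illustrative only, and Python's `re` module stands in for Prometheus's RE2 engine, which behaves the same for these simple patterns.

```python
import re

# Patterns copied from base/files/uwl_metrics_list.yaml.
ALLOWLIST_PATTERNS = [r"(llm_performance_.*)", r"(vllm:.*)"]


def is_allowed(metric_name: str) -> bool:
    """Return True if the name fully matches any allowlist pattern."""
    # fullmatch mirrors the implicit ^...$ anchoring of Prometheus matchers.
    return any(re.fullmatch(pattern, metric_name) for pattern in ALLOWLIST_PATTERNS)


for candidate in ["llm_performance_itl", "vllm:num_requests_running", "up"]:
    print(candidate, is_allowed(candidate))
```

Because of the anchoring, a name that merely contains `llm_performance_` somewhere in the middle is not selected; only names that start with one of the two prefixes are.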
8 changes: 7 additions & 1 deletion base/kustomization.yaml
@@ -11,4 +11,10 @@ resources:
- clusterrole.yaml
- clusterrolebinding.yaml
- servicemonitor.yaml
- configmap.yaml

configMapGenerator:
- name: observability-metrics-custom-allowlist
options:
disableNameSuffixHash: true
files:
- files/uwl_metrics_list.yaml
2 changes: 1 addition & 1 deletion base/service.yaml
@@ -7,4 +7,4 @@ spec:
- name: web
protocol: TCP
port: 8080
targetPort: 8080
targetPort: web
File renamed without changes.
File renamed without changes.
@@ -2,6 +2,8 @@ FROM python:3.12-slim

WORKDIR /app/

COPY . ./
COPY requirements.txt ./

RUN pip install -r requirements.txt

COPY . ./