
Commit 0c23eb5

Merge pull request #5 from cld2labs/docs/rename-benchmarks-to-metrics
Rename Inference Benchmarks section to Inference Metrics
2 parents f124b33 + 13bfa73 commit 0c23eb5

1 file changed

Lines changed: 4 additions & 4 deletions

File tree

README.md

@@ -33,7 +33,7 @@ An AI-powered full-stack application that translates source code between program
 - [Project Structure](#project-structure)
 - [Usage Guide](#usage-guide)
 - [Performance Tips](#performance-tips)
-- [Inference Benchmarks](#inference-benchmarks)
+- [Inference Metrics](#inference-metrics)
 - [Model Capabilities](#model-capabilities)
 - [Qwen3-4B-Instruct-2507](#qwen3-4b-instruct-2507)
 - [GPT-4o-mini](#gpt-4o-mini)
@@ -361,7 +361,7 @@ The app defaults to dark mode. Click the theme toggle in the header to switch to
 
 ---
 
-## Inference Benchmarks
+## Inference Metrics
 
 The table below compares inference performance across different providers, deployment modes, and hardware profiles using a standardized code-translation workload (averaged over 3 runs).
 
@@ -377,7 +377,7 @@ The table below compares inference performance across different providers, deplo
 > **Notes:**
 >
 > - Context Window for Ollama (8K) and vLLM (4K) reflects the `LLM_MAX_TOKENS` / `--max-model-len` used during benchmarking, not the model's native 262K context. vLLM shares its 4K context between input and output tokens.
-> - All benchmarks use the same CodeTrans translation prompt and identical inputs (3 runs: small python→java, medium python→rust, large python→go). Token counts may vary slightly per run due to non-deterministic model output.
+> - All metrics use the same CodeTrans translation prompt and identical inputs (3 runs: small python→java, medium python→rust, large python→go). Token counts may vary slightly per run due to non-deterministic model output.
 > - Ollama on Apple Silicon uses Metal (MPS) GPU acceleration — running it inside Docker would fall back to CPU-only inference. The `qwen3:4b-instruct` tag must be used (not `qwen3:4b`) to disable the default thinking mode.
 > - vLLM on Apple Silicon uses [vllm-metal](https://github.com/vllm-project/vllm-metal) — the standard `pip install vllm` does not support macOS.
 > - [Intel OPEA Enterprise Inference](https://github.com/opea-project/Enterprise-Inference) runs on Intel Xeon CPUs without GPU acceleration.
@@ -726,4 +726,4 @@ This project is licensed under our [LICENSE](./LICENSE.md) file for details.
 - Do not submit confidential or proprietary code to third-party API providers without reviewing their data handling policies
 - The quality of translation depends on the underlying model and may vary across language pairs and code complexity
 
-For full disclaimer details, see [DISCLAIMER.md](./DISCLAIMER.md).
+For full disclaimer details, see [DISCLAIMER.md](./DISCLAIMER.md).
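The provider notes quoted in the diff above can be tried locally; a minimal sketch, assuming Ollama and vLLM are installed and that the model is published under the Hugging Face id `Qwen/Qwen3-4B-Instruct-2507` (the id is an assumption based on the README heading):

```shell
# Pull the instruct-tuned tag; per the notes, the plain qwen3:4b tag
# would leave the default thinking mode enabled.
ollama pull qwen3:4b-instruct
ollama run qwen3:4b-instruct "Translate to Java: def add(a, b): return a + b"

# Serve with vLLM using the 4K context from the notes (--max-model-len);
# on macOS this requires vllm-metal rather than the standard pip install.
vllm serve Qwen/Qwen3-4B-Instruct-2507 --max-model-len 4096
```

Both commands require the respective local inference server to be installed; neither runs inside the repository itself.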
