@@ -361,7 +361,7 @@ The app defaults to dark mode. Click the theme toggle in the header to switch to
---

-## Inference Benchmarks
+## Inference Metrics

The table below compares inference performance across different providers, deployment modes, and hardware profiles using a standardized code-translation workload (averaged over 3 runs).
@@ -377,7 +377,7 @@ The table below compares inference performance across different providers, deplo
> **Notes:**
>
> - Context Window for Ollama (8K) and vLLM (4K) reflects the `LLM_MAX_TOKENS` / `--max-model-len` used during benchmarking, not the model's native 262K context. vLLM shares its 4K context between input and output tokens.
-> - All benchmarks use the same CodeTrans translation prompt and identical inputs (3 runs: small python→java, medium python→rust, large python→go). Token counts may vary slightly per run due to non-deterministic model output.
+> - All metrics use the same CodeTrans translation prompt and identical inputs (3 runs: small python→java, medium python→rust, large python→go). Token counts may vary slightly per run due to non-deterministic model output.
> - Ollama on Apple Silicon uses Metal (MPS) GPU acceleration — running it inside Docker would fall back to CPU-only inference. The `qwen3:4b-instruct` tag must be used (not `qwen3:4b`) to disable the default thinking mode.
> - vLLM on Apple Silicon uses [vllm-metal](https://github.com/vllm-project/vllm-metal) — the standard `pip install vllm` does not support macOS.
> - [Intel OPEA Enterprise Inference](https://github.com/opea-project/Enterprise-Inference) runs on Intel Xeon CPUs without GPU acceleration.
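The context limits and model tags described in the notes above would correspond to serving commands along these lines. This is a sketch only: the Hugging Face model id for the vLLM case is an assumption for illustration, and how `LLM_MAX_TOKENS` maps onto each server is not taken from this repo.

```shell
# Ollama on Apple Silicon: run natively (not in Docker) to keep Metal/MPS
# GPU acceleration. The -instruct tag disables the default thinking mode.
ollama run qwen3:4b-instruct

# vLLM with a 4K context shared between input and output tokens.
# Model id below is an assumed example, not confirmed by this README.
vllm serve Qwen/Qwen3-4B-Instruct-2507 --max-model-len 4096
```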
@@ -726,4 +726,4 @@ This project is licensed under our [LICENSE](./LICENSE.md) file for details.
- Do not submit confidential or proprietary code to third-party API providers without reviewing their data handling policies
- The quality of translation depends on the underlying model and may vary across language pairs and code complexity

-For full disclaimer details, see [DISCLAIMER.md](./DISCLAIMER.md).
+For full disclaimer details, see [DISCLAIMER.md](./DISCLAIMER.md).