diff --git a/fern/versions/latest/pages/model-server/local-vllm-proxy.mdx b/fern/versions/latest/pages/model-server/local-vllm-proxy.mdx
index 602715fc7..e02b83e97 100644
--- a/fern/versions/latest/pages/model-server/local-vllm-proxy.mdx
+++ b/fern/versions/latest/pages/model-server/local-vllm-proxy.mdx
@@ -10,12 +10,12 @@ It is a subclass of VLLMModel, so it accepts the same configuration fields, but
 ## When to use it
 
 Use a proxy when you need several model servers that share **one** vLLM deployment but differ in their request-time configuration.
-For example, one server with reasoning enabled and one with reasoning disabled in through the request params, or servers with different sampling parameters.
+For example, one server with reasoning enabled and one with reasoning disabled through the request params, or servers with different sampling parameters.
 Without the proxy you would have to launch a separate vLLM engine (and duplicate GPUs) for each variation.
 
 At startup the proxy waits for its referenced LocalVLLMModel to come up, reads that server's inner vLLM endpoint (`base_url`, `api_key`, `model`), and routes all of its own requests there.
 
-If you are working with an extising vLLM endpoint that you manage outside of Gym, use [VLLMModel](/model-server/vllm) instead.
+If you are working with an existing vLLM endpoint that you manage outside of Gym, use [VLLMModel](/model-server/vllm) instead.
 
 ## Configuration
 
@@ -54,4 +54,4 @@ ng_run "+config_paths=[${config_paths}]" \
 | `model_server` | `ModelServerRef` | — | **Required.** The LocalVLLMModel server to forward requests to, by `type` and `name`. |
 
 `base_url`, `api_key`, and `model` are populated automatically from the backing server and should **not** be set in your config.
-All other VLLMModel fields (`chat_template_kwargs`, `extra_body`, `return_token_id_information`, etc.) behave as documented in the [VLLMModel configuration reference](/model-server/vllm#vllmmodel-configuration-reference).
+All other VLLMModel fields (`chat_template_kwargs`, `extra_body`, `return_token_id_information`, and so on) behave as documented in the [VLLMModel configuration reference](/model-server/vllm#vllmmodel-configuration-reference).
diff --git a/fern/versions/latest/pages/model-server/local-vllm.mdx b/fern/versions/latest/pages/model-server/local-vllm.mdx
index 9091eacde..afcbccdac 100644
--- a/fern/versions/latest/pages/model-server/local-vllm.mdx
+++ b/fern/versions/latest/pages/model-server/local-vllm.mdx
@@ -8,7 +8,7 @@ NeMo Gym can launch and manage the vLLM server for you using LocalVLLMModel (in
 LocalVLLMModel is a subclass of VLLMModel that spawns the vLLM engine and auto-configures the model server to use it.
 The Chat Completions to Responses API conversion is inherited from VLLMModel. See [VLLMModel](/model-server/vllm) for details.
 
-A single LocalVLLMModel deployment can back multiple model servers, even when they need different request-time settings (e.g. sampling parameters or reasoning on/off).
+A single LocalVLLMModel deployment can back multiple model servers, even when they need different request-time settings (for example, sampling parameters or reasoning on or off).
 See [Local vLLM Proxy](/model-server/local-vllm-proxy) for this configuration.
 
 
@@ -49,7 +49,7 @@ LocalVLLMModel inherits all fields from VLLMModel (see [VLLMModel configuration
 |-----------|------|---------|-------------|
 | `vllm_serve_kwargs` | `dict` | — | **Required.** Arguments passed through to `vllm serve`. See `vllm_serve_kwargs` below. |
 | `vllm_serve_env_vars` | `dict` | — | **Required.** Environment variables for the vLLM process. Must include `VLLM_RAY_DP_PACK_STRATEGY`. |
-| `hf_home` | `str` | `/.cache/huggingface` | Hugging Face cache directory. Set this if you've already downloaded weights elsewhere. |
+| `hf_home` | `str` | `/.cache/huggingface` | Hugging Face cache directory. Set this if you have already downloaded weights elsewhere. |
 | `debug` | `bool` | `false` | Print vLLM server logs to stderr. |
 | `show_vllm_engine_stats` | `bool` | `false` | Periodically log vLLM engine throughput stats. |
 | `ray_worker_py_executable` | `str` | `sys.executable` | Python interpreter Ray uses for worker processes. |
@@ -152,7 +152,7 @@ ng_run "+config_paths=[${config_paths}]"
 The following capabilities work the same as in VLLMModel. See [VLLMModel configuration reference](/model-server/vllm#vllmmodel-configuration-reference) for details.
 
 - **`chat_template_kwargs`**: override chat template behavior per model.
-- **`extra_body`**: pass vLLM-specific request parameters (e.g. `guided_json`, `reasoning.effort`).
+- **`extra_body`**: pass vLLM-specific request parameters (for example, `guided_json`, `reasoning.effort`).
 - **`return_token_id_information`**: enable for training workflows that need `prompt_token_ids`, `generation_token_ids`, and `generation_log_probs`.
 
 
diff --git a/fern/versions/latest/pages/model-server/vllm.mdx b/fern/versions/latest/pages/model-server/vllm.mdx
index b81906831..45ec76e5e 100644
--- a/fern/versions/latest/pages/model-server/vllm.mdx
+++ b/fern/versions/latest/pages/model-server/vllm.mdx
@@ -17,7 +17,7 @@ ng_run "+config_paths=[$config_paths]"
 ```
 
 
-VLLMModel connects NeMo Gym to a vLLM server that you start and manage yourself. If you'd prefer NeMo Gym to launch and manage vLLM itself, use LocalVLLMModel instead. See [LocalVLLMModel](/model-server/local-vllm) to learn more.
+VLLMModel connects NeMo Gym to a vLLM server that you start and manage yourself. If you would prefer NeMo Gym to launch and manage vLLM itself, use LocalVLLMModel instead. See [LocalVLLMModel](/model-server/local-vllm) to learn more.
 
 ## Use VLLMModel
 
diff --git a/responses_api_models/vllm_model/README.md b/responses_api_models/vllm_model/README.md
index 85c51de5e..5578d29bf 100644
--- a/responses_api_models/vllm_model/README.md
+++ b/responses_api_models/vllm_model/README.md
@@ -16,7 +16,7 @@ View the logs
 tail -f temp.log
 ```
 
-Once you see that server instances are up, call the server. If you see a model response here, then everything is working as intended!
+Once you see that server instances are up, call the server. If you see a model response here, then everything is working as intended.
 ```bash
 python responses_api_agents/simple_agent/client.py
 ```