When the responses_api_models/vllm_model/configs/vllm_model_for_training.yaml configuration is used with a vLLM server (for example, when training with NeMo RL), the vLLM server returns log probabilities as long as the top_logprobs field is omitted from the request to the /v1/responses endpoint of the responses API model. However, if the request includes top_logprobs with a value of null (the default value specified in nemo_gym/openai_utils.py), the vLLM server does not return log probabilities, and an error occurs because Gym expects log probabilities to be present in the chat completions choice object returned by the vLLM server.
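The distinction is whether the JSON key is present at all, not what value it holds. A minimal sketch of the two request bodies follows; the payload shape is taken from the client.py example below, and the variable names are illustrative only:

# Sketch contrasting the two request bodies; the payload shape follows
# responses_api_models/vllm_model/client.py, and the variable names are
# illustrative.

# Log probabilities ARE returned: "top_logprobs" is omitted entirely.
request_without_key = {
    "input": [{"role": "user", "content": "hello"}],
}

# Log probabilities are NOT returned: "top_logprobs" is present and
# serializes to null (the default in nemo_gym/openai_utils.py), which
# later triggers the TypeError shown below.
request_with_null = {
    "input": [{"role": "user", "content": "hello"}],
    "top_logprobs": None,
}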
To observe this behavior, one can run the command
ng_run "+config_paths=[responses_api_models/vllm_model/configs/vllm_model_for_training.yaml]" +policy_base_url=<URL of vLLM server> +policy_api_key=<API key> +policy_model_name=<model name>
to start the vLLM responses API model for training. Then, the first request in the responses_api_models/vllm_model/client.py script can be changed to
task_1a = await server_client.post(
    server_name="policy_model",
    url_path="/v1/responses",
    json={
        "input": [{"role": "user", "content": "hello"}],
        "top_logprobs": None,
    },
)
so that the top_logprobs field is present in the request to the responses API model with a value of null. The modified script can then be run using the command
python responses_api_models/vllm_model/client.py
to produce an error with a message such as the following:
(policy_model)   File "responses_api_models/vllm_model/app.py", line 109, in responses
(policy_model)     chat_completion_response = await self.chat_completions(request, chat_completion_create_params)
(policy_model)                                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(policy_model)   File "responses_api_models/vllm_model/app.py", line 270, in chat_completions
(policy_model)     log_probs = choice_dict["logprobs"]["content"]
(policy_model)                 ~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^
(policy_model) TypeError: 'NoneType' object is not subscriptable
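A defensive check along the following lines would turn the crash into an explicit error. This is a sketch only: the choice_dict and log_probs names come from the traceback above, while the error message and the surrounding code are assumptions:

# Hypothetical guard inside chat_completions in
# responses_api_models/vllm_model/app.py. Only the choice_dict and
# log_probs names come from the traceback; everything else is assumed.
logprobs = choice_dict.get("logprobs")
if logprobs is None:
    raise ValueError(
        "vLLM returned no log probabilities; omit top_logprobs from the "
        "request (or set it to an integer) so the server includes them."
    )
log_probs = logprobs["content"]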