diff --git a/integrations/llms/gemini.mdx b/integrations/llms/gemini.mdx
index fc9236d0..34f6229f 100644
--- a/integrations/llms/gemini.mdx
+++ b/integrations/llms/gemini.mdx
@@ -2135,17 +2135,32 @@
 Note that you will have to set [`strict_open_ai_compliance=False`](/product/ai-g
 
 ### Using reasoning_effort Parameter
 
-You can also control thinking using the OpenAI-compatible `reasoning_effort` parameter instead of `thinking.budget_tokens`. The value is passed directly to Gemini as the `thinkingLevel`:
+You can also control thinking using the OpenAI-compatible `reasoning_effort` parameter instead of `thinking.budget_tokens`:
 
 ```python
 response = portkey.chat.completions.create(
     model="gemini-2.5-flash-preview-04-17",
     max_tokens=3000,
-    reasoning_effort="medium",  # Options: "none", "minimal", "low", "medium", "high"
+    reasoning_effort="medium",  # Options: "none", "low", "medium", "high"
     messages=[{"role": "user", "content": "Explain quantum computing"}]
 )
 ```
 
+#### Gemini 2.5 Models
+
+For Gemini 2.5 models, `reasoning_effort` maps to `thinking_budget` with specific token allocations:
+
+| reasoning_effort | thinking_budget (tokens) |
+|------------------|--------------------------|
+| `none` | Disabled |
+| `low` | 1,024 |
+| `medium` | 8,192 |
+| `high` | 24,576 |
+
+#### Gemini 3.0+ Models
+
+For Gemini 3.0 and later models, `reasoning_effort` maps directly to `thinkingLevel`:
+
 | reasoning_effort | Gemini thinkingLevel |
 |------------------|---------------------|
 | `none` | Disabled |
diff --git a/integrations/llms/vertex-ai.mdx b/integrations/llms/vertex-ai.mdx
index 08a0c4f4..c9d508fd 100644
--- a/integrations/llms/vertex-ai.mdx
+++ b/integrations/llms/vertex-ai.mdx
@@ -723,17 +723,32 @@
 Note that you will have to set [`strict_open_ai_compliance=False`](/product/ai-g
 
 ### Using reasoning_effort Parameter
 
-You can also control thinking using the OpenAI-compatible `reasoning_effort` parameter instead of `thinking.budget_tokens`. The value is passed directly to Gemini as the `thinkingLevel`:
+You can also control thinking using the OpenAI-compatible `reasoning_effort` parameter instead of `thinking.budget_tokens`:
 
 ```python
 response = portkey.chat.completions.create(
     model="@VERTEX_PROVIDER/google.gemini-2.5-flash-preview-04-17",
     max_tokens=3000,
-    reasoning_effort="medium",  # Options: "none", "minimal", "low", "medium", "high"
+    reasoning_effort="medium",  # Options: "none", "low", "medium", "high"
     messages=[{"role": "user", "content": "Explain quantum computing"}]
 )
 ```
 
+#### Gemini 2.5 Models
+
+For Gemini 2.5 models, `reasoning_effort` maps to `thinking_budget` with specific token allocations:
+
+| reasoning_effort | thinking_budget (tokens) |
+|------------------|--------------------------|
+| `none` | Disabled |
+| `low` | 1,024 |
+| `medium` | 8,192 |
+| `high` | 24,576 |
+
+#### Gemini 3.0+ Models
+
+For Gemini 3.0 and later models, `reasoning_effort` maps directly to `thinkingLevel`:
+
 | reasoning_effort | Vertex thinkingLevel |
 |------------------|---------------------|
 | `none` | Disabled |
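The mapping that both hunks document can be sketched as plain Python. This is an illustrative sketch only: the helper name `thinking_config` and the returned dict shapes are assumptions, not part of the Portkey SDK; only the effort names, the 2.5-era budget numbers, and the 2.5-vs-3.0 split come from the tables above.

```python
# Hypothetical sketch of the documented reasoning_effort mapping.
# Budget values mirror the Gemini 2.5 table; function and dict keys
# are illustrative, not actual Portkey or Gemini SDK API.

GEMINI_25_BUDGETS = {
    "low": 1024,
    "medium": 8192,
    "high": 24576,
}


def thinking_config(reasoning_effort: str, model: str) -> dict:
    """Translate an OpenAI-style reasoning_effort into a thinking config."""
    if reasoning_effort == "none":
        # "none" disables thinking on both model families.
        return {"thinking": "disabled"}
    if model.startswith("gemini-2.5"):
        # Gemini 2.5: effort maps to a fixed token budget.
        return {"thinking_budget": GEMINI_25_BUDGETS[reasoning_effort]}
    # Gemini 3.0+: the effort level is passed through as thinkingLevel.
    return {"thinkingLevel": reasoning_effort}


print(thinking_config("medium", "gemini-2.5-flash-preview-04-17"))
# → {'thinking_budget': 8192}
```

The point of the split is that the same `reasoning_effort` value produces a token budget on 2.5 models but a pass-through level on 3.0+ models, which is why the docs now describe the two families in separate subsections.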