
Conversation

@stbenjam
Member

@stbenjam stbenjam commented Nov 5, 2025

  • Add support for multiple LLM models including ChatVertexAnthropic
  • Implement models.yaml configuration system for model management
  • Add model selection UI in chat settings
  • Update environment configuration and documentation
  • Supports thought signatures (now required) for Gemini 3.0

🤖 Generated with Claude Code

Summary by CodeRabbit

  • New Features

    • Multi-model support: backend model listing, per-request model selection, model shown on messages, settings UI to choose models, and model_id returned in responses.
    • Claude (Vertex AI) support with extended-thinking controls and runtime agent parameters (temperature, iterations, execution time, persona).
    • CLI/server options to configure Google/Vertex settings and external models config path.
  • Documentation

    • Expanded README with Vertex AI/Claude setup, extended-thinking guidance, and models.yaml examples.
  • Chores

    • Added dependencies and gitignore rule for models.yaml.

✏️ Tip: You can customize this high-level summary in your review settings.

@openshift-ci-robot

Pipeline controller notification
This repository is configured to use the pipeline controller. Second-stage tests will be triggered either automatically or after lgtm label is added, depending on the repository configuration. The pipeline controller will automatically detect which contexts are required and will utilize /test Prow commands to trigger the second stage.

For optional jobs, comment /test ? to see a list of all defined jobs. Review these jobs and use /test <job> to manually trigger optional jobs most likely to be impacted by the proposed changes.

@openshift-ci openshift-ci bot added the approved label (Indicates a PR has been approved by an approver from all required OWNERS files.) on Nov 5, 2025
@coderabbitai
Contributor

coderabbitai bot commented Nov 5, 2025

Walkthrough

Adds multi-model support via a YAML model registry, Google Vertex AI (Claude) integration with extended-thinking modes, async SippyAgent initialization, an AgentManager for per-model agents, per-request model selection/propagation (model_id), a backend /chat/models endpoint, and frontend model UI/state and attribution.

Changes

Cohort / File(s) Summary
Env / Examples
chat/.env.example, chat/.gitignore, chat/models.yaml.example
Consolidates LLM config into a unified Sippy AI Agent section; adds globals (TEMPERATURE, SIPPY_API_URL, MAX_ITERATIONS, MAX_EXECUTION_TIME, PERSONA), de-emphasizes SIPPY_READ_ONLY_DATABASE_DSN, ignores models.yaml, and provides chat/models.yaml.example schema/entries.
Docs
chat/README.md
Adds Vertex AI / Claude setup, auth options, “Claude Extended Thinking” guidance, and a “Multiple Model Configuration (Optional)” section with models.yaml examples and server/CLI usage.
CLI / Bootstrap
chat/main.py
Adds CLI options --google-project, --google-location, --thinking-budget, --models-config; applies overrides to Config and passes models_config_path to the server.
Config & Loader
chat/sippy_agent/config.py
Adds ModelConfig (per-model overrides + to_config()), extends Config with google_project_id, google_location, extended_thinking_budget, is_claude_model(), and load_models_config() to parse/validate YAML and determine default.
Agent & Manager
chat/sippy_agent/agent.py
Defers heavy setup via async _initialize(), extends _create_llm() to support ChatAnthropicVertex (Claude) including extended-thinking setup and streaming semantics, improves tool-loading warnings, and introduces AgentManager to load/cache per-model SippyAgent instances and expose listing/getter APIs.
API Models
chat/sippy_agent/api_models.py
Adds model_id to ChatRequest and ChatResponse; introduces ModelInfo and ModelsResponse for model listing and default selection.
Web Server / Routes
chat/sippy_agent/web_server.py
Replaces single-agent wiring with AgentManager, adds GET /chat/models, resolves per-request agent via model_id (fallback default), propagates model_id in HTTP/WebSocket/thinking flows, and accepts models_config_path in constructor.
Requirements
chat/requirements.txt
Adds dependencies: langchain-google-vertexai>=3.0.0, anthropic>=0.20.0, pyyaml>=6.0.0.
Frontend — Store
sippy-ng/src/chat/store/modelsSlice.js, sippy-ng/src/chat/store/settingsSlice.js, sippy-ng/src/chat/store/useChatStore.js
Adds modelsSlice (models, defaultModel, loading/error + loadModels() fetching /api/chat/models), exposes useModels, and adds modelId to settings (initial null).
Frontend — WebSocket
sippy-ng/src/chat/store/webSocketSlice.js
Includes model_id in outgoing user payload and assistant final message; moves adding the user message to history after WS send; retains per-message error handling.
Frontend — UI attribution
sippy-ng/src/chat/ChatMessage.js
Resolves modelName via useModels, shows tooltip "AI-generated by {modelName}" when available, replaces static "AI" chip with model name fallback, and adds model_id PropType.
Frontend — Settings UI
sippy-ng/src/chat/ChatSettings.js
Adds "AI Model" section using useModels, lazy-loads models, renders Select dropdown with descriptions/icons, updates settings.modelId, and handles loading/error states.

Sequence Diagram(s)

sequenceDiagram
    participant User
    participant Frontend
    participant ModelsStore as Frontend:Models Store
    participant WebServer as Backend:Web Server
    participant AgentManager as Backend:Agent Manager
    participant Agent as Backend:SippyAgent
    participant LLM as External:Vertex AI / Claude

    User->>Frontend: Open chat UI
    Frontend->>ModelsStore: loadModels()
    ModelsStore->>WebServer: GET /chat/models
    WebServer->>AgentManager: list_models()
    AgentManager-->>WebServer: {models, default_model}
    ModelsStore->>Frontend: update models state

    User->>Frontend: Select model + send message
    Frontend->>WebServer: POST /chat (model_id, message)
    WebServer->>AgentManager: get_agent(model_id)
    AgentManager->>Agent: return cached or create + Agent._initialize()
    Agent->>Agent: _create_llm() (Claude -> ChatAnthropicVertex)
    Agent->>LLM: send message (project/location, thinking budget)
    LLM->>Agent: stream thinking & content
    Agent->>WebServer: thinking_callback(model_id, partial_thought)
    Agent->>WebServer: final response(model_id, content)
    WebServer->>Frontend: ChatResponse(model_id, content)
    Frontend->>User: render message with model attribution

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

🚥 Pre-merge checks | ✅ 5 | ❌ 2
❌ Failed checks (2 warnings)
  • Excessive Css In React Should Use Styles — ⚠️ Warning: The PR introduces an inline style with 5 properties in ChatMessage.js lines 460-466, exceeding the recommended 3-4 property limit and inconsistent with the codebase's useStyles pattern. Resolution: extract the inline style into the useStyles hook as a new class (e.g., systemMessageLink) and replace it with className={classes.systemMessageLink}.
  • Single Responsibility And Clear Naming — ⚠️ Warning: The pull request violates the Single Responsibility principle: AgentManager uses a generic suffix, and the Config class exceeds the recommended field limit with 20+ fields spanning multiple domains. Resolution: rename AgentManager to MultiModelAgentProvider or LLMAgentFactory; refactor Config into focused sub-models: VertexAIConfig, AgentControlConfig, and ThinkingConfig.
✅ Passed checks (5 passed)
  • Description Check — ✅ Passed: Check skipped; CodeRabbit's high-level summary is enabled.
  • Title check — ✅ Passed: The title clearly and concisely summarizes the main change: adding multi-model support with a configuration system, which aligns with the core objectives of the PR.
  • Docstring Coverage — ✅ Passed: Docstring coverage is 90.32%, which is sufficient. The required threshold is 80.00%.
  • Go Error Handling — ✅ Passed: This custom check assesses Go error handling patterns, but the pull request contains no Go code changes. All modifications are in Python, JavaScript/React, and configuration files.
  • Sql Injection Prevention — ✅ Passed: The PR does not introduce SQL injection vulnerabilities. It adds multi-model LLM support and UI enhancements without modifying the SippyDatabaseQueryTool or introducing new SQL query construction or user input handling in database contexts.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing touches
  • 📝 Generate docstrings

Comment @coderabbitai help to get the list of available commands and usage tips.

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 3

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
chat/sippy_agent/web_server.py (2)

226-261: Avoid mutating shared agent config across concurrent requests

AgentManager caches a single SippyAgent per model, but both the HTTP and WebSocket paths mutate agent.config.show_thinking, agent.config.persona, and even rebuild agent.graph on that shared instance. With concurrent requests targeting the same model, those mutations race: persona or thinking overrides from request A can leak into request B before either finally block restores the original settings. A third request arriving in that window will run with the wrong persona/model behavior. Please isolate per-request overrides (e.g., clone the agent/config, add an agent-level async lock around these mutations, or extend SippyAgent.achat to accept override args so we never touch shared state).

Also applies to: 330-475
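
For illustration, a minimal sketch of the override-argument option (the signature and attribute names here are assumptions based on this review, not the PR's actual code):

# Sketch only: achat accepts request-local overrides so shared agent.config is
# never mutated between concurrent requests.
from typing import Optional


class SippyAgentSketch:
    def __init__(self, config):
        self.config = config

    async def achat(
        self,
        message: str,
        persona: Optional[str] = None,
        show_thinking: Optional[bool] = None,
    ) -> str:
        # Resolve per-request values without touching shared state.
        effective_persona = persona if persona is not None else self.config.persona
        effective_thinking = (
            show_thinking if show_thinking is not None else self.config.show_thinking
        )
        # ... build the graph / call the LLM with the effective values ...
        return f"persona={effective_persona}, thinking={effective_thinking}"

Because nothing on the shared instance changes, concurrent requests to the same cached agent cannot leak settings into one another.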


153-163: Return status from the actual default agent config

/status now resolves the default agent via AgentManager, but the response still reports model_name, endpoint, show_thinking, and persona from the base Config, which will be wrong whenever the default model comes from models.yaml. Surface the values from default_agent.config instead so the endpoint reflects the real model in use.

-            return AgentStatus(
-                available_tools=default_agent.list_tools(),
-                model_name=self.config.model_name,
-                endpoint=self.config.llm_endpoint,
-                thinking_enabled=self.config.show_thinking,
-                current_persona=self.config.persona,
+            agent_config = default_agent.config
+            return AgentStatus(
+                available_tools=default_agent.list_tools(),
+                model_name=agent_config.model_name,
+                endpoint=agent_config.llm_endpoint,
+                thinking_enabled=agent_config.show_thinking,
+                current_persona=agent_config.persona,
🧹 Nitpick comments (6)
chat/.gitignore (1)

142-145: LGTM!

Correctly ignores the instance-specific models.yaml configuration file while keeping the example template in version control.

Minor: Consider using a single blank line instead of two (lines 142-143) for consistency with the rest of the file, though this is purely stylistic.

chat/requirements.txt (1)

5-13: Dependencies align with multi-model support.

The three new dependencies correctly support Claude via Vertex AI and YAML-based configuration.

Consider tightening the anthropic>=0.20.0 constraint to anthropic>=0.20.0,<1.0.0 to avoid potential breaking changes in major version updates, as the Anthropic SDK has had breaking API changes in the past.

chat/.env.example (1)

1-92: Consider documenting the models.yaml approach in comments.

The restructured configuration is well-organized and clearly documents different model provider options. However, since this PR's main feature is the multi-model models.yaml configuration system, consider adding a comment at the top mentioning that:

  • These environment variables provide defaults/fallbacks when models.yaml isn't present
  • For multi-model support, users should create a models.yaml file (see models.yaml.example)

Additionally, line 66's SIPPY_API_URL defaults to production. Consider whether this should be commented out or point to localhost to avoid accidental production API calls during development.

sippy-ng/src/chat/store/webSocketSlice.js (1)

225-246: Review the UX implications of delayed message display.

The reordering makes sense for chat history management—the current message shouldn't be included in its own history context. However, this creates a delay where the user's message only appears in the UI after the network send completes.

Traditional chat UX typically shows the user's message immediately (optimistic UI) and displays an error if the send fails. The current approach may feel less responsive since users won't see their message until after the network round-trip.

Consider:

  1. Testing the perceived responsiveness with this change
  2. If the delay is noticeable, consider adding the message optimistically before send, then removing/marking it as failed if send fails
  3. The current approach does guarantee consistency between what the user sees and what was actually sent
sippy-ng/src/chat/store/modelsSlice.js (1)

12-49: Consider adding concurrent call protection.

The loadModels function correctly fetches models and sets the default, but doesn't protect against concurrent calls. If loadModels is called multiple times in quick succession (e.g., on reconnect or user navigation), multiple fetches could race and cause state inconsistencies.

Consider one of these approaches:

  1. Track in-flight requests and return early if already loading
  2. Abort previous requests using AbortController when a new request starts
  3. Ensure the UI only calls loadModels once on mount/initialization

Example with early return:

 loadModels: () => {
+  // Skip if already loading
+  if (get().modelsLoading) {
+    return
+  }
+
   const apiUrl =
     process.env.REACT_APP_CHAT_API_URL || window.location.origin + '/api/chat'
chat/sippy_agent/agent.py (1)

708-716: Drop the redundant f-string

description=f"Model from environment configuration" has no interpolation. Please remove the f prefix for clarity.

-                description=f"Model from environment configuration",
+                description="Model from environment configuration",
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Cache: Disabled due to data retention organization setting

Knowledge base: Disabled due to data retention organization setting

📥 Commits

Reviewing files that changed from the base of the PR and between f78f207 and 1d2bc37.

📒 Files selected for processing (16)
  • chat/.env.example (1 hunks)
  • chat/.gitignore (1 hunks)
  • chat/README.md (2 hunks)
  • chat/main.py (3 hunks)
  • chat/models.yaml.example (1 hunks)
  • chat/requirements.txt (1 hunks)
  • chat/sippy_agent/agent.py (8 hunks)
  • chat/sippy_agent/api_models.py (3 hunks)
  • chat/sippy_agent/config.py (5 hunks)
  • chat/sippy_agent/web_server.py (10 hunks)
  • sippy-ng/src/chat/ChatMessage.js (5 hunks)
  • sippy-ng/src/chat/ChatSettings.js (6 hunks)
  • sippy-ng/src/chat/store/modelsSlice.js (1 hunks)
  • sippy-ng/src/chat/store/settingsSlice.js (1 hunks)
  • sippy-ng/src/chat/store/useChatStore.js (3 hunks)
  • sippy-ng/src/chat/store/webSocketSlice.js (4 hunks)
🧰 Additional context used
🪛 markdownlint-cli2 (0.18.1)
chat/README.md

46-46: Emphasis used instead of a heading

(MD036, no-emphasis-as-heading)

🪛 Ruff (0.14.3)
chat/sippy_agent/agent.py

710-710: f-string without any placeholders

Remove extraneous f prefix

(F541)

🔇 Additional comments (7)
chat/README.md (1)

46-162: Excellent documentation of new multi-model features.

The documentation comprehensively covers:

  • Claude/Vertex AI authentication options
  • Extended thinking feature with important caveats (temperature=1.0 requirement, regional availability)
  • Multi-model configuration via models.yaml
  • Clear usage examples

The explanations are clear and include important warnings about limitations and requirements.

chat/models.yaml.example (1)

16-41: Well-structured model configuration examples.

The configuration examples are clear and demonstrate key features:

  • Multiple models from different providers
  • Default model selection
  • Thinking-enabled variant with required temperature: 1.0
  • Optional fields (description, temperature, extended_thinking_budget)

Ensure that the configuration loading code (likely in chat/sippy_agent/config.py) validates that only one model has default: true. The YAML format itself doesn't enforce this constraint.
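
For illustration, a minimal loader sketch that enforces this; the YAML below is a hypothetical shape based on the fields discussed in this review, not the actual chat/models.yaml.example:

# Sketch: reject duplicate ids and more than one default: true entry.
import yaml

EXAMPLE = """
models:
  - id: gemini-flash
    name: Gemini 2.5 Flash
    model_name: gemini-2.5-flash
    default: true
  - id: claude-sonnet
    name: Claude Sonnet (Vertex AI)
    model_name: claude-sonnet-4-5@20250929
"""


def load_models(text: str):
    data = yaml.safe_load(text)
    models = data.get("models") or []
    if not models:
        raise ValueError("models.yaml must define at least one model")
    ids = [m["id"] for m in models]
    if len(ids) != len(set(ids)):
        raise ValueError("duplicate model ids in models.yaml")
    defaults = [m["id"] for m in models if m.get("default")]
    if len(defaults) > 1:
        raise ValueError(f"only one model may set default: true, got {defaults}")
    return models, (defaults[0] if defaults else ids[0])


models, default_id = load_models(EXAMPLE)
print(default_id)  # gemini-flash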

sippy-ng/src/chat/store/settingsSlice.js (1)

10-10: LGTM!

Clean addition of modelId to the settings state. The comment clearly indicates the initialization strategy, and this integrates well with the modelsSlice which loads and sets the default model.

sippy-ng/src/chat/store/webSocketSlice.js (2)

55-55: LGTM!

Adding model_id to the assistant message metadata enables proper model attribution in the UI, aligning with the backend's multi-model support.


234-234: LGTM!

Correctly includes model_id in the outgoing payload, enabling the backend to use the user-selected model.

sippy-ng/src/chat/store/modelsSlice.js (2)

13-15: Base URL construction handles edge cases correctly.

The chained .replace() calls properly handle both trailing slashes and the /stream suffix.


33-37: Good UX: auto-setting default model.

Automatically setting the default model when the user hasn't selected one provides a smooth initial experience.

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

♻️ Duplicate comments (1)
sippy-ng/src/chat/ChatSettings.js (1)

339-417: Extract duplicated model resolution logic.

The model ID resolution logic is duplicated between the Select value (lines 362-376) and the description rendering (lines 389-414). This violates the DRY principle and creates a maintenance risk if one copy is updated but not the other.

Extract the resolution logic into a helper function as suggested in the past review:

+  const resolveSelectedModelId = () => {
+    if (models.length === 0) {
+      return ''
+    }
+
+    if (settings.modelId && models.find((m) => m.id === settings.modelId)) {
+      return settings.modelId
+    }
+
+    if (defaultModel && models.find((m) => m.id === defaultModel)) {
+      return defaultModel
+    }
+
+    return models[0].id
+  }
+
+  const selectedModelId = resolveSelectedModelId()
+  const selectedModel = models.find((m) => m.id === selectedModelId) || null

Then update the render to use these computed values:

               <Select
                 labelId="model-select-label"
-                value={
-                  (() => {
-                    // Resolve the effective model ID before rendering
-                    if (
-                      settings.modelId &&
-                      models.find((m) => m.id === settings.modelId)
-                    ) {
-                      return settings.modelId
-                    }
-                    if (defaultModel && models.find((m) => m.id === defaultModel)) {
-                      return defaultModel
-                    }
-                    return models.length > 0 ? models[0].id : ''
-                  })()
-                }
+                value={selectedModelId}
                 onChange={handleModelChange}

             ...

-            {(() => {
-              // Resolve model ID and look up the selected model once
-              const resolvedModelId =
-                settings.modelId && models.find((m) => m.id === settings.modelId)
-                  ? settings.modelId
-                  : defaultModel && models.find((m) => m.id === defaultModel)
-                    ? defaultModel
-                    : models.length > 0
-                      ? models[0].id
-                      : null
-
-              const selectedModel = resolvedModelId
-                ? models.find((m) => m.id === resolvedModelId)
-                : null
-
-              return (
-                selectedModel &&
-                selectedModel.description && (
-                  <Box className={classes.personaDescription}>
-                    <Typography variant="body2" color="textPrimary">
-                      {selectedModel.description}
-                    </Typography>
-                  </Box>
-                )
-              )
-            })()}
+            {selectedModel?.description && (
+              <Box className={classes.personaDescription}>
+                <Typography variant="body2" color="textPrimary">
+                  {selectedModel.description}
+                </Typography>
+              </Box>
+            )}
🧹 Nitpick comments (2)
sippy-ng/src/chat/ChatSettings.js (2)

187-189: Remove unused helper or refactor to use it.

The getSelectedModel() function is defined but never used in the render. The Model Selection UI (lines 362-376 and 389-414) uses inline resolution logic instead.

Either remove this unused function or refactor the duplicated resolution logic to use a proper helper (see next comment).

-  const getSelectedModel = () => {
-    return models.find((m) => m.id === settings.modelId) || models[0]
-  }

358-358: Consider renaming the reused style class.

The model FormControl reuses classes.personaSelect, which works functionally but is slightly misleading. Consider renaming it to a more generic name like selectControl or adding a separate modelSelect class for clarity.

This is a minor naming issue and can be deferred, but it would improve code clarity.

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Cache: Disabled due to data retention organization setting

Knowledge base: Disabled due to data retention organization setting

📥 Commits

Reviewing files that changed from the base of the PR and between 1d2bc37 and eb36e37.

📒 Files selected for processing (3)
  • chat/README.md (3 hunks)
  • chat/sippy_agent/config.py (5 hunks)
  • sippy-ng/src/chat/ChatSettings.js (6 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • chat/sippy_agent/config.py
🔇 Additional comments (8)
chat/README.md (4)

46-46: Previous markdown issue resolved.

The MD036 violation flagged in the prior review (bold emphasis instead of heading) has been corrected. Line 46 now uses a proper level-4 heading (#### Optional: Claude Models via Google Vertex AI), which is appropriate as a sub-section within the level-3 "Configuration" section.


46-92: Comprehensive Claude/Vertex AI integration documentation.

The new section clearly documents setup requirements, authentication options (gcloud vs service account), extended thinking configuration, budget controls, and important caveats around temperature and regional availability. The examples with --thinking-budget and error-handling guidance are particularly helpful.


94-128: Well-structured models.yaml configuration documentation.

The section clearly explains the YAML configuration format, lists all relevant options (id, name, model_name, endpoint, temperature, extended_thinking_budget, default), and documents the fallback behavior when models.yaml is absent. The auto-loading behavior (line 128) is valuable context for users.


159-162: Clear example coverage for Claude/Vertex AI usage.

The new example at line 161 (python main.py serve --model claude-3-5-sonnet@20240620) complements the existing OpenAI and Gemini examples, making multi-model support discoverable to users.

sippy-ng/src/chat/ChatSettings.js (4)

27-27: LGTM - Imports follow existing patterns.

The new imports for ModelIcon and useModels are consistent with the existing persona implementation.

Also applies to: 40-40


114-115: LGTM - Consistent with persona loading pattern.

The useModels hook usage follows the same pattern as usePersonas, including loading state, error state, and load action.


135-139: LGTM - Correct on-demand loading.

The effect correctly loads models on demand using the same pattern as persona loading.


177-181: LGTM - Handler matches persona pattern.

The model change handler correctly updates settings.modelId, consistent with the persona handler.

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

🧹 Nitpick comments (3)
sippy-ng/src/chat/ChatSettings.js (1)

298-349: LGTM: AI Model UI is well-structured with proper state handling.

The model selection section correctly handles loading, error, and active states. The Select is properly bound to the resolved model ID, ensuring synchronization between the dropdown and description.

Optional: Consider simplifying the description rendering to match the persona pattern.

The IIFE pattern (lines 334-346) works correctly but is more verbose than necessary. For consistency with the persona section (line 389), consider this refactor:

-            {(() => {
-              const selectedModel = getSelectedModel()
-              return (
-                selectedModel &&
-                selectedModel.description && (
-                  <Box className={classes.personaDescription}>
-                    <Typography variant="body2" color="textPrimary">
-                      {selectedModel.description}
-                    </Typography>
-                  </Box>
-                )
-              )
-            })()}
+            {getSelectedModel()?.description && (
+              <Box className={classes.personaDescription}>
+                <Typography variant="body2" color="textPrimary">
+                  {getSelectedModel().description}
+                </Typography>
+              </Box>
+            )}

This matches the persona rendering style and eliminates the IIFE wrapper.

chat/sippy_agent/config.py (2)

28-42: Clarify the endpoint handling logic.

Line 34 uses if self.endpoint to check whether to override the endpoint. Since endpoint defaults to "" (empty string for Vertex AI), this condition is falsy for Vertex AI models, causing them to fall back to base_config.llm_endpoint. While this may be intentional (Vertex AI doesn't use OpenAI-compatible endpoints), the logic would be clearer with an explicit empty-string check.

Consider making the intent more explicit:

-        config_dict["llm_endpoint"] = self.endpoint if self.endpoint else base_config.llm_endpoint
+        config_dict["llm_endpoint"] = self.endpoint if self.endpoint != "" else base_config.llm_endpoint

Alternatively, document that empty string means "use base endpoint" or make endpoint Optional with None as the sentinel value.


174-225: Consider preserving exception context in error handling.

The function has excellent validation logic and structure. However, the broad except Exception as e catch on line 224 wraps all errors in ValueError, which can hide useful debugging information from different exception types (e.g., yaml.YAMLError, ValidationError).

Consider preserving the exception chain:

     except Exception as e:
-        raise ValueError(f"Error loading models configuration: {e}")
+        raise ValueError(f"Error loading models configuration: {e}") from e

This maintains the original traceback while still providing a clear high-level error message.

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Cache: Disabled due to data retention organization setting

Knowledge base: Disabled due to data retention organization setting

📥 Commits

Reviewing files that changed from the base of the PR and between eb36e37 and 0f86963.

📒 Files selected for processing (3)
  • chat/README.md (3 hunks)
  • chat/sippy_agent/config.py (5 hunks)
  • sippy-ng/src/chat/ChatSettings.js (8 hunks)
🔇 Additional comments (14)
chat/README.md (3)

36-93: Documentation update is well-structured and comprehensive.

The additions for Claude/Vertex AI setup, extended thinking configuration, and multi-model support are clearly organized, with options for both local development (gcloud auth) and production (service account). The past markdown linting issue has been properly resolved with heading-level formatting.


94-129: Multi-model configuration section clearly documents new feature.

The models.yaml schema documentation, backward-compatibility note (line 120), and auto-loading behavior (line 128) provide good context for users. The distinction between per-model configuration and shared environment variables is clear.


145-162: Usage examples cover key scenarios effectively.

Examples progress logically from OpenAI, Gemini, to Claude/Vertex AI models, maintaining consistent command-line syntax. The final example (lines 160–161) demonstrates multi-model support integration.

sippy-ng/src/chat/ChatSettings.js (6)

27-27: LGTM: Icon and hook imports are correct.

The ModelIcon and useModels additions properly support the new AI model selection feature.

Also applies to: 40-40


91-93: LGTM: Consistent styling applied.

The fullWidthSelect style properly ensures consistent width for both model and persona selectors.

Also applies to: 317-317, 372-372


114-115: LGTM: Model loading logic mirrors personas pattern.

The useModels hook integration and lazy-loading useEffect are implemented correctly and consistently with the existing personas approach.

Also applies to: 135-139


177-181: LGTM: Model change handler is correct.

The handleModelChange implementation properly updates settings when the user selects a different model.


187-202: LGTM: Fallback logic correctly resolves model selection.

The helper functions properly implement a three-tier fallback (user setting → default → first available) and handle all edge cases including empty model lists and stale settings. This addresses the previous review concern about Select/description synchronization.


351-351: LGTM: Section divider maintains visual consistency.

The divider properly separates the AI Model and AI Persona sections, consistent with the rest of the settings drawer.

chat/sippy_agent/config.py (5)

6-8: LGTM!

The new imports are appropriate for the multi-model configuration functionality.


16-27: LGTM!

The ModelConfig fields are well-defined. The temperature and extended_thinking_budget are correctly declared as Optional with default=None, which properly addresses the past review concern about clobbering base configuration defaults.


66-74: LGTM!

The new Config fields for Google/Vertex AI integration are well-defined with appropriate defaults and environment variable bindings.

Also applies to: 119-122


138-140: LGTM!

The is_claude_model() method follows the established pattern and correctly identifies Claude models.


157-164: LGTM!

The validation for Claude/Vertex AI configuration requirements is appropriate and provides clear error messaging.

@openshift-ci-robot

Scheduling required tests:
/test e2e

stbenjam and others added 4 commits November 6, 2025 07:04
- Add support for multiple LLM models including ChatVertexAnthropic
- Implement models.yaml configuration system for model management
- Add model selection UI in chat settings
- Update environment configuration and documentation

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
- ChatSettings.js: Resolve effective model ID before rendering to prevent
  unbound Select value warnings and keep picker/description in sync
- config.py: Make ModelConfig temperature and extended_thinking_budget
  optional to avoid clobbering environment/CLI defaults in to_config
- README.md: Replace bold emphasis with proper Markdown headings for
  better document structure and accessibility
- Extract duplicated model resolution into getResolvedModelId() helper
- Update getSelectedModel() to use the resolution helper
- Rename personaSelect to fullWidthSelect for semantic clarity
- Move AI Model section above AI Persona (more important)

Addresses additional CodeRabbitAI feedback for DRY principle and
code clarity.
Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 4

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
chat/sippy_agent/web_server.py (1)

218-265: Shared agent config mutation is not request‑safe (pre‑existing, now per‑model)

chat() mutates agent.config.show_thinking and agent.config.persona per request, then restores them in finally. Because the same SippyAgent instance is reused across concurrent HTTP requests for a model, overlapping requests can race on these fields, leading to cross‑request leakage of persona or thinking settings (this pattern already existed with the single global agent; AgentManager now applies it per model). To make overrides request‑local, avoid mutating shared config and instead pass persona/show‑thinking as parameters into the graph/LLM layer, or gate access to a given agent behind an asyncio.Lock so only one request at a time can use it. The former keeps concurrency, the latter preserves current behavior but serializes usage.
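
For illustration, a minimal sketch of the lock-gating option (attribute and method names follow this review's description, not the PR's actual code):

# Sketch: serialize use of a shared agent so persona/show_thinking mutations
# from one request cannot leak into another.
import asyncio


class LockedAgentSketch:
    def __init__(self, agent):
        self._agent = agent
        self._lock = asyncio.Lock()

    async def chat(self, message: str, persona: str, show_thinking: bool) -> str:
        async with self._lock:
            original_persona = self._agent.config.persona
            original_thinking = self._agent.config.show_thinking
            try:
                self._agent.config.persona = persona
                self._agent.config.show_thinking = show_thinking
                return await self._agent.achat(message)
            finally:
                # Restore shared state before releasing the lock.
                self._agent.config.persona = original_persona
                self._agent.config.show_thinking = original_thinking

This keeps the current mutate-and-restore flow but serializes requests per model; passing the overrides as parameters instead preserves concurrency.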

🧹 Nitpick comments (4)
sippy-ng/src/chat/ChatSettings.js (2)

187-202: Consider memoizing model resolution helpers.

The helper functions getResolvedModelId() and getSelectedModel() are called inline during render (lines 321, 335), causing them to re-execute on every render. For better performance, consider computing the resolved model ID once outside the render path.

Example approach using useMemo:

+  const resolvedModelId = React.useMemo(() => {
+    if (settings.modelId && models.find((m) => m.id === settings.modelId)) {
+      return settings.modelId
+    }
+    if (defaultModel && models.find((m) => m.id === defaultModel)) {
+      return defaultModel
+    }
+    return models.length > 0 ? models[0].id : ''
+  }, [settings.modelId, defaultModel, models])
+
+  const selectedModel = React.useMemo(() => {
+    return resolvedModelId ? models.find((m) => m.id === resolvedModelId) : null
+  }, [resolvedModelId, models])

334-346: Simplify description rendering.

The IIFE pattern here is unconventional and adds cognitive overhead. Since getSelectedModel() already handles the null case, you can simplify this block.

-            {(() => {
-              const selectedModel = getSelectedModel()
-              return (
-                selectedModel &&
-                selectedModel.description && (
-                  <Box className={classes.personaDescription}>
-                    <Typography variant="body2" color="textPrimary">
-                      {selectedModel.description}
-                    </Typography>
-                  </Box>
-                )
-              )
-            })()}
+            {getSelectedModel()?.description && (
+              <Box className={classes.personaDescription}>
+                <Typography variant="body2" color="textPrimary">
+                  {getSelectedModel().description}
+                </Typography>
+              </Box>
+            )}

Note: If you implement the memoization suggestion from the previous comment, you would use the memoized selectedModel variable instead of calling getSelectedModel() multiple times.

chat/sippy_agent/web_server.py (1)

17-31: Multi‑model wiring and /chat/models endpoint look coherent

The switch to AgentManager plus the /chat/models endpoint is wired cleanly: models are listed via AgentManager.list_models(), wrapped into ModelInfo, and the default model ID is exposed via ModelsResponse. Using agent_manager.get_agent() in /status ensures tools are initialized before calling list_tools(). One minor design question: metrics.agent_info still uses config.model_name/config.llm_endpoint, which may not reflect the default model once models.yaml is in use. Consider either documenting that these metrics refer to the base config only, or updating them to reflect get_default_model_id()/its config if that’s what you care about operationally.

Also applies to: 88-112, 139-147, 153-163

chat/sippy_agent/agent.py (1)

40-47: Async initialization pattern looks good; list_tools guard prevents misuse

Deferring tools/graph creation into _initialize() and calling it from both AgentManager.get_agent() and SippyAgent.achat() gives you lazy startup while keeping callers safe. The new _initialized check plus the list_tools() guard (returning [] when uninitialized) avoids attribute errors on status calls. If you ever expect heavy parallel startup, you may want an asyncio.Lock around the _initialize body to avoid duplicate tool/graph construction, but that’s an optimization rather than a correctness issue.

Also applies to: 48-54, 670-675
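
For illustration, a minimal double-checked lazy-init sketch (method and attribute names are assumptions mirroring this review, not the PR's actual implementation):

# Sketch: guard lazy async initialization so concurrent callers don't build
# tools and the graph twice.
import asyncio


class LazyInitAgentSketch:
    def __init__(self):
        self._initialized = False
        self._init_lock = asyncio.Lock()
        self.tools = None

    async def _initialize(self) -> None:
        if self._initialized:
            return  # fast path, no lock needed
        async with self._init_lock:
            if self._initialized:
                return  # another caller finished while we waited
            self.tools = await self._load_tools()
            self._initialized = True

    async def _load_tools(self):
        await asyncio.sleep(0)  # stand-in for real tool construction
        return ["sippy_api", "db_query"]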

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Cache: Disabled due to data retention organization setting

Knowledge base: Disabled due to data retention organization setting

📥 Commits

Reviewing files that changed from the base of the PR and between 0f86963 and 80fe8b4.

📒 Files selected for processing (16)
  • chat/.env.example (1 hunks)
  • chat/.gitignore (1 hunks)
  • chat/README.md (3 hunks)
  • chat/main.py (3 hunks)
  • chat/models.yaml.example (1 hunks)
  • chat/requirements.txt (1 hunks)
  • chat/sippy_agent/agent.py (8 hunks)
  • chat/sippy_agent/api_models.py (3 hunks)
  • chat/sippy_agent/config.py (5 hunks)
  • chat/sippy_agent/web_server.py (10 hunks)
  • sippy-ng/src/chat/ChatMessage.js (5 hunks)
  • sippy-ng/src/chat/ChatSettings.js (8 hunks)
  • sippy-ng/src/chat/store/modelsSlice.js (1 hunks)
  • sippy-ng/src/chat/store/settingsSlice.js (1 hunks)
  • sippy-ng/src/chat/store/useChatStore.js (3 hunks)
  • sippy-ng/src/chat/store/webSocketSlice.js (4 hunks)
🚧 Files skipped from review as they are similar to previous changes (6)
  • sippy-ng/src/chat/store/modelsSlice.js
  • sippy-ng/src/chat/store/settingsSlice.js
  • sippy-ng/src/chat/store/webSocketSlice.js
  • sippy-ng/src/chat/ChatMessage.js
  • chat/.gitignore
  • chat/requirements.txt
🧰 Additional context used
🪛 Ruff (0.14.5)
chat/sippy_agent/agent.py

711-711: f-string without any placeholders

Remove extraneous f prefix

(F541)

🔇 Additional comments (4)
chat/models.yaml.example (1)

34-46: No changes needed. Model version is current and temperature constraint is properly enforced.

The claude-sonnet-4-5@20250929 version is correct and current as of November 2025. The temperature constraint concern is mitigated by the agent initialization logic in chat/sippy_agent/agent.py (lines 80-86): extended thinking is only enabled when show_thinking=True AND extended_thinking_budget > 0, and when enabled, temperature is automatically set to 1.0. This prevents runtime errors and ensures the API requirement is satisfied.

chat/sippy_agent/agent.py (3)

176-204: Tool setup change (logging when DB DSN missing) is reasonable

Emitting a warning when SIPPY_READ_ONLY_DATABASE_DSN is not configured makes the absence of the DB tool explicit without changing behavior when it is present. This is a nice observability improvement and should help diagnose missing configuration in lower envs.


485-555: Review comment contains incorrect assumptions about thinking content stream format

The original review assumes thinking content arrives as chunk.content parts with {"type": "thinking", "thinking": "..."} for both Gemini and Claude. This is not accurate per current LangChain documentation:

  • Gemini: Streams thinking via parts marked with a part.thought boolean flag, not a type: "thinking" key with a separate thinking field
  • Claude-on-Vertex (extended thinking): Uses reasoning_content blocks, not type: "thinking" with a thinking field

The code at lines 485–555 checks part.get("type") == "thinking" and "thinking" in part, which won't match either model's actual format. This logic requires correction to extract thinking content successfully for these models.

Likely an incorrect or invalid review comment.


55-116: Concern unfounded: ChatAnthropicVertex supports per-call thinking overrides

The review comment's concern about extended thinking being "fixed for the lifetime of the agent" is not accurate. ChatAnthropicVertex exposes thinking options (thinking_budget, include_thoughts, etc.) that can be passed per-invocation via kwargs/invoke/predict/predict_messages. This means per-request control over extended thinking is available without needing separate agents or LLMs. The thinking configuration set at construction time serves as a default, but individual calls can override these settings, making the suggested workarounds unnecessary.

Likely an incorrect or invalid review comment.

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

♻️ Duplicate comments (1)
chat/sippy_agent/agent.py (1)

725-737: Fix Ruff F541 on synthetic model description; optional concurrency hardening in get_agent

The synthetic ModelConfig uses an f-string without interpolation, which Ruff flags as F541. You can resolve this by dropping the f prefix as previously suggested in the earlier review. This also keeps CI green. As this repeats an earlier comment, tagging as duplicate.

-            synthetic_model = ModelConfig(
-                id="default",
-                name=base_config.model_name,
-                description=f"Model from environment configuration",
-                model_name=base_config.model_name,
-                endpoint=base_config.llm_endpoint,
-                temperature=base_config.temperature,
-                extended_thinking_budget=base_config.extended_thinking_budget,
-                default=True,
-            )
+            synthetic_model = ModelConfig(
+                id="default",
+                name=base_config.model_name,
+                description="Model from environment configuration",
+                model_name=base_config.model_name,
+                endpoint=base_config.llm_endpoint,
+                temperature=base_config.temperature,
+                extended_thinking_budget=base_config.extended_thinking_budget,
+                default=True,
+            )

Separately, AgentManager.get_agent() is straightforward and readable, but if you expect high concurrency it might be worth guarding agent creation with a simple async lock or similar so two concurrent callers for the same model_id don’t both pay the initialization cost before the cache is populated. Not required for correctness, just an efficiency/robustness improvement. (As per static analysis hints and past review comments.)

Also applies to: 756-793

🧹 Nitpick comments (3)
chat/sippy_agent/agent.py (2)

40-55: Async lazy initialization looks good; consider guarding add_tool usage

The _initialized flag and _initialize() give you a clean lazy-init path for tools/graph, and list_tools() now safely returns [] when uninitialized. The only remaining sharp edge is add_tool(), which assumes self.tools is non‑None; if someone calls it before the first achat()/get_agent(), it will hit a NoneType append. Consider either documenting that add_tool requires a fully initialized agent or adding a defensive check there (e.g., raising a clear error if _initialized is False), so misuse fails loudly rather than with an attribute error.

Also applies to: 678-691
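
For illustration, a minimal sketch of such a guard (names mirror this review's description, not the PR's actual code):

# Sketch: fail loudly if add_tool is called before initialization, instead of
# hitting a NoneType append.
class AgentToolsSketch:
    def __init__(self):
        self._initialized = False
        self.tools = None

    def add_tool(self, tool) -> None:
        if not self._initialized or self.tools is None:
            raise RuntimeError(
                "agent is not initialized; initialize it before calling add_tool()"
            )
        self.tools.append(tool)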


494-571: Thought streaming logic cleanly separates Gemini vs Claude; small optional guard

The updated _achat_streaming() correctly distinguishes Gemini’s complete-thought chunks from Claude’s token stream, buffers thoughts, and then injects each as a separate thinking_steps entry at the front, preserving chronological order. As a micro‑optimization, you could early‑exit the “thinking” parsing paths when both thinking_callback is None and show_thinking is False, since in that case the accumulated thought_buffer is never surfaced. Not a blocker, just a small potential perf/clarity win.

Also applies to: 661-668

chat/requirements.txt (1)

1-8: Upper-bound version constraints are an optional best practice, not a requirement.

LangChain and LangGraph follow Semantic Versioning principles, with the first stable releases (1.0.0) providing production-ready APIs. These 1.0 releases mark a commitment to stability and no breaking changes until 2.0. The pip dependency resolution test confirmed no conflicts among the specified versions.

While adding upper-bound constraints (e.g., langchain>=1.0.0,<2.0.0) would provide ultra-conservative stability and prevent unexpected major-version jumps, they are not required for safe operation. The current lower-bound-only approach is defensible given that:

  1. With LangChain 1.0, minor releases do not include breaking changes
  2. All core packages (langchain, langchain-core, langchain-openai, langgraph) are at 1.0+ with semantic versioning guarantees
  3. Legacy versions (LangChain 0.3 and LangGraph 0.4) are in MAINTENANCE mode until December 2026

Adding upper bounds is a valid optimization for reproducibility and risk-averse deployments but remains optional given the ecosystem's versioning commitments. Consider upper bounds only if: (a) stricter repeatability is required, or (b) you want to enforce explicit major-version control across your team's deployments.

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Cache: Disabled due to data retention organization setting

Knowledge base: Disabled due to data retention organization setting

📥 Commits

Reviewing files that changed from the base of the PR and between 80fe8b4 and 7dbf01a.

📒 Files selected for processing (2)
  • chat/requirements.txt (1 hunks)
  • chat/sippy_agent/agent.py (8 hunks)
🧰 Additional context used
🪛 Ruff (0.14.5)
chat/sippy_agent/agent.py

728-728: f-string without any placeholders

Remove extraneous f prefix

(F541)

🔇 Additional comments (3)
chat/sippy_agent/agent.py (2)

197-203: DB tool warning improves observability

Adding an explicit warning when SIPPY_READ_ONLY_DATABASE_DSN is not configured is a nice touch; it makes it much clearer why the DB query tool is missing instead of silently omitting it.


457-459: Ensuring _initialize() runs before handling chats is the right safeguard

Awaiting self._initialize() at the top of achat() guarantees tools and graph are ready before streaming, and because _initialize() is idempotent, repeated calls are cheap. This aligns well with the new lazy-init pattern.

chat/requirements.txt (1)

5-5: Dependency versions are current, stable, and secure.

PyYAML 6.0.3 is the latest non-vulnerable version, fixing vulnerabilities that existed before version 5.4. Anthropic 0.20.0 has no known vulnerabilities, and the latest stable anthropic release is 0.73.0. PyYAML demonstrates a positive version release cadence with recent releases. All three dependencies use permissive >= version constraints, allowing automatic updates to the latest secure releases. The specified versions are current as of the PR's timeline and free from known security vulnerabilities.

@openshift-ci-robot

Scheduling required tests:
/test e2e

@stbenjam
Member Author

/hold

I think it's reviewable/testable, but I'm looking at one of the CodeRabbit comments about a race

@openshift-ci openshift-ci bot added the do-not-merge/hold label (Indicates that a PR should not merge because someone has issued a /hold command.) on Nov 18, 2025
@stbenjam
Member Author

stbenjam commented Dec 4, 2025

/test e2e

Member

@smg247 smg247 left a comment


/lgtm

@openshift-ci openshift-ci bot added the lgtm label (Indicates that a PR is ready to be merged.) on Dec 4, 2025
@openshift-ci openshift-ci bot removed the lgtm label (Indicates that a PR is ready to be merged.) on Jan 8, 2026
@stbenjam
Member Author

stbenjam commented Jan 8, 2026

/hold cancel

@openshift-ci openshift-ci bot removed the do-not-merge/hold label (Indicates that a PR should not merge because someone has issued a /hold command.) on Jan 8, 2026
@smg247
Member

smg247 commented Jan 8, 2026

/lgtm

@openshift-ci
Contributor

openshift-ci bot commented Jan 8, 2026

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: smg247, stbenjam

The full list of commands accepted by this bot can be found here.

The pull request process is described here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the lgtm label (Indicates that a PR is ready to be merged.) on Jan 8, 2026
Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

🤖 Fix all issues with AI agents
In @chat/sippy_agent/agent.py:
- Around line 764-778: The description string for the synthetic ModelConfig uses
an unnecessary f-string: change description=f"Model from environment
configuration" to a plain string description="Model from environment
configuration" in the synthetic_model creation (see ModelConfig instantiation
assigning to self.models["default"] and variable synthetic_model), leaving all
other fields (name=model_name, endpoint=llm_endpoint, temperature,
extended_thinking_budget, default=True) unchanged.
🧹 Nitpick comments (1)
chat/sippy_agent/config.py (1)

174-232: Robust model configuration loading with comprehensive validation.

The load_models_config() function properly validates:

  • Required models key presence
  • Duplicate model IDs
  • Multiple default models
  • Empty model list

The fallback to the first model as default when none is specified is sensible.

One consideration: the broad except Exception on line 231 catches all exceptions (including the ValueError ones raised above) and re-wraps them. This works but may double-wrap the message for intentional ValueError raises. You may want to catch only non-ValueError exceptions:

♻️ Optional: preserve ValueError messages
-    except Exception as e:
-        raise ValueError(f"Error loading models configuration: {e}")
+    except ValueError:
+        raise  # Re-raise validation errors as-is
+    except Exception as e:
+        raise ValueError(f"Error loading models configuration: {e}") from e
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Cache: Disabled due to data retention organization setting

Knowledge base: Disabled due to data retention organization setting

📥 Commits

Reviewing files that changed from the base of the PR and between 7dbf01a and c6d92d4.

📒 Files selected for processing (5)
  • chat/README.md
  • chat/requirements.txt
  • chat/sippy_agent/agent.py
  • chat/sippy_agent/config.py
  • chat/sippy_agent/web_server.py
🚧 Files skipped from review as they are similar to previous changes (1)
  • chat/requirements.txt
🧰 Additional context used
🧬 Code graph analysis (1)
chat/sippy_agent/agent.py (2)
chat/sippy_agent/config.py (6)
  • Config (45-171)
  • ModelConfig (16-42)
  • load_models_config (174-232)
  • is_claude_model (138-140)
  • is_gemini_model (134-136)
  • to_config (28-42)
chat/sippy_agent/personas.py (1)
  • get_persona (82-84)
🪛 Ruff (0.14.10)
chat/sippy_agent/agent.py

767-767: f-string without any placeholders

Remove extraneous f prefix

(F541)

🔇 Additional comments (11)
chat/README.md (2)

46-93: Well-structured documentation for Claude/Vertex AI integration.

The Claude configuration section clearly covers both authentication options (gcloud auth vs service account), extended thinking feature with budget controls, and important notes about temperature requirements. The troubleshooting guidance for 400 errors is a nice touch.


94-129: Comprehensive multi-model configuration documentation.

The models.yaml configuration section clearly documents all available fields, fallback behavior, and startup instructions. This aligns well with the implementation in config.py.

chat/sippy_agent/web_server.py (3)

139-146: Clean implementation of the models listing endpoint.

The /chat/models endpoint properly exposes available models and the default model ID. The use of ModelInfo for type safety is good.
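
For illustration, a minimal endpoint sketch in the shape this review describes (the ModelInfo/ModelsResponse fields and the hard-coded data are assumptions, not the PR's actual definitions):

# Sketch: a FastAPI models-listing endpoint returning available models plus a
# default model id.
from typing import List, Optional

from fastapi import FastAPI
from pydantic import BaseModel


class ModelInfo(BaseModel):
    id: str
    name: str
    description: Optional[str] = None


class ModelsResponse(BaseModel):
    models: List[ModelInfo]
    default_model: str


app = FastAPI()


@app.get("/chat/models", response_model=ModelsResponse)
async def list_models() -> ModelsResponse:
    # An AgentManager would normally supply this list; hard-coded data keeps
    # the sketch self-contained.
    models = [ModelInfo(id="gemini-flash", name="Gemini 2.5 Flash")]
    return ModelsResponse(models=models, default_model="gemini-flash")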


220-247: Per-request model handling is well implemented.

The HTTP /chat endpoint correctly:

  1. Retrieves the appropriate agent via agent_manager.get_agent(request.model_id)
  2. Determines the actual model_id used (falling back to default)
  3. Passes per-request persona and show_thinking as parameters without mutating shared config
  4. Includes model_id in the response

290-310: Good error handling for agent initialization in WebSocket handler.

The try/except around agent_manager.get_agent() properly handles initialization failures and sends an error message to the client via the WebSocket, allowing the connection to continue for subsequent requests.

chat/sippy_agent/config.py (1)

16-42: Well-designed ModelConfig with proper inheritance pattern.

The to_config() method correctly inherits base settings and only overrides temperature and extended_thinking_budget when explicitly set (not None). This prevents unintended overrides with default values.
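
For illustration, a minimal sketch of the None-as-inherit pattern (field names are assumptions based on this review, not the actual ModelConfig):

# Sketch: overrides apply only when explicitly set; None means "inherit".
from dataclasses import dataclass, replace
from typing import Optional


@dataclass
class BaseConfig:
    temperature: float = 0.2
    extended_thinking_budget: int = 0


@dataclass
class ModelOverrides:
    temperature: Optional[float] = None
    extended_thinking_budget: Optional[int] = None

    def to_config(self, base: BaseConfig) -> BaseConfig:
        updates = {}
        if self.temperature is not None:
            updates["temperature"] = self.temperature
        if self.extended_thinking_budget is not None:
            updates["extended_thinking_budget"] = self.extended_thinking_budget
        # Base defaults are never clobbered by unset overrides.
        return replace(base, **updates)


print(ModelOverrides(temperature=1.0).to_config(BaseConfig()))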

chat/sippy_agent/agent.py (5)

44-53: Good async initialization pattern.

The deferred initialization with _initialize() and _initialized flag allows the constructor to remain synchronous while deferring heavy tool loading and graph creation to the first use. This is a clean pattern for async initialization.


78-115: Claude extended thinking configuration is well implemented.

The extended thinking setup correctly:

  • Enables only when show_thinking=True AND extended_thinking_budget > 0
  • Sets temperature to 1.0 (required by Claude API)
  • Sets max_tokens to 64000 (Claude's max output)
  • Configures the thinking budget via model_kwargs

The verbose logging provides good visibility into the configuration.
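
For illustration, a minimal sketch of that gating; keyword names such as max_tokens and model_kwargs mirror this review's description and the Anthropic thinking API, and the PR's actual constructor arguments may differ:

# Sketch: enable extended thinking only when both conditions hold, forcing the
# temperature the API requires.
def build_claude_kwargs(show_thinking: bool, budget: int, temperature: float) -> dict:
    kwargs = {"temperature": temperature}
    if show_thinking and budget > 0:
        kwargs["temperature"] = 1.0   # required when extended thinking is enabled
        kwargs["max_tokens"] = 64000  # room for thinking plus the final answer
        kwargs["model_kwargs"] = {
            "thinking": {"type": "enabled", "budget_tokens": budget}
        }
    return kwargs


print(build_claude_kwargs(show_thinking=True, budget=4096, temperature=0.2))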


526-540: Per-request graph creation addresses the race condition concern.

Creating request_graph = self._create_agent_graph(effective_persona) per request instead of mutating self.graph is the correct approach for thread safety. This ensures each request operates on its own graph instance with the appropriate persona, avoiding the race condition mentioned in the PR comments.


795-832: AgentManager correctly caches agents per model.

The get_agent() method properly:

  1. Falls back to default model when model_id is None
  2. Validates that the requested model exists
  3. Returns cached agents for efficiency
  4. Creates and initializes new agents on demand

The async initialization via await agent._initialize() ensures tools are loaded before returning.
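
For illustration, a minimal sketch of that flow (class and method names are assumptions based on this review, not the PR's actual AgentManager):

# Sketch: default fallback, validation, caching, and on-demand initialization.
from typing import Dict, Optional


class SippyAgentSketch:
    def __init__(self, config: dict):
        self.config = config

    async def _initialize(self) -> None:
        pass  # stand-in for tool and graph construction


class AgentManagerSketch:
    def __init__(self, model_configs: Dict[str, dict], default_model_id: str):
        self._model_configs = model_configs
        self._default_model_id = default_model_id
        self._agents: Dict[str, SippyAgentSketch] = {}

    async def get_agent(self, model_id: Optional[str] = None) -> SippyAgentSketch:
        model_id = model_id or self._default_model_id
        if model_id not in self._model_configs:
            raise ValueError(f"unknown model: {model_id}")
        if model_id not in self._agents:
            agent = SippyAgentSketch(self._model_configs[model_id])
            await agent._initialize()  # load tools/graph before first use
            self._agents[model_id] = agent
        return self._agents[model_id]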


726-730: Defensive guard for uninitialized agent.

The list_tools() method now safely returns an empty list if the agent hasn't been initialized, preventing potential NoneType errors.

@openshift-ci-robot

Scheduling required tests:
/test e2e

@openshift-ci
Contributor

openshift-ci bot commented Jan 8, 2026

@stbenjam: all tests passed!

Full PR test history. Your PR dashboard.


Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@openshift-merge-bot openshift-merge-bot bot merged commit f39785e into openshift:main Jan 8, 2026
8 checks passed