Add multi-model support with configuration system #3087
Conversation
Pipeline controller notification. For optional jobs, comment
Walkthrough

Adds multi-model support via a YAML model registry, Google Vertex AI (Claude) integration with extended-thinking modes, async SippyAgent initialization, an AgentManager for per-model agents, per-request model selection/propagation (model_id), a backend /chat/models endpoint, and frontend model UI/state and attribution.

Changes
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant User
    participant Frontend
    participant ModelsStore as Frontend:Models Store
    participant WebServer as Backend:Web Server
    participant AgentManager as Backend:Agent Manager
    participant Agent as Backend:SippyAgent
    participant LLM as External:Vertex AI / Claude
    User->>Frontend: Open chat UI
    Frontend->>ModelsStore: loadModels()
    ModelsStore->>WebServer: GET /chat/models
    WebServer->>AgentManager: list_models()
    AgentManager-->>WebServer: {models, default_model}
    ModelsStore->>Frontend: update models state
    User->>Frontend: Select model + send message
    Frontend->>WebServer: POST /chat (model_id, message)
    WebServer->>AgentManager: get_agent(model_id)
    AgentManager->>Agent: return cached or create + Agent._initialize()
    Agent->>Agent: _create_llm() (Claude -> ChatAnthropicVertex)
    Agent->>LLM: send message (project/location, thinking budget)
    LLM->>Agent: stream thinking & content
    Agent->>WebServer: thinking_callback(model_id, partial_thought)
    Agent->>WebServer: final response(model_id, content)
    WebServer->>Frontend: ChatResponse(model_id, content)
    Frontend->>User: render message with model attribution
```
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes

🚥 Pre-merge checks: ✅ 5 | ❌ 2

❌ Failed checks (2 warnings)
✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing touches
Actionable comments posted: 3
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
chat/sippy_agent/web_server.py (2)
226-261: Avoid mutating shared agent config across concurrent requests
`AgentManager` caches a single `SippyAgent` per model, but both the HTTP and WebSocket paths mutate `agent.config.show_thinking`, `agent.config.persona`, and even rebuild `agent.graph` on that shared instance. With concurrent requests targeting the same model, those mutations race: persona or thinking overrides from request A can leak into request B before either `finally` block restores the original settings. A third request arriving in that window will run with the wrong persona/model behavior. Please isolate per-request overrides (e.g., clone the agent/config, add an agent-level async lock around these mutations, or extend `SippyAgent.achat` to accept override args so we never touch shared state).

Also applies to: 330-475
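One way to keep overrides request-local, sketched under the assumption that `SippyAgent.achat` can grow optional override parameters (field names mirror the review text; the real signatures in this repository may differ):

```python
import asyncio
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class AgentConfig:
    persona: str = "default"
    show_thinking: bool = False


class SippyAgent:
    def __init__(self, config: AgentConfig):
        self.config = config  # shared, never mutated per request

    async def achat(self, message: str, *, persona=None, show_thinking=None) -> str:
        # Build a request-local view of the config instead of mutating self.config,
        # so concurrent requests cannot leak persona/thinking settings into each other.
        effective = replace(
            self.config,
            persona=persona if persona is not None else self.config.persona,
            show_thinking=show_thinking if show_thinking is not None else self.config.show_thinking,
        )
        await asyncio.sleep(0)  # stand-in for the real graph/LLM call
        return f"[{effective.persona}, thinking={effective.show_thinking}] {message}"


async def main():
    agent = SippyAgent(AgentConfig())
    # Two overlapping requests with different overrides no longer race on shared state.
    print(await asyncio.gather(
        agent.achat("first", persona="terse"),
        agent.achat("second", show_thinking=True),
    ))


asyncio.run(main())
```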
153-163: Return status from the actual default agent config
`/status` now resolves the default agent via `AgentManager`, but the response still reports `model_name`, `endpoint`, `show_thinking`, and `persona` from the base `Config`, which will be wrong whenever the default model comes from `models.yaml`. Surface the values from `default_agent.config` instead so the endpoint reflects the real model in use.

```diff
-        return AgentStatus(
-            available_tools=default_agent.list_tools(),
-            model_name=self.config.model_name,
-            endpoint=self.config.llm_endpoint,
-            thinking_enabled=self.config.show_thinking,
-            current_persona=self.config.persona,
+        agent_config = default_agent.config
+        return AgentStatus(
+            available_tools=default_agent.list_tools(),
+            model_name=agent_config.model_name,
+            endpoint=agent_config.llm_endpoint,
+            thinking_enabled=agent_config.show_thinking,
+            current_persona=agent_config.persona,
```
🧹 Nitpick comments (6)
chat/.gitignore (1)
142-145: LGTM! Correctly ignores the instance-specific `models.yaml` configuration file while keeping the example template in version control.

Minor: Consider using a single blank line instead of two (lines 142-143) for consistency with the rest of the file, though this is purely stylistic.
chat/requirements.txt (1)
5-13: Dependencies align with multi-model support. The three new dependencies correctly support Claude via Vertex AI and YAML-based configuration.

Consider tightening the `anthropic>=0.20.0` constraint to `anthropic>=0.20.0,<1.0.0` to avoid potential breaking changes in major version updates, as the Anthropic SDK has had breaking API changes in the past.

chat/.env.example (1)
1-92: Consider documenting the models.yaml approach in comments. The restructured configuration is well-organized and clearly documents different model provider options. However, since this PR's main feature is the multi-model `models.yaml` configuration system, consider adding a comment at the top mentioning that:

- These environment variables provide defaults/fallbacks when `models.yaml` isn't present
- For multi-model support, users should create a `models.yaml` file (see `models.yaml.example`)

Additionally, line 66's `SIPPY_API_URL` defaults to production. Consider whether this should be commented out or point to localhost to avoid accidental production API calls during development.

sippy-ng/src/chat/store/webSocketSlice.js (1)
225-246: Review the UX implications of delayed message display. The reordering makes sense for chat history management; the current message shouldn't be included in its own history context. However, this creates a delay where the user's message only appears in the UI after the network send completes.
Traditional chat UX typically shows the user's message immediately (optimistic UI) and displays an error if the send fails. The current approach may feel less responsive since users won't see their message until after the network round-trip.
Consider:
- Testing the perceived responsiveness with this change
- If the delay is noticeable, consider adding the message optimistically before send, then removing/marking it as failed if send fails
- The current approach does guarantee consistency between what the user sees and what was actually sent
sippy-ng/src/chat/store/modelsSlice.js (1)
12-49: Consider adding concurrent call protection. The `loadModels` function correctly fetches models and sets the default, but doesn't protect against concurrent calls. If `loadModels` is called multiple times in quick succession (e.g., on reconnect or user navigation), multiple fetches could race and cause state inconsistencies.

Consider one of these approaches:

- Track in-flight requests and return early if already loading
- Abort previous requests using `AbortController` when a new request starts
- Ensure the UI only calls `loadModels` once on mount/initialization

Example with early return:

```diff
   loadModels: () => {
+    // Skip if already loading
+    if (get().modelsLoading) {
+      return
+    }
+
     const apiUrl =
       process.env.REACT_APP_CHAT_API_URL || window.location.origin + '/api/chat'
```

chat/sippy_agent/agent.py (1)
708-716: Drop the redundant f-string
description=f"Model from environment configuration"has no interpolation. Please remove thefprefix for clarity.- description=f"Model from environment configuration", + description="Model from environment configuration",
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Cache: Disabled due to data retention organization setting
Knowledge base: Disabled due to data retention organization setting
📒 Files selected for processing (16)
- chat/.env.example (1 hunks)
- chat/.gitignore (1 hunks)
- chat/README.md (2 hunks)
- chat/main.py (3 hunks)
- chat/models.yaml.example (1 hunks)
- chat/requirements.txt (1 hunks)
- chat/sippy_agent/agent.py (8 hunks)
- chat/sippy_agent/api_models.py (3 hunks)
- chat/sippy_agent/config.py (5 hunks)
- chat/sippy_agent/web_server.py (10 hunks)
- sippy-ng/src/chat/ChatMessage.js (5 hunks)
- sippy-ng/src/chat/ChatSettings.js (6 hunks)
- sippy-ng/src/chat/store/modelsSlice.js (1 hunks)
- sippy-ng/src/chat/store/settingsSlice.js (1 hunks)
- sippy-ng/src/chat/store/useChatStore.js (3 hunks)
- sippy-ng/src/chat/store/webSocketSlice.js (4 hunks)
🧰 Additional context used
🪛 markdownlint-cli2 (0.18.1)
chat/README.md
46-46: Emphasis used instead of a heading
(MD036, no-emphasis-as-heading)
🪛 Ruff (0.14.3)
chat/sippy_agent/agent.py
710-710: f-string without any placeholders
Remove extraneous f prefix
(F541)
🔇 Additional comments (7)
chat/README.md (1)
46-162: Excellent documentation of new multi-model features. The documentation comprehensively covers:
- Claude/Vertex AI authentication options
- Extended thinking feature with important caveats (temperature=1.0 requirement, regional availability)
- Multi-model configuration via `models.yaml`
- Clear usage examples
The explanations are clear and include important warnings about limitations and requirements.
chat/models.yaml.example (1)
16-41: Well-structured model configuration examples. The configuration examples are clear and demonstrate key features:
- Multiple models from different providers
- Default model selection
- Thinking-enabled variant with required `temperature: 1.0`
- Optional fields (description, temperature, extended_thinking_budget)
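The `default: true` flag also deserves a loader-side check, since YAML alone cannot enforce uniqueness. A minimal sketch, assuming the `models`/`id`/`default` field names shown in `models.yaml.example`:

```python
import yaml


def resolve_default_model(yaml_text: str) -> str:
    """Return the default model id, rejecting configs with more than one default."""
    data = yaml.safe_load(yaml_text) or {}
    models = data.get("models", [])
    if not models:
        raise ValueError("models.yaml must define at least one model")
    defaults = [m["id"] for m in models if m.get("default")]
    if len(defaults) > 1:
        raise ValueError(f"only one model may set 'default: true', got: {defaults}")
    # Fall back to the first listed model when no default is marked.
    return defaults[0] if defaults else models[0]["id"]


example = """
models:
  - id: gpt-4o
    default: true
  - id: claude-sonnet
"""
print(resolve_default_model(example))  # -> gpt-4o
```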
Ensure that the configuration loading code (likely in `chat/sippy_agent/config.py`) performs a check along these lines, validating that only one model has `default: true`; the YAML format itself doesn't enforce this constraint.

sippy-ng/src/chat/store/settingsSlice.js (1)
10-10: LGTM! Clean addition of `modelId` to the settings state. The comment clearly indicates the initialization strategy, and this integrates well with the `modelsSlice`, which loads and sets the default model.

sippy-ng/src/chat/store/webSocketSlice.js (2)
55-55: LGTM! Adding `model_id` to the assistant message metadata enables proper model attribution in the UI, aligning with the backend's multi-model support.
234-234: LGTM! Correctly includes `model_id` in the outgoing payload, enabling the backend to use the user-selected model.

sippy-ng/src/chat/store/modelsSlice.js (2)
13-15: Base URL construction handles edge cases correctly. The chained `.replace()` calls properly handle both trailing slashes and the `/stream` suffix.
33-37: Good UX: auto-setting default model. Automatically setting the default model when the user hasn't selected one provides a smooth initial experience.
Actionable comments posted: 0
♻️ Duplicate comments (1)
sippy-ng/src/chat/ChatSettings.js (1)
339-417: Extract duplicated model resolution logic. The model ID resolution logic is duplicated between the Select value (lines 362-376) and the description rendering (lines 389-414). This violates the DRY principle and creates a maintenance risk if one copy is updated but not the other.
Extract the resolution logic into a helper function as suggested in the past review:
```diff
+  const resolveSelectedModelId = () => {
+    if (models.length === 0) {
+      return ''
+    }
+
+    if (settings.modelId && models.find((m) => m.id === settings.modelId)) {
+      return settings.modelId
+    }
+
+    if (defaultModel && models.find((m) => m.id === defaultModel)) {
+      return defaultModel
+    }
+
+    return models[0].id
+  }
+
+  const selectedModelId = resolveSelectedModelId()
+  const selectedModel = models.find((m) => m.id === selectedModelId) || null
```

Then update the render to use these computed values:
```diff
   <Select
     labelId="model-select-label"
-    value={
-      (() => {
-        // Resolve the effective model ID before rendering
-        if (
-          settings.modelId &&
-          models.find((m) => m.id === settings.modelId)
-        ) {
-          return settings.modelId
-        }
-        if (defaultModel && models.find((m) => m.id === defaultModel)) {
-          return defaultModel
-        }
-        return models.length > 0 ? models[0].id : ''
-      })()
-    }
+    value={selectedModelId}
     onChange={handleModelChange}
   ...
-  {(() => {
-    // Resolve model ID and look up the selected model once
-    const resolvedModelId =
-      settings.modelId && models.find((m) => m.id === settings.modelId)
-        ? settings.modelId
-        : defaultModel && models.find((m) => m.id === defaultModel)
-        ? defaultModel
-        : models.length > 0
-        ? models[0].id
-        : null
-
-    const selectedModel = resolvedModelId
-      ? models.find((m) => m.id === resolvedModelId)
-      : null
-
-    return (
-      selectedModel &&
-      selectedModel.description && (
-        <Box className={classes.personaDescription}>
-          <Typography variant="body2" color="textPrimary">
-            {selectedModel.description}
-          </Typography>
-        </Box>
-      )
-    )
-  })()}
+  {selectedModel?.description && (
+    <Box className={classes.personaDescription}>
+      <Typography variant="body2" color="textPrimary">
+        {selectedModel.description}
+      </Typography>
+    </Box>
+  )}
```
🧹 Nitpick comments (2)
sippy-ng/src/chat/ChatSettings.js (2)
187-189: Remove unused helper or refactor to use it. The `getSelectedModel()` function is defined but never used in the render. The Model Selection UI (lines 362-376 and 389-414) uses inline resolution logic instead.

Either remove this unused function or refactor the duplicated resolution logic to use a proper helper (see next comment).
```diff
-  const getSelectedModel = () => {
-    return models.find((m) => m.id === settings.modelId) || models[0]
-  }
```
358-358: Consider renaming the reused style class. The model FormControl reuses `classes.personaSelect`, which works functionally but is slightly misleading. Consider renaming it to a more generic name like `selectControl` or adding a separate `modelSelect` class for clarity.

This is a minor naming issue and can be deferred, but it would improve code clarity.
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Cache: Disabled due to data retention organization setting
Knowledge base: Disabled due to data retention organization setting
📒 Files selected for processing (3)
- chat/README.md (3 hunks)
- chat/sippy_agent/config.py (5 hunks)
- sippy-ng/src/chat/ChatSettings.js (6 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- chat/sippy_agent/config.py
🔇 Additional comments (8)
chat/README.md (4)
46-46: Previous markdown issue resolved. The MD036 violation flagged in the prior review (bold emphasis instead of heading) has been corrected. Line 46 now uses a proper level-4 heading (`#### Optional: Claude Models via Google Vertex AI`), which is appropriate as a sub-section within the level-3 "Configuration" section.
46-92: Comprehensive Claude/Vertex AI integration documentation. The new section clearly documents setup requirements, authentication options (gcloud vs service account), extended thinking configuration, budget controls, and important caveats around temperature and regional availability. The examples with `--thinking-budget` and error-handling guidance are particularly helpful.
94-128: Well-structured models.yaml configuration documentation. The section clearly explains the YAML configuration format, lists all relevant options (id, name, model_name, endpoint, temperature, extended_thinking_budget, default), and documents the fallback behavior when models.yaml is absent. The auto-loading behavior (line 128) is valuable context for users.
159-162: Clear example coverage for Claude/Vertex AI usage. The new example at line 161 (`python main.py serve --model claude-3-5-sonnet@20240620`) complements the existing OpenAI and Gemini examples, making multi-model support discoverable to users.

sippy-ng/src/chat/ChatSettings.js (4)
27-27: LGTM - Imports follow existing patterns. The new imports for `ModelIcon` and `useModels` are consistent with the existing persona implementation.

Also applies to: 40-40
114-115: LGTM - Consistent with persona loading pattern. The `useModels` hook usage follows the same pattern as `usePersonas`, including loading state, error state, and load action.
135-139: LGTM - Correct on-demand loading. The effect correctly loads models on demand using the same pattern as persona loading.
177-181: LGTM - Handler matches persona pattern. The model change handler correctly updates `settings.modelId`, consistent with the persona handler.
Actionable comments posted: 0
🧹 Nitpick comments (3)
sippy-ng/src/chat/ChatSettings.js (1)
298-349: LGTM: AI Model UI is well-structured with proper state handling. The model selection section correctly handles loading, error, and active states. The Select is properly bound to the resolved model ID, ensuring synchronization between the dropdown and description.
Optional: Consider simplifying the description rendering to match the persona pattern.
The IIFE pattern (lines 334-346) works correctly but is more verbose than necessary. For consistency with the persona section (line 389), consider this refactor:
```diff
-  {(() => {
-    const selectedModel = getSelectedModel()
-    return (
-      selectedModel &&
-      selectedModel.description && (
-        <Box className={classes.personaDescription}>
-          <Typography variant="body2" color="textPrimary">
-            {selectedModel.description}
-          </Typography>
-        </Box>
-      )
-    )
-  })()}
+  {getSelectedModel()?.description && (
+    <Box className={classes.personaDescription}>
+      <Typography variant="body2" color="textPrimary">
+        {getSelectedModel().description}
+      </Typography>
+    </Box>
+  )}
```

This matches the persona rendering style and eliminates the IIFE wrapper.
chat/sippy_agent/config.py (2)
28-42: Clarify the endpoint handling logic. Line 34 uses `if self.endpoint` to check whether to override the endpoint. Since `endpoint` defaults to `""` (empty string for Vertex AI), this condition is falsy for Vertex AI models, causing them to fall back to `base_config.llm_endpoint`. While this may be intentional (Vertex AI doesn't use OpenAI-compatible endpoints), the logic would be clearer with an explicit empty-string check.

Consider making the intent more explicit:
- config_dict["llm_endpoint"] = self.endpoint if self.endpoint else base_config.llm_endpoint + config_dict["llm_endpoint"] = self.endpoint if self.endpoint != "" else base_config.llm_endpointAlternatively, document that empty string means "use base endpoint" or make endpoint Optional with None as the sentinel value.
174-225: Consider preserving exception context in error handling. The function has excellent validation logic and structure. However, the broad `except Exception as e` catch on line 224 wraps all errors in `ValueError`, which can hide useful debugging information from different exception types (e.g., `yaml.YAMLError`, `ValidationError`).
except Exception as e: - raise ValueError(f"Error loading models configuration: {e}") + raise ValueError(f"Error loading models configuration: {e}") from eThis maintains the original traceback while still providing a clear high-level error message.
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Cache: Disabled due to data retention organization setting
Knowledge base: Disabled due to data retention organization setting
📒 Files selected for processing (3)
- chat/README.md (3 hunks)
- chat/sippy_agent/config.py (5 hunks)
- sippy-ng/src/chat/ChatSettings.js (8 hunks)
🔇 Additional comments (14)
chat/README.md (3)
36-93: Documentation update is well-structured and comprehensive. The additions for Claude/Vertex AI setup, extended thinking configuration, and multi-model support are clearly organized, with options for both local development (`gcloud auth`) and production (service account). The past markdown linting issue has been properly resolved with heading-level formatting.
94-129: Multi-model configuration section clearly documents new feature. The `models.yaml` schema documentation, backward-compatibility note (line 120), and auto-loading behavior (line 128) provide good context for users. The distinction between per-model configuration and shared environment variables is clear.
145-162: Usage examples cover key scenarios effectively.Examples progress logically from OpenAI, Gemini, to Claude/Vertex AI models, maintaining consistent command-line syntax. The final example (lines 160–161) demonstrates multi-model support integration.
sippy-ng/src/chat/ChatSettings.js (6)
27-27: LGTM: Icon and hook imports are correct. The ModelIcon and useModels additions properly support the new AI model selection feature.
Also applies to: 40-40
91-93: LGTM: Consistent styling applied. The fullWidthSelect style properly ensures consistent width for both model and persona selectors.
Also applies to: 317-317, 372-372
114-115: LGTM: Model loading logic mirrors personas pattern. The useModels hook integration and lazy-loading useEffect are implemented correctly and consistently with the existing personas approach.
Also applies to: 135-139
177-181: LGTM: Model change handler is correct. The handleModelChange implementation properly updates settings when the user selects a different model.
187-202: LGTM: Fallback logic correctly resolves model selection. The helper functions properly implement a three-tier fallback (user setting → default → first available) and handle all edge cases including empty model lists and stale settings. This addresses the previous review concern about Select/description synchronization.
351-351: LGTM: Section divider maintains visual consistency. The divider properly separates the AI Model and AI Persona sections, consistent with the rest of the settings drawer.
chat/sippy_agent/config.py (5)
6-8: LGTM! The new imports are appropriate for the multi-model configuration functionality.
16-27: LGTM! The `ModelConfig` fields are well-defined. The `temperature` and `extended_thinking_budget` are correctly declared as `Optional` with `default=None`, which properly addresses the past review concern about clobbering base configuration defaults.
66-74: LGTM! The new Config fields for Google/Vertex AI integration are well-defined with appropriate defaults and environment variable bindings.
Also applies to: 119-122
138-140: LGTM! The `is_claude_model()` method follows the established pattern and correctly identifies Claude models.
157-164: LGTM! The validation for Claude/Vertex AI configuration requirements is appropriate and provides clear error messaging.
Scheduling required tests:
- Add support for multiple LLM models including ChatVertexAnthropic
- Implement models.yaml configuration system for model management
- Add model selection UI in chat settings
- Update environment configuration and documentation

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
- ChatSettings.js: Resolve effective model ID before rendering to prevent unbound Select value warnings and keep picker/description in sync
- config.py: Make ModelConfig temperature and extended_thinking_budget optional to avoid clobbering environment/CLI defaults in to_config
- README.md: Replace bold emphasis with proper Markdown headings for better document structure and accessibility
- Extract duplicated model resolution into getResolvedModelId() helper
- Update getSelectedModel() to use the resolution helper
- Rename personaSelect to fullWidthSelect for semantic clarity
- Move AI Model section above AI Persona (more important)

Addresses additional CodeRabbitAI feedback for DRY principle and code clarity.
Actionable comments posted: 4
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
chat/sippy_agent/web_server.py (1)
218-265: Shared agent config mutation is not request‑safe (pre‑existing, now per‑model)
`chat()` mutates `agent.config.show_thinking` and `agent.config.persona` per request, then restores them in `finally`. Because the same `SippyAgent` instance is reused across concurrent HTTP requests for a model, overlapping requests can race on these fields, leading to cross-request leakage of persona or thinking settings (this pattern already existed with the single global agent; `AgentManager` now applies it per model). To make overrides request-local, avoid mutating shared config and instead pass persona/show-thinking as parameters into the graph/LLM layer, or gate access to a given agent behind an `asyncio.Lock` so only one request at a time can use it. The former keeps concurrency, the latter preserves current behavior but serializes usage.
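For the serializing option, a rough sketch; the per-agent `_request_lock` is a hypothetical attribute added here for illustration, not something the PR defines:

```python
import asyncio


class LockedAgent:
    """Gate all use of one shared agent so temporary config overrides cannot interleave."""

    def __init__(self):
        self.config = {"persona": "default", "show_thinking": False}
        self._request_lock = asyncio.Lock()  # hypothetical: one lock per cached agent

    async def achat(self, message: str, persona: str) -> str:
        async with self._request_lock:
            original = self.config["persona"]
            self.config["persona"] = persona  # temporary override, now race-free
            try:
                await asyncio.sleep(0)  # stand-in for the real LLM/graph call
                return f"{self.config['persona']}: {message}"
            finally:
                self.config["persona"] = original


async def main():
    agent = LockedAgent()
    print(await asyncio.gather(agent.achat("a", "concise"), agent.achat("b", "verbose")))


asyncio.run(main())
```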
🧹 Nitpick comments (4)
sippy-ng/src/chat/ChatSettings.js (2)
187-202: Consider memoizing model resolution helpers. The helper functions `getResolvedModelId()` and `getSelectedModel()` are called inline during render (lines 321, 335), causing them to re-execute on every render. For better performance, consider computing the resolved model ID once outside the render path.

Example approach using useMemo:
```diff
+  const resolvedModelId = React.useMemo(() => {
+    if (settings.modelId && models.find((m) => m.id === settings.modelId)) {
+      return settings.modelId
+    }
+    if (defaultModel && models.find((m) => m.id === defaultModel)) {
+      return defaultModel
+    }
+    return models.length > 0 ? models[0].id : ''
+  }, [settings.modelId, defaultModel, models])
+
+  const selectedModel = React.useMemo(() => {
+    return resolvedModelId ? models.find((m) => m.id === resolvedModelId) : null
+  }, [resolvedModelId, models])
```
334-346: Simplify description rendering. The IIFE pattern here is unconventional and adds cognitive overhead. Since `getSelectedModel()` already handles the null case, you can simplify this block.

```diff
-  {(() => {
-    const selectedModel = getSelectedModel()
-    return (
-      selectedModel &&
-      selectedModel.description && (
-        <Box className={classes.personaDescription}>
-          <Typography variant="body2" color="textPrimary">
-            {selectedModel.description}
-          </Typography>
-        </Box>
-      )
-    )
-  })()}
+  {getSelectedModel()?.description && (
+    <Box className={classes.personaDescription}>
+      <Typography variant="body2" color="textPrimary">
+        {getSelectedModel().description}
+      </Typography>
+    </Box>
+  )}
```

Note: If you implement the memoization suggestion from the previous comment, you would use the memoized `selectedModel` variable instead of calling `getSelectedModel()` multiple times.

chat/sippy_agent/web_server.py (1)
17-31: Multi-model wiring and /chat/models endpoint look coherent. The switch to `AgentManager` plus the `/chat/models` endpoint is wired cleanly: models are listed via `AgentManager.list_models()`, wrapped into `ModelInfo`, and the default model ID is exposed via `ModelsResponse`. Using `agent_manager.get_agent()` in `/status` ensures tools are initialized before calling `list_tools()`. One minor design question: `metrics.agent_info` still uses `config.model_name`/`config.llm_endpoint`, which may not reflect the default model once `models.yaml` is in use. Consider either documenting that these metrics refer to the base config only, or updating them to reflect `get_default_model_id()`/its config if that's what you care about operationally.

Also applies to: 88-112, 139-147, 153-163
chat/sippy_agent/agent.py (1)
40-47: Async initialization pattern looks good;list_toolsguard prevents misuseDeferring tools/graph creation into
_initialize()and calling it from bothAgentManager.get_agent()andSippyAgent.achat()gives you lazy startup while keeping callers safe. The new_initializedcheck plus thelist_tools()guard (returning[]when uninitialized) avoids attribute errors on status calls. If you ever expect heavy parallel startup, you may want anasyncio.Lockaround the_initializebody to avoid duplicate tool/graph construction, but that’s an optimization rather than a correctness issue.Also applies to: 48-54, 670-675
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Cache: Disabled due to data retention organization setting
Knowledge base: Disabled due to data retention organization setting
📒 Files selected for processing (16)
- chat/.env.example (1 hunks)
- chat/.gitignore (1 hunks)
- chat/README.md (3 hunks)
- chat/main.py (3 hunks)
- chat/models.yaml.example (1 hunks)
- chat/requirements.txt (1 hunks)
- chat/sippy_agent/agent.py (8 hunks)
- chat/sippy_agent/api_models.py (3 hunks)
- chat/sippy_agent/config.py (5 hunks)
- chat/sippy_agent/web_server.py (10 hunks)
- sippy-ng/src/chat/ChatMessage.js (5 hunks)
- sippy-ng/src/chat/ChatSettings.js (8 hunks)
- sippy-ng/src/chat/store/modelsSlice.js (1 hunks)
- sippy-ng/src/chat/store/settingsSlice.js (1 hunks)
- sippy-ng/src/chat/store/useChatStore.js (3 hunks)
- sippy-ng/src/chat/store/webSocketSlice.js (4 hunks)
🚧 Files skipped from review as they are similar to previous changes (6)
- sippy-ng/src/chat/store/modelsSlice.js
- sippy-ng/src/chat/store/settingsSlice.js
- sippy-ng/src/chat/store/webSocketSlice.js
- sippy-ng/src/chat/ChatMessage.js
- chat/.gitignore
- chat/requirements.txt
🧰 Additional context used
🪛 Ruff (0.14.5)
chat/sippy_agent/agent.py
711-711: f-string without any placeholders
Remove extraneous f prefix
(F541)
🔇 Additional comments (4)
chat/models.yaml.example (1)
34-46: No changes needed. Model version is current and temperature constraint is properly enforced. The claude-sonnet-4-5@20250929 version is correct and current as of November 2025. The temperature constraint concern is mitigated by the agent initialization logic in `chat/sippy_agent/agent.py` (lines 80-86): extended thinking is only enabled when `show_thinking=True` AND `extended_thinking_budget > 0`, and when enabled, temperature is automatically set to 1.0. This prevents runtime errors and ensures the API requirement is satisfied.

chat/sippy_agent/agent.py (3)
176-204: Tool setup change (logging when DB DSN missing) is reasonable. Emitting a warning when `SIPPY_READ_ONLY_DATABASE_DSN` is not configured makes the absence of the DB tool explicit without changing behavior when it is present. This is a nice observability improvement and should help diagnose missing configuration in lower envs.
485-555: Review comment contains incorrect assumptions about thinking content stream format
The original review assumes thinking content arrives as `chunk.content` parts with `{"type": "thinking", "thinking": "..."}` for both Gemini and Claude. This is not accurate per current LangChain documentation:
- Gemini: Streams thinking via parts marked with a `part.thought` boolean flag, not a `type: "thinking"` key with a separate `thinking` field
- Claude-on-Vertex (extended thinking): Uses `reasoning_content` blocks, not `type: "thinking"` with a `thinking` field
part.get("type") == "thinking" and "thinking" in part, which won't match either model's actual format. This logic requires correction to extract thinking content successfully for these models.Likely an incorrect or invalid review comment.
55-116: Concern unfounded: ChatAnthropicVertex supports per-call thinking overridesThe review comment's concern about extended thinking being "fixed for the lifetime of the agent" is not accurate. ChatAnthropicVertex exposes thinking options (thinking_budget, include_thoughts, etc.) that can be passed per-invocation via kwargs/invoke/predict/predict_messages. This means per-request control over extended thinking is available without needing separate agents or LLMs. The thinking configuration set at construction time serves as a default, but individual calls can override these settings, making the suggested workarounds unnecessary.
Likely an incorrect or invalid review comment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
♻️ Duplicate comments (1)
chat/sippy_agent/agent.py (1)
725-737: Fix Ruff F541 on synthetic model description; optional concurrency hardening inget_agentThe synthetic
ModelConfiguses an f-string without interpolation, which Ruff flags as F541. You can resolve this by dropping thefprefix as previously suggested in the earlier review. This also keeps CI green. As this repeats an earlier comment, tagging as duplicate.- synthetic_model = ModelConfig( - id="default", - name=base_config.model_name, - description=f"Model from environment configuration", - model_name=base_config.model_name, - endpoint=base_config.llm_endpoint, - temperature=base_config.temperature, - extended_thinking_budget=base_config.extended_thinking_budget, - default=True, - ) + synthetic_model = ModelConfig( + id="default", + name=base_config.model_name, + description="Model from environment configuration", + model_name=base_config.model_name, + endpoint=base_config.llm_endpoint, + temperature=base_config.temperature, + extended_thinking_budget=base_config.extended_thinking_budget, + default=True, + )Separately,
AgentManager.get_agent()is straightforward and readable, but if you expect high concurrency it might be worth guarding agent creation with a simple async lock or similar so two concurrent callers for the samemodel_iddon’t both pay the initialization cost before the cache is populated. Not required for correctness, just an efficiency/robustness improvement. (As per static analysis hints and past review comments.)Also applies to: 756-793
🧹 Nitpick comments (3)
chat/sippy_agent/agent.py (2)
40-55: Async lazy initialization looks good; consider guardingadd_toolusageThe
_initializedflag and_initialize()give you a clean lazy-init path for tools/graph, andlist_tools()now safely returns[]when uninitialized. The only remaining sharp edge isadd_tool(), which assumesself.toolsis non‑None; if someone calls it before the firstachat()/get_agent(), it will hit aNoneTypeappend. Consider either documenting thatadd_toolrequires a fully initialized agent or adding a defensive check there (e.g., raising a clear error if_initializedisFalse), so misuse fails loudly rather than with an attribute error.Also applies to: 678-691
494-571: Thought streaming logic cleanly separates Gemini vs Claude; small optional guardThe updated
_achat_streaming()correctly distinguishes Gemini’s complete-thought chunks from Claude’s token stream, buffers thoughts, and then injects each as a separatethinking_stepsentry at the front, preserving chronological order. As a micro‑optimization, you could early‑exit the “thinking” parsing paths when boththinking_callbackisNoneandshow_thinkingisFalse, since in that case the accumulatedthought_bufferis never surfaced. Not a blocker, just a small potential perf/clarity win.Also applies to: 661-668
chat/requirements.txt (1)
1-8: I need to verify the LangChain ecosystem's versioning practices and compatibility recommendations to determine if the upper-bound constraint concern is justified.Upper-bound version constraints are optional best practice, not required.
LangChain and LangGraph follow Semantic Versioning principles, with the first stable releases (1.0.0) providing production-ready APIs. These 1.0 releases mark a commitment to stability and no breaking changes until 2.0. The pip dependency resolution test confirmed no conflicts among the specified versions.
While adding upper-bound constraints (e.g.,
langchain>=1.0.0,<2.0.0) would provide ultra-conservative stability and prevent unexpected major-version jumps, they are not required for safe operation. The current lower-bound-only approach is defensible given that:
- With LangChain 1.0, minor releases do not include breaking changes
- All core packages (langchain, langchain-core, langchain-openai, langgraph) are at 1.0+ with semantic versioning guarantees
- Legacy versions (LangChain 0.3 and LangGraph 0.4) are in MAINTENANCE mode until December 2026
Adding upper bounds is a valid optimization for reproducibility and risk-averse deployments but remains optional given the ecosystem's versioning commitments. Consider upper bounds only if: (a) stricter repeatability is required, or (b) you want to enforce explicit major-version control across your team's deployments.
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Cache: Disabled due to data retention organization setting
Knowledge base: Disabled due to data retention organization setting
📒 Files selected for processing (2)
- chat/requirements.txt (1 hunks)
- chat/sippy_agent/agent.py (8 hunks)
🧰 Additional context used
🪛 Ruff (0.14.5)
chat/sippy_agent/agent.py
728-728: f-string without any placeholders
Remove extraneous f prefix
(F541)
🔇 Additional comments (3)
chat/sippy_agent/agent.py (2)
197-203: DB tool warning improves observabilityAdding an explicit warning when
SIPPY_READ_ONLY_DATABASE_DSNis not configured is a nice touch; it makes it much clearer why the DB query tool is missing instead of silently omitting it.
457-459: Ensuring_initialize()runs before handling chats is the right safeguardAwaiting
self._initialize()at the top ofachat()guarantees tools and graph are ready before streaming, and because_initialize()is idempotent, repeated calls are cheap. This aligns well with the new lazy-init pattern.chat/requirements.txt (1)
5-5: Let me check the pyyaml security status to complete the verification:Dependency versions are current, stable, and secure.
PyYAML 6.0.3 is the latest non-vulnerable version, fixing vulnerabilities that existed before version 5.4. Anthropic 0.20.0 has no known vulnerabilities, and the latest stable anthropic release is 0.73.0. PyYAML demonstrates a positive version release cadence with recent releases. All three dependencies use permissive
>=version constraints, allowing automatic updates to the latest secure releases. The specified versions are current as of the PR's timeline and free from known security vulnerabilities.
Scheduling required tests:
/hold I think it's reviewable/testable, but looking at one of the CodeRabbit comments about a race.

/test e2e
smg247 left a comment
/lgtm
/hold cancel

/lgtm
[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: smg247, stbenjam

The full list of commands accepted by this bot can be found here. The pull request process is described here.

Details

Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
Actionable comments posted: 2
🤖 Fix all issues with AI agents
In @chat/sippy_agent/agent.py:
- Around line 764-778: The description string for the synthetic ModelConfig uses
an unnecessary f-string: change description=f"Model from environment
configuration" to a plain string description="Model from environment
configuration" in the synthetic_model creation (see ModelConfig instantiation
assigning to self.models["default"] and variable synthetic_model), leaving all
other fields (name=model_name, endpoint=llm_endpoint, temperature,
extended_thinking_budget, default=True) unchanged.
🧹 Nitpick comments (1)
chat/sippy_agent/config.py (1)
174-232: Robust model configuration loading with comprehensive validation.The
load_models_config()function properly validates:
- Required
modelskey presence- Duplicate model IDs
- Multiple default models
- Empty model list
The fallback to the first model as default when none is specified is sensible.
One consideration: the broad
except Exceptionon line 231 catches all exceptions (including theValueErrorones raised above) and re-wraps them. This works but may double-wrap the message for intentionalValueErrorraises. You may want to catch only non-ValueErrorexceptions:♻️ Optional: preserve ValueError messages
- except Exception as e: - raise ValueError(f"Error loading models configuration: {e}") + except ValueError: + raise # Re-raise validation errors as-is + except Exception as e: + raise ValueError(f"Error loading models configuration: {e}") from e
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Cache: Disabled due to data retention organization setting
Knowledge base: Disabled due to data retention organization setting
📒 Files selected for processing (5)
- chat/README.md
- chat/requirements.txt
- chat/sippy_agent/agent.py
- chat/sippy_agent/config.py
- chat/sippy_agent/web_server.py
🚧 Files skipped from review as they are similar to previous changes (1)
- chat/requirements.txt
🧰 Additional context used
🧬 Code graph analysis (1)
chat/sippy_agent/agent.py (2)
chat/sippy_agent/config.py (6)
Config (45-171), ModelConfig (16-42), load_models_config (174-232), is_claude_model (138-140), is_gemini_model (134-136), to_config (28-42)

chat/sippy_agent/personas.py (1)
get_persona(82-84)
🪛 Ruff (0.14.10)
chat/sippy_agent/agent.py
767-767: f-string without any placeholders
Remove extraneous f prefix
(F541)
🔇 Additional comments (11)
chat/README.md (2)
46-93: Well-structured documentation for Claude/Vertex AI integration. The Claude configuration section clearly covers both authentication options (gcloud auth vs service account), extended thinking feature with budget controls, and important notes about temperature requirements. The troubleshooting guidance for 400 errors is a nice touch.
94-129: Comprehensive multi-model configuration documentation. The models.yaml configuration section clearly documents all available fields, fallback behavior, and startup instructions. This aligns well with the implementation in `config.py`.

chat/sippy_agent/web_server.py (3)
139-146: Clean implementation of the models listing endpoint. The `/chat/models` endpoint properly exposes available models and the default model ID. The use of `ModelInfo` for type safety is good.
220-247: Per-request model handling is well implemented. The HTTP `/chat` endpoint correctly:
- Retrieves the appropriate agent via `agent_manager.get_agent(request.model_id)`
- Determines the actual model_id used (falling back to default)
- Passes per-request `persona` and `show_thinking` as parameters without mutating shared config
- Includes `model_id` in the response
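Put together, the handler flow is roughly the following; the request/response classes here are simplified stand-ins for the repository's actual models, not copies of them:

```python
from typing import Optional

from pydantic import BaseModel


class ChatRequest(BaseModel):
    message: str
    model_id: Optional[str] = None
    persona: Optional[str] = None
    show_thinking: bool = False


class ChatResponse(BaseModel):
    content: str
    model_id: str


async def chat(request: ChatRequest, agent_manager) -> ChatResponse:
    # Resolve the agent for the requested model (or the default one).
    agent = await agent_manager.get_agent(request.model_id)
    effective_model_id = request.model_id or agent_manager.get_default_model_id()
    # Per-request overrides are passed as arguments; shared agent config is untouched.
    content = await agent.achat(
        request.message,
        persona=request.persona,
        show_thinking=request.show_thinking,
    )
    return ChatResponse(content=content, model_id=effective_model_id)
```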
290-310: Good error handling for agent initialization in WebSocket handler. The try/except around `agent_manager.get_agent()` properly handles initialization failures and sends an error message to the client via the WebSocket, allowing the connection to continue for subsequent requests.

chat/sippy_agent/config.py (1)
16-42: Well-designed ModelConfig with proper inheritance pattern. The `to_config()` method correctly inherits base settings and only overrides `temperature` and `extended_thinking_budget` when explicitly set (not None). This prevents unintended overrides with default values.

chat/sippy_agent/agent.py (5)
44-53: Good async initialization pattern. The deferred initialization with `_initialize()` and the `_initialized` flag allows the constructor to remain synchronous while deferring heavy tool loading and graph creation to the first use. This is a clean pattern for async initialization.
78-115: Claude extended thinking configuration is well implemented. The extended thinking setup correctly:
- Enables only when `show_thinking=True` AND `extended_thinking_budget > 0`
- Sets temperature to 1.0 (required by Claude API)
- Sets max_tokens to 64000 (Claude's max output)
- Configures the thinking budget via model_kwargs
The verbose logging provides good visibility into the configuration.
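The gating described above can be read as roughly the following; the returned dictionary is illustrative only and deliberately avoids claiming exact ChatAnthropicVertex parameter names:

```python
def claude_llm_settings(show_thinking: bool, extended_thinking_budget: int,
                        temperature: float) -> dict:
    """Mirror the conditions above: thinking only when enabled AND budget > 0."""
    settings = {"temperature": temperature, "max_tokens": 64000}
    if show_thinking and extended_thinking_budget > 0:
        settings["temperature"] = 1.0  # Claude requires temperature=1.0 with extended thinking
        settings["thinking_budget"] = extended_thinking_budget  # assumed key name
    return settings


print(claude_llm_settings(True, 4096, 0.2))   # thinking on, temperature forced to 1.0
print(claude_llm_settings(False, 4096, 0.2))  # thinking off, caller's temperature kept
```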
526-540: Per-request graph creation addresses the race condition concern. Creating `request_graph = self._create_agent_graph(effective_persona)` per request instead of mutating `self.graph` is the correct approach for thread safety. This ensures each request operates on its own graph instance with the appropriate persona, avoiding the race condition mentioned in the PR comments.
795-832: AgentManager correctly caches agents per model. The `get_agent()` method properly:
- Falls back to default model when model_id is None
- Validates that the requested model exists
- Returns cached agents for efficiency
- Creates and initializes new agents on demand
The async initialization via `await agent._initialize()` ensures tools are loaded before returning.
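Those bullets amount to a small cache keyed by model id; a simplified sketch (method names follow the review, internals are illustrative rather than the actual implementation):

```python
from typing import Optional


class AgentManager:
    def __init__(self, models: dict, default_model_id: str, agent_factory):
        self.models = models                  # model_id -> model config
        self.default_model_id = default_model_id
        self._agent_factory = agent_factory   # callable: model config -> agent
        self._agents = {}                     # model_id -> initialized agent

    def get_default_model_id(self) -> str:
        return self.default_model_id

    async def get_agent(self, model_id: Optional[str] = None):
        model_id = model_id or self.default_model_id           # fall back to default
        if model_id not in self.models:
            raise ValueError(f"Unknown model: {model_id}")      # validate existence
        if model_id not in self._agents:                        # create on demand
            agent = self._agent_factory(self.models[model_id])
            await agent._initialize()                           # load tools before first use
            self._agents[model_id] = agent
        return self._agents[model_id]                           # cached thereafter
```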
726-730: Defensive guard for uninitialized agent. The `list_tools()` method now safely returns an empty list if the agent hasn't been initialized, preventing potential `NoneType` errors.
Scheduling required tests:
@stbenjam: all tests passed! Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.
🤖 Generated with Claude Code
Summary by CodeRabbit
New Features
Documentation
Chores
✏️ Tip: You can customize this high-level summary in your review settings.