feat(providers): add Google Vertex AI inference provider#1568
Draft
maxamillion wants to merge 15 commits into
Add Google Vertex AI as an inference provider with support for:
- Anthropic Claude models via the native rawPredict endpoint
- All other models (Gemini, Llama, Mistral, etc.) via the OpenAI-compatible endpoint
- Credential bootstrapping from gcloud Application Default Credentials via --from-gcloud-adc
- Automatic publisher inference from model ID prefixes
- VERTEX_AI_BASE_URL escape hatch for custom deployments
New proto fields: model_in_path (bool) and request_path_override (optional string) on ResolvedRoute, enabling per-route URL construction and body injection.
Adds anthropic-version header injection for rawPredict/streamRawPredict requests.
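The publisher inference and endpoint split described in this commit could look roughly like the sketch below. It is not the PR's infer_vertex_publisher: the function names, the Endpoint enum, the prefix table, and the publisher strings other than "anthropic" are all assumptions made for illustration.

```rust
/// Hypothetical sketch: map a model ID to a Vertex AI publisher by prefix.
/// The prefixes and publisher names in this table are illustrative
/// assumptions, not the list used by the PR.
fn infer_publisher(model_id: &str) -> Option<&'static str> {
    // (prefix, publisher) pairs, checked in order.
    const PREFIXES: &[(&str, &str)] = &[
        ("claude-", "anthropic"),
        ("gemini-", "google"),
        ("llama", "meta"),
        ("mistral", "mistralai"),
    ];
    let id = model_id.trim().to_ascii_lowercase();
    PREFIXES
        .iter()
        .find(|(prefix, _)| id.starts_with(*prefix))
        .map(|(_, publisher)| *publisher)
}

/// Which endpoint family a route should use (hypothetical names).
#[derive(Debug, PartialEq, Eq)]
enum Endpoint {
    AnthropicRawPredict,
    OpenAiCompatible,
}

/// Only the "anthropic" publisher selects the native endpoint; every other
/// model, including unrecognized ones, uses the OpenAI-compatible endpoint.
fn endpoint_for(model_id: &str) -> Endpoint {
    match infer_publisher(model_id) {
        Some("anthropic") => Endpoint::AnthropicRawPredict,
        _ => Endpoint::OpenAiCompatible,
    }
}
```

Treating an unknown prefix as OpenAI-compatible mirrors the "all other models" wording above; a stricter design could reject unknown models instead.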
Remove nine non-Vertex AI provider YAML profiles (anthropic, claude, codex, copilot, google-drive, gitlab, openai, opencode, outlook) that were bundled into the Vertex AI feature commit. These profiles are additive catalog expansion for pre-existing provider plugins and will land in a separate branch.
Restore BUILT_IN_PROFILE_YAMLS to the original three entries plus the new google-vertex-ai.yaml.
Revert test assertions that were adjusted solely because the catalog entries changed:
- credential_env_vars_are_deduplicated_in_profile_order: back to claude-code
- list_provider_profiles_returns_built_in_profile_categories: back to 4-entry list
- Replace fragile body injection heuristic (substring match on request_path_override) with semantic check on model_in_path and anthropic_messages protocol; add two negative tests confirming non-Vertex Anthropic and Vertex Gemini routes do not inject
- Replace silent let _ = rollback in provider_create with an eprintln warning that includes manual deletion instructions
- Improve infer_vertex_publisher doc comment: clarify only 'anthropic' result is consumed by routing logic today
- Add tracing::warn! in resolve_vertex_ai_route when VERTEX_AI_BASE_URL escape hatch is used with an Anthropic model
- Fix clippy doc_markdown lints in new test doc comments
johntmyers
reviewed
May 26, 2026
pub struct VertexAiProvider;
fn discover_with_context(ctx: &dyn DiscoveryContext) -> Option<DiscoveredProvider> {
Collaborator
This is specific to v1 providers. Going forward, I'd like to support only v2 providers, which rely on a provider profile and the discovery capabilities we already have for v2 profiles. Is there a specific need to support the "legacy" providers?
johntmyers
reviewed
May 26, 2026
Comment on lines +4 to +7
title: "Google Vertex AI"
sidebar-title: "Google Vertex AI"
description: "Configure OpenShell to route inference traffic through Google Vertex AI, including Anthropic Claude and Gemini models."
keywords: "Generative AI, Cybersecurity, AI Agents, Sandboxing, Google Vertex AI, Anthropic Claude, Inference Routing"
Collaborator
Could this be under tutorials? We already have one there: https://docs.nvidia.com/openshell/latest/get-started/tutorials/microsoft-graph-provider-refresh
Align Vertex routing, discovery, and credential refresh behavior with the documented setup, and harden git sync helpers against hook-time Git environment leakage so pre-commit stays reliable in worktrees. Co-authored-by: Cursor <cursoragent@cursor.com>
- Add runtime mutual exclusivity guard for --from-gcloud-adc in
provider_create; clap enforces this at parse time but the guard was
missing for programmatic callers
- Replace fragile unwrap_or_default + if-chain in read_gcloud_adc with
an explicit match on Option<&str>, distinguishing service_account,
authorized_user, unknown type, and missing type field; corrects the
error message for service account ADC to reference the actual workflow
- Normalize leading slash in build_provider_url (Some(override), false)
arm; prevents silent URL corruption when override_path lacks a leading
slash; hoist trim_end_matches('/') before the match to remove 3x
duplication
- Populate model in non-Vertex ResolvedProviderRoute at construction
instead of leaving it empty; removes the clone-and-patch in
verify_provider_endpoint
- Replace allow_fallback: bool in find_provider_api_key with a
CredentialLookup enum (PreferredOnly / PreferredThenAny) so the call
site is self-documenting and the negation logic is gone
- Add tests: build_provider_url_override_path_normalizes_missing_leading_slash,
vertex_ai_body_preserves_client_anthropic_version,
resolve_vertex_ai_route_google_prefixed_base_url_override,
resolve_vertex_ai_route_base_url_priority_google_wins
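The leading-slash normalization from this commit can be illustrated with a small helper. This is a minimal sketch, assuming a two-argument join; join_base_and_override is a hypothetical name, and the PR's build_provider_url takes more inputs and has more match arms than shown here.

```rust
/// Join a base URL and an override path with exactly one '/' between them.
/// Hypothetical helper for illustration only.
fn join_base_and_override(base_url: &str, override_path: &str) -> String {
    // Trim the base once, up front, rather than in every match arm.
    let base = base_url.trim_end_matches('/');
    // Accept override paths with or without a leading slash.
    let path = override_path.trim_start_matches('/');
    format!("{}/{}", base, path)
}
```

Without the trim on override_path, a base of https://example.invalid/v1 and an override of models/m would concatenate into https://example.invalid/v1models/m, which is the silent corruption the commit message refers to.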
Support global and multi-region Vertex hosts and reject Anthropic base URL overrides so inference routing keeps the correct request shaping, headers, and operator guidance. Signed-off-by: Adam Miller <admiller@redhat.com> Co-authored-by: Cursor <cursoragent@cursor.com>
S-1: add validate_gcp_project_id/validate_gcp_region in openshell-server
to reject malformed project IDs and regions before URL interpolation.
DRY-1: introduce normalize_inference_provider_type in openshell-core as
the single source of truth for inference provider alias resolution.
profile_for delegates to it; normalize_provider_type in openshell-providers
delegates inference cases to core, eliminating the duplicated alias list.
DRY-2: extract VERTEX_AI_CREDENTIAL_KEY_NAMES const into openshell-core.
VERTEX_AI_PROFILE and vertex.rs (discover_with_context + credential_env_vars)
all reference the same constant instead of three independent copies.
R-1: change read_gcloud_adc return type from anyhow::Result to miette::Result
for consistency with the rest of the CLI crate. Remove anyhow dependency
from openshell-cli entirely.
R-2: replace the unreachable unwrap_or_else fallback in body mutation with
expect(); refactor match to map_or to satisfy clippy::option_if_let_else.
C-1: route_headers_for_route now extends profile passthrough headers rather
than replacing them, preserving any future profile-level entries.
C-4: filter empty GOOGLE_APPLICATION_CREDENTIALS before using it as a path,
falling through to the default ADC location instead of PathBuf::from("").
C-5: document the JSON-body invariant above the body mutation block.
C-6: debug_assert in build_provider_url (Some(suffix), true) arm that suffix
does not start with '/', guarding against future misuse of the API.
DRY-3: extract VERTEX_AI_PROVIDER_TYPE const in run.rs; replace 4 hardcoded
string comparisons with the constant.
T-2: add resolve_vertex_ai_route_whitespace_only_project_fails test.
T-3: add read_gcloud_adc_malformed_json_errors test.
T-4: add upsert_cluster_inference_route_vertex_ai_anthropic_sets_model_in_path
test verifying model_in_path=true and request_path_override persistence.
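For S-1, a validator along the following lines would reject malformed project IDs before they are interpolated into a URL. This is a sketch under assumptions: it encodes the commonly documented Google Cloud project ID format (6 to 30 characters; lowercase letters, digits, and hyphens; starts with a letter; does not end with a hyphen), and the PR's validate_gcp_project_id may apply different rules or error types.

```rust
/// Hypothetical sketch of a project ID check; not the PR's implementation.
fn validate_project_id(id: &str) -> Result<(), String> {
    let bytes = id.as_bytes();
    // Length check first, so the indexing below cannot panic.
    if !(6..=30).contains(&bytes.len()) {
        return Err(format!(
            "project ID must be 6-30 characters, got {}",
            bytes.len()
        ));
    }
    if !bytes[0].is_ascii_lowercase() {
        return Err("project ID must start with a lowercase letter".to_string());
    }
    if bytes[bytes.len() - 1] == b'-' {
        return Err("project ID must not end with a hyphen".to_string());
    }
    // Rejecting '/', '.', '?', '#', and whitespace keeps the value from
    // changing the shape of a URL it is interpolated into.
    let allowed = |b: &u8| b.is_ascii_lowercase() || b.is_ascii_digit() || *b == b'-';
    if !bytes.iter().all(allowed) {
        return Err(
            "project ID may contain only lowercase letters, digits, and hyphens".to_string(),
        );
    }
    Ok(())
}
```

An allow-list like this is easier to reason about than trying to enumerate dangerous characters, at the cost of rejecting any legacy ID formats that fall outside it.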
Reject unsafe Vertex base URL overrides, canonicalize provider aliases, and drop Anthropic model discovery until the route contract is correct. Tighten ADC flag validation and align docs with the supported behavior. Co-authored-by: Cursor <cursoragent@cursor.com>
The vertex-provider branch diverged from main on three unrelated PRs:
- PR NVIDIA#1526: OCSF builder macro and shared driver helpers (reverted here to match main's macro-based approach)
- PR NVIDIA#1547: Python SDK FileNotFoundError -> SandboxError translation (restores user-friendly error messages for missing gateway files)
- PR NVIDIA#1539: bash 3.2-compatible read loop in helm-k3s-local.sh (restores mapfile -> while IFS= read for macOS compat)
Avoid unnecessary String allocation in passthrough header comparison, validate model IDs for all Vertex routes (not just Anthropic) as defense-in-depth, document :rawPredict forward-compat in is_vertex_anthropic_rawpredict_route, and collapse duplicate IPv4/IPv6 match arms in validate_vertex_base_url.
Keep Vertex Claude requests on the rawPredict contract and only upgrade streaming calls to streamRawPredict. Mint the initial ADC-backed access token during provider creation so successful Vertex bootstrap yields an immediately usable provider. Co-authored-by: Cursor <cursoragent@cursor.com>
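The rawPredict versus streamRawPredict choice described here reduces to a single switch on whether the request is streaming. The sketch below assumes the caller has already determined the streaming flag; the function names are hypothetical and the PR's actual route construction is not shown in this conversation.

```rust
/// Pick the Vertex AI method for an Anthropic model: non-streaming calls
/// stay on rawPredict, streaming calls are upgraded to streamRawPredict.
fn vertex_anthropic_method(stream: bool) -> &'static str {
    if stream {
        "streamRawPredict"
    } else {
        "rawPredict"
    }
}

/// Build the "MODEL:method" path segment (hypothetical helper).
fn model_path_segment(model_id: &str, stream: bool) -> String {
    format!("{}:{}", model_id, vertex_anthropic_method(stream))
}
```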
The VertexAiProvider plugin (discover_with_context, credential_env_vars) was the only V1-specific code added by this branch. Vertex AI discovery now relies entirely on V2 profile-based discovery:
- Credentials are scanned by discover_from_profile() via the google-vertex-ai.yaml profile's discovery.credentials list.
- Config keys (project ID, region, base URL, publisher) are scanned directly from VERTEX_AI_CONFIG_KEY_NAMES in the V2 path of discover_existing_provider_data().
--from-gcloud-adc and --credential flows are unaffected; they never used the plugin. --from-existing now requires providers_v2_enabled=true on the gateway, which is the correct V2-only posture.
Remove the V1 registry fallback test for Vertex and update the config-only credential error test to run with V2 enabled.
When a google-vertex-ai provider is attached to a sandbox, resolve_provider_environment now derives agent-specific environment variables from the provider's config and injects them alongside the credential env vars.
Static flags (always present):
- CLAUDE_CODE_USE_VERTEX=1
- GOOSE_PROVIDER=gcp_vertex_ai
Derived from VERTEX_AI_PROJECT_ID (when set):
- ANTHROPIC_VERTEX_PROJECT_ID
- GCP_PROJECT_ID
- GOOGLE_CLOUD_PROJECT
Derived from VERTEX_AI_REGION (when set):
- CLOUD_ML_REGION
- GCP_LOCATION
- VERTEX_LOCATION
Injected values use entry().or_insert() so explicit credentials take precedence. Sandbox --env overrides are applied at the process level after environment installation, so they naturally shadow these values. Non-Vertex providers are unaffected.
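The precedence rule above follows directly from how entry().or_insert() behaves on a map: a key that is already present is left untouched. A minimal sketch, assuming both the environment and the provider config are plain string maps (the PR's actual types and the signature of resolve_provider_environment are not shown here, and inject_vertex_env is a hypothetical name):

```rust
use std::collections::HashMap;

const PROJECT_VARS: [&str; 3] = [
    "ANTHROPIC_VERTEX_PROJECT_ID",
    "GCP_PROJECT_ID",
    "GOOGLE_CLOUD_PROJECT",
];
const REGION_VARS: [&str; 3] = ["CLOUD_ML_REGION", "GCP_LOCATION", "VERTEX_LOCATION"];

/// Add Vertex-specific variables to `env` without overwriting existing keys.
fn inject_vertex_env(env: &mut HashMap<String, String>, config: &HashMap<String, String>) {
    // Static flags are always offered, but an existing value wins.
    let statics = [
        ("CLAUDE_CODE_USE_VERTEX", "1"),
        ("GOOSE_PROVIDER", "gcp_vertex_ai"),
    ];
    for (key, value) in statics {
        env.entry(key.to_string()).or_insert(value.to_string());
    }
    // Derived variables are only offered when the source config key is set.
    let derived = [
        ("VERTEX_AI_PROJECT_ID", PROJECT_VARS),
        ("VERTEX_AI_REGION", REGION_VARS),
    ];
    for (source, targets) in derived {
        if let Some(value) = config.get(source) {
            for target in targets {
                env.entry(target.to_string()).or_insert(value.clone());
            }
        }
    }
}
```

If env already holds GCP_PROJECT_ID from an explicit credential, that value survives, while the other two project variables are filled from VERTEX_AI_PROJECT_ID.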
Blanket-blocking AF_NETLINK prevented getifaddrs(3) from working inside sandboxes. glibc and musl both use socket(AF_NETLINK, SOCK_RAW, NETLINK_ROUTE) internally — there is no /proc fallback. This caused runtimes such as Node.js, Python, and Go to fail with errors like "getifaddrs returned an error" at startup. Replace the unconditional AF_NETLINK domain block with a two-condition seccomp rule that blocks socket() only when arg0==AF_NETLINK AND arg2!=0. This allows NETLINK_ROUTE (protocol 0) while keeping every other netlink protocol (NETLINK_SOCK_DIAG, NETLINK_NETFILTER, NETLINK_AUDIT, NETLINK_GENERIC, etc.) blocked with EPERM. Risk remains low: write operations via NETLINK_ROUTE require CAP_NET_ADMIN which the sandbox does not grant, and the network namespace scopes all reads to sandbox-local interfaces only.
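The two-condition rule can be expressed as a pure predicate over the socket() arguments, which makes the allowed and blocked cases easy to check. This sketch only models the decision; it does not install a seccomp filter, and socket_denied is a hypothetical name. The numeric constants are the Linux values for AF_NETLINK, NETLINK_ROUTE, and EPERM.

```rust
const AF_NETLINK: u64 = 16;
const NETLINK_ROUTE: u64 = 0;
const EPERM: i32 = 1;

/// Model of the rule: deny socket() only when arg0 == AF_NETLINK AND
/// arg2 != NETLINK_ROUTE. Returns the errno to report, or None to allow.
/// arg1 (the socket type) does not take part in the decision.
fn socket_denied(domain: u64, _sock_type: u64, protocol: u64) -> Option<i32> {
    if domain == AF_NETLINK && protocol != NETLINK_ROUTE {
        Some(EPERM)
    } else {
        None
    }
}
```

With this predicate, socket(AF_NETLINK, SOCK_RAW, NETLINK_ROUTE) is allowed, so getifaddrs(3) works, while a request for NETLINK_SOCK_DIAG (protocol 4) or NETLINK_AUDIT (protocol 9) still yields EPERM.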
Vertex AI rawPredict encodes the model in the URL path and rejects a 'model' field in the request body with HTTP 400 'Extra inputs are not permitted'. Anthropic SDK clients (including Claude Code) always include 'model' in the body for the standard Anthropic API format. Remove any client-supplied 'model' key from the body when the route targets a Vertex AI Anthropic rawPredict endpoint, complementing the existing anthropic-beta header stripping fix.
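The body fix amounts to removing one top-level key, and only on the affected route. The sketch below uses a BTreeMap of raw JSON value strings as a stand-in for a parsed JSON object so that it stays within the standard library; the real code presumably works on a JSON value type, and both the Body alias and the function name are hypothetical.

```rust
use std::collections::BTreeMap;

/// Stand-in for a parsed JSON object: top-level key -> raw JSON value text.
type Body = BTreeMap<String, String>;

/// Remove a client-supplied "model" key when the route targets a Vertex AI
/// Anthropic rawPredict endpoint; leave every other route's body untouched.
/// Returns the removed value, if any, so callers can log or inspect it.
fn strip_model_for_rawpredict(body: &mut Body, is_vertex_rawpredict: bool) -> Option<String> {
    if is_vertex_rawpredict {
        body.remove("model")
    } else {
        None
    }
}
```

Gating on the route flag matters because the standard Anthropic API requires the model field, so stripping it unconditionally would break non-Vertex routes.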
Summary
Add Google Vertex AI as a first-class inference provider, supporting both service account (JWT) and gcloud ADC (OAuth2 refresh token) credential flows. Routes Anthropic models through the Vertex AI Anthropic Messages endpoint and Gemini/other models through the OpenAI-compatible endpoint.
Related Issue
Changes
Core provider (3 commits):
feat(providers): add Google Vertex AI inference provider
- providers/google-vertex-ai.yaml: provider profile (two credential definitions: service account key + gcloud ADC)
- crates/openshell-providers/src/providers/vertex.rs: provider discovery from GOOGLE_APPLICATION_CREDENTIALS / GOOGLE_CLOUD_PROJECT / GOOGLE_CLOUD_LOCATION env vars
- crates/openshell-core/src/inference.rs: VERTEX_AI_PROFILE static, AuthHeader::ServiceAccountJwt / OAuth2Token variants, profile_for dispatch
- proto/inference.proto: ResolvedRoute fields 8 (model_in_path bool) and 9 (request_path_override optional string)
- crates/openshell-router/src/config.rs: ResolvedRoute struct additions
- crates/openshell-server/src/inference.rs: resolve_vertex_ai_route with 4-case dispatch (Anthropic model, Gemini model, explicit publisher override, unknown → OpenAI compat), infer_vertex_publisher for 6 model families, full test suite
- crates/openshell-router/src/backend.rs: build_provider_url Vertex AI case: URL construction + body injection of anthropic-beta / anthropic-version for Anthropic Vertex routes
- crates/openshell-cli/src/run.rs: read_gcloud_adc, --from-gcloud-adc flag on provider create
- docs/providers/google-vertex-ai.mdx: user-facing docs for both auth paths
refactor(providers): scope vertex-provider branch to vertex ai only
fix(providers): address vertex-ai code review findings
- Replace fragile body injection heuristic (substring match on request_path_override) with semantic check on model_in_path and anthropic_messages protocol
- Replace silent let _ = rollback in provider_create with an eprintln warning including manual deletion instructions
- Improve infer_vertex_publisher doc comment to clarify only "anthropic" is consumed by routing logic today
- Add tracing::warn! when VERTEX_AI_BASE_URL escape hatch is used with an Anthropic model
- Fix clippy doc_markdown lints in new test doc comments
Testing
- mise run pre-commit passes (lint, format, license headers)
- cargo test -p openshell-router -p openshell-cli -p openshell-server (all pass)
Checklist