Commit 43ea5a4

Merge pull request #129 from redbus-labs/sarvam-ai
Sarvam ai integration
2 parents 5572734 + 8802ce8 commit 43ea5a4

31 files changed

Lines changed: 4498 additions & 68 deletions

contrib/sarvam-ai/CAPABILITIES.md

Lines changed: 221 additions & 0 deletions
@@ -0,0 +1,221 @@
# Sarvam AI - ADK Integration Capabilities

## Overview

The Sarvam AI module integrates Sarvam AI services into the Google Agent Development Kit (ADK) for Java. It spans five service domains (Chat, Speech-to-Text, Text-to-Speech, Vision, and Live Connections) and covers both REST and WebSocket protocols, with token usage tracking, retry-based resilience, structured errors, and multi-turn agentic support.

**Module path:** `contrib/sarvam-ai`
**Package:** `com.google.adk.models.sarvamai`
**Branch:** `sarvam-ai`

---

## 1. Chat Completions (LLM)

**Class:** `SarvamAi` extends `BaseLlm`
**Endpoint:** `POST /v1/chat/completions` (OpenAI-compatible)

| Capability | Details |
|---|---|
| Blocking (non-streaming) | Full request/response cycle via `generateContent(request, false)` |
| SSE Streaming | Real-time token-by-token delivery via `generateContent(request, true)` with backpressure (RxJava `Flowable`) |
| Function / Tool Calling | ADK `FunctionDeclaration` serialized to OpenAI `tools` JSON with `tool_choice: auto` |
| Multi-turn Tool History | Prior `tool_calls` correctly formatted as assistant messages with `tool_call_id`, `function.name`, `function.arguments`; tool responses sent as `role: tool` |
| Streaming Function Calls | Chunked `name` and `arguments` accumulated across SSE deltas, emitted as final `FunctionCall` Part |
| Token Usage Tracking | `prompt_tokens`, `completion_tokens`, `total_tokens` extracted for both blocking and streaming modes. Streaming uses `stream_options: {"include_usage": true}` |
| System Instructions | ADK `GenerateContentConfig.systemInstruction` mapped to OpenAI `system` role message |
| Temperature Control | Forwarded from `GenerateContentConfig.temperature` (default 0.7) |
| Max Output Tokens | `GenerateContentConfig.maxOutputTokens` forwarded as `max_tokens` |
| Top-P Sampling | Configurable via `SarvamAiConfig.topP()` |
| Frequency / Presence Penalty | Configurable via `SarvamAiConfig` builder |
| Reasoning Effort | Sarvam-specific `reasoning_effort` parameter (low / medium / high) |
| Wiki Grounding | Sarvam-specific `wiki_grounding` toggle for factual grounding |
| Role Translation | ADK `model` -> OpenAI `assistant`, `user` -> `user`, `functionResponse` -> `tool` |
| Schema Normalization | Type strings lowercased, nested `items.properties` recursively normalized for OpenAI schema compatibility |
| Graceful Degradation | Empty choices return empty text response instead of crashing |

### Dual Implementation

| Implementation | Location | Use Case |
|---|---|---|
| `SarvamBaseLM` | `core/src/main/java/.../models/SarvamBaseLM.java` | Lightweight, env-var driven. Used by `AgentModelConfig` and `LlmRegistry` for `Sarvam\|model` config strings |
| `SarvamAi` | `contrib/sarvam-ai/src/.../SarvamAi.java` | Full-featured, Builder-pattern, OkHttp-based. Supports all chat parameters plus subservice access |

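
The Role Translation and Schema Normalization rows in the chat table can be illustrated with a minimal sketch. `ChatMappingSketch` and its method names are hypothetical and not taken from the module; the sketch assumes schemas are held as plain `Map`/`List` trees, which may differ from how `SarvamAi` represents them.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

/** Hypothetical helper for illustration only; not a class in contrib/sarvam-ai. */
final class ChatMappingSketch {

  /** Maps an ADK role to the OpenAI-style role listed in the table. */
  static String toOpenAiRole(String adkRole, boolean hasFunctionResponse) {
    if (hasFunctionResponse) {
      return "tool"; // functionResponse parts are sent as role "tool"
    }
    return "model".equals(adkRole) ? "assistant" : "user";
  }

  /** Returns a copy of the schema tree with every string-valued "type" lowercased. */
  static Object normalize(Object node) {
    if (node instanceof Map) {
      Map<String, Object> out = new LinkedHashMap<>();
      for (Map.Entry<?, ?> entry : ((Map<?, ?>) node).entrySet()) {
        String key = String.valueOf(entry.getKey());
        Object value = entry.getValue();
        if ("type".equals(key) && value instanceof String) {
          out.put(key, ((String) value).toLowerCase(Locale.ROOT));
        } else {
          out.put(key, normalize(value)); // recurse into items, properties, etc.
        }
      }
      return out;
    }
    if (node instanceof List) {
      List<Object> out = new ArrayList<>();
      for (Object item : (List<?>) node) {
        out.add(normalize(item));
      }
      return out;
    }
    return node; // scalars pass through unchanged
  }
}
```

Returning a copy rather than mutating in place keeps the caller's ADK-side schema untouched, which matters if the same declaration is reused across providers.
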
---

## 2. Speech-to-Text (STT)

**Class:** `SarvamSttService` implements `TranscriptionService`
**Model:** `saaras:v3`

| Capability | Details |
|---|---|
| REST Synchronous | `transcribe(byte[] audioData, TranscriptionConfig)` via `POST /speech-to-text` with multipart/form-data |
| REST Async | `transcribeAsync()` executes on RxJava IO scheduler |
| WebSocket Streaming | Real-time streaming via `wss://api.sarvam.ai/speech-to-text/streaming` with VAD (Voice Activity Detection) signals |
| Transcription Modes | `transcribe`, `translate`, `verbatim`, `translit`, `codemix` |
| Language Detection | Auto-detection supported; explicit BCP-47 codes (e.g., `hi-IN`, `en-IN`) also accepted |
| VAD Signals | `speech_start` and `speech_end` events for voice activity boundaries |
| ADK TranscriptionService | Full implementation of ADK's `TranscriptionService` interface including `isAvailable()`, `getServiceType()`, `getHealth()` |

---

## 3. Text-to-Speech (TTS)

**Class:** `SarvamTtsService`
**Model:** `bulbul:v3`

| Capability | Details |
|---|---|
| REST Synchronous | `synthesize(text, languageCode)` returns decoded WAV audio bytes |
| REST Async | `synthesizeAsync()` on IO scheduler |
| WebSocket Streaming | `synthesizeStream()` via `wss://api.sarvam.ai/text-to-speech/streaming` for low-latency progressive audio chunk delivery |
| 30+ Speaker Voices | Configurable via `SarvamAiConfig.ttsSpeaker()` (default: `shubh`) |
| Pace Control | Adjustable speech pace (0.5x to 2.0x) |
| Sample Rate | Configurable output sample rate |
| Base64 Decoding | Audio chunks automatically decoded from base64 to raw bytes |
| WebSocket Lifecycle | Config frame -> text frame -> flush frame -> audio chunks -> final event -> close |

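
The Base64 Decoding row can be sketched as follows. This is an illustration under assumptions: `TtsChunkSketch` is a hypothetical name, and the chunks are assumed to arrive as standard (non-URL-safe) base64 strings in playback order; how `SarvamTtsService` actually receives and buffers them is not shown in this document.

```java
import java.io.ByteArrayOutputStream;
import java.util.Base64;
import java.util.List;

/** Hypothetical helper for illustration only. */
final class TtsChunkSketch {

  /** Decodes each base64 chunk and concatenates the raw bytes in arrival order. */
  static byte[] decodeChunks(List<String> base64Chunks) {
    ByteArrayOutputStream audio = new ByteArrayOutputStream();
    Base64.Decoder decoder = Base64.getDecoder();
    for (String chunk : base64Chunks) {
      byte[] raw = decoder.decode(chunk);
      audio.write(raw, 0, raw.length);
    }
    return audio.toByteArray();
  }
}
```

Each chunk is decoded on its own rather than after string concatenation, because a chunk may carry its own `=` padding.
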
---

## 4. Vision / Document Intelligence

**Class:** `SarvamVisionService`
**Model:** Sarvam Vision 3B VLM

| Capability | Details |
|---|---|
| Multi-Language OCR | 23 languages (22 Indian + English) |
| Input Formats | PDF, PNG, JPG, ZIP |
| Output Formats | HTML or Markdown |
| Async Job Pipeline | `createJob` -> `uploadDocument` (presigned URL) -> `startJob` -> `getJobStatus` (poll) -> `downloadResults` |
| Convenience Method | `processDocument(filePath, languageCode, outputFormat)` runs the full pipeline with adaptive exponential backoff polling |
| Polling Backoff | Starts at 2s, doubles up to 10s cap, max 60 polls (~2 min timeout) |

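
The Polling Backoff row can be read as the schedule below. This is a sketch under one interpretation (a delay before every poll, doubling after each one); `PollingBackoffSketch` is a hypothetical name. Under this reading the worst-case total wait is 584 seconds, closer to ten minutes than the "~2 min" in the table, so the module may count polls or apply delays differently.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper for illustration only. */
final class PollingBackoffSketch {
  static final long INITIAL_DELAY_MS = 2_000;
  static final long MAX_DELAY_MS = 10_000;
  static final int MAX_POLLS = 60;

  /** Delay before each poll: 2s, 4s, 8s, then 10s for every remaining poll. */
  static List<Long> schedule() {
    List<Long> delays = new ArrayList<>();
    long delay = INITIAL_DELAY_MS;
    for (int poll = 0; poll < MAX_POLLS; poll++) {
      delays.add(delay);
      delay = Math.min(delay * 2, MAX_DELAY_MS);
    }
    return delays;
  }

  /** Worst-case total wait if the job never completes. */
  static long totalWaitMs() {
    long total = 0;
    for (long delay : schedule()) {
      total += delay;
    }
    return total;
  }
}
```
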
---

## 5. Live Bidirectional Connection

**Class:** `SarvamAiLlmConnection` implements `BaseLlmConnection`

| Capability | Details |
|---|---|
| Multi-Turn Context | Maintains conversation history across turns, accumulates full model responses |
| sendHistory | Replace full conversation context |
| sendContent | Append a single turn and trigger streaming response |
| receive | Returns `Flowable<LlmResponse>` via `PublishSubject` for reactive consumers |
| Thread Safety | History list synchronized for concurrent access |
| Realtime Guard | `sendRealtime(Blob)` throws `UnsupportedOperationException` with guidance to use STT/TTS services |

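
The history semantics in the table (replace on `sendHistory`, append on `sendContent`, synchronized access) can be sketched with a small holder class. `ConversationHistorySketch` is hypothetical and uses `String` turns for brevity; the real connection works with ADK content objects and RxJava, which are omitted here to keep the example self-contained.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper for illustration only; turns are plain strings here. */
final class ConversationHistorySketch {
  private final List<String> turns = new ArrayList<>();

  /** Mirrors sendHistory: replaces the full conversation context. */
  synchronized void replaceAll(List<String> newTurns) {
    turns.clear();
    turns.addAll(newTurns);
  }

  /** Mirrors sendContent: appends a single turn. */
  synchronized void append(String turn) {
    turns.add(turn);
  }

  /** Returns a copy so callers can build a request without holding the lock. */
  synchronized List<String> snapshot() {
    return new ArrayList<>(turns);
  }
}
```
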
---

## 6. Resilience & Configuration

### Retry with Exponential Backoff

**Class:** `SarvamRetryInterceptor` (OkHttp `Interceptor`)

| Parameter | Value |
|---|---|
| Retryable codes | 429 (rate limit) and any 5xx server error, including 503 |
| Base delay | 500ms |
| Max delay | 30s |
| Strategy | Exponential backoff with 20% jitter |
| Default max retries | 3 |

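
The retry parameters above can be sketched as a delay function. This is an assumption-laden illustration: `RetryDelaySketch` is a hypothetical name, "20% jitter" is read as a symmetric adjustment of up to plus or minus 20%, and the cap is applied after jitter. `SarvamRetryInterceptor` may differ on each of these points.

```java
/** Hypothetical helper for illustration only. */
final class RetryDelaySketch {
  static final long BASE_DELAY_MS = 500;
  static final long MAX_DELAY_MS = 30_000;
  static final double JITTER_FRACTION = 0.2;

  /**
   * Computes the delay before retry number {@code attempt} (0-based).
   *
   * @param jitterUnit a value in [-1, 1]; pass a random sample in real use
   */
  static long delayMillis(int attempt, double jitterUnit) {
    int shift = Math.min(Math.max(attempt, 0), 20); // bound the shift to avoid overflow
    long exponential = BASE_DELAY_MS << shift;      // 500, 1000, 2000, ...
    double jittered = exponential * (1.0 + JITTER_FRACTION * jitterUnit);
    return Math.min(MAX_DELAY_MS, Math.round(jittered));
  }

  /** Retryable statuses per the table: 429 and any 5xx. */
  static boolean isRetryableStatus(int status) {
    return status == 429 || (status >= 500 && status <= 599);
  }
}
```

In real use the jitter input could come from `ThreadLocalRandom.current().nextDouble(-1.0, 1.0)`; taking it as a parameter keeps the function deterministic and easy to test.
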
### Immutable Configuration

**Class:** `SarvamAiConfig` (Builder pattern)

| Parameter | Default |
|---|---|
| Chat endpoint | `https://api.sarvam.ai/v1/chat/completions` |
| STT endpoint | `https://api.sarvam.ai/speech-to-text` |
| STT WebSocket | `wss://api.sarvam.ai/speech-to-text/streaming` |
| TTS endpoint | `https://api.sarvam.ai/text-to-speech` |
| TTS WebSocket | `wss://api.sarvam.ai/text-to-speech/streaming` |
| Vision endpoint | `https://api.sarvam.ai/document-intelligence` |
| Connect timeout | 30s |
| Read timeout | 120s |
| Max retries | 3 |
| API key resolution | Explicit value > `SARVAM_API_KEY` env var |

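
The API key resolution order (explicit value first, then `SARVAM_API_KEY`) can be sketched as below. `ApiKeySketch` is a hypothetical name, and treating blank strings as missing is an assumption; the environment is passed in as a map so the logic can be exercised without touching real environment variables.

```java
import java.util.Map;
import java.util.Optional;

/** Hypothetical helper for illustration only. */
final class ApiKeySketch {
  static final String ENV_VAR = "SARVAM_API_KEY";

  /** Prefers the explicit key; falls back to the environment; empty if neither is set. */
  static Optional<String> resolve(String explicitKey, Map<String, String> env) {
    if (explicitKey != null && !explicitKey.isBlank()) {
      return Optional.of(explicitKey);
    }
    String fromEnv = env.get(ENV_VAR);
    if (fromEnv != null && !fromEnv.isBlank()) {
      return Optional.of(fromEnv);
    }
    return Optional.empty();
  }
}
```

A caller would typically pass `System.getenv()` as the second argument.
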
### Structured Error Handling

**Class:** `SarvamAiException` extends `RuntimeException`

| Field | Purpose |
|---|---|
| `statusCode` | HTTP status code from API |
| `errorCode` | Sarvam-specific error code |
| `requestId` | Sarvam request ID for support tracing |
| `isRetryable()` | Programmatic check (429 or any 5xx, including 503) |

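
An exception carrying these fields might look like the sketch below. `SarvamErrorSketch` is a hypothetical stand-in, not the module's `SarvamAiException`; the constructor shape and accessor names are assumptions.

```java
/** Hypothetical stand-in for illustration; not the module's SarvamAiException. */
final class SarvamErrorSketch extends RuntimeException {
  private static final long serialVersionUID = 1L;

  private final int statusCode;
  private final String errorCode;
  private final String requestId;

  SarvamErrorSketch(String message, int statusCode, String errorCode, String requestId) {
    super(message);
    this.statusCode = statusCode;
    this.errorCode = errorCode;
    this.requestId = requestId;
  }

  int statusCode() {
    return statusCode;
  }

  String errorCode() {
    return errorCode;
  }

  String requestId() {
    return requestId;
  }

  /** True for rate limiting (429) and server-side failures (5xx). */
  boolean isRetryable() {
    return statusCode == 429 || (statusCode >= 500 && statusCode <= 599);
  }
}
```
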
---

## 7. Authentication

| Method | Header or Mechanism | Used By |
|---|---|---|
| API Subscription Key | `api-subscription-key: <key>` | `SarvamAi`, STT, TTS, Vision (contrib module) |
| Bearer Token | `Authorization: Bearer <key>` | `SarvamBaseLM` (core module, OpenAI-compatible) |
| Key Resolution | `SARVAM_API_KEY` env var or explicit via Builder | Both |
| Early Key Validation | Warning logged at construction if key is missing | `SarvamBaseLM` |

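
The two header styles can be compared side by side. The sketch below uses the JDK's `java.net.http.HttpRequest` only to stay dependency-free; the module itself is described as OkHttp-based, and `AuthHeaderSketch` and its method names are hypothetical.

```java
import java.net.URI;
import java.net.http.HttpRequest;

/** Hypothetical helper for illustration only. */
final class AuthHeaderSketch {

  /** Contrib-module style: subscription key in a dedicated header. */
  static HttpRequest contribStyle(String url, String apiKey, String jsonBody) {
    return HttpRequest.newBuilder(URI.create(url))
        .header("api-subscription-key", apiKey)
        .header("Content-Type", "application/json")
        .POST(HttpRequest.BodyPublishers.ofString(jsonBody))
        .build();
  }

  /** Core-module style: OpenAI-compatible bearer token. */
  static HttpRequest coreStyle(String url, String apiKey, String jsonBody) {
    return HttpRequest.newBuilder(URI.create(url))
        .header("Authorization", "Bearer " + apiKey)
        .header("Content-Type", "application/json")
        .POST(HttpRequest.BodyPublishers.ofString(jsonBody))
        .build();
  }
}
```
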
---

## 8. Test Coverage

| Test Class | Tests | Scope |
|---|---|---|
| `SarvamBaseLMTest` | 10 | Response parsing (text, null, tool calls), construction, connection type |
| `SarvamAiTest` | - | Chat completion blocking and streaming |
| `SarvamAiConfigTest` | - | Config builder validation, defaults, env var resolution |
| `ChatRequestTest` | - | Request serialization from LlmRequest |
| `SarvamSttServiceTest` | - | STT REST and WebSocket transcription |
| `SarvamTtsServiceTest` | - | TTS REST and WebSocket synthesis |
| `SarvamRetryInterceptorTest` | - | Retry logic, delay calculation, jitter |
| `SarvamIntegrationTest` (rae) | 20 | End-to-end config wiring across properties, YAML, LlmRegistry |

---

## 9. RAE Integration (Consumer Project)

| Integration Point | Mechanism | File |
|---|---|---|
| Code-based agents | `AgentModelConfig` recognizes `Sarvam\|` prefix, instantiates `SarvamBaseLM` | `AgentModelConfig.java` |
| YAML-based agents | `LlmRegistry.registerLlm("Sarvam\\|.*", ...)` factory | `ApplicationRegistry.java` |
| Model metadata | `sarvam:` provider in `models.yaml` with feature declarations | `models.yaml` |
| Config format | `Sarvam\|sarvam-m` -- single string works across both paths | `agent-models.properties` + `*.yaml` |
| Global coverage | 43 code-based + 28 YAML agent configs switched to Sarvam | All agent config files |

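
The `Sarvam|<model>` config string and the `Sarvam\\|.*` registry pattern can be sketched as a small parser. `ModelStringSketch` is a hypothetical name; how `AgentModelConfig` and `LlmRegistry` actually split the string is not shown in this document.

```java
import java.util.Optional;
import java.util.regex.Pattern;

/** Hypothetical helper for illustration only. */
final class ModelStringSketch {
  private static final String PREFIX = "Sarvam|";
  // Same regex the table shows for LlmRegistry registration.
  private static final Pattern SARVAM_PATTERN = Pattern.compile("Sarvam\\|.*");

  /** Returns the model name after the prefix, or empty if the string is not a Sarvam config. */
  static Optional<String> sarvamModelName(String config) {
    if (config == null || !SARVAM_PATTERN.matcher(config).matches()) {
      return Optional.empty();
    }
    return Optional.of(config.substring(PREFIX.length()));
  }
}
```
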
---

## Architecture Summary

```
contrib/sarvam-ai/
  src/main/java/com/google/adk/models/sarvamai/
    SarvamAi.java                  # BaseLlm (chat, Builder pattern, OkHttp)
    SarvamAiConfig.java            # Immutable config for all services
    SarvamAiException.java         # Structured error with status/code/requestId
    SarvamAiLlmConnection.java     # Live bidirectional multi-turn connection
    SarvamRetryInterceptor.java    # Exponential backoff with jitter
    chat/
      ChatRequest.java             # OpenAI-compatible request model
      ChatResponse.java            # Response deserialization
      ChatChoice.java              # Choice wrapper
      ChatMessage.java             # Message model
      ChatUsage.java               # Token usage tracking
    stt/
      SarvamSttService.java        # REST + WebSocket STT (TranscriptionService)
    tts/
      SarvamTtsService.java        # REST + WebSocket TTS
      TtsRequest.java              # TTS request model
      TtsResponse.java             # TTS response model
    vision/
      SarvamVisionService.java     # Async job pipeline for document OCR

core/src/main/java/com/google/adk/models/
  SarvamBaseLM.java                # Lightweight BaseLlm for agent config integration
```

contrib/sarvam-ai/pom.xml

Lines changed: 133 additions & 0 deletions
@@ -0,0 +1,133 @@
<?xml version="1.0" encoding="UTF-8"?>
<!--
  Copyright 2025 Google LLC

  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
  You may obtain a copy of the License at

      http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License.
-->
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
  <modelVersion>4.0.0</modelVersion>

  <parent>
    <groupId>com.google.adk</groupId>
    <artifactId>google-adk-parent</artifactId>
    <version>0.5.1-SNAPSHOT</version><!-- {x-version-update:google-adk:current} -->
    <relativePath>../../pom.xml</relativePath>
  </parent>

  <artifactId>google-adk-sarvam-ai</artifactId>
  <name>Agent Development Kit - Sarvam AI</name>
  <description>Sarvam AI integration for the Agent Development Kit.</description>

  <dependencies>
    <!-- Main dependencies -->
    <dependency>
      <groupId>com.google.adk</groupId>
      <artifactId>google-adk</artifactId>
      <version>${project.version}</version>
    </dependency>
    <dependency>
      <groupId>com.google.adk</groupId>
      <artifactId>google-adk-dev</artifactId>
      <version>${project.version}</version>
    </dependency>
    <dependency>
      <groupId>com.squareup.okhttp3</groupId>
      <artifactId>okhttp</artifactId>
      <version>${okhttp.version}</version>
    </dependency>
    <dependency>
      <groupId>com.google.guava</groupId>
      <artifactId>guava</artifactId>
    </dependency>
    <dependency>
      <groupId>com.google.errorprone</groupId>
      <artifactId>error_prone_annotations</artifactId>
    </dependency>

    <!-- Test dependencies -->
    <dependency>
      <groupId>org.junit.jupiter</groupId>
      <artifactId>junit-jupiter-api</artifactId>
      <scope>test</scope>
    </dependency>
    <dependency>
      <groupId>org.junit.jupiter</groupId>
      <artifactId>junit-jupiter-params</artifactId>
      <scope>test</scope>
    </dependency>
    <dependency>
      <groupId>org.junit.jupiter</groupId>
      <artifactId>junit-jupiter-engine</artifactId>
      <scope>test</scope>
    </dependency>
    <dependency>
      <groupId>com.google.truth</groupId>
      <artifactId>truth</artifactId>
      <scope>test</scope>
    </dependency>
    <dependency>
      <groupId>org.assertj</groupId>
      <artifactId>assertj-core</artifactId>
      <scope>test</scope>
    </dependency>
    <dependency>
      <groupId>org.mockito</groupId>
      <artifactId>mockito-junit-jupiter</artifactId>
      <version>${mockito.version}</version>
      <scope>test</scope>
    </dependency>
    <dependency>
      <groupId>com.squareup.okhttp3</groupId>
      <artifactId>mockwebserver</artifactId>
      <version>${okhttp.version}</version>
      <scope>test</scope>
    </dependency>
  </dependencies>
  <build>
    <plugins>
      <plugin>
        <artifactId>maven-surefire-plugin</artifactId>
        <version>3.5.2</version>
        <dependencies>
          <dependency>
            <groupId>me.fabriciorby</groupId>
            <artifactId>maven-surefire-junit5-tree-reporter</artifactId>
            <version>0.1.0</version>
          </dependency>
          <!-- Explicitly add JUnit Jupiter Engine for Surefire's classpath -->
          <dependency>
            <groupId>org.junit.jupiter</groupId>
            <artifactId>junit-jupiter-engine</artifactId>
            <version>${junit.version}</version>
          </dependency>
          <!-- Explicitly add Mockito JUnit Jupiter for Surefire's classpath -->
          <dependency>
            <groupId>org.mockito</groupId>
            <artifactId>mockito-junit-jupiter</artifactId>
            <version>${mockito.version}</version>
          </dependency>
        </dependencies>
        <configuration>
          <reportFormat>plain</reportFormat>
          <statelessTestsetInfoReporter
              implementation="org.apache.maven.plugin.surefire.extensions.junit5.JUnit5StatelessTestsetInfoTreeReporter" />
          <includes>
            <include>**/*Test.java</include>
          </includes>
          <!-- Explicitly define test source directory -->
          <testSourceDirectory>${project.basedir}/src/test/java</testSourceDirectory>
        </configuration>
      </plugin>
    </plugins>
  </build>
</project>
