## Problem
The zai-sdk currently provides only synchronous clients (`ZaiClient`, `ZhipuAiClient`). When used in an async Python application (e.g. FastAPI, aiohttp), every SDK call blocks the event loop.
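The blocking effect is easy to demonstrate without the SDK at all; the stand-alone sketch below uses `time.sleep(0.2)` as a stand-in for a blocking SDK call, and counts how often a concurrent background task gets to run while the call executes:

```python
# Stand-alone demonstration: a synchronous call inside a coroutine starves
# every other task on the event loop, while asyncio.to_thread keeps the loop
# responsive. time.sleep(0.2) stands in for a blocking zai-sdk call.
import asyncio
import time

async def ticks_during(op) -> int:
    """Count how often a background task runs while `op` executes."""
    count = 0

    async def ticker() -> None:
        nonlocal count
        while True:
            count += 1
            await asyncio.sleep(0.01)

    task = asyncio.create_task(ticker())
    await op()
    task.cancel()
    return count

async def blocking_op() -> None:
    time.sleep(0.2)  # blocks the loop: the ticker never gets scheduled

async def threaded_op() -> None:
    await asyncio.to_thread(time.sleep, 0.2)  # loop stays free

blocked = asyncio.run(ticks_during(blocking_op))
threaded = asyncio.run(ticks_during(threaded_op))
print(blocked, threaded)  # the blocking variant starves the ticker
```

The blocking variant yields essentially zero ticks; the `to_thread` variant lets the ticker run roughly twenty times in the same window.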
### Workarounds required today
Non-streaming calls must be wrapped in `asyncio.to_thread()`:

```python
response = await asyncio.to_thread(
    client.chat.completions.create,
    model="glm-4",
    messages=messages,
)
```

Streaming calls are worse: not only the initial request but also each `__next__()` call on the stream iterator blocks. This requires a background thread plus an `asyncio.Queue` bridge:
```python
stream = await asyncio.to_thread(client.chat.completions.create, ..., stream=True)

queue = asyncio.Queue()
loop = asyncio.get_running_loop()

def _iter_in_thread():
    try:
        for chunk in stream:
            loop.call_soon_threadsafe(queue.put_nowait, chunk)
    except Exception as exc:
        loop.call_soon_threadsafe(queue.put_nowait, exc)
    finally:
        loop.call_soon_threadsafe(queue.put_nowait, None)

loop.run_in_executor(None, _iter_in_thread)

while True:
    item = await queue.get()
    if item is None:
        break
    if isinstance(item, Exception):
        raise item
    yield item
```

This adds ~30 lines of thread-safety boilerplate per provider, increases thread-pool pressure, and makes error propagation fragile.
## How other SDKs solve this
All major LLM SDKs provide a native async client alongside the sync one:
| SDK | Sync client | Async client |
|---|---|---|
| `openai` | `OpenAI` | `AsyncOpenAI` |
| `anthropic` | `Anthropic` | `AsyncAnthropic` |
| `groq` | `Groq` | `AsyncGroq` |
| `mistralai` | `Mistral` | `AsyncMistral` (via `httpx.AsyncClient`) |
| `cerebras-cloud-sdk` | `Cerebras` | `AsyncCerebras` |
| `zai-sdk` | `ZaiClient` / `ZhipuAiClient` | ❌ missing |
## Proposal
Add `AsyncZaiClient` and `AsyncZhipuAiClient` with the same API surface but using `httpx.AsyncClient` under the hood.

```python
from zai import AsyncZaiClient

client = AsyncZaiClient(api_key="...")

# Non-streaming
response = await client.chat.completions.create(model="glm-4", messages=[...])

# Streaming
async for chunk in await client.chat.completions.create(model="glm-4", messages=[...], stream=True):
    print(chunk.choices[0].delta.content)
```

The SDK already uses httpx internally, so this should be achievable by swapping `httpx.Client` → `httpx.AsyncClient` and making the method signatures `async`.
## Environment
- `zai-sdk` version: 0.2.2
- Python: 3.12+
- Use case: high-concurrency async web service with streaming SSE responses