Feature request: native async client (AsyncZaiClient) #65

@clemlesne

Description

Problem

The zai-sdk currently only provides synchronous clients (ZaiClient, ZhipuAiClient). When used in an async Python application (e.g. FastAPI, aiohttp), every SDK call blocks the event loop thread.

Workarounds required today

Non-streaming calls must be wrapped in asyncio.to_thread():

response = await asyncio.to_thread(
    client.chat.completions.create,
    model="glm-4",
    messages=messages,
)

Streaming calls are worse — not only the initial request but also each __next__() call on the stream iterator blocks. This requires a background thread + asyncio.Queue bridge:

import asyncio

# The initial request still has to be pushed off the event loop:
stream = await asyncio.to_thread(client.chat.completions.create, ..., stream=True)

# Must run inside an async generator (note the `yield` at the bottom).
queue: asyncio.Queue = asyncio.Queue()
loop = asyncio.get_running_loop()

def _iter_in_thread():
    # Runs in a worker thread; hand each chunk back via the loop-safe queue.
    try:
        for chunk in stream:
            loop.call_soon_threadsafe(queue.put_nowait, chunk)
    except Exception as exc:
        # Ship the exception across the thread boundary so the consumer can re-raise it.
        loop.call_soon_threadsafe(queue.put_nowait, exc)
    finally:
        loop.call_soon_threadsafe(queue.put_nowait, None)  # end-of-stream sentinel

loop.run_in_executor(None, _iter_in_thread)

while True:
    item = await queue.get()
    if item is None:
        break
    if isinstance(item, Exception):
        raise item
    yield item

This adds ~30 lines of thread-safety boilerplate per provider, increases thread pool pressure, and makes error propagation fragile.
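For reference, the best the application side can do today is factor that boilerplate into a generic helper. This is a minimal sketch (the helper name `iterate_in_thread` is mine, not part of any SDK) that bridges any blocking iterator into an async one with `asyncio.to_thread` — it avoids the queue juggling, but still burns a thread-pool slot per `__next__()` call, which is exactly what a native async client would eliminate:

```python
import asyncio
from typing import AsyncIterator, Iterator, TypeVar

T = TypeVar("T")


async def iterate_in_thread(sync_iter: Iterator[T]) -> AsyncIterator[T]:
    """Bridge a blocking iterator into an async generator.

    Each __next__() call runs in the default thread pool, so the
    event loop stays responsive between chunks.
    """
    _sentinel = object()

    def _next():
        # Return a sentinel instead of letting StopIteration cross threads.
        return next(sync_iter, _sentinel)

    while True:
        item = await asyncio.to_thread(_next)
        if item is _sentinel:
            return
        yield item
```

Usage: `async for chunk in iterate_in_thread(stream): ...` — but every consumer of the SDK has to carry this helper around.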

How other SDKs solve this

All major LLM SDKs provide a native async client alongside the sync one:

| SDK | Sync client | Async client |
| --- | --- | --- |
| openai | `OpenAI` | `AsyncOpenAI` |
| anthropic | `Anthropic` | `AsyncAnthropic` |
| groq | `Groq` | `AsyncGroq` |
| mistralai | `Mistral` | `AsyncMistral` (via `httpx.AsyncClient`) |
| cerebras-cloud-sdk | `Cerebras` | `AsyncCerebras` |
| zai-sdk | `ZaiClient` / `ZhipuAiClient` | **missing** |

Proposal

Add AsyncZaiClient and AsyncZhipuAiClient with the same API surface but using httpx.AsyncClient under the hood.

from zai import AsyncZaiClient

client = AsyncZaiClient(api_key="...")

# Non-streaming
response = await client.chat.completions.create(model="glm-4", messages=[...])

# Streaming
async for chunk in await client.chat.completions.create(model="glm-4", messages=[...], stream=True):
    print(chunk.choices[0].delta.content)

The SDK already uses httpx internally, so this should be achievable by swapping httpx.Client for httpx.AsyncClient and making the method signatures async.

Environment

  • zai-sdk version: 0.2.2
  • Python: 3.12+
  • Use case: high-concurrency async web service with streaming SSE responses
