## Problem
The zai-sdk currently provides only synchronous clients (`ZaiClient`, `ZhipuAiClient`). When used in an async Python application (e.g. FastAPI, aiohttp), every SDK call blocks the event loop.
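The blocking effect is easy to demonstrate without the SDK at all; the stand-alone sketch below uses `time.sleep(0.2)` as a stand-in for a blocking SDK call, and counts how often a concurrent background task gets to run while the call executes:

```python
# Stand-alone demonstration: a synchronous call inside a coroutine starves
# every other task on the event loop, while asyncio.to_thread keeps the loop
# responsive. time.sleep(0.2) stands in for a blocking zai-sdk call.
import asyncio
import time

async def ticks_during(op) -> int:
    """Count how often a background task runs while `op` executes."""
    count = 0

    async def ticker() -> None:
        nonlocal count
        while True:
            count += 1
            await asyncio.sleep(0.01)

    task = asyncio.create_task(ticker())
    await op()
    task.cancel()
    return count

async def blocking_op() -> None:
    time.sleep(0.2)  # blocks the loop: the ticker never gets scheduled

async def threaded_op() -> None:
    await asyncio.to_thread(time.sleep, 0.2)  # loop stays free

blocked = asyncio.run(ticks_during(blocking_op))
threaded = asyncio.run(ticks_during(threaded_op))
print(blocked, threaded)  # the blocking variant starves the ticker
```

The blocking variant yields essentially zero ticks; the `to_thread` variant lets the ticker run roughly twenty times in the same window.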
### Workarounds required today
Non-streaming calls must be wrapped in `asyncio.to_thread()`:

```python
response = await asyncio.to_thread(
    client.chat.completions.create,
    model="glm-4",
    messages=messages,
)
```

Streaming calls are worse: not only the initial request but also each `__next__()` call on the stream iterator blocks. This requires a background thread plus an `asyncio.Queue` bridge:
```python
stream = await asyncio.to_thread(client.chat.completions.create, ..., stream=True)

queue = asyncio.Queue()
loop = asyncio.get_running_loop()

def _iter_in_thread():
    try:
        for chunk in stream:
            loop.call_soon_threadsafe(queue.put_nowait, chunk)
    except Exception as exc:
        loop.call_soon_threadsafe(queue.put_nowait, exc)
    finally:
        loop.call_soon_threadsafe(queue.put_nowait, None)

loop.run_in_executor(None, _iter_in_thread)

while True:
    item = await queue.get()
    if item is None:
        break
    if isinstance(item, Exception):
        raise item
    yield item
```

This adds ~30 lines of thread-safety boilerplate per provider, increases thread-pool pressure, and makes error propagation fragile.
## How other SDKs solve this
All major LLM SDKs provide a native async client alongside the sync one:
| SDK | Sync client | Async client |
|---|---|---|
| `openai` | `OpenAI` | `AsyncOpenAI` |
| `anthropic` | `Anthropic` | `AsyncAnthropic` |
| `groq` | `Groq` | `AsyncGroq` |
| `mistralai` | `Mistral` | `AsyncMistral` (via `httpx.AsyncClient`) |
| `cerebras-cloud-sdk` | `Cerebras` | `AsyncCerebras` |
| `zai-sdk` | `ZaiClient` / `ZhipuAiClient` | ❌ missing |
## Proposal
Add `AsyncZaiClient` and `AsyncZhipuAiClient` with the same API surface but using `httpx.AsyncClient` under the hood.

```python
from zai import AsyncZaiClient

client = AsyncZaiClient(api_key="...")

# Non-streaming
response = await client.chat.completions.create(model="glm-4", messages=[...])

# Streaming
async for chunk in await client.chat.completions.create(model="glm-4", messages=[...], stream=True):
    print(chunk.choices[0].delta.content)
```

The SDK already uses httpx internally, so this should be achievable by swapping `httpx.Client` → `httpx.AsyncClient` and making the method signatures `async`.
## Environment
- `zai-sdk` version: 0.2.2
- Python: 3.12+
- Use case: high-concurrency async web service with streaming SSE responses