Summary
Model deploy fails during semantics preparation with 400 Bad Request on http://.../api/embed. The root cause is that the embedding pipeline passes an asyncio Task object (or its string representation) as the chunk input to the embedder instead of the actual chunk text. Ollama's /api/embed endpoint rejects this invalid input and returns 400.
Environment
- Wren AI version: WREN_AI_SERVICE_VERSION=0.29.0, WREN_UI_VERSION=0.32.2, WREN_ENGINE_VERSION=0.22.0
- Deployment: Self-hosted via Docker Compose on ARM64 (DGX Spark)
- Embedder: Ollama nomic-embed-text:latest (api_base: Ollama container on port 11434)
- LLM: Ollama qwen2.5-coder:7b
- Data source: MySQL; the MDL includes multiple tables and many relationships between them
When it appears
- Deploy succeeded earlier with a simpler MDL (fewer or no relationships).
- After defining all relationships between tables in the model, deploy started failing every time.
- So the bug appears when the schema is more complex (many relationships → more chunks / different async scheduling in the pipeline).
Steps to reproduce
- Create an MDL with several tables and many relationships (e.g. 20+ tables, 30+ relationships).
- Configure Wren AI with Ollama as the default embedder (e.g. ollama/nomic-embed-text:latest) and Ollama as the default LLM.
- Click Deploy on the model.
- Semantics preparation starts, then fails; in the wren-ai-service logs you see a 400 on /api/embed, and the Node inputs show chunk as a Task repr.
Expected vs actual
- Expected: The pipeline passes chunk text (schema/DDL content) to the embedder. Ollama returns 200 and indexing continues.
- Actual: The pipeline passes a string representation of an asyncio Task (e.g. `"<Task finished name='Task-1479' coro=<AsyncGraphAd...") as the chunk. Ollama returns 400 Bad Request. Deploy fails with "Failed to prepare semantics".
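The Task repr seen in the logs is exactly what you get when an asyncio Task is stringified instead of awaited. A minimal standalone sketch (the coroutine name is hypothetical, just to illustrate the pattern):

```python
import asyncio

async def build_chunk() -> dict:
    # Hypothetical stand-in for an upstream pipeline step.
    return {"documents": ["CREATE TABLE orders (id INT, ...)"]}

async def main() -> str:
    task = asyncio.ensure_future(build_chunk())
    await asyncio.sleep(0)  # let the task run to completion
    # Bug pattern: the Task object is stringified instead of awaited,
    # so downstream code receives "<Task finished ...>" rather than text.
    return str(task)

chunk = asyncio.run(main())
print(chunk)  # e.g. "<Task finished name='Task-2' coro=<build_chunk() done ...>"
```

This matches the `"<Task finished name='Task-1479' ..."` value that ends up in the embedder's `chunk` input.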
Relevant log excerpt
```
********************************************************************************
> embedding [src.pipelines.indexing.db_schema.embedding()] encountered an error<
> Node inputs:
{'chunk': "<Task finished name='Task-1479' coro=<AsyncGraphAd...",
 'embedder': '<src.providers.embedder.litellm.AsyncDocumentEmbed...'}
********************************************************************************
```
```
Traceback (most recent call last):
  File "/app/.venv/lib/python3.12/site-packages/litellm/main.py", line 3882, in aembedding
    response = await init_response  # type: ignore
  ...
  File "/app/.venv/lib/python3.12/site-packages/httpx/_models.py", line 763, in raise_for_status
    raise HTTPStatusError(message, request=request, response=self)
httpx.HTTPStatusError: Client error '400 Bad Request' for url 'http://kitsune_ollama:11434/api/embed'
  ...
  File "/src/pipelines/indexing/db_schema.py", line 318, in embedding
    return await embedder.run(documents=chunk["documents"])
  ...
E0228 13:21:03.276 7 wren-ai-service:100] Failed to prepare semantics: litellm.APIConnectionError: OllamaException - Client error '400 Bad Request' for url 'http://kitsune_ollama:11434/api/embed'
```
So the input to the embedding node is wrong: chunk is the string repr of a Task, not the actual documents/chunk text. This likely comes from an upstream step in the Hamilton graph returning or passing a Task without awaiting it.
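If that diagnosis is right, the fix is to await the upstream Task before its result is handed to the embedding node. A minimal sketch of the suspected fix, using hypothetical names (the actual node names live in the Hamilton graph for db_schema indexing):

```python
import asyncio

async def upstream_step() -> dict:
    # Hypothetical stand-in for the Hamilton node that produces the chunk.
    return {"documents": ["schema DDL text ..."]}

async def embedding_fixed() -> str:
    task = asyncio.ensure_future(upstream_step())
    chunk = await task  # fix: await the Task to get its result dict
    # Now chunk["documents"] holds real text, not a Task repr,
    # so the embedder receives valid input.
    return chunk["documents"][0]

text = asyncio.run(embedding_fixed())
print(text)
```

The key point is that whatever schedules the upstream coroutine must propagate its awaited result, not the Task object itself.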
Config (relevant parts only)
Embedder (default):

```yaml
- alias: default
  api_base: http://kitsune_ollama:11434
  model: ollama/nomic-embed-text:latest
  timeout: 300
  kwargs:
    num_gpu: -1
```

LLM (default):

```yaml
- alias: default
  api_base: http://kitsune_ollama:11434
  model: ollama_chat/qwen2.5-coder:7b
  timeout: 900
```

Additional context
- Ollama is reachable and works for chat and for embeddings when called directly with valid text.
- The failure is reproducible with the same MDL every time.
- Workaround used so far: none that fixes the root cause; reducing model complexity (fewer relationships) might avoid the failing code path but is not a real fix.
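For contrast with the failing requests, this is roughly what a valid direct call to /api/embed sends (the "model"/"input" field names follow Ollama's embed API; the host and model come from this report, and the sample DDL string is made up):

```python
import json

# Build the request body Ollama's /api/embed expects: real text, not a Task repr.
payload = {
    "model": "nomic-embed-text:latest",
    "input": ["CREATE TABLE orders (id INT PRIMARY KEY, ...)"],  # sample DDL text
}
body = json.dumps(payload)
print(body)

# To send it against a reachable Ollama instance:
#   curl http://kitsune_ollama:11434/api/embed -d "$BODY"
```

Calls shaped like this return 200 with embeddings, which is why the failure points at the pipeline's input, not at Ollama.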