Feature Request: Embeddings as Native LLM Input
The Problem — The Lossy Round-Trip
Every retrieval-augmented system today follows the same pipeline:
text → embedding → vector search → retrieve text → feed to LLM
The embedding step captures semantic meaning in high-dimensional space (e.g., 4096 dimensions). The retrieval step then converts it back to flat text — discarding the geometric relationships, cluster positions, and distance signals that the vector space already computed. The LLM then re-encodes that text into its own internal representations, reconstructing what the embedding already knew.
This is a lossy round-trip. The information exists in vector form, gets serialized to text, and then gets re-vectorized internally by the model. The intermediate text step is a bottleneck — both in fidelity and in token cost.
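The round-trip can be made concrete with a toy retrieval loop (a pure-Python sketch; the memory store, vectors, and texts here are illustrative, not from any real system):

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy memory store: each entry keeps both the vector and its source text.
memories = [
    {"text": "Shipped the new retrieval fusion layer.", "vector": [0.9, 0.1, 0.0]},
    {"text": "Fixed a flaky Kafka consumer.",           "vector": [0.1, 0.8, 0.2]},
]

query_vector = [0.85, 0.15, 0.05]  # embedding of the user's question

# Vector search: similarities are computed in embedding space...
ranked = sorted(
    memories,
    key=lambda m: cosine_similarity(query_vector, m["vector"]),
    reverse=True,
)

# ...but the prompt keeps only the text. The scores, and the vectors
# themselves, are discarded at this step -- the lossy round-trip.
prompt = "\n".join(m["text"] for m in ranked)
print(prompt)
```

The geometry does all the work of ranking, then vanishes: only the flat text survives into the prompt.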
The Proposal — Vector Prompt Interface
What if the Messages API accepted embedding vectors as a native input modality, alongside text and images?
Conceptually, this would look like a new content block type:
response = client.messages.create(
    model="claude-opus-4-6",
    messages=[{
        "role": "user",
        "content": [
            # Traditional text context
            {"type": "text", "text": "Given these retrieved memories, answer my question:"},
            # NEW: embedding vectors injected at the input-encoding layer
            {
                "type": "embedding",
                "vectors": [
                    {"data": [0.0234, -0.0891, ...], "dimensions": 4096, "label": "memory_1"},
                    {"data": [0.0112, -0.0453, ...], "dimensions": 4096, "label": "memory_2"},
                ],
                "model": "qwen3-embedding-8b",  # or voyage-3, text-embedding-3-large, etc.
                "metadata": {
                    "distances": [0.12, 0.34],        # cosine distances from query
                    "retrieval_scores": [0.95, 0.82]  # fusion scores if available
                }
            },
            {"type": "text", "text": "What was the key breakthrough last week?"}
        ]
    }]
)
Why This Matters
- Lossless retrieval context — Geometric relationships between retrieved items (cluster distances, traversal paths, similarity scores) arrive intact instead of being serialized to text descriptions.
- Token efficiency — A 4096-dimensional embedding carries the semantic weight of thousands of tokens in a single vector. Systems with 32K+ retrievable items could provide richer context without hitting token limits.
- Native agent-to-agent communication — Multi-agent systems increasingly use embeddings as inter-agent signals (e.g., binary-quantized vectors streamed via Kafka). Accepting these natively eliminates the serialization/deserialization overhead.
- Retrieval metadata preservation — Fusion scores, distance metrics, graph traversal paths, and other retrieval signals could be passed directly rather than described in natural language.
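A back-of-envelope comparison shows why pasting raw vector values into the prompt is not a workaround for the token-efficiency point (the ~8 characters per serialized value and ~4 characters per token are rough rules of thumb, not measured figures):

```python
DIMENSIONS = 4096
CHARS_PER_VALUE = 8   # e.g. "-0.0891," -- sign, digits, separator (assumed)
CHARS_PER_TOKEN = 4   # rough rule of thumb for English-like text

# Serializing one embedding as decimal text inside the prompt:
text_chars = DIMENSIONS * CHARS_PER_VALUE
text_tokens = text_chars // CHARS_PER_TOKEN
print(f"~{text_tokens} tokens per vector as text")    # ~8192 tokens

# The same vector as raw float32 is a fixed-size binary payload:
binary_bytes = DIMENSIONS * 4
print(f"{binary_bytes} bytes per vector as float32")  # 16384 bytes (16 KiB)
```

Even under generous assumptions, one vector serialized as decimal text costs thousands of tokens, while a native embedding block would carry the same information as a small fixed-size payload.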
Precedent
Images proved that LLMs can process non-text modalities at the input-encoding layer. The architecture already supports multimodal input. Embeddings are semantically closer to the model's internal representations than pixels are — they're a more natural fit for this pattern.
Real-World Use Case
We operate a multi-agent system (UCIS) with:
- 32,000+ memories in a graph database, each with 4096d embeddings
- 6-signal retrieval fusion (vector search, keyword, temporal, Q-value, foresight, ACT-R decay) producing ranked results with rich scoring metadata
- Binary-quantized embedding streaming between 12 agents via Kafka
- 3 embedding pipelines (nightly batch, real-time streaming, on-demand)
The entire infrastructure produces rich vector representations — and then throws them away at the last mile, converting back to text for the API call. Every system doing RAG at scale has this same bottleneck.
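To illustrate the binary-quantized streaming mentioned above, here is a minimal sketch of sign-based one-bit quantization, a common scheme that packs eight dimensions per byte (the exact quantizer any given system uses will differ; this is illustrative only):

```python
def binary_quantize(vector):
    """One-bit quantization: keep the sign of each dimension,
    packed 8 dimensions per byte (LSB-first within each byte)."""
    out = bytearray()
    for i in range(0, len(vector), 8):
        byte = 0
        for bit, value in enumerate(vector[i:i + 8]):
            if value > 0:
                byte |= 1 << bit
        out.append(byte)
    return bytes(out)

vec = [0.3, -0.1, 0.7, -0.2, 0.0, 0.5, -0.9, 0.4]
packed = binary_quantize(vec)
print(len(packed))  # 8 dimensions pack into a single byte
```

At this rate a 4096d vector packs into 512 bytes, versus 16 KiB as float32 — exactly the kind of compact inter-agent signal that currently has to be inflated back into text before it can reach the model.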
Summary
This is a feature request, not a research problem. The embedding infrastructure exists across the ecosystem. The multimodal input architecture exists in the model. What's missing is the API surface to connect them.
Thank you for considering this.