[Bug] DashScope native API: cache_control placed at message level instead of content block level #1363

@wbt-ice

Description

The explicit cache control implementation in both DashScopeChatModel and OpenAIChatModel places cache_control at the message level, but the DashScope official documentation requires it to be placed inside the content block (within the content array). This applies to both the DashScope native protocol and the OpenAI-compatible protocol.

Current Behavior

When GenerateOptions.cacheControl(true) is enabled, both DashScopeChatFormatter.applyCacheControl() and OpenAIBaseFormatter.applyCacheControl() set cache_control on the message object directly (message level).

DashScope formatter:

public void applyCacheControl(List<DashScopeMessage> messages) {
    for (DashScopeMessage msg : messages) {
        if ("system".equals(msg.getRole()) && msg.getCacheControl() == null) {
            msg.setCacheControl(EPHEMERAL_CACHE_CONTROL);  // message level
        }
    }
    DashScopeMessage lastMsg = messages.get(messages.size() - 1);
    if (lastMsg.getCacheControl() == null) {
        lastMsg.setCacheControl(EPHEMERAL_CACHE_CONTROL);  // message level
    }
}

OpenAI formatter has the same logic in OpenAIBaseFormatter.applyCacheControl().

This produces the following JSON for both protocols:

{
  "role": "system",
  "content": "You are a helpful assistant.",
  "cache_control": {"type": "ephemeral"}
}

Expected Behavior

Per the official documentation, cache_control must be placed inside a content block, and content must be in array format:

{
  "role": "system",
  "content": [
    {
      "type": "text",
      "text": "You are a helpful assistant.",
      "cache_control": {"type": "ephemeral"}
    }
  ]
}

The documentation states:

The content field must be changed to array form, and a cache_control field added.

This format requirement applies to both OpenAI-compatible and DashScope native protocols when calling DashScope models.

Issues Identified

  1. cache_control is placed at the wrong level — should be inside content blocks, not at message level. This affects both DashScopeChatFormatter and OpenAIBaseFormatter.
  2. Content part DTOs lack a cache_control field — both DashScopeContentPart and OpenAIMessage content parts have no way to carry cache_control at the content block level.
  3. Multimodal messages are not handled — when content is already a list of content parts, the cache_control still goes to message level and won't be recognized by the API (see the example after this list).
  4. No guard for the 4-marker limit — the documentation states a maximum of 4 cache_control markers per request. If there are multiple system messages (e.g., injected by SkillHook or LongTermMemoryHook), the limit may be exceeded silently.
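
For example, for a message whose content is already an array of parts, the marker would need to sit inside the last content block rather than on the message (the field values below are illustrative):

{
  "role": "user",
  "content": [
    {"type": "text", "text": "Here is the retrieved reference material: ..."},
    {
      "type": "text",
      "text": "Answer using the material above.",
      "cache_control": {"type": "ephemeral"}
    }
  ]
}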

Suggested Fix

DashScope native protocol

  1. Add a cache_control field (Map<String, String>) to DashScopeContentPart.
  2. Modify DashScopeChatFormatter.applyCacheControl() to (see the sketch after this list):
    • Convert string content to array format (List<DashScopeContentPart>) for target messages.
    • Set cache_control on the last content block within each target message.
  3. Apply the same fix to DashScopeMultiAgentFormatter.applyCacheControl().
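
A minimal sketch of the block-level approach for the DashScope formatter follows. It assumes DashScopeContentPart gains a cacheControl field plus a text factory method, and that DashScopeMessage content can hold either a String or a List<DashScopeContentPart>; the accessor names are illustrative, not the actual API.

private void markLastContentBlock(DashScopeMessage msg) {
    Object content = msg.getContent();
    List<DashScopeContentPart> parts;
    if (content instanceof String) {
        // String content must first be converted to the array form required by the API.
        parts = new ArrayList<>();
        parts.add(DashScopeContentPart.text((String) content));  // hypothetical factory method
        msg.setContent(parts);
    } else {
        parts = (List<DashScopeContentPart>) content;
    }
    if (!parts.isEmpty()) {
        DashScopeContentPart last = parts.get(parts.size() - 1);
        if (last.getCacheControl() == null) {
            // cache_control goes on the last content block, not on the message object.
            last.setCacheControl(EPHEMERAL_CACHE_CONTROL);
        }
    }
}

applyCacheControl() would then call this helper for each target message (system messages and the last message) instead of calling msg.setCacheControl(...).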

OpenAI-compatible protocol

  1. Modify OpenAIBaseFormatter.applyCacheControl() with the same content-block-level approach.
  2. Add cache_control support to OpenAI content part DTOs.

Common

  1. Add a guard to ensure no more than 4 cache_control markers per request (see the sketch after this list).
  2. Keep the existing message-level cacheControl fields for backward compatibility (manual metadata marking).
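
A possible shape for the guard, sketched at message granularity for brevity (after the block-level fix it would inspect content parts instead); the method name and trimming policy are illustrative:

private static final int MAX_CACHE_MARKERS = 4;

private void enforceMarkerLimit(List<DashScopeMessage> messages) {
    List<DashScopeMessage> marked = new ArrayList<>();
    for (DashScopeMessage msg : messages) {
        if (msg.getCacheControl() != null) {
            marked.add(msg);
        }
    }
    // Clear markers from the middle first, keeping the earliest marker (the stable
    // system prompt) and the latest marker (the full prefix) within the limit of 4.
    int excess = marked.size() - MAX_CACHE_MARKERS;
    for (int i = 1; i < marked.size() - 1 && excess > 0; i++) {
        marked.get(i).setCacheControl(null);
        excess--;
    }
}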

Affected Classes

  • DashScopeChatFormatter
  • DashScopeMultiAgentFormatter
  • DashScopeContentPart
  • DashScopeMessage
  • OpenAIBaseFormatter
  • OpenAIMessage

Discussion: Should applyCacheControl() auto-mark messages?

The current applyCacheControl() strategy is "all system messages + the last message". Given DashScope's prefix-matching caching mechanism, the strategy itself is sound in theory:

  • Marking system messages creates layered prefix cache blocks (A, AB, ABC…). Even if later messages change, the shorter prefix (e.g., just the stable system prompt) can still be hit — a reasonable tiered caching approach.
  • Marking the last message caches the entire messages array as a complete prefix, which aligns with the official "continuous multi-turn dialog" pattern.

However, there are two practical concerns:

1. The 4-marker limit

The API enforces a hard limit of 4 cache_control markers per request. If more than 4 markers are present, only the last 4 take effect. In AgentScope, multiple hooks can dynamically inject system messages (e.g., SkillHook, LongTermMemoryHook, RAGHook), making the number of system messages unpredictable at the formatter level. When the total marker count exceeds 4, the earliest system messages — which are typically the most stable and most valuable to cache — will lose their markers and fall out of the cache.

2. Dynamic content defeats prefix caching

The framework cannot distinguish between stable system messages (e.g., the user's own system prompt) and dynamic ones (e.g., RAG-retrieved knowledge, long-term memory summaries). In AgentScope's hook architecture, hooks like GenericRAGHook and StaticLongTermMemoryHook inject system messages whose content changes on every request. Marking these with cache_control means each request creates a new cache block (at 125% of standard input cost) that will likely never be hit — the prefix changes every time.

Only the user knows which parts of their messages are stable and worth caching. A blanket "mark all system messages" strategy applied at the formatter level cannot make this distinction.

Suggestion

Consider making cache control user-driven rather than automatic:

  • The existing MessageMetadataKeys.CACHE_CONTROL mechanism already allows users to mark individual Msg objects for caching via metadata, which flows through applyCacheControlFromMetadata() (illustrated by the sketch after this list).
  • The automatic applyCacheControl() strategy could be removed or made opt-in, letting users who understand their caching needs and cost tolerance decide which messages to mark.
  • If keeping an automatic strategy, add a guard to enforce the 4-marker limit and prioritize stable prefixes (first system message + last message).
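
For illustration, user-driven marking might look roughly like this; the Msg construction and metadata accessor shown are hypothetical and only indicate where MessageMetadataKeys.CACHE_CONTROL would be set:

// Hypothetical usage sketch: the exact Msg construction and metadata accessor may differ.
Msg stableSystemPrompt = buildSystemPrompt();            // the user's own, stable system prompt
stableSystemPrompt.getMetadata()
        .put(MessageMetadataKeys.CACHE_CONTROL, true);   // explicit opt-in to caching
// applyCacheControlFromMetadata() then translates this marker into a
// content-block-level cache_control entry when the request is formatted.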
