
feat/deepseek-v4-tokenizer#981

Merged
esengine merged 3 commits into esengine:main from ADX15xs:feat/deepseek-v4-tokenizer
May 16, 2026
Conversation

@ADX15xs (Contributor) commented May 15, 2026

What

Upgrade the tokenizer from DeepSeek V3 to V4 chat template format, update pricing constants, and adjust context-manager fold ordering to use content-only token counts.

Why

DeepSeek V4 uses a different chat template (DSML tool-call framing, \u1f60 generation suffix, merged tool-result blocks). The V3-only tokenizer produces inaccurate prompt_tokens estimates for V4 models (deepseek-v4-flash, deepseek-v4-pro), causing wrong context-window calculations and premature or missed auto-folding.
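To illustrate why template framing changes the estimate, here is a minimal sketch of an `estimateRequestTokens`-style calculation. The overhead constants, the 4-chars-per-token heuristic, and the message shape are all illustrative assumptions, not the real V4 values from this PR.

```typescript
// Hypothetical sketch: content-only counting vs. template-aware counting.
// All numeric overheads below are illustrative assumptions.

interface ChatMessage {
  role: "user" | "assistant" | "tool";
  content: string;
}

// Rough content-only heuristic: ~4 characters per token (assumed).
function contentTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

// A template-aware estimate adds fixed framing overhead per message
// (role separators, tool-result wrappers) plus a generation suffix.
function estimateRequestTokens(messages: ChatMessage[]): number {
  const PER_MESSAGE_OVERHEAD = 3; // e.g. role separator tokens (assumed)
  const TOOL_BLOCK_OVERHEAD = 8;  // e.g. DSML tool-result framing (assumed)
  const GENERATION_SUFFIX = 1;    // trailing generation marker (assumed)

  let total = GENERATION_SUFFIX;
  for (const m of messages) {
    total += contentTokens(m.content) + PER_MESSAGE_OVERHEAD;
    if (m.role === "tool") total += TOOL_BLOCK_OVERHEAD;
  }
  return total;
}

const msgs: ChatMessage[] = [
  { role: "user", content: "What is the capital of France?" },
  { role: "tool", content: '{"answer":"Paris"}' },
];
console.log(estimateRequestTokens(msgs));
```

A V3-only estimator that skips the tool-block framing would undercount exactly this kind of request, which is what leads to the premature or missed auto-folding described above.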

How to verify

  1. npm run verify — lint, typecheck, tests, comment-policy all pass
  2. npx tsx src/tokenizer.ts (if there's a smoke entry) or run reasonix chat with a V4 model and observe correct token estimates

Checklist

  • npm run verify passes locally (lint + typecheck + tests + comment-policy gate)
  • No Co-Authored-By: Claude trailer in commits
  • Comments follow CONTRIBUTING.md (no module-essay headers, no incident history)
  • No edits to CHANGELOG.md — release notes are maintainer-written at release time

Reference:
https://huggingface.co/deepseek-ai/DeepSeek-V4-Pro/blob/main/tokenizer.json
https://huggingface.co/deepseek-ai/DeepSeek-V4-Pro/blob/main/encoding/encoding_dsv4.py

Closes #982

This PR conflicts with #978 over the official price value in src/telemetry/stats.ts, but the conflict is easy to resolve.
I recommend merging that PR first; I will then rebase this branch and resolve the conflicts.
Alternatively, if @esengine prefers a different order, we can adjust.

@ADX15xs ADX15xs changed the title Feat/deepseek v4 tokenizer feat/deepseek v4 tokenizer May 15, 2026
@ADX15xs ADX15xs changed the title feat/deepseek v4 tokenizer feat/deepseek-v4-tokenizer May 15, 2026
@esengine (Owner) commented:

Thanks! #978 is in — go ahead and rebase whenever you're ready, the conflict in src/telemetry/stats.ts should be trivial (keep the new constants from main). The tokenizer work itself looks great at a glance; I'll do a proper pass once the rebase lands.

ADX15xs added 3 commits May 16, 2026 11:15
- Update tokenizer data to use DeepSeek V4 vocabulary
- Update token IDs for `<think>` and `</think>` special tokens
- Update test suites to reflect V4 tokenization behavior and CJK compression characteristics
- Tokenizer source: https://huggingface.co/deepseek-ai/DeepSeek-V4-Pro/blob/main/tokenizer.json
- Reference: https://huggingface.co/deepseek-ai/DeepSeek-V4-Pro/blob/main/encoding/encoding_dsv4.py
- Implement `formatDeepSeekPrompt` to include BOS, EOS, and role-specific separators (`<|User|>`, `<|Assistant|>`)
- Add support for DeepSeek Machine Learning Language (DSML) tool-calling syntax in prompt estimation
- Update `estimateRequestTokens` to account for chat template framing and tool schema overhead
- Refactor `ContextManager` to use more accurate token counting for message folding logic
- Update `V4Message` interface to include `_textParts` for better content tracking
- Improve `mergeToolMessages` to handle both text parts and tool blocks during message folding
- Simplify JSDoc comments for better readability
- Fix type casting in message merging to ensure consistency with new internal properties
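The role framing named in the commit list above can be sketched roughly as follows. The `<|User|>`/`<|Assistant|>` separators come from the commit message; the exact BOS/EOS marker strings and the turn-closing rule here are illustrative assumptions, not the implementation in this PR.

```typescript
// Hypothetical sketch of formatDeepSeekPrompt-style role framing.
// BOS/EOS marker names are assumed for illustration.

interface PromptMessage {
  role: "user" | "assistant";
  content: string;
}

const BOS = "<|begin_of_sentence|>"; // assumed marker name
const EOS = "<|end_of_sentence|>";   // assumed marker name

function formatDeepSeekPrompt(messages: PromptMessage[]): string {
  let out = BOS;
  for (const m of messages) {
    const sep = m.role === "user" ? "<|User|>" : "<|Assistant|>";
    out += sep + m.content;
    // Assumed rule: assistant turns are closed with EOS.
    if (m.role === "assistant") out += EOS;
  }
  // End with the assistant separator so the model generates the reply.
  return out + "<|Assistant|>";
}

console.log(formatDeepSeekPrompt([{ role: "user", content: "Hi" }]));
```

Counting tokens over this framed string, rather than over the raw message contents, is what makes the prompt_tokens estimate line up with what the V4 models actually bill.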
@ADX15xs ADX15xs force-pushed the feat/deepseek-v4-tokenizer branch from 330390f to 0fe0bed on May 16, 2026 03:19
@ADX15xs (Contributor, Author) commented May 16, 2026

Rebase done; the conflicting sections follow the main branch exactly.

@esengine esengine merged commit 30bdc51 into esengine:main May 16, 2026
5 checks passed
@ADX15xs ADX15xs deleted the feat/deepseek-v4-tokenizer branch May 16, 2026 04:14
