Is Aggregate Key table suitable for OLTP-like middle state lookup/update workload? #63405

liuanxin · 2026-05-19T10:19:27Z


Aggregate Key table becomes a bottleneck for point lookups and frequent small writes at ~30M rows

Problem

We are using Apache Doris as a statistics/analytics store. One of our intermediate state tables is an Aggregate Key table, but it has become a severe bottleneck for point queries and frequent small writes.

The table is used as a middle state table for user statistics. The logical key is (app_id, user_id). The workload looks like OLTP-style state lookup/update, but the table was originally created in Doris because the downstream statistics are all in Doris.

Recently, the table grew to about 30 million rows. After that, simple point queries and small insert/delete operations became very slow. Some single-row operations took 2-7 seconds, which caused our Kafka consumer to exceed max.poll.interval.ms and repeatedly rebalance.
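A rough back-of-the-envelope check shows why per-row latencies of this size trip the consumer. This is only a sketch: it assumes the Kafka consumer defaults of max.poll.records = 500 and max.poll.interval.ms = 300000 (the values actually configured are not stated here), and that each record triggers one Doris round trip processed sequentially.

```python
# Rough check: can one poll() batch be processed before max.poll.interval.ms expires?
# Assumptions (not stated in the post): Kafka consumer defaults, sequential processing.

MAX_POLL_RECORDS = 500            # Kafka consumer default
MAX_POLL_INTERVAL_MS = 300_000    # Kafka consumer default (5 minutes)


def batch_time_ms(records: int, per_record_seconds: float) -> float:
    """Time to process one poll() batch when each record costs one Doris round trip."""
    return records * per_record_seconds * 1000


def max_safe_latency_seconds(records: int, interval_ms: int) -> float:
    """Largest per-record latency that still fits inside the poll interval."""
    return interval_ms / 1000 / records


for latency in (2.0, 7.0):  # the 2-7 s range reported above
    total = batch_time_ms(MAX_POLL_RECORDS, latency)
    print(f"{latency:.0f}s per record -> {total / 1000:.0f}s per batch, "
          f"exceeds interval: {total > MAX_POLL_INTERVAL_MS}")

budget = max_safe_latency_seconds(MAX_POLL_RECORDS, MAX_POLL_INTERVAL_MS)
print(f"latency budget per record: {budget:.1f}s")
```

Under these assumptions a full batch needs 1000 to 3500 seconds against a 300 second budget, so the consumer is evicted from the group long before it can commit.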

Environment

Doris version: 2.1.10
Cluster: 1 FE, 1 BE
Server spec: 32C 64G
Table model: Aggregate Key
Row count: about 30 million
Key columns: (app_id, user_id)
Query pattern: point lookup by key
Write pattern: frequent small insert/delete/insert-select

Table Usage

The table stores user-level middle state, for example:

create table middle_user (
    app_id varchar(40),
    user_id varchar(64),
    first_login_time datetimev2 min,
    last_login_time datetimev2 max,
    first_role_time datetimev2 min,
    first_pay_time datetimev2 min,
    last_pay_time datetimev2 max
)
aggregate key(app_id, user_id)
distributed by hash(app_id) buckets auto
properties (
    "estimate_partition_size" = "2G",
    "replication_allocation" = "tag.location.default: 1"
);

The typical query pattern is:

select ...
from middle_user
where app_id = ?
and user_id = ?

We also have a special business rule: when a user logs in again after more than 30 days, that user must be treated as a new user. To do this, the old row is first backed up under a user_id with the current date appended as a suffix, then the original row is deleted, and only then is the new login event written.

Example:

insert into middle_user(
  app_id, user_id, first_login_time, last_login_time,
  first_role_time, first_pay_time, last_pay_time
)
select
  app_id,
  concat(user_id, '_', date_format(curdate(), '%Y%m%d')),
  first_login_time,
  last_login_time,
  first_role_time,
  first_pay_time,
  last_pay_time
from middle_user
where app_id = ?
and user_id = ?;

delete from middle_user
where app_id = ?
and user_id = ?
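The application-side part of this rule can be sketched as follows. This is an illustrative sketch, not our production code: the function names are hypothetical, and it assumes the 30-day gap is measured against last_login_time. The length guard is there because the suffix adds 9 bytes to a user_id column declared as varchar(64).

```python
from datetime import date, datetime, timedelta

NEW_USER_GAP = timedelta(days=30)   # "more than 30 days" from the rule above
USER_ID_MAX_BYTES = 64              # user_id is varchar(64) in the table definition


def counts_as_new_user(last_login: datetime, now: datetime) -> bool:
    """True when the user returns after more than 30 days (assumed: since last login)."""
    return now - last_login > NEW_USER_GAP


def backup_user_id(user_id: str, today: date) -> str:
    """Mirror of concat(user_id, '_', date_format(curdate(), '%Y%m%d'))."""
    backup = f"{user_id}_{today.strftime('%Y%m%d')}"
    if len(backup.encode("utf-8")) > USER_ID_MAX_BYTES:
        raise ValueError(f"backup id exceeds {USER_ID_MAX_BYTES} bytes: {backup!r}")
    return backup


print(backup_user_id("u1001", date(2026, 5, 19)))
print(counts_as_new_user(datetime(2026, 4, 1), datetime(2026, 5, 19)))
```

Note that each time the rule fires, it turns one business event into three separate Doris statements (insert-select, delete, insert).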

Symptoms

After the table reached around 30 million rows:

Point query by (app_id, user_id) became slow.
Small insert into ... select ... where app_id = ? and user_id = ? became slow.
Frequent small writes caused tablet version count errors:

tablet writer write failed
failed to init rowset builder
version count: 4001, exceed limit: 4000
Please reduce the frequency of loading data or adjust the max_tablet_version_num in be.conf to a larger value.

Kafka consumers were blocked by Doris operations and started rebalancing:

CommitFailedException: Commit cannot be completed since the group has already rebalanced
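The version count error above can be reasoned about with a toy model: every committed load or delete adds a version to each tablet it touches, and compaction merges versions away. The sketch below assumes constant rates, which real compaction does not have, and the numbers are purely illustrative rather than measured on our cluster.

```python
from typing import Optional


def seconds_until_version_limit(
    new_versions_per_sec: float,
    merged_versions_per_sec: float,
    limit: int = 4000,
    current: int = 0,
) -> Optional[float]:
    """Seconds until a tablet hits the version limit, or None if compaction keeps up.

    Toy model: both rates are treated as constant for a single hot tablet.
    """
    net_growth = new_versions_per_sec - merged_versions_per_sec
    if net_growth <= 0:
        return None  # compaction removes versions at least as fast as writes add them
    return max(limit - current, 0) / net_growth


# Illustrative rates only: 5 small writes per second, 1 version merged per second.
print(seconds_until_version_limit(5.0, 1.0))               # limit 4000
print(seconds_until_version_limit(5.0, 1.0, limit=10000))  # a larger limit only buys time
print(seconds_until_version_limit(0.5, 1.0))               # compaction keeps up
```

In this model, raising the limit delays the rejection but does not prevent it while writes keep outpacing compaction.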

Question

Is this expected for an Aggregate Key table in Doris?

For this type of table:

around 30 million rows
point lookup by (app_id, user_id)
frequent small insert/delete/update-like operations
used as intermediate state for statistics

What is the recommended Doris table model and design?

Should this type of state table not be stored in Doris at all, and instead be stored in MySQL/OLTP storage, while Doris only handles analytical/statistical aggregation?

If Doris can support this pattern, what table design, bucket strategy, compaction settings, or write pattern should we use?

What we tried

We moved this middle state table to MySQL and only kept final analytical/statistical processing in Doris. After moving the point lookup/state update part to MySQL, the Kafka lag disappeared.

This suggests the bottleneck is specifically the Doris middle state table workload, not Kafka or application logic.

One more note on the statements above: the CPU saturation is most likely caused by the delete statement, and the insert into ... select before it probably contributes as well.

liuanxin · 2026-05-19T10:34:21Z

Author

During the incident, BE CPU usage was also very high. The machine is 32C 64G, but the CPU stayed close to 100% for a long time.

We later checked the BE runtime config and found that the compaction-related settings had been changed to very conservative values:

max_base_compaction_threads=1
max_cumu_compaction_threads=1
total_permits_for_compaction_score=10
max_tablet_version_num=4000
disable_auto_compaction=0

At that time, frequent small writes caused tablet versions to accumulate quickly, and Doris started rejecting writes with:

version count: 4001, exceed limit: 4000

We then adjusted the runtime BE config without restarting Doris:

curl -X POST 'http://be_host:8040/api/update_config?max_cumu_compaction_threads=8'
curl -X POST 'http://be_host:8040/api/update_config?max_base_compaction_threads=4'
curl -X POST 'http://be_host:8040/api/update_config?total_permits_for_compaction_score=20'
curl -X POST 'http://be_host:8040/api/update_config?max_tablet_version_num=10000'
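To apply the same set of changes repeatably, the URLs can be generated from one dictionary. This is a small sketch that only builds and prints the URLs used above and sends nothing; the helper name is hypothetical, and be_host and port 8040 are the same placeholders as in the curl commands. Keep in mind that settings changed at runtime may not survive a BE restart unless they are also written to be.conf.

```python
from typing import Dict, List
from urllib.parse import urlencode


def update_config_urls(be_host: str, settings: Dict[str, object], port: int = 8040) -> List[str]:
    """Build one /api/update_config URL per setting, matching the curl commands above."""
    base = f"http://{be_host}:{port}/api/update_config"
    return [f"{base}?{urlencode({name: value})}" for name, value in settings.items()]


# The values applied during the incident.
COMPACTION_SETTINGS = {
    "max_cumu_compaction_threads": 8,
    "max_base_compaction_threads": 4,
    "total_permits_for_compaction_score": 20,
    "max_tablet_version_num": 10000,
}

for url in update_config_urls("be_host", COMPACTION_SETTINGS):
    # Each URL would be sent as an HTTP POST, e.g. curl -X POST '<url>'.
    print(url)
```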

After this, the failed writes started to recover, but the BE load was still very high. Example metrics:

doris_be_compaction_used_permits 21
doris_be_tablet_base_max_compaction_score 3976
doris_be_tablet_cumulative_max_compaction_score 539
doris_be_load_average{mode="1_minutes"} 123.15
doris_be_load_average{mode="5_minutes"} 112.56
doris_be_load_average{mode="15_minutes"} 103.94
doris_be_disks_compaction_num{path="..."} 1
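A quick way to read these numbers is to parse the metrics text and derive a couple of ratios. The sketch below assumes the metrics are in the plain text format shown above (name, optional labels, value); the parser is a simplified illustration and not a full Prometheus exposition parser. The core count and the permit total are the values reported in this thread.

```python
import re
from typing import Dict

METRIC_LINE = re.compile(
    r'^(?P<name>[A-Za-z_:][A-Za-z0-9_:]*)(?:\{(?P<labels>[^}]*)\})?\s+(?P<value>\S+)$'
)

SAMPLE = '''
doris_be_compaction_used_permits 21
doris_be_tablet_base_max_compaction_score 3976
doris_be_tablet_cumulative_max_compaction_score 539
doris_be_load_average{mode="1_minutes"} 123.15
doris_be_load_average{mode="5_minutes"} 112.56
doris_be_load_average{mode="15_minutes"} 103.94
doris_be_disks_compaction_num{path="..."} 1
'''


def parse_metrics(text: str) -> Dict[str, float]:
    """Map 'name' or 'name{labels}' to its numeric value; skip blanks and comments."""
    metrics: Dict[str, float] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        match = METRIC_LINE.match(line)
        if match is None:
            continue
        key = match.group("name")
        if match.group("labels"):
            key += "{" + match.group("labels") + "}"
        metrics[key] = float(match.group("value"))
    return metrics


CORES = 32           # server spec reported in this thread
TOTAL_PERMITS = 20   # total_permits_for_compaction_score after the change

m = parse_metrics(SAMPLE)
load_per_core = m['doris_be_load_average{mode="1_minutes"}'] / CORES
print(f"1-minute load per core: {load_per_core:.2f}")
print(f"compaction permits in use: {m['doris_be_compaction_used_permits']:.0f} of {TOTAL_PERMITS}")
print(f"max base compaction score: {m['doris_be_tablet_base_max_compaction_score']:.0f}")
```

A 1-minute load of almost four runnable tasks per core, with compaction permits fully used and a base compaction score close to the old version limit, is consistent with compaction still working through a large backlog.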

So the full picture is:

The table had about 30 million rows.
The workload had frequent point lookups and frequent small writes/deletes.
Tablet versions accumulated faster than compaction could catch up.
CPU/load became very high.
Doris started rejecting writes due to max_tablet_version_num.
Increasing compaction threads and max_tablet_version_num helped the system recover, but the workload still seems unsuitable for this kind of intermediate state table.

Questions:

For a single BE machine with 32C 64G, what are the recommended values for max_cumu_compaction_threads, max_base_compaction_threads, and total_permits_for_compaction_score?
Is max_tablet_version_num=10000 safe as a temporary mitigation, or does it only hide the real compaction problem?
For this workload, should we avoid frequent small writes to Doris entirely and move the state table to an OLTP store?


liuanxin · 2026-05-19T12:52:05Z

Author

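The advice from Codex was:

The problem this time is not that the business volume itself is large. The access pattern happens to hit exactly the weak spots of Doris:

A single BE
A large Aggregate Key table
A hot app_id
distributed by hash(app_id)
Real-time single-row insert-select
Real-time delete
The version count of one tablet growing explosively
Compaction unable to keep up
CPU saturated by compaction and queries

So what looks like "a few hundred small business operations" is, for Doris, "continuously creating small rowsets/versions plus delete bitmaps on the same tablet".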

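Doris is a good fit for:

Batch writes
Append-only writes
Writes spread across tablets
Offline or near-real-time aggregation
A small number of large transactions

The pipeline in this case was:

one business message -> point lookup / insert-select / delete / another insert

This looks much more like an OLTP state store than a pattern an OLAP engine is comfortable with.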

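The conclusion can now be settled:

Doris should no longer take real-time state changes.
A state table like middle_user should either move to MySQL, or Doris should only receive it through batched asynchronous sync.
If the table stays in Doris, it must be changed to distributed by hash(app_id, user_id), and single-row real-time deletes must be avoided.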
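The effect of the distribution key can be illustrated with a small simulation. This is a sketch under stated assumptions: CRC32 modulo the bucket count stands in for the hash function Doris actually uses, the bucket count of 16 is arbitrary, and the rows are synthetic. Only the relative spread matters here.

```python
import zlib
from collections import Counter
from typing import Dict, List, Sequence


def bucket_of(key_parts: Sequence[str], buckets: int) -> int:
    """Stand-in for hash distribution: CRC32 of the joined key columns modulo bucket count."""
    data = "\x00".join(key_parts).encode("utf-8")
    return zlib.crc32(data) % buckets


def bucket_counts(rows: List[Dict[str, str]], key_columns: Sequence[str], buckets: int) -> Counter:
    """Count how many rows land in each bucket for the given distribution columns."""
    return Counter(bucket_of([row[c] for c in key_columns], buckets) for row in rows)


# Synthetic data: one hot app_id with many users.
rows = [{"app_id": "hot_app", "user_id": f"user_{i}"} for i in range(10_000)]

by_app = bucket_counts(rows, ["app_id"], 16)
by_app_user = bucket_counts(rows, ["app_id", "user_id"], 16)

print("hash(app_id):          buckets used =", len(by_app), " largest =", max(by_app.values()))
print("hash(app_id, user_id): buckets used =", len(by_app_user), " largest =", max(by_app_user.values()))
```

With hash(app_id), every row and every small write for the hot app_id goes to a single bucket, which is where the versions pile up. Adding user_id to the distribution key spreads the same rows over many buckets.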

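This is not a case of your business design being unreasonable. Doris really does have very little margin for error when used this way. From now on, review every write to Doris against one principle:

Batch if you can; do not write row by row.
Append if you can; do not delete.
Spread keys if you can; do not pile onto a single app_id.
Backfill offline if you can; do not block the Kafka consumer.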
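The "batch, not row by row" principle could look roughly like the following on the consumer side. This is a minimal sketch with hypothetical names, covering only the two login columns: events are folded in memory per (app_id, user_id) with the same min/max semantics as the Aggregate Key columns, and the buffer is drained into one multi-row write instead of one write per event.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Dict, List, Tuple

Key = Tuple[str, str]  # (app_id, user_id)


@dataclass
class LoginState:
    first_login_time: datetime
    last_login_time: datetime

    def merge(self, ts: datetime) -> None:
        """Same semantics as the min / max aggregate columns in the table."""
        self.first_login_time = min(self.first_login_time, ts)
        self.last_login_time = max(self.last_login_time, ts)


@dataclass
class MicroBatch:
    max_rows: int = 1000
    rows: Dict[Key, LoginState] = field(default_factory=dict)

    def add(self, app_id: str, user_id: str, ts: datetime) -> bool:
        """Fold one login event into the buffer; return True when it should be flushed."""
        key = (app_id, user_id)
        if key in self.rows:
            self.rows[key].merge(ts)
        else:
            self.rows[key] = LoginState(ts, ts)
        return len(self.rows) >= self.max_rows

    def drain(self) -> List[Tuple[str, str, datetime, datetime]]:
        """Return the buffered rows for one multi-row write and reset the buffer."""
        out = [(a, u, s.first_login_time, s.last_login_time) for (a, u), s in self.rows.items()]
        self.rows.clear()
        return out


batch = MicroBatch(max_rows=2)
batch.add("app1", "u1", datetime(2026, 5, 19, 10, 0))
batch.add("app1", "u1", datetime(2026, 5, 19, 9, 0))          # same key, merged in memory
full = batch.add("app1", "u2", datetime(2026, 5, 19, 11, 0))  # second key reaches max_rows
if full:
    for row in batch.drain():
        print(row)
```

A real consumer would also flush on a timer so that a quiet period does not leave rows stuck in the buffer, and would commit Kafka offsets only after the batched write succeeds.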

I cannot think of any way to mitigate this. The CPU has been above 90% for days and will not come down, and memory usage stays between 70% and 85%.
