
[BUG] Timestamp column nullable type mismatch causes coredump in RowBinlogSegmentWriter::_fill_binlog_columns #64494

@heguanhui


Search before asking

  • I have searched the issues and found no similar issues.

Version

master

What's Wrong?


Problem Description

After the binlog column index issue was fixed (PR #xxxxx), running `GroupRowsetWriterTest.sub_writer_rollback` hits another coredump:

```text
[ RUN ] GroupRowsetWriterTest.sub_writer_rollback
F20260614 02:56:27.868108 376 status.h:472] Bad cast from type:doris::ColumnVector<(doris::PrimitiveType)6>* to doris::ColumnNullable*
*** Check failure stack trace: ***
@ 0x5566859035cf google::LogMessage::SendToLog()
@ 0x5566858f9be0 google::LogMessage::Flush()
@ 0x5566858fd2d9 google::LogMessageFatal::~LogMessageFatal()
@ 0x5566640864af doris::Status::FatalError<>()
@ 0x55666d9a69b7 ZZ11assert_castIPN5doris14ColumnNullableEL18TypeCheckOnRelease1ERPNS0_7IColumnEET_OT1_ENKUlOS7_E_clIS6_EES2_SA
@ 0x55666d9a5e75 assert_cast<>()
@ 0x55668225e916 doris::segment_v2::RowBinlogSegmentWriter::_fill_binlog_columns()
@ 0x55668225a39a doris::segment_v2::RowBinlogSegmentWriter::append_block()
@ 0x556681d53a74 doris::SegmentFlusher::_add_rows()
@ 0x556681d4e445 doris::SegmentFlusher::flush_single_block()
@ 0x556681d5742a doris::SegmentCreator::flush_single_block()
@ 0x556681c9950c doris::SegmentCreator::flush_single_block()
@ 0x556681c59c5f doris::BaseBetaRowsetWriter::flush_single_block()
@ 0x556681cae1bb doris::GroupRowsetWriter::flush_single_block()
@ 0x55666c293f62 doris::GroupRowsetWriterTest_sub_writer_rollback_Test::TestBody()
```

Steps to Reproduce

```shell
cd be/ut_build_dir
./run-be-ut.sh --run GroupRowsetWriterTest.sub_writer_rollback
```

Root Cause
In `RowBinlogSegmentWriter::_fill_binlog_columns()`, the timestamp column (`__DORIS_BINLOG_TIMESTAMP__`) is cast to `ColumnNullable*` with `assert_cast`:

```cpp
IColumn* ts_col_ptr = binlog_prefix_columns[2].get();
// assert_cast treats a type mismatch as a fatal error (see the log above).
auto* ts_nullable_column = assert_cast<ColumnNullable*>(ts_col_ptr);
ts_nullable_column->insert_many_defaults(num_rows);
```

In the unit-test setup, however, the timestamp column is not wrapped in `ColumnNullable`; it is a plain `ColumnVector` (as the "Bad cast" message in the log shows), so the `assert_cast` fails and the process aborts.
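To illustrate the failure mode, here is a minimal, self-contained sketch. The classes `IColumn`, `ColumnInt64`, `ColumnNullable` and the helper `mock_assert_cast` are hypothetical, heavily simplified stand-ins, not Doris's real column classes or its `assert_cast`. The real `assert_cast` terminates the process on a mismatch; this mock throws instead so the outcome can be checked. The `int64_t` element type is an arbitrary choice for the sketch. It shows that a checked downcast to `ColumnNullable*` succeeds only when the column really is a nullable wrapper, so a plain column built by the test setup trips it.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <stdexcept>
#include <utility>
#include <vector>

// Hypothetical simplified stand-ins, NOT the real doris column classes.
struct IColumn {
    virtual ~IColumn() = default;
    virtual void insert_many_defaults(std::size_t n) = 0;
    virtual std::size_t size() const = 0;
};

// Plain (non-nullable) column, like the one the unit test builds.
struct ColumnInt64 : IColumn {
    std::vector<std::int64_t> data;
    void insert_many_defaults(std::size_t n) override { data.resize(data.size() + n, 0); }
    std::size_t size() const override { return data.size(); }
};

// Nullable wrapper: nested values plus a null map (1 marks a NULL row).
struct ColumnNullable : IColumn {
    std::unique_ptr<IColumn> nested;
    std::vector<std::uint8_t> null_map;
    explicit ColumnNullable(std::unique_ptr<IColumn> n) : nested(std::move(n)) {}
    void insert_many_defaults(std::size_t n) override {
        nested->insert_many_defaults(n);
        null_map.resize(null_map.size() + n, 1);
    }
    std::size_t size() const override { return null_map.size(); }
};

// Mock checked cast: throws on mismatch where the real one is fatal.
template <typename To>
To mock_assert_cast(IColumn* from) {
    if (auto p = dynamic_cast<To>(from)) return p;
    throw std::logic_error("Bad cast");
}

// Mirrors the current code path; returns false where the real code would abort.
bool old_path_succeeds(IColumn* ts_col_ptr, std::size_t num_rows) {
    try {
        auto* ts_nullable_column = mock_assert_cast<ColumnNullable*>(ts_col_ptr);
        ts_nullable_column->insert_many_defaults(num_rows);
        return true;
    } catch (const std::logic_error&) {
        return false;
    }
}
```

With this mock, `old_path_succeeds` returns `false` for a bare `ColumnInt64` and `true` for a `ColumnNullable` wrapping one, which mirrors the two options listed under Expected Behavior: tolerate the plain column, or build the test's timestamp column as nullable.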

Expected Behavior
By design, the timestamp column is always nullable, but the current unit-test setup builds it as non-nullable. Either the writer should handle both cases gracefully, or the test setup should be fixed to match production behavior.

Proposed Fix
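Use `check_and_get_column` to handle both the nullable and the non-nullable case safely:

```cpp
IColumn* ts_col_ptr = binlog_prefix_columns[2].get();
// check_and_get_column returns nullptr on a type mismatch instead of aborting.
auto* ts_nullable_column = check_and_get_column<ColumnNullable>(ts_col_ptr);
if (ts_nullable_column != nullptr) {
    // Nullable column (schema/production layout): append num_rows NULL entries.
    ts_nullable_column->insert_many_defaults(num_rows);
} else {
    // Plain column (current unit-test layout): append num_rows default values.
    ts_col_ptr->insert_many_defaults(num_rows);
}
```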
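The proposed fix can be exercised the same way. The sketch below again uses hypothetical simplified stand-ins, not the real Doris classes; `mock_check_and_get_column` only imitates the return-nullptr-on-mismatch behavior of `check_and_get_column` via `dynamic_cast`, and `fill_ts_defaults` is a hypothetical name for the patched snippet. In this mock, `insert_many_defaults` is virtual, so both branches reach the same override; what differs is the data written: the nullable column receives NULL rows, the plain column receives zero-valued rows.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical simplified stand-ins, NOT the real doris column classes.
struct IColumn {
    virtual ~IColumn() = default;
    virtual void insert_many_defaults(std::size_t n) = 0;
    virtual std::size_t size() const = 0;
};

struct ColumnInt64 : IColumn {
    std::vector<std::int64_t> data;
    // Default value for a plain integer column is 0.
    void insert_many_defaults(std::size_t n) override { data.resize(data.size() + n, 0); }
    std::size_t size() const override { return data.size(); }
};

struct ColumnNullable : IColumn {
    std::unique_ptr<IColumn> nested;
    std::vector<std::uint8_t> null_map;  // 1 marks a NULL row
    explicit ColumnNullable(std::unique_ptr<IColumn> n) : nested(std::move(n)) {}
    // Default value for a nullable column is NULL.
    void insert_many_defaults(std::size_t n) override {
        nested->insert_many_defaults(n);
        null_map.resize(null_map.size() + n, 1);
    }
    std::size_t size() const override { return null_map.size(); }
};

// Mock: returns nullptr instead of aborting when the type does not match.
template <typename T>
T* mock_check_and_get_column(IColumn* col) {
    return dynamic_cast<T*>(col);
}

// Mirrors the proposed fix.
void fill_ts_defaults(IColumn* ts_col_ptr, std::size_t num_rows) {
    if (auto* ts_nullable_column = mock_check_and_get_column<ColumnNullable>(ts_col_ptr)) {
        ts_nullable_column->insert_many_defaults(num_rows);  // NULL rows
    } else {
        ts_col_ptr->insert_many_defaults(num_rows);  // zero-valued rows
    }
}
```

One consequence under these assumptions: if the test's timestamp column stays non-nullable, the fallback branch silently writes zero timestamps instead of NULLs, so the test passes without exercising the nullable layout the schema defines. If the real `IColumn::insert_many_defaults` is likewise virtual (not verified here), the `if`/`else` could also collapse to a single call on `ts_col_ptr`.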

Additional Context

- This issue was discovered while testing PR #xxxxx (binlog column index fix).
- The timestamp column is defined as nullable in the schema.
- The problem only occurs in unit tests, not in production.

Environment
- Doris version: master branch
- Built with ASAN enabled

### Labels
type/bug

component/be

feature/binlog

UT

### What You Expected?

The unit test `GroupRowsetWriterTest.sub_writer_rollback` passes.

### How to Reproduce?

_No response_

### Anything Else?

_No response_

### Are you willing to submit PR?

- [x] Yes I am willing to submit a PR!

### Code of Conduct

- [x] I agree to follow this project's [Code of Conduct](https://www.apache.org/foundation/policies/conduct)
