[opt](txn) Per-transaction locking and parallel publish for DatabaseTransactionMgr#59990
[opt](txn) Per-transaction locking and parallel publish for DatabaseTransactionMgr#59990dataroaring wants to merge 3 commits intoapache:masterfrom
Conversation
|
Thank you for your contribution to Apache Doris. Please clearly describe your PR:
|
|
run buildall |
There was a problem hiding this comment.
Pull request overview
This pull request introduces two major changes: (1) reduces lock contention in DatabaseTransactionMgr by moving edit log operations outside write locks, and (2) adds support for flexible partial update mode in routine load operations.
Changes:
- Refactored
DatabaseTransactionMgrto split in-memory state updates from edit log persistence for reduced lock contention - Added comprehensive support for
unique_key_update_modeproperty with newUPDATE_FLEXIBLE_COLUMNSmode - Introduced validation framework for flexible partial update constraints in
OlapTable
Reviewed changes
Copilot reviewed 11 out of 11 changed files in this pull request and generated 2 comments.
Show a summary per file
| File | Description |
|---|---|
| fe/fe-core/src/main/java/org/apache/doris/transaction/DatabaseTransactionMgr.java | Core refactoring: splits unprotectUpsertTransactionState into unprotectUpdateInMemoryState (inside lock) and persistTransactionState (outside lock); applies pattern to transaction lifecycle methods |
| fe/fe-core/src/main/java/org/apache/doris/nereids/trees/plans/commands/info/CreateRoutineLoadInfo.java | Adds parsing, validation, and property management for unique_key_update_mode with backward compatibility for partial_columns |
| fe/fe-core/src/main/java/org/apache/doris/load/routineload/RoutineLoadJob.java | Updates job properties handling to support new update mode; adds validation logic for ALTER operations |
| fe/fe-core/src/main/java/org/apache/doris/catalog/OlapTable.java | Adds validateForFlexiblePartialUpdate() method to centralize table-level constraint validation |
| fe/fe-core/src/main/java/org/apache/doris/nereids/load/NereidsStreamLoadPlanner.java | Refactors flexible partial update validation to use centralized OlapTable method |
| fe/fe-core/src/main/java/org/apache/doris/nereids/load/NereidsRoutineLoadTaskInfo.java | Updates constructor to accept TUniqueKeyUpdateMode instead of boolean isPartialUpdate flag |
| fe/fe-core/src/main/java/org/apache/doris/load/routineload/KafkaRoutineLoadJob.java | Updates method signature and calls to support UserException for ALTER operations |
| fe/fe-core/src/main/java/org/apache/doris/nereids/trees/plans/commands/AlterRoutineLoadCommand.java | Adds validation for unique_key_update_mode property during ALTER ROUTINE LOAD |
| regression-test/suites/load_p0/routine_load/test_routine_load_flexible_partial_update.groovy | Comprehensive 1414-line test suite with 21 test cases covering flexible partial update feature |
| regression-test/data/load_p0/routine_load/test_routine_load_flexible_partial_update.out | Expected test outputs for flexible partial update test cases |
| fe/fe-core/src/test/java/org/apache/doris/load/routineload/RoutineLoadJobTest.java | Unit tests for unique_key_update_mode parsing, validation, and backward compatibility |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| writeUnlock(); | ||
| } | ||
| // Persist edit log outside lock to reduce lock contention | ||
| persistTransactionState(transactionState); |
There was a problem hiding this comment.
The persistTransactionState method is called outside the write lock with a transactionState variable that could potentially be null if an exception is thrown inside the try block before line 372. If any of the checks before line 372 throw an exception (such as checkRunningTxnExceedLimit at line 369), transactionState will remain null and calling persistTransactionState(transactionState) at line 384 would result in a NullPointerException. Consider adding a null check before calling persistTransactionState, or wrapping the call in a condition that ensures transactionState is not null.
| persistTransactionState(transactionState); | |
| if (transactionState != null) { | |
| persistTransactionState(transactionState); | |
| } |
| ( | ||
| "max_batch_interval" = "10", | ||
| "format" = "json", | ||
| "jsonpaths" = '["\\$.id", "\\$.name", "\\$.score"]', |
There was a problem hiding this comment.
Inconsistent string escaping for jsonpaths property. Line 276 uses single quotes with backslash escaping '["\$.id", ...]', while line 982 uses double quotes with triple backslash escaping "[\"\$.id\", ...]". Both tests should use the same escaping pattern for consistency and readability. Consider standardizing on one approach throughout the test file.
| "jsonpaths" = '["\\$.id", "\\$.name", "\\$.score"]', | |
| "jsonpaths" = "[\\\"\\$.id\\", \\\"\\$.name\\", \\\"\\$.score\\"]", |
TPC-H: Total hot run time: 31451 ms |
TPC-DS: Total hot run time: 174418 ms |
ClickBench: Total hot run time: 27.05 s |
FE UT Coverage ReportIncrement line coverage |
Move edit log operations outside the write lock to improve transaction throughput for concurrent operations on different tables within the same database. Changes: - Add unprotectUpdateInMemoryState() for in-memory updates inside lock - Add persistTransactionState() for edit log writes outside lock - Refactor beginTransaction, commitTransaction, finishTransaction, abortTransaction, and removeUselessTxns to use the new pattern - Refactor unprotectUpsertTransactionState to delegate to new methods This reduces lock hold time from milliseconds (I/O bound) to microseconds (memory only), enabling higher concurrency for multi-table workloads. Safety is maintained because: - In-memory state is updated atomically within the write lock - Edit log failures call System.exit(-1), preventing inconsistent state - Replay path remains unchanged (uses isReplay=true) Fixes: apache#53642
96357c8 to
a105828
Compare
Code reviewNo issues found. Checked for bugs and CLAUDE.md compliance. Reviewed the lock contention optimization in 🤖 Generated with Claude Code - If this code review was useful, please react with 👍. Otherwise, react with 👎. |
2d07df9 to
10ed461
Compare
|
run buildall |
TPC-H: Total hot run time: 29843 ms |
ClickBench: Total hot run time: 28.06 s |
FE Regression Coverage ReportIncrement line coverage |
d4fe88e to
24e0b10
Compare
|
run buildall |
cda43c3 to
d4a38b8
Compare
|
run buildall |
TPC-H: Total hot run time: 30683 ms |
d4a38b8 to
44fbda7
Compare
|
run buildall |
44fbda7 to
859f56d
Compare
ClickBench: Total hot run time: 28.08 s |
FE UT Coverage ReportIncrement line coverage |
…ock optimization Add enable_txn_log_outside_lock config (default true) to control whether transaction edit log writes happen inside or outside the write lock. When enabled, edit log entries are enqueued (FIFO) inside the write lock and awaited outside it, preserving ordering via the batch queue while reducing lock hold time. This resolves the potential out-of-order edit log issue from the previous outside-lock optimization. - Add submitEdit() split API to EditLog for enqueue-only writes - Make EditLogItem public with await() method - Add enqueueTransactionState/awaitTransactionState helpers - Update all 8 persist call sites in DatabaseTransactionMgr
859f56d to
1bc3156
Compare
…n a database Replace the database-wide write lock with synchronized(transactionState) for commit, preCommit, abort, and finish paths, so independent transactions can proceed concurrently instead of being serialized by the db write lock. Key changes in DatabaseTransactionMgr: - Extract updateTxnLabels from unprotectUpdateInMemoryState (only needed at beginTransaction and replay time) - Convert runningTxnNums from volatile int to AtomicInteger - Replace ArrayDeque with ConcurrentLinkedDeque for final-status deques - Replace writeLock/writeUnlock with synchronized(transactionState) in preCommitTransaction2PC, commitTransaction, finishTransaction, abortTransaction, and abortTransaction2PC Key changes in PublishVersionDaemon: - Route publish executor by transactionId instead of dbId when enable_per_txn_publish=true (default), enabling intra-DB parallel publish - Fix race condition: use local variables in tryFinishTxnSync instead of shared instance fields for partitionVisibleVersions/backendPartitions - Rename dbExecutors to publishExecutors New config flag: - enable_per_txn_publish: controls publish routing (true=parallel within DB, false=sequential per DB fallback). Mutable at runtime. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
49d5051 to
db34540
Compare
|
run buildall |
Summary
synchronized(transactionState)for commit, preCommit, abort, and finish paths, so independent transactions proceed concurrentlyFixes: #53642
Changes
Per-transaction locking (DatabaseTransactionMgr.java)
updateTxnLabelsfromunprotectUpdateInMemoryState— only needed atbeginTransactionand replay time, not during state transitionsrunningTxnNumsfromvolatile inttoAtomicIntegerfor thread-safe increment/decrement without db lockArrayDequewithConcurrentLinkedDequefor final-status transaction queueswriteLock()/writeUnlock()withsynchronized(transactionState)in:preCommitTransaction2PCcommitTransaction(both overloads)finishTransactionabortTransaction/abortTransaction2PCbeginTransaction(label uniqueness),removeUselessTxns/cleanLabel(touchlabelToTxnIdsHashMap), replay pathsParallel publish within a database (PublishVersionDaemon.java)
transactionIdinstead ofdbIdwhenenable_per_txn_publish=true, so different transactions in the same DB finish in paralleltryFinishTxnSyncinstead of shared instance fields forpartitionVisibleVersions/backendPartitionsdbExecutorstopublishExecutorsEdit log outside lock (DatabaseTransactionMgr.java, EditLog.java)
enable_txn_log_outside_lockconfig flagConfig flags (Config.java)
enable_per_txn_publish(defaulttrue,mutable=true): controls publish routing. Set tofalseto fall back to sequential-per-DB publishenable_txn_log_outside_lock(defaulttrue,mutable=true): controls edit log write locationCorrectness
Two different transactions committing concurrently (the main win):
synchronized(txn1) { set COMMITTED, ConcurrentMap.put }synchronized(txn2) { set COMMITTED, ConcurrentMap.put }Same transaction: concurrent commit + abort:
synchronized(txnState)serializes them — second thread sees the first's state change and handles accordinglyNo deadlock: State transition paths only acquire per-txn locks. Cleanup/replay paths only acquire db write lock. No path acquires both.
Test plan
enable_per_txn_publishat runtime — verify both parallel and sequential modes workenable_txn_log_outside_lockat runtime — verify both paths work🤖 Generated with Claude Code