
[CBRD-26615] Reduce I/O bottleneck when parallel heap scan #6911

Merged

xmilex-git merged 27 commits into CUBRID:develop from xmilex-git:ftab
Mar 27, 2026

@xmilex-git (Contributor)

http://jira.cubrid.org/browse/CBRD-26615

Purpose

Optimize how scan work is distributed in order to remove the I/O bottleneck that occurs during a Parallel Heap Scan.

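The previous implementation used a global mutex and handed out pages one at a time, sequentially, through the page_next logic. Threads therefore had to synchronize every time they obtained a page and issued its I/O, which effectively serialized the points at which they waited on I/O. This bottleneck kept the scan from fully benefiting from multiple threads.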
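
To make the contention concrete, here is a minimal C++ sketch of the previous pattern as described above. The class shared_page_dispenser and its lock counter are hypothetical illustrations, not CUBRID code: one cursor guarded by one mutex hands out page indices, so every page request from every worker takes the same lock.

```cpp
#include <cassert>
#include <cstdint>
#include <mutex>
#include <optional>

// Hypothetical model of the old design: a single shared cursor behind a
// global mutex. Each worker must take the lock to learn its next page.
class shared_page_dispenser
{
  public:
    explicit shared_page_dispenser (std::uint32_t num_pages) : m_num_pages (num_pages) {}

    // Returns the next page index, or std::nullopt once the scan is exhausted.
    std::optional<std::uint32_t> page_next ()
    {
      std::lock_guard<std::mutex> guard (m_mutex);  // one lock round-trip per page
      ++m_lock_acquisitions;
      if (m_next >= m_num_pages)
        {
          return std::nullopt;
        }
      return m_next++;
    }

    // Read without the lock: only meaningful after all workers have finished.
    std::uint64_t lock_acquisitions () const { return m_lock_acquisitions; }

  private:
    std::mutex m_mutex;
    std::uint32_t m_next = 0;
    std::uint32_t m_num_pages;
    std::uint64_t m_lock_acquisitions = 0;
};
```

Draining a 2-page file this way takes the lock three times (once per page, plus once to learn the scan is finished); with several workers, all of those acquisitions contend on the same mutex.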

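To fix this, the heap file header is read only once when the scan starts, and the partial sector (64-page unit) information covering all of the file's data is collected up front. The collected sectors are divided among the worker threads in advance, so each thread iterates over its own assigned sector range independently, issuing I/O and processing data in parallel without taking any lock.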
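
One way to pre-assign the collected sectors is sketched below, assuming they are split into contiguous ranges of near-equal size. The description does not state the exact distribution policy, and partition_sectors is a hypothetical helper, not a function from the PR.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical helper: split num_sectors collected entries into num_workers
// contiguous, non-overlapping [begin, end) ranges whose sizes differ by at
// most one. Ranges are fixed before the scan starts, so workers never share
// a cursor while scanning.
std::vector<std::pair<std::size_t, std::size_t>>
partition_sectors (std::size_t num_sectors, std::size_t num_workers)
{
  assert (num_workers > 0);
  std::vector<std::pair<std::size_t, std::size_t>> ranges;
  ranges.reserve (num_workers);

  const std::size_t base = num_sectors / num_workers;
  const std::size_t extra = num_sectors % num_workers;  // first `extra` workers get one more
  std::size_t begin = 0;
  for (std::size_t w = 0; w < num_workers; ++w)
    {
      const std::size_t len = base + (w < extra ? 1 : 0);
      ranges.emplace_back (begin, begin + len);
      begin += len;
    }
  return ranges;
}
```

For example, 10 sectors across 3 workers yield the ranges [0,4), [4,7) and [7,10); with more workers than sectors, the trailing workers simply receive empty ranges.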

Implementation

Storage - File Manager (src/storage/)

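file_manager.c: Added the file_get_all_data_sectors function. It walks both the PART_FTAB (partially used sectors) and the FULL_FTAB (full sectors) of the heap file and collects every sector that actually contains data into a FILE_FTAB_COLLECTOR.
file_manager.h: Declared the structures and bitmap macros (e.g., FILE_FULL_PAGE_BITMAP) used for sector collection, along with the external interface.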
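
A simplified model of collecting data sectors from both file tables might look like the following. The types and names (partial_sector, full_page_bitmap, collect_data_sectors) are hypothetical stand-ins loosely modeled on the description above. Representing a full-table entry with an all-ones 64-bit bitmap and skipping partial entries whose bitmap is empty are assumptions about how FILE_FULL_PAGE_BITMAP and the collector are used, not details confirmed by the PR.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical, simplified stand-ins for the file manager types. A sector
// holds 64 pages, so one 64-bit word records which of its pages are in use.
using page_bitmap_t = std::uint64_t;
constexpr page_bitmap_t full_page_bitmap = ~page_bitmap_t {0};  // all 64 pages in use

struct sector_id
{
  std::int16_t volid;
  std::int32_t sectid;
};

struct partial_sector          // one entry of the partially used sector table
{
  sector_id vsid;
  page_bitmap_t page_bitmap;   // bit i set => page i of the sector is allocated
};

// Merge both tables into one list. Full-table entries carry only a sector id,
// so they get an all-ones bitmap; partial entries with no allocated page are
// skipped (assumption) because they contain no data to scan.
std::vector<partial_sector>
collect_data_sectors (const std::vector<partial_sector> &part_ftab,
                      const std::vector<sector_id> &full_ftab)
{
  std::vector<partial_sector> out;
  out.reserve (part_ftab.size () + full_ftab.size ());
  for (const partial_sector &ps : part_ftab)
    {
      if (ps.page_bitmap != 0)
        {
          out.push_back (ps);
        }
    }
  for (const sector_id &vsid : full_ftab)
    {
      out.push_back ({vsid, full_page_bitmap});
    }
  return out;
}
```

Normalizing both tables into one (sector, bitmap) shape lets the scan side treat full and partial sectors with a single loop.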

px_heap_scan_input_handler_ftabs.cpp:
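init_on_main: The main thread calls file_get_all_data_sectors to locate all data sectors and partitions them according to the degree of parallelism.
get_next_vpid_with_fix: Each thread now checks the bitmaps of its assigned sectors locally and calls pgbuf_fix on those pages itself, so I/O can be issued independently without a global mutex.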
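
The lock-free iteration a worker performs can be sketched as a cursor over its own slice of sectors. This sketch assumes 64 pages per sector and that the first page of sector s is page s * 64; sector_entry, page_id and sector_cursor are hypothetical names, and the real get_next_vpid_with_fix would additionally fix each returned page in the page buffer, which is omitted here.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr int sector_npages = 64;  // assumed pages per sector

struct sector_entry  // hypothetical: sector id plus allocated-page bitmap
{
  std::int16_t volid;
  std::int32_t sectid;
  std::uint64_t page_bitmap;
};

struct page_id
{
  std::int16_t volid;
  std::int32_t pageid;
};

// Hypothetical per-worker cursor over a [begin, end) slice of the sector list.
// The list is only read, and the position belongs to a single worker, so
// next() needs no lock.
class sector_cursor
{
  public:
    sector_cursor (const std::vector<sector_entry> &sectors, std::size_t begin, std::size_t end)
      : m_sectors (sectors), m_idx (begin), m_end (end), m_bit (0)
    {
      assert (begin <= end && end <= sectors.size ());
    }

    // Writes the next allocated page into *out; returns false when the slice is done.
    bool next (page_id *out)
    {
      while (m_idx < m_end)
        {
          const sector_entry &s = m_sectors[m_idx];
          while (m_bit < sector_npages)
            {
              const int bit = m_bit++;
              if (s.page_bitmap & (std::uint64_t {1} << bit))
                {
                  out->volid = s.volid;
                  out->pageid = s.sectid * sector_npages + bit;
                  return true;
                }
            }
          ++m_idx;  // advance to the next sector in this worker's slice
          m_bit = 0;
        }
      return false;
    }

  private:
    const std::vector<sector_entry> &m_sectors;
    std::size_t m_idx;
    std::size_t m_end;
    int m_bit;
};
```

With a bitmap of 0b1001 for sector 2, the cursor yields pages 128 and 131 before moving on to the next sector in its slice.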

@github-actions

🧪 TC Test Environment Ready

CircleCI Testing:

  • CircleCI will automatically test using the branches below.

TC Repositories & Branches:

Next Steps:

  1. Wait for CircleCI tests to complete
  2. If CircleCI tests failed, please check the test results and fix the issues.
  3. When ready to merge this PR, please merge the TC PR first, then merge this PR.

@xmilex-git (Contributor Author)

/run all

@xmilex-git (Contributor Author)

/run all

@xmilex-git xmilex-git marked this pull request as ready for review March 17, 2026 10:48
@greptile-apps bot commented Mar 17, 2026

Last reviewed commit: f9d0b43

Comment thread src/query/parallel/px_heap_scan/px_heap_scan.cpp Outdated
Comment thread src/storage/file_manager.c
Comment thread src/query/parallel/px_heap_scan/px_heap_scan_input_handler_ftabs.cpp Outdated
Comment thread src/query/parallel/px_heap_scan/px_heap_scan.cpp Outdated
Comment thread src/query/parallel/px_heap_scan/px_heap_scan_input_handler_ftabs.cpp Outdated
Comment thread src/storage/file_manager.c
Comment thread src/query/parallel/px_heap_scan/px_heap_scan_input_handler_ftabs.cpp Outdated
Comment thread src/storage/file_manager.c Outdated
Comment thread src/storage/file_manager.c Outdated
@greptile-apps bot commented Mar 18, 2026

Last reviewed commit: "greptile review appl..."

Comment thread src/query/parallel/px_heap_scan/px_heap_scan.cpp Outdated
@greptile-apps bot commented Mar 18, 2026

Last reviewed commit: "greptile review appl..."

Comment thread src/query/parallel/px_heap_scan/px_heap_scan.cpp
Comment thread src/storage/file_manager.c
Comment thread src/query/parallel/px_heap_scan/px_heap_scan_input_handler_ftabs.cpp Outdated
Comment thread src/storage/file_manager.c
@greptile-apps bot commented Mar 18, 2026

Last reviewed commit: "allow not-ordered pa..."

Comment thread src/storage/page_buffer.c Outdated
@xmilex-git (Contributor Author)

/run all

@greptile-apps bot commented Mar 24, 2026

Reviews (14): Last reviewed commit: "Revert "remove old page watcher""

@greptile-apps bot commented Mar 25, 2026

Reviews (15): Last reviewed commit: "there are no miss without error code"

@greptile-apps bot commented Mar 25, 2026

Reviews (16): Last reviewed commit: "remove redundant header"

@greptile-apps bot commented Mar 25, 2026

Reviews (17): Last reviewed commit: "remove pb modification"

Comment thread src/storage/file_manager.c Outdated
Comment thread src/storage/file_manager.c
@greptile-apps bot commented Mar 26, 2026

Reviews (18): Last reviewed commit: "review apply"

@xmilex-git (Contributor Author)

/run all

@greptile-apps bot commented Mar 27, 2026

Reviews (19): Last reviewed commit: "review apply : ykham"

@xmilex-git (Contributor Author)

/run all

@xmilex-git xmilex-git merged commit 45730b9 into CUBRID:develop Mar 27, 2026
12 checks passed
@github-actions

TC Branch Finalized for cubrid-testcases-private-ex

Engine PR was merged.

Cleanup Results:

TC base branch is ready for the next PR.

@github-actions

TC Branch Finalized for cubrid-testcases

Engine PR was merged.

Cleanup Results:

TC base branch is ready for the next PR.

hyunikn added a commit to hyunikn/cubrid that referenced this pull request Mar 27, 2026
hyunikn added a commit to hyunikn/cubrid that referenced this pull request Mar 27, 2026
kwangsoochae pushed a commit to kwangsoochae/cubrid that referenced this pull request Apr 1, 2026
hgryoo pushed a commit to cubrid-systems/cubrid that referenced this pull request Apr 8, 2026

Labels

None yet

Projects

None yet


7 participants