Fetch tuples in small batches in adaptive executor where possible#5195

Open
marcocitus wants to merge 2 commits into main from marcocitus/reentrant-executor

@marcocitus
Contributor

@marcocitus marcocitus commented Aug 20, 2021

DESCRIPTION: Fetch tuples in small batches in adaptive executor where possible

This PR makes RunDistributedExecution reentrant so that queries that do not require materialization no longer build an excessively large tuple store. This limits memory usage, avoids unnecessary disk I/O, and thereby improves performance for queries with large result sets.

It does not yet cover local execution (seems somewhat important) or multi-row inserts (seems unimportant).
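
For readers new to the idea, here is a minimal, self-contained sketch of what "reentrant" means here: each call returns after at most one batch of rows, and the state needed to resume lives in a struct that survives between calls. Everything in it (`BatchedExecution`, `RunBatch`, `ConsumeAll`, the integer "rows") is hypothetical and only simulates the pattern; it is not the Citus implementation.

```c
#include <assert.h>
#include <stdbool.h>

/* Upper bound on the batch buffer used by this sketch only. */
#define MAX_BATCH_CAPACITY 1024

/* Hypothetical stand-in for a distributed execution; not the Citus struct. */
typedef struct BatchedExecution
{
    int totalRows;                /* rows the simulated workers will produce */
    int rowsProduced;             /* rows handed out across all runs so far */
    int rowsReceivedInCurrentRun; /* rows buffered by the current run */
    bool finished;                /* true once every row has been produced */
} BatchedExecution;

/*
 * Produce at most maxBatchSize rows into buffer and return how many were
 * written. Safe to call repeatedly: progress is kept in *execution, so the
 * next call resumes where this one stopped.
 */
static int
RunBatch(BatchedExecution *execution, int *buffer, int maxBatchSize)
{
    assert(maxBatchSize > 0);
    execution->rowsReceivedInCurrentRun = 0;

    while (execution->rowsProduced < execution->totalRows &&
           execution->rowsReceivedInCurrentRun < maxBatchSize)
    {
        /* A "row" is just its sequence number in this simulation. */
        buffer[execution->rowsReceivedInCurrentRun++] = execution->rowsProduced++;
    }

    execution->finished = (execution->rowsProduced == execution->totalRows);
    return execution->rowsReceivedInCurrentRun;
}

/*
 * Consume the whole result one batch at a time. Only one batch is ever held
 * in memory. Returns the sum of all row values and reports the batch count.
 */
static long
ConsumeAll(BatchedExecution *execution, int maxBatchSize, int *batchCount)
{
    int buffer[MAX_BATCH_CAPACITY];
    long sum = 0;

    assert(maxBatchSize <= MAX_BATCH_CAPACITY);
    *batchCount = 0;

    while (!execution->finished)
    {
        int rowCount = RunBatch(execution, buffer, maxBatchSize);

        for (int i = 0; i < rowCount; i++)
        {
            sum += buffer[i];
        }
        (*batchCount)++;
    }

    return sum;
}
```

Under these assumptions, a 25-row result consumed with a batch size of 10 takes three calls to `RunBatch`, and the caller never buffers more than 10 rows at once.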

@codecov

codecov bot commented Aug 20, 2021

Codecov Report

❌ Patch coverage is 85.45455% with 16 lines in your changes missing coverage. Please review.
✅ Project coverage is 88.90%. Comparing base (4d6fb1d) to head (eb1bc1d).

Additional details and impacted files
@@           Coverage Diff           @@
##             main    #5195   +/-   ##
=======================================
  Coverage   88.90%   88.90%           
=======================================
  Files         286      286           
  Lines       63227    63304   +77     
  Branches     7937     7950   +13     
=======================================
+ Hits        56214    56283   +69     
- Misses       4736     4744    +8     
  Partials     2277     2277           

ResetExplainAnalyzeData(taskList);
MemoryContext memoryContext = AllocSetContextCreate(executorState->es_query_cxt,
                                                    "AdaptiveExecutor",
                                                    ALLOCSET_DEFAULT_SIZES);
Contributor Author

TODO: maybe reset this context at the very end, since it uses es_query_cxt

execution->rowsReceivedInCurrentRun = 0;

/* TODO: GUC? be smart? */
int maxBatchSize = 10000;
Contributor Author

TODO: pick batch size

@DS-AdamMilazzo

I hope this feature makes it in. :-)

@colm-mchugh colm-mchugh force-pushed the marcocitus/reentrant-executor branch from f33de24 to a8d14ee Compare April 7, 2026 17:05
@marcocitus
Contributor Author

@microsoft-github-policy-service agree [company="Snowflake"]

This work was done during my Microsoft/Citus Data tenure, though, so it already belongs to Microsoft.

@marcocitus
Contributor Author

@microsoft-github-policy-service agree

@colm-mchugh colm-mchugh force-pushed the marcocitus/reentrant-executor branch 3 times, most recently from c140524 to 4200722 Compare April 9, 2026 09:31
@colm-mchugh colm-mchugh marked this pull request as ready for review April 9, 2026 10:05
@colm-mchugh colm-mchugh requested a review from tejeswarm April 9, 2026 10:05
…d execution, tests.

- Resource clean-up: AdaptiveExecutorEnd() releases sessions/connections when an error occurs between AdaptiveExecutorRun calls. Also handle early termination (cursor close, LIMIT satisfied) with proper clean-up of in-flight worker queries.

- ShouldRunTasksSequentially() check in FinishDistributedExecution() replaced with explicit sessionsCleanedUp flag on DistributedExecution struct. Fixes double CleanUpSessions on sequential path.

- Adaptive batch sizing via citus.executor_batch_size (default 0 => auto). Auto mode calculates batch size from work_mem and TupleDesc (attlen + typmod for varlena, 128B default for unbounded). Floor 100, ceiling 1M rows.

- Remote execution uses libpq's chunked rows mode (PG17+); configurable via a GUC for now.

- Local execution is eager; it runs to completion.

- Regress test suite: 11 test cases covering batch sizes 1/10/100K/auto, empty results, LIMIT, aggregation, DML RETURNING, GUC behavior, between-batch error cleanup, cursor close mid-batch and cross-batch-size result consistency.
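
To make the auto batch sizing rule above concrete, here is a rough sketch of one way such a calculation could look: estimate a row width from per-column hints, divide the work_mem budget by it, and clamp to the stated floor and ceiling. The names `ColumnWidthHint`, `EstimateRowWidth` and `ChooseBatchSize` are hypothetical, the column hints are a simplification of what a TupleDesc carries, and returning an explicit setting unclamped is an assumption. The only facts taken from the commit message are the 128-byte default, the floor of 100, the ceiling of 1M rows, and 0 meaning auto.

```c
#include <assert.h>
#include <stddef.h>

#define MIN_BATCH_ROWS 100L          /* floor from the commit message */
#define MAX_BATCH_ROWS 1000000L      /* ceiling from the commit message */
#define DEFAULT_UNBOUNDED_WIDTH 128  /* bytes assumed for unbounded columns */

/* Hypothetical, simplified per-column width hint (not a real TupleDesc). */
typedef struct ColumnWidthHint
{
    int fixedLength; /* > 0 for fixed-width types, -1 for variable-length */
    int maxLength;   /* declared maximum in bytes, -1 if unbounded */
} ColumnWidthHint;

/* Sum an estimated width in bytes over all columns. */
static size_t
EstimateRowWidth(const ColumnWidthHint *columns, int columnCount)
{
    size_t width = 0;

    for (int i = 0; i < columnCount; i++)
    {
        if (columns[i].fixedLength > 0)
        {
            width += (size_t) columns[i].fixedLength;
        }
        else if (columns[i].maxLength > 0)
        {
            width += (size_t) columns[i].maxLength;
        }
        else
        {
            width += DEFAULT_UNBOUNDED_WIDTH;
        }
    }

    return width;
}

/*
 * Pick a batch size in rows. A positive configuredBatchSize is returned
 * as-is (assumption); 0 means auto, which divides the work_mem budget
 * (given in kB, as PostgreSQL stores it) by the estimated row width and
 * clamps the result to [MIN_BATCH_ROWS, MAX_BATCH_ROWS].
 */
static long
ChooseBatchSize(int configuredBatchSize, int workMemKB,
                const ColumnWidthHint *columns, int columnCount)
{
    assert(workMemKB > 0);

    if (configuredBatchSize > 0)
    {
        return configuredBatchSize;
    }

    size_t rowWidth = EstimateRowWidth(columns, columnCount);
    if (rowWidth == 0)
    {
        rowWidth = 1; /* avoid dividing by zero for a column-less result */
    }

    long rows = (long) (((size_t) workMemKB * 1024) / rowWidth);

    if (rows < MIN_BATCH_ROWS)
    {
        rows = MIN_BATCH_ROWS;
    }
    if (rows > MAX_BATCH_ROWS)
    {
        rows = MAX_BATCH_ROWS;
    }

    return rows;
}
```

With a 4 MB work_mem and a row made of one 4-byte column plus one unbounded column, this sketch estimates 132 bytes per row; very narrow rows hit the ceiling and very wide rows fall back to the floor.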
@colm-mchugh colm-mchugh force-pushed the marcocitus/reentrant-executor branch from 4200722 to eb1bc1d Compare April 9, 2026 12:09
4 participants