Problem description: A query with an 'execute on initplan' function whose parameter refers to another relation in the query caused a SIGSEGV during execution. Root cause: Since the function was executed in the initplan, the value of the parameter was not yet defined at execution time, as the initplan is processed before the main plan. Fix: When appending an initplan node for the function scan, set the subplan's 'parent_root' to NULL, isolating it from the plan params of outer queries. Now all such params are excluded from the valid parameter list in SS_finalize_plan(), and the planner emits an error for such a query.
The project's Code of Conduct was updated in accordance with the recent changes approved by the Greengage DB architectural committee.
Reader gangs use a local snapshot to access the catalog; as a result, it does not synchronize with the sharedSnapshot from the writer gang, which leads to inconsistent visibility of catalog tables on idle reader gangs. Consider the case:
```
select * from t, t t1; -- creates a reader gang
begin;
create role r1;
set role r1; -- the set command is also dispatched to the idle reader gang
```
When the set role command is dispatched to the idle reader gang, the reader gang cannot see the new tuple for role r1 in the catalog table pg_authid. To fix this issue, drop the idle reader gangs after each utility statement that may modify the catalog. Reviewed-by: Zhenghua Lyu <zlv@pivotal.io> (cherry picked from commit d1ba4da)
Previously, when rescanning a hash join was required and the hash join was executed in multiple batches spilled to disk, several issues occurred if the rescan happened in the middle of a batch:
1. Outer batch files (which contain outer tuples for all the following batches) were never cleared, and after rescan they were filled again. When gp_workfile_compression is enabled, this left a BufFile in an unexpected state (reading instead of writing), causing an error.
2. When an inner batch file is read into the in-memory hash table, it is deleted. After processing the batch, the hash table is spilled back into the batch file in case a rescan is possible. However, if the rescan happened in the middle of a batch, the hash table was never spilled and its contents were lost, resulting in missing data in the output.
To fix these issues, when a rescan is executed without rebuilding the whole hash table, clear all the outer batch files and also spill the current batch to an inner batch file. Ticket: ADBDEV-8371
In some cases, processing of an injected panic fault can take a long time, so queries sent after injecting the fault can either complete successfully or end with an error. To remove this uncertainty, a waiting mechanism was introduced in the test: before sending the next commands after injecting the panic fault, we wait for the fault to be fully processed, that is, for it to trigger SIGCHLD.
When a coordinator backend encounters an OOM error during a distributed query, it triggers a cleanup sequence that can execute two or more times, causing a use-after-free crash. The sequence of events is as follows: during normal execution cleanup in `standard_ExecutorEnd()`, the system calls `cdbdisp_destroyDispatcherState()` to clean up the dispatcher state. This function frees the results array, including the `segdbDesc` pointers that reference executor backend connections. However, an OOM error may occur inside the already ongoing cleanup, since it still tries to allocate more memory; `AbortTransaction()` and then `cdbcomponent_recycleIdleQE()` will be called later. This abort calls `cdbdisp_cleanupDispatcherHandle()` to perform cleanup operations again. The second cleanup attempt tries to read data from executor backends in `checkDispatchResult()` to cancel the query, using the `segdbDesc` pointers that were already freed during the first cleanup, resulting in a segmentation fault when dereferencing invalid memory. Additionally, the `numActiveQEs` counter can become negative during reentrant calls to `cdbcomponent_recycleIdleQE()`, causing assertion failures.
When a backend encounters an OOM error, it throws an error and aborts the transaction. If the backend happens to be a coordinator, it also attempts to cancel the distributed query first by reading results from executor backends. This is futile, since the gang is discarded anyway if `ERRCODE_GP_MEMPROT_KILL` is encountered, which is a sign of OOM.
Fixes:
- Move the `numActiveQEs` decrement after a `goto`.
- Prevent dispatcher handle cleanup from cancelling the dispatch in case of OOM. It is instead done by `DisconnectAndDestroyAllGangs()`, based on a new global variable.
- Avoid `ABORT` dispatch for distributed transactions when the new flag is active. Without a response from the coordinator, the segments will eventually trigger `ABORT` themselves.
- Whether a gang is destroyed depends on the error code. Specify the error code for runaway cleaner cancellations, so they are treated equivalently to OOM conditions.
- Unit tests are modified to avoid mocking `in_oom_error_trouble()` for every `palloc()` call. The test makes sure there are no `palloc()` calls at all while the server is handling an OOM condition.
Ticket: ADBDEV-7916 Co-authored-by: Vladimir Sarmin <v.sarmin@arenadata.io>
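The use-after-free described above is, at heart, a double-cleanup hazard. A minimal Python sketch of the guard-flag idea (hypothetical names, not the actual C code) looks like this:

```python
class DispatcherState:
    def __init__(self, results):
        self.results = results        # stands in for the segdbDesc pointer array
        self.destroyed = False

def destroy_dispatcher_state(state):
    """Idempotent cleanup: a reentrant call during error handling becomes
    a no-op instead of touching memory freed by the first call."""
    if state.destroyed:
        return
    state.destroyed = True
    state.results = None              # analogue of freeing the results array
```

The actual fix consults a global flag in `cdbdisp_cleanupDispatcherHandle()` and defers gang destruction to `DisconnectAndDestroyAllGangs()`; the sketch only illustrates why a second cleanup pass must observe that the first one already ran.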
Fix Orca cost model to prefer hashing smaller tables
Previously in Orca it was possible to achieve bad hash join plans that hashed
a much bigger table. This happened because in Orca's cost model there is a
cost associated with columns used in the join conditions, and this cost was
smaller when tuples are hashed than when tuples are fed from an outer child. This
doesn't really make sense since it could make Orca hash a bigger table if
there are enough join conditions, no matter how much bigger this table is.
To make sure this never happens, increase the cost per join column for inner
child, so that it is bigger than for outer child (same as cost per byte
already present).
Additionally, Orca increased cost per join column for outer child when
spilling was predicted, which doesn't make sense either since there is no
additional hashing when spilling is enabled. Postgres planner only imposes
additional per-byte (or rather per-page) cost when spilling hash join, so Orca
should have the same per-join-column cost for both spilling and non-spilling
cases.
A lot of tests are affected by this change, but for most of them only costs
are changed. For some, hash joins are reordered, swapping inner and outer
children, since Orca previously hashed the bigger child in some cases. In case
of LOJNullRejectingZeroPlacePredicates.mdp this actually restored the old plan
specified in the comment. Also add a new regress test.
One common change in some tests is replacing Hash Semi Join with a regular
Hash Join + Sort + GroupAggregate. There is only a Left Semi Join, so swapping
the inner and outer children is impossible in the case of semi joins. This means
that it's slightly cheaper to convert Hash Semi Join to regular Hash Join to
be able to swap the children. The opposite conversion also takes place where
previously GroupAggregate was used.
Another common change is that HashJoin(table1, Broadcast(table2)) gets
replaced with HashJoin(Redistribute(table1), Redistribute(table2)), adding
another slice. This happens because the cost for hashing is now slightly
bigger, and so Orca prefers to split hashing table2 to all segments, instead
of every segment hashing all rows as it would be with Broadcast.
Below are some notable changes in minidump files:
- ExtractPredicateFromDisjWithComputedColumns.mdp
This patch changed the join order from ((cust, sales), datedim) to ((datedim,
sales), cust). All three tables are identical from Orca's point of view: they
are all empty and all table scans are 24 bytes wide, so there is no reason for
Orca to prefer one join order over the other since they all have the same cost.
- HAWQ-TPCH-Stat-Derivation.mdp
The only change in the plan is swapping the children of the 3rd Hash Join,
one involving lineitem_ao_column_none_level0 and
HashJoin(partsupp_ao_column_none_level0, part_ao_column_none_level0).
lineitem_ao_column_none_level0 is predicted to have approximately 22 billion
rows and the hash join is predicted to have approximately 10 billion rows, so
making the hash join the inner child is good in this case, since the smaller
relation is hashed.
- Nested-Setops-2.mdp
Same here. Two swaps were performed between dept and emp in two different
places. dept contains 1 row and emp contains 10001, so it's better if dept is
hashed. A Redistribute Motion was also replaced with Broadcast Motion in both
cases.
- TPCH-Q5.mdp
Probably the best improvement out of these plans. The previous plan had this
join order:
```
-> Hash Join (6,000,000 rows)
-> Hash Join (300,000,000 rows)
-> lineitem (1,500,000,000 rows)
-> Hash Join (500,000 rows)
-> supplier (2,500,000 rows)
-> Hash Join (5 rows)
-> nation (25 rows)
-> region (1 row)
-> Hash Join (100,000,000 rows)
-> customer (40,000,000 rows)
-> orders (100,000,000 rows)
```
which contains hashing 100 million rows twice (first orders, then its hash join
with customer). The new plan has no such issues:
```
-> Hash Join (6,000,000 rows)
-> Hash Join (170,000,000 rows)
-> lineitem (1,500,000,000 rows)
-> Hash Join (20,000,000 rows)
-> orders (100,000,000 rows)
-> Hash Join (7,000,000 rows)
-> customer (40,000,000 rows)
-> Hash Join (5 rows)
-> nation (25 rows)
-> region (1 row)
-> supplier (2,500,000 rows)
```
This plan only hashes around 30 million rows in total, much better than 200
million.
Ticket: ADBDEV-8413
When appending statistics to a group, it is first checked whether statistics already exist in the group. If they exist, the new statistics are appended to the existing ones using the AppendStats method. If there are no statistics in the group, the existence of statistics in the duplicate group is also checked. When appending statistics to existing ones, we take the existing statistics of the group (or of its duplicate), create a copy, add the new statistics to this copy, and release the old ones. If the group does not have its own statistics and the duplicate's statistics are used, we would add statistics to the duplicate group and then try to release the statistics of the current group, which are NULL, leading to a segmentation fault. Fix this by calling the AppendStats method on the duplicate.
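A toy Python model of the fix (hypothetical names; the real code is Orca's C++ group/statistics machinery) shows why the append must operate on the group that actually owns the statistics:

```python
class Stats:
    def __init__(self, rows):
        self.rows = rows

    def append(self, other):
        # "copy and append": returns a new object; the old one is released
        return Stats(self.rows + other.rows)

class Group:
    def __init__(self, stats=None, duplicate=None):
        self.stats = stats
        self.duplicate = duplicate

def append_stats(group, new_stats):
    # Fix: when the group has no stats of its own, operate on the duplicate
    # group, so we never release the current group's (missing) stats.
    owner = group if group.stats is not None else (group.duplicate or group)
    old = owner.stats
    owner.stats = old.append(new_stats) if old is not None else new_stats
```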
During the exploration phase, new groups of equivalent expressions are created. In this process, some groups are marked as duplicates. After exploration, expressions from duplicate groups are moved into the group they duplicate. In cases where a duplicate group contains an expression that references the duplicated group, merging them results in a situation where a group contains an expression that references the very group it belongs to. This leads to infinite recursion during statistics derivation. The fix is to improve the cycle-detection logic so that it can recognize when an expression references the group it resides in.
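The improved cycle detection can be pictured with a small Python sketch (a toy model, not Orca's code): each group holds expressions, each expression lists its child groups, and derivation skips any expression whose children lead back to a group currently being derived.

```python
def derive_stats(gid, groups, visiting=frozenset()):
    """groups maps a group id to a list of expressions; each expression is a
    tuple of child group ids. Returns a toy row estimate, or None when every
    expression in the group cycles back into the current derivation path."""
    if gid in visiting:
        return None                      # expression references a group we are
                                         # already deriving: a cycle
    visiting = visiting | {gid}
    best = None
    for expr_children in groups[gid]:
        child_stats = [derive_stats(c, groups, visiting) for c in expr_children]
        if any(s is None for s in child_stats):
            continue                     # skip the self-referencing expression
        rows = max([1] + child_stats)    # toy combination rule
        best = rows if best is None else min(best, rows)
    return best
```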
- New CI job to auto-build the deb package
- New targets for the gpAux Makefile: changelog, changelog-deb, pkg, pkg-deb
- New gpAux/debian folder with the package description/rules for the `debuild` utility
- Copy the `VERSION` file from source to the main layer in the Docker image
- Do not clean the `VERSION` file if the `.git` directory does not exist
- The deb package name is taken from the `Package` field in the `gpAux/debian/control` file
Ticket: ADBDEV-7873
Change bug_report format to YAML
Updated links to refer to the main branch
Prior to 3ce2e6a, when querying pg_locks (or using pg_lock_status()), approximately 75% of backend memory allocations for the resulting tuples weren't registered with Vmtracker or Resource Group Control. This memory would also leak if the query was cancelled or failed. This happened because CdbDispatchCommand(), which was previously used by pg_locks, called libpq to obtain the results, which were allocated as PQresult structures with bare malloc(), even on the server side. This patch fixes both untracked-memory issues by enforcing Vmtracker routines for PGresult allocations on the server side. Including postgres.h in frontend code causes several errcode-related macro redefinition warnings; those macros are now undefined first. Recursive errors due to mishandled OOM errors are addressed in c4e1085. This PR also adds an additional set of tests, building on top of the said commit. Ticket: ADBDEV-7691
Function XactLockTableWait() calls LocalXidGetDistributedXid(), which may get the gxid corresponding to the local wait xid from the distributed clog in case the dtx we are waiting for managed to commit by the time we access its gxid. For such a case there is an assertion introduced by commit 13a1f66. The assert indicates that the committed transaction was just running in parallel with the current one, meaning there is no other reason to access the distributed transaction history; if the transaction had been committed a long time ago, XactLockTableWait() would never be called. However, there is a case when we can't compare the timestamps: a vacuum operation, which performs an in-place update of pg_database (or pg_class) without being in a distributed transaction. For this case this patch extends the assertion by allowing the current timestamp to have a zero value. A new test related to this case is added in 2c2753a.
Previously, commit 8359bfa reverted changes related to TAP tests that required hot standby functionality, since it is not available in 6.X. The standby errored out before it could fully start, due to several functions that threw an error. Skip the error if the connection is in utility mode. Ticket: ADBDEV-7948
This reverts commit 8359bfa.
Problem description: After sequential execution of the isolation2 tests 'standby_replay_dtx_info' and 'ao_unique_index', the coordinator standby's postmaster process, together with its child processes, was terminated. Root cause: The test 'standby_replay_dtx_info' sets the fault injection 'standby_gxacts_overflow' on the coordinator's standby, which updates the global var 'max_tm_gxacts' (the limit of distributed transactions) to 1, but on reset of this fault the value of 'max_tm_gxacts' was not restored to its original value. Therefore, in any subsequent test that created more than 2 distributed transactions replayed on the standby, the standby encountered the fatal error "the limit of 1 distributed transactions has been reached" and was terminated. Fix: Set 'max_tm_gxacts' to its original value when the fault injection 'standby_gxacts_overflow' is not set. (cherry picked from commit 423cc57b779bfb8f048f47425b428091a7d959a9)
The planner can execute some functions on segments. However, their contents will also be planned on segments. Planning may create motions, which is unacceptable on segments. Since the contents of functions may be unavailable when planning the initial query (for example, a C-function with a call to SQL in the SPI), it is sufficient to prevent motions from being created when planning a function on a segment. Ticket: ADBDEV-8689 (cherry picked from commit 50385e2ebc768a92cda0692a604df80981061512)
## ADBDEV-8787: (v6) Run tests for feature branches with slashes (#115) - Update pull-request branches to '**'
Target CI jobs to v8 - build - behave-tests - regression-tests - orca-tests - upload Ticket: ADBDEV-8833
If the add column + insert gets aborted, pg_aocsseg still holds the aborted column's vpinfo. We need to read only the committed columns' vpinfo and ignore the aborted columns' vpinfo. Before this change, during pg_aocsseg reads we were copying over the whole vpinfo, which includes entries for aborted columns. This created memory corruption, as we were copying more than what is needed/committed (the aborted columns' vpinfo). This change limits the read of vpinfo to the committed columns. All the other code paths (that assert VARSIZE(dv) == aocs_vpinfo_size) don't encounter failures with aborted columns, as they are only reached after aocsseg is already updated in the same transaction. AOCSFileSegInfoAddVpe doesn't encounter this problem, as we always update the aocsseg entry with an empty vpe (with the new number of columns) in aocs_addcol_emptyvpe before we reach AOCSFileSegInfoAddVpe in aocs_addcol_closefiles. (cherry picked from commit 0a2d3cb) Changes compared to the original commit: 1. 6X-specific change in the test query: changed 'USING ao_column' to 'WITH (APPENDONLY=TRUE, ORIENTATION=COLUMN)'. 2. 6X-specific change in the test query: in the queries with 'gp_toolkit.__gp_aocsseg()', removed ordering by segment_id, as it doesn't exist. 3. Updated the test answer file according to the changes above.
Problem description: Vacuum failed to clean up segment files for AOCS tables that were created in a transaction which was rolled back. Root cause: For AOCS tables, a transaction that creates segment files for the very first time inserts a corresponding record into 'pg_aoseg.pg_aocsseg_<oid>' in 'InsertInitialAOCSFileSegInfo()'. But if the transaction was rolled back, this record in 'pg_aoseg.pg_aocsseg_<oid>' was no longer available, while the physical segment files still existed. The vacuum process in AOCSTruncateToEOF() relies on the information from 'pg_aoseg.pg_aocsseg_<oid>' to get all segments vacuum needs to scan; obviously, the segment files from the aborted transaction were not visible to it. Fix: Add 'heap_freeze_tuple_wal_logged(segrel, segtup)' to 'InsertInitialAOCSFileSegInfo()', so vacuum can now see the new segment files. It is already done so in 7X (refer to 1306d47). But this change interferes with commit 9e106f5, as freezing the tuple partially reverts its logic. Part of this interference is resolved by the preceding cherry-pick of 0a2d3cb, which handles the same problem as 9e106f5 but in a different way. For 6X, however, additional changes are required in 'UpdateAOCSFileSegInfo()', so this patch adds usage of 'deformAOCSVPInfo()' to it in the manner intended by 0a2d3cb. This change also requires an update of the output of the test 'uao_crash_compaction_column', because the output of 'gp_toolkit.__gp_aocsseg()' now contains records about segment files created by the interrupted vacuum command.
If you create a table and don't insert any data into it, the relation file is never fsync'd. You don't lose data, because an empty table doesn't have any data to begin with, but if you crash and lose the file, subsequent operations on the table will fail with "could not open file" error. To fix, register an fsync request in mdcreate(), like we do for mdwrite(). Per discussion, we probably should also fsync the containing directory after creating a new file. But that's a separate and much wider issue. Backpatch to all supported versions. Reviewed-by: Andres Freund, Thomas Munro Discussion: https://www.postgresql.org/message-id/d47d8122-415e-425c-d0a2-e0160829702d%40iki.fi 6X changes: In 6X, the smgr_which field is not used and not set. This patch adds setting this field. To maintain binary compatibility, this field is set outside the smgropen function. (cherry picked from commit 1b4f1c6)
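The principle behind the fix, and the follow-up discussed above (make both the new file and its directory entry durable), can be sketched in Python. This illustrates the general technique, not the md.c code:

```python
import os

def durable_create(path):
    """Create an empty file so that both the file and its name survive a crash."""
    fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY, 0o600)
    try:
        os.fsync(fd)                 # flush the (empty) file itself
    finally:
        os.close(fd)
    # fsync the containing directory so the new directory entry is durable too
    dfd = os.open(os.path.dirname(path) or ".", os.O_RDONLY)
    try:
        os.fsync(dfd)
    finally:
        os.close(dfd)
```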
Target the following reusable workflows to v9: - behave-tests - regression-tests - orca-tests - upload - package Ticket: [GG-16](https://tracker.yandex.ru/GG-16)
Fix escaping for perfmon metrics export
What occurs? Whenever new log rows containing database names with certain special characters were uploaded from the _queries_history.dat file to the queries_history external table, an error about the size of the db column would occur. It prevented any new logs from being loaded whatsoever.
Why does it occur? The db column has a type of VARCHAR(64), which means a larger string was being inserted. The obvious reason for this is incorrect escaping, or the lack of any. Only two symbols were observed to lead to errors: " and |. A pipe leads to another error about incorrect row structure, which is logical, as pipes are used as delimiters inside the file.
How do we fix it? Before the patch (1f67d39), the db field used to be written to _queries_history.dat without double quotes, so no escaping was present. Now we enclose every database name in double quotes and double any double quotes already present in the database name.
Does it fix the problem? Yes: now we escape the whole database name string with all of its special (or not) characters; the very same method is used for the SQL query command, and it works. Doubling double quotes is needed so that an embedded quote does not end the string too early.
What was changed? Minor code additions for escaping inside src/backend/gpmon/gpmon.c.
Tests? A new BDD auto test checks that the aforementioned logs are added to queries_history correctly (as they appear there). However, not all symbols can be auto-tested using the present testing functions: ', | and UTF-8 symbols. Errors occur at different steps of the test for different types of symbols and are connected to the way commands are issued to the DBMS. Nevertheless, during manual testing these symbols passed.
Observations? The dbuser column is also not escaped, but I have not managed to recreate the same issue with it. Still, it may be worth adding escaping to it in the future, but for now it seems like an extremely rare edge case. Ticket: ADBDEV-7272 --------- Co-authored-by: Vladimir Sarmin <v.sarmin@arenadata.io> Co-authored-by: Georgy Shelkovy <g.shelkovy@arenadata.io>
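The escaping rule described above (enclose the field in double quotes, double any embedded quotes) is the standard CSV-style scheme; a minimal sketch:

```python
def quote_field(value):
    """Escape a field for a delimiter-separated data file: the surrounding
    quotes protect delimiter characters such as '|', and doubling an embedded
    '"' keeps it from terminating the string too early."""
    return '"' + value.replace('"', '""') + '"'
```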
When the join condition contains AND clauses, hash join issues a redistribution request for every subexpression in the condition. The problem occurred when several of these subexpressions had the same right-hand side, making the matching logic inside CPhysicalHashJoin::PdshashedMatching select the same subexpression for every request, thereby losing possible redistributions. This patch adds a parameter to the function that specifies the index of the initial distribution request. Subexpressions matching this index are now prioritized in the matching logic. Also, as a side effect of the research that went into the patch, several explanatory comments are added to make the code easier to understand. Ticket: GG-207 (cherry picked from commit ada9b90) Changes from the original commit: - Test results were adjusted to account for differences between branches
The gpssh_exkeys behave tests passed; however, they didn't check anything because all scenarios were skipped. This is due to a mismatch between the file name and the tag name: we're running tests with a tag based on the file name, but there's no such tag.
1) Rename the gpssh_exkeys.feature file to gpssh-exkeys.feature to ensure an exact match between the file name and the tag name.
2) In the Docker container, the /dev/stdout and /dev/stderr files are not directly writable by an unprivileged user. Capture stdout/stderr via run_cmd.
3) Some tests require each host to have its own files in the /home/gpadmin/.ssh directory, because the tests delete them, but not on all hosts. Mount the ssh_keys directory as a whole, not individual files, and mount it to /home/gpadmin/.ssh.src; before running the tests, copy .ssh.src to .ssh so that each host has its own copy of the (initially identical) files.
4) Add the -x flag when calling the set command to make it clear in the logs which commands were run.
Ticket: GG-121 Co-authored-by: Maxim Michkov <m.michkov@arenadata.io>
- Fix command injection in SQL dump workflow. Pass `github.event.workflow_run.*` values through the `env` section and use a heredoc instead of direct echo interpolation. This prevents shell interpretation of backticks and $() in commit messages, branch names, and author names. - Apply suggestion from @RekGRpth: use GitHub-Actions-like variable names. Co-authored-by: Georgy Shelkovy <g.shelkovy@arenadata.io> Task: ADBDEV-9281
It makes isolation2 framework more consistent with pg_regress and isolation in terms of how it is invoked from the build system. Changes from original commit: 1. More test suites updated, including temp_tables_stat extension. 2. --bindir is replaced with --psqldir. (cherry picked from commit 085c3b2)
Isolation tests run as separate make rules, all of them outputting to the same directory. Because of this, the regression.diffs file gets overwritten, and its contents for earlier runs are lost. To fix this, use the --outputdir option to put them all into separate directories, together with the results directory and the auto-generated sql and results. pg_regress also had an issue in that it created these directories in the current folder, not in --outputdir, which is also fixed here for --outputdir to work properly. Changes from the original commit: 1. There are fewer test suites in 6X; unnecessary changes removed. 2. Update the path to resgroup tests output in run_resgroup_test.bash. 3. Add missing directory creation. 4. Update the hardcoded .so path in resgroup tests. Ticket: GG-215 (cherry picked from commit 5a495764fe5886d924d68890f86838b7af5c9cce)
* Retarget resgroup tests to v23 CI tag * Set environment variable for force Docker client API version usage DOCKER_API_VERSION=1.41 Task: CI-5305
https://bugs.python.org/issue584566 On CI machines `os.getlogin()` failed with:
```
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
OSError: [Errno 6] No such device or address
```
Changes to the original commit: the gpsd utility does not support using environment variables for getting parameter values, so we don't need `'PGUSER' in envOpts`. Co-authored-by: Soumyadeep Chakraborty <sochakraborty@pivotal.io> (cherry picked from commit 241672b)
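A common way to make the user lookup robust (a sketch; the helper name is ours) is to fall back from the controlling-terminal lookup to the password-database entry for the process's real uid:

```python
import os
import pwd

def get_user_name():
    """os.getlogin() consults the controlling terminal, which daemons and CI
    runners don't have; fall back to the password database entry for our uid."""
    try:
        return os.getlogin()
    except OSError:
        return pwd.getpwuid(os.getuid()).pw_name
```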
minirepro's behave tests are disabled on our CI and cannot pass. This patch fixes them. Steps to run minirepro's behave tests:
```
cd gpMgmt
make behave flags='--tags=minirepro'
```
These tests will be enabled in a follow-up PR. Changes from the original commit: 1. We do not need to add anything in `environment.py`, as it already has analyze. 2. Minirepro hadn't changed its error message, so we do not need to adapt it as in 1. (cherry picked from commit 69c306a)
One reason pygresql was previously modified was that it did not handle closing a connection very gracefully. In the process of updating pygresql, we've wrapped the connection it provides with a ClosingConnection function, which should handle gracefully closing the connection when the "with dbconn.connect as conn" syntax is used. This did, however, illustrate issues where a cursor might have been created as the result of a dbconn.execSQL() call, which seems to hold the connection open if not specifically closed. It is therefore necessary to remove the ability to get a cursor from dbconn.execSQL(). To highlight this difference, and to ensure that future calls to this library are easy to use, I've cleaned up and clarified the dbconn execution code to include the following features.
- dbconn.execSQL() closes the cursor as part of the function. It returns no rows.
- The function dbconn.query() is added, which behaves like dbconn.execSQL() except that it returns a cursor.
- The function dbconn.execQueryforSingleton() is renamed dbconn.querySingleton().
- The function dbconn.execQueryforSingletonRow() is renamed dbconn.queryRow().
Changes to the original commit: we needed almost no changes from the original commit, but wanted to preserve authorship. So the only thing that is left is opening and closing the connection in `analyzedb_mgmt_utils.py` in "rows are inserted into table". Also, the connection was removed from the context in `create_database_if_not_exists`. Authored-by: Tyler Ramer <tramer@pivotal.io> (cherry picked from commit 330db23)
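The resulting API split can be sketched as follows, with sqlite3 standing in for pygresql (the function names follow the commit; everything else is illustrative):

```python
import contextlib
import sqlite3

@contextlib.contextmanager
def connect(path=":memory:"):
    # analogue of the ClosingConnection wrapper: the connection is always
    # closed when the "with connect() as conn" block exits
    conn = sqlite3.connect(path)
    try:
        yield conn
    finally:
        conn.close()

def execSQL(conn, sql):
    """Execute a statement and close the cursor immediately: returns no rows."""
    conn.execute(sql).close()

def query(conn, sql):
    """Like execSQL(), but hands the open cursor back to the caller."""
    return conn.execute(sql)

def querySingleton(conn, sql):
    """Return the single value of a single-row, single-column result."""
    cur = conn.execute(sql)
    try:
        (value,) = cur.fetchone()
        return value
    finally:
        cur.close()
```

The key design point is that only query() leaves a cursor open, so callers who hold the connection open do so explicitly.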
This commit turns minirepro tests on.
* Fix unexpected failure of the regression SQL dump: use an `if-then` block to check the `CI` variable. Task: ADBDEV-9294
There was an 11-year-old bug in the Python function fetching the postmaster
PID for a remote host. The bash command had an unescaped $ symbol. As a
result, part of the awk command was interpolated as a bash argument before
ssh, so awk did nothing (`awk {print }`), the full ps output was grepped by
pid, and any part of the string could match, e.g. the postmaster port.
Moreover, even the PID could be matched as a substring of a longer PID. The
last problem affected the function for local search too. So sometimes
gprecoverseg could kill arbitrary postgres processes on the same segment
host. We simplify the bash command with proper escaping and cover these
functions with unit tests.
Ticket: GG-227
Co-authored-by: Georgy Shelkovy <g.shelkovy@arenadata.io>
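The interpolation bug is easy to reproduce with plain shell quoting; a small Python demonstration (the `echo` stands in for the remote ps/awk pipeline):

```python
import subprocess

def sh(cmd):
    """Run a command through /bin/sh and return its stdout."""
    return subprocess.run(cmd, shell=True, capture_output=True,
                          text=True).stdout.rstrip("\n")

# Double quotes: the local shell expands the unset $2 before the command
# ever reaches awk, leaving awk with an empty print expression.
buggy = 'echo "awk {print $2}"'
# Single quotes: $2 survives the local shell untouched.
fixed = "echo 'awk {print $2}'"
```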
Update if condition from '-v' to '-z' Task: ADBDEV-9294
The Python unit test test_clean_up_failed_segments was not actually validating the output due to a typo: `assert_called_once_with` should be used instead of `called_once_with`. Also, the number in the output should be 2: only segments with in-place full synchronization should be cleaned, namely inplace_full1 and inplace_full3. This is confirmed by the 2 segments listed in the line above: failed1 and failed3. Ticket: GG-238
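The reason the typo went unnoticed: attribute access on a `Mock` silently creates a child mock, so calling the misspelled method "passes" no matter what was recorded. A minimal demonstration:

```python
from unittest import mock

m = mock.Mock()
m("actual-arg")

# Typo: called_once_with is not an assertion; it is auto-created on access
# and simply returns another Mock, so this "check" can never fail.
bogus = m.called_once_with("expected-arg")
assert isinstance(bogus, mock.Mock)

# The real assertion fails loudly when the recorded call doesn't match.
try:
    m.assert_called_once_with("expected-arg")
    raised = False
except AssertionError:
    raised = True
assert raised
```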
Tests for gpload were disabled for Ubuntu, even though they pass without errors. Remove the Ubuntu check to add them to the active test suite. Also move its Python dependencies to be installed during the Docker build step, not at runtime. Ticket: ADBDEV-9317
Previously, it was impossible to run isolation tests without directly building pg_isolation2_regress from source. This complicated testing infrastructure in extensions, requiring ISOLATION2_ROOT to be specified to run isolation tests. This patch instead builds and installs pg_isolation2_regress during the compilation step. Regular pg_regress was already built and installed in the pgxs directory, but for some reason it wasn't actually being used. This patch uses the installed version of pg_regress and adds pg_isolation2_regress to the same place. Other necessary files are added as symlinks to the regress installation. The test_python step is moved from the `all` target to the `installcheck` target, since it requires installed pygresql to work. Ticket: GG-206
When performing a join on types within the same distribution opfamily, there is no need to create a redistribution motion, because the hash functions of these types are compatible with each other. ORCA, on the other hand, is not aware of this and creates a redistribution. The easy part of the fix is to detect such cases. The hard part is to integrate this logic into the existing one. It can be done in at least two ways: inside the distribution matching logic, the current distribution can be safely changed to the desired one, as they are practically equivalent; alternatively, the actual distribution of the scan can be returned, in which case every piece of code that expects the derived distribution to directly match the required one should be changed as well. The full-fledged version of the first approach would require adding a new enforcing type to the EPropEnforcingType enum, as well as changing the CDistributionSpec::Matches and CDistributionSpec::FSatisfies virtual function overloads. The enforcing logic inside CDistributionSpecHashed::AppendEnforcers should also be changed. To minimize the amount of changes, only the CDistributionSpecHashed::AppendEnforcers function could be modified to decide how to enforce a distribution on its own, but that doesn't seem to fit into the current logic. The second approach requires thorough investigation to determine every piece of code that should perform distribution matching instead of direct expression matching. Neither approach is trivial. The second one was chosen because it makes the logic explicit: it allows the expression's parent to know about the derived distribution, enabling specialized logic if necessary. In this patch, an overload of CUtils::Equals supporting distribution matching is added. Other matching functions are modified to support it as well. Where required, expression-matching versions of these functions are changed to distribution-matching ones.
Also, to make the casting logic of the planners easier to observe, a new test file was added. Ticket: GG-208 (cherry picked from commit dd7b460) Changes from the original commit: - Test results were adjusted to account for differences between branches. - Missing drop commands for the cst_varchar table were added. - The 'optimizer_enable_nljoin' GUC was substituted with 'optimizer_nestloop_factor', as the former is not supported.
* Retarget Behave tests CI to v24 tag * Mount sqldump/dump.sql to cdw service in docker compose * Enable gpexpand behave test Task: ADBDEV-9130