Skip to content

Conversation

@konstantinb
Copy link
Contributor

@konstantinb konstantinb commented Dec 16, 2025

What changes were proposed in this pull request?

HIVE-29367: fixing overflows in ConvertJoinMapJoin calculations

Why are the changes needed?

ConvertJoinMapJoin does not use StatsUtils.safeAdd()/saveMult() for all its calculations. There are some real life scenarios when it could perform a catastrophic decision to convert a join to a mapjoin after calculating negative size for the 'small" table, resulting in an OOM during query processing

Does this PR introduce any user-facing change?

No

How was this patch tested?

Via unit testing and with load testing on a custom Hive installation based of 4.0x version

You can see the test output generated by the pre-fix code here: HIVE-29637-mapjoin_stats_overflow q out original
it clearly confirms the decision of perform a mapjoin despite very large volume of data

@konstantinb konstantinb changed the title HIVE-29367: preventing Long overflows in ConvertJoinMapJoin HIVE-29367: prevent Long overflows in ConvertJoinMapJoin Dec 18, 2025
@konstantinb konstantinb marked this pull request as ready for review December 18, 2025 19:18
@sonarqubecloud
Copy link

}
Operator<? extends OperatorDesc> parentOp = joinOp.getParentOperators().get(pos);
totalSize += computeOnlineDataSize(parentOp.getStatistics());
totalSize = StatsUtils.safeAdd(totalSize, computeOnlineDataSize(parentOp.getStatistics()));
Copy link
Member

@deniskuzZ deniskuzZ Jan 15, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure it's appropriate to use safeAdd for a table size?
on the other side hashTableDataSizeAdjustment does that as well, so I guess it's fine
cc @zabetak, @thomasrebele

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's fine. The total size here does not need to be 100% correct, it's just an estimation that influences the join decision. Might make sense to rename it to estimatedTotalSize.

if (cs != null) {
String colTypeLowerCase = cs.getColumnType().toLowerCase();
long nonNullCount = cs.getNumNulls() > 0 ? numRows - cs.getNumNulls() + 1 : numRows;
long nonNullCount = cs.getNumNulls() > 0 ? Math.max(1L, numRows - cs.getNumNulls() + 1) : numRows;
Copy link
Member

@deniskuzZ deniskuzZ Jan 15, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe Math.max(0L, numRows - cs.getNumNulls()) + 1

@deniskuzZ
Copy link
Member

@konstantinb, do we need same for
long llapMaxSize = (long) (maxSize + (maxSize * overSubscriptionFactor * slotsPerQuery))

@konstantinb
Copy link
Contributor Author

I'm away until the end of the next week. I will respond to review comments when I'm back. Thank you.

@konstantinb
Copy link
Contributor Author

@konstantinb, do we need same for long llapMaxSize = (long) (maxSize + (maxSize * overSubscriptionFactor * slotsPerQuery))

@deniskuzZ, from my analysis, all elements of this calculation are directly derived from Hive configuration settings. I believe that overflow could only occur with (currently) extremely improbable configuration settings, such as 1TB of RAM for the mapjoin conversion threshold, overSubscriptionFactor of 1000 and slotsPerQuery of 8390

I realize that huge amounts of RAM could become available sooner rather than later. At the same time, since this functionality is not data-driven but purely config-driven, should it be better considered a separate fix?

@sonarqubecloud
Copy link

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants