feat: support size() for MapType input #4580
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -15,15 +15,29 @@ | |
| -- specific language governing permissions and limitations | ||
| -- under the License. | ||
|
|
||
| -- ConfigMatrix: spark.sql.legacy.sizeOfNull=true,false | ||
|
|
||
| statement | ||
| CREATE TABLE test_size(arr array<int>, m map<string, int>) USING parquet | ||
|
|
||
| statement | ||
| INSERT INTO test_size VALUES (array(1, 2, 3), map('a', 1, 'b', 2)), (array(), map()), (NULL, NULL) | ||
|
|
||
| query spark_answer_only | ||
| query | ||
| SELECT size(arr), size(m) FROM test_size | ||
|
Member
Could you also add literal map cases below the existing literal-args query? Something like:
query
SELECT size(map('a', 1, 'b', 2)), size(map()), size(cast(NULL as map<string,int>))
That way the literal path is covered for both shapes. While you're here, a
Contributor
I think literal maps are not supported yet.
Contributor
Author
Done, all three items added:
ScalarValue::Map unit tests also added per review (ff01bbc), and stale docs notes cleared (b1d177a). Should I file a follow-up issue for |
||
|
|
||
| -- literal arguments | ||
| -- literal array arguments | ||
| query | ||
| SELECT size(array(1, 2, 3)), size(array()), size(cast(NULL as array<int>)) | ||
|
|
||
| -- literal map via CreateMap (falls back: Comet has no CreateMap serde; | ||
| -- cast(NULL as map) avoids CreateMap and goes through CometLiteral instead) | ||
| query spark_answer_only | ||
| SELECT size(map('a', 1, 'b', 2)), size(map()) | ||
|
|
||
| query | ||
| SELECT size(cast(NULL as map<string,int>)) | ||
|
|
||
| -- cardinality is a SQL alias for size | ||
| query | ||
| SELECT cardinality(arr), cardinality(m) FROM test_size | ||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -133,27 +133,43 @@ class CometMapExpressionSuite extends CometTestBase { | |
| makeParquetFileAllPrimitiveTypes(path, dictionaryEnabled = true, 100) | ||
| spark.read.parquet(path.toString).createOrReplaceTempView("t1") | ||
|
|
||
| // Use column references in maps to avoid constant folding | ||
| checkSparkAnswerAndFallbackReason( | ||
| sql("SELECT size(case when _2 < 0 then map(_8, _9) else map() end) from t1"), | ||
| "size does not support map inputs") | ||
| "map is not supported") | ||
| } | ||
| } | ||
| } | ||
|
|
||
| // fails with "map is not supported" | ||
| ignore("size with map input") { | ||
|
Contributor
Is this test still available?
Contributor
Author
Yes, it has been promoted from |
||
| withTempDir { dir => | ||
| withTempView("t1") { | ||
| val path = new Path(dir.toURI.toString, "test.parquet") | ||
| makeParquetFileAllPrimitiveTypes(path, dictionaryEnabled = true, 100) | ||
| spark.read.parquet(path.toString).createOrReplaceTempView("t1") | ||
| test("size with map input - v2 reader") { | ||
| withTempPath { dir => | ||
| withSQLConf(CometConf.COMET_ENABLED.key -> "false") { | ||
| val df = spark | ||
| .range(100) | ||
| .select( | ||
| col("id"), | ||
| when(col("id") > 1, map(col("id"), when(col("id") > 2, col("id")))) | ||
| .alias("map1"), | ||
| when(col("id") > 5, map(lit("a"), col("id"), lit("b"), col("id") + 1)) | ||
| .alias("map2")) | ||
| df.write.parquet(dir.toString()) | ||
| } | ||
|
|
||
| // Use column references in maps to avoid constant folding | ||
| checkSparkAnswerAndOperator( | ||
| sql("SELECT size(map(_8, _9, _10, _11)) from t1 where _8 is not null")) | ||
| checkSparkAnswerAndOperator( | ||
| sql("SELECT size(case when _2 < 0 then map(_8, _9) else map() end) from t1")) | ||
| Seq("", "parquet").foreach { v1List => | ||
| withSQLConf(SQLConf.USE_V1_SOURCE_LIST.key -> v1List) { | ||
| val df = spark.read.parquet(dir.toString()) | ||
| df.createOrReplaceTempView("t1") | ||
| if (v1List.isEmpty) { | ||
| checkSparkAnswer(df.select(size(col("map1")))) | ||
| checkSparkAnswer(df.select(size(col("map2")))) | ||
| checkSparkAnswer( | ||
| sql("SELECT size(CASE WHEN id < 50 THEN map1 ELSE map2 END) FROM t1")) | ||
| } else { | ||
| checkSparkAnswerAndOperator(df.select(size(col("map1")))) | ||
| checkSparkAnswerAndOperator(df.select(size(col("map2")))) | ||
| checkSparkAnswerAndOperator( | ||
| sql("SELECT size(CASE WHEN id < 50 THEN map1 ELSE map2 END) FROM t1")) | ||
| } | ||
| } | ||
| } | ||
| } | ||
| } | ||
|
|
||
Does the `spark.sql.legacy.sizeOfNull` config change the return value here?
`spark.sql.legacy.sizeOfNull` doesn't affect the native return value. The serde layer already wraps `size(child)` into `CASE WHEN isNotNull(child) THEN size(child) ELSE (-1|null) END`, so the null branch is handled by CaseWhen's ELSE literal and native `size()` only receives non-null inputs. The ConfigMatrix in `size.sql` already covers both behaviors.
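To make the wrapping described above concrete, here is a minimal plain-Scala sketch of the idea. It is not Comet or Spark code: `SizeOfNullSketch`, `nativeSize`, and `wrappedSize` are hypothetical names, a SQL NULL map is modeled as `None`, and the sketch assumes that the legacy setting selects `-1` while the non-legacy setting selects null (ANSI mode is ignored here).

```scala
// Hypothetical stand-ins for illustration only; none of these names come from Comet or Spark.
object SizeOfNullSketch {
  // Models the native size kernel, which in this sketch only ever sees a non-null map.
  def nativeSize(m: Map[String, Int]): Int = m.size

  // Models the wrapper described in the comment above:
  //   CASE WHEN isNotNull(child) THEN size(child) ELSE (-1 | null) END
  // The config only decides which literal sits in the ELSE branch.
  def wrappedSize(child: Option[Map[String, Int]], legacySizeOfNull: Boolean): Option[Int] =
    child match {
      case Some(m) => Some(nativeSize(m))
      case None    => if (legacySizeOfNull) Some(-1) else None
    }

  def main(args: Array[String]): Unit = {
    // Mirrors the three rows inserted into test_size: populated, empty, and NULL.
    val rows: Seq[Option[Map[String, Int]]] =
      Seq(Some(Map("a" -> 1, "b" -> 2)), Some(Map.empty[String, Int]), None)
    for (legacy <- Seq(true, false)) {
      println(s"legacySizeOfNull=$legacy: " + rows.map(wrappedSize(_, legacy)))
    }
  }
}
```

In this model the config never reaches `nativeSize`, which is the point of the reply: only the ELSE literal changes between the two ConfigMatrix settings.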