Commit 91d62ce
[SPARK-56221][SQL][PYTHON] Feature parity between spark.catalog.* vs DDL commands
### What changes were proposed in this pull request? **SQL** - `SHOW CACHED TABLES`: lists relations cached with an explicit name (`CACHE TABLE`, `catalog.cacheTable`, etc.); unnamed `Dataset.cache()` entries are not listed. **`Catalog` API (Scala / Java / PySpark)** - `listCachedTables()`: same information as `SHOW CACHED TABLES` (`CachedTable`: name, storage level). - `dropTable` / `dropView`: drop persistent table or view (with `ifExists`, `purge` where applicable). - `createDatabase` / `dropDatabase`: create or drop a namespace (with options: `ifNotExists` / `ifExists`, `cascade`, properties map). - `listPartitions`: partition strings for a table (aligned with `SHOW PARTITIONS`). - `listViews`: list views in the current or given namespace; optional name pattern. - `getTableProperties`: all table properties (aligned with `SHOW TBLPROPERTIES`). - `getCreateTableString`: DDL from `SHOW CREATE TABLE` (optional `asSerde`). - `truncateTable`: remove all table data (not for views). - `analyzeTable`: `ANALYZE TABLE ... COMPUTE STATISTICS` (optional `noScan`). ### Why are the changes needed? Gives stable programmatic ways to do what users already do with SQL (SHOW CACHED TABLES, SHOW PARTITIONS, etc.), without routing everything through raw SQL. ### Does this PR introduce any user-facing change? Yes. New SQL command, new Catalog API API. ### How was this patch tested? Unittests were added. ### Was this patch authored or co-authored using generative AI tooling? No. Closes #55025 from HyukjinKwon/SPARK-56221. Authored-by: Hyukjin Kwon <gurwls223@apache.org> Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
1 parent 5d6eb71 commit 91d62ce

38 files changed

Lines changed: 2464 additions & 115 deletions


common/utils/src/main/resources/error/error-conditions.json

Lines changed: 5 additions & 0 deletions

```diff
@@ -7463,6 +7463,11 @@
         "The ANALYZE TABLE command does not support views."
       ]
     },
+    "CATALOG_INTERFACE_METHOD" : {
+      "message" : [
+        "Catalog API <methodName> is not supported by <catalogClass>."
+      ]
+    },
     "CATALOG_OPERATION" : {
       "message" : [
         "Catalog <catalogName> does not support <operation>."
```

docs/sql-performance-tuning.md

Lines changed: 2 additions & 0 deletions

```diff
@@ -30,6 +30,8 @@ Spark SQL can cache tables using an in-memory columnar format by calling `spark.
 Then Spark SQL will scan only required columns and will automatically tune compression to minimize
 memory usage and GC pressure. You can call `spark.catalog.uncacheTable("tableName")` or `dataFrame.unpersist()` to remove the table from memory.
 
+To list relations cached with an explicit name, use `SHOW CACHED TABLES` in SQL or `spark.catalog.listCachedTables()`. Entries cached only via `Dataset.cache()` without a name are not included.
+
 Configuration of in-memory caching can be done via `spark.conf.set` or by running
 `SET key=value` commands using SQL.
```

docs/sql-ref-ansi-compliance.md

Lines changed: 1 addition & 0 deletions

```diff
@@ -440,6 +440,7 @@ Below is a list of all the keywords in Spark SQL.
 |BY|non-reserved|non-reserved|reserved|
 |BYTE|non-reserved|non-reserved|non-reserved|
 |CACHE|non-reserved|non-reserved|non-reserved|
+|CACHED|non-reserved|non-reserved|non-reserved|
 |CALL|reserved|non-reserved|reserved|
 |CALLED|non-reserved|non-reserved|non-reserved|
 |CASCADE|non-reserved|non-reserved|non-reserved|
```

docs/sql-ref-syntax-aux-cache-cache-table.md

Lines changed: 1 addition & 0 deletions

```diff
@@ -79,6 +79,7 @@ CACHE TABLE testCache OPTIONS ('storageLevel' 'DISK_ONLY') SELECT * FROM testDat
 
 ### Related Statements
 
+* [SHOW CACHED TABLES](sql-ref-syntax-aux-show-cached-tables.html)
 * [CLEAR CACHE](sql-ref-syntax-aux-cache-clear-cache.html)
 * [UNCACHE TABLE](sql-ref-syntax-aux-cache-uncache-table.html)
 * [REFRESH TABLE](sql-ref-syntax-aux-cache-refresh-table.html)
```

docs/sql-ref-syntax-aux-cache-uncache-table.md

Lines changed: 1 addition & 0 deletions

```diff
@@ -49,6 +49,7 @@ UNCACHE TABLE t1;
 ### Related Statements
 
 * [CACHE TABLE](sql-ref-syntax-aux-cache-cache-table.html)
+* [SHOW CACHED TABLES](sql-ref-syntax-aux-show-cached-tables.html)
 * [CLEAR CACHE](sql-ref-syntax-aux-cache-clear-cache.html)
 * [REFRESH TABLE](sql-ref-syntax-aux-cache-refresh-table.html)
 * [REFRESH](sql-ref-syntax-aux-cache-refresh.html)
```
docs/sql-ref-syntax-aux-show-cached-tables.md

Lines changed: 51 additions & 0 deletions (new file)

````md
---
layout: global
title: SHOW CACHED TABLES
displayTitle: SHOW CACHED TABLES
license: |
  Licensed to the Apache Software Foundation (ASF) under one or more
  contributor license agreements. See the NOTICE file distributed with
  this work for additional information regarding copyright ownership.
  The ASF licenses this file to You under the Apache License, Version 2.0
  (the "License"); you may not use this file except in compliance with
  the License. You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License.
---

### Description

The `SHOW CACHED TABLES` statement returns every in-memory cache entry that was registered with an explicit table or view name, for example via [`CACHE TABLE`](sql-ref-syntax-aux-cache-cache-table.html) or `spark.catalog.cacheTable`. The result has two columns: `tableName` (the name used when caching) and `storageLevel` (a string description of how the data is cached).

Relations cached only through `Dataset.cache()` / `DataFrame.cache()` without assigning a catalog name are **not** listed.

### Syntax

```sql
SHOW CACHED TABLES
```

### Examples

```sql
CACHE TABLE my_table AS SELECT * FROM src;

SHOW CACHED TABLES;
+----------+--------------------------------------+
| tableName|                          storageLevel|
+----------+--------------------------------------+
|  my_table|Disk Memory Deserialized 1x Replicated|
+----------+--------------------------------------+
```

### Related Statements

* [CACHE TABLE](sql-ref-syntax-aux-cache-cache-table.html)
* [UNCACHE TABLE](sql-ref-syntax-aux-cache-uncache-table.html)
* [CLEAR CACHE](sql-ref-syntax-aux-cache-clear-cache.html)
````
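The `storageLevel` column in the example above is a human-readable description ("Disk Memory Deserialized 1x Replicated"). A minimal sketch of how such a description string can be assembled from storage-level flags; this is an illustration of the format only, an assumption about its shape rather than Spark's actual `StorageLevel` code:

```python
def storage_level_description(use_disk: bool, use_memory: bool,
                              deserialized: bool, replication: int = 1) -> str:
    """Build a description string in the shape shown in the example output.

    Illustrative sketch: the flag names and composition rule are assumptions,
    chosen to reproduce the "Disk Memory Deserialized 1x Replicated" form.
    """
    parts = []
    if use_disk:
        parts.append("Disk")
    if use_memory:
        parts.append("Memory")
    parts.append("Deserialized" if deserialized else "Serialized")
    parts.append(f"{replication}x Replicated")
    return " ".join(parts)


# Disk + memory, deserialized, one replica: the string in the example above.
print(storage_level_description(True, True, True, 1))
# -> Disk Memory Deserialized 1x Replicated
```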

docs/sql-ref-syntax-aux-show.md

Lines changed: 1 addition & 0 deletions

```diff
@@ -19,6 +19,7 @@ license: |
 limitations under the License.
 ---
 
+* [SHOW CACHED TABLES](sql-ref-syntax-aux-show-cached-tables.html)
 * [SHOW COLUMNS](sql-ref-syntax-aux-show-columns.html)
 * [SHOW CREATE TABLE](sql-ref-syntax-aux-show-create-table.html)
 * [SHOW DATABASES](sql-ref-syntax-aux-show-databases.html)
```

docs/sql-ref-syntax.md

Lines changed: 1 addition & 0 deletions

```diff
@@ -124,6 +124,7 @@ You use SQL scripting to execute procedural logic in SQL.
 * [RESET](sql-ref-syntax-aux-conf-mgmt-reset.html)
 * [SET](sql-ref-syntax-aux-conf-mgmt-set.html)
 * [SET VAR](sql-ref-syntax-aux-set-var.html)
+* [SHOW CACHED TABLES](sql-ref-syntax-aux-show-cached-tables.html)
 * [SHOW COLUMNS](sql-ref-syntax-aux-show-columns.html)
 * [SHOW CREATE TABLE](sql-ref-syntax-aux-show-create-table.html)
 * [SHOW DATABASES](sql-ref-syntax-aux-show-databases.html)
```

project/MimaExcludes.scala

Lines changed: 4 additions & 0 deletions

```diff
@@ -35,6 +35,10 @@ object MimaExcludes {
 
   // Exclude rules for 4.2.x from 4.1.0
   lazy val v42excludes = v41excludes ++ Seq(
+    // [SQL] SafeJsonSerializer.safeMapToJValue: second parameter widened from Function1 to
+    // Function2 so the key is passed to the value serializer (progress.scala). Binary-incompatible
+    // vs spark-sql-api 4.0.0; not part of the public supported API (private[streaming] package).
+    ProblemFilters.exclude[IncompatibleMethTypeProblem]("org.apache.spark.sql.streaming.SafeJsonSerializer.safeMapToJValue"),
     // Add DEBUG format to ErrorMessageFormat enum
     ProblemFilters.exclude[Problem]("org.apache.spark.ErrorMessageFormat*"),
     // [SPARK-47086][BUILD][CORE][WEBUI] Upgrade Jetty to 12.1.4
```

python/docs/source/reference/pyspark.sql/catalog.rst

Lines changed: 11 additions & 0 deletions

```diff
@@ -25,30 +25,41 @@ Catalog
 .. autosummary::
     :toctree: api/
 
+    Catalog.analyzeTable
     Catalog.cacheTable
     Catalog.clearCache
+    Catalog.createDatabase
     Catalog.createExternalTable
     Catalog.createTable
     Catalog.currentCatalog
     Catalog.currentDatabase
     Catalog.databaseExists
+    Catalog.dropDatabase
     Catalog.dropGlobalTempView
+    Catalog.dropTable
     Catalog.dropTempView
+    Catalog.dropView
     Catalog.functionExists
+    Catalog.getCreateTableString
     Catalog.getDatabase
     Catalog.getFunction
     Catalog.getTable
+    Catalog.getTableProperties
     Catalog.isCached
+    Catalog.listCachedTables
     Catalog.listCatalogs
     Catalog.listColumns
     Catalog.listDatabases
     Catalog.listFunctions
+    Catalog.listPartitions
     Catalog.listTables
+    Catalog.listViews
     Catalog.recoverPartitions
     Catalog.refreshByPath
     Catalog.refreshTable
     Catalog.registerFunction
     Catalog.setCurrentCatalog
     Catalog.setCurrentDatabase
     Catalog.tableExists
+    Catalog.truncateTable
     Catalog.uncacheTable
```
