HIVE-29578: Iceberg: add support for native views #6449
difin wants to merge 2 commits into apache:master
delete from src_ice where last_name in ('ln1a', 'ln2a', 'ln7a');

create view v_ice as select * from src_ice stored by iceberg;
IMHO the syntax should follow the materialized view syntax.
I checked some other database engines (Trino, Dremio) that support Iceberg logical views. None of them adds extra keywords to the SQL syntax; instead, they let you define the catalog where the view should be stored, and that catalog should be an Iceberg catalog.
/**
 * Optional trailing {@code tableFileFormat} on CREATE VIEW: only {@code STORED BY ICEBERG} is allowed
 * (no serde properties or {@code STORED AS} tail).
 */
private boolean validateOptionalViewStorageClause(ASTNode storageRoot) throws SemanticException {
The keywords STORED BY ICEBERG are a bit confusing because no data is actually stored in the case of logical views. Some engines do not require extra keywords to specify when creating Iceberg logical views.
If we insist on using keywords, how about something like these?
create view <view_name> viewproperties(format='iceberg')
as select...;
create view <view_name> format iceberg
as select...;
If we decide to go with the STORED BY ICEBERG keywords, please create a new grammar rule specifically for views—similar to tableFileFormat—called viewMetadataFormat. This should limit the grammar to the STORED BY <identifier> syntax. By doing this, you can eliminate the need for extra validation checks in the analyzer.
I recommend checking the configuration setting hive.default.storage.handler.class when deciding where to store the view metadata. If a storage handler is set that supports views, let's use the Storage Handler API to store the metadata.
- Created a rule viewMetadataFormat.
- Moved STORED BY ICEBERG closer to the view definition, similar to CTAS, i.e. `STORED BY ICEBERG as SELECT`.
- Made STORED BY ICEBERG optional. If it is not specified, the type is deduced from the hive.default.storage.handler.class conf.
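The fallback described in the last bullet can be sketched roughly as follows. This is only an illustration under stated assumptions: `ViewMetadataFormatResolver` and `resolve` are hypothetical names, the configuration is modeled as a plain map rather than a HiveConf, and the reviewer's additional condition (that the configured handler actually supports views) is left out for brevity. Only the key `hive.default.storage.handler.class` comes from the thread.

```java
import java.util.Map;
import java.util.Optional;

/** Hypothetical helper for illustration; not part of Hive. */
final class ViewMetadataFormatResolver {
  // Configuration key named in the review thread.
  static final String DEFAULT_HANDLER_KEY = "hive.default.storage.handler.class";

  private ViewMetadataFormatResolver() {
  }

  /**
   * Picks the handler that should store the view metadata.
   *
   * @param explicitHandler handler named in a STORED BY clause, or null when the clause is absent
   * @param conf            session configuration, modeled here as a plain map (assumption)
   * @return the chosen handler name, or empty when the view should stay a classic HMS view
   */
  static Optional<String> resolve(String explicitHandler, Map<String, String> conf) {
    // An explicit STORED BY clause always wins over the configured default.
    if (explicitHandler != null && !explicitHandler.trim().isEmpty()) {
      return Optional.of(explicitHandler.trim());
    }
    // Otherwise fall back to the configured default storage handler, if any.
    String fallback = conf.get(DEFAULT_HANDLER_KEY);
    if (fallback == null || fallback.trim().isEmpty()) {
      return Optional.empty();
    }
    return Optional.of(fallback.trim());
  }
}
```

Returning an empty `Optional` keeps the "no handler at all" case explicit, so the caller can take the classic view path without comparing against magic strings.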
result.setLastAccessTime(nowSec);
result.setRetention(Integer.MAX_VALUE);

boolean hiveEngineEnabled = false;
What is hiveEngineEnabled and why is it false?
hiveEngineEnabled switches how HiveOperationsBase.storageDescriptor fills the storage descriptor: with HiveIcebergInputFormat / HiveIcebergOutputFormat / HiveIcebergSerDe when true, or the usual placeholder FileInputFormat / FileOutputFormat / LazySimpleSerDe when false.
Why it’s false in toHiveView:
This path materializes an HMS VIRTUAL_VIEW for REST catalogs that expose Iceberg view metadata through the HMS API. That row isn’t meant to drive a Hive table scan the way a real Iceberg table commit does; execution still comes from the view definition / catalog, not from wiring Iceberg MR formats on the stub. HiveViewOperations does the same thing (hiveEngineEnabled = false).
So we keep a minimal SD consistent with normal virtual views and avoid implying this HMS object is an Iceberg-backed table for the Hive engine. For tables, HiveTableOperations still turns engine integration on/off via metadata + ConfigProperties.ENGINE_HIVE_ENABLED where that actually matters.
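To make the switch concrete, here is a simplified model of the choice described above. It is not the real `HiveOperationsBase.storageDescriptor` code: `StorageDescriptorSketch` is a hypothetical stand-in for the HMS storage descriptor, and the package prefixes on the class names are filled in from memory, so treat them as assumptions; the thread itself gives only the simple class names.

```java
/** Hypothetical stand-in for the HMS storage descriptor; only three class names are modeled. */
final class StorageDescriptorSketch {
  final String inputFormat;
  final String outputFormat;
  final String serializationLib;

  StorageDescriptorSketch(String inputFormat, String outputFormat, String serializationLib) {
    this.inputFormat = inputFormat;
    this.outputFormat = outputFormat;
    this.serializationLib = serializationLib;
  }

  /** Mirrors the true/false branches described in the comment above. */
  static StorageDescriptorSketch forEngine(boolean hiveEngineEnabled) {
    if (hiveEngineEnabled) {
      // Iceberg-aware formats: the Hive engine can scan the object as an Iceberg table.
      return new StorageDescriptorSketch(
          "org.apache.iceberg.mr.hive.HiveIcebergInputFormat",
          "org.apache.iceberg.mr.hive.HiveIcebergOutputFormat",
          "org.apache.iceberg.mr.hive.HiveIcebergSerDe");
    }
    // Placeholder formats: the same minimal descriptor an ordinary virtual view carries.
    return new StorageDescriptorSketch(
        "org.apache.hadoop.mapred.FileInputFormat",
        "org.apache.hadoop.mapred.FileOutputFormat",
        "org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe");
  }
}
```

With `forEngine(false)`, nothing in the descriptor suggests an Iceberg-backed table, which is the point made above for the view stub.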
private static ViewBuilder applyCommentAndTblProps(
    ViewBuilder builder, Map<String, String> tblProperties, String comment) {
  ViewBuilder viewBuilder = builder;
  if (comment != null && !comment.isEmpty()) {

  return (ViewCatalog) catalog;
}

private static ViewBuilder startViewBuilder(
nit: startViewBuilder, applyCommentAndTblProps, and commitView don't add much value when we already have a builder.
Done - replaced the overly verbose methods with inline code.
if (cat.viewExists(id)) {
  cat.dropView(id);
What is the result of dropView when a view with the specified name doesn't exist? I'm wondering whether the cat.viewExists(id) check is necessary.
dropView returns false when a view with the specified name doesn't exist, and true otherwise. I removed the cat.viewExists(id) check.
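A tiny model shows why the existence check is redundant under that contract. `InMemoryViewCatalog` is a hypothetical map-backed stand-in, not the Iceberg catalog; it only reproduces the boolean result described above.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

/** Hypothetical in-memory stand-in for a view catalog: view name mapped to SQL text. */
final class InMemoryViewCatalog {
  private final Map<String, String> views = new HashMap<>();

  void createView(String name, String sql) {
    views.put(Objects.requireNonNull(name), Objects.requireNonNull(sql));
  }

  boolean viewExists(String name) {
    return views.containsKey(name);
  }

  /** Returns true if a view was removed and false if none existed; never throws for a missing view. */
  boolean dropView(String name) {
    return views.remove(name) != null;
  }
}
```

Because `dropView` already reports a missing view through its return value, wrapping it in `viewExists` only adds a second catalog call and a window in which another client could drop the view between the two calls.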
}

@Test
public void testIfNotExistsReturnsFalseWhenViewExists() throws Exception {
Isn't the test method name testIfNotExistsReturnsFalseWhenViewExists misleading? We are testing createOrReplaceNativeView, not an IfNotExists method.
Yes, the name was vague: IfNotExists is not a method but one of the parameters.
I renamed the method to:
testCreateOrReplaceNativeViewSkipsWhenViewExistsAndIfNotExistsFlagTrue
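The behavior the renamed test describes can be sketched as below. The signature and semantics are assumptions inferred only from the two test names (the method returns false and leaves the view untouched when it exists and the flag is set); `NativeViewStore` and `sqlOf` are hypothetical names, and the real createOrReplaceNativeView may look different.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical map-backed store used only to illustrate the ifNotExists flag. */
final class NativeViewStore {
  private final Map<String, String> views = new HashMap<>();

  /**
   * Returns true when the view was written and false when creation was skipped.
   * The parameter order and return type are assumptions for this sketch.
   */
  boolean createOrReplaceNativeView(String name, String sql, boolean replace, boolean ifNotExists) {
    boolean exists = views.containsKey(name);
    if (exists && ifNotExists) {
      return false; // IF NOT EXISTS: keep the existing definition and report a skip
    }
    if (exists && !replace) {
      throw new IllegalStateException("View already exists: " + name);
    }
    views.put(name, sql);
    return true;
  }

  String sqlOf(String name) {
    return views.get(name);
  }
}
```

The new test name spells out both the method under test and the flag value, which is what the reviewer asked for.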
create view v_ice as select * from src_ice stored by iceberg;

select * from v_ice;
Could you please add
- a logical view which does some transformation on its base table, and a query from it?
- view creation both with and without an explicit schema.
logical view which does some transformation on its base table and query from it?
This is not supported by Hive itself:
update v_ice set last_name = last_name + 'a'
fname=iceberg_native_view.q
See ./ql/target/tmp/log/hive.log or ./itests/qtest/target/tmp/log/hive.log, or check ./ql/target/surefire-reports or ./itests/qtest/target/surefire-reports/ for specific test cases logs.
org.apache.hadoop.hive.ql.parse.SemanticException: You cannot update or delete records in a view
at org.apache.hadoop.hive.ql.parse.RewriteSemanticAnalyzer.validateTargetTable(RewriteSemanticAnalyzer.java:265)
at org.apache.hadoop.hive.ql.parse.RewriteSemanticAnalyzer.analyze(RewriteSemanticAnalyzer.java:84)
at org.apache.hadoop.hive.ql.parse.RewriteSemanticAnalyzer.analyzeInternal(RewriteSemanticAnalyzer.java:73)
at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:358)
at org.apache.hadoop.hive.ql.Compiler.analyze(Compiler.java:224)
at org.apache.hadoop.hive.ql.Compiler.compile(Compiler.java:109)
at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:499)
at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:451)
at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:415)
at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:409)
at org.apache.hadoop.hive.ql.reexec.ReExecDriver.compileAndRespond(ReExecDriver.java:126)
at org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:234)
at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:259)
at org.apache.hadoop.hive.cli.CliDriver.processCmd1(CliDriver.java:203)
at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:129)
at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:430)
at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:358)
at org.apache.hadoop.hive.ql.QTestUtil.executeClientInternal(QTestUtil.java:790)
at org.apache.hadoop.hive.ql.QTestUtil.executeClient(QTestUtil.java:760)
at org.apache.hadoop.hive.cli.control.CoreCliDriver.runTest(CoreCliDriver.java:115)
at org.apache.hadoop.hive.cli.control.CliAdapter.runTest(CliAdapter.java:139)
create views when the schema is specified and not specified.
Done
    break;
  }
}
boolean icebergNativeView = validateOptionalViewStorageClause(storageClause);
Please do not hardcode anything like Iceberg into compiler code. The compiler is independent of the storage handler. I'm aware that we already have lots of code that violates this principle, and it already causes lots of trouble.
Fixed - moved all Iceberg-specific code into HiveIcebergStorageHandler and kept generic interfaces in the Compiler.
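One way to picture this separation is sketched below. None of these types exist in Hive: `ViewCapableHandler`, `RecordingHandler`, and `ViewPlanner` are hypothetical names chosen for illustration, and the real interfaces introduced by the PR may differ. The point is only that the compiler-side code asks a generic question and never names a concrete format.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical generic contract the compiler could depend on. */
interface ViewCapableHandler {
  boolean supportsNativeViews();

  void createNativeView(String viewName, String sql);
}

/** Hypothetical handler implementation; all format-specific logic would live on this side. */
final class RecordingHandler implements ViewCapableHandler {
  final List<String> created = new ArrayList<>();

  @Override
  public boolean supportsNativeViews() {
    return true;
  }

  @Override
  public void createNativeView(String viewName, String sql) {
    created.add(viewName); // a real handler would commit view metadata to its catalog here
  }
}

/** Hypothetical compiler-side code: it only talks to the interface. */
final class ViewPlanner {
  private ViewPlanner() {
  }

  /** Returns true when the handler took over the view, false when the classic path applies. */
  static boolean planCreateView(ViewCapableHandler handler, String viewName, String sql) {
    if (handler != null && handler.supportsNativeViews()) {
      handler.createNativeView(viewName, sql);
      return true;
    }
    return false;
  }
}
```

Adding another format under this shape would mean adding another handler, with no change to the planner.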
private static final long serialVersionUID = 1L;

/** HMS table property set when the view is declared with {@code STORED BY ICEBERG} (native Iceberg view). */
public static final String ICEBERG_NATIVE_VIEW_PROPERTY = "hive.iceberg.native.view";
Please remove this from here.
private final boolean ifNotExists;
private final boolean replace;
private final List<FieldSchema> partitionColumns;
private final boolean icebergNativeView;
Please remove this from here.
@Explain(displayName = "iceberg native view", displayOnlyOnTrue = true)
public boolean isIcebergNativeView() {
  return icebergNativeView;
}
Please remove this from here.
What changes were proposed in this pull request?
Added support for Iceberg native views in Hive for both HMS and REST catalogs.
There is a limitation in the current implementation: when Hive uses a REST catalog and creates a view on a partitioned Iceberg table, querying the view only works with CBO disabled. To be addressed in a follow-up PR.
Why are the changes needed?
To support Iceberg native views. This can be especially useful for REST Catalog clients.
Does this PR introduce any user-facing change?
Yes, new HQL syntax:
create view <view_name> as select * from <src_tbl> stored by iceberg;
How was this patch tested?
Created new and updated existing unit and integration tests with Iceberg native view test cases.