[FLINK-37094][hive] Adapt Hive connector to Flink 2.0 API changes#36
[FLINK-37094][hive] Adapt Hive connector to Flink 2.0 API changes#36jlalwani-amazon wants to merge 7 commits into
Conversation
|
Thanks for opening this pull request! Please check out our contributing guidelines. (https://flink.apache.org/contributing/how-to-contribute.html) |
bd46bd9 to
ad881a1
Compare
339a9fe to
1136fe2
Compare
…bleFactory) - Delete HiveTableFactory.java (legacy, replaced by HiveDynamicTableFactory) - Migrate HiveLookupTableSource from TableFunctionProvider to LookupFunctionProvider - Migrate FileSystemLookupFunction from TableFunction to LookupFunction - Remove HiveTableFactory import from HiveFunctionDefinitionFactory
- Delete HiveCatalogLock.java (CatalogLock removed in Flink 2.0, FLINK-37091) - Remove RequireCatalogLock checks from HiveDynamicTableFactory - Remove HiveCatalog.getTableFactory() and supportsManagedTable() overrides - Replace ManagedTableListener.isManagedTable() with false - Remove dead managedTable parameter from HiveTableUtil methods - Delete TestLockTableSinkFactory and SPI registration
- Update UniqueConstraint import: table.api.constraints -> table.catalog (5 files) - Update StreamingFileSink import to legacy package in HiveTableSink - Update SinkFunction import to legacy package
- FactoryUtil.createDynamicTableSink/Source: add enrichedOptions param - CatalogTable.of() -> CatalogTable.newBuilder() (7 call sites + 11 test files) - ShowDatabases/Tables/Views/FunctionsOperation: add catalog/database params - OutputFormat.open(int, int) -> open(InitializationContext) - CreateTableOperation now requires ResolvedCatalogTable - Remove testGenericTable (tested deleted HiveTableFactory) - Remove testCreateAndGetFlinkManagedTable (ManagedTable API removed) - Add supportsModels() to catalog metadata test bases
- Bump flink.version 1.20.0 -> 2.0.0 - Rename flink-hadoop-compatibility_2.12 -> flink-hadoop-compatibility - Remove flink-java test dependency (DataSet API removed) - Upgrade maven-shade-plugin 3.2.4 -> 3.5.1 (Java 17 class file support) - Add --add-opens JVM flags via flink.surefire.baseArgLine - Add .mvn/jvm.config for Maven JVM test discovery - Replace RestartStrategies with RestartStrategyOptions configuration - Fix SinkFunction import (moved to legacy package) - Update Parquet nullable complex type test expectations - Remove testCatalogLock and testCreateAndGetManagedTable IT tests - Remove testGenericTable from HiveCatalogITCase
1136fe2 to
d4d3163
Compare
gguptp
left a comment
There was a problem hiding this comment.
Overall LGTM! Can we confirm if we have some issue with managed tables compatibility
| HiveConf hiveConf, | ||
| boolean managedTable) { | ||
| Table newHiveTable = instantiateHiveTable(tablePath, baseTable, hiveConf, managedTable); | ||
| HiveConf hiveConf) { |
There was a problem hiding this comment.
given we're removing managedTable API which no longer exists in flink 2, will we have any issue for customers migrating from flink 1 connector to flink 2 connector? will there be hive tables which have connector=flink-managed property
There was a problem hiding this comment.
Thanks for the review, @gguptp
Good question! The ManagedTable API was removed from Flink core in 2.0 (FLINK-36539), so this isn't specific to the connector. it's a Flink-wide change.
For existing tables with connector=flink-managed in the Hive metastore: the underlying data (files in HDFS/S3) is unaffected. Those tables remain readable as regular Hive tables. The connector property becomes inert. Flink 2.0 simply won't recognize it as a managed table and will treat it as a standard Hive table instead.
The only behavioral difference is that DROP TABLE on a previously managed table will no longer trigger Flink-side data cleanup. Users would need to clean up the data files manually or rely on Hive's native managed table behavior.
Since the ManagedTable feature was experimental and the removal was a deliberate Flink core decision, I think this is expected migration behavior rather than something the connector should try to paper over. We should update the FLink 2.0 migration guide to call out the change in behavior with managed-table
There was a problem hiding this comment.
On the subject of migration, one thing that I want to highlight is that Flink 2 is incompatible with Hive 3. I have highlighted this in the description above and created a discussion thread on the mailing list. Reiterating it here for emphasis.
Flink 2 is on JDK17. Hive 3 doesn't work on JDK 17. So, essentially, migrating to Flink 2.0 will leave users high and dry unless we add support for Hive 4. I have Hive 4 support in this PR
There was a problem hiding this comment.
Thanks! This makes sense, i will review the Hive4 support PR as well
369faee to
b9c1a08
Compare
- Bump Flink version to 2.0.1 (2.0.0/2.0.2 binaries unavailable)
- Upgrade maven-shade-plugin to 3.6.0 (parquet 1.15.2 multi-release JARs)
- Update NOTICE file for parquet 1.13.1 -> 1.15.2
- Fix dependency convergence errors (pin Hive/Hadoop transitive deps)
- Fix ${flink.version} -> ${project.version} in SQL connector and e2e modules
- Remove hive3 from CI matrix (Hive 3.1 incompatible with Java 11+, HIVE-21584)
- Skip e2e tests on JDK 17 (HiveITCase timeouts)
- Add .mvn/jvm.config with --add-opens for java.security.jgss
- Update CI workflow for JDK 11/17 matrix
b9c1a08 to
17e8e2a
Compare
8e50eea to
fdd730e
Compare
What is the purpose of the change
Adapt the Flink Hive connector to compile and pass tests against Flink 2.0. Flink 2.0 removed/relocated several APIs that the connector depends on.
JIRA: FLINK-37094
Brief change log
Commit 1: Fix Flink 2.0 import changes
StreamingFileSink→legacypackageUniqueConstraint→o.a.f.table.catalogSinkFunction→legacypackageRestartStrategies→RestartStrategyOptionsCommit 2: Remove ManagedTable and CatalogLock APIs
HiveCatalogLock(only consumed by Paimon)ManagedTableListenerusage (FLIP-346)RequireCatalogLockfromHiveDynamicTableFactoryCommit 3: Fix Flink 2.0 API signature changes
FactoryUtil.createDynamicTableSink/Source— addedenrichedOptionsparamCatalogTable.of()→CatalogTable.newBuilder()OutputFormat.open(int, int)→open(InitializationContext)CreateTableOperationnow requiresResolvedCatalogTableCommit 4: Fix Java 17+ compatibility and test changes
--add-opensJVM flagsmaven-shade-plugin3.2.4 → 3.5.1Commit 5: Fix CI for Flink 2.0 branch
Verifying this change
CI passed on fork: https://github.com/jlalwani-amazon/flink-connector-hive/actions
Known limitations
SessionStatecastsAppClassLoadertoURLClassLoaderwhich fails on Java 9+ (HIVE-21584). Fixed in Hive 4 (HIVE-27508). Flink 2.0 requires Java 11+.HiveITCasetimes out on standard GitHub Actions runners (2 CPU, 7GB). Passes locally with more resources.Does this pull request potentially affect one of the following parts?