[FLINK-37094][hive] Adapt Hive connector to Flink 2.0 API changes #36

Open
jlalwani-amazon wants to merge 7 commits into apache:main from jlalwani-amazon:flink2-stabilization

@jlalwani-amazon jlalwani-amazon commented Apr 10, 2026

What is the purpose of the change

Adapt the Flink Hive connector to compile and pass tests against Flink 2.0. Flink 2.0 removed/relocated several APIs that the connector depends on.

JIRA: FLINK-37094

Brief change log

Commit 1: Fix Flink 2.0 import changes

  • StreamingFileSink → legacy package
  • UniqueConstraint → o.a.f.table.catalog
  • SinkFunction → legacy package
  • RestartStrategies → RestartStrategyOptions

Commit 2: Remove ManagedTable and CatalogLock APIs

  • Remove HiveCatalogLock (only consumed by Paimon)
  • Remove ManagedTableListener usage (FLIP-346)
  • Remove RequireCatalogLock from HiveDynamicTableFactory

Commit 3: Fix Flink 2.0 API signature changes

  • FactoryUtil.createDynamicTableSink/Source — added enrichedOptions param
  • CatalogTable.of() → CatalogTable.newBuilder()
  • OutputFormat.open(int, int) → open(InitializationContext)
  • CreateTableOperation now requires ResolvedCatalogTable

Commit 4: Fix Java 17+ compatibility and test changes

  • Added --add-opens JVM flags
  • Upgraded maven-shade-plugin 3.2.4 → 3.5.1
  • Fixed test expectations for Flink 2.0 behavioral changes

Commit 5: Fix CI for Flink 2.0 branch

  • Bump Flink to 2.0.1 (2.0.0/2.0.2 binaries unavailable)
  • Upgrade shade plugin to 3.6.0 (parquet 1.15.2 multi-release JARs)
  • Update NOTICE file for parquet 1.13.1 → 1.15.2
  • Fix dependency convergence errors
  • Fix ${flink.version} → ${project.version} in SQL connector modules
  • Remove hive3 from CI (Hive 3.1 incompatible with Java 11+, HIVE-21584)
  • Skip e2e tests (timeout on standard GHA runners)
  • Add .mvn/jvm.config for java.security.jgss access
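
The `--add-opens` entries matter because, on recent JDKs, classpath code cannot reflect deeply into JDK packages unless those packages are opened explicitly. A minimal sketch for checking this from inside a JVM is shown here. The class `AddOpensCheck` and its helper are hypothetical and not part of the connector, and since the change log does not list which packages of `java.security.jgss` are opened, the sketch simply enumerates all of them.

```java
import java.util.Optional;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical helper: lists packages of a JDK module that are open to classpath code. */
final class AddOpensCheck {

    /** Returns the packages of the named boot-layer module that are open to this class's module. */
    static Set<String> openPackages(String moduleName) {
        Set<String> open = new TreeSet<>();
        Optional<Module> module = ModuleLayer.boot().findModule(moduleName);
        if (!module.isPresent()) {
            return open; // module is not part of this runtime image
        }
        // When run from the classpath, this is the unnamed module (ALL-UNNAMED).
        Module caller = AddOpensCheck.class.getModule();
        for (String pkg : module.get().getPackages()) {
            if (module.get().isOpen(pkg, caller)) {
                open.add(pkg);
            }
        }
        return open;
    }

    public static void main(String[] args) {
        // Module name taken from the change log; pass another name as the first argument.
        String moduleName = args.length > 0 ? args[0] : "java.security.jgss";
        System.out.println(moduleName + " packages open to classpath code: "
                + openPackages(moduleName));
    }
}
```

Running the class once without extra flags and once with an `--add-opens <module>/<package>=ALL-UNNAMED` argument should show the named package appearing in the second run, which is a quick way to confirm that a flag in `.mvn/jvm.config` or the surefire arg line is actually being picked up.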

Verifying this change

CI passed on fork: https://github.com/jlalwani-amazon/flink-connector-hive/actions

# Unit tests (421 passed, 0 failures)
mvn test -pl flink-connector-hive -am -Dflink.version=2.0.1 -Dsurefire.excludes='**/*ITCase*'

# Integration tests (219 passed, 0 failures)  
mvn test -pl flink-connector-hive -am -Dflink.version=2.0.1 -Dtest='*ITCase'

Known limitations

  • Hive 3.1 tests skipped on Flink 2.0: Hive 3.1's SessionState casts AppClassLoader to URLClassLoader which fails on Java 9+ (HIVE-21584). Fixed in Hive 4 (HIVE-27508). Flink 2.0 requires Java 11+.
  • E2E tests skipped in CI: HiveITCase times out on standard GitHub Actions runners (2 CPU, 7GB). Passes locally with more resources.
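
The class-loader limitation can be reproduced without Hive. The sketch below is not Hive's code; it only performs the same kind of check that the cast described above relies on, using a hypothetical class name `SystemClassLoaderCheck`.

```java
import java.net.URLClassLoader;

/** Hypothetical demo: shows whether the system class loader is a URLClassLoader. */
final class SystemClassLoaderCheck {

    /** True only if a cast of the system class loader to URLClassLoader would succeed. */
    static boolean systemLoaderIsUrlClassLoader() {
        return ClassLoader.getSystemClassLoader() instanceof URLClassLoader;
    }

    public static void main(String[] args) {
        ClassLoader loader = ClassLoader.getSystemClassLoader();
        System.out.println("Java " + System.getProperty("java.specification.version")
                + " system class loader: " + loader.getClass().getName());
        System.out.println("instanceof URLClassLoader: " + systemLoaderIsUrlClassLoader());
    }
}
```

On Java 8 the check is expected to report `true`; on Java 9 and later the default system class loader is no longer a `URLClassLoader`, so the check reports `false` and an unconditional cast throws `ClassCastException`.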

Does this pull request potentially affect one of the following parts?

  • Dependencies: yes (Flink 2.0.1, shade plugin 3.6.0, convergence pins)
  • The public API: no
  • The serializers: no
  • The runtime per-record code paths: no
  • Anything that affects deployment or recovery: no


boring-cyborg Bot commented Apr 10, 2026

Thanks for opening this pull request! Please check out our contributing guidelines. (https://flink.apache.org/contributing/how-to-contribute.html)

@jlalwani-amazon jlalwani-amazon changed the title from "[HIVE-]refactor: adapt Hive connector to Flink 2.0 API changes (WIP)" to "[FLINK-37094]refactor: adapt Hive connector to Flink 2.0 API changes (WIP)" on Apr 10, 2026
@jlalwani-amazon jlalwani-amazon force-pushed the flink2-stabilization branch 4 times, most recently from bd46bd9 to ad881a1 Compare April 10, 2026 22:51
@jlalwani-amazon jlalwani-amazon changed the title from "[FLINK-37094]refactor: adapt Hive connector to Flink 2.0 API changes (WIP)" to "[FLINK-37094]refactor: adapt Hive connector to Flink 2.0 API changes" on Apr 10, 2026
@jlalwani-amazon jlalwani-amazon force-pushed the flink2-stabilization branch 2 times, most recently from 339a9fe to 1136fe2 Compare April 13, 2026 18:58
…bleFactory)

- Delete HiveTableFactory.java (legacy, replaced by HiveDynamicTableFactory)
- Migrate HiveLookupTableSource from TableFunctionProvider to LookupFunctionProvider
- Migrate FileSystemLookupFunction from TableFunction to LookupFunction
- Remove HiveTableFactory import from HiveFunctionDefinitionFactory
- Delete HiveCatalogLock.java (CatalogLock removed in Flink 2.0, FLINK-37091)
- Remove RequireCatalogLock checks from HiveDynamicTableFactory
- Remove HiveCatalog.getTableFactory() and supportsManagedTable() overrides
- Replace ManagedTableListener.isManagedTable() with false
- Remove dead managedTable parameter from HiveTableUtil methods
- Delete TestLockTableSinkFactory and SPI registration
- Update UniqueConstraint import: table.api.constraints -> table.catalog (5 files)
- Update StreamingFileSink import to legacy package in HiveTableSink
- Update SinkFunction import to legacy package
- FactoryUtil.createDynamicTableSink/Source: add enrichedOptions param
- CatalogTable.of() -> CatalogTable.newBuilder() (7 call sites + 11 test files)
- ShowDatabases/Tables/Views/FunctionsOperation: add catalog/database params
- OutputFormat.open(int, int) -> open(InitializationContext)
- CreateTableOperation now requires ResolvedCatalogTable
- Remove testGenericTable (tested deleted HiveTableFactory)
- Remove testCreateAndGetFlinkManagedTable (ManagedTable API removed)
- Add supportsModels() to catalog metadata test bases
- Bump flink.version 1.20.0 -> 2.0.0
- Rename flink-hadoop-compatibility_2.12 -> flink-hadoop-compatibility
- Remove flink-java test dependency (DataSet API removed)
- Upgrade maven-shade-plugin 3.2.4 -> 3.5.1 (Java 17 class file support)
- Add --add-opens JVM flags via flink.surefire.baseArgLine
- Add .mvn/jvm.config for Maven JVM test discovery
- Replace RestartStrategies with RestartStrategyOptions configuration
- Fix SinkFunction import (moved to legacy package)
- Update Parquet nullable complex type test expectations
- Remove testCatalogLock and testCreateAndGetManagedTable IT tests
- Remove testGenericTable from HiveCatalogITCase

@gguptp gguptp left a comment

Overall LGTM! Can we confirm whether there is any compatibility issue with managed tables?

HiveConf hiveConf,
boolean managedTable) {
Table newHiveTable = instantiateHiveTable(tablePath, baseTable, hiveConf, managedTable);
HiveConf hiveConf) {

Given that we're removing the managedTable API, which no longer exists in Flink 2, will customers migrating from the Flink 1 connector to the Flink 2 connector run into any issues? Will there be Hive tables that have the connector=flink-managed property?

Author

Thanks for the review, @gguptp

Good question! The ManagedTable API was removed from Flink core in 2.0 (FLINK-36539), so this isn't specific to the connector; it's a Flink-wide change.

For existing tables with connector=flink-managed in the Hive metastore: the underlying data (files in HDFS/S3) is unaffected. Those tables remain readable as regular Hive tables. The connector property becomes inert. Flink 2.0 simply won't recognize it as a managed table and will treat it as a standard Hive table instead.

The only behavioral difference is that DROP TABLE on a previously managed table will no longer trigger Flink-side data cleanup. Users would need to clean up the data files manually or rely on Hive's native managed table behavior.

Since the ManagedTable feature was experimental and the removal was a deliberate Flink core decision, I think this is expected migration behavior rather than something the connector should try to paper over. We should update the Flink 2.0 migration guide to call out this change in managed-table behavior.
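
For users who want to find affected tables before migrating, a rough audit could look like the sketch below. Everything in it is hypothetical: the class `ManagedTableAudit`, the in-memory map standing in for table parameters fetched from the metastore, and the assumption that the property is stored under the plain key `connector` (the actual key in the metastore may carry a prefix and should be verified).

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical audit helper: finds tables that still carry the managed-table marker. */
final class ManagedTableAudit {

    // Key and value as quoted in the discussion; the stored key name is an assumption.
    static final String CONNECTOR_KEY = "connector";
    static final String MANAGED_VALUE = "flink-managed";

    /** Returns the names of tables whose parameters mark them as formerly Flink-managed. */
    static List<String> formerlyManaged(Map<String, Map<String, String>> tableParams) {
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, Map<String, String>> entry : tableParams.entrySet()) {
            if (MANAGED_VALUE.equals(entry.getValue().get(CONNECTOR_KEY))) {
                result.add(entry.getKey());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Stand-in data; a real audit would read parameters from the Hive metastore.
        Map<String, Map<String, String>> tables = new LinkedHashMap<>();
        tables.put("db.orders", Map.of("connector", "flink-managed"));
        tables.put("db.users", Map.of("connector", "hive"));
        tables.put("db.events", Map.of());
        System.out.println("Tables needing manual cleanup on DROP: " + formerlyManaged(tables));
    }
}
```

The tables reported this way are the ones for which DROP TABLE would no longer remove data files on the Flink side, so they are candidates for a note in the migration guide or for manual cleanup.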

Author

On the subject of migration, one thing that I want to highlight is that Flink 2 is incompatible with Hive 3. I have highlighted this in the description above and created a discussion thread on the mailing list. Reiterating it here for emphasis.

Flink 2 is on JDK 17. Hive 3 doesn't work on JDK 17. So, essentially, migrating to Flink 2.0 will leave users high and dry unless we add support for Hive 4. I have Hive 4 support in this PR.

Thanks! This makes sense. I will review the Hive 4 support PR as well.

@jlalwani-amazon jlalwani-amazon force-pushed the flink2-stabilization branch 2 times, most recently from 369faee to b9c1a08 Compare May 8, 2026 21:56
- Bump Flink version to 2.0.1 (2.0.0/2.0.2 binaries unavailable)
- Upgrade maven-shade-plugin to 3.6.0 (parquet 1.15.2 multi-release JARs)
- Update NOTICE file for parquet 1.13.1 -> 1.15.2
- Fix dependency convergence errors (pin Hive/Hadoop transitive deps)
- Fix ${flink.version} -> ${project.version} in SQL connector and e2e modules
- Remove hive3 from CI matrix (Hive 3.1 incompatible with Java 11+, HIVE-21584)
- Skip e2e tests on JDK 17 (HiveITCase timeouts)
- Add .mvn/jvm.config with --add-opens for java.security.jgss
- Update CI workflow for JDK 11/17 matrix
@jlalwani-amazon jlalwani-amazon force-pushed the flink2-stabilization branch from b9c1a08 to 17e8e2a Compare May 8, 2026 22:34
@jlalwani-amazon jlalwani-amazon changed the title from "[FLINK-37094]refactor: adapt Hive connector to Flink 2.0 API changes" to "[FLINK-37094][hive] Adapt Hive connector to Flink 2.0 API changes" on May 8, 2026
@jlalwani-amazon jlalwani-amazon force-pushed the flink2-stabilization branch from 8e50eea to fdd730e Compare May 8, 2026 23:44