HIVE-29417: Basic Iceberg table MSCK repair #6273

deniskuzZ · 2026-01-17T07:41:08Z

What changes were proposed in this pull request?

Remove dangling references to missing data files from the Iceberg table metadata.

Why are the changes needed?

Fixes java.io.FileNotFoundException: File does not exist errors thrown during table reads.

Does this PR introduce any user-facing change?

No

How was this patch tested?

mvn test -Dtest=TestIcebergCliDriver -Dqfile=iceberg_msck_repair.q
mvn test -Dtest=TestHiveIcebergRepairTable

ql/src/java/org/apache/hadoop/hive/ql/ddl/misc/msck/MsckOperation.java

Aggarwal-Raghav · 2026-01-20T16:00:30Z

Do we need to add this new action to MetastoreConf.ConfVars#TASK_THREADS_REMOTE_ONLY, or is that why the title contains the keyword Dummy?

iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergStorageHandler.java

ql/src/java/org/apache/hadoop/hive/ql/ddl/misc/msck/MsckResult.java

Aggarwal-Raghav · 2026-01-20T16:51:59Z

Otherwise LGTM

deniskuzZ · 2026-01-23T09:26:41Z

Thanks for the review, @Aggarwal-Raghav! I've been tied up with other tasks; I'll start addressing the comments now.

deniskuzZ · 2026-01-23T09:48:10Z

Do we need to add this new action to MetastoreConf.ConfVars#TASK_THREADS_REMOTE_ONLY, or is that why the title contains the keyword Dummy?

I don't think so. MSCK is a manual command that the user executes on demand.

Aggarwal-Raghav · 2026-01-24T06:43:24Z

LGTM +1. Verified the changes locally.

ayushtkn

Thanks, @deniskuzZ, for the initiative.

A couple of questions:

Is this operation reversible? For example, if I later restore my files, can I roll back to a previous snapshot and restore the table to its original state?
What happens if the manifest files go missing? How do we repair that?
We handle the missing DataFile scenario, but what about DeleteFiles/DVs?
Once a DataFile is dropped, what happens to the DeleteFiles/DVs associated with it?
I just skimmed the implementation: we are doing planTask on the entire table. Is that batched? I am concerned that at scale it could lead to OOM within HS2.
Did you explore reading the entries from the ALL_FILES metadata table, or another relevant one, rather than operating on the main table?
Should we also commit in batches? For example, once 1K missing files are found, pause, commit, and start again, to avoid memory pressure.
Fundamentally, I think we should discuss whether this belongs in MSCK or in an independent command under ALTER TABLE EXECUTE <SOME FANCY Thing>. I believe MSCK was meant to fix inconsistencies between the metadata and the actual data, for example after you ingested data. This is more like fixing the metadata after a data loss. It is debatable and opinions will differ, but we should think it over once. I know Spark handles such things via its DeleteFile action or something similar, but it doesn't find the missing files on its own.

If any of these cases are handled, we should extend the tests to cover them, if they don't already (I only had a very quick pass).

deniskuzZ · 2026-01-26T07:30:51Z

Is this operation reversible? For example, if I later restore my files, can I roll back to a previous snapshot and restore the table to its original state?

Yes, the operation is reversible. The repair creates a new snapshot whose manifests no longer reference the missing data files. Iceberg keeps the snapshot history, so you can roll back to any previous snapshot that has not been expired.
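
To illustrate why the repair is reversible, here is a minimal toy model in plain Java. It is not the Iceberg API and not the code in this PR: ToySnapshotTable, repair, and rollbackTo are hypothetical names. The only point it makes is that a repair appends a new snapshot and leaves the earlier ones untouched, so moving the current pointer back restores the old file list.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Predicate;

/** Toy model of snapshot history. NOT the Iceberg API; all names are hypothetical. */
final class ToySnapshotTable {
  // Each snapshot is an immutable set of referenced data file paths.
  private final List<Set<String>> snapshots = new ArrayList<>();
  private int current;

  ToySnapshotTable(Set<String> initialFiles) {
    snapshots.add(Set.copyOf(initialFiles));
    current = 0;
  }

  Set<String> currentFiles() {
    return snapshots.get(current);
  }

  int currentSnapshotId() {
    return current;
  }

  /** Appends a new snapshot that keeps only the files passing the existence check. */
  int repair(Predicate<String> exists) {
    Set<String> kept = new LinkedHashSet<>();
    for (String path : currentFiles()) {
      if (exists.test(path)) {
        kept.add(path);
      }
    }
    snapshots.add(Set.copyOf(kept)); // older snapshots are never modified
    current = snapshots.size() - 1;
    return current;
  }

  /** Moves the current pointer back to an earlier snapshot. */
  void rollbackTo(int snapshotId) {
    if (snapshotId < 0 || snapshotId >= snapshots.size()) {
      throw new IllegalArgumentException("Unknown snapshot: " + snapshotId);
    }
    current = snapshotId;
  }
}
```

In this model, calling repair with a check that reports b.parquet as missing yields a snapshot without it, and rollbackTo with the earlier id brings the reference back. A rollback is only useful once the underlying files have actually been restored; otherwise reads fail again.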

What happens if the Manifest* files go missing? How do we repair that.

This PR does not address missing manifest files. The current implementation follows the same basic repair functionality that Impala already implemented (commit fdad9d32041a736108b876704bd0354090a88d29), which focuses on detecting and removing references to missing data files.

Missing manifest files represent a more severe form of corruption that would require reconstructing metadata from available manifests or data files, which is beyond the scope of this basic repair functionality.

We handle the missing DataFile scenario, but what about DeleteFiles/DVs?
Once a DataFile is dropped, what happens to the DeleteFiles/DVs associated with it?

The repair operation cannot proceed if there are missing delete files. This is a limitation of Iceberg's DeleteFiles API, which only allows removing data files, not delete files or deletion vectors.

This aligns with the Impala implementation's scope and the fundamental constraints of the Iceberg DeleteFiles API.
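
The limitation described above can be sketched as a guard that runs during the scan. This is a hypothetical illustration, not the PR's code: the Task class stands in for a scan task, and failing fast with an IllegalStateException is an assumption about how "cannot proceed" might be surfaced.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

/** Hypothetical sketch: refuse to repair when a delete file is missing. */
final class DeleteFileGuardSketch {

  /** Stand-in for a scan task: one data file plus the delete files applied to it. */
  static final class Task {
    final String dataFile;
    final List<String> deleteFiles;

    Task(String dataFile, List<String> deleteFiles) {
      this.dataFile = dataFile;
      this.deleteFiles = deleteFiles;
    }
  }

  /** Returns the missing data files, or throws if any referenced delete file is missing. */
  static List<String> missingDataFiles(Iterable<Task> tasks, Predicate<String> exists) {
    List<String> missing = new ArrayList<>();
    for (Task task : tasks) {
      for (String deleteFile : task.deleteFiles) {
        if (!exists.test(deleteFile)) {
          // Assumption: the repair aborts because it can only remove data files.
          throw new IllegalStateException("Cannot repair: missing delete file " + deleteFile);
        }
      }
      if (!exists.test(task.dataFile)) {
        missing.add(task.dataFile);
      }
    }
    return missing;
  }
}
```

With this shape, a table whose only problem is a lost data file yields a list of paths to remove, while a table with a lost delete file is left untouched.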

I just skimmed the implementation: we are doing planTask on the entire table. Is that batched? I am concerned that at scale it could lead to OOM within HS2.

The planFiles() method returns a CloseableIterable, which is a lazy iterator that does not load all files into memory at once. This design prevents OOM issues even for very large tables.
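
As a rough illustration of the lazy-iteration argument, the sketch below uses only the Java standard library. It does not call Iceberg: LazyScanSketch, collectMissing, and lazyPaths are hypothetical names, and the generated paths are made up. It shows that when the source is consumed one element at a time, only the paths that fail the existence check are retained.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.IntStream;

/** Hypothetical sketch of scanning a lazily produced file listing. */
final class LazyScanSketch {

  /** Walks the paths one at a time and retains only the missing ones. */
  static List<String> collectMissing(Iterator<String> paths, Predicate<String> exists) {
    List<String> missing = new ArrayList<>();
    while (paths.hasNext()) {
      String path = paths.next(); // one path is materialized per step
      if (!exists.test(path)) {
        missing.add(path);
      }
    }
    return missing;
  }

  /** Lazily generates n made-up file paths; nothing is buffered up front. */
  static Iterator<String> lazyPaths(int n) {
    return IntStream.range(0, n)
        .mapToObj(i -> "data/file-" + i + ".parquet")
        .iterator();
  }
}
```

Under these assumptions, memory use grows with the number of missing files rather than with the total number of files scanned. With a real CloseableIterable, the scan would normally sit inside a try-with-resources block so the underlying resources are released.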

Did you explore reading the entries from the ALL_FILES metadata table, or another relevant one, rather than operating on the main table?

The repair operation needs to check the files referenced by the current table snapshot, which planFiles() provides directly.
The metadata table approach would add unnecessary complexity without a performance benefit.

Should we also commit in batches? For example, once 1K missing files are found, pause, commit, and start again, to avoid memory pressure.

Batching commits is not necessary for this implementation. The operation only stores file paths (strings) in memory, not file contents or metadata, making the memory footprint minimal.

Fundamentally, I think we should discuss whether this belongs in MSCK or in an independent command under ALTER TABLE EXECUTE <SOME FANCY Thing>. I believe MSCK was meant to fix inconsistencies between the metadata and the actual data, for example after you ingested data. This is more like fixing the metadata after a data loss. It is debatable and opinions will differ, but we should think it over once. I know Spark handles such things via its DeleteFile action or something similar, but it doesn't find the missing files on its own.

Integrating repair functionality into MSCK REPAIR TABLE is the appropriate design choice for Hive, as it aligns with MSCK's core purpose of synchronizing metadata with existing data files.

MSCK is designed to repair inconsistencies between the Hive Metastore and the actual data files in storage.
For ACID tables, MSCK already handles missing writes and metadata synchronization (see TestMSCKRepairOnAcid.java).
The repair operation fundamentally performs the same function: synchronizing table metadata with the actual state of files in storage.

sonarqubecloud · 2026-01-26T12:15:07Z

Quality Gate passed

Issues: 0 new issues, 0 accepted issues
Measures: 0 security hotspots, 0.0% coverage on new code, 0.0% duplication on new code

asf-ci-hive added the "tests pending" label on Jan 17, 2026

deniskuzZ changed the title from "Dummy Msck Repair for Iceberg tables" to "Dummy Iceberg table MSCK repair" on Jan 17, 2026

deniskuzZ force-pushed the dummy_iceberg_msck_repair branch from aa597e8 to 51dd092 on January 17, 2026 07:42

asf-ci-hive added the "tests failed" and "tests pending" labels and removed the "tests pending" and "tests failed" labels on Jan 17, 2026

deniskuzZ force-pushed the dummy_iceberg_msck_repair branch from 51dd092 to 9262146 on January 17, 2026 20:45

asf-ci-hive added the "tests pending" label and removed the "tests failed" label on Jan 17, 2026

deniskuzZ force-pushed the dummy_iceberg_msck_repair branch from 9262146 to bb38578 on January 17, 2026 20:46

asf-ci-hive added the "tests failed", "tests pending", and "tests unstable" labels and removed the "tests pending" and "tests failed" labels on Jan 17, 2026

deniskuzZ force-pushed the dummy_iceberg_msck_repair branch from bb38578 to d5f568c on January 18, 2026 09:55

asf-ci-hive added the "tests pending" and "tests unstable" labels and removed the "tests unstable" and "tests pending" labels on Jan 18, 2026

deniskuzZ force-pushed the dummy_iceberg_msck_repair branch from d5f568c to 7602e32 on January 19, 2026 09:12

asf-ci-hive added the "tests pending" and "tests passed" labels and removed the "tests unstable" and "tests pending" labels on Jan 19, 2026

Aggarwal-Raghav reviewed Jan 20, 2026

ql/src/java/org/apache/hadoop/hive/ql/ddl/misc/msck/MsckOperation.java (outdated, resolved)

Aggarwal-Raghav reviewed Jan 20, 2026

iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergStorageHandler.java (outdated, resolved)

Aggarwal-Raghav reviewed Jan 20, 2026

ql/src/java/org/apache/hadoop/hive/ql/ddl/misc/msck/MsckResult.java (outdated, resolved)

deniskuzZ closed this Jan 23, 2026

deniskuzZ reopened this Jan 23, 2026

deniskuzZ force-pushed the dummy_iceberg_msck_repair branch from 7602e32 to 740d4c1 on January 23, 2026 10:42

asf-ci-hive added the "tests pending" label and removed the "tests passed" label on Jan 23, 2026

deniskuzZ changed the title from "Dummy Iceberg table MSCK repair" to "HIVE-29417: Basic Iceberg table MSCK repair" on Jan 23, 2026

asf-ci-hive added the "tests passed" label and removed the "tests pending" label on Jan 23, 2026

Aggarwal-Raghav approved these changes Jan 24, 2026

ayushtkn reviewed Jan 25, 2026

Commit e063a9e: HIVE-29417: Basic Iceberg table MSCK repair

deniskuzZ force-pushed the dummy_iceberg_msck_repair branch from 740d4c1 to e063a9e on January 26, 2026 11:10

asf-ci-hive added the "tests pending" label and removed the "tests passed" label on Jan 26, 2026

asf-ci-hive added the "tests passed" label and removed the "tests pending" label on Jan 26, 2026
