add: Metadata at Scale blog by Arnav Borkar (2025-10-16) #5

Open
ArnavBorkar wants to merge 1 commit into AlexMercedCoder:main from ArnavBorkar:main

Conversation

@ArnavBorkar

Blog Details

Title: Metadata at Scale: Tackling Apache Iceberg Tables with Tens of Millions of Files
Date: 2025-10-16
URL: https://www.e6data.com/blog/apache-iceberg-million-files-metadata
Company: e6data
Author: Arnav Borkar

Summary

This blog post explores how to handle Apache Iceberg tables with tens of millions of files (45M+ files, ~5 TB of metadata). It covers:

  • Streaming metadata reads to prevent OOM crashes
  • Layered pruning at partition and file levels
  • Treating metadata as first-class data for query optimization
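
As a rough illustration of the first two points, here is a minimal Python sketch of streaming plus layered pruning. It is not taken from the blog and does not use Iceberg's API: `Manifest`, `DataFile`, `plan_files`, and `demo_manifests` are hypothetical names, and a single integer min/max range stands in for real column statistics. Manifests are consumed one at a time from an iterator, so the full file list is never held in memory; whole manifests are skipped at the partition level before any per-file bounds are checked.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, Tuple


@dataclass(frozen=True)
class DataFile:
    """Hypothetical stand-in for one data file entry and its column bounds."""
    path: str
    lower: int  # minimum value of the filter column in this file
    upper: int  # maximum value of the filter column in this file


@dataclass(frozen=True)
class Manifest:
    """Hypothetical stand-in for a manifest covering a single partition."""
    partition: str
    files: Tuple[DataFile, ...]


def plan_files(manifests: Iterable[Manifest], partition: str, value: int) -> Iterator[DataFile]:
    """Lazily yield only the files that may contain `value` in `partition`."""
    for manifest in manifests:  # streamed: one manifest in memory at a time
        if manifest.partition != partition:
            continue  # layer 1: partition-level pruning skips the whole manifest
        for data_file in manifest.files:
            if data_file.lower <= value <= data_file.upper:
                yield data_file  # layer 2: file-level pruning on min/max bounds


def demo_manifests() -> Iterator[Manifest]:
    """Generator standing in for a streamed metadata read (assumed data)."""
    yield Manifest("2025-10-15", (DataFile("a.parquet", 0, 9),))
    yield Manifest("2025-10-16", (DataFile("b.parquet", 0, 9),
                                  DataFile("c.parquet", 10, 19)))


if __name__ == "__main__":
    for data_file in plan_files(demo_manifests(), "2025-10-16", 12):
        print(data_file.path)
```

Because `plan_files` is a generator, a caller can start scheduling scan tasks as soon as the first matching file is produced instead of waiting for the whole plan to be built.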

Tags: Apache Iceberg, Apache Spark, Trino

@netlify

netlify bot commented Jan 13, 2026

Deploy Preview for lakehouseblogscom ready!

Name Link
🔨 Latest commit 4bfeacb
🔍 Latest deploy log https://app.netlify.com/projects/lakehouseblogscom/deploys/696639a4a8ab570008d13138
😎 Deploy Preview https://deploy-preview-5--lakehouseblogscom.netlify.app