Description
Search before asking
- I searched in the issues and found nothing similar.
Paimon version
Flink catalog:
```sql
CREATE CATALOG paimon_catalog WITH (
  'type' = 'paimon',
  'metastore' = 'hive',
  'uri' = 'thrift://xxx:9083',
  'warehouse' = 'jfs://poc-jfs/user/hive/lakehouse_paimon',
  'table-default.metadata.iceberg.storage' = 'hive-catalog',
  'table-default.metadata.iceberg.uri' = 'thrift://xxx:9083'
);
USE CATALOG paimon_catalog;
```
I'm using the following SQL to write data from Kafka:
```sql
insert into my_database.paimon_table
select *,
  DATE_FORMAT(SYSTEMDATE, 'yyyyMMdd') -- partition field: dt
from kafka
where `TIME` IS NOT NULL
;
```
Then I query the table with spark-sql:
```
spark-sql --conf spark.sql.catalog.spark_catalog=org.apache.iceberg.spark.SparkSessionCatalog \
  --conf spark.sql.catalog.spark_catalog.type=hive
```
```sql
select * from my_database.paimon_table where dt=20251231;
```
Spark scans the entire table to find the `dt=20251231` rows instead of reading only that partition.
I also noticed that `describe formatted my_database.paimon_table;` does not show the `# Metadata Columns` section that appears when the Paimon table is created with spark-sql. As the output below shows, there is no `# Metadata Columns` section, which causes filtering on the partition field to fail:
```
...
dt                              string          from deserializer
# Detailed Table Information
Catalog                         spark_catalog
Database                        paimon_flink1
Table                           zvos_flink_14_append2
Owner                           zoomspace
Created Time                    Thu Jan 22 17:40:42 CST 2026
Last Access                     Thu Jan 22 17:40:42 CST 2026
Created By                      Spark 2.2 or prior
Type                            MANAGED
Provider                        hive
Comment
Table Properties                [metadata.iceberg.storage=hive-catalog, metadata.iceberg.uri=thrift://xxxx:9083, metadata_location=jfs://poc-jfs/user/hive/lakehouse_paimon/iceberg/paimon_flink1/zvos_flink_14_append2/metadata/v3.metadata.json, partition=dt, previous_metadata_location=jfs://poc-jfs/user/hive/lakehouse_paimon/iceberg/paimon_flink1/zvos_flink_14_append2/metadata/v2.metadata.json, storage_handler=org.apache.paimon.hive.PaimonStorageHandler, table_type=PAIMON, transient_lastDdlTime=1769074842]
Statistics                      87763321 bytes
Location                        jfs://poc-jfs/user/hive/lakehouse_paimon/paimon_flink1.db/zvos_flink_14_append2
Serde Library                   org.apache.paimon.hive.PaimonSerDe
InputFormat                     org.apache.paimon.hive.mapred.PaimonInputFormat
OutputFormat                    org.apache.paimon.hive.mapred.PaimonOutputFormat
```
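One way to narrow this down is to compare Spark's physical plans for the query. This is a diagnostic sketch, not a confirmed cause: because `describe formatted` reports `dt` as `string`, the unquoted literal `20251231` may lead Spark to add an implicit cast on the partition column, which can keep the predicate from being pushed into the scan. Comparing it with a quoted literal should show whether that matters here. The plan also shows which reader Spark picked for the table (for example an Iceberg batch scan or a Hive table scan), which seems relevant given that the table properties list `table_type=PAIMON` and the Paimon storage handler.

```sql
-- Run in the same spark-sql session (SparkSessionCatalog, type=hive) as above.
-- 1) Unquoted integer literal, exactly as in the original query.
--    dt is a string column, so Spark may wrap it in a cast here.
EXPLAIN FORMATTED
SELECT * FROM my_database.paimon_table WHERE dt = 20251231;

-- 2) Quoted string literal matching the declared column type.
EXPLAIN FORMATTED
SELECT * FROM my_database.paimon_table WHERE dt = '20251231';
```

If the scan node lists a `dt` filter only for the quoted form, that would suggest the full scan comes from the predicate's type rather than from the missing `# Metadata Columns` section.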
Compute Engine
Paimon version: 1.4.1 (snapshot)
Write: Flink 1.20.1 on YARN, with the JuiceFS file system
Read: Spark 3.5.2, Iceberg 1.6.1
Minimal reproduce step
1. Use Flink to write a table partitioned by a key, with Iceberg metadata enabled.
2. Query it from Spark with `where partition-key=xxxx`.
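A minimal Flink SQL reproduction along these lines might look like the following. It is a sketch under assumptions: the table name `repro_partitioned` and its columns are hypothetical, and it assumes the `CREATE CATALOG` / `USE CATALOG` statements above have already run, so the `metadata.iceberg.*` options are inherited from the catalog's `table-default.*` settings. Batch mode is used so the job commits (and Paimon generates the Iceberg metadata) when the bounded insert finishes, instead of waiting for a checkpoint.

```sql
-- Assumes: USE CATALOG paimon_catalog; has been executed in the Flink SQL client.
SET 'execution.runtime-mode' = 'batch';

-- Hypothetical table; metadata.iceberg.storage / metadata.iceberg.uri come from
-- the catalog's table-default.* options.
CREATE TABLE my_database.repro_partitioned (
  id BIGINT,
  payload STRING,
  dt STRING
) PARTITIONED BY (dt);

-- Write two partitions so a full scan can be told apart from a pruned one.
INSERT INTO my_database.repro_partitioned VALUES
  (1, 'a', '20251230'),
  (2, 'b', '20251231');
```

Querying `my_database.repro_partitioned` from spark-sql with the same session catalog settings and a `dt` filter should then show whether only the `dt=20251231` partition is read.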
What doesn't meet your expectations?
Filtering with `where partition-key=xxxx` should scan only the matching partition path, not all of the table's data.
Anything else?
No response
Are you willing to submit a PR?
- I'm willing to submit a PR!