diff --git a/website/docs/glue-catalog.md b/website/docs/glue-catalog.md
index 6d1388c96..d7a23f38c 100644
--- a/website/docs/glue-catalog.md
+++ b/website/docs/glue-catalog.md
@@ -99,6 +99,7 @@ From your terminal, create a glue database.
 aws glue create-database --database-input "{\"Name\":\"xtable_synced_db\"}"
 ```
 
+#### Method 1: Using Glue Crawler
 From your terminal, create a glue crawler. Modify the ``, `` and ``, with appropriate values.
@@ -149,6 +150,47 @@ From your terminal, run the glue crawler.
 Once the crawler succeeds, you’ll be able to query this Iceberg table from Athena, EMR and/or Redshift query engines.
+
+#### Method 2: Using XTable APIs to sync with the AWS Glue Data Catalog directly
+This method applies only to the Iceberg target format.
+
+**Prerequisites:**
+* Download `iceberg-aws-X.X.X.jar` from the [Maven repository](https://mvnrepository.com/artifact/org.apache.iceberg/iceberg-aws)
+* Download `bundle-X.X.X.jar` from the [Maven repository](https://mvnrepository.com/artifact/software.amazon.awssdk/bundle)
+
+Create a `glue-sync-config.yaml` file:
+
+```yaml md title="yaml"
+sourceFormat: HUDI|DELTA # choose only one
+targetFormats:
+  - ICEBERG
+datasets:
+  -
+    tableBasePath: s3://path/to/source/data
+    tableName: table_name
+    partitionSpec: partitionpath:VALUE
+    namespace: xtable_synced_db
+```
+
+Create a `glue-sync-catalog.yaml` file:
+
+```yaml md title="yaml"
+catalogImpl: org.apache.iceberg.aws.glue.GlueCatalog
+catalogName: 
+catalogOptions:
+  io-impl: org.apache.iceberg.aws.s3.S3FileIO
+  warehouse: s3://path/to/source
+```
+
+Sample command to sync the table with the AWS Glue Data Catalog:
+
+```shell md title="shell"
+java -cp /path/to/xtable-utilities-0.2.0-SNAPSHOT-bundled.jar:/path/to/iceberg-aws-1.3.1.jar:/path/to/bundle-2.23.9.jar org.apache.xtable.utilities.RunSync --datasetConfig glue-sync-config.yaml --icebergCatalogConfig glue-sync-catalog.yaml
+```
+### Validating the results
+Once the sync completes (or, for the Glue Crawler method, once the crawler succeeds), you can inspect the catalogued tables in Glue
+and query the table in Amazon Athena as shown below:
+
-### Validating the results
-After the crawler runs successfully, you can inspect the catalogued tables in Glue
-and also query the table in Amazon Athena like below:
-
 ```sql
 SELECT * FROM xtable_synced_db.;
 ```
@@ -180,9 +218,7 @@ SELECT * FROM xtable_synced_db.;
-### Validating the results
-After the crawler runs successfully, you can inspect the catalogued tables in Glue
-and also query the table in Amazon Athena like below:
+
 
 ```sql
 SELECT * FROM xtable_synced_db.;
diff --git a/website/docs/snowflake.md b/website/docs/snowflake.md
index 882f89963..d0da25eae 100644
--- a/website/docs/snowflake.md
+++ b/website/docs/snowflake.md
@@ -8,11 +8,6 @@ title: "Snowflake"
 Currently, Snowflake supports [Iceberg tables through External Tables](https://www.snowflake.com/blog/expanding-the-data-cloud-with-apache-iceberg/) and also [Native Iceberg Tables](https://www.snowflake.com/blog/iceberg-tables-powering-open-standards-with-snowflake-innovations/).
 
-:::note NOTE:
-Iceberg on Snowflake is currently supported in
-[public preview](https://www.snowflake.com/blog/build-open-data-lakehouse-iceberg-tables/)
-:::
-
 ## Steps:
 These are high level steps to help you integrate Apache XTable™ (Incubating) synced Iceberg tables on Snowflake. For more additional information refer to the [Getting started with Iceberg tables](https://docs.snowflake.com/LIMITEDACCESS/iceberg-2023/tables-iceberg-getting-started).
@@ -47,7 +42,7 @@ TABLE_FORMAT=ICEBERG
 ENABLED=TRUE;
 ```
 
-### Create an Iceberg table from Iceberg metadata in object storage
+### Method 1: Create an Iceberg table from Iceberg metadata in object storage
 
 Refer to additional [examples](https://docs.snowflake.com/LIMITEDACCESS/iceberg-2023/create-iceberg-table#examples) in the Snowflake Create Iceberg Table guide for more information.
@@ -58,4 +53,45 @@ CATALOG=
 METADATA_FILE_PATH='path/to/metadata/.metadata.json';
 ```
 
-Once the table creation succeeds you can start using the Iceberg table as any other table in Snowflake.
\ No newline at end of file
+Once the table creation succeeds, you can start using the Iceberg table like any other table in Snowflake.
+
+### Method 2: Using XTable APIs to sync with Snowflake Catalog directly
+
+#### Prerequisites:
+
+* Build Apache XTable™ (Incubating) from [source](https://github.com/apache/incubator-xtable)
+* Download `iceberg-aws-X.X.X.jar` from the [Maven repository](https://mvnrepository.com/artifact/org.apache.iceberg/iceberg-aws)
+* Download `bundle-X.X.X.jar` from the [Maven repository](https://mvnrepository.com/artifact/software.amazon.awssdk/bundle)
+* Download `iceberg-spark-runtime-3.X_2.12-X.X.X.jar` from [Maven Central](https://repo1.maven.org/maven2/org/apache/iceberg/iceberg-spark-runtime-3.2_2.12/1.4.2/)
+* Download `snowflake-jdbc-X.X.X.jar` from the [Maven repository](https://mvnrepository.com/artifact/net.snowflake/snowflake-jdbc)
+
+Create a `snowflake-sync-config.yaml` file:
+
+```yaml md title="yaml"
+sourceFormat: DELTA
+targetFormats:
+  - ICEBERG
+datasets:
+  -
+    tableBasePath: s3://path/to/table
+    tableName: 
+    namespace: .
+```
+
+Create a `snowflake-sync-catalog.yaml` file:
+
+```yaml md title="yaml"
+catalogImpl: org.apache.iceberg.snowflake.SnowflakeCatalog
+catalogName: 
+catalogOptions:
+  io-impl: org.apache.iceberg.aws.s3.S3FileIO
+  warehouse: s3://path/to/table
+  uri: jdbc:snowflake://.snowflakecomputing.com
+  jdbc.user: 
+  jdbc.password: 
+```
+
+Sample command to sync the table with Snowflake:
+```shell md title="shell"
+java -cp /path/to/iceberg-spark-runtime-3.2_2.12-1.4.2.jar:/path/to/xtable-utilities-0.2.0-SNAPSHOT-bundled.jar:/path/to/snowflake-jdbc-3.13.28.jar:/path/to/iceberg-aws-1.4.2.jar:/path/to/bundle-2.23.9.jar org.apache.xtable.utilities.RunSync --datasetConfig snowflake-sync-config.yaml --icebergCatalogConfig snowflake-sync-catalog.yaml
+```
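Both "Method 2" flows in this change launch the same `org.apache.xtable.utilities.RunSync` entry point and differ only in the jars on the classpath and the two YAML files passed via `--datasetConfig` and `--icebergCatalogConfig`. The following is a minimal shell sketch, not something shipped with XTable, that checks each jar exists before joining them into a classpath. The function names `build_classpath` and `run_sync` and the `DRY_RUN` variable are hypothetical helpers introduced only for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not part of Apache XTable) for launching RunSync.

# Join the given jar paths with ':' into a Java classpath,
# failing early if any jar is missing on disk.
build_classpath() {
  local cp="" jar
  for jar in "$@"; do
    if [ ! -f "$jar" ]; then
      echo "missing jar: $jar" >&2
      return 1
    fi
    # Append with a ':' separator, except before the first entry.
    cp="${cp:+$cp:}$jar"
  done
  printf '%s\n' "$cp"
}

# Usage: run_sync <dataset-config.yaml> <catalog-config.yaml> <jar>...
# With DRY_RUN=1 the java command is printed instead of executed.
run_sync() {
  local dataset_config="$1" catalog_config="$2"
  shift 2
  local classpath
  classpath="$(build_classpath "$@")" || return 1
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "java -cp $classpath org.apache.xtable.utilities.RunSync --datasetConfig $dataset_config --icebergCatalogConfig $catalog_config"
  else
    java -cp "$classpath" org.apache.xtable.utilities.RunSync \
      --datasetConfig "$dataset_config" \
      --icebergCatalogConfig "$catalog_config"
  fi
}
```

For example, after sourcing the file, `DRY_RUN=1 run_sync glue-sync-config.yaml glue-sync-catalog.yaml` followed by the three jar paths from the Glue command prints the assembled command for review; dropping `DRY_RUN=1` runs it. Checking the jars up front surfaces a wrong download path as a clear error instead of a `ClassNotFoundException` from the JVM.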