From 773b089704ac4a469ef17b2cd389af4e2908b414 Mon Sep 17 00:00:00 2001
From: "Sung Yun (CODE SIGNING KEY)"
Date: Fri, 26 Jun 2026 15:03:21 -0400
Subject: [PATCH] POC: pin iceberg-testing conformance fixtures (submodule)

Add sungwy/iceberg-testing as a pinned git submodule with language-neutral
conformance fixtures (type strings, the Appendix B bucket hash, delete-file
decode). The harness that walks them is deferred (see conformance/README.md);
the PyIceberg fork has the worked example.
---
 .gitmodules           |  3 +++
 conformance/README.md | 26 ++++++++++++++++++++++++++
 iceberg-testing       |  1 +
 3 files changed, 30 insertions(+)
 create mode 100644 .gitmodules
 create mode 100644 conformance/README.md
 create mode 160000 iceberg-testing

diff --git a/.gitmodules b/.gitmodules
new file mode 100644
index 000000000..8b7b6e8b7
--- /dev/null
+++ b/.gitmodules
@@ -0,0 +1,3 @@
+[submodule "iceberg-testing"]
+	path = iceberg-testing
+	url = https://github.com/sungwy/iceberg-testing.git
diff --git a/conformance/README.md b/conformance/README.md
new file mode 100644
index 000000000..330fdaa7e
--- /dev/null
+++ b/conformance/README.md
@@ -0,0 +1,26 @@
+# Conformance fixtures (POC — submodule wired, test deferred)
+
+This pins [`sungwy/iceberg-testing`](https://github.com/sungwy/iceberg-testing) as a
+git submodule at `iceberg-testing/`: language-neutral conformance fixtures for
+Apache Iceberg, modeled on `apache/parquet-testing`. Three surfaces:
+
+- `table-spec/types/` — type-string parse + exact re-serialize (canonical)
+- `table-spec/transforms/bucket/` — the Appendix B 32-bit bucket hash, as a
+  Known-Answer-Test, including byte-boundary decimals and a non-BMP string
+- `table-spec/delete-formats/` — positional and equality delete-file decode by
+  field-id
+
+A C++ (`ctest`) harness that walks these fixtures is **not included in this POC**.
+It was prepared in an environment without a C++ toolchain, so rather than commit an
+unverified, possibly non-compiling test, the wiring is left as a TODO. The PyIceberg
+fork has the worked example, including how a consumer keeps a local staged-adoption
+list for cases it does not yet satisfy:
+https://github.com/sungwy/iceberg-python/pull/1
+
+To consume: `git submodule update --init`, then walk `iceberg-testing/table-spec/**`
+and apply each surface README's assertion (parse + re-serialize; compute the bucket
+hash and compare; decode delete-file columns by field-id).
+
+The bucket Known-Answer-Test is the highest-signal surface to wire first: the
+expected hashes are taken from the spec, and the byte-boundary decimal cases
+(`-1.28`, `-327.68`) are where minimal two's-complement encoding has diverged.
diff --git a/iceberg-testing b/iceberg-testing
new file mode 160000
index 000000000..0f4dcee7b
--- /dev/null
+++ b/iceberg-testing
@@ -0,0 +1 @@
+Subproject commit 0f4dcee7bf5d94e3f31101c55f4cb3288e10ab0a
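
Note on the bucket surface: since the README singles out the byte-boundary decimals (`-1.28`, `-327.68`), a small sketch of that computation may help whoever wires the harness. It is written in Python only because the worked example lives in the PyIceberg fork; it is not the code from that fork, and every helper name below (`murmur3_x86_32`, `minimal_twos_complement`, `hash_long`, `hash_string`, `hash_decimal_unscaled`) is hypothetical. It assumes the spec's Appendix B rules as I understand them: Murmur3 x86 32-bit with seed 0, longs hashed as 8 little-endian bytes, strings as UTF-8 bytes, and decimals as the minimal big-endian two's-complement bytes of the unscaled value. It does not read the fixture files, whose layout is defined by each surface README.

```python
import struct

MASK32 = 0xFFFFFFFF


def _rotl32(x: int, r: int) -> int:
    """Rotate a 32-bit value left by r bits."""
    return ((x << r) | (x >> (32 - r))) & MASK32


def murmur3_x86_32(data: bytes, seed: int = 0) -> int:
    """Murmur3 x86 32-bit, returned as a signed 32-bit integer."""
    c1, c2 = 0xCC9E2D51, 0x1B873593
    h = seed & MASK32
    nblocks = len(data) // 4

    # Body: mix each 4-byte little-endian block into the state.
    for i in range(nblocks):
        k = int.from_bytes(data[4 * i:4 * i + 4], "little")
        k = (k * c1) & MASK32
        k = _rotl32(k, 15)
        k = (k * c2) & MASK32
        h ^= k
        h = _rotl32(h, 13)
        h = (h * 5 + 0xE6546B64) & MASK32

    # Tail: the remaining 1 to 3 bytes, also read little-endian.
    tail = data[4 * nblocks:]
    if tail:
        k = int.from_bytes(tail, "little")
        k = (k * c1) & MASK32
        k = _rotl32(k, 15)
        k = (k * c2) & MASK32
        h ^= k

    # Finalization (fmix32).
    h ^= len(data)
    h ^= h >> 16
    h = (h * 0x85EBCA6B) & MASK32
    h ^= h >> 13
    h = (h * 0xC2B2AE35) & MASK32
    h ^= h >> 16
    return h - (1 << 32) if h & 0x80000000 else h


def minimal_twos_complement(unscaled: int) -> bytes:
    """Shortest big-endian two's-complement encoding of an integer.

    For negatives the width comes from the bit length of ~value, so that
    -128 fits in one byte (0x80) and -32768 in two (0x80 0x00). Sizing from
    the magnitude instead would emit a redundant leading 0xFF byte.
    """
    bits = (~unscaled if unscaled < 0 else unscaled).bit_length()
    return unscaled.to_bytes(bits // 8 + 1, "big", signed=True)


def hash_long(value: int) -> int:
    return murmur3_x86_32(struct.pack("<q", value))


def hash_string(value: str) -> int:
    # UTF-8 bytes, so a non-BMP code point contributes 4 bytes.
    return murmur3_x86_32(value.encode("utf-8"))


def hash_decimal_unscaled(unscaled: int) -> int:
    return murmur3_x86_32(minimal_twos_complement(unscaled))


if __name__ == "__main__":
    # -1.28 and -327.68 at scale 2 have unscaled values -128 and -32768.
    for unscaled in (-128, -32768):
        encoded = minimal_twos_complement(unscaled)
        print(unscaled, encoded.hex(), hash_decimal_unscaled(unscaled))
```

A quick sanity check before pointing this at the fixtures is to compare `hash_long(34)` and `hash_string("iceberg")` with the test values tabulated in Appendix B of the spec (2017239379 and 1210000089, if I am reading the table correctly). An encoder that pads `-128` to `ff80` will still hash without error, which is why a known-answer comparison is needed to catch it.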