feat: Add timestamp nanosecond primitive types by zhjwpku · Pull Request #653 · apache/iceberg-cpp

zhjwpku · 2026-05-17T04:27:45Z

No description provided.

zhjwpku · 2026-05-17T04:32:03Z

I chose TypeId::kTimestampNs over TypeId::kTimestampNano (Java uses Nano) to align with the spec. @evindj Please help review the timestamp parsing part when you have time. I changed the fractional seconds handling a bit.

wgtmac · 2026-05-20T09:19:12Z

+template <>
+int32_t HashLiteral<TypeId::kTimestampTzNs>(const Literal& literal) {
+  return BucketUtils::HashLong(std::get<int64_t>(literal.value()));
+}


According to the Iceberg V3 spec and the Java implementation (BucketTimestampNano.java), nanosecond timestamps must be converted to microseconds (divided by 1000) before hashing. This ensures that bucket partitioning is consistent between microsecond and nanosecond precision types for the same logical time.

return BucketUtils::HashLong(std::get<int64_t>(literal.value()) / 1000);

I think the original review comment contained a hallucination. Sorry about that.

The actual Java implementation is as follows:

private static class BucketTimestampNano extends Bucket<Long>
    implements SerializableFunction<Long, Integer> {
  private BucketTimestampNano(int numBuckets) {
    super(numBuckets);
  }

  @Override
  protected int hash(Long nanos) {
    return BucketUtil.hash(DateTimeUtil.nanosToMicros(nanos));
  }
}

We can see that it also calls floorDiv inside:

public static long nanosToMicros(long nanos) {
  return Math.floorDiv(nanos, NANOS_PER_MICRO);
}

So my original (AI) suggestion was wrong. Please follow the same approach to use floorDiv here.

Perhaps it is worth adding a dedicated utility class/file for temporal types just like Java for reuse.
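
As a rough illustration of what such a shared helper could look like, here is a minimal sketch. The namespace `temporal_sketch` and the names `FloorDiv`, `NanosToMicros`, and `kNanosPerMicro` are hypothetical and not taken from this PR; the only behavior assumed is the one quoted from Java above (floor division by 1000).

```cpp
#include <cstdint>

namespace temporal_sketch {  // hypothetical namespace, not part of the PR

constexpr int64_t kNanosPerMicro = 1000;

// Floor division: rounds toward negative infinity, mirroring Java's
// Math.floorDiv. Plain C++ `/` truncates toward zero instead.
// Callers here always pass a positive divisor.
constexpr int64_t FloorDiv(int64_t dividend, int64_t divisor) {
  int64_t quotient = dividend / divisor;
  // Step down by one when a nonzero remainder was discarded and the
  // operands have opposite signs (the true quotient was negative).
  if ((dividend % divisor != 0) && ((dividend < 0) != (divisor < 0))) {
    --quotient;
  }
  return quotient;
}

// Converts nanoseconds since the epoch to microseconds since the epoch.
constexpr int64_t NanosToMicros(int64_t nanos) {
  return FloorDiv(nanos, kNanosPerMicro);
}

}  // namespace temporal_sketch
```

Under this sketch, the nanosecond hash specializations would pass `NanosToMicros(std::get<int64_t>(literal.value()))` to `BucketUtils::HashLong`, so that a nanosecond value and the microsecond value for the same instant reach the hash function as the same number.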

wgtmac

Thanks for adding timestamp_ns and timestamptz_ns! Here are a few findings based on Java parity and the Iceberg Spec.

wgtmac · 2026-05-22T02:57:04Z

      return Literal::Date(std::get<int32_t>(days.value()));
    }
+    case TypeId::kTimestamp:
+      return source_is_nanos ? Literal::Timestamp(timestamp_val / 1000)


C++ integer division truncates toward zero, causing incorrect results for negative timestamps (pre-1970) not evenly divisible by 1000. Java uses Math.floorDiv. We should use a floor division helper here.
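
To make the discrepancy concrete, the following sketch compares the truncating conversion from the diff with a floor-based one. The function names `TruncatedMicros` and `FlooredMicros` are illustrative only and do not exist in the codebase.

```cpp
#include <cstdint>

// Truncating conversion, equivalent to `timestamp_val / 1000` in the diff.
constexpr int64_t TruncatedMicros(int64_t nanos) { return nanos / 1000; }

// Floor conversion: subtract one whenever a negative remainder was dropped.
constexpr int64_t FlooredMicros(int64_t nanos) {
  return nanos / 1000 - (nanos % 1000 < 0 ? 1 : 0);
}

// One nanosecond before the epoch lies in microsecond -1, not 0.
static_assert(TruncatedMicros(-1) == 0, "truncation rounds toward zero");
static_assert(FlooredMicros(-1) == -1, "floor rounds toward -infinity");

// Values evenly divisible by 1000, and all non-negative values, agree.
static_assert(TruncatedMicros(-2000) == FlooredMicros(-2000), "exact multiple");
static_assert(TruncatedMicros(1999) == FlooredMicros(1999), "positive value");
```

Only negative inputs with a nonzero remainder differ, which is why the bug would show up solely for pre-1970 timestamps with sub-microsecond precision.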

wgtmac · 2026-05-22T02:57:04Z

+    case TypeId::kTimestampNs:
+      return source_is_nanos ? Literal::TimestampNs(timestamp_val)
+                             : Literal::TimestampNs(timestamp_val * 1000);
+    case TypeId::kTimestampTzNs:


Casting from Timestamp(Ns) to TimestampTz(Ns) is allowed here, but Java (TimestampNanoLiteral.to(Type)) explicitly returns null (blocking this promotion) because timezone information is missing. Should we return NotSupported to match Java?
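
One way to express the stricter rule raised in this comment is a small predicate that rejects any cast crossing the zone boundary. This is a minimal sketch, assuming a hypothetical `SketchTypeId` enum and hypothetical helpers `HasZone` and `IsTimestampCastAllowed` (none of them are from the PR), and assuming the intended rule is simply that zoned and zone-less timestamp types never convert into each other. Whether that rule matches Java exactly is the open question in this thread.

```cpp
// Hypothetical stand-in for the four timestamp members of TypeId.
enum class SketchTypeId { kTimestamp, kTimestampTz, kTimestampNs, kTimestampTzNs };

// True for the types that carry "adjusted to UTC" semantics.
constexpr bool HasZone(SketchTypeId id) {
  return id == SketchTypeId::kTimestampTz || id == SketchTypeId::kTimestampTzNs;
}

// Allows precision changes (micros <-> nanos) but rejects any cast that
// would add or drop timezone semantics.
constexpr bool IsTimestampCastAllowed(SketchTypeId source, SketchTypeId target) {
  return HasZone(source) == HasZone(target);
}
```

A cast routine built on this predicate could return a NotSupported status when it yields false, instead of silently reinterpreting the stored value.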

wgtmac · 2026-05-22T02:57:04Z

+                              .expected_string = "1684137600000000001"},
+        BasicLiteralTestParam{.test_name = "TimestampTzNs",
+                              .literal = Literal::TimestampTzNs(1684137600000000001LL),
+                              .expected_type_id = TypeId::kTimestampTzNs,


Consider adding cast tests for TimestampNs and TimestampTzNs (e.g., from String, and cross-casting between TimestampNs and Timestamp), especially for negative timestamps, to ensure rounding parity with Java.

wgtmac · 2026-05-22T02:58:00Z

  kTimestamp,
  kTimestampTz,
+  kTimestampNs,
+  kTimestampTzNs,


Should we sort them as follows?

kTimestamp, kTimestampNs, kTimestampTz, kTimestampTzNs,

wgtmac · 2026-05-22T02:58:08Z

 class TimestampBase;
 class TimestampType;
 class TimestampTzType;
+class TimestampNsType;


wgtmac · 2026-05-22T05:43:28Z

+    case TypeId::kTimestampTzNs:
      return rhs == TypeId::kLong || rhs == TypeId::kTimestamp ||
-             rhs == TypeId::kTimestampTz;
+             rhs == TypeId::kTimestampTz || rhs == TypeId::kTimestampNs ||


This looks incorrect to me. Should we be strict and allow comparison only between identical types? It also looks dangerous to compare a timestamp value against a long value. Should we remove that support as well?
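
The risk can be shown with a small sketch: when a microsecond value and a nanosecond value are compared by their raw `int64_t` payloads, the ordering can be the opposite of the logical one. The `SketchTimestamp` struct and the helper functions below are hypothetical and exist only for illustration; overflow in the unit conversion is ignored.

```cpp
#include <cstdint>

// Hypothetical stand-in for a timestamp literal: a raw count plus its unit.
struct SketchTimestamp {
  int64_t value;
  bool is_nanos;
};

// Compares the stored integers directly, ignoring the unit.
constexpr bool RawLess(SketchTimestamp a, SketchTimestamp b) {
  return a.value < b.value;
}

// Converts to nanoseconds first (ignores overflow for brevity).
constexpr int64_t ToNanos(SketchTimestamp t) {
  return t.is_nanos ? t.value : t.value * 1000;
}

constexpr bool NormalizedLess(SketchTimestamp a, SketchTimestamp b) {
  return ToNanos(a) < ToNanos(b);
}

// Strict rule: only values with the same unit are comparable at all.
constexpr bool StrictlyComparable(SketchTimestamp a, SketchTimestamp b) {
  return a.is_nanos == b.is_nanos;
}

// 2 microseconds is later than 1500 nanoseconds, yet the raw compare says
// it is earlier.
static_assert(RawLess(SketchTimestamp{2, false}, SketchTimestamp{1500, true}),
              "raw compare ignores units");
static_assert(!NormalizedLess(SketchTimestamp{2, false}, SketchTimestamp{1500, true}),
              "unit-aware compare gives the logical order");
```

Rejecting mixed-type comparisons up front, as `StrictlyComparable` does here, avoids having to decide on a normalization rule inside the comparator.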

feat: Add timestamp nanosecond primitive types

342bdcb

zhjwpku requested a review from wgtmac May 17, 2026 04:33

wgtmac reviewed May 20, 2026

Comment thread src/iceberg/util/transform_util.cc Outdated

fix: bucket transform and Human readable timestamps

e16f2df

wgtmac reviewed May 22, 2026

wgtmac requested changes May 22, 2026
