Skip to content

feat: Add timestamp nanosecond primitive types#653

Open
zhjwpku wants to merge 2 commits into
apache:mainfrom
zhjwpku:iceberg-v3-timestamp_ns-timestamptz_ns_types
Open

feat: Add timestamp nanosecond primitive types#653
zhjwpku wants to merge 2 commits into
apache:mainfrom
zhjwpku:iceberg-v3-timestamp_ns-timestamptz_ns_types

Conversation

@zhjwpku
Copy link
Copy Markdown
Collaborator

@zhjwpku zhjwpku commented May 17, 2026

No description provided.

@zhjwpku
Copy link
Copy Markdown
Collaborator Author

zhjwpku commented May 17, 2026

I chose TypeId::kTimestampNs over TypeId::kTimestampNano (Java uses Nano) to align with the spec. @evindj Please help review the timestamp parsing part when you have time. I changed the fractional seconds handling a bit.

@zhjwpku zhjwpku requested a review from wgtmac May 17, 2026 04:33
template <>
int32_t HashLiteral<TypeId::kTimestampTzNs>(const Literal& literal) {
return BucketUtils::HashLong(std::get<int64_t>(literal.value()));
}
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

According to the Iceberg V3 spec and the Java implementation (BucketTimestampNano.java), nanosecond timestamps must be converted to microseconds (divided by 1000) before hashing. This ensures that bucket partitioning is consistent between microsecond and nanosecond precision types for the same logical time.

return BucketUtils::HashLong(std::get<int64_t>(literal.value()) / 1000);

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the original review comment had hallucination. Sorry about that.

The actual workflow is as below:

  private static class BucketTimestampNano extends Bucket<Long>
      implements SerializableFunction<Long, Integer> {

    private BucketTimestampNano(int numBuckets) {
      super(numBuckets);
    }

    @Override
    protected int hash(Long nanos) {
      return BucketUtil.hash(DateTimeUtil.nanosToMicros(nanos));
    }
  }

We can see that it also calls floorDiv inside:

  public static long nanosToMicros(long nanos) {
    return Math.floorDiv(nanos, NANOS_PER_MICRO);
  }

So my original (AI) suggestion was wrong. Please follow the same approach to use floorDiv here.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps it is worth adding a dedicated utility class/file for temporal types just like Java for reuse.

Comment thread src/iceberg/util/transform_util.cc Outdated
Copy link
Copy Markdown
Member

@wgtmac wgtmac left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for adding timestamp_ns and timestamptz_ns! Here are a few findings based on Java parity and the Iceberg Spec.

return Literal::Date(std::get<int32_t>(days.value()));
}
case TypeId::kTimestamp:
return source_is_nanos ? Literal::Timestamp(timestamp_val / 1000)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

C++ integer division truncates toward zero, causing incorrect results for negative timestamps (pre-1970) not evenly divisible by 1000. Java uses Math.floorDiv. We should use a floor division helper here.

case TypeId::kTimestampNs:
return source_is_nanos ? Literal::TimestampNs(timestamp_val)
: Literal::TimestampNs(timestamp_val * 1000);
case TypeId::kTimestampTzNs:
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Casting from Timestamp(Ns) to TimestampTz(Ns) is allowed here, but Java (TimestampNanoLiteral.to(Type)) explicitly returns null (blocking this promotion) because timezone information is missing. Should we return NotSupported to match Java?

.expected_string = "1684137600000000001"},
BasicLiteralTestParam{.test_name = "TimestampTzNs",
.literal = Literal::TimestampTzNs(1684137600000000001LL),
.expected_type_id = TypeId::kTimestampTzNs,
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider adding cast tests for TimestampNs and TimestampTzNs (e.g., from String, and cross-casting between TimestampNs and Timestamp), especially for negative timestamps, to ensure rounding parity with Java.

Comment thread src/iceberg/type_fwd.h
Comment on lines 47 to +50
kTimestamp,
kTimestampTz,
kTimestampNs,
kTimestampTzNs,
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we sort them as

  kTimestamp,
  kTimestampNs,
  kTimestampTz,
  kTimestampTzNs,

Comment thread src/iceberg/type_fwd.h
class TimestampBase;
class TimestampType;
class TimestampTzType;
class TimestampNsType;
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto

case TypeId::kTimestampTzNs:
return rhs == TypeId::kLong || rhs == TypeId::kTimestamp ||
rhs == TypeId::kTimestampTz;
rhs == TypeId::kTimestampTz || rhs == TypeId::kTimestampNs ||
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks incorrect to me. Should we be strict that only identical types are allowed to compare? It looks also dangerous to compare a timestamp value against a long value. Should we remove that support as well?

template <>
int32_t HashLiteral<TypeId::kTimestampTzNs>(const Literal& literal) {
return BucketUtils::HashLong(std::get<int64_t>(literal.value()));
}
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the original review comment had hallucination. Sorry about that.

The actual workflow is as below:

  private static class BucketTimestampNano extends Bucket<Long>
      implements SerializableFunction<Long, Integer> {

    private BucketTimestampNano(int numBuckets) {
      super(numBuckets);
    }

    @Override
    protected int hash(Long nanos) {
      return BucketUtil.hash(DateTimeUtil.nanosToMicros(nanos));
    }
  }

We can see that it also calls floorDiv inside:

  public static long nanosToMicros(long nanos) {
    return Math.floorDiv(nanos, NANOS_PER_MICRO);
  }

So my original (AI) suggestion was wrong. Please follow the same approach to use floorDiv here.

template <>
int32_t HashLiteral<TypeId::kTimestampTzNs>(const Literal& literal) {
return BucketUtils::HashLong(std::get<int64_t>(literal.value()));
}
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps it is worth adding a dedicated utility class/file for temporal types just like Java for reuse.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants