feat: Add timestamp nanosecond primitive types#653
Conversation
|
I chose TypeId::kTimestampNs over TypeId::kTimestampNano (Java uses Nano) to align with the spec. @evindj Please help review the timestamp parsing part when you have time. I changed the fractional seconds handling a bit. |
| template <> | ||
| int32_t HashLiteral<TypeId::kTimestampTzNs>(const Literal& literal) { | ||
| return BucketUtils::HashLong(std::get<int64_t>(literal.value())); | ||
| } |
There was a problem hiding this comment.
According to the Iceberg V3 spec and the Java implementation (BucketTimestampNano.java), nanosecond timestamps must be converted to microseconds (divided by 1000) before hashing. This ensures that bucket partitioning is consistent between microsecond and nanosecond precision types for the same logical time.
return BucketUtils::HashLong(std::get<int64_t>(literal.value()) / 1000);There was a problem hiding this comment.
I think the original review comment had hallucination. Sorry about that.
The actual workflow is as below:
private static class BucketTimestampNano extends Bucket<Long>
implements SerializableFunction<Long, Integer> {
private BucketTimestampNano(int numBuckets) {
super(numBuckets);
}
@Override
protected int hash(Long nanos) {
return BucketUtil.hash(DateTimeUtil.nanosToMicros(nanos));
}
}
We can see that it also calls floorDiv inside:
public static long nanosToMicros(long nanos) {
return Math.floorDiv(nanos, NANOS_PER_MICRO);
}
So my original (AI) suggestion was wrong. Please follow the same approach to use floorDiv here.
There was a problem hiding this comment.
Perhaps it is worth adding a dedicated utility class/file for temporal types just like Java for reuse.
wgtmac
left a comment
There was a problem hiding this comment.
Thanks for adding timestamp_ns and timestamptz_ns! Here are a few findings based on Java parity and the Iceberg Spec.
| return Literal::Date(std::get<int32_t>(days.value())); | ||
| } | ||
| case TypeId::kTimestamp: | ||
| return source_is_nanos ? Literal::Timestamp(timestamp_val / 1000) |
There was a problem hiding this comment.
C++ integer division truncates toward zero, causing incorrect results for negative timestamps (pre-1970) not evenly divisible by 1000. Java uses Math.floorDiv. We should use a floor division helper here.
| case TypeId::kTimestampNs: | ||
| return source_is_nanos ? Literal::TimestampNs(timestamp_val) | ||
| : Literal::TimestampNs(timestamp_val * 1000); | ||
| case TypeId::kTimestampTzNs: |
There was a problem hiding this comment.
Casting from Timestamp(Ns) to TimestampTz(Ns) is allowed here, but Java (TimestampNanoLiteral.to(Type)) explicitly returns null (blocking this promotion) because timezone information is missing. Should we return NotSupported to match Java?
| .expected_string = "1684137600000000001"}, | ||
| BasicLiteralTestParam{.test_name = "TimestampTzNs", | ||
| .literal = Literal::TimestampTzNs(1684137600000000001LL), | ||
| .expected_type_id = TypeId::kTimestampTzNs, |
There was a problem hiding this comment.
Consider adding cast tests for TimestampNs and TimestampTzNs (e.g., from String, and cross-casting between TimestampNs and Timestamp), especially for negative timestamps, to ensure rounding parity with Java.
| kTimestamp, | ||
| kTimestampTz, | ||
| kTimestampNs, | ||
| kTimestampTzNs, |
There was a problem hiding this comment.
Should we sort them as
kTimestamp,
kTimestampNs,
kTimestampTz,
kTimestampTzNs,
| class TimestampBase; | ||
| class TimestampType; | ||
| class TimestampTzType; | ||
| class TimestampNsType; |
| case TypeId::kTimestampTzNs: | ||
| return rhs == TypeId::kLong || rhs == TypeId::kTimestamp || | ||
| rhs == TypeId::kTimestampTz; | ||
| rhs == TypeId::kTimestampTz || rhs == TypeId::kTimestampNs || |
There was a problem hiding this comment.
This looks incorrect to me. Should we be strict that only identical types are allowed to compare? It looks also dangerous to compare a timestamp value against a long value. Should we remove that support as well?
| template <> | ||
| int32_t HashLiteral<TypeId::kTimestampTzNs>(const Literal& literal) { | ||
| return BucketUtils::HashLong(std::get<int64_t>(literal.value())); | ||
| } |
There was a problem hiding this comment.
I think the original review comment had hallucination. Sorry about that.
The actual workflow is as below:
private static class BucketTimestampNano extends Bucket<Long>
implements SerializableFunction<Long, Integer> {
private BucketTimestampNano(int numBuckets) {
super(numBuckets);
}
@Override
protected int hash(Long nanos) {
return BucketUtil.hash(DateTimeUtil.nanosToMicros(nanos));
}
}
We can see that it also calls floorDiv inside:
public static long nanosToMicros(long nanos) {
return Math.floorDiv(nanos, NANOS_PER_MICRO);
}
So my original (AI) suggestion was wrong. Please follow the same approach to use floorDiv here.
| template <> | ||
| int32_t HashLiteral<TypeId::kTimestampTzNs>(const Literal& literal) { | ||
| return BucketUtils::HashLong(std::get<int64_t>(literal.value())); | ||
| } |
There was a problem hiding this comment.
Perhaps it is worth adding a dedicated utility class/file for temporal types just like Java for reuse.
No description provided.