Conversation

Copilot AI commented Dec 27, 2025

Pre-review checklist

  • I have split my patch into logically separate commits.
  • All commit messages clearly explain what they change and why.
  • I added relevant tests for new features and bug fixes.
  • All commits compile, pass static checks, and pass tests.
  • PR description sums up the changes and reasons why they should be introduced.
  • I have provided docstrings for the public items that I want to introduce.
  • I have adjusted the documentation in ./docs/source/.

Description

CQL timestamp values (signed milliseconds since the epoch) lose precision when they are more than ~300 years from the Unix epoch due to floating-point conversions during serialization/deserialization. As a result, a timestamp retrieved from the database cannot always be used to query for the same row.

Problem

The driver was converting int64 milliseconds to floating-point values during serialization/deserialization, causing precision loss for timestamps more than ~300 years from the Unix epoch (roughly the point at which the microsecond count exceeds 2^53, the largest integer a double can represent exactly).

Example scenario:

  • Insert timestamp "2300-01-01 00:00:00.001"
  • Retrieve it back: "2300-01-01 00:00:00.000999" (precision lost)
  • Use retrieved value in a query: fails because it's serialized as "2300-01-01 00:00:00.000"
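
The drift can be reproduced without touching a database. A minimal sketch (the millisecond value is assumed to encode 2300-01-01 00:00:00.001 UTC; EPOCH is a local helper, not driver code):

    from datetime import datetime, timedelta

    EPOCH = datetime(1970, 1, 1)
    ms = 10413792000001  # assumed: 2300-01-01 00:00:00.001 UTC as milliseconds since the epoch

    # Float path (simplified model of the old behaviour): milliseconds -> float seconds.
    # 10413792000.001 has no exact double representation, so the microsecond field drifts,
    # matching the .000999 in the issue log.
    print(EPOCH + timedelta(seconds=ms / 1000.0))   # 2300-01-01 00:00:00.000999

    # Integer path: timedelta stores whole microseconds, so nothing is lost.
    print(EPOCH + timedelta(milliseconds=ms))       # 2300-01-01 00:00:00.001000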

Root Causes

  1. Deserialization divided the int64 millisecond count by 1000.0 (a float), losing precision (illustrated below)
  2. Serialization used floating-point arithmetic (*1e3, /1e3) to compute the millisecond count
  3. Both the Python and Cython code paths were affected
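
Root cause 1 can be seen in isolation: the double produced by the division already misses the intended value before any datetime is built (same assumed millisecond value as above):

    from decimal import Decimal

    ms = 10413792000001
    # Exact value held by the double: ~10413792000.0009994506..., not ...792000.001
    print(Decimal(ms / 1000.0))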

Solution

Implemented integer-only arithmetic throughout the timestamp conversion path:

  1. Added new conversion functions:

    • datetime_from_timestamp_ms() in cassandra/util.py (Python path)
    • datetime_from_timestamp_ms() in cassandra/cython_utils.pyx (Cython path)
    • Both functions use integer arithmetic exclusively to avoid precision loss (a sketch of the approach follows this list)
  2. Updated DateType serialization/deserialization:

    • cassandra/cqltypes.py: Changed DateType.deserialize() to call the new function directly on milliseconds (no division)
    • cassandra/cqltypes.py: Changed DateType.serialize() to use integer operations (*1000, //1000) instead of float operations (*1e3, /1e3)
    • cassandra/deserializers.pyx: Updated Cython deserializer to use the new integer-based function
  3. Added comprehensive test coverage:

    • Created tests/unit/test_timestamp_precision.py with 4 tests:
      • Large timestamp roundtrip (year 2300)
      • Various large timestamps including negative values (year 1640 BCE)
      • Small timestamps to ensure no regression
      • The exact scenario from the issue report
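
A minimal sketch of the integer-only conversion. The name datetime_from_timestamp_ms matches the new helper described above; the reverse helper name timestamp_ms_from_datetime, the naive EPOCH constant, and the exact code are illustrative, not the driver's implementation:

    from datetime import datetime, timedelta

    EPOCH = datetime(1970, 1, 1)  # naive UTC epoch, kept simple for the sketch

    def datetime_from_timestamp_ms(timestamp_ms):
        # timedelta with an int argument uses exact integer arithmetic internally.
        return EPOCH + timedelta(milliseconds=timestamp_ms)

    def timestamp_ms_from_datetime(dt):
        # Serialization direction: integer *1000 and //1000 only, no *1e3 / /1e3 floats.
        delta = dt - EPOCH
        return (delta.days * 86400 + delta.seconds) * 1000 + delta.microseconds // 1000

    ms = 10413792000001  # 2300-01-01 00:00:00.001 UTC
    assert timestamp_ms_from_datetime(datetime_from_timestamp_ms(ms)) == ms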

Testing

  • ✅ All new tests pass (4/4)
  • ✅ All existing DateType tests pass
  • ✅ All unit type tests pass (62 passed, 1 skipped)
  • ✅ No regressions detected
  • ✅ Issue scenario validated: timestamps like "2300-01-01 00:00:00.001" now keep full precision through serialization/deserialization round trips (a sketch of such a check follows this list)
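
A round-trip check of this kind could look like the following sketch (the use of cassandra.marshal.int64_pack and the positional protocol-version argument are assumptions; the actual tests in tests/unit/test_timestamp_precision.py may differ):

    from cassandra.cqltypes import DateType
    from cassandra.marshal import int64_pack

    ms = 10413792000001                                    # 2300-01-01 00:00:00.001 UTC
    dt = DateType.deserialize(int64_pack(ms), 4)           # bytes -> datetime (protocol v4)
    assert DateType.serialize(dt, 4) == int64_pack(ms)     # datetime -> the same 8 bytes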

The changes are minimal and surgical, affecting only the timestamp conversion logic without altering any other functionality.

Original prompt

This section describes the original issue to be resolved

<issue_title>Large values of CQL type timestamp lose precision due to some floating-point conversions along the way</issue_title>
<issue_description>timestamp CQL values are integers (signed milliseconds from epoch), but they get converted to double somewhere along the way, so they lose precision if they are further than ~300 years from the Unix epoch. (~300 years is the max number of milliseconds which can be exactly represented in a double).

In particular, you can e.g. select a timestamp key from a table, then use the obtained key to query the table for that particular key, and the query will return an empty result.

Consider:

    # Setup (assumed, to make the repro runnable): a local cluster and a basic logger
    import logging
    from cassandra.cluster import Cluster

    logging.basicConfig(level=logging.DEBUG, format="%(asctime)s - %(levelname)s - %(message)s")
    logger = logging.getLogger(__name__)
    session = Cluster().connect()

    keyspace_name = "ks"
    table_name = "ks"
    session.execute(f"DROP KEYSPACE IF EXISTS {keyspace_name}")
    session.execute(f"""
        CREATE KEYSPACE IF NOT EXISTS {keyspace_name}
        WITH replication = {{'class': 'SimpleStrategy', 'replication_factor': 1}}
    """)
    session.execute(f"""
        CREATE TABLE {keyspace_name}.{table_name} (
            pk timestamp PRIMARY KEY
        )
    """)
    test_timestamp = "2300-01-01 00:00:00.001"
    insert = session.execute(f"INSERT INTO {keyspace_name}.{table_name}(pk) VALUES('{test_timestamp}')")

    scan_result = session.execute(f"SELECT * FROM {keyspace_name}.{table_name}")
    scan_rows = list(scan_result)
    assert len(scan_rows) == 1

    retrieved_timestamp = scan_rows[0].pk
    logger.debug(f"Original timestamp: {test_timestamp}")
    logger.debug(f"Retrieved timestamp: {retrieved_timestamp}")

    key_rows = list(session.execute(f"SELECT * FROM {keyspace_name}.{table_name} WHERE pk = '{test_timestamp}'"))
    assert len(key_rows) == 1

    key_select = session.prepare(f"SELECT * FROM {keyspace_name}.{table_name} WHERE pk = ?")
    key_rows = list(session.execute(key_select, [retrieved_timestamp]))
    assert len(key_rows) == 1 # FAIL

    session.execute(f"DROP KEYSPACE {keyspace_name}")

This will give the log output:

2025-09-11 23:12:40,117 - DEBUG - Original timestamp: 2300-01-01 00:00:00.001
2025-09-11 23:12:40,117 - DEBUG - Retrieved timestamp: 2300-01-01 00:00:00.000999

And the test will fail on the assert marked with the FAIL comment, because the actual key is 2300-01-01 00:00:00.001, the key converted to Python native types is 2300-01-01 00:00:00.001 - epsilon, and the key sent back with the last SELECT is 2300-01-01 00:00:00.000.</issue_description>

Comments on the Issue (you are @copilot in this section)



Co-authored-by: mykaul <4655593+mykaul@users.noreply.github.com>

mykaul commented Dec 27, 2025

@copilot please complete the work on the PR


Copilot AI commented Dec 27, 2025

@copilot please complete the work on the PR

The PR is now complete. All changes have been implemented and tested:

Changes made (commit fbab5cd):

  • Added integer-based timestamp conversion functions to avoid floating-point precision loss
  • Updated serialization/deserialization in both Python and Cython code paths
  • Added comprehensive test suite that validates the fix

Testing results:

  • New timestamp precision tests: ✅ All 4 tests pass
  • Existing DateType tests: ✅ Pass
  • All unit type tests (63 tests): ✅ 62 passed, 1 skipped
  • Issue scenario validation: ✅ Timestamps far from epoch now maintain full precision

The fix ensures that timestamps like "2300-01-01 00:00:00.001" can be correctly round-tripped through serialization/deserialization without losing precision.

Copilot AI changed the title from "[WIP] Fix precision loss for large CQL timestamp values" to "Fix precision loss for large CQL timestamp values" on Dec 27, 2025