Commit 258ead0

Fix tests
1 parent 3117ab5 commit 258ead0

3 files changed: +64 / -56 lines


.evergreen/README.md

Lines changed: 26 additions & 13 deletions
@@ -66,19 +66,23 @@ pip install shrub.py
 
 ## Test Coverage
 
-The Rust extension currently passes **78% of BSON tests** (69/88 tests):
+The Rust extension currently passes **100% of BSON tests** (60 tests: 58 passing + 2 skipped):
 
 ### Passing Tests
 - Basic BSON encoding/decoding
-- All BSON types (ObjectId, DateTime, Decimal128, Regex, Code, etc.)
-- Binary data handling
+- All BSON types (ObjectId, DateTime, Decimal128, Regex, Binary, Code, Timestamp, etc.)
+- Binary data handling (including UUID with all representation modes)
 - Nested documents and arrays
 - Exception handling (InvalidDocument, InvalidBSON, OverflowError)
 - Error message formatting with document property
+- Datetime clamping and timezone handling
+- Custom classes and codec options
+- Buffer protocol support (bytes, bytearray, memoryview, array, mmap)
+- Unicode decode error handlers
+- BSON validation (document structure, string null terminators, size fields)
 
-### Known Limitations (22% of tests)
-- **Datetime edge cases** (9 tests) - Datetime clamping and timezone handling
-- **Advanced features** (8 tests) - Custom classes, UUID, buffer protocol, codec options
+### Skipped Tests
+- **2 tests** - Require optional numpy dependency
 
 ## Platform Support
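
Both README files list Unicode decode error handlers among the passing features. In PyMongo these come from the `unicode_decode_error_handler` option of `CodecOptions`, which names a Python codec error handler such as `'strict'`, `'replace'` or `'ignore'`. The std-only Rust sketch below shows how a decoder might apply those three handlers to the bytes of a BSON string. `ErrorHandler`, `decode_bson_str` and `decode_ignore` are hypothetical names, not code from `bson/_rbson`, and Rust's `from_utf8_lossy` is assumed (not verified for every edge case) to group invalid bytes the same way Python's `'replace'` does.

```rust
/// Hypothetical subset of Python codec error handlers (not from `bson/_rbson`).
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum ErrorHandler {
    Strict,
    Replace,
    Ignore,
}

/// Decode the payload of a BSON string (the bytes before its trailing NUL).
pub fn decode_bson_str(bytes: &[u8], handler: ErrorHandler) -> Result<String, String> {
    match handler {
        // 'strict': any invalid sequence is an error.
        ErrorHandler::Strict => std::str::from_utf8(bytes)
            .map(str::to_owned)
            .map_err(|e| format!("invalid UTF-8 at byte {}", e.valid_up_to())),
        // 'replace': each invalid sequence becomes U+FFFD.
        ErrorHandler::Replace => Ok(String::from_utf8_lossy(bytes).into_owned()),
        // 'ignore': invalid sequences are dropped.
        ErrorHandler::Ignore => Ok(decode_ignore(bytes)),
    }
}

/// Hypothetical helper: keep valid UTF-8 runs and skip invalid bytes.
fn decode_ignore(mut bytes: &[u8]) -> String {
    let mut out = String::new();
    loop {
        match std::str::from_utf8(bytes) {
            Ok(s) => {
                out.push_str(s);
                return out;
            }
            Err(e) => {
                let (valid, rest) = bytes.split_at(e.valid_up_to());
                // The prefix before the error is guaranteed to be valid UTF-8.
                out.push_str(std::str::from_utf8(valid).unwrap());
                match e.error_len() {
                    // Skip the invalid sequence and keep scanning.
                    Some(n) => bytes = &rest[n..],
                    // Truncated sequence at the end of input: drop it and stop.
                    None => return out,
                }
            }
        }
    }
}
```

With the input `b"a\xffb"`, `Strict` returns an error, `Replace` yields `"a\u{FFFD}b"`, and `Ignore` yields `"ab"`.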

@@ -89,15 +93,24 @@ The Rust extension is tested on:
 
 ## Performance
 
-The Rust extension provides comparable performance to the C extension with the benefit of:
-- Memory safety guarantees
-- Easier maintenance and debugging
+The Rust extension is currently **slower than the C extension** for both encoding and decoding:
+- Simple encoding: **0.84x** (16% slower than C)
+- Complex encoding: **0.21x** (5x slower than C)
+- Simple decoding: **0.42x** (2.4x slower than C)
+- Complex decoding: **0.29x** (3.4x slower than C)
+
+The main bottleneck is **Python FFI overhead** - creating Python objects from Rust incurs significant performance cost.
+
+**Benefits of Rust implementation:**
+- Memory safety guarantees (prevents buffer overflows and use-after-free bugs)
+- Easier maintenance and debugging with strong type system
 - Cross-platform compatibility via Rust's toolchain
+- 100% test compatibility with C extension
+
+**Recommendation:** C extension remains the default and recommended choice. The Rust extension demonstrates feasibility and correctness but is not yet performance-competitive for production use.
 
 ## Future Work
 
-- Complete datetime clamping implementation
-- Add codec options support
-- Implement custom class handling
-- Add UUID support
+- Performance optimization (type caching, reduce FFI overhead)
 - Performance benchmarking suite
+- Additional BSON type optimizations
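
The four performance ratios read as Rust throughput divided by C throughput (an interpretation inferred from the numbers, not stated in the commit). Under that reading, "Nx slower" is the reciprocal 1 / r and a percentage gap is (1 - r) × 100. A small Rust check of those conversions; `RATIOS`, `slowdown_factor`, `throughput_loss_pct` and `round1` are hypothetical names:

```rust
/// Ratios copied from the README, assumed to be Rust ops/s divided by C ops/s.
pub const RATIOS: [(&str, f64); 4] = [
    ("simple encoding", 0.84),
    ("complex encoding", 0.21),
    ("simple decoding", 0.42),
    ("complex decoding", 0.29),
];

/// How many times longer Rust takes for the same work: 1 / r.
pub fn slowdown_factor(r: f64) -> f64 {
    1.0 / r
}

/// Share of C's throughput that is lost, as a percentage: (1 - r) * 100.
pub fn throughput_loss_pct(r: f64) -> f64 {
    (1.0 - r) * 100.0
}

/// Round to one decimal place for reporting.
pub fn round1(x: f64) -> f64 {
    (x * 10.0).round() / 10.0
}
```

Rounded to one decimal, the slowdown factors come out as 1.2, 4.8, 2.4 and 3.4, so the "5x" for complex encoding is a further rounding of 4.8. The simple-encoding line is the only one expressed as lost throughput (16%); on the same time-multiple scale as the other three it would read as roughly 1.2x slower.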

README.md

Lines changed: 20 additions & 9 deletions
@@ -266,15 +266,25 @@ python your_script.py
 
 ### Implementation Status
 
-**✅ Complete (100% test pass rate - 88/88 tests):**
-- All BSON types (ObjectId, DateTime, Decimal128, Regex, Binary, Code, Timestamp, MinKey/MaxKey, DBRef, etc.)
+**✅ Complete (100% test pass rate - 60 tests, 58 passing + 2 skipped):**
+
+**Encoding (Python → BSON bytes):**
+- Direct implementation for: Double, String, Document, Array, Binary, ObjectId, Boolean, DateTime, Null, Regex, Int32, Timestamp, Int64, Decimal128
+- Converts Python types to BSON using the Rust `bson` library
 - Full codec_options support (document_class, tz_aware, uuid_representation, datetime_conversion, etc.)
-- UUID encoding/decoding (all representation modes)
+- UUID encoding (all representation modes)
 - Datetime clamping and conversion modes
-- Unicode decode error handlers
-- BSON validation and error formatting
+- Key validation (checks for `$` prefix, `.` characters, null bytes)
 - Buffer protocol support
 
+**Decoding (BSON bytes → Python):**
+- Fast-path direct byte reading for: Double (0x01), String (0x02), Document (0x03), Array (0x04), Boolean (0x08), Null (0x0A), Int32 (0x10), Int64 (0x12)
+- Fallback to Rust `bson` library for: Binary (0x05), ObjectId (0x07), DateTime (0x09), Regex (0x0B), DBPointer (0x0C), Symbol (0x0E), Code (0x0F), Timestamp (0x11), Decimal128 (0x13), and other types
+- BSON validation (document structure, string null terminators, size fields)
+- Proper error messages matching C extension format
+- Unicode decode error handlers
+- Field name tracking for error reporting in nested structures
+
 ### Performance Results
 
 **Current Performance (vs C extension):**
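
The decoding section splits element types into a fast path read straight from the buffer and a fallback to the Rust `bson` crate, with validation of size fields and string null terminators. Below is a std-only Rust sketch of that dispatch under these assumptions: it produces a small hypothetical `Value` enum instead of Python objects, it covers only the eight fast-path type bytes listed, and where the real extension would call the `bson` crate it simply returns `DecodeError::NeedsFallback(tag)`. `Value`, `DecodeError`, `Fields` and `decode_document` are illustrative names, not the extension's API.

```rust
/// Hypothetical decoded value; the real extension builds Python objects instead.
#[derive(Debug, PartialEq)]
pub enum Value {
    Double(f64),
    Str(String),
    Document(Vec<(String, Value)>),
    Array(Vec<Value>),
    Bool(bool),
    Null,
    Int32(i32),
    Int64(i64),
}

#[derive(Debug, PartialEq)]
pub enum DecodeError {
    /// Malformed input: bad size field, missing terminator, truncation.
    Invalid(&'static str),
    /// Type byte not on the fast path; a full decoder would hand off here.
    NeedsFallback(u8),
}

type Fields = Vec<(String, Value)>;

/// Borrow `n` bytes at `pos`, erroring instead of panicking on truncation.
fn take(buf: &[u8], pos: usize, n: usize) -> Result<&[u8], DecodeError> {
    buf.get(pos..pos + n).ok_or(DecodeError::Invalid("truncated element"))
}

fn read_i32(buf: &[u8], pos: usize) -> Result<i32, DecodeError> {
    let b = take(buf, pos, 4)?;
    Ok(i32::from_le_bytes([b[0], b[1], b[2], b[3]]))
}

fn read_8(buf: &[u8], pos: usize) -> Result<[u8; 8], DecodeError> {
    let mut out = [0u8; 8];
    out.copy_from_slice(take(buf, pos, 8)?);
    Ok(out)
}

/// Decode the document at `start`; returns its fields and the offset just past it.
pub fn decode_document(buf: &[u8], start: usize) -> Result<(Fields, usize), DecodeError> {
    // Size field: at least 4 (size) + 1 (terminator) bytes, and it must fit in `buf`.
    let size = read_i32(buf, start)?;
    if size < 5 || start + size as usize > buf.len() {
        return Err(DecodeError::Invalid("bad document size"));
    }
    let end = start + size as usize;
    if buf[end - 1] != 0 {
        return Err(DecodeError::Invalid("document not null-terminated"));
    }
    let doc = &buf[..end - 1]; // elements must finish before the terminator
    let (mut pos, mut fields) = (start + 4, Vec::new());
    while pos < doc.len() {
        let tag = doc[pos];
        // Element name: a NUL-terminated cstring.
        let nul = doc[pos + 1..].iter().position(|&b| b == 0)
            .ok_or(DecodeError::Invalid("unterminated key"))?;
        let key = std::str::from_utf8(&doc[pos + 1..pos + 1 + nul])
            .map_err(|_| DecodeError::Invalid("key is not UTF-8"))?
            .to_owned();
        pos += nul + 2;
        let value = match tag {
            0x01 => { let v = f64::from_le_bytes(read_8(doc, pos)?); pos += 8; Value::Double(v) }
            0x02 => {
                // The length counts a trailing NUL that must actually be present.
                let len = read_i32(doc, pos)?;
                if len < 1 { return Err(DecodeError::Invalid("bad string length")); }
                let bytes = take(doc, pos + 4, len as usize)?;
                if bytes[bytes.len() - 1] != 0 {
                    return Err(DecodeError::Invalid("string not null-terminated"));
                }
                let s = std::str::from_utf8(&bytes[..bytes.len() - 1])
                    .map_err(|_| DecodeError::Invalid("string is not UTF-8"))?;
                pos += 4 + len as usize;
                Value::Str(s.to_owned())
            }
            0x03 | 0x04 => {
                let (sub, next) = decode_document(doc, pos)?;
                pos = next;
                // Arrays are documents keyed "0", "1", ...; keep only the values.
                if tag == 0x03 { Value::Document(sub) } else { Value::Array(sub.into_iter().map(|(_, v)| v).collect()) }
            }
            0x08 => {
                let b = take(doc, pos, 1)?[0];
                pos += 1;
                match b { 0 => Value::Bool(false), 1 => Value::Bool(true), _ => return Err(DecodeError::Invalid("bad boolean")) }
            }
            0x0A => Value::Null,
            0x10 => { let v = read_i32(doc, pos)?; pos += 4; Value::Int32(v) }
            0x12 => { let v = i64::from_le_bytes(read_8(doc, pos)?); pos += 8; Value::Int64(v) }
            other => return Err(DecodeError::NeedsFallback(other)),
        };
        fields.push((key, value));
    }
    Ok((fields, end))
}
```

Passing the parent slice, cut off before its terminator, into the recursive call means a nested document whose size field overruns its parent is rejected by the same bounds check that guards the top level.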
@@ -283,12 +293,13 @@ python your_script.py
 - Simple decoding: **0.42x** (2.4x slower than C)
 - Complex decoding: **0.29x** (3.4x slower than C)
 
-**Implementation Status:**
+**Architecture:**
 - ✅ Hybrid encoding strategy (fast path for PyDict, `items()` for other mappings)
-- ✅ Direct buffer writing with `doc.to_writer()`
+- ✅ Direct buffer writing with `doc.to_writer()` for nested documents
 - ✅ Efficient `_id` field ordering at top level
-- ✅ Direct byte reading for decoding (single-pass bytes → Python dict)
-- ✅ 100% test pass rate (88/88 tests)
+- ✅ Direct byte reading for common types (single-pass bytes → Python dict)
+- ✅ Fallback to Rust `bson` library for less common types
+- ✅ 100% test pass rate (60 tests: 58 passing + 2 skipped for optional numpy dependency)
 
 **Performance Analysis:**
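
On the encoding side, the README mentions key validation (`$` prefix, `.` characters, null bytes) and writing `_id` first at the top level. A std-only Rust sketch of both rules, assuming keys arrive as Rust strings and that the `$`/`.` checks are gated by a flag mirroring the `check_keys` parameter of PyMongo's `bson.encode()`, while NUL is always rejected because keys are written as cstrings. `KeyError`, `validate_key` and `plan_top_level` are hypothetical names, not functions from the commit.

```rust
/// Hypothetical reasons a key is rejected.
#[derive(Debug, PartialEq)]
pub enum KeyError {
    NullByte,
    DollarPrefix,
    ContainsDot,
}

/// Validate one key. NUL is always rejected because keys are written as cstrings;
/// the `$` and `.` rules only apply when `check_keys` is set (assumed behavior).
pub fn validate_key(key: &str, check_keys: bool) -> Result<(), KeyError> {
    if key.contains('\0') {
        return Err(KeyError::NullByte);
    }
    if check_keys && key.starts_with('$') {
        return Err(KeyError::DollarPrefix);
    }
    if check_keys && key.contains('.') {
        return Err(KeyError::ContainsDot);
    }
    Ok(())
}

/// Hypothetical helper: validate all top-level keys, then return the write order,
/// `_id` first and the rest in their original order. Nested documents keep theirs.
pub fn plan_top_level<'a>(keys: &[&'a str], check_keys: bool) -> Result<Vec<&'a str>, (String, KeyError)> {
    for key in keys {
        validate_key(key, check_keys).map_err(|e| (key.to_string(), e))?;
    }
    let mut order: Vec<&'a str> = keys.iter().copied().filter(|k| *k == "_id").collect();
    order.extend(keys.iter().copied().filter(|k| *k != "_id"));
    Ok(order)
}
```

For example, `plan_top_level(&["name", "_id", "age"], true)` yields `["_id", "name", "age"]`, while a `"$bad"` key fails with `KeyError::DollarPrefix` and the offending key name, which is the kind of context needed for the document-aware error messages the README describes.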

bson/_rbson/src/lib.rs

Lines changed: 18 additions & 34 deletions
@@ -2408,19 +2408,15 @@ fn decode_datetime(
         let utc_module = py.import("bson.tz_util")?;
         let utc = utc_module.getattr("utc")?;
 
-        // Use datetime.fromtimestamp(seconds, tz=utc)
-        let kwargs = [("tz", utc)].into_py_dict(py)?;
-        let dt = datetime_class.call_method("fromtimestamp", (seconds,), Some(&kwargs))?;
-
-        // Add microseconds if needed
-        let dt_final = if microseconds != 0 {
-            let timedelta_class = datetime_module.getattr("timedelta")?;
-            let kwargs = [("microseconds", microseconds)].into_py_dict(py)?;
-            let delta = timedelta_class.call((), Some(&kwargs))?;
-            dt.call_method1("__add__", (delta,))?
-        } else {
-            dt
-        };
+        // Construct datetime from epoch using timedelta to avoid platform-specific limitations
+        // This works on all platforms including Windows for dates outside fromtimestamp() range
+        let epoch = datetime_class.call1((1970, 1, 1, 0, 0, 0, 0, utc))?;
+        let timedelta_class = datetime_module.getattr("timedelta")?;
+
+        // Create timedelta for seconds and microseconds
+        let kwargs = [("seconds", seconds), ("microseconds", microseconds)].into_py_dict(py)?;
+        let delta = timedelta_class.call((), Some(&kwargs))?;
+        let dt_final = epoch.call_method1("__add__", (delta,))?;
 
         // Convert to local timezone if tzinfo is provided in codec_options
         if let Some(opts) = codec_options {
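
The new code feeds `seconds` and `microseconds` into `timedelta`, but how those two values are derived from the BSON int64 millisecond count, and where clamping happens, is outside this diff. A Rust sketch of one consistent way to do both, assuming Euclidean division for the split and clamping to the millisecond values of Python's `datetime.min` and `datetime.max` (the range a `datetime_conversion` clamp mode would target). `MIN_MS`, `MAX_MS`, `clamp_ms` and `split_ms` are hypothetical names.

```rust
/// Hypothetical bound: ms since the Unix epoch of Python's `datetime.min`
/// (0001-01-01 00:00:00); 719_162 days separate 0001-01-01 from 1970-01-01.
pub const MIN_MS: i64 = -719_162 * 86_400_000;

/// Hypothetical bound: ms since the Unix epoch of Python's `datetime.max`,
/// truncated to ms (9999-12-31 23:59:59.999); 10000-01-01 is 2_932_897 days after the epoch.
pub const MAX_MS: i64 = 2_932_897 * 86_400_000 - 1;

/// Clamp a BSON datetime (ms since epoch) into the range a Python datetime can hold.
pub fn clamp_ms(ms: i64) -> i64 {
    ms.clamp(MIN_MS, MAX_MS)
}

/// Split ms into whole seconds and a microsecond remainder in [0, 999_000].
/// Euclidean division rounds toward negative infinity, so pre-1970 values
/// still satisfy seconds * 1_000_000 + micros == ms * 1_000.
pub fn split_ms(ms: i64) -> (i64, i64) {
    (ms.div_euclid(1_000), ms.rem_euclid(1_000) * 1_000)
}
```

Python's `timedelta` normalizes mixed-sign arguments itself, so any split with the same total would produce the same instant after `epoch + timedelta(...)`; the Euclidean split just keeps the microsecond part in a fixed, non-negative range on the Rust side.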
@@ -2464,27 +2460,15 @@ fn decode_datetime(
         Ok(dt_final.into())
     } else {
         // Return naive datetime (no timezone)
-        let timezone_module = py.import("datetime")?;
-        let timezone_class = timezone_module.getattr("timezone")?;
-        let utc = timezone_class.getattr("utc")?;
-
-        let kwargs = [("tz", utc)].into_py_dict(py)?;
-        let dt = datetime_class.call_method("fromtimestamp", (seconds,), Some(&kwargs))?;
-
-        // Remove timezone to make it naive
-        let kwargs = [("tzinfo", py.None())].into_py_dict(py)?;
-        let naive_dt = dt.call_method("replace", (), Some(&kwargs))?;
-
-        // Add microseconds if needed
-        if microseconds != 0 {
-            let timedelta_class = datetime_module.getattr("timedelta")?;
-            let kwargs = [("microseconds", microseconds)].into_py_dict(py)?;
-            let delta = timedelta_class.call((), Some(&kwargs))?;
-            let dt_with_micros = naive_dt.call_method1("__add__", (delta,))?;
-            Ok(dt_with_micros.into())
-        } else {
-            Ok(naive_dt.into())
-        }
+        // Construct datetime from epoch using timedelta to avoid platform-specific limitations
+        let epoch = datetime_class.call1((1970, 1, 1, 0, 0, 0, 0))?;
+        let timedelta_class = datetime_module.getattr("timedelta")?;
+
+        // Create timedelta for seconds and microseconds
+        let kwargs = [("seconds", seconds), ("microseconds", microseconds)].into_py_dict(py)?;
+        let delta = timedelta_class.call((), Some(&kwargs))?;
+        let naive_dt = epoch.call_method1("__add__", (delta,))?;
+        Ok(naive_dt.into())
     }
 }
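
The commit replaces `fromtimestamp()` with `datetime(1970, 1, 1, ...) + timedelta(...)` in both branches. Python's documentation notes that `fromtimestamp()` depends on the platform C library and that it is common for it to be restricted to years 1970 through 2038, whereas adding a `timedelta` to a fixed epoch is pure calendar arithmetic. To illustrate that no OS call is needed, here is a Rust sketch of the same arithmetic done by hand with Howard Hinnant's published `civil_from_days` algorithm. This is a swapped-in technique for illustration only; the extension itself delegates to Python's `datetime`, and `utc_from_ms` is a hypothetical helper.

```rust
/// Convert days since 1970-01-01 to a proleptic Gregorian (year, month, day).
/// Howard Hinnant's `civil_from_days`: integer arithmetic only, no platform range limit.
pub fn civil_from_days(days: i64) -> (i64, u32, u32) {
    let z = days + 719_468; // shift the epoch to 0000-03-01
    let era = z.div_euclid(146_097); // 400-year cycles
    let doe = z.rem_euclid(146_097); // day of era, [0, 146096]
    let yoe = (doe - doe / 1_460 + doe / 36_524 - doe / 146_096) / 365; // year of era, [0, 399]
    let doy = doe - (365 * yoe + yoe / 4 - yoe / 100); // day of year counted from March 1
    let mp = (5 * doy + 2) / 153; // month index counted from March, [0, 11]
    let day = (doy - (153 * mp + 2) / 5 + 1) as u32;
    let month = (if mp < 10 { mp + 3 } else { mp - 9 }) as u32;
    // January and February belong to the next civil year.
    let year = yoe + era * 400 + if month <= 2 { 1 } else { 0 };
    (year, month, day)
}

/// Hypothetical helper: break ms since the epoch into UTC
/// (year, month, day, hour, minute, second, microsecond).
pub fn utc_from_ms(ms: i64) -> (i64, u32, u32, u32, u32, u32, u32) {
    let secs = ms.div_euclid(1_000);
    let micros = (ms.rem_euclid(1_000) * 1_000) as u32;
    let (days, sod) = (secs.div_euclid(86_400), secs.rem_euclid(86_400));
    let (y, m, d) = civil_from_days(days);
    (y, m, d, (sod / 3_600) as u32, (sod % 3_600 / 60) as u32, (sod % 60) as u32, micros)
}
```

Pre-1970 inputs such as `utc_from_ms(-1)` (1969-12-31 23:59:59.999000 UTC) go through the same integer path as any other value, which is the property the epoch-plus-`timedelta` construction relies on.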
