internal: use more core::hint::assert_unchecked #6132

Open
chirizxc wants to merge 14 commits into PyO3:main from chirizxc:patch-1

@chirizxc chirizxc commented Jun 12, 2026

Contributor

No description provided.

@chirizxc chirizxc changed the title add assert_unchecked to BoundTupleIterator and BorrowedTupleIterator len() internal: add assert_unchecked to BoundTupleIterator and BorrowedTupleIterator len() Jun 12, 2026
@chirizxc chirizxc changed the title internal: add assert_unchecked to BoundTupleIterator and BorrowedTupleIterator len() internal: use more core::hint::assert_unchecked Jun 12, 2026
@codspeed-hq

codspeed-hq Bot commented Jun 15, 2026

Merging this PR will not alter performance

✅ 126 untouched benchmarks


Comparing chirizxc:patch-1 (33997b5) with main (771558e)

@davidhewitt

Member

Thanks for this PR. I am undecided how I feel about this. On the one hand, squeezing out efficiency and performance is always desirable.

On the other hand, this is unsafe, and quoting the std docs:

> This is a situational tool for micro-optimization, and is allowed to do nothing. Any use should come with a repeatable benchmark to show the value, with the expectation to drop it later should the optimizer get smarter and no longer need it.

Accordingly I'd prefer we were absolutely certain these make sense before merging. PyO3 has enough unsafe code to audit without introducing more.

I'd value other maintainers' opinions on this.

@davidhewitt

Member

To clarify my position: I lean slightly towards not having these assertions, but I can be persuaded to keep them if we can demonstrate that they have a meaningful effect on final codegen.

@Tpt

Tpt commented Jun 17, 2026

Contributor

+1 to @davidhewitt. Such assertions feel very much like "last percent of performance squeezing", and we should make sure we have done all the simpler things first and have proper benchmarking.

Also, I often see

        unsafe { core::hint::assert_unchecked(self.index <= self.length) };
        self.length.saturating_sub(self.index)

how is it better than just the plain

        self.length - self.index

?
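
For reference, the forms being compared can be put side by side in a minimal sketch. `Cursor`, its fields, and the method names below are hypothetical stand-ins for the iterator state in the diff, not PyO3's actual types, and `core::hint::assert_unchecked` needs Rust 1.81 or newer.

```rust
/// Hypothetical stand-in for an iterator's position state (not a PyO3 type).
struct Cursor {
    index: usize,
    length: usize,
}

impl Cursor {
    /// Form from the diff: an unchecked hint followed by a saturating subtraction.
    ///
    /// # Safety
    /// The caller must guarantee `self.index <= self.length`; otherwise the
    /// hint is undefined behavior.
    unsafe fn len_hinted(&self) -> usize {
        // SAFETY: upheld by the caller, see above.
        unsafe { core::hint::assert_unchecked(self.index <= self.length) };
        self.length.saturating_sub(self.index)
    }

    /// Plain subtraction: no `unsafe`, panics in debug builds if the
    /// invariant is broken.
    fn len_plain(&self) -> usize {
        self.length - self.index
    }

    /// Fully safe form: returns 0 if the invariant is ever broken.
    fn len_saturating(&self) -> usize {
        self.length.saturating_sub(self.index)
    }
}

fn main() {
    let c = Cursor { index: 2, length: 5 };
    // SAFETY: 2 <= 5, so the invariant holds.
    let hinted = unsafe { c.len_hinted() };
    assert_eq!(hinted, 3);
    assert_eq!(c.len_plain(), 3);
    assert_eq!(c.len_saturating(), 3);
}
```

When the invariant holds, all three return the same value, so the question is only whether the hinted form produces better machine code than the other two.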

@chirizxc

chirizxc commented Jun 17, 2026

Contributor Author

I used cargo asm

cargo install cargo-show-asm
Cargo.toml
[package]
name = "pyo3_test"
version = "0.1.0"
edition = "2024"

[dependencies]
jiff = "0.2.28"
pyo3 = { version = "0.29.0", features = ["jiff-02"]}
src/main.rs
#![allow(dead_code)]

use jiff::civil::DateTime;
use jiff::tz::TimeZone;
use pyo3::types::PyDateTime;
use pyo3::{Bound, IntoPyObject, PyResult, Python};

#[inline(never)]
#[unsafe(no_mangle)]
fn datetime_to_pydatetime<'py>(
    py: Python<'py>,
    datetime: DateTime,
    fold: bool,
    timezone: Option<&TimeZone>,
) -> PyResult<Bound<'py, PyDateTime>> {
    PyDateTime::new_with_fold(
        py,
        datetime.year().into(),
        datetime.month().try_into()?,
        datetime.day().try_into()?,
        datetime.hour().try_into()?,
        datetime.minute().try_into()?,
        datetime.second().try_into()?,
        (datetime.subsec_nanosecond() / 1000).try_into()?,
        timezone
            .map(|tz| tz.into_pyobject(py))
            .transpose()?
            .as_ref(),
        fold,
    )
}

#[inline(never)]
#[unsafe(no_mangle)]
fn datetime_to_pydatetime_assert_unchecked<'py>(
    py: Python<'py>,
    datetime: DateTime,
    fold: bool,
    timezone: Option<&TimeZone>,
) -> PyResult<Bound<'py, PyDateTime>> {
    let micros = datetime.subsec_nanosecond() / 1000;
    // SAFETY: `subsec_nanosecond()` is in [0, 999_999_999], so it is still non-negative after dividing by 1000
    unsafe { core::hint::assert_unchecked(micros >= 0) };
    PyDateTime::new_with_fold(
        py,
        datetime.year().into(),
        datetime.month().try_into()?,
        datetime.day().try_into()?,
        datetime.hour().try_into()?,
        datetime.minute().try_into()?,
        datetime.second().try_into()?,
        micros as u32,
        timezone
            .map(|tz| tz.into_pyobject(py))
            .transpose()?
            .as_ref(),
        fold,
    )
}

fn main() {}
Output
❯ cargo asm
                                                                                                                                               
    Finished `release` profile [optimized] target(s) in 0.10s
Try one of those by name or a sequence number
0 "<std::rt::lang_start<()>::{closure#0} as core::ops::function::FnOnce<()>>::call_once::{shim:vtable#0}" [23]
1 "core::ptr::drop_glue::<core::option::Option<pyo3::instance::Bound<pyo3::types::datetime::PyTzInfo>>>" [36]
2 "datetime_to_pydatetime" [271]
3 "datetime_to_pydatetime_assert_unchecked" [264]
4 "main" [19]
5 "pyo3_test::main" [7]
6 "std::rt::lang_start::<()>" [31]
7 "std::rt::lang_start::<()>::{closure#0}" [18]
8 "std::sys::backtrace::__rust_begin_short_backtrace::<fn(), ()>" [28]

##################################

❯ cargo asm 2                                                                                                                                                                                                                                                                                 
    Finished `release` profile [optimized] target(s) in 0.04s
.section .text,"xr",one_only,datetime_to_pydatetime,unique,6
        .globl  datetime_to_pydatetime
        .p2align        4
datetime_to_pydatetime:
        .cv_func_id 14
.seh_proc datetime_to_pydatetime
        .seh_handler __CxxFrameHandler3, @unwind, @except
        push rbp
        .seh_pushreg rbp
        push r15
        .seh_pushreg r15
        push r14
        .seh_pushreg r14
        push r13
        .seh_pushreg r13
        push r12
        .seh_pushreg r12
        push rsi
        .seh_pushreg rsi
        push rdi
        .seh_pushreg rdi
        push rbx
        .seh_pushreg rbx
        sub rsp, 168
        .seh_stackalloc 168
        lea rbp, [rsp + 128]
        .seh_setframe rbp, 128
        .seh_endprologue
        mov qword ptr [rbp + 32], -2
        mov rsi, rcx
        movzx ebx, byte ptr [rdx + 10]
        .cv_inline_site_id 15 within 14 inlined_at 8 19 0
        .cv_inline_site_id 16 within 15 inlined_at 10 819 0
        test bl, bl
        js .LBB6_1
        movzx edi, byte ptr [rdx + 11]
        .cv_inline_site_id 17 within 14 inlined_at 8 20 0
        .cv_inline_site_id 18 within 17 inlined_at 10 819 0
        test dil, dil
        js .LBB6_1
        movzx r15d, byte ptr [rdx + 4]
        .cv_inline_site_id 19 within 14 inlined_at 8 21 0
        .cv_inline_site_id 20 within 19 inlined_at 10 819 0
        test r15b, r15b
        js .LBB6_1
        movzx r12d, byte ptr [rdx + 5]
        .cv_inline_site_id 21 within 14 inlined_at 8 22 0
        .cv_inline_site_id 22 within 21 inlined_at 10 819 0
        test r12b, r12b
        js .LBB6_1
        movzx r13d, byte ptr [rdx + 6]
        .cv_inline_site_id 23 within 14 inlined_at 8 23 0
        .cv_inline_site_id 24 within 23 inlined_at 10 819 0
        test r13b, r13b
        js .LBB6_1
        movsxd r14, dword ptr [rdx]
        cmp r14, -1000
        jg .LBB6_7
.LBB6_1:
        .cv_inline_site_id 25 within 14 inlined_at 8 0 0
        lea rcx, [rsi + 8]
        mov dl, 3
        call <pyo3::err::PyErr as core::convert::From<core::num::error::TryFromIntError>>::from
.LBB6_17:
        mov qword ptr [rsi], 1
.LBB6_18:
        mov rax, rsi
        .seh_startepilogue
        add rsp, 168
        pop rbx
        pop rdi
        pop rsi
        pop r12
        pop r13
        pop r14
        pop r15
        pop rbp
        .seh_endepilogue
        ret
.LBB6_7:
        .cv_inline_site_id 26 within 14 inlined_at 8 26 0
        movsx edx, word ptr [rdx + 8]
        test r9, r9
        je .LBB6_8
        .cv_inline_site_id 27 within 26 inlined_at 12 1162 0
        mov dword ptr [rbp + 20], edx
        mov byte ptr [rbp + 24], r8b
        lea rcx, [rbp - 48]
        mov rdx, r9
        call <&jiff::tz::timezone::TimeZone as pyo3::conversion::IntoPyObject>::into_pyobject
        .cv_inline_site_id 28 within 14 inlined_at 8 27 0
        mov rax, qword ptr [rbp - 48]
        cmp rax, -1
        je .LBB6_10
        test rax, rax
        movzx r8d, byte ptr [rbp + 24]
        jne .LBB6_16
        mov r9, qword ptr [rbp - 40]
        jmp .LBB6_13
.LBB6_8:
        xor r9d, r9d
        jmp .LBB6_14
.LBB6_10:
        xor r9d, r9d
        movzx r8d, byte ptr [rbp + 24]
.LBB6_13:
        mov edx, dword ptr [rbp + 20]
.LBB6_14:
        imul rax, r14, 274877907
        mov rcx, rax
        shr rcx, 63
        sar rax, 38
        add eax, ecx
        mov qword ptr [rbp + 8], r9
        .cv_inline_site_id 29 within 14 inlined_at 8 28 0
        test r9, r9
        lea rcx, [rbp + 8]
        mov qword ptr [rbp + 24], r9
        cmove rcx, r9
        mov byte ptr [rsp + 72], r8b
        mov qword ptr [rsp + 64], rcx
        mov dword ptr [rsp + 56], eax
        mov byte ptr [rsp + 48], r13b
        mov byte ptr [rsp + 40], r12b
        mov byte ptr [rsp + 32], r15b
        mov rcx, rsi
        mov r8d, ebx
        mov r9d, edi
        call <pyo3::types::datetime::PyDateTime>::new_with_fold
        nop
        mov rcx, qword ptr [rbp + 24]
        call core::ptr::drop_glue::<core::option::Option<pyo3::instance::Bound<pyo3::types::datetime::PyTzInfo>>>
        jmp .LBB6_18
.LBB6_16:
        mov rax, qword ptr [rbp - 40]
        mov rcx, qword ptr [rbp]
        mov qword ptr [rsi + 48], rcx
        movups xmm0, xmmword ptr [rbp - 16]
        movups xmmword ptr [rsi + 32], xmm0
        movups xmm0, xmmword ptr [rbp - 32]
        movups xmmword ptr [rsi + 16], xmm0
        .cv_inline_site_id 30 within 14 inlined_at 8 25 0
        mov qword ptr [rsi + 8], rax
        jmp .LBB6_17
        .seh_handlerdata
        .long   $cppxdata$datetime_to_pydatetime@IMGREL
.section .text,"xr",one_only,datetime_to_pydatetime,unique,6
        .seh_endproc
        .def    "?dtor$19@?0?datetime_to_pydatetime@4HA";
        .scl    3;
        .type   32;
        .endef
        .p2align        4
"?dtor$19@?0?datetime_to_pydatetime@4HA":
.seh_proc "?dtor$19@?0?datetime_to_pydatetime@4HA"
        mov qword ptr [rsp + 16], rdx
        push rbp
        .seh_pushreg rbp
        push r15
        .seh_pushreg r15
        push r14
        .seh_pushreg r14
        push r13
        .seh_pushreg r13
        push r12
        .seh_pushreg r12
        push rsi
        .seh_pushreg rsi
        push rdi
        .seh_pushreg rdi
        push rbx
        .seh_pushreg rbx
        sub rsp, 88
        .seh_stackalloc 88
        lea rbp, [rdx + 128]
        .seh_endprologue
        mov rcx, qword ptr [rbp + 24]
        call core::ptr::drop_glue::<core::option::Option<pyo3::instance::Bound<pyo3::types::datetime::PyTzInfo>>>
        nop
        .seh_startepilogue
        add rsp, 88
        pop rbx
        pop rdi
        pop rsi
        pop r12
        pop r13
        pop r14
        pop r15
        pop rbp
        .seh_endepilogue
        ret
        
#########################################################

❯ cargo asm 3                                                                                                                                                                                                                                                                                 
    Finished `release` profile [optimized] target(s) in 0.04s
.section .text,"xr",one_only,datetime_to_pydatetime_assert_unchecked,unique,7
        .globl  datetime_to_pydatetime_assert_unchecked
        .p2align        4
datetime_to_pydatetime_assert_unchecked:
        .cv_func_id 31
.seh_proc datetime_to_pydatetime_assert_unchecked
        .seh_handler __CxxFrameHandler3, @unwind, @except
        push rbp
        .seh_pushreg rbp
        push r15
        .seh_pushreg r15
        push r14
        .seh_pushreg r14
        push r13
        .seh_pushreg r13
        push r12
        .seh_pushreg r12
        push rsi
        .seh_pushreg rsi
        push rdi
        .seh_pushreg rdi
        push rbx
        .seh_pushreg rbx
        sub rsp, 168
        .seh_stackalloc 168
        lea rbp, [rsp + 128]
        .seh_setframe rbp, 128
        .seh_endprologue
        mov qword ptr [rbp + 32], -2
        mov rsi, rcx
        movzx ebx, byte ptr [rdx + 10]
        .cv_inline_site_id 32 within 31 inlined_at 8 47 0
        .cv_inline_site_id 33 within 32 inlined_at 10 819 0
        test bl, bl
        js .LBB7_1
        movzx edi, byte ptr [rdx + 11]
        .cv_inline_site_id 34 within 31 inlined_at 8 48 0
        .cv_inline_site_id 35 within 34 inlined_at 10 819 0
        test dil, dil
        js .LBB7_1
        movzx r15d, byte ptr [rdx + 4]
        .cv_inline_site_id 36 within 31 inlined_at 8 49 0
        .cv_inline_site_id 37 within 36 inlined_at 10 819 0
        test r15b, r15b
        js .LBB7_1
        movzx r12d, byte ptr [rdx + 5]
        .cv_inline_site_id 38 within 31 inlined_at 8 50 0
        .cv_inline_site_id 39 within 38 inlined_at 10 819 0
        test r12b, r12b
        js .LBB7_1
        movzx r13d, byte ptr [rdx + 6]
        .cv_inline_site_id 40 within 31 inlined_at 8 51 0
        .cv_inline_site_id 41 within 40 inlined_at 10 819 0
        test r13b, r13b
        js .LBB7_1
        .cv_inline_site_id 42 within 31 inlined_at 8 54 0
        movsxd rax, dword ptr [rdx]
        movsx edx, word ptr [rdx + 8]
        test r9, r9
        je .LBB7_7
        .cv_inline_site_id 43 within 42 inlined_at 12 1162 0
        mov qword ptr [rbp + 16], rax
        mov dword ptr [rbp + 24], edx
        mov r14d, r8d
        lea rcx, [rbp - 48]
        mov rdx, r9
        call <&jiff::tz::timezone::TimeZone as pyo3::conversion::IntoPyObject>::into_pyobject
        .cv_inline_site_id 44 within 31 inlined_at 8 55 0
        mov rax, qword ptr [rbp - 48]
        cmp rax, -1
        je .LBB7_11
        test rax, rax
        jne .LBB7_15
        mov r8d, r14d
        mov r9, qword ptr [rbp - 40]
        jmp .LBB7_12
.LBB7_1:
        .cv_inline_site_id 45 within 31 inlined_at 8 0 0
        lea rcx, [rsi + 8]
        mov dl, 3
        call <pyo3::err::PyErr as core::convert::From<core::num::error::TryFromIntError>>::from
.LBB7_16:
        mov qword ptr [rsi], 1
.LBB7_17:
        mov rax, rsi
        .seh_startepilogue
        add rsp, 168
        pop rbx
        pop rdi
        pop rsi
        pop r12
        pop r13
        pop r14
        pop r15
        pop rbp
        .seh_endepilogue
        ret
.LBB7_7:
        xor r9d, r9d
        jmp .LBB7_8
.LBB7_11:
        xor r9d, r9d
        mov r8d, r14d
.LBB7_12:
        mov edx, dword ptr [rbp + 24]
        mov rax, qword ptr [rbp + 16]
.LBB7_8:
        imul rax, rax, 274877907
        mov rcx, rax
        shr rcx, 63
        sar rax, 38
        add eax, ecx
        mov qword ptr [rbp + 8], r9
        .cv_inline_site_id 46 within 31 inlined_at 8 56 0
        test r9, r9
        lea rcx, [rbp + 8]
        mov qword ptr [rbp + 24], r9
        cmove rcx, r9
        mov byte ptr [rsp + 72], r8b
        mov qword ptr [rsp + 64], rcx
        mov dword ptr [rsp + 56], eax
        mov byte ptr [rsp + 48], r13b
        mov byte ptr [rsp + 40], r12b
        mov byte ptr [rsp + 32], r15b
        mov rcx, rsi
        mov r8d, ebx
        mov r9d, edi
        call <pyo3::types::datetime::PyDateTime>::new_with_fold
        nop
        mov rcx, qword ptr [rbp + 24]
        call core::ptr::drop_glue::<core::option::Option<pyo3::instance::Bound<pyo3::types::datetime::PyTzInfo>>>
        jmp .LBB7_17
.LBB7_15:
        mov rax, qword ptr [rbp - 40]
        mov rcx, qword ptr [rbp]
        mov qword ptr [rsi + 48], rcx
        movups xmm0, xmmword ptr [rbp - 16]
        movups xmmword ptr [rsi + 32], xmm0
        movups xmm0, xmmword ptr [rbp - 32]
        movups xmmword ptr [rsi + 16], xmm0
        .cv_inline_site_id 47 within 31 inlined_at 8 53 0
        mov qword ptr [rsi + 8], rax
        jmp .LBB7_16
        .seh_handlerdata
        .long   $cppxdata$datetime_to_pydatetime_assert_unchecked@IMGREL
.section .text,"xr",one_only,datetime_to_pydatetime_assert_unchecked,unique,7
        .seh_endproc
        .def    "?dtor$18@?0?datetime_to_pydatetime_assert_unchecked@4HA";
        .scl    3;
        .type   32;
        .endef
        .p2align        4
"?dtor$18@?0?datetime_to_pydatetime_assert_unchecked@4HA":
.seh_proc "?dtor$18@?0?datetime_to_pydatetime_assert_unchecked@4HA"
        mov qword ptr [rsp + 16], rdx
        push rbp
        .seh_pushreg rbp
        push r15
        .seh_pushreg r15
        push r14
        .seh_pushreg r14
        push r13
        .seh_pushreg r13
        push r12
        .seh_pushreg r12
        push rsi
        .seh_pushreg rsi
        push rdi
        .seh_pushreg rdi
        push rbx
        .seh_pushreg rbx
        sub rsp, 88
        .seh_stackalloc 88
        lea rbp, [rdx + 128]
        .seh_endprologue
        mov rcx, qword ptr [rbp + 24]
        call core::ptr::drop_glue::<core::option::Option<pyo3::instance::Bound<pyo3::types::datetime::PyTzInfo>>>
        nop
        .seh_startepilogue
        add rsp, 88
        pop rbx
        pop rdi
        pop rsi
        pop r12
        pop r13
        pop r14
        pop r15
        pop rbp
        .seh_endepilogue
        ret

datetime_to_pydatetime contains the check cmp r14, -1000 and the conditional jump jg .LBB6_7, which validate the range during the conversion via try_into()?:

movsxd r14, dword ptr [rdx]
cmp r14, -1000
jg .LBB6_7

In datetime_to_pydatetime_assert_unchecked, this check is missing:

movsxd rax, dword ptr [rdx]
movsx edx, word ptr [rdx + 8]

The number of instructions also drops from 271 to 264.
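
The constant -1000 in that comparison is consistent with how the checked conversion behaves: Rust integer division truncates toward zero, so `nanos / 1000` is non-negative exactly when `nanos > -1000`. Below is a small sketch of that equivalence. The helper names are made up for illustration, and a plain `i32` stands in for the value returned by `subsec_nanosecond()`.

```rust
/// Checked path: what `(nanos / 1000).try_into()?` tests when the target is `u32`.
fn micros_checked(nanos: i32) -> Option<u32> {
    u32::try_from(nanos / 1000).ok()
}

/// The single comparison that is equivalent to the checked conversion succeeding.
fn passes_range_check(nanos: i32) -> bool {
    nanos > -1000
}

fn main() {
    let samples = [
        i32::MIN, -1_000_000, -1001, -1000, -999, -1, 0, 999, 1000, 999_999_999, i32::MAX,
    ];
    for nanos in samples {
        // Both formulations agree on every sample, including the boundary.
        assert_eq!(micros_checked(nanos).is_some(), passes_range_check(nanos));
    }
    assert_eq!(micros_checked(999_999_999), Some(999_999));
    assert_eq!(micros_checked(-999), Some(0));
    assert_eq!(micros_checked(-1000), None);
}
```

This only explains the shape of the check in the first listing. It does not show whether the check disappears because of the hint or because of the `as u32` cast.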

@chirizxc

chirizxc commented Jun 17, 2026

Contributor Author

For pytime_to_time:

Number of instructions: 198 -> 172

Without assert_unchecked:

movsx edi, byte ptr [rdx + 25]   # sign-extend hour
movsx ebx, byte ptr [rdx + 26]   # sign-extend minute
movsx ebp, byte ptr [rdx + 27]   # sign-extend second
test edi, edi
js .LBB0_11    # check for a negative value
test ebx, ebx
js .LBB0_11    # check for a negative value
test ebp, ebp
js .LBB0_11    # check for a negative value

With assert_unchecked:

movzx edi, word ptr [rdx + 25]   # zero-extend hour+minute

The checks for negative values are gone as well.

Stack size:

without assert_unchecked: sub rsp, 40 (40 bytes)
with assert_unchecked: sub rsp, 32 (32 bytes)

That is 8 bytes less reserved stack space, and one fewer callee-saved register is pushed in the prologue (no push rbp).

src/main.rs
#![allow(dead_code)]

use jiff::civil::Time;
use pyo3::{Bound, PyResult};
use pyo3::types::{PyTime, PyTimeAccess};

#[inline(never)]
fn pytime_to_time(time: &impl PyTimeAccess) -> PyResult<Time> {
    Ok(Time::new(
        time.get_hour().try_into()?,
        time.get_minute().try_into()?,
        time.get_second().try_into()?,
        (time.get_microsecond() * 1000).try_into()?,
    )?)
}

#[inline(never)]
fn pytime_to_time_assert_unchecked(time: &impl PyTimeAccess) -> PyResult<Time> {
    // SAFETY: Python guarantees hour belongs to [0,23],
    // minute / second belong to [0,59], all < 128
    unsafe {
        core::hint::assert_unchecked(time.get_hour() < 128);
        core::hint::assert_unchecked(time.get_minute() < 128);
        core::hint::assert_unchecked(time.get_second() < 128);
    }
    Ok(Time::new(
        time.get_hour() as i8,
        time.get_minute() as i8,
        time.get_second() as i8,
        (time.get_microsecond() * 1000).try_into()?,
    )?)
}

#[inline(never)]
#[unsafe(no_mangle)]
fn checked(time: &Bound<'_, PyTime>) -> PyResult<Time> {
    pytime_to_time(time)
}

#[inline(never)]
#[unsafe(no_mangle)]
fn unchecked(time: &Bound<'_, PyTime>) -> PyResult<Time> {
    pytime_to_time_assert_unchecked(time)
}

fn main() {}
Output
❯ cargo asm 4                                                                                                                                                                                                                                                                                 
    Finished `release` profile [optimized] target(s) in 0.10s
.section .text,"xr",one_only,pyo3_test::pytime_to_time::<pyo3::instance::Bound<pyo3::types::datetime::PyTime>>,unique,0
        .p2align        4
pyo3_test::pytime_to_time::<pyo3::instance::Bound<pyo3::types::datetime::PyTime>>:
        .cv_func_id 0
.seh_proc _RINvCsftvnnhgnKoi_9pyo3_test14pytime_to_timeINtNtCsanQ9as8NDuY_4pyo38instance5BoundNtNtNtBL_5types8datetime6PyTimeEEB2_
        push rsi
        .seh_pushreg rsi
        push rdi
        .seh_pushreg rdi
        push rbp
        .seh_pushreg rbp
        push rbx
        .seh_pushreg rbx
        sub rsp, 40
        .seh_stackalloc 40
        .seh_endprologue
        mov rsi, rcx
        .cv_inline_site_id 1 within 0 inlined_at 1 10 0
        .cv_inline_site_id 2 within 1 inlined_at 3 623 0
        movsx edi, byte ptr [rdx + 25]
        test edi, edi
        js .LBB0_11
        .cv_inline_site_id 3 within 0 inlined_at 1 11 0
        .cv_inline_site_id 4 within 3 inlined_at 3 627 0
        movsx ebx, byte ptr [rdx + 26]
        test ebx, ebx
        js .LBB0_11
        .cv_inline_site_id 5 within 0 inlined_at 1 12 0
        .cv_inline_site_id 6 within 5 inlined_at 3 631 0
        movsx ebp, byte ptr [rdx + 27]
        test ebp, ebp
        js .LBB0_11
        .cv_inline_site_id 7 within 0 inlined_at 1 13 0
        .cv_inline_site_id 8 within 7 inlined_at 3 635 0
        movzx eax, byte ptr [rdx + 28]
        shl eax, 16
        movzx ecx, byte ptr [rdx + 29]
        shl ecx, 8
        or ecx, eax
        movzx eax, byte ptr [rdx + 30]
        or eax, ecx
        imul ecx, eax, 1000
        test ecx, ecx
        js .LBB0_11
        .cv_inline_site_id 9 within 0 inlined_at 1 9 0
        .cv_inline_site_id 10 within 9 inlined_at 5 280 0
        .cv_inline_site_id 11 within 10 inlined_at 4 103 0
        cmp dil, 24
        jae .LBB0_14
        .cv_inline_site_id 12 within 9 inlined_at 5 281 0
        .cv_inline_site_id 13 within 12 inlined_at 4 103 0
        cmp bl, 60
        jae .LBB0_15
        .cv_inline_site_id 14 within 9 inlined_at 5 282 0
        .cv_inline_site_id 15 within 14 inlined_at 4 103 0
        cmp bpl, 60
        jae .LBB0_16
        .cv_inline_site_id 16 within 9 inlined_at 5 283 0
        .cv_inline_site_id 17 within 16 inlined_at 4 103 0
        cmp ecx, 1000000000
        jae .LBB0_17
        shl rcx, 32
        .cv_inline_site_id 18 within 9 inlined_at 5 283 0
        test cl, 1
        je .LBB0_18
.LBB0_9:
        shr ecx, 8
.LBB0_10:
        .cv_inline_site_id 19 within 9 inlined_at 5 0 0
        call <jiff::error::Error as core::convert::From<jiff::util::b::BoundsError>>::from
        .cv_inline_site_id 20 within 0 inlined_at 1 9 0
        lea rcx, [rsi + 8]
        mov rdx, rax
        call <pyo3::err::PyErr as core::convert::From<jiff::error::Error>>::from
        jmp .LBB0_12
.LBB0_11:
        .cv_inline_site_id 21 within 0 inlined_at 1 0 0
        lea rcx, [rsi + 8]
        mov dl, 2
        call <pyo3::err::PyErr as core::convert::From<core::num::error::TryFromIntError>>::from
.LBB0_12:
        mov eax, 1
.LBB0_13:
        mov dword ptr [rsi], eax
        .seh_startepilogue
        add rsp, 40
        pop rbx
        pop rbp
        pop rdi
        pop rsi
        .seh_endepilogue
        ret
.LBB0_14:
        call <jiff::util::b::Hour as jiff::util::b::Bounds>::error
        mov ecx, eax
        jmp .LBB0_10
.LBB0_15:
        call <jiff::util::b::Minute as jiff::util::b::Bounds>::error
        mov ecx, eax
        jmp .LBB0_10
.LBB0_16:
        call <jiff::util::b::Second as jiff::util::b::Bounds>::error
        mov ecx, eax
        jmp .LBB0_10
.LBB0_17:
        call <jiff::util::b::SubsecNanosecond as jiff::util::b::Bounds>::error
        movzx ecx, al
        shl ecx, 8
        inc rcx
        test cl, 1
        jne .LBB0_9
.LBB0_18:
        shr rcx, 32
        shl ebx, 8
        or ebx, edi
        shl ebp, 16
        or ebp, ebx
        mov dword ptr [rsi + 4], ecx
        mov dword ptr [rsi + 8], ebp
        xor eax, eax
        jmp .LBB0_13
❯ cargo asm 5                                                                                                                                                                                                                                                                                 
    Finished `release` profile [optimized] target(s) in 0.09s
.section .text,"xr",one_only,pyo3_test::pytime_to_time_assert_unchecked::<pyo3::instance::Bound<pyo3::types::datetime::PyTime>>,unique,1
        .p2align        4
pyo3_test::pytime_to_time_assert_unchecked::<pyo3::instance::Bound<pyo3::types::datetime::PyTime>>:
        .cv_func_id 22
.seh_proc _RINvCsftvnnhgnKoi_9pyo3_test31pytime_to_time_assert_uncheckedINtNtCsanQ9as8NDuY_4pyo38instance5BoundNtNtNtB12_5types8datetime6PyTimeEEB2_
        push rsi
        .seh_pushreg rsi
        push rdi
        .seh_pushreg rdi
        push rbx
        .seh_pushreg rbx
        sub rsp, 32
        .seh_stackalloc 32
        .seh_endprologue
        mov rsi, rcx
        .cv_inline_site_id 23 within 22 inlined_at 1 30 0
        .cv_inline_site_id 24 within 23 inlined_at 3 635 0
        movzx eax, byte ptr [rdx + 28]
        shl eax, 16
        movzx ecx, byte ptr [rdx + 29]
        shl ecx, 8
        or ecx, eax
        movzx eax, byte ptr [rdx + 30]
        or eax, ecx
        imul ecx, eax, 1000
        test ecx, ecx
        js .LBB1_1
        .cv_inline_site_id 25 within 22 inlined_at 1 26 0
        .cv_inline_site_id 26 within 25 inlined_at 5 280 0
        .cv_inline_site_id 27 within 26 inlined_at 4 103 0
        movzx edi, word ptr [rdx + 25]
        cmp dil, 24
        jae .LBB1_3
        .cv_inline_site_id 28 within 25 inlined_at 5 281 0
        .cv_inline_site_id 29 within 28 inlined_at 4 103 0
        cmp edi, 15360
        jae .LBB1_5
        .cv_inline_site_id 30 within 25 inlined_at 5 282 0
        .cv_inline_site_id 31 within 30 inlined_at 4 103 0
        movzx ebx, byte ptr [rdx + 27]
        cmp bl, 60
        jae .LBB1_7
        .cv_inline_site_id 32 within 25 inlined_at 5 283 0
        .cv_inline_site_id 33 within 32 inlined_at 4 103 0
        cmp ecx, 1000000000
        jae .LBB1_9
        shl rcx, 32
        .cv_inline_site_id 34 within 25 inlined_at 5 283 0
        test cl, 1
        je .LBB1_14
.LBB1_12:
        shr ecx, 8
.LBB1_13:
        .cv_inline_site_id 35 within 25 inlined_at 5 0 0
        call <jiff::error::Error as core::convert::From<jiff::util::b::BoundsError>>::from
        .cv_inline_site_id 36 within 22 inlined_at 1 26 0
        lea rcx, [rsi + 8]
        mov rdx, rax
        call <pyo3::err::PyErr as core::convert::From<jiff::error::Error>>::from
        mov eax, 1
        jmp .LBB1_15
.LBB1_1:
        .cv_inline_site_id 37 within 22 inlined_at 1 30 0
        lea rcx, [rsi + 8]
        mov dl, 2
        call <pyo3::err::PyErr as core::convert::From<core::num::error::TryFromIntError>>::from
        mov eax, 1
        jmp .LBB1_15
.LBB1_3:
        call <jiff::util::b::Hour as jiff::util::b::Bounds>::error
        mov ecx, eax
        jmp .LBB1_13
.LBB1_5:
        call <jiff::util::b::Minute as jiff::util::b::Bounds>::error
        mov ecx, eax
        jmp .LBB1_13
.LBB1_7:
        call <jiff::util::b::Second as jiff::util::b::Bounds>::error
        mov ecx, eax
        jmp .LBB1_13
.LBB1_9:
        call <jiff::util::b::SubsecNanosecond as jiff::util::b::Bounds>::error
        movzx ecx, al
        shl ecx, 8
        inc rcx
        test cl, 1
        jne .LBB1_12
.LBB1_14:
        shr rcx, 32
        shl ebx, 16
        or ebx, edi
        mov dword ptr [rsi + 4], ecx
        mov dword ptr [rsi + 8], ebx
        xor eax, eax
.LBB1_15:
        mov dword ptr [rsi], eax
        .seh_startepilogue
        add rsp, 32
        pop rbx
        pop rdi
        pop rsi
        .seh_endepilogue
        ret

### For added assert_unchecked in src/types/list.rs and src/types/tuple.rs:

This pattern is also used in the Rust standard library itself.

@chirizxc

chirizxc commented Jun 17, 2026

Contributor Author

> +1 to @davidhewitt. Such assertions feel very much like "last percent of performance squeezing", and we should make sure we have done all the simpler things first and have proper benchmarking.
>
> Also, I often see
>
>         unsafe { core::hint::assert_unchecked(self.index <= self.length) };
>         self.length.saturating_sub(self.index)
>
> how is it better than just the plain
>
>         self.length - self.index
>
> ?

I searched GitHub within pyo3 but couldn't find any such code.

In theory, this could go wrong in a release build 🤔: if self.index > self.length, the plain subtraction underflows (it panics in debug builds and wraps around when overflow checks are off, which is the release default), whereas saturating_sub guarantees a return value of 0 in that case.
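
To make the arithmetic concrete, here is a small sketch of what the standard `usize` subtraction methods return when the value being subtracted is the larger one. The values 5 and 7 are arbitrary stand-ins for `length` and `index`.

```rust
fn main() {
    let (length, index): (usize, usize) = (5, 7);

    // Saturating subtraction clamps at zero instead of underflowing.
    assert_eq!(length.saturating_sub(index), 0);

    // Checked subtraction reports the underflow as `None`.
    assert_eq!(length.checked_sub(index), None);

    // Wrapping subtraction is what plain `-` does when overflow checks are
    // disabled (the default for release builds): the result wraps around.
    assert_eq!(length.wrapping_sub(index), usize::MAX - 1);

    // When the invariant holds, saturating and plain subtraction agree.
    assert_eq!(7usize.saturating_sub(5), 7 - 5);
}
```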

@Tpt

Tpt commented Jun 17, 2026

Contributor

> I searched GitHub within pyo3 but couldn't find any such code

@chirizxc It's from your diff, for example list.rs line 870.

@chirizxc

Contributor Author

> > I searched GitHub within pyo3 but couldn't find any such code
>
> @chirizxc It's from your diff, for example list.rs line 870.

Looking at the results again on https://godbolt.org, it seems that assert_unchecked in this form in src/types/tuple.rs and src/types/list.rs serves no purpose; all variants generate exactly the same instructions: https://godbolt.org/z/13ha6v49o

Instead of self.length.0.saturating_sub(self.index.0), we can use self.length.0 - self.index.0:

    fn len(&self) -> usize {
        self.length.0.saturating_sub(self.index.0)
    }

    pub fn len_pr2(&self) -> usize {
        self.length.0 - self.index.0
    }
len:
        mov     rcx, qword ptr [rdi + 16]
        xor     eax, eax
        sub     rcx, qword ptr [rdi + 8]
        cmovae  rax, rcx
        ret

len_pr2:
        mov     rax, qword ptr [rdi + 16]
        sub     rax, qword ptr [rdi + 8]
        ret

But could the invariant index <= length theoretically be violated if the list is modified externally during iteration?

In next_unsynchronized and next_back_unsynchronized, there is a check: let length = length.0.min(list.len());

If the list is reduced from the outside (for example, by another thread in free-threaded Python), then length will be reduced, but index may retain its old value. Could this somehow lead to index > length?
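
The scenario in question can be modelled without any Python at all. The sketch below is a simplified, hypothetical model: the `ModelIter` type and its methods are invented here and are not PyO3's implementation. In particular, this model stores the clamped length back into the iterator, which the real code may not do. It only shows how clamping a cached length to a shrinking container could leave `index` ahead of `length`, and what each subtraction would then report.

```rust
/// Hypothetical model of an index-based iterator over a growable list.
struct ModelIter {
    index: usize,
    length: usize,
}

impl ModelIter {
    /// Yield the next item, re-clamping the cached length to the live list
    /// (modelled on the `length.min(list.len())` idea from the discussion).
    fn next(&mut self, list: &[i32]) -> Option<i32> {
        self.length = self.length.min(list.len());
        if self.index < self.length {
            let item = list[self.index];
            self.index += 1;
            Some(item)
        } else {
            None
        }
    }

    /// Remaining item count, computed with a saturating subtraction.
    fn remaining(&self) -> usize {
        self.length.saturating_sub(self.index)
    }
}

fn main() {
    let mut list = vec![10, 20, 30, 40];
    let mut it = ModelIter { index: 0, length: list.len() };

    assert_eq!(it.next(&list), Some(10));
    assert_eq!(it.next(&list), Some(20));
    assert_eq!(it.next(&list), Some(30));

    // The list is shrunk from outside while iteration is in progress.
    list.truncate(1);

    assert_eq!(it.next(&list), None);
    // In this model the stored index (3) is now ahead of the stored length (1).
    assert!(it.index > it.length);
    assert_eq!(it.remaining(), 0);
    assert_eq!(it.length.checked_sub(it.index), None);
}
```

If the real iterators can reach a comparable state, an `assert_unchecked(index <= length)` hint would be undefined behavior there, while the saturating form simply reports 0.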

@Tpt

Tpt commented Jun 17, 2026

Contributor

> If the list is reduced from the outside (for example, by another thread in free-threaded Python), then length will be reduced, but index may retain its old value. Could this somehow lead to index > length?

That sounds possible, which is why I would prefer to keep the current code as it is.

@Icxolu Icxolu left a comment

Member

Are the assertion hints actually making any difference here? I quickly tested your examples on my machine and got no difference in the asm between having the hint and not having it (or having a debug_assert! instead, for that matter). So I believe the difference you observed is simply caused by now using an integer cast (which is never range-checked) instead of going through TryInto, which does check.

So I currently don't see any justification for introducing more unsafe here. If we really wanted to, we could keep the casts and add debug assertions instead, but I'm not really convinced that it makes a measurable difference. In general, I would prefer to have benchmarks that show the improvement of an optimization. We would also need at least some documentation everywhere we do these optimizations, so that we know in the future why we did it this way and the reasoning doesn't get lost in maintenance. All in all, I agree with @davidhewitt and @Tpt that this is not sufficiently justified.
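
The alternative mentioned above, keeping the cast but guarding it with a debug assertion, might look roughly like the sketch below. The function names are hypothetical, and a bare `u8` stands in for the value returned by `get_hour()`.

```rust
/// Checked conversion: fails for values that do not fit in `i8`.
fn hour_checked(hour: u8) -> Option<i8> {
    i8::try_from(hour).ok()
}

/// Cast guarded by a debug assertion: no `unsafe`, verified in debug builds,
/// and compiled down to a plain cast when debug assertions are off.
fn hour_cast(hour: u8) -> i8 {
    debug_assert!(hour < 128, "hour {hour} does not fit in i8");
    hour as i8
}

fn main() {
    assert_eq!(hour_checked(23), Some(23));
    assert_eq!(hour_checked(200), None);
    assert_eq!(hour_cast(23), 23);
    // An unguarded cast of an out-of-range value silently wraps.
    assert_eq!(200u8 as i8, -56);
}
```

Unlike `assert_unchecked`, a violated `debug_assert!` is a panic in debug builds and a wrapped value in release builds, not undefined behavior.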

4 participants