FEAT: Adding Quad to and from bytes array casting support #228

SwayamInSync · 2025-11-20T07:57:57Z

Similar implementation to #225 (only endianness does not apply here)

seberg · 2025-11-20T11:45:10Z

I do wonder if it makes sense to template this. Even the aligned/unaligned could possibly be templated with a tiny memcpy no-op or assignment helper (assuming the compiler will optimize things away)?

(It's just 4 times almost the same code, I guess? And even if you take care of some unicode shenanigans -- such as proper unicode whitespace check -- honestly, I don't think it matters speed wise to just use the unicode check also for bytes.)

SwayamInSync · 2025-11-25T09:57:57Z

The input is different: we load from Py_UCS versus a plain char * (so this might need a specialized template, which again expands to the same code size).
I think the processing logic can be modularized; I'll give it a shot here.

SwayamInSync

Sorry this took me 3 days; I was learning more about compiler optimizations. Here is the godbolt compiler explorer link as proof.

SwayamInSync · 2025-11-30T08:58:25Z

quaddtype/numpy_quaddtype/src/utilities.h

+    if constexpr (Aligned) {
+        return *(const T *)ptr;
+    }
+    else {


Case-1: Aligned is true
- if constexpr is resolved at compile time, so there is no runtime overhead

Case-2: Aligned is false
- The size is known at compile time (16 bytes)
- so it is like copying into a local variable with known alignment
- so the compiler replaces the memcpy call with inline load instructions (movdqu)
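The two cases above can be sketched as a pair of load/store helpers. This is only an illustrative sketch, not the code in utilities.h: the names `load_value`, `store_value`, and the stand-in type `Quad16` are hypothetical, and the aligned branch assumes the caller really passes a pointer to a properly aligned object of type `T`.

```cpp
#include <cstring>  // std::memcpy

// Hypothetical 16-byte stand-in for the quad payload (an assumption for this sketch).
struct alignas(16) Quad16 {
    unsigned char bytes[16];
};

// Load a T from ptr. The branch is chosen at compile time, so each
// instantiation contains only one of the two paths.
template <bool Aligned, typename T>
static inline T
load_value(const char *ptr)
{
    if constexpr (Aligned) {
        // Caller guarantees ptr points at a suitably aligned T.
        return *reinterpret_cast<const T *>(ptr);
    }
    else {
        // Fixed-size memcpy into a local; compilers typically lower this
        // to an unaligned load instruction instead of a library call.
        T tmp;
        std::memcpy(&tmp, ptr, sizeof(T));
        return tmp;
    }
}

// Store a T to ptr, mirroring load_value.
template <bool Aligned, typename T>
static inline void
store_value(char *ptr, const T &value)
{
    if constexpr (Aligned) {
        *reinterpret_cast<T *>(ptr) = value;
    }
    else {
        std::memcpy(ptr, &value, sizeof(T));
    }
}
```

Whether the memcpy is actually elided depends on the compiler and optimization level, so inspecting the generated assembly is the way to confirm it for a given build.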

SwayamInSync · 2025-11-30T09:01:23Z

@seberg if this looks fine, I'll be happy to refactor the other loops in future PRs as well.

ngoldbaum · 2025-12-01T21:07:35Z

If there are any spots you'd particularly like review for that would help. This is a big diff!

SwayamInSync · 2025-12-01T21:26:08Z

It actually became big because I also refactored the unicode casting code here (which was done in #225).

So if you look at casts.cpp, there are no longer separate aligned and unaligned loops (for both bytes and unicode). We now instantiate them from a template, setting the Aligned template parameter to true for the aligned loop and false for the unaligned one, and each instantiation uses the corresponding load/store method defined in utilities.h.

So I think reviewing just the following will be good enough:

- the new load/store templates in utilities.h (here the compiler can optimize out the memcpy)
- in casts.cpp, just bytes_to_quad_strided_loop and quad_to_bytes_loop (the unicode part was already reviewed; here it was only refactored to use the template as well)
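As a rough illustration of the "one template, two instantiations" idea described in this comment, a strided loop can take `Aligned` as a template parameter and the caller can pick the instantiation at registration time. This is a sketch under assumptions: the loop name `negate_strided_loop`, the `loop_fn` signature, and `select_loop` are all hypothetical and much simpler than the real NumPy strided-loop signature used in casts.cpp, and negating a double merely stands in for the actual conversion.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical strided loop: reads a double, negates it (placeholder for the
// real conversion work), and writes it out. Only the load/store differ
// between the aligned and unaligned instantiations.
template <bool Aligned>
static int
negate_strided_loop(const char *in, char *out, std::ptrdiff_t n,
                    std::ptrdiff_t in_stride, std::ptrdiff_t out_stride)
{
    while (n--) {
        double v;
        if constexpr (Aligned) {
            v = *reinterpret_cast<const double *>(in);
        }
        else {
            std::memcpy(&v, in, sizeof(v));
        }

        v = -v;

        if constexpr (Aligned) {
            *reinterpret_cast<double *>(out) = v;
        }
        else {
            std::memcpy(out, &v, sizeof(v));
        }
        in += in_stride;
        out += out_stride;
    }
    return 0;
}

// Hypothetical function-pointer type and selector standing in for the
// place where the aligned/unaligned loops get registered.
using loop_fn = int (*)(const char *, char *, std::ptrdiff_t,
                        std::ptrdiff_t, std::ptrdiff_t);

static loop_fn
select_loop(bool aligned)
{
    return aligned ? &negate_strided_loop<true> : &negate_strided_loop<false>;
}
```

The loop body is written once, so a fix to the conversion logic applies to both variants automatically.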

SwayamInSync · 2025-12-04T14:30:15Z

Hi @ngoldbaum let me know if you need any help here!

I have to do one more PR anyway for the flag issue in both the unicode and bytes loop registration. So if you want me to simplify this, I can undo the unicode refactors here (since they were already reviewed) and push them in that PR. Then only the bytes-related functions need to be checked here: bytes_to_quad_strided_loop and quad_to_bytes_loop.

ngoldbaum · 2025-12-04T15:56:25Z

I've been buried under notifications all week and haven't had time. I'm planning to look this over today though.

ngoldbaum · 2025-12-04T15:59:22Z

quaddtype/numpy_quaddtype/src/casts.cpp

+    }
+
+    memcpy(temp_str, bytes_str, bytes_size);
+    temp_str[bytes_size] = '\0';


you can delete this since the assignment below already handles the null-at-the-end case.

ngoldbaum

Spotted a couple issues.

ngoldbaum · 2025-12-04T16:04:59Z

quaddtype/numpy_quaddtype/src/casts.cpp

+
+// Helper function: Copy string to bytes output buffer
+static inline void
+copy_string_to_bytes(const char *str, char *out_bytes, npy_intp bytes_size)


I'd call this copy_cstring_to_bytes to make it clearer that str has to be null-terminated.

or just inline it because the function only has one caller

I thought it might be needed later for adding variable-length string support, but it seems that will also be as simple as a strncpy call.

ngoldbaum · 2025-12-04T16:06:18Z

quaddtype/numpy_quaddtype/src/casts.cpp

+static inline void
+copy_string_to_bytes(const char *str, char *out_bytes, npy_intp bytes_size)
+{
+    npy_intp str_len = strlen(str);


strlen is a timebomb in any C codebase. Use strnlen or strncpy instead.
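For a fixed-width bytes field, a bounded copy along the lines suggested here could look like the sketch below. It is an assumption-laden illustration rather than the code that was merged: it borrows the `copy_cstring_to_bytes` name proposed earlier in this review and uses `std::size_t` for the size instead of `npy_intp` so that it compiles without the NumPy headers.

```cpp
#include <cstddef>
#include <cstring>

// Copy a C string into a fixed-width, NUL-padded bytes field.
// std::strncpy reads at most bytes_size characters from str, stops at the
// first NUL, and zero-fills the rest of the destination. If str is longer
// than the field, the output is truncated and NOT NUL-terminated, which is
// what a fixed-width bytes field expects.
static inline void
copy_cstring_to_bytes(const char *str, char *out_bytes, std::size_t bytes_size)
{
    std::strncpy(out_bytes, str, bytes_size);
}
```

Unlike an unbounded strlen followed by a copy, this never reads more than `bytes_size` characters from `str`, so a missing terminator cannot cause a read far past the end of the source buffer.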

SwayamInSync · 2025-12-04T18:19:02Z

Comments are addressed!!

ngoldbaum · 2025-12-04T18:24:19Z

LGTM now.

It occurs to me that it would be useful to add a test run that builds CPython, NumPy, and quaddtype using address sanitizer and then runs the tests. There's an ASan test run in the NumPy CI if you want an example. Keep in mind that you need to build NumPy with pip instead of spin if you use that example though.

SwayamInSync · 2025-12-04T18:51:07Z

Makes sense, I'll check and extend the suite this weekend.

SwayamInSync · 2025-12-04T18:55:02Z

Merging this in!

SwayamInSync added 2 commits November 20, 2025 07:53

to and from bytes cast

8e3eba4

fix rounding and more tests

c22e9f4

SwayamInSync added the numpy_quaddtype label Nov 20, 2025

SwayamInSync requested review from ngoldbaum and seberg November 20, 2025 10:49

optimizing

5e3fce6

SwayamInSync commented Nov 30, 2025

SwayamInSync mentioned this pull request Dec 1, 2025

Large arrays casting of unicode/bytes cast to quaddtype leads segfault #237

Closed

ngoldbaum reviewed Dec 4, 2025

SwayamInSync added 2 commits December 4, 2025 16:28

fixes

daf0611

strlen fixes for unicode

628a505

SwayamInSync merged commit 8db38b0 into numpy:main Dec 4, 2025
11 checks passed

SwayamInSync deleted the bytes-cast branch December 4, 2025 18:55
