-
-
Notifications
You must be signed in to change notification settings - Fork 15
FEAT: Adding Quad to and from bytes array casting support #228
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
I do wonder if it makes sense to template this. Even the aligned/unaligned could possibly be templated with a tiny (It's just 4 times almost the same code, I guess? And even if you take care of some unicode shenanigans -- such as proper unicode whitespace check -- honestly, I don't think it matters speed wise to just use the unicode check also for bytes.) |
|
The input is different, loading from Py_UCS vs normal char * (so might need to do specialized template which again expands to same code size) |
SwayamInSync
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry it took me 3 days, I was learning more about the compiler optimizations and here is the godbolt compiler explorer link to see for proof
| if constexpr (Aligned) { | ||
| return *(const T *)ptr; | ||
| } | ||
| else { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Case-1: Aligned is true
- There will be no runtime overhead of if constexpr
Case-2: Aligned is false
- The size is known at compile time (16 bytes)
- so it is like copying into a local variable with known alignment
- so compiler replaces the memcpy call with inline load instructions (movdqu)
|
@seberg if looks fine, then I'll be happy to refactor other loops in future PRs as well |
|
If there are any spots you'd particularly like review for that would help. This is a big diff! |
|
It actually became big because I also refactored the unicode casting code here (which was done in #225 ) So if you see the so I think just reviewing following will be good enough
|
|
Hi @ngoldbaum let me know if you need any help here! I anyways have to do one more PR for the flag issue in both unicode + bytes loop registration so if you want me to simplify then I can undo the unicode refactors here (since they already being reviewed and push them in that PR) here only byte related functions need to be checked |
|
I've been buried under notifications all week and haven't had time. I'm planning to look this over today though. |
| } | ||
|
|
||
| memcpy(temp_str, bytes_str, bytes_size); | ||
| temp_str[bytes_size] = '\0'; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you can delete this since the assignment below already handles the null-at-the-end case.
ngoldbaum
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Spotted a couple issues.
|
|
||
| // Helper function: Copy string to bytes output buffer | ||
| static inline void | ||
| copy_string_to_bytes(const char *str, char *out_bytes, npy_intp bytes_size) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd call this copy_cstring_to_bytes to make it clearer that str has to be null-terminated.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
or just inline it because the function only has one caller
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought it might be later needed for adding variable-len string support, but it seems that'll also be simple as a strncpy call
| static inline void | ||
| copy_string_to_bytes(const char *str, char *out_bytes, npy_intp bytes_size) | ||
| { | ||
| npy_intp str_len = strlen(str); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
strlen is a timebomb in any C codebase. Use strnlen or strncpy instead.
|
Comments are addressed!! |
|
LGTM now. It occurs to me that adding a test run that builds CPython, NumPy, and quaddtype using address sanitizer and then runs the tests. There's an ASan test run in the NumPy CI if you want an example. Keep in mind that you need to build NumPy with |
Make sense, I'll check and extend the suite this weekend |
|
Merging this in! |
Similar implementation to #225 (only endianness does not apply here)