
ggml-cpu: fix _pdep_u64 usage on Linux x86 32-bit#3768

Open
Kabir08 wants to merge 2 commits into ggml-org:master from Kabir08:fix-linux32-inference-slowdown


Kabir08 commented Apr 19, 2026

_pdep_u64 is a BMI2 intrinsic that is only available in 64-bit (x86_64) mode. On 32-bit i386 with BMI2, only _pdep_u32 exists. The previous guard, '#ifdef __BMI2__', was insufficient: it also enabled the _pdep_u64 path on 32-bit builds with BMI2, which produced wrong results on Linux 32-bit.

Fix both occurrences in ggml_vec_dot_iq1_s_q8_K and ggml_vec_dot_iq1_m_q8_K to use:
    #if defined(__BMI2__) && defined(__x86_64__)

This ensures 32-bit builds use the scalar fallback paths.
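The guard described above can be illustrated with a small, self-contained C sketch. It is not the ggml code: the helper pdep_u64_scalar and the PDEP64 macro are hypothetical names used only to show the pattern of selecting _pdep_u64 when both __BMI2__ and __x86_64__ are defined, and a portable path otherwise. The PR itself routes 32-bit builds to the existing scalar fallback paths in ggml_vec_dot_iq1_s_q8_K and ggml_vec_dot_iq1_m_q8_K rather than emulating PDEP.

```c
#include <stdint.h>

/* Hypothetical portable bit deposit: scatters the low bits of src, in order,
 * into the positions of the set bits of mask (same semantics as BMI2 PDEP). */
static uint64_t pdep_u64_scalar(uint64_t src, uint64_t mask) {
    uint64_t result = 0;
    for (uint64_t bit = 1; mask != 0; bit <<= 1) {
        if (src & bit) {
            result |= mask & (~mask + 1); /* lowest set bit of mask */
        }
        mask &= mask - 1; /* clear the lowest set bit */
    }
    return result;
}

/* _pdep_u64 is only provided for 64-bit x86 targets, so checking __BMI2__
 * alone is not enough: i386 builds with -mbmi2 also define __BMI2__. */
#if defined(__BMI2__) && defined(__x86_64__)
#include <immintrin.h>
#define PDEP64(src, mask) _pdep_u64((src), (mask))
#else
#define PDEP64(src, mask) pdep_u64_scalar((src), (mask))
#endif
```

On an i386 build with -mbmi2, __x86_64__ is not defined, so PDEP64 resolves to the portable loop; on x86_64 with BMI2 enabled it resolves to the intrinsic, and both paths return the same value.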

Fixes: #3758

Kabir08 added 2 commits April 19, 2026 11:47
_pdep_u64 is a BMI2 intrinsic only available in 64-bit (x86_64) mode.
On 32-bit i386 with BMI2, only _pdep_u32 exists. The previous guard
'#ifdef __BMI2__' was insufficient and produced wrong results on Linux 32-bit.

Fix both occurrences in ggml_vec_dot_iq1_s_q8_K and
ggml_vec_dot_iq1_m_q8_K to use:
    #if defined(__BMI2__) && defined(__x86_64__)

This ensures 32-bit builds use the scalar fallback paths.

Fixes: ggml-org#3758


Development

Successfully merging this pull request may close these issues.

Very slow inference on Linux 32-bit (CPU-only) vs fast performance on Win32/Win64/Linux64 on same hardware
