fix(): Updated Drawable based on Ludashi 3.0 improvements #1382
phobos665 wants to merge 3 commits into
…de by ludashi regarding smart usage of NEON and checks before attempting to transfer.
📝 Walkthrough
Adds optional ARM NEON support and refactors several drawable routines to use per-row unpacking, hoisted row buffers (stack-first, heap-fallback), an RGB-only copy fast-path, NEON-accelerated masked/unmasked bitmap routines, and scalar fallbacks.
Changes
SIMD-Accelerated Drawable Rendering
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~50 minutes
🚥 Pre-merge checks | ✅ 3 | ❌ 2
❌ Failed checks (1 warning, 1 inconclusive)
✅ Passed checks (3 passed)
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
app/src/main/cpp/winlator/drawable.c (1)
347-366: ⚠️ Potential issue | 🔴 Critical | 🏗️ Heavy lift

Critical: `fromBitmap` does not match the Drawable's pixel format and ignores stride/format.

Several correctness gaps in this new JNI function:

- Missing R↔B channel swap. The Drawable buffer stores pixels as BGRA bytes (per `unpackColor` at lines 24-29: `rgba[0]=B, rgba[1]=G, rgba[2]=R, rgba[3]=A`), and `Java_com_winlator_xserver_Pixmap_toBitmap` (lines 391-435) explicitly swaps R↔B when emitting an Android `ARGB_8888` bitmap (which is RGBA in memory). `fromBitmap` is the inverse direction and must perform the same R↔B swap; the current `memcpy` will produce a buffer with red and blue swapped, so any subsequent `toBitmap` call will return colors that don't match the source bitmap.
- Stride is ignored. `AndroidBitmapInfo.stride` may be larger than `info.width * 4` (row padding). The plain `memcpy` of `width * height * 4` bytes treats the rows as contiguous, so padding bytes land in pixel positions, every row after the first is shifted, and the trailing rows are truncated. The copy must be done row by row using `info.stride`.
- No format check. If the bitmap is `RGB_565`, `RGBA_F16`, `ALPHA_8`, etc., the resulting buffer is garbage. Validate `info.format == ANDROID_BITMAP_FORMAT_RGBA_8888` and bail out otherwise.
- No destination capacity check. Should validate `GetDirectBufferCapacity(env, data) >= info.width * info.height * 4` before writing.
- The leftover `/* replace the whole JNI body */` comment at line 347 should be removed before merge.

🔒️ Sketch of a corrected implementation

```diff
-/* replace the whole JNI body */
 JNIEXPORT void JNICALL Java_com_winlator_xserver_Drawable_fromBitmap(JNIEnv *env, jclass obj, jobject bitmap, jobject data) {
     uint8_t *dst = (*env)->GetDirectBufferAddress(env, data);
-    if (!dst) {
+    jlong dstCap = (*env)->GetDirectBufferCapacity(env, data);
+    if (!dst || dstCap < 0) {
         printf("Error: NULL buffer address in fromBitmap\n");
         return;
     }
     AndroidBitmapInfo info;
     uint8_t *pixels;
     if (AndroidBitmap_getInfo(env, bitmap, &info) < 0) return;
+    if (info.format != ANDROID_BITMAP_FORMAT_RGBA_8888) {
+        printf("Error: unsupported bitmap format %u in fromBitmap\n", info.format);
+        return;
+    }
+    size_t needed = (size_t)info.width * info.height * 4;
+    if ((size_t)dstCap < needed) {
+        printf("Error: destination buffer too small in fromBitmap\n");
+        return;
+    }
     if (AndroidBitmap_lockPixels(env, bitmap, (void **)&pixels) < 0) return;
-    memcpy(dst, pixels, (size_t)info.width * info.height * 4);
+    /* Per row: copy info.width pixels, swapping R↔B to match Drawable's BGRA layout. */
+    for (uint32_t y = 0; y < info.height; y++) {
+        const uint8_t *srcRow = pixels + (size_t)y * info.stride;
+        uint8_t *dstRow = dst + (size_t)y * info.width * 4;
+        for (uint32_t x = 0; x < info.width; x++) {
+            dstRow[x*4 + 0] = srcRow[x*4 + 2]; /* B (source byte 2) */
+            dstRow[x*4 + 1] = srcRow[x*4 + 1]; /* G */
+            dstRow[x*4 + 2] = srcRow[x*4 + 0]; /* R (source byte 0) */
+            dstRow[x*4 + 3] = srcRow[x*4 + 3]; /* A */
+        }
+    }
     AndroidBitmap_unlockPixels(env, bitmap);
 }
```

The shuffle table from `toBitmap` (lines 404-409) can be reused here for a NEON-accelerated version.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@app/src/main/cpp/winlator/drawable.c` around lines 347 - 366, The JNI function Java_com_winlator_xserver_Drawable_fromBitmap currently does a raw memcpy and must be fixed: validate AndroidBitmapInfo.format == ANDROID_BITMAP_FORMAT_RGBA_8888, check the destination buffer capacity via GetDirectBufferCapacity(env, data) is at least info.width * info.height * 4, read pixels with AndroidBitmap_getInfo and AndroidBitmap_lockPixels as before but copy row-by-row using info.stride (not a single memcpy) into the direct buffer, and while copying swap R↔B per-pixel (invert channels to match Drawable's BGRA layout); finally call AndroidBitmap_unlockPixels and remove the leftover comment "/* replace the whole JNI body */". Ensure all early returns log or handle errors appropriately.
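The stride and capacity points in this comment can be tried without JNI. The following is a minimal sketch, not the PR's code: `copy_rgba_to_bgra` is a hypothetical helper, and it assumes an RGBA_8888 source whose stride is given in bytes and a tightly packed BGRA destination.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not part of the PR): converts an RGBA source whose rows
 * are srcStrideBytes apart into a tightly packed BGRA destination.
 * Returns 0 on success, -1 if the stride is invalid or dst is too small. */
static int copy_rgba_to_bgra(const uint8_t *src, uint32_t width, uint32_t height,
                             uint32_t srcStrideBytes,
                             uint8_t *dst, size_t dstCapacity) {
    assert(src != NULL && dst != NULL);
    const size_t rowBytes = (size_t)width * 4;
    if (srcStrideBytes < rowBytes) return -1;
    /* Overflow-safe form of: rowBytes * height > dstCapacity */
    if (height != 0 && rowBytes > dstCapacity / height) return -1;

    for (uint32_t y = 0; y < height; y++) {
        const uint8_t *s = src + (size_t)y * srcStrideBytes; /* skips row padding */
        uint8_t *d = dst + (size_t)y * rowBytes;
        for (uint32_t x = 0; x < width; x++, s += 4, d += 4) {
            d[0] = s[2]; /* B */
            d[1] = s[1]; /* G */
            d[2] = s[0]; /* R */
            d[3] = s[3]; /* A */
        }
    }
    return 0;
}
```

In a JNI wrapper, the capacity would come from `GetDirectBufferCapacity` and the stride from `AndroidBitmapInfo.stride`, as the comment describes; the format check stays in the wrapper because this helper only sees raw bytes.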
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Outside diff comments:
In `@app/src/main/cpp/winlator/drawable.c`:
- Around line 347-366: The JNI function
Java_com_winlator_xserver_Drawable_fromBitmap currently does a raw memcpy and
must be fixed: validate AndroidBitmapInfo.format ==
ANDROID_BITMAP_FORMAT_RGBA_8888, check the destination buffer capacity via
GetDirectBufferCapacity(env, data) is at least info.width * info.height * 4,
read pixels with AndroidBitmap_getInfo and AndroidBitmap_lockPixels as before
but copy row-by-row using info.stride (not a single memcpy) into the direct
buffer, and while copying swap R↔B per-pixel (invert channels to match
Drawable's BGRA layout); finally call AndroidBitmap_unlockPixels and remove the
leftover comment "/* replace the whole JNI body */". Ensure all early returns
log or handle errors appropriately.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 66eaa17d-fc7d-4c81-a3f4-d1a8fed42f01
📒 Files selected for processing (1)
app/src/main/cpp/winlator/drawable.c
1 issue found across 1 file
Prompt for AI agents (unresolved issues)
Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.
<file name="app/src/main/cpp/winlator/drawable.c">
<violation number="1" location="app/src/main/cpp/winlator/drawable.c:270">
P2: Thick-line rendering is now angle-dependent because the two branches draw different pixel footprints.</violation>
</file>
P2: Thick-line rendering is now angle-dependent because the two branches draw different pixel footprints.
Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At app/src/main/cpp/winlator/drawable.c, line 270:
<comment>Thick-line rendering is now angle-dependent because the two branches draw different pixel footprints.</comment>
<file context>
@@ -203,19 +246,37 @@ Java_com_winlator_xserver_Drawable_drawLine(JNIEnv *env, jclass obj, jshort x0,
while (true) {
- for (int16_t i = 0; i < lineWidth; i++) {
- memcpy(dataAddr + (x0 + (i + y0) * stride) * 4, row, rowSize);
+ if (isHorizontal) {
+ // Horizontal-ish: write a full row of pixels at once
+ for (int16_t i = 0; i < lineWidth; i++) {
</file context>
|
Some of the changes look slower to me than the previous version (especially where memcpy was replaced with something else). Note that Ludashi 3.0 has quite a few new problems; it might be worth waiting until it is polished before taking his changes.
Might grab you to chat through which parts are worth pulling out. I was seeing a decent improvement, although if there are parts we can adjust, then let's chat.
lvonasek left a comment

I left a few comments. But I might be wrong.

```c
int j = (x + dstX + (y + dstY) * dstStride) * 4;
dstDataAddr[j+0] = srcDataAddr[i+0];
dstDataAddr[j+1] = srcDataAddr[i+1];
dstDataAddr[j+2] = srcDataAddr[i+2];
```

This would be better with memcpy
…improve draw performance
…nto fix/drawable-ludashi-improvements
1 issue found across 1 file
Prompt for AI agents (unresolved issues)
Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.
<file name="app/src/main/cpp/winlator/drawable.c">
<violation number="1" location="app/src/main/cpp/winlator/drawable.c:219">
P2: Writing `uint32_t` pixels through a casted `uint8_t` stack buffer can be misaligned and violates aliasing/alignment guarantees; use a properly aligned `uint32_t` buffer instead.</violation>
</file>
P2: Writing uint32_t pixels through a casted uint8_t stack buffer can be misaligned and violates aliasing/alignment guarantees; use a properly aligned uint32_t buffer instead.
Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At app/src/main/cpp/winlator/drawable.c, line 219:
<comment>Writing `uint32_t` pixels through a casted `uint8_t` stack buffer can be misaligned and violates aliasing/alignment guarantees; use a properly aligned `uint32_t` buffer instead.</comment>
<file context>
@@ -168,18 +203,26 @@ Java_com_winlator_xserver_Drawable_fillRect(JNIEnv *env, jclass obj, jshort x, j
- for (int i = 0; i < rowSize; i += 4) memcpy(row + i, rgba, 4);
+ uint32_t color32 = ((uint32_t)rgba[3] << 24) | ((uint32_t)rgba[2] << 16) | ((uint32_t)rgba[1] << 8) | rgba[0];
+ uint32_t *row32 = (uint32_t *)(void *)row; // (void*) cast silences strict-aliasing warning
+ for (int i = 0; i < width; i++) row32[i] = color32;
for (int16_t i = 0; i < height; i++) {
</file context>
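One way to address the alignment concern is to declare the row buffer as `uint32_t` from the start, so no cast from a byte buffer is needed. The sketch below is an illustration under assumptions, not the PR's implementation: `fill_rect32` and `STACK_ROW_PIXELS` are hypothetical names, the stride is in pixels as in the quoted hunks, and the 4096-pixel threshold is taken from the PR summary.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define STACK_ROW_PIXELS 4096 /* stack-first threshold mentioned in the PR summary */

/* Hypothetical helper: fills a width x height rectangle at (x, y) with one
 * BGRA color. The row buffer is a real uint32_t array, so it is correctly
 * aligned and no pointer cast is required. Returns 0 on success, -1 on OOM. */
static int fill_rect32(uint8_t *data, int stride, int x, int y,
                       int width, int height, const uint8_t bgra[4]) {
    assert(data != NULL && width >= 0 && height >= 0);
    if (width == 0 || height == 0) return 0;

    uint32_t stackRow[STACK_ROW_PIXELS];
    uint32_t *row = stackRow;
    if (width > STACK_ROW_PIXELS) {           /* heap fallback for very wide rows */
        row = malloc((size_t)width * sizeof *row);
        if (!row) return -1;
    }

    uint32_t color32;
    memcpy(&color32, bgra, 4);                /* keeps the in-memory byte order */
    for (int i = 0; i < width; i++) row[i] = color32;

    for (int i = 0; i < height; i++) {
        /* memcpy into the byte buffer has no alignment requirement on data */
        memcpy(data + ((size_t)x + (size_t)(y + i) * stride) * 4,
               row, (size_t)width * 4);
    }

    if (row != stackRow) free(row);
    return 0;
}
```

Building `color32` with `memcpy` rather than shifts avoids any endianness assumption, and writing the destination through `memcpy` sidesteps alignment of `data` as well.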
Actionable comments posted: 2
🧹 Nitpick comments (1)
app/src/main/cpp/winlator/drawable.c (1)
161-172: ⚡ Quick win

GCF_COPY fast path: prefer `memcpy` over per-pixel 3-byte copies.

The "fast path" still costs three byte loads + three byte stores per pixel and writes one byte at a time. The neighboring `copyArea` (line 130) treats the same buffer format as a contiguous 32-bit blit via `memcpy`, so dst alpha clearly isn't sacred in this codebase. Replacing the inner loop with a per-row `memcpy` of `width * 4` bytes is dramatically faster (single libc call per row, lets the compiler/CPU use wide loads/stores) and matches the existing `copyArea` pattern. This also addresses the recurring "would be better with memcpy" feedback.

♻️ Proposed refactor

```diff
     // Fast path: GCF_COPY is plain pixel blitting — copy only RGB bytes to match
     if (gcFunction == GCF_COPY) {
+        const size_t rowBytes = (size_t)width * 4;
         for (int16_t y = 0; y < height; y++) {
-            // Hoist row base pointers: eliminates a multiply per pixel in the inner loop
-            uint8_t *s = srcDataAddr + (srcX + (y + srcY) * srcStride) * 4;
-            uint8_t *d = dstDataAddr + (dstX + (y + dstY) * dstStride) * 4;
-            for (int16_t x = 0; x < width; x++, s += 4, d += 4) {
-                d[0] = s[0]; d[1] = s[1]; d[2] = s[2];
-                /* byte 3 (alpha) intentionally not copied */
-            }
+            memcpy(dstDataAddr + (dstX + (y + dstY) * dstStride) * 4,
+                   srcDataAddr + (srcX + (y + srcY) * srcStride) * 4,
+                   rowBytes);
         }
         return;
     }
```

If destination alpha must be preserved (worth confirming against the X11 semantics this implements), keep the byte-wise variant but at least batch into 32-bit reads and mask the alpha byte instead of three discrete byte ops.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@app/src/main/cpp/winlator/drawable.c` around lines 161 - 172, The GCF_COPY fast path in the gcFunction == GCF_COPY block currently copies pixels byte-by-byte; replace the inner per-pixel 3-byte copies with a per-row memcpy: compute the source row pointer (using srcDataAddr, srcX, srcY, srcStride) and destination row pointer (dstDataAddr, dstX, dstY, dstStride) and call memcpy(d, s, width * 4) for each y to match the existing copyArea approach and leverage wide loads/stores; if destination alpha truly must be preserved, instead batch 32-bit reads/writes and mask out the alpha byte rather than performing three separate byte loads/stores.
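Both options raised in this comment can be sketched side by side. This is an illustration only: `blit_rows_memcpy` and `blit_rows_keep_alpha` are hypothetical names, strides are assumed to be in pixels (as in the quoted hunk), and the source and destination regions are assumed not to overlap.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Variant A: whole-pixel copy, one memcpy per row. Destination alpha is overwritten. */
static void blit_rows_memcpy(const uint8_t *src, int srcStride, int srcX, int srcY,
                             uint8_t *dst, int dstStride, int dstX, int dstY,
                             int width, int height) {
    assert(width >= 0 && height >= 0);
    const size_t rowBytes = (size_t)width * 4;
    for (int y = 0; y < height; y++) {
        memcpy(dst + ((size_t)dstX + (size_t)(y + dstY) * dstStride) * 4,
               src + ((size_t)srcX + (size_t)(y + srcY) * srcStride) * 4,
               rowBytes);
    }
}

/* Variant B: copies the three color bytes but keeps destination alpha,
 * using one 32-bit read-modify-write per pixel instead of three byte copies. */
static void blit_rows_keep_alpha(const uint8_t *src, int srcStride, int srcX, int srcY,
                                 uint8_t *dst, int dstStride, int dstX, int dstY,
                                 int width, int height) {
    assert(width >= 0 && height >= 0);
    /* Build the mask from bytes so it selects bytes 0..2 on any endianness. */
    static const uint8_t colorBytes[4] = {0xFF, 0xFF, 0xFF, 0x00};
    uint32_t colorMask;
    memcpy(&colorMask, colorBytes, 4);

    for (int y = 0; y < height; y++) {
        const uint8_t *s = src + ((size_t)srcX + (size_t)(y + srcY) * srcStride) * 4;
        uint8_t *d = dst + ((size_t)dstX + (size_t)(y + dstY) * dstStride) * 4;
        for (int x = 0; x < width; x++, s += 4, d += 4) {
            uint32_t sp, dp;
            memcpy(&sp, s, 4);               /* alignment-safe 32-bit load */
            memcpy(&dp, d, 4);
            dp = (dp & ~colorMask) | (sp & colorMask);
            memcpy(d, &dp, 4);
        }
    }
}
```

Which variant is appropriate depends on whether destination alpha has to survive a GCF_COPY, which the comment leaves open; only a benchmark on the target devices would show whether Variant B actually beats the byte-wise loop.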
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@app/src/main/cpp/winlator/drawable.c`:
- Around line 396-416: The NEON block uses the AArch64-only intrinsic vqtbl1q_u8
under the existing `#ifdef` __ARM_NEON, which breaks 32-bit ARM builds; update the
preprocessor guard to require AArch64 (e.g., check __aarch64__ in addition to
__ARM_NEON) around the vqtbl1q_u8 path (the static shuffle, vShuffle,
vld1q_u8/vqtbl1q_u8/vst1q_u8 loop) so armeabi-v7a falls back to the existing
scalar/#else branch, or replace with an ARMv7-compatible implementation using
vtbl1_u8 on uint8x8_t halves if you want NEON on 32-bit.
- Around line 269-280: The vertical branch in the drawing loop (conditioned on
isHorizontal) writes only a lineWidth×1 strip while the horizontal branch paints
a lineWidth×lineWidth square, causing asymmetric brush shapes for lineWidth>1;
update the vertical-ish branch (the loop that uses ((uint32_t *)dataAddr)[(x0 +
i) + y0 * stride] = color32) to paint the same square brush as the horizontal
branch (use the same memcpy-based row writes or an equivalent nested loop
writing rowSize bytes for each of lineWidth rows at each Bresenham step using
dataAddr, x0, y0, stride, row, rowSize and color32) so both branches produce
identical lineWidth×lineWidth coverage per step.
---
Nitpick comments:
In `@app/src/main/cpp/winlator/drawable.c`:
- Around line 161-172: The GCF_COPY fast path in the gcFunction == GCF_COPY
block currently copies pixels byte-by-byte; replace the inner per-pixel 3-byte
copies with a per-row memcpy: compute the source row pointer (using srcDataAddr,
srcX, srcY, srcStride) and destination row pointer (dstDataAddr, dstX, dstY,
dstStride) and call memcpy(d, s, width * 4) for each y to match the existing
copyArea approach and leverage wide loads/stores; if destination alpha truly
must be preserved, instead batch 32-bit reads/writes and mask out the alpha byte
rather than performing three separate byte loads/stores.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: ecc068d2-88f0-457c-9716-757b338632ba
📒 Files selected for processing (1)
app/src/main/cpp/winlator/drawable.c
```diff
 while (true) {
-    for (int16_t i = 0; i < lineWidth; i++) {
-        memcpy(dataAddr + (x0 + (i + y0) * stride) * 4, row, rowSize);
+    if (isHorizontal) {
+        // Horizontal-ish: write a full row of pixels at once
+        for (int16_t i = 0; i < lineWidth; i++) {
+            memcpy(dataAddr + (x0 + (i + y0) * stride) * 4, row, rowSize);
+        }
+    } else {
+        // Vertical-ish: write individual pixels
+        for (int16_t i = 0; i < lineWidth; i++) {
+            ((uint32_t *)dataAddr)[(x0 + i) + y0 * stride] = color32;
+        }
+    }
```
Inconsistent brush shape between horizontal and vertical branches when lineWidth > 1 is a regression.

The prior implementation painted a lineWidth × lineWidth square at every Bresenham step for all lines. The current code breaks this:

- Horizontal-ish: unchanged. It still paints a lineWidth × lineWidth square (memcpy loop over lineWidth iterations, each writing rowSize bytes).
- Vertical-ish: new asymmetry. It paints only a lineWidth × 1 horizontal strip (a single-pixel-height row at y0).

For lineWidth == 1 both are identical, but for thicker lines this creates asymmetric rendering: vertical-ish lines render with reduced thickness compared to horizontal-ish lines, breaking the prior consistent behavior.

Either restore uniform square brushes (apply the memcpy approach to both branches), or implement symmetric perpendicular strips (write a column of lineWidth pixels for horizontal, row for vertical). The current mixed approach is a regression.

Symmetric perpendicular brush fix

```diff
     if (isHorizontal) {
-        // Horizontal-ish: write a full row of pixels at once
+        // Horizontal-ish: thicken vertically — column of lineWidth pixels at x0.
         for (int16_t i = 0; i < lineWidth; i++) {
-            memcpy(dataAddr + (x0 + (i + y0) * stride) * 4, row, rowSize);
+            ((uint32_t *)dataAddr)[x0 + (y0 + i) * stride] = color32;
         }
     } else {
         // Vertical-ish: write individual pixels
         for (int16_t i = 0; i < lineWidth; i++) {
             ((uint32_t *)dataAddr)[(x0 + i) + y0 * stride] = color32;
         }
     }
```

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@app/src/main/cpp/winlator/drawable.c` around lines 269 - 280, The vertical
branch in the drawing loop (conditioned on isHorizontal) writes only a
lineWidth×1 strip while the horizontal branch paints a lineWidth×lineWidth
square, causing asymmetric brush shapes for lineWidth>1; update the vertical-ish
branch (the loop that uses ((uint32_t *)dataAddr)[(x0 + i) + y0 * stride] =
color32) to paint the same square brush as the horizontal branch (use the same
memcpy-based row writes or an equivalent nested loop writing rowSize bytes for
each of lineWidth rows at each Bresenham step using dataAddr, x0, y0, stride,
row, rowSize and color32) so both branches produce identical lineWidth×lineWidth
coverage per step.
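The "uniform square brush" option can be illustrated with a standalone Bresenham loop that stamps the same lineWidth × lineWidth block at every step, whatever the slope. This is a sketch under assumptions rather than the PR's `drawLine`: `draw_line_square` and `MAX_LINE_WIDTH` are hypothetical, the stride is in pixels, and the caller is assumed to guarantee that every stamped block lies inside the buffer.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define MAX_LINE_WIDTH 64 /* hypothetical cap so the brush row fits on the stack */

/* Hypothetical helper: draws a line from (x0, y0) to (x1, y1), painting a
 * lineWidth x lineWidth square at each Bresenham step. No clipping is done. */
static void draw_line_square(uint8_t *data, int stride,
                             int x0, int y0, int x1, int y1,
                             int lineWidth, const uint8_t bgra[4]) {
    assert(data != NULL && lineWidth >= 1 && lineWidth <= MAX_LINE_WIDTH);

    /* Prebuilt, correctly aligned brush row */
    uint32_t color32, row[MAX_LINE_WIDTH];
    memcpy(&color32, bgra, 4);
    for (int i = 0; i < lineWidth; i++) row[i] = color32;
    const size_t rowSize = (size_t)lineWidth * 4;

    const int dx = abs(x1 - x0), sx = x0 < x1 ? 1 : -1;
    const int dy = -abs(y1 - y0), sy = y0 < y1 ? 1 : -1;
    int err = dx + dy;

    for (;;) {
        /* Same footprint for every slope: lineWidth rows of lineWidth pixels */
        for (int i = 0; i < lineWidth; i++) {
            memcpy(data + ((size_t)x0 + (size_t)(y0 + i) * stride) * 4, row, rowSize);
        }
        if (x0 == x1 && y0 == y1) break;
        const int e2 = 2 * err;
        if (e2 >= dy) { err += dy; x0 += sx; }
        if (e2 <= dx) { err += dx; y0 += sy; }
    }
}
```

Because the stamp no longer depends on the line's direction, a vertical line and its transposed horizontal counterpart cover mirrored pixel sets, which is the property the review asks to restore. The perpendicular-strip alternative writes fewer pixels per step but changes the shape of line ends and corners.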
```c
#ifdef __ARM_NEON
    if (!maskDataAddr) {
        /* Fast path: no mask — swap R and B channels across 4 pixels at a time.
         * src layout per pixel: [R, G, B, A]
         * dst layout per pixel: [B, G, R, A]
         * vrev32q_u8 reverses the 4 bytes within each 32-bit pixel: RGBA → ABGR,
         * which maps R→B and B→R with G and A landing in wrong positions.
         * Instead we use vtbl (byte table lookup) to do an exact per-byte shuffle. */
        static const uint8_t shuffle[16] = {
             2,  1,  0,  3,   /* pixel 0: swap bytes 0 and 2 (R↔B), keep 1 (G) and 3 (A) */
             6,  5,  4,  7,   /* pixel 1 */
            10,  9,  8, 11,   /* pixel 2 */
            14, 13, 12, 15    /* pixel 3 */
        };
        uint8x16_t vShuffle = vld1q_u8(shuffle);
        int i = 0;
        for (; i + 3 < size; i += 4) {
            uint8x16_t vSrc = vld1q_u8(src + i * 4);
            uint8x16_t vDst = vqtbl1q_u8(vSrc, vShuffle);
            vst1q_u8(pixels + i * 4, vDst);
        }
```
`vqtbl1q_u8` is AArch64-only; the guard must include `__aarch64__`.

The project explicitly targets both arm64-v8a and armeabi-v7a (as seen in app/build.gradle.kts:76). The code uses `#ifdef __ARM_NEON`, which is defined for both ARMv7-A and AArch64, but `vqtbl1q_u8` is an A64-only intrinsic and does not exist on ARMv7-A NEON. This will cause a compilation failure on 32-bit ARM builds.

Change the guard to:

```diff
-#ifdef __ARM_NEON
+#if defined(__ARM_NEON) && defined(__aarch64__)
```

The scalar `#else` branch already handles both cases, so 32-bit ARM will fall back to the portable path unchanged. Alternatively, if you need NEON vectorization on ARMv7-A, use `vtbl1_u8` on `uint8x8_t` halves instead.
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

```diff
-#ifdef __ARM_NEON
+#if defined(__ARM_NEON) && defined(__aarch64__)
     if (!maskDataAddr) {
         /* Fast path: no mask — swap R and B channels across 4 pixels at a time.
          * src layout per pixel: [R, G, B, A]
          * dst layout per pixel: [B, G, R, A]
          * vrev32q_u8 reverses the 4 bytes within each 32-bit pixel: RGBA → ABGR,
          * which maps R→B and B→R with G and A landing in wrong positions.
          * Instead we use vtbl (byte table lookup) to do an exact per-byte shuffle. */
         static const uint8_t shuffle[16] = {
              2,  1,  0,  3,   /* pixel 0: swap bytes 0 and 2 (R↔B), keep 1 (G) and 3 (A) */
              6,  5,  4,  7,   /* pixel 1 */
             10,  9,  8, 11,   /* pixel 2 */
             14, 13, 12, 15    /* pixel 3 */
         };
         uint8x16_t vShuffle = vld1q_u8(shuffle);
         int i = 0;
         for (; i + 3 < size; i += 4) {
             uint8x16_t vSrc = vld1q_u8(src + i * 4);
             uint8x16_t vDst = vqtbl1q_u8(vSrc, vShuffle);
             vst1q_u8(pixels + i * 4, vDst);
         }
```
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@app/src/main/cpp/winlator/drawable.c` around lines 396 - 416, The NEON block
uses the AArch64-only intrinsic vqtbl1q_u8 under the existing `#ifdef` __ARM_NEON,
which breaks 32-bit ARM builds; update the preprocessor guard to require AArch64
(e.g., check __aarch64__ in addition to __ARM_NEON) around the vqtbl1q_u8 path
(the static shuffle, vShuffle, vld1q_u8/vqtbl1q_u8/vst1q_u8 loop) so armeabi-v7a
falls back to the existing scalar/#else branch, or replace with an
ARMv7-compatible implementation using vtbl1_u8 on uint8x8_t halves if you want
NEON on 32-bit.
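The three-way guard described here (AArch64 table lookup, ARMv7 NEON on 64-bit halves, scalar elsewhere) could look roughly like the sketch below. It is not the PR's `toBitmap`: `swap_rb` is a hypothetical standalone helper that handles only the unmasked case, and on non-ARM hosts only the scalar loop is compiled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#if defined(__ARM_NEON)
#include <arm_neon.h>
#endif

/* Hypothetical helper: swaps bytes 0 and 2 of every 4-byte pixel (R <-> B). */
static void swap_rb(const uint8_t *src, uint8_t *dst, size_t pixelCount) {
    assert(src != NULL && dst != NULL);
    size_t i = 0;

#if defined(__ARM_NEON) && defined(__aarch64__)
    /* AArch64: 128-bit table lookup, 4 pixels per iteration */
    static const uint8_t shuffle[16] = { 2, 1, 0, 3,   6, 5, 4, 7,
                                        10, 9, 8, 11, 14, 13, 12, 15};
    const uint8x16_t idx = vld1q_u8(shuffle);
    for (; i + 4 <= pixelCount; i += 4) {
        vst1q_u8(dst + i * 4, vqtbl1q_u8(vld1q_u8(src + i * 4), idx));
    }
#elif defined(__ARM_NEON)
    /* ARMv7-A NEON: 64-bit table lookup, 2 pixels per iteration */
    static const uint8_t shuffle[8] = {2, 1, 0, 3, 6, 5, 4, 7};
    const uint8x8_t idx = vld1_u8(shuffle);
    for (; i + 2 <= pixelCount; i += 2) {
        vst1_u8(dst + i * 4, vtbl1_u8(vld1_u8(src + i * 4), idx));
    }
#endif

    /* Scalar tail, and the only path on targets without NEON */
    for (; i < pixelCount; i++) {
        const uint8_t *s = src + i * 4;
        uint8_t *d = dst + i * 4;
        d[0] = s[2];
        d[1] = s[1];
        d[2] = s[0];
        d[3] = s[3];
    }
}
```

Whether the 32-bit NEON branch is worth keeping is a judgment call: the simpler change suggested in the review only narrows the guard and lets armeabi-v7a use the scalar path.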
Description
This is a change to our `drawable.c` that brings some modest improvements to the drawing layer (e.g. UI, text, etc.).
Here's the general overview:
Recording
Type of Change
Checklist
#code-changes, I have discussed this change there and it has been green-lighted. If I do not have access, I have still provided clear context in this PR. If I skip both, I accept that this change may face delays in review, may not be reviewed at all, or may be closed.
CONTRIBUTING.md.

Summary by cubic

Speeds up the drawing pipeline in `drawable.c` with NEON-accelerated paths, a fast `GCF_COPY` RGB blit, and a byte-table R↔B swap in bitmap export. Reduces per-pixel work, favors 32-bit stores, and avoids heap allocs for typical widths.

- `copyAreaOp`: fast path for `GCF_COPY` (RGB only); hoisted row bases.
- `fillRect`/`drawLine`: prebuilt 32-bit row buffer; stack buffer up to 4096 px; choose H/V path once; vertical uses 32-bit stores.
- `drawAlphaMaskedBitmap`: NEON vector mask/src checks and select; opaque fore/back colors; scalar fallback.
- `Pixmap_toBitmap`: NEON byte-table shuffle for R↔B when no mask; scalar tail and mask path unchanged.

Written for commit 874b6f5. Summary will update on new commits.
Summary by CodeRabbit