fix(): Updated Drawable based on Ludashi 3.0 improvements by phobos665 · Pull Request #1382 · utkarshdalal/GameNative

phobos665 · 2026-05-06T10:08:04Z

Description

This is a change to our Drawable.c to bring in some modest improvements in the drawing layer (E.g. UI, text etc).

Here's the general overview:

Use ARM NEON to process 4 bytes at a time instead of iterating 1 pixel at a time. It should help reduce the amount of processing time overall.
Check if a GCFunction is GC_COPY and if so, just copy over the pixels without doing any unncessary processing as we're not required to do anything here. GC_COPY doesn't expect anything back.
Apply Ludashi changes so that we can write horizontally and do memcopy (which reduces time) without having to do each pixel individually.
Rely on NEON for bitmap export. Basically we use a byte-shuffle table which allows us to be deterministic about the channel-swapping, reducing the need for operations to be carried out on those pixels.

Recording

Type of Change

Bug fix
Performance / stability improvement
Compatibility improvements
Other (requires prior approval)

Checklist

If I have access to #code-changes, I have discussed this change there and it has been green-lighted. If I do not have access, I have still provided clear context in this PR. If I skip both, I accept that this change may face delays in review, may not be reviewed at all, or may be closed.
This change aligns with the current project scope (core functionality, stability, or performance). If not, it has been explicitly approved beforehand.
I have attached a recording of the change.
I have read and agree to the contribution guidelines in CONTRIBUTING.md.

Summary by cubic

Speeds up the drawing pipeline in drawable.c with NEON-accelerated paths, a fast GCF_COPY RGB blit, and a byte-table R↔B swap in bitmap export. Reduces per-pixel work, favors 32-bit stores, and avoids heap allocs for typical widths.

Refactors
- 1bpp unpack: process 8 px/byte and handle remainder bits (X11 padding).
- copyAreaOp: fast path for GCF_COPY (RGB only); hoisted row bases.
- fillRect/drawLine: prebuilt 32-bit row buffer; stack buffer up to 4096 px; choose H/V path once; vertical uses 32-bit stores.
- drawAlphaMaskedBitmap: NEON vector mask/src checks and select; opaque fore/back colors; scalar fallback.
- Pixmap_toBitmap: NEON byte-table shuffle for R↔B when no mask; scalar tail and mask path unchanged.

^{Written for commit 874b6f5. Summary will update on new commits.}

Summary by CodeRabbit

Performance
- Faster bitmap, pixmap and alpha-masked rendering for smoother graphics and reduced CPU usage.
- Platform-optimized vector acceleration where available for significant speedups on supported devices.
- Reduced heap allocations and improved memory handling for common drawing sizes, lowering latency and improving responsiveness on resource-limited devices.

…de by ludashi regarding smart usage of NEON and checks before attempting to transfer.

coderabbitai · 2026-05-06T10:08:42Z

📝 Walkthrough

Walkthrough

Adds optional ARM NEON support and refactors several drawable routines to use per-row unpacking, hoisted row buffers (stack-first, heap-fallback), an RGB-only copy fast-path, NEON-accelerated masked/unmasked bitmap routines, and scalar fallbacks.

Changes

SIMD-Accelerated Drawable Rendering

Layer / File(s)	Summary
SIMD header inclusion `app/src/main/cpp/winlator/drawable.c`	Conditional `#include <arm_neon.h>` guarded by `__ARM_NEON__`.
Bitmap unpack (drawBitmap) `app/src/main/cpp/winlator/drawable.c`	Replace per-pixel `getBit` with per-row expansion: iterate full 8-pixel source bytes then handle remainder bits.
Copy-area fast-path (GCF_COPY) `app/src/main/cpp/winlator/drawable.c`	Add `GCF_COPY` path that copies RGB bytes per-row with hoisted pointers and returns early; fallback uses per-pixel `setPixelOp`.
Fill rectangle row-buffer `app/src/main/cpp/winlator/drawable.c`	Use stack-backed row buffer for common widths, malloc only when width exceeds capacity; fill a packed 32-bit row then memcpy to each destination row.
Draw line with row buffer `app/src/main/cpp/winlator/drawable.c`	Precompute horizontal/vertical dominance, use stack-or-heap row buffer, memcpy full rows for dominant direction and pixel writes for the other; free heap buffer only when allocated.
Alpha-masked bitmap (NEON fallback) `app/src/main/cpp/winlator/drawable.c`	Switch drawAlphaMaskedBitmap to `uint32_t` buffers; add `__ARM_NEON` vectorized implementation processing multiple pixels and provide scalar fallback.
Pixmap_toBitmap NEON swap `app/src/main/cpp/winlator/drawable.c`	Replace no-mask conversion with NEON byte-shuffle R/B swap and scalar cleanup for trailing pixels; masked path and non-NEON builds remain scalar.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~50 minutes

Poem

🐰 Quick paws across the pixel stream,
NEON hums and bytes all gleam,
Stack buffers leap, mallocs stay light,
Eight bits stride in tidy flight,
Rabbity render — fast and bright!

🚥 Pre-merge checks | ✅ 3 | ❌ 2

❌ Failed checks (1 warning, 1 inconclusive)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.
Title check	❓ Inconclusive	The title references 'Ludashi 3.0 improvements' but fails to specify the actual changes made to drawable.c—it describes an external source rather than the specific improvements being introduced.	Consider a more descriptive title that highlights the primary optimization (e.g., 'Add ARM NEON optimizations to drawable rendering') instead of relying on an external reference.

✅ Passed checks (3 passed)

Check name	Status	Explanation
Description check	✅ Passed	The description includes most required sections: clear overview of changes, type of change selected, and checklist items. However, the recording checkbox is unchecked despite the template requiring it, and the checklist items remain unchecked.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests

Tip

💬 Introducing Slack Agent: The best way for teams to turn conversations into code.

Slack Agent is built on CodeRabbit's deep understanding of your code, so your team can collaborate across the entire SDLC without losing context.

Generate code and open pull requests
Plan features and break down work
Investigate incidents and troubleshoot customer tickets together
Automate recurring tasks and respond to alerts with triggers
Summarize progress and report instantly

Built for teams:

Shared memory across your entire org—no repeating context
Per-thread sandboxes to safely plan and execute work
Governance built-in—scoped access, auditability, and budget controls

One agent for your entire SDLC. Right inside Slack.

👉 Get started

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

app/src/main/cpp/winlator/drawable.c (1)
347-366: ⚠️ Potential issue | 🔴 Critical | 🏗️ Heavy lift

Critical: fromBitmap does not match the Drawable's pixel format and ignores stride/format.

Several correctness gaps in this new JNI function:

Missing R↔B channel swap. The Drawable buffer stores pixels as BGRA bytes (per unpackColor at lines 24-29: rgba[0]=B, rgba[1]=G, rgba[2]=R, rgba[3]=A), and Java_com_winlator_xserver_Pixmap_toBitmap (lines 391-435) explicitly swaps R↔B when emitting an Android ARGB_8888 bitmap (which is RGBA in memory). fromBitmap is the inverse direction and must perform the same R↔B swap; the current memcpy will produce a buffer with red and blue swapped, so any subsequent toBitmap call will return colors that don't match the source bitmap.

Stride is ignored. AndroidBitmapInfo.stride may be larger than info.width * 4 (row padding). The plain memcpy of width * height * 4 will read past the end of valid pixels on the last row(s) and copy padding bytes into pixel positions on intermediate rows. Copy must be row-by-row using info.stride.

No format check. If the bitmap is RGB_565, RGBA_F16, ALPHA_8, etc., the resulting buffer is garbage. Validate info.format == ANDROID_BITMAP_FORMAT_RGBA_8888 and bail out otherwise.

No destination capacity check. Should validate GetDirectBufferCapacity(env, data) >= info.width * info.height * 4 before writing.

The leftover /* replace the whole JNI body */ comment at line 347 should be removed before merge.
🔒️ Sketch of a corrected implementation
-/* replace the whole JNI body */
 JNIEXPORT void JNICALL
 Java_com_winlator_xserver_Drawable_fromBitmap(JNIEnv *env, jclass obj,
         jobject bitmap, jobject data) {
     uint8_t *dst = (*env)->GetDirectBufferAddress(env, data);
-    if (!dst) {
+    jlong dstCap = (*env)->GetDirectBufferCapacity(env, data);
+    if (!dst || dstCap < 0) {
         printf("Error: NULL buffer address in fromBitmap\n");
         return;
     }
 
     AndroidBitmapInfo info;
     uint8_t *pixels;
 
     if (AndroidBitmap_getInfo(env, bitmap, &info) < 0) return;
+    if (info.format != ANDROID_BITMAP_FORMAT_RGBA_8888) {
+        printf("Error: unsupported bitmap format %u in fromBitmap\n", info.format);
+        return;
+    }
+    size_t needed = (size_t)info.width * info.height * 4;
+    if ((size_t)dstCap < needed) {
+        printf("Error: destination buffer too small in fromBitmap\n");
+        return;
+    }
     if (AndroidBitmap_lockPixels(env, bitmap, (void **)&pixels) < 0) return;
 
-    memcpy(dst, pixels, (size_t)info.width * info.height * 4);
+    /* Per row: copy info.width pixels, swapping R↔B to match Drawable's BGRA layout. */
+    for (uint32_t y = 0; y < info.height; y++) {
+        const uint8_t *srcRow = pixels + (size_t)y * info.stride;
+        uint8_t *dstRow = dst + (size_t)y * info.width * 4;
+        for (uint32_t x = 0; x < info.width; x++) {
+            dstRow[x*4 + 0] = srcRow[x*4 + 2]; /* B <- R */
+            dstRow[x*4 + 1] = srcRow[x*4 + 1]; /* G */
+            dstRow[x*4 + 2] = srcRow[x*4 + 0]; /* R <- B */
+            dstRow[x*4 + 3] = srcRow[x*4 + 3]; /* A */
+        }
+    }
 
     AndroidBitmap_unlockPixels(env, bitmap);
 }
The shuffle table from toBitmap (lines 404-409) can be reused here for a NEON-accelerated version.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@app/src/main/cpp/winlator/drawable.c` around lines 347 - 366, The JNI
function Java_com_winlator_xserver_Drawable_fromBitmap currently does a raw
memcpy and must be fixed: validate AndroidBitmapInfo.format ==
ANDROID_BITMAP_FORMAT_RGBA_8888, check the destination buffer capacity via
GetDirectBufferCapacity(env, data) is at least info.width * info.height * 4,
read pixels with AndroidBitmap_getInfo and AndroidBitmap_lockPixels as before
but copy row-by-row using info.stride (not a single memcpy) into the direct
buffer, and while copying swap R↔B per-pixel (invert channels to match
Drawable's BGRA layout); finally call AndroidBitmap_unlockPixels and remove the
leftover comment "/* replace the whole JNI body */". Ensure all early returns
log or handle errors appropriately.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Outside diff comments:
In `@app/src/main/cpp/winlator/drawable.c`:
- Around line 347-366: The JNI function
Java_com_winlator_xserver_Drawable_fromBitmap currently does a raw memcpy and
must be fixed: validate AndroidBitmapInfo.format ==
ANDROID_BITMAP_FORMAT_RGBA_8888, check the destination buffer capacity via
GetDirectBufferCapacity(env, data) is at least info.width * info.height * 4,
read pixels with AndroidBitmap_getInfo and AndroidBitmap_lockPixels as before
but copy row-by-row using info.stride (not a single memcpy) into the direct
buffer, and while copying swap R↔B per-pixel (invert channels to match
Drawable's BGRA layout); finally call AndroidBitmap_unlockPixels and remove the
leftover comment "/* replace the whole JNI body */". Ensure all early returns
log or handle errors appropriately.

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 66eaa17d-fc7d-4c81-a3f4-d1a8fed42f01

📥 Commits

Reviewing files that changed from the base of the PR and between b6baf92 and 428f1e8.

📒 Files selected for processing (1)

app/src/main/cpp/winlator/drawable.c

cubic-dev-ai

1 issue found across 1 file

Prompt for AI agents (unresolved issues)


Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name="app/src/main/cpp/winlator/drawable.c">

<violation number="1" location="app/src/main/cpp/winlator/drawable.c:270">
P2: Thick-line rendering is now angle-dependent because the two branches draw different pixel footprints.</violation>
</file>

_{Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.}

cubic-dev-ai · 2026-05-06T10:18:07Z

@@ -7,6 +7,10 @@
 #include <android/bitmap.h>


P2: Thick-line rendering is now angle-dependent because the two branches draw different pixel footprints.

Prompt for AI agents

Check if this issue is valid — if so, understand the root cause and fix it. At app/src/main/cpp/winlator/drawable.c, line 270: <comment>Thick-line rendering is now angle-dependent because the two branches draw different pixel footprints.</comment> <file context> @@ -203,19 +246,37 @@ Java_com_winlator_xserver_Drawable_drawLine(JNIEnv *env, jclass obj, jshort x0, while (true) { - for (int16_t i = 0; i < lineWidth; i++) { - memcpy(dataAddr + (x0 + (i + y0) * stride) * 4, row, rowSize); + if (isHorizontal) { + // Horizontal-ish: write a full row of pixels at once + for (int16_t i = 0; i < lineWidth; i++) { </file context>

lvonasek · 2026-05-06T10:54:46Z

Some of the changes look for me slower than the previous version (especially when replacing memcpy with something else).

Note that Ludashi 3.0 has quite a few new problems, it might be worth waiting until it is polished before taking his changes.

phobos665 · 2026-05-06T13:25:37Z

Some of the changes look for me slower than the previous version (especially when replacing memcpy with something else).

MIght grab you to chat through which parts are worth pulling out. I was seeing a decent improvement, although if there's parts we can adjust then let's chat

lvonasek

I left a few comments. But I might be wrong.

lvonasek · 2026-05-06T13:43:16Z

+                int j = (x + dstX + (y + dstY) * dstStride) * 4;
+                dstDataAddr[j+0] = srcDataAddr[i+0];
+                dstDataAddr[j+1] = srcDataAddr[i+1];
+                dstDataAddr[j+2] = srcDataAddr[i+2];


This would be better with memcpy

…improve draw performance

…nto fix/drawable-ludashi-improvements

cubic-dev-ai

1 issue found across 1 file

Prompt for AI agents (unresolved issues)


Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name="app/src/main/cpp/winlator/drawable.c">

<violation number="1" location="app/src/main/cpp/winlator/drawable.c:219">
P2: Writing `uint32_t` pixels through a casted `uint8_t` stack buffer can be misaligned and violates aliasing/alignment guarantees; use a properly aligned `uint32_t` buffer instead.</violation>
</file>

_{Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.}

cubic-dev-ai · 2026-05-09T11:26:09Z

@@ -7,6 +7,10 @@
 #include <android/bitmap.h>


P2: Writing uint32_t pixels through a casted uint8_t stack buffer can be misaligned and violates aliasing/alignment guarantees; use a properly aligned uint32_t buffer instead.

Prompt for AI agents

Check if this issue is valid — if so, understand the root cause and fix it. At app/src/main/cpp/winlator/drawable.c, line 219: <comment>Writing `uint32_t` pixels through a casted `uint8_t` stack buffer can be misaligned and violates aliasing/alignment guarantees; use a properly aligned `uint32_t` buffer instead.</comment> <file context> @@ -168,18 +203,26 @@ Java_com_winlator_xserver_Drawable_fillRect(JNIEnv *env, jclass obj, jshort x, j - for (int i = 0; i < rowSize; i += 4) memcpy(row + i, rgba, 4); + uint32_t color32 = ((uint32_t)rgba[3] << 24) | ((uint32_t)rgba[2] << 16) | ((uint32_t)rgba[1] << 8) | rgba[0]; + uint32_t *row32 = (uint32_t *)(void *)row; // (void*) cast silences strict-aliasing warning + for (int i = 0; i < width; i++) row32[i] = color32; for (int16_t i = 0; i < height; i++) { </file context>

coderabbitai

Actionable comments posted: 2

🧹 Nitpick comments (1)

app/src/main/cpp/winlator/drawable.c (1)
161-172: ⚡ Quick win

GCF_COPY fast path: prefer memcpy over per-pixel 3-byte copies.

The "fast path" still costs three byte loads + three byte stores per pixel and writes one byte at a time. The neighboring copyArea (line 130) treats the same buffer format as a contiguous 32-bit blit via memcpy, so dst alpha clearly isn't sacred in this codebase. Replacing the inner loop with a per-row memcpy of width * 4 bytes is dramatically faster (single libc call per row, lets the compiler/CPU use wide loads/stores) and matches the existing copyArea pattern. This also addresses the recurring "would be better with memcpy" feedback.
♻️ Proposed refactor
     // Fast path: GCF_COPY is plain pixel blitting — copy only RGB bytes to match
     if (gcFunction == GCF_COPY) {
+        const size_t rowBytes = (size_t)width * 4;
         for (int16_t y = 0; y < height; y++) {
-            // Hoist row base pointers: eliminates a multiply per pixel in the inner loop
-            uint8_t *s = srcDataAddr + (srcX + (y + srcY) * srcStride) * 4;
-            uint8_t *d = dstDataAddr + (dstX + (y + dstY) * dstStride) * 4;
-            for (int16_t x = 0; x < width; x++, s += 4, d += 4) {
-                d[0] = s[0]; d[1] = s[1]; d[2] = s[2];
-                /* byte 3 (alpha) intentionally not copied */
-            }
+            memcpy(dstDataAddr + (dstX + (y + dstY) * dstStride) * 4,
+                   srcDataAddr + (srcX + (y + srcY) * srcStride) * 4,
+                   rowBytes);
         }
         return;
     }
If destination alpha must be preserved (worth confirming against the X11 semantics this implements), keep the byte-wise variant but at least batch into 32-bit reads and mask the alpha byte instead of three discrete byte ops.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@app/src/main/cpp/winlator/drawable.c` around lines 161 - 172, The GCF_COPY
fast path in the gcFunction == GCF_COPY block currently copies pixels
byte-by-byte; replace the inner per-pixel 3-byte copies with a per-row memcpy:
compute the source row pointer (using srcDataAddr, srcX, srcY, srcStride) and
destination row pointer (dstDataAddr, dstX, dstY, dstStride) and call memcpy(d,
s, width * 4) for each y to match the existing copyArea approach and leverage
wide loads/stores; if destination alpha truly must be preserved, instead batch
32-bit reads/writes and mask out the alpha byte rather than performing three
separate byte loads/stores.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@app/src/main/cpp/winlator/drawable.c`:
- Around line 396-416: The NEON block uses the AArch64-only intrinsic vqtbl1q_u8
under the existing `#ifdef` __ARM_NEON, which breaks 32-bit ARM builds; update the
preprocessor guard to require AArch64 (e.g., check __aarch64__ in addition to
__ARM_NEON) around the vqtbl1q_u8 path (the static shuffle, vShuffle,
vld1q_u8/vqtbl1q_u8/vst1q_u8 loop) so armeabi-v7a falls back to the existing
scalar/#else branch, or replace with an ARMv7-compatible implementation using
vtbl1_u8 on uint8x8_t halves if you want NEON on 32-bit.
- Around line 269-280: The vertical branch in the drawing loop (conditioned on
isHorizontal) writes only a lineWidth×1 strip while the horizontal branch paints
a lineWidth×lineWidth square, causing asymmetric brush shapes for lineWidth>1;
update the vertical-ish branch (the loop that uses ((uint32_t *)dataAddr)[(x0 +
i) + y0 * stride] = color32) to paint the same square brush as the horizontal
branch (use the same memcpy-based row writes or an equivalent nested loop
writing rowSize bytes for each of lineWidth rows at each Bresenham step using
dataAddr, x0, y0, stride, row, rowSize and color32) so both branches produce
identical lineWidth×lineWidth coverage per step.

---

Nitpick comments:
In `@app/src/main/cpp/winlator/drawable.c`:
- Around line 161-172: The GCF_COPY fast path in the gcFunction == GCF_COPY
block currently copies pixels byte-by-byte; replace the inner per-pixel 3-byte
copies with a per-row memcpy: compute the source row pointer (using srcDataAddr,
srcX, srcY, srcStride) and destination row pointer (dstDataAddr, dstX, dstY,
dstStride) and call memcpy(d, s, width * 4) for each y to match the existing
copyArea approach and leverage wide loads/stores; if destination alpha truly
must be preserved, instead batch 32-bit reads/writes and mask out the alpha byte
rather than performing three separate byte loads/stores.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: ecc068d2-88f0-457c-9716-757b338632ba

📥 Commits

Reviewing files that changed from the base of the PR and between 428f1e8 and 874b6f5.

📒 Files selected for processing (1)

app/src/main/cpp/winlator/drawable.c

coderabbitai · 2026-05-09T11:27:05Z

    while (true) {
-        for (int16_t i = 0; i < lineWidth; i++) {
-            memcpy(dataAddr + (x0 + (i + y0) * stride) * 4, row, rowSize);
+        if (isHorizontal) {
+            // Horizontal-ish: write a full row of pixels at once
+            for (int16_t i = 0; i < lineWidth; i++) {
+                memcpy(dataAddr + (x0 + (i + y0) * stride) * 4, row, rowSize);
+            }
+        } else {
+            // Vertical-ish: write individual pixels
+            for (int16_t i = 0; i < lineWidth; i++) {
+                ((uint32_t *)dataAddr)[(x0 + i) + y0 * stride] = color32;
+            }
        }


⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash # Show prior drawLine implementation to confirm what brush shape was rendered before this PR. git log --all --oneline -- app/src/main/cpp/winlator/drawable.c | head -n 20 git show HEAD~1:app/src/main/cpp/winlator/drawable.c 2>/dev/null | sed -n '/Java_com_winlator_xserver_Drawable_drawLine/,/^}/p'

Repository: utkarshdalal/GameNative

Length of output: 458

🏁 Script executed:

#!/bin/bash # First, find and read the drawable.c file find . -name "drawable.c" -type f

Repository: utkarshdalal/GameNative

Length of output: 106

🏁 Script executed:

#!/bin/bash # Read the drawable.c file to see the full drawLine implementation around lines 269-280 wc -l app/src/main/cpp/winlator/drawable.c sed -n '240,300p' app/src/main/cpp/winlator/drawable.c

Repository: utkarshdalal/GameNative

Length of output: 2059

🏁 Script executed:

#!/bin/bash # Try to get the git history to find previous versions cd app/src/main/cpp/winlator && git log --oneline drawable.c | head -20

Repository: utkarshdalal/GameNative

Length of output: 183

🏁 Script executed:

#!/bin/bash # Show the merge commit details and parents cd app/src/main/cpp/winlator && git show 874b6f53 --stat | head -40

Repository: utkarshdalal/GameNative

Length of output: 2520

🏁 Script executed:

#!/bin/bash # Get the prior version before the merge cd app/src/main/cpp/winlator && git show 874b6f53^:drawable.c 2>/dev/null | sed -n '240,300p'

Repository: utkarshdalal/GameNative

Length of output: 49

🏁 Script executed:

#!/bin/bash # Also try the other parent cd app/src/main/cpp/winlator && git show 874b6f53^2:drawable.c 2>/dev/null | sed -n '240,300p'

Repository: utkarshdalal/GameNative

Length of output: 49

🏁 Script executed:

#!/bin/bash # Show git diff for this file in the merge to see what changed cd app/src/main/cpp/winlator && git show 874b6f53 -- drawable.c | grep -A 50 -B 10 "isHorizontal" | head -100

Repository: utkarshdalal/GameNative

Length of output: 2678

🏁 Script executed:

#!/bin/bash # Check what branches exist to understand the structure better git branch -a | head -20

Repository: utkarshdalal/GameNative

Length of output: 171

🏁 Script executed:

#!/bin/bash # Try to find commits that specifically changed drawLine cd app/src/main/cpp/winlator && git log --all --oneline -S "isHorizontal" -- drawable.c

Repository: utkarshdalal/GameNative

Length of output: 183

🏁 Script executed:

#!/bin/bash # Check if there's a prior version of the file on master branch git show master:app/src/main/cpp/winlator/drawable.c 2>&1 | grep -A 20 "drawLine" | head -40

Repository: utkarshdalal/GameNative

Length of output: 913

🏁 Script executed:

#!/bin/bash # Get full history of commits before the merge git log --oneline --all | head -20

Repository: utkarshdalal/GameNative

Length of output: 1386

🏁 Script executed:

#!/bin/bash # Try to see the actual state on master before this merge git show master:app/src/main/cpp/winlator/drawable.c 2>&1 | wc -l

Repository: utkarshdalal/GameNative

Length of output: 71

🏁 Script executed:

#!/bin/bash # Get the full drawLine function from master git show master:app/src/main/cpp/winlator/drawable.c 2>&1 | sed -n '/Java_com_winlator_xserver_Drawable_drawLine/,/^}/p'

Repository: utkarshdalal/GameNative

Length of output: 1351

Inconsistent brush shape between horizontal and vertical branches when lineWidth > 1 is a regression.

The prior implementation painted a lineWidth × lineWidth square at every Bresenham step for all lines. The current code breaks this:

Horizontal-ish: Unchanged — still paints lineWidth × lineWidth square (memcpy loop over lineWidth iterations, each writing rowSize bytes).

Vertical-ish: New asymmetry — paints only lineWidth × 1 horizontal strip (single-pixel-height row at y0).

For lineWidth == 1 both are identical, but for thicker lines this creates asymmetric rendering: vertical-ish lines render with reduced thickness compared to horizontal-ish lines, breaking the prior consistent behavior.

Either restore uniform square brushes (apply the memcpy approach to both branches), or implement symmetric perpendicular strips (write a column of lineWidth pixels for horizontal, row for vertical). The current mixed approach is a regression.

Symmetric perpendicular brush fix

if (isHorizontal) { - // Horizontal-ish: write a full row of pixels at once + // Horizontal-ish: thicken vertically — column of lineWidth pixels at x0. for (int16_t i = 0; i < lineWidth; i++) { - memcpy(dataAddr + (x0 + (i + y0) * stride) * 4, row, rowSize); + ((uint32_t *)dataAddr)[x0 + (y0 + i) * stride] = color32; } } else { // Vertical-ish: write individual pixels for (int16_t i = 0; i < lineWidth; i++) { ((uint32_t *)dataAddr)[(x0 + i) + y0 * stride] = color32; } }

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@app/src/main/cpp/winlator/drawable.c` around lines 269 - 280, The vertical branch in the drawing loop (conditioned on isHorizontal) writes only a lineWidth×1 strip while the horizontal branch paints a lineWidth×lineWidth square, causing asymmetric brush shapes for lineWidth>1; update the vertical-ish branch (the loop that uses ((uint32_t *)dataAddr)[(x0 + i) + y0 * stride] = color32) to paint the same square brush as the horizontal branch (use the same memcpy-based row writes or an equivalent nested loop writing rowSize bytes for each of lineWidth rows at each Bresenham step using dataAddr, x0, y0, stride, row, rowSize and color32) so both branches produce identical lineWidth×lineWidth coverage per step.

coderabbitai · 2026-05-09T11:27:05Z

+#ifdef __ARM_NEON
+    if (!maskDataAddr) {
+        /* Fast path: no mask — swap R and B channels across 4 pixels at a time.
+         * src layout per pixel: [R, G, B, A]
+         * dst layout per pixel: [B, G, R, A]
+         * vrev32q_u8 reverses the 4 bytes within each 32-bit pixel: RGBA → ABGR,
+         * which maps R→B and B→R with G and A landing in wrong positions.
+         * Instead we use vtbl (byte table lookup) to do an exact per-byte shuffle. */
+        static const uint8_t shuffle[16] = {
+            2, 1, 0, 3,   /* pixel 0: swap bytes 0 and 2 (R↔B), keep 1 (G) and 3 (A) */
+            6, 5, 4, 7,   /* pixel 1 */
+            10, 9, 8, 11, /* pixel 2 */
+            14, 13, 12, 15 /* pixel 3 */
+        };
+        uint8x16_t vShuffle = vld1q_u8(shuffle);
+        int i = 0;
+        for (; i + 3 < size; i += 4) {
+            uint8x16_t vSrc = vld1q_u8(src + i * 4);
+            uint8x16_t vDst = vqtbl1q_u8(vSrc, vShuffle);
+            vst1q_u8(pixels + i * 4, vDst);
+        }


⚠️ Potential issue | 🔴 Critical | ⚡ Quick win

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash # Confirm whether the project targets armeabi-v7a (in which case __ARM_NEON without __aarch64__ is reachable). fd -t f -e gradle -e kts | xargs rg -n -C2 'abiFilters|ndk\s*\{|armeabi-v7a|arm64-v8a' 2>/dev/null fd -i 'application\.mk|android\.mk|cmakelists\.txt' | xargs rg -n -C2 'APP_ABI|ANDROID_ABI|armeabi|arm64' 2>/dev/null

Repository: utkarshdalal/GameNative

Length of output: 315

vqtbl1q_u8 is AArch64-only — guard must include __aarch64__.

The project explicitly targets both arm64-v8a and armeabi-v7a (as seen in app/build.gradle.kts:76). The code uses #ifdef __ARM_NEON, which is defined for both ARMv7-A and AArch64, but vqtbl1q_u8 is an A64-only intrinsic and does not exist on ARMv7-A NEON. This will cause a compilation failure on 32-bit ARM builds.

Change the guard to:

-#ifdef __ARM_NEON +#if defined(__ARM_NEON) && defined(__aarch64__)

The scalar #else branch already handles both cases, so 32-bit ARM will fall back to the portable path unchanged. Alternatively, if you need NEON vectorization on ARMv7-A, use vtbl1_u8 on uint8x8_t halves instead.

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

#ifdef __ARM_NEON

if (!maskDataAddr) {

/* Fast path: no mask — swap R and B channels across 4 pixels at a time.

* src layout per pixel: [R, G, B, A]

* dst layout per pixel: [B, G, R, A]

* vrev32q_u8 reverses the 4 bytes within each 32-bit pixel: RGBA → ABGR,

* which maps R→B and B→R with G and A landing in wrong positions.

* Instead we use vtbl (byte table lookup) to do an exact per-byte shuffle. */

static const uint8_t shuffle[16] = {

2, 1, 0, 3, /* pixel 0: swap bytes 0 and 2 (R↔B), keep 1 (G) and 3 (A) */

6, 5, 4, 7, /* pixel 1 */

10, 9, 8, 11, /* pixel 2 */

14, 13, 12, 15 /* pixel 3 */

};

uint8x16_t vShuffle = vld1q_u8(shuffle);

int i = 0;

for (; i + 3 < size; i += 4) {

uint8x16_t vSrc = vld1q_u8(src + i * 4);

uint8x16_t vDst = vqtbl1q_u8(vSrc, vShuffle);

vst1q_u8(pixels + i * 4, vDst);

}

`#if` defined(__ARM_NEON) && defined(__aarch64__)

if (!maskDataAddr) {

/* Fast path: no mask — swap R and B channels across 4 pixels at a time.

* src layout per pixel: [R, G, B, A]

* dst layout per pixel: [B, G, R, A]

* vrev32q_u8 reverses the 4 bytes within each 32-bit pixel: RGBA → ABGR,

* which maps R→B and B→R with G and A landing in wrong positions.

* Instead we use vtbl (byte table lookup) to do an exact per-byte shuffle. */

static const uint8_t shuffle[16] = {

2, 1, 0, 3, /* pixel 0: swap bytes 0 and 2 (R↔B), keep 1 (G) and 3 (A) */

6, 5, 4, 7, /* pixel 1 */

10, 9, 8, 11, /* pixel 2 */

14, 13, 12, 15 /* pixel 3 */

};

uint8x16_t vShuffle = vld1q_u8(shuffle);

int i = 0;

for (; i + 3 < size; i += 4) {

uint8x16_t vSrc = vld1q_u8(src + i * 4);

uint8x16_t vDst = vqtbl1q_u8(vSrc, vShuffle);

vst1q_u8(pixels + i * 4, vDst);

}

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@app/src/main/cpp/winlator/drawable.c` around lines 396 - 416, The NEON block uses the AArch64-only intrinsic vqtbl1q_u8 under the existing `#ifdef` __ARM_NEON, which breaks 32-bit ARM builds; update the preprocessor guard to require AArch64 (e.g., check __aarch64__ in addition to __ARM_NEON) around the vqtbl1q_u8 path (the static shuffle, vShuffle, vld1q_u8/vqtbl1q_u8/vst1q_u8 loop) so armeabi-v7a falls back to the existing scalar/#else branch, or replace with an ARMv7-compatible implementation using vtbl1_u8 on uint8x8_t halves if you want NEON on 32-bit.

fix(): Update drawable to be more performant based on some changes ma…

428f1e8

…de by ludashi regarding smart usage of NEON and checks before attempting to transfer.

phobos665 requested a review from utkarshdalal as a code owner May 6, 2026 10:08

coderabbitai Bot reviewed May 6, 2026

View reviewed changes

cubic-dev-ai Bot reviewed May 6, 2026

View reviewed changes

lvonasek reviewed May 6, 2026

View reviewed changes

phobos665 marked this pull request as draft May 6, 2026 14:00

TripleJ160 reviewed May 6, 2026

View reviewed changes

Comment thread app/src/main/cpp/winlator/drawable.c

phobos665 added 2 commits May 7, 2026 21:27

fix(): add the voidcast on row fill and made some new adjustments to …

adf85e9

…improve draw performance

Merge branch 'master' of https://github.com/utkarshdalal/GameNative i…

874b6f5

…nto fix/drawable-ludashi-improvements

phobos665 marked this pull request as ready for review May 9, 2026 11:19

cubic-dev-ai Bot reviewed May 9, 2026

View reviewed changes

coderabbitai Bot reviewed May 9, 2026

View reviewed changes

phobos665 marked this pull request as draft May 10, 2026 11:48

Uh oh!

Conversation

phobos665 commented May 6, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Description

Recording

Type of Change

Checklist

Summary by cubic

Summary by CodeRabbit

Uh oh!

coderabbitai Bot commented May 6, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Estimated code review effort

Poem

❌ Failed checks (1 warning, 1 inconclusive)

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

cubic-dev-ai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

cubic-dev-ai Bot May 6, 2026

Choose a reason for hiding this comment

Uh oh!

lvonasek commented May 6, 2026

Uh oh!

phobos665 commented May 6, 2026

Uh oh!

lvonasek left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

lvonasek May 6, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

cubic-dev-ai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

cubic-dev-ai Bot May 9, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot May 9, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot May 9, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

phobos665 commented May 6, 2026 •

edited by coderabbitai Bot

Loading

coderabbitai Bot commented May 6, 2026 •

edited

Loading