
ARM32/Thumb2: generated asm fixes#10725

Open
SparkiDev wants to merge 1 commit into wolfSSL:master from SparkiDev:aes_x25519_arm32_thumb2_fixes


@SparkiDev

Contributor

Description

Fix Thumb2 Curve25519 asm to do full reduce.
Change ARM32 to simpler carry/overflow processing. Minor optimizations: use ubfx, avoid moving a register into a temporary, and cache a value instead of loading it again later. Reduce the register pushes and pops in Thumb2 generated code. Fix Thumb2 generated code to emit immediate values less than 64 in decimal.
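For readers unfamiliar with what a "full reduce" means for Curve25519, the following is a minimal C sketch, not the code generated by this PR. It assumes a field element held in eight little-endian 32-bit limbs with a value below 2^256, and reduces it to the canonical range [0, p) with p = 2^255 - 19. The names `fe_add_small`, `fe_full_reduce`, and `fe_isnegative_model` are hypothetical and chosen only for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define FE_LIMBS 8 /* 8 x 32-bit little-endian limbs (assumed layout) */

/* Add a small constant to the limb array in place, rippling the carry. */
static void fe_add_small(uint32_t a[FE_LIMBS], uint32_t v)
{
    uint64_t c = v;
    int i;

    assert(v <= 38u); /* this sketch only ever adds 0 or 19 */
    for (i = 0; i < FE_LIMBS; i++) {
        c += a[i];
        a[i] = (uint32_t)c;
        c >>= 32;
    }
}

/* Reduce a value below 2^256 to the canonical range [0, 2^255 - 19). */
static void fe_full_reduce(uint32_t a[FE_LIMBS])
{
    uint32_t t[FE_LIMBS];
    uint32_t top, q;

    /* Step 1: fold bit 255 back in, since 2^255 is congruent to 19 mod p.
     * Afterwards a < 2^255 + 19, which is less than 2p. */
    top = a[7] >> 31;
    a[7] &= 0x7fffffffu;
    fe_add_small(a, 19u * top);

    /* Step 2: q = 1 exactly when a >= p, i.e. when a + 19 reaches bit 255. */
    memcpy(t, a, sizeof(t));
    fe_add_small(t, 19u);
    q = t[7] >> 31;

    /* Step 3: subtract q * p by adding 19 * q and dropping bit 255. */
    fe_add_small(a, 19u * q);
    a[7] &= 0x7fffffffu;
}

/* Low bit of the canonical value, computed on a copy of the input. */
static int fe_isnegative_model(const uint32_t a[FE_LIMBS])
{
    uint32_t t[FE_LIMBS];

    memcpy(t, a, sizeof(t));
    fe_full_reduce(t);
    return (int)(t[0] & 1u);
}
```

The sketch is branch-free with respect to the data. A partial reduction that skips steps 2 and 3 can leave a value in [p, 2^255 + 19), which is why a sign or byte-encoding step needs the full reduction first.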

Testing

ARM32 (armv7a) and Thumb2 (armv7m) assembly configurations in QEMU.

@SparkiDev SparkiDev self-assigned this Jun 18, 2026
@SparkiDev SparkiDev added the For This Release Release version 5.9.2 label Jun 18, 2026
@SparkiDev

Contributor Author

Code generated with PR:
https://github.com/wolfSSL/scripts/pull/590

@SparkiDev

Contributor Author

Jenkins: retest this please

@wolfSSL-Fenrir-bot wolfSSL-Fenrir-bot left a comment


Fenrir Automated Review — PR #10725

No scan targets match the changed files in this PR. Review skipped.

Copilot AI left a comment

Contributor


Pull request overview

This PR updates multiple hand-written/generated ARM32/Thumb2 assembly sources to improve correctness and simplify instruction sequences, primarily by adjusting reduction and carry handling and by making immediate operands more consistent.

Changes:

  • Normalize many immediate operands (hex → decimal, #0x0 → #0, etc.) across Thumb2 SHA2/SHA3/ChaCha/Poly1305/ML-KEM codegen outputs.
  • Update ARM32 Curve25519 field ops to a simpler underflow/overflow handling approach and adjust fe_isnegative to avoid a reload by caching the low-bit early.
  • Optimize AES/GCM assembly (including ubfx usage and arch<7 fallbacks) and remove a redundant register move in the AES key schedule path.
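As a rough illustration of the nibble-extraction change mentioned above, here is a C model of what a single `ubfx` (unsigned bit field extract) computes, alongside a two-shift sequence of the kind usable where `ubfx` is unavailable. This is a sketch only: the function names are hypothetical, and the actual fallback sequences emitted by the PR are not shown in this conversation.

```c
#include <assert.h>
#include <stdint.h>

/* Model of `ubfx rd, rn, #lsb, #width`: extract `width` bits of `x`
 * starting at bit `lsb`, zero-extended into the result. */
static uint32_t ubfx_model(uint32_t x, unsigned lsb, unsigned width)
{
    uint32_t mask;

    assert(width >= 1 && lsb < 32 && lsb + width <= 32);
    mask = (width == 32) ? 0xffffffffu : ((1u << width) - 1u);
    return (x >> lsb) & mask;
}

/* Assumed two-instruction equivalent: shift left to discard the bits
 * above the field, then logical shift right to discard those below. */
static uint32_t shift_pair_model(uint32_t x, unsigned lsb, unsigned width)
{
    assert(width >= 1 && lsb < 32 && lsb + width <= 32);
    return (x << (32u - lsb - width)) >> (32u - width);
}
```

For example, `ubfx_model(0xABCD1234u, 4, 4)` selects the second-lowest nibble, and both functions return the same value for every valid `lsb`/`width` pair.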

Reviewed changes

Copilot reviewed 18 out of 20 changed files in this pull request and generated no comments.

Show a summary per file

| File | Description |
| --- | --- |
| wolfcrypt/src/port/arm/thumb2-sha512-asm.S | Immediate-operand normalization in SHA-512 transform loop. |
| wolfcrypt/src/port/arm/thumb2-sha512-asm_c.c | Mirrors SHA-512 Thumb2 asm immediate changes in inline-asm C. |
| wolfcrypt/src/port/arm/thumb2-sha3-asm.S | Immediate-operand normalization in SHA-3 Thumb2 implementation. |
| wolfcrypt/src/port/arm/thumb2-sha3-asm_c.c | Mirrors SHA-3 Thumb2 asm immediate changes in inline-asm C. |
| wolfcrypt/src/port/arm/thumb2-sha256-asm.S | Immediate-operand normalization in SHA-256 Thumb2 implementation. |
| wolfcrypt/src/port/arm/thumb2-sha256-asm_c.c | Mirrors SHA-256 Thumb2 asm immediate changes in inline-asm C. |
| wolfcrypt/src/port/arm/thumb2-poly1305-asm.S | Immediate-operand normalization and small simplifications in Poly1305 Thumb2 asm. |
| wolfcrypt/src/port/arm/thumb2-poly1305-asm_c.c | Mirrors Poly1305 Thumb2 asm immediate changes in inline-asm C. |
| wolfcrypt/src/port/arm/thumb2-mlkem-asm.S | Immediate-operand normalization in ML-KEM Thumb2 asm loops. |
| wolfcrypt/src/port/arm/thumb2-mlkem-asm_c.c | Mirrors ML-KEM Thumb2 asm immediate changes in inline-asm C. |
| wolfcrypt/src/port/arm/thumb2-chacha-asm.S | Immediate-operand normalization in ChaCha Thumb2 asm. |
| wolfcrypt/src/port/arm/thumb2-chacha-asm_c.c | Mirrors ChaCha Thumb2 asm immediate changes in inline-asm C. |
| wolfcrypt/src/port/arm/thumb2-aes-asm.S | GCM nibble extraction simplification (ubfx) and immediate normalization; updates perf comment. |
| wolfcrypt/src/port/arm/armv8-32-curve25519.S | Simplifies underflow/overflow adjustment logic and tweaks fe_isnegative bit usage. |
| wolfcrypt/src/port/arm/armv8-32-curve25519_c.c | Mirrors Curve25519 asm changes; adds r12 clobber for updated fe_isnegative. |
| wolfcrypt/src/port/arm/armv8-32-aes-asm.S | Removes redundant move in AES key schedule; adds arch<7 fallback sequences for GCM nibble extraction. |
| wolfcrypt/src/port/arm/armv8-32-aes-asm_c.c | Mirrors AES/GCM asm changes in inline-asm C. |


@douzzer douzzer left a comment

Contributor


don't merge as-is -- see wolfssl/scripts#590

Fix Thumb2 Curve25519 asm to do full reduce.
Change ARM32 to simpler carry/overflow processing.
Minor optimizations - use ubfx, no need to move register into temporary, cache value instead of loading again later.
Reduce the register push and pops in Thumb2 generated code.
Fix Thumb2 to have values less than 64 in decimal.
@SparkiDev SparkiDev force-pushed the aes_x25519_arm32_thumb2_fixes branch from 8d2c23d to 9558b0d Compare June 18, 2026 07:52
@SparkiDev SparkiDev added Not For This Release Not for release 5.9.2 and removed For This Release Release version 5.9.2 labels Jun 18, 2026
