wolfCrypt on TI C2000 C28x (LAUNCHXL-F28P55X) by dgarske · Pull Request #10724 · wolfSSL/wolfssl

dgarske · 2026-06-18T00:15:18Z

wolfCrypt: support TI C2000 C28x (CHAR_BIT == 16) targets

What

Enables wolfCrypt on toolchains where a C char (and therefore a wolfCrypt byte) is wider than 8 bits, specifically the TI C2000 C28x DSP, where CHAR_BIT == 16: the smallest addressable unit is a 16-bit cell, int and short are 16-bit, and long is 32-bit. Validated on a TI LAUNCHXL-F28P55X (TMS320F28P550SJ) at 150 MHz: SHA-256/384/512 (plus SHA-512/224 and SHA-512/256), SHA-3, SHAKE128/256, ML-DSA-87 (verify, keygen, sign), and ECDSA + ECDH P-256 all pass on hardware. wolfcrypt_test passes on x86-64 with no regressions.

Why it's non-trivial

On a 16-bit-char target, a word32 occupies two 16-bit cells (two octets packed per cell), so sizeof(word32) == 2, while a byte[] holds one octet per cell. The common idioms therefore all break: aliasing a word as a byte stream ((byte*)&w, or XMEMCPY followed by ByteReverseWords), using sizeof as a byte count, relying on (byte)x to truncate to an octet, and using 8 * sizeof(x) for a bit width. There are also two cl2000 codegen quirks. First, (word32)octet << 24 is miscompiled as a 16-bit shift; the fix accumulates with <<= 8 instead. Second, the 32x32->32 q^-1 multiply in the ML-DSA Montgomery reduction is miscompiled. Split-testing on hardware pinned the failure to that one multiply, while the 64-bit widening multiply compiles correctly, so the fix computes the q^-1 product through the 64-bit path, which is also ~4% faster than the shift-based form.
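A minimal sketch of an octet-wise big-endian load, assuming each byte cell holds one octet in its low 8 bits (upper bits possibly set). The type octet_t and the functions load_be32 and load_be32_words are hypothetical illustrations, not wolfSSL's helpers: each cell is masked to 8 bits and the word is built with w <<= 8 rather than (word32)octet << 24, so nothing aliases a word as a byte array.

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical stand-in for wolfSSL's byte; on the C28x this would be a
 * 16-bit unsigned char holding one octet per cell. */
typedef unsigned char octet_t;

/* Load one big-endian 32-bit word from 4 consecutive octet cells.
 * Never aliases the word as a byte array, and masks each cell to 8 bits. */
static uint32_t load_be32(const octet_t* in)
{
    uint32_t w = 0;
    size_t i;
    for (i = 0; i < 4; i++) {
        w <<= 8;                          /* accumulate: no (word32)x << 24 */
        w |= (uint32_t)(in[i] & 0xFFu);   /* keep only the low octet */
    }
    return w;
}

/* Load n big-endian words, e.g. a SHA-256 message block with n = 16. */
static void load_be32_words(uint32_t* out, const octet_t* in, size_t n)
{
    size_t i;
    for (i = 0; i < n; i++) {
        out[i] = load_be32(in + 4 * i);
    }
}
```

Because the loop uses only masking and 8-bit shifts, the same source yields the same words whether CHAR_BIT is 8 or 16.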

Changes (4 commits, all gated / no-op on 8-bit-byte targets)

infra, hashes, DRBG - types.h auto-detects WOLFSSL_WIDE_BYTE (CHAR_BIT!=8 / TI C2000 toolchains), guarantees CHAR_BIT is defined, and adds the shared WC_OCTET() octet mask; wc_port.{h,c} widen the atomic init-state bitfield for 16-bit int; settings.h+sp_int.h allow SP math on a 16-bit-int CPU (WOLFSSL_SP_ALLOW_16BIT_CPU, 16-bit-char SP type detection); misc.c rotate bit-width via CHAR_BIT; coding.c base64 octet mask; sha256/sha512 octet-wise big-endian word I/O + CHAR_BIT*sizeof length carry; sha3.c octet-wise Keccak squeeze; random.c octet-portable Hash-DRBG length/counter serialization.
ML-DSA - decode integer-promotion fixes (a byte/word16 field promotes to unsigned 16-bit int, so 2 - field was unsigned and a negative coefficient zero-extended into sword32; cast the field to sword32); encode octet masks. Adds WOLFSSL_MLDSA_VERIFY_SMALLEST_MEM, which streams the signature's z vector one polynomial at a time instead of pinning the whole l-vector - cutting the ML-DSA-87 verify key by ~6 KB (with WOLFSSL_MLDSA_ASSIGN_KEY, ~10.7 KB total verify RAM).
test/bench/ci - brace-init SHA/SHAKE KAT vectors (a "\x.." string is sign-extended by a signed-16-bit-char compiler); WOLFSSL_NO_MALLOC benchmark buffers; and a hardware-free cl2000 compile-only CI guard (scripts/ti-c2000/ + .github/workflows/ti-c2000-compile.yml).
ML-DSA Montgomery - compute the q^-1 step of mldsa_mont_red() through the 32x64->64 widening multiply (MLDSA_MUL_QINV_WIDE64, auto-enabled for WC_16BIT_CPU) instead of the 32x32->32 low multiply cl2000 miscompiles; correct on any conforming compiler and ~4% faster than the shift-based form on the C28x.
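To illustrate the CHAR_BIT-based bit width and the octet mask mentioned above, here is a small sketch. BIT_WIDTH, OCTET and rotl32 are hypothetical names; the actual WC_OCTET() definition and the misc.c rotate helpers are not shown in this PR text, so their exact shape here is an assumption.

```c
#include <limits.h>
#include <stdint.h>

typedef uint32_t word32;

/* Hypothetical octet mask in the spirit of WC_OCTET(): idempotent when a
 * byte is already 8 bits, truncating when a byte cell is 16 bits. */
#define OCTET(x) ((x) & 0xFFu)

/* Bit width of a type counted in bits, not in 8-bit bytes. On the C28x,
 * sizeof(word32) == 2 and CHAR_BIT == 16, so this is still 32, whereas
 * 8 * sizeof(word32) would give 16. */
#define BIT_WIDTH(t) (CHAR_BIT * sizeof(t))

/* Rotate left by y bits, using the CHAR_BIT-based width. */
static word32 rotl32(word32 x, unsigned y)
{
    y &= (unsigned)(BIT_WIDTH(word32) - 1);  /* keep y in [0, 31] */
    if (y == 0)
        return x;                            /* avoid a shift by 32 (UB) */
    return (word32)((x << y) | (x >> (BIT_WIDTH(word32) - y)));
}
```

With CHAR_BIT == 8, BIT_WIDTH(word32) is 8 * 4 == 32, the same value as the original 8 * sizeof(x); with CHAR_BIT == 16 it is 16 * 2 == 32 as well.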

Algorithms validated on hardware (TI F28P55x @ 150 MHz)

SHA-256; SHA-384; SHA-512; SHA-512/224; SHA-512/256; SHA3-224/256/384/512; SHAKE128; SHAKE256; HMAC/Hash wrappers; SHA-256 Hash-DRBG; ML-DSA-87 verify, key generation and signing; ECDSA P-256 sign and verify; ECDH P-256 key agreement. (On this bare-metal, verify-only build without WOLFSSL_MEMORY, the wolfcrypt_test MEMORY, mutex, and full ML-DSA tests report the results expected for that configuration.)

Benchmarks (TI F28P55x @ 150 MHz, generic C)

Algorithm	Throughput
SHA-256	277 KiB/s
SHA-384 / SHA-512 / SHA-512-224 / SHA-512-256	~176 KiB/s
SHA3-224 / SHA3-256 / SHA3-384 / SHA3-512	158 / 149 / 115 / 81 KiB/s
SHAKE128 / SHAKE256	182 / 149 KiB/s
RNG (SHA-256 Hash-DRBG)	122 KiB/s (Init/Free ~97 ops/sec)
ML-DSA-87 verify	~305 ms/op (3.28 ops/sec)

SHAKE vs a reference C implementation (cycles to process 1 KB): SHAKE128 ~824 k cycles (reference: 1,195,069); SHAKE256 ~1.01 M cycles (reference: 1,360,788), roughly 26-31% fewer cycles. ML-DSA-87 verify RAM is ~10.7 KB total (struct ~8.7 KB + stack <2 KB, zero heap) with WOLFSSL_MLDSA_VERIFY_SMALLEST_MEM + WOLFSSL_MLDSA_ASSIGN_KEY, down from ~22 KB. The ~305 ms/op verify figure reflects two optimizations measured on hardware: the 64-bit-widened Montgomery q^-1 multiply described above (this PR; 317 -> 305 ms/op) and, in the companion example, running the Keccak permutation and the ML-DSA NTTs from RAM (example PR; 354 -> 317 ms/op).
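As a consistency check, assuming "1 KB" means 1 KiB and the 150 MHz core clock, the cycle counts convert to throughput and relative savings as follows (only numbers quoted in this PR are used):

```latex
\begin{aligned}
\text{SHAKE128: } & \frac{150 \times 10^{6}\ \text{cycles/s}}{824 \times 10^{3}\ \text{cycles/KiB}} \approx 182\ \text{KiB/s},
  & 1 - \frac{824\,000}{1\,195\,069} \approx 31\% \\
\text{SHAKE256: } & \frac{150 \times 10^{6}\ \text{cycles/s}}{1.01 \times 10^{6}\ \text{cycles/KiB}} \approx 149\ \text{KiB/s},
  & 1 - \frac{1\,010\,000}{1\,360\,788} \approx 26\% \\
\text{ML-DSA-87 verify: } & \frac{317 - 305}{317} \approx 3.8\%,
  & \frac{1}{0.305\ \text{s}} \approx 3.28\ \text{ops/s}
\end{aligned}
```

Both throughput values agree with the benchmark table (182 and 149 KiB/s), and the 26-31% range and the ~4% Montgomery improvement follow directly from the quoted cycle and timing figures.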

Notes

Every change is behind WOLFSSL_WIDE_BYTE / WC_16BIT_CPU / WC_SHA3_BYTEWISE / WOLFSSL_SP_ALLOW_16BIT_CPU / WOLFSSL_MLDSA_*, or is an idempotent octet mask (WC_OCTET), so 8-bit-byte builds are functionally unchanged (with CHAR_BIT == 8, the CHAR_BIT-based expressions are byte-for-byte identical to the originals). The bare-metal board example (BSP, linker, KATs, harness) lives in the companion PR wolfSSL/wolfssl-examples#576, under embedded/ti-c2000-f28p55x/, not in this PR. There is no public C28x instruction-set simulator, so the CI is compile-only; the on-target KATs run on a hardware-in-the-loop runner.

Test

x86-64: ./configure --enable-all and --enable-dilithium --enable-experimental; wolfcrypt_test (incl. ECC, ML-DSA) passes.
TI C28x: build embedded/ti-c2000-f28p55x from wolfssl-examples with make (the default verify+test build, plus SIGN=1 and ECC=1); all KATs and round-trips pass on the F28P55x.
Compile guard: CGT_ROOT=... scripts/ti-c2000/compile.sh.

Copilot

Pull request overview

This PR adds and CI-guards a bare-metal wolfCrypt port for TI C2000 C28x targets where CHAR_BIT == 16, introducing gated fixes so hashing, DRBG, ML-DSA verify, and SP-math ECC work correctly when a C “byte” is wider than 8 bits.

Changes:

Introduces WOLFSSL_NO_OCTET_BYTE detection and uses octet-wise load/store paths to avoid invalid byte/word aliasing on CHAR_BIT != 8 targets (SHA-256/512 family, SHA-3/SHAKE, Base64 CT decode, DRBG helpers, rotate helpers).
Adds “smallest memory” ML-DSA verify mode that streams z per polynomial to reduce pinned RAM in wc_MlDsaKey.
Adds TI C2000 compile-only guard scripts plus a GitHub Actions workflow that downloads the TI CGT and compiles a scoped subset.

Reviewed changes

Copilot reviewed 19 out of 19 changed files in this pull request and generated 2 comments.


File	Description
wolfssl/wolfcrypt/wc_port.h	Makes atomic arg type selection robust for 16-bit `int` by also checking `UINT_MAX`.
wolfssl/wolfcrypt/wc_mldsa.h	Adds `WOLFSSL_MLDSA_VERIFY_SMALLEST_MEM` struct layout variant for reduced verify RAM.
wolfssl/wolfcrypt/types.h	Adds `WOLFSSL_NO_OCTET_BYTE` auto-detection; adjusts `WC_16BIT_CPU` 64-bit availability behavior.
wolfssl/wolfcrypt/sp_int.h	Adds support for `unsigned char` being 16-bit (no native 8-bit type).
wolfssl/wolfcrypt/settings.h	Requires explicit opt-in for SP math on 16-bit-`int` CPUs via `WOLFSSL_SP_ALLOW_16BIT_CPU`.
wolfssl/wolfcrypt/dilithium.h	Adds smallest-mem verify gating and defaults slow Montgomery reduction macros on `WC_16BIT_CPU`.
wolfcrypt/test/test.c	Switches large-digest constants from C strings to `byte[]` to avoid `CHAR_BIT!=8` pitfalls.
wolfcrypt/src/wc_port.c	Fixes init-state static assert to use `CHAR_BIT` instead of hardcoded 8.
wolfcrypt/src/wc_mldsa.c	Adds octet-masking for packed bytes and fixes integer-promotion/sign issues on 16-bit `int`; adds streaming `z` verify path.
wolfcrypt/src/sha512.c	Adds octet-wise word load/store and corrects length carry/length placement for `CHAR_BIT!=8`.
wolfcrypt/src/sha3.c	Forces bytewise Keccak absorb/squeeze for `WOLFSSL_NO_OCTET_BYTE` and adds squeeze helper.
wolfcrypt/src/sha256.c	Adds octet-wise word load/store and corrects length carry/length placement for `CHAR_BIT!=8`.
wolfcrypt/src/random.c	Fixes DRBG serialization/addition helpers for non-8-bit “byte” targets.
wolfcrypt/src/misc.c	Fixes rotate helpers to use `CHAR_BIT`-based bit width when needed.
wolfcrypt/src/coding.c	Ensures Base64 CT decode returns `0xFF` for invalid chars even when `byte` is wider than 8 bits.
wolfcrypt/benchmark/benchmark.c	Adds static buffers for `WOLFSSL_NO_MALLOC` benchmarking and adjusts frees/allocations accordingly.
scripts/ti-c2000/user_settings.h	Adds minimal CI-only config for cl2000 compile-guard.
scripts/ti-c2000/compile.sh	Adds compile-only script to build a scoped source set with TI cl2000.
.github/workflows/ti-c2000-compile.yml	Adds CI workflow to download/cache TI CGT and run the compile-only guard.


… hashes, DRBG

Enables wolfCrypt on toolchains where a C byte/char is wider than 8 bits (e.g. TI C2000 C28x, CHAR_BIT == 16), all gated on WOLFSSL_WIDE_BYTE and a no-op on 8-bit-byte targets (the default fast paths are left exactly as-is):

- types.h: auto-set WOLFSSL_WIDE_BYTE for CHAR_BIT != 8 / known TI C2000 toolchains (and define CHAR_BIT = 16 when <limits.h> is absent); wc_port.h/.c widen the atomic init-state bitfield + CHAR_BIT static assert for 16-bit int.
- settings.h + sp_int.h: allow SP math on a 16-bit-int CPU via WOLFSSL_SP_ALLOW_16BIT_CPU, and detect a 16-bit char in the SP smallest-type selection.
- misc.c/misc.h: shared big-endian octet<->word helpers (WordsFromBytesBE32/64, BytesFromWordsBE32/64) for WOLFSSL_WIDE_BYTE, where a word cannot be aliased as an octet stream. They are CHAR_BIT-generic, cl2000-safe (loads accumulate with <<= 8, since (word)octet << 24 is miscompiled as a 16-bit shift), in-place safe for the SHA schedule, and store by octet count for partial digests. misc.c rotate width uses CHAR_BIT.
- coding.c: mask the constant-time base64 result to an octet.
- sha256.c/sha512.c: use the shared helpers for the schedule load and digest store, plus a CHAR_BIT*sizeof length carry; sha3.c: octet-wise Keccak squeeze.
- random.c: Hash-DRBG length + reseed-counter serialization via the shared helpers (and an octet-masked carry) under WOLFSSL_WIDE_BYTE; default builds keep the word-aliasing path unchanged.

WOLFSSL_WIDE_BYTE replaces the earlier WOLFSSL_NO_OCTET_BYTE working name.
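The store direction, counted in octets, can be sketched the same way. store_be64_octets and octet_t below are hypothetical (the signature of BytesFromWordsBE64 is not shown in this PR); the point is that the output length is an octet count, so a truncated digest such as SHA-512/224 (28 octets, three and a half 64-bit words) needs no special case.

```c
#include <stdint.h>
#include <stddef.h>

typedef unsigned char octet_t;  /* hypothetical; one octet per cell */

/* Write the first len octets of the big-endian serialization of words[].
 * len need not be a multiple of 8, which covers truncated digests. */
static void store_be64_octets(octet_t* out, const uint64_t* words, size_t len)
{
    size_t i;
    for (i = 0; i < len; i++) {
        uint64_t w = words[i / 8];
        unsigned shift = (unsigned)(56 - 8 * (i % 8));  /* MSB first */
        out[i] = (octet_t)((w >> shift) & 0xFFu);       /* one octet per cell */
    }
}
```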

…EST_MEM

ML-DSA-87 keygen/sign/verify on a 16-bit byte/int CPU (TI C28x), gated and a no-op on normal targets:

- Encode/decode integer-promotion fixes: a byte/word16 field promotes to *unsigned* int where int is 16-bit, so '2 - field' was unsigned and a negative coefficient zero-extended into sword32 (e.g. -1 -> 0x0000FFFF); cast the unpacked field to sword32 (eta-2/eta-4/t0 decode). Bit-packers relied on (byte) truncating to 8 bits; mask with MLDSA_OCT() and cast the <<MLDSA_D shift to sword32 (eta-2/t0/t1/gamma1 encode).
- dilithium.h: shift-based Montgomery reduction on WC_16BIT_CPU (cl2000 miscompiles the multiply form).
- New WOLFSSL_MLDSA_VERIFY_SMALLEST_MEM: stream the signature z vector one polynomial at a time instead of pinning the whole l-vector, cutting the ML-DSA-87 verify key by ~6 KB (with WOLFSSL_MLDSA_ASSIGN_KEY, ~10.7 KB total verify RAM on the C28x).
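The eta-2 decode bug can be reproduced on a 32-bit-int host by doing the subtraction in uint16_t explicitly, which is what a 16-bit-int compiler does implicitly once the unsigned 16-bit field promotes to unsigned int. Both function names are hypothetical, and the real decoders in wc_mldsa.c unpack several packed fields per call; this only isolates the arithmetic.

```c
#include <stdint.h>

typedef int32_t sword32;

/* Emulates a 16-bit-int compiler evaluating 2 - field for an unsigned
 * 16-bit field: the result wraps in 16 bits and is then zero-extended. */
static sword32 decode_eta2_buggy_16bit_int(uint16_t field)
{
    uint16_t diff = (uint16_t)(2u - field);  /* 2 - 3 wraps to 0xFFFF */
    return (sword32)diff;                    /* zero-extends: 65535, not -1 */
}

/* The fix: cast the field to sword32 before subtracting, so the
 * arithmetic is signed 32-bit regardless of the width of int. */
static sword32 decode_eta2_fixed(uint16_t field)
{
    return (sword32)2 - (sword32)field;      /* 2 - 3 == -1 */
}
```

For eta = 2 the packed field ranges over 0..4, so the fixed decode yields coefficients in [-2, 2].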

…mpile CI

- test.c: store the SHA/SHAKE large_digest KAT vectors as brace-init byte arrays (clean octets) instead of "\x.." string literals, which a signed-16-bit-char toolchain (cl2000) would sign-extend.
- benchmark.c: WOLFSSL_NO_MALLOC mode uses static plain/cipher buffers and skips the key/iv XMALLOC/XFREE (gated; default build unchanged).
- scripts/ti-c2000/ + .github/workflows/ti-c2000-compile.yml: a hardware-free cl2000 compile-only CI guard for the CHAR_BIT!=8 wolfCrypt subset.

…it CPUs

The TI cl2000 (C2000 C28x) compiler miscompiles the 32x32->32 low multiply used for the q^-1 step of mldsa_mont_red() - verified on a TMS320F28P550SJ, the ML-DSA-87 verify KAT fails (res=0) - but compiles the 32x64->64 widening multiply correctly. Compute the q^-1 product through the 64-bit path (MLDSA_MUL_QINV_WIDE64): correct on any conforming compiler and, on the C28x, ~4% faster than the shift-based reduction (305 vs 317 ms/op for ML-DSA-87 verify).

dilithium.h auto-selects it for WC_16BIT_CPU and leaves the q multiply enabled (it compiles correctly); a user can still force the shift form with MLDSA_MUL_QINV_SLOW / MLDSA_MUL_Q_SLOW. Validated on hardware for keygen+sign+verify (round-trip res=1). No effect on 8-bit/>=32-bit-int builds.
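A minimal sketch of the widened q^-1 step, assuming the standard ML-DSA constants q = 8380417 and q^-1 mod 2^32 = 58728449, and a two's-complement target where right-shifting a negative int64_t is arithmetic. mont_red_wide64 is a hypothetical name, not wolfSSL's mldsa_mont_red() or its MLDSA_MUL_QINV_WIDE64 code path; it only shows taking the low 32 bits of a * q^-1 from a 64-bit multiply instead of a 32x32->32 one.

```c
#include <stdint.h>

#define MLDSA_Q    8380417      /* q = 2^23 - 2^13 + 1 */
#define MLDSA_QINV 58728449u    /* q^-1 mod 2^32 */

/* Montgomery reduction: returns r with r * 2^32 == a (mod q) and
 * |r| < q, for |a| < q * 2^31. */
static int32_t mont_red_wide64(int64_t a)
{
    /* Low 32 bits of a, taken as unsigned so the product is well defined. */
    uint32_t a_lo = (uint32_t)(uint64_t)a;
    /* t = a * q^-1 mod 2^32, computed through a 64-bit widening multiply. */
    uint32_t t_lo = (uint32_t)((uint64_t)a_lo * MLDSA_QINV);
    /* Reinterpret as signed (two's complement assumed). */
    int32_t t = (int32_t)t_lo;
    /* a - t*q is an exact multiple of 2^32, so the shift divides exactly. */
    return (int32_t)((a - (int64_t)t * MLDSA_Q) >> 32);
}
```

Since t is chosen so that t * q and a agree in their low 32 bits, the final shift discards only zero bits.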

dgarske self-assigned this Jun 18, 2026

Copilot AI review requested due to automatic review settings June 18, 2026 00:15

Copilot started reviewing on behalf of dgarske June 18, 2026 00:15

Copilot AI reviewed Jun 18, 2026


Review comment on wolfssl/wolfcrypt/types.h (outdated)

Review comment on wolfcrypt/benchmark/benchmark.c

dgarske force-pushed the ti_c25 branch from 6830d02 to 6983f41 June 18, 2026 22:00

dgarske added 4 commits June 18, 2026 15:26

dgarske force-pushed the ti_c25 branch from 6983f41 to 20e4053 June 18, 2026 23:39


dgarske wants to merge 4 commits into wolfSSL:master from dgarske:ti_c25
