
Harden existing audio prefill APIs (#20136)

Open: kirklandsign wants to merge 1 commit into main from export-D107929913

kirklandsign (Contributor) commented Jun 8, 2026

Summary:

Improve the existing audio prefill methods (prefillAudio, prefillRawAudio) with input validation and code cleanup. No API signature changes.

Kotlin:

  • Add require checks for positive dimensions and array size >= expected element count
  • Use Math.multiplyExact to detect overflow in dimension multiplication (consistent with ByteBuffer variants)
  • Improve docstrings: clarify data types (uint8/float32), document throws IllegalArgumentException
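For illustration, the Kotlin-side checks can be mirrored in C++. This is a minimal sketch, assuming Java's `Math.multiplyExact(int, int)` semantics (fail on 32-bit overflow) and mapping `require` failures to `std::invalid_argument` as a stand-in for `IllegalArgumentException`. The function names `multiply_exact` and `checked_expected_count` are hypothetical and do not appear in the PR.

```cpp
#include <cstddef>
#include <cstdint>
#include <limits>
#include <stdexcept>

// Hypothetical analogue of Java's Math.multiplyExact(int, int):
// returns a * b, or throws std::overflow_error if the exact result
// does not fit in a 32-bit signed integer.
int32_t multiply_exact(int32_t a, int32_t b) {
  // The product of two int32 values always fits in int64.
  int64_t product = static_cast<int64_t>(a) * static_cast<int64_t>(b);
  if (product > std::numeric_limits<int32_t>::max() ||
      product < std::numeric_limits<int32_t>::min()) {
    throw std::overflow_error("integer overflow");
  }
  return static_cast<int32_t>(product);
}

// Hypothetical mirror of the Kotlin validation: every dimension must be
// positive (the `require` calls), the element count must not overflow,
// and the array must hold at least that many elements (the PR's >= rule).
int32_t checked_expected_count(int32_t batch_size, int32_t n_bins,
                               int32_t n_frames, size_t array_size) {
  if (batch_size <= 0 || n_bins <= 0 || n_frames <= 0) {
    throw std::invalid_argument(
        "batchSize, nBins, and nFrames must all be positive");
  }
  int32_t expected =
      multiply_exact(multiply_exact(batch_size, n_bins), n_frames);
  if (array_size < static_cast<size_t>(expected)) {
    throw std::invalid_argument("audio array is smaller than expected");
  }
  return expected;
}
```

For example, dimensions of 2048 x 2048 x 1024 multiply to 2^32, which does not fit in a 32-bit `Int`, so the check fails instead of silently wrapping to a small or negative count.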

JNI (jni_layer_llama.cpp):

  • Add dimension validation (batch_size > 0, n_bins > 0, etc.)
  • Add array size consistency check against batchSize * nBins * nFrames
  • Replace double-allocation + per-element copy with single allocation + reinterpret_cast / direct getRegion
  • Fix emplace_back lint (modernize-use-emplace)
  • Remove silent success on empty data (now returns InvalidArgument)
  • Use size_t casts to prevent integer overflow in size calculations
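A JNI entry point reports failures through a return code rather than by throwing, as the diff excerpts below do with `Error::InvalidArgument`. The following is a minimal sketch of such a validator, under these assumptions: `Status` and `validate_audio_dims` are hypothetical names with arbitrary enum values, and the overflow checks use the GCC/Clang builtin `__builtin_mul_overflow` (Clang is the Android NDK compiler). Widening to `size_t` alone does not rule out overflow: `size_t` is only 32 bits on 32-bit Android ABIs such as armeabi-v7a, and even a 64-bit product of three values near `INT32_MAX` can wrap. The sketch also uses strict equality, which is the review's option 1 below, rather than the PR's `<` check.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for the runtime's error codes; values are arbitrary.
enum class Status : int32_t { Ok = 0, InvalidArgument = 1 };

// Validates dimensions against the Java array length before allocating.
Status validate_audio_dims(int32_t batch_size, int32_t n_bins,
                           int32_t n_frames, int32_t data_size) {
  // Reject non-positive dimensions and empty data (no silent success).
  if (batch_size <= 0 || n_bins <= 0 || n_frames <= 0 || data_size <= 0) {
    return Status::InvalidArgument;
  }
  // Overflow-check each multiplication instead of trusting the cast.
  size_t expected = 0;
  if (__builtin_mul_overflow(static_cast<size_t>(batch_size),
                             static_cast<size_t>(n_bins), &expected) ||
      __builtin_mul_overflow(expected, static_cast<size_t>(n_frames),
                             &expected)) {
    return Status::InvalidArgument;
  }
  // Strict equality: the array must hold exactly batch * bins * frames items.
  return static_cast<size_t>(data_size) == expected ? Status::Ok
                                                     : Status::InvalidArgument;
}
```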

Differential Revision: D107929913

Copilot AI review requested due to automatic review settings June 8, 2026 22:44
kirklandsign requested a review from psiddh as a code owner June 8, 2026 22:44

pytorch-bot Bot commented Jun 8, 2026
🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/20136

Note: Links to docs will display an error until the docs builds have been completed.

❌ 4 New Failures

As of commit 69dba93 with merge base a9d5674:

NEW FAILURES - The following jobs have failed:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

meta-cla Bot added the CLA Signed label Jun 8, 2026

meta-codesync Bot (Contributor) commented Jun 8, 2026

@kirklandsign has exported this pull request. If you are a Meta employee, you can view the originating Diff in D107929913.

github-actions Bot commented Jun 8, 2026

This PR needs a "release notes:" label

If your change should be included in the release notes (i.e. would users of this library care about this change?), please use a label starting with release notes:. This helps us keep track and include your important work in the next release notes.

To add a label, you can comment to pytorchbot, for example
@pytorchbot label "release notes: none"

For more information, see
https://github.com/pytorch/pytorch/wiki/PyTorch-AutoLabel-Bot#why-categorize-for-release-notes-and-how-does-it-work.

Copilot AI left a comment
Pull request overview

This PR aims to harden the Android audio prefill APIs by adding input validation and reducing unnecessary JNI copies while keeping API signatures unchanged.

Changes:

  • Add dimension/size validation for prefillAudio / prefillRawAudio in Kotlin and JNI.
  • Replace JNI double-allocation + per-element conversions with direct getRegion into native vectors.
  • Update Kotlin KDoc to clarify data types and thrown exceptions.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 7 comments.

  • extension/android/jni/jni_layer_llama.cpp: Adds validation and refactors audio/raw-audio JNI array handling to reduce copies (but currently has a correctness issue when arrays are larger than expected).
  • extension/android/executorch_android/src/main/java/org/pytorch/executorch/extension/llm/LlmModule.kt: Adds require(...) validation for audio prefills and improves documentation.


Comment on lines 480 to 490

   auto data_size = data->size();
-  if (data_size == 0) {
-    return 0;
+  size_t expected =
+      static_cast<size_t>(batch_size) * static_cast<size_t>(n_bins) *
+      static_cast<size_t>(n_frames);
+  if (static_cast<size_t>(data_size) < expected) {
+    return static_cast<jint>(Error::InvalidArgument);
   }
-  std::vector<jbyte> data_jbyte(data_size);
   std::vector<uint8_t> data_u8(data_size);
-  data->getRegion(0, data_size, data_jbyte.data());
-  for (int i = 0; i < data_size; i++) {
-    data_u8[i] = data_jbyte[i];
-  }
+  data->getRegion(
+      0, data_size, reinterpret_cast<jbyte*>(data_u8.data()));
   llm::Audio audio{std::move(data_u8), batch_size, n_bins, n_frames};
Comment on lines 513 to 522

   auto data_size = data->size();
-  if (data_size == 0) {
-    return 0;
+  size_t expected =
+      static_cast<size_t>(batch_size) * static_cast<size_t>(n_bins) *
+      static_cast<size_t>(n_frames);
+  if (static_cast<size_t>(data_size) < expected) {
+    return static_cast<jint>(Error::InvalidArgument);
   }
-  std::vector<jfloat> data_jfloat(data_size);
   std::vector<float> data_f(data_size);
-  data->getRegion(0, data_size, data_jfloat.data());
-  for (int i = 0; i < data_size; i++) {
-    data_f[i] = data_jfloat[i];
-  }
+  data->getRegion(0, data_size, data_f.data());
   llm::Audio audio{std::move(data_f), batch_size, n_bins, n_frames};
Comment on lines 546 to 556

   auto data_size = data->size();
-  if (data_size == 0) {
-    return 0;
+  size_t expected =
+      static_cast<size_t>(batch_size) * static_cast<size_t>(n_channels) *
+      static_cast<size_t>(n_samples);
+  if (static_cast<size_t>(data_size) < expected) {
+    return static_cast<jint>(Error::InvalidArgument);
   }
-  std::vector<jbyte> data_jbyte(data_size);
   std::vector<uint8_t> data_u8(data_size);
-  data->getRegion(0, data_size, data_jbyte.data());
-  for (int i = 0; i < data_size; i++) {
-    data_u8[i] = data_jbyte[i];
-  }
+  data->getRegion(
+      0, data_size, reinterpret_cast<jbyte*>(data_u8.data()));
   llm::RawAudio audio{std::move(data_u8), batch_size, n_channels, n_samples};
* @param batchSize Input batch size
* @param nBins Input number of bins
* @param nFrames Input number of frames
* @param audio Input audio as a byte array (uint8 values)
* @param batchSize Input batch size
* @param nBins Input number of bins
* @param nFrames Input number of frames
* @param audio Input audio as a float array (float32 values)
* Prefill the KV cache with the given raw audio input.
* Prefill the KV cache with raw audio data.
*
* @param audio Input raw audio as a byte array
Comment on lines +729 to +731
require(batchSize > 0 && nBins > 0 && nFrames > 0) {
"batchSize, nBins, and nFrames must all be positive"
}
kirklandsign (Contributor, Author) commented:

@claude review

claude Bot commented Jun 8, 2026

Claude finished @kirklandsign's task in 1m 41s.


Code Review: Harden existing audio prefill APIs

  • Read the PR diff and changed files
  • Analyze JNI changes for correctness and safety
  • Analyze Kotlin changes for correctness and safety
  • Post review feedback

Overall this is a solid hardening PR — the validation additions, emplace_back fix, and copy elimination are all welcome improvements. I have one correctness issue to flag and a few minor notes.


Bug: Vector size mismatch with llm::Audio constructor

Severity: High — this will cause an abort at runtime for valid (but oversized) inputs.

The Kotlin layer validates audio.size >= expected (allows arrays larger than needed), and the JNI code allocates a vector of size data_size (the full JNI array length) and copies all elements. However, llm::Audio's constructor (audio.h:57-63) enforces strict equality:

ET_CHECK_MSG(
    ... std::get<std::vector<uint8_t>>(data_).size() ==
        static_cast<size_t>(batch_size * n_bins * n_frames), ...);

So if a caller passes an array with size > expected, the Kotlin check passes, the JNI check passes, but the Audio constructor's ET_CHECK_MSG will abort the process.

Affected lines: jni_layer_llama.cpp:487-490, jni_layer_llama.cpp:520-522

Suggested fix (pick one):

  1. Strict equality everywhere — change Kotlin >= to == and JNI < to !=. This is the simplest and most defensive option.
  2. Copy only expected elements — allocate expected and getRegion(0, expected, ...). This allows callers to pass padded buffers.

Note: RawAudio is a plain struct with no size assertion, so the third function (prefill_raw_audio_input) does not have this problem — but using >= there is still inconsistent with option 1.
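Option 2 can be sketched as follows. This is a minimal illustration, not the PR's code: because a real Java array is not available here, a `std::vector<int8_t>` stands in for it, and the hypothetical `get_region` helper plays the role of the `data->getRegion(start, len, buf)` call in the diff. The key point is that the destination vector is sized to `expected` rather than to the full array length, so its size always equals batch_size * n_bins * n_frames, the condition `llm::Audio` asserts.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical stand-in for the JNI array's getRegion(start, len, buf):
// copies `len` signed bytes starting at `start` into `buf`.
void get_region(const std::vector<int8_t>& java_array, size_t start,
                size_t len, int8_t* buf) {
  std::copy_n(java_array.begin() + static_cast<std::ptrdiff_t>(start), len,
              buf);
}

// Accepts padded input but copies exactly `expected` elements, so the
// resulting vector's size matches an exact-size assertion downstream.
std::optional<std::vector<uint8_t>> copy_expected_prefix(
    const std::vector<int8_t>& java_array, size_t expected) {
  if (expected == 0 || java_array.size() < expected) {
    return std::nullopt;  // a caller would map this to InvalidArgument
  }
  std::vector<uint8_t> data_u8(expected);  // single allocation, exact size
  get_region(java_array, 0, expected,
             reinterpret_cast<int8_t*>(data_u8.data()));
  return data_u8;
}
```

Trailing elements beyond `expected` are simply ignored, which is what makes padded buffers safe under this option; option 1 would instead reject them up front.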



Minor Notes

1. Overflow in audio.h (pre-existing, not introduced here)

The Audio constructor computes batch_size * n_bins * n_frames using int32_t arithmetic (audio.h:60), which can overflow for large dimensions. The new Kotlin/JNI validation catches this before reaching the constructor, so it's safe in practice for this call path, but worth noting for future hardening of audio.h itself.

2. reinterpret_cast<jbyte*>(data_u8.data()) is correct

jbyte is signed char; uint8_t is unsigned char. They share the same size and alignment on all platforms, and reinterpret_cast preserves the bit pattern. The raw bytes are correctly reinterpreted as uint8_t after the copy. Good optimization eliminating the intermediate vector + per-element loop.
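The equivalence claimed above can be checked directly. The sketch below compares the old per-element conversion with a raw byte copy; `std::memcpy` stands in for `getRegion` writing through `reinterpret_cast<jbyte*>`, and both function names are hypothetical. On two's-complement targets (every Android ABI), converting a negative `int8_t` to `uint8_t` yields the value modulo 256, which is exactly what preserving the bit pattern produces, so the two paths agree byte for byte.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Old path: per-element conversion, as in the removed loop.
// Converting a negative int8_t to uint8_t is defined modulo 256,
// so -1 becomes 255.
std::vector<uint8_t> convert_per_element(const std::vector<int8_t>& src) {
  std::vector<uint8_t> out(src.size());
  for (size_t i = 0; i < src.size(); ++i) {
    out[i] = static_cast<uint8_t>(src[i]);
  }
  return out;
}

// New path: copy the raw bytes straight into the uint8_t buffer.
// The bit pattern is preserved, giving the same values as above.
std::vector<uint8_t> copy_bytes(const std::vector<int8_t>& src) {
  std::vector<uint8_t> out(src.size());
  if (!src.empty()) {  // avoid memcpy with a possibly null pointer
    std::memcpy(out.data(), src.data(), src.size());
  }
  return out;
}
```

For the input {-128, -1, 0, 1, 127}, both functions return {128, 255, 0, 1, 127}.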

3. Float path getRegion (line 521)

jfloat is float (IEEE 754, same type), so data->getRegion(0, data_size, data_f.data()) works directly without a cast. Correct.

4. emplace_back fix

Changing inputs.emplace_back(llm::MultimodalInput{std::move(audio)}) to inputs.emplace_back(std::move(audio)) — good, this avoids the intermediate temporary and satisfies modernize-use-emplace.
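The difference between the two `emplace_back` calls can be shown with simplified stand-in types. `Audio` and `MultimodalInput` below are hypothetical reductions, not ExecuTorch's definitions; the sketch only assumes, as the new call implies, that the real `MultimodalInput` can be constructed from an `Audio`. Since `emplace_back` direct-initializes the element, it works even when that constructor is `explicit`.

```cpp
#include <cstdint>
#include <string>
#include <utility>
#include <variant>
#include <vector>

// Hypothetical, simplified stand-ins for llm::Audio and llm::MultimodalInput.
struct Audio {
  std::vector<uint8_t> data;
  int32_t batch_size;
  int32_t n_bins;
  int32_t n_frames;
};

class MultimodalInput {
 public:
  // emplace_back forwards its argument here and constructs in place.
  explicit MultimodalInput(Audio audio) : value_(std::move(audio)) {}
  bool is_audio() const { return std::holds_alternative<Audio>(value_); }
  const Audio& get_audio() const { return std::get<Audio>(value_); }

 private:
  std::variant<std::string, Audio> value_;
};

std::vector<MultimodalInput> build_inputs(Audio audio) {
  std::vector<MultimodalInput> inputs;
  // Before: inputs.emplace_back(MultimodalInput{std::move(audio)});
  //   creates a temporary, then move-constructs the element from it.
  // After: the Audio is forwarded to MultimodalInput's constructor,
  //   so the element is built directly in the vector's storage.
  inputs.emplace_back(std::move(audio));
  return inputs;
}
```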


Summary

The PR successfully hardens input validation and eliminates unnecessary copies. The one blocking issue is the vector-size mismatch between the >= check at the JNI/Kotlin layer and the == assertion inside llm::Audio's constructor. Once that's aligned, this looks good to land.


meta-codesync Bot changed the title from "Harden existing audio prefill APIs" to "Harden existing audio prefill APIs (#20136)" Jun 8, 2026
meta-codesync Bot pushed a commit that referenced this pull request Jun 8, 2026
meta-codesync Bot force-pushed the export-D107929913 branch from 12ab659 to d165042 June 8, 2026 23:09
meta-codesync Bot pushed a commit that referenced this pull request Jun 8, 2026
meta-codesync Bot force-pushed the export-D107929913 branch from d165042 to 1692652 June 8, 2026 23:13
Copilot AI review requested due to automatic review settings June 8, 2026 23:18
meta-codesync Bot force-pushed the export-D107929913 branch from 1692652 to 69dba93 June 8, 2026 23:18
kirklandsign review requested due to automatic review settings June 8, 2026 23:18

Labels: CLA Signed, meta-exported


2 participants