Summary
A struct field name containing a NUL byte (U+0000) causes a non-unwinding panic → SIGABRT that aborts the whole process when the file's schema is exported across the Arrow C Data FFI boundary. Because it aborts rather than returning an error, a single crafted or mislabeled file can take down any long-lived process (a query server, a JNI host) that opens it — so it reads as a robustness/DoS concern, not just a validation gap.
Observed via the JNI bindings (vortex-jni 0.75.0) on macOS aarch64, but the crash site is in arrow-rs's FFI schema export, so it is not JNI-specific.
Reproduction
- Produce a Vortex file whose top-level struct has a field whose name contains an embedded NUL, e.g. the 10-char string col + \0 + hidden. (The Vortex file writer rejects duplicate field names, but does not reject NUL/control characters in names, so such a file is producible, and any non-Vortex producer emitting the same DType would trip this too.)
- Open it and scan through the Arrow export path (DataSource::open → scan → Arrow C schema export).
The process aborts during schema export.
Backtrace (abridged)
thread '<unnamed>' panicked at library/core/src/panicking.rs:225:5:
panic in a function that cannot unwind
...
10: core::panicking::panic_nounwind_fmt
11: core::panicking::panic_nounwind
12: core::panicking::panic_cannot_unwind
13: arrow_array::ffi_stream::get_schema
14: Java_org_apache_arrow_c_jni_JniWrapper_getSchemaArrayStream
thread caused non-unwinding panic. aborting.
The proximate cause looks like a CString::new on the field name inside get_schema failing on the interior NUL, in a context declared not to unwind — so the failure escalates to panic_cannot_unwind → abort instead of surfacing as a Result.
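To illustrate the suspected failure in isolation, here is a minimal sketch using only the Rust standard library. It does not reproduce arrow-rs's get_schema; the helper names field_name_to_cstring and interior_nul_position are hypothetical and exist only for this example. It shows that CString::new reports an interior NUL as a NulError, which a caller can either propagate or (if it unwraps) turn into a panic.

```rust
use std::ffi::{CString, NulError};

/// Hypothetical helper (not part of arrow-rs): convert a field name into a
/// C string for FFI export, returning the error instead of unwrapping it.
fn field_name_to_cstring(name: &str) -> Result<CString, NulError> {
    // CString::new fails if the input contains a NUL byte anywhere,
    // because a C string uses NUL as its terminator.
    CString::new(name)
}

/// Hypothetical helper: byte offset of the first interior NUL, if any.
fn interior_nul_position(name: &str) -> Option<usize> {
    CString::new(name).err().map(|e| e.nul_position())
}
```

With the field name from the reproduction, field_name_to_cstring("col\0hidden") returns Err and interior_nul_position reports offset 3. Calling unwrap() on that Err would panic, and if that happens in a function that is not allowed to unwind, the process aborts, which would match the backtrace above.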
Expected
Opening or exporting the file should fail with a recoverable VortexError / ArrowError, not abort the process. Either reject NUL (and arguably other control characters) in field names at the boundary where a DType/schema is constructed or exported, or handle the CString::new error inside get_schema without crossing a non-unwinding boundary.
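As a sketch of the first option (rejecting bad names at the boundary), the check could look roughly like the following. This is an assumption about shape only: FieldNameError and validate_field_name are hypothetical names, not existing Vortex or arrow-rs APIs, and a real fix would presumably map the failure into VortexError or ArrowError.

```rust
use std::fmt;

/// Hypothetical error type for this sketch; a real fix would likely
/// convert this into VortexError or ArrowError instead.
#[derive(Debug, PartialEq, Eq)]
enum FieldNameError {
    /// The name contains a control character (including NUL, U+0000).
    ControlChar { index: usize, ch: char },
}

impl fmt::Display for FieldNameError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            FieldNameError::ControlChar { index, ch } => write!(
                f,
                "field name contains control character U+{:04X} at byte offset {}",
                *ch as u32, index
            ),
        }
    }
}

/// Hypothetical validator: reject NUL and other control characters so the
/// name can always be converted to a C string later without failing.
fn validate_field_name(name: &str) -> Result<(), FieldNameError> {
    // char::is_control is true for the Unicode Cc category, which includes NUL.
    match name.char_indices().find(|(_, c)| c.is_control()) {
        Some((index, ch)) => Err(FieldNameError::ControlChar { index, ch }),
        None => Ok(()),
    }
}
```

Running such a check when a DType/schema is constructed or exported would turn the name from the reproduction into an ordinary Err before any CString conversion is attempted.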
Context
Found while hardening cross-compatibility for the pure-Java implementation (vortex-java, intro in #8250). vortex-java now rejects NUL/blank/control field names on both its writer and reader by policy, precisely so it never emits a file that aborts the reference toolchain — but the upstream abort seemed worth reporting on its own. Happy to provide the exact reproducer file or a standalone test if useful.