NUL byte in a struct field name aborts the process (SIGABRT) via Arrow FFI schema export #8652

Description

@dfa1

Summary

A struct field name containing a NUL byte (U+0000) triggers a non-unwinding panic when the file's schema is exported across the Arrow C Data FFI boundary, and the process terminates with SIGABRT. Because the failure is an abort rather than a returned error, a single crafted or mislabeled file can take down any long-lived process (a query server, a JNI host) that opens it. That makes this a robustness/DoS concern, not just a validation gap.

Observed via the JNI bindings (vortex-jni 0.75.0) on macOS aarch64, but the crash site is in arrow-rs's FFI schema export, so it is not JNI-specific.

Reproduction

  1. Produce a Vortex file whose top-level struct has a field whose name contains an embedded NUL, e.g. the 10-char string col + \0 + hidden. (The Vortex file writer rejects duplicate field names, but does not reject NUL/control characters in names, so such a file is producible — and any non-Vortex producer emitting the same DType would trip this too.)
  2. Open it and scan through the Arrow export path (DataSource::openscan → Arrow C schema export).

The process aborts during schema export.

Backtrace (abridged)

thread '<unnamed>' panicked at library/core/src/panicking.rs:225:5:
panic in a function that cannot unwind
...
  10: core::panicking::panic_nounwind_fmt
  11: core::panicking::panic_nounwind
  12: core::panicking::panic_cannot_unwind
  13: arrow_array::ffi_stream::get_schema
  14: Java_org_apache_arrow_c_jni_JniWrapper_getSchemaArrayStream
thread caused non-unwinding panic. aborting.

The proximate cause looks like a CString::new on the field name inside get_schema failing on the interior NUL, in a context declared not to unwind — so the failure escalates to panic_cannot_unwind → abort instead of surfacing as a Result.
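The suspected failure mode can be illustrated with the standard library alone. The sketch below is not arrow-rs code, and `field_name_to_cstring` and `interior_nul_position` are hypothetical helper names used only for illustration. It shows that `CString::new` returns an error for a name with an interior NUL, so an `unwrap()` on that result would panic.

```rust
use std::ffi::{CString, NulError};

/// Hypothetical helper: convert a field name to a C string and surface an
/// interior NUL as an error value instead of unwrapping it.
fn field_name_to_cstring(name: &str) -> Result<CString, NulError> {
    // CString::new rejects any input that contains a 0 byte, because the
    // resulting C string could not represent the bytes after it.
    CString::new(name)
}

/// Hypothetical helper: byte offset of the first interior NUL, if any.
fn interior_nul_position(name: &str) -> Option<usize> {
    CString::new(name).err().map(|e| e.nul_position())
}
```

With the name from the reproduction (`"col\0hidden"`), `field_name_to_cstring` returns `Err` and the reported NUL position is the byte right after `col`. If that `Err` is unwrapped inside a function that is not allowed to unwind, such as an `extern "C"` callback, the panic would turn into an abort, which is consistent with the backtrace above.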

Expected

Opening or exporting the file should fail with a recoverable VortexError / ArrowError, not abort the process. Either reject NUL (and arguably other control characters) in field names at the boundary where a DType/schema is constructed or exported, or handle the CString::new error inside get_schema without crossing a non-unwinding boundary.
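Both options can be sketched with the standard library. Everything named here (`FieldNameError`, `validate_field_name`, `export_name`, and the local `EINVAL` constant) is hypothetical and is not part of Vortex or arrow-rs. The first function rejects NUL and other control characters when a name is accepted; the second shows the shape of an export step that maps the `CString::new` failure to an errno-style return code instead of panicking.

```rust
use std::ffi::CString;
use std::fmt;
use std::os::raw::c_int;

/// Assumed errno value for "invalid argument" (22 on Linux and macOS).
const EINVAL: c_int = 22;

/// Hypothetical error type for rejected field names.
#[derive(Debug, PartialEq, Eq)]
enum FieldNameError {
    /// The name contains a control character (NUL included) at `index`.
    ControlChar { index: usize, ch: char },
}

impl fmt::Display for FieldNameError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            FieldNameError::ControlChar { index, ch } => write!(
                f,
                "field name contains control character U+{:04X} at byte {}",
                *ch as u32, index
            ),
        }
    }
}

/// Option 1: validate at the point where a schema is built or exported.
fn validate_field_name(name: &str) -> Result<(), FieldNameError> {
    // char::is_control covers U+0000 as well as the other Cc code points.
    if let Some((index, ch)) = name.char_indices().find(|(_, c)| c.is_control()) {
        return Err(FieldNameError::ControlChar { index, ch });
    }
    Ok(())
}

/// Option 2: inside the export step, turn the conversion failure into a
/// non-zero return code rather than unwrapping. A real callback would also
/// record an error message for the consumer to read.
fn export_name(name: &str, out: &mut Option<CString>) -> c_int {
    match CString::new(name) {
        Ok(c_name) => {
            *out = Some(c_name);
            0
        }
        Err(_) => EINVAL,
    }
}
```

Validating early gives the caller a descriptive error before any FFI structure is built, while the second approach acts as a backstop for names that reach the export path from other producers. The two are complementary rather than alternatives.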

Context

Found while hardening cross-compatibility for the pure-Java implementation (vortex-java, intro in #8250). vortex-java now rejects NUL/blank/control field names on both its writer and reader by policy, precisely so it never emits a file that aborts the reference toolchain — but the upstream abort seemed worth reporting on its own. Happy to provide the exact reproducer file or a standalone test if useful.
