fix(parquet_derive): support raw identifiers as column names by cbmixx · Pull Request #10113 · apache/arrow-rs

cbmixx · 2026-06-11T10:07:49Z

Which issue does this PR close?

Closes #10112 (parquet_derive: cannot read or write columns whose name is a Rust keyword; raw identifiers like r#type become column "r#type").

Rationale for this change

#[derive(ParquetRecordReader)] and #[derive(ParquetRecordWriter)] could not
handle a Parquet column whose name is a Rust keyword (e.g. type). The only way
to spell such a field in Rust is a raw identifier (r#type), but the derives
stringified the identifier including the r# prefix:

The reader's column-index lookup used name_to_index.get(stringify!(#field_names)),
and stringify!(r#type) yields "r#type", so reading failed with
ParquetError::General("column name 'r#type' is not found in parquet file!").
The writer's Field::parquet_type() used self.ident.to_string(), which keeps
the r# prefix, so the written schema got a column literally named r#type.
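The mismatch can be reproduced with the standard library alone. The sketch below is a hypothetical example, not code from the PR: the Record struct and helper functions are illustrative. It shows that stringify! keeps the r# marker, which is what the old reader used as its lookup key, while field access in Rust still has to spell the field as r#type.

```rust
/// Hypothetical record with a field named after the Rust keyword `type`,
/// which can only be written as the raw identifier `r#type`.
#[derive(Debug)]
pub struct Record {
    pub r#type: String,
    pub count: i64,
}

/// What the old reader derive effectively looked up: `stringify!` keeps the
/// `r#` marker, so the key is "r#type" rather than the column name "type".
pub fn stringified_field_names() -> [&'static str; 2] {
    [stringify!(r#type), stringify!(count)]
}

/// Field access in generated code must keep using the raw identifier.
pub fn record_type(record: &Record) -> &str {
    &record.r#type
}
```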

This made it impossible to read or write Parquet columns whose names are Rust
keywords, e.g. files produced by other Parquet writers with a column named type.

What changes are included in this PR?

Unraw the identifier (via syn::ext::IdentExt::unraw, already available through
the existing syn dependency) wherever it is used as a column name, while keeping
the raw identifier for field access in the generated code:

parquet_derive/src/lib.rs: the reader derive builds a parallel list of unrawed
field-name strings for the name_to_index lookup and its error message.
parquet_derive/src/parquet_field.rs: Field::parquet_type() uses
self.ident.unraw().to_string() for the schema column name.
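Both code paths can be modelled on plain strings to show the intended behavior. This is a sketch under stated assumptions, not the derive's generated code. unraw_name is a hypothetical stand-in for syn::ext::IdentExt::unraw applied to a stringified identifier. column_indices and schema_column_names are hypothetical helpers that mirror the reader's name_to_index lookup and the writer's schema column names. The raw identifier itself is never altered; only the string used as a column name is unrawed.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for `IdentExt::unraw`: drop a leading `r#` marker.
pub fn unraw_name(ident: &str) -> &str {
    ident.strip_prefix("r#").unwrap_or(ident)
}

/// Reader side (sketch): resolve each field's column index by its unrawed
/// name, reporting the unrawed name in the error, as the fixed derive does.
pub fn column_indices(
    field_idents: &[&str],
    name_to_index: &HashMap<String, usize>,
) -> Result<Vec<usize>, String> {
    field_idents
        .iter()
        .map(|ident| {
            let name = unraw_name(ident);
            name_to_index
                .get(name)
                .copied()
                .ok_or_else(|| format!("column name '{}' is not found in parquet file!", name))
        })
        .collect()
}

/// Writer side (sketch): schema column names come from the unrawed idents.
pub fn schema_column_names(field_idents: &[&str]) -> Vec<String> {
    field_idents
        .iter()
        .map(|ident| unraw_name(ident).to_string())
        .collect()
}
```

A field declared as r#type therefore maps to the column "type", while a normal field such as count passes through unchanged.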

Are these changes tested?

Yes. Added a unit test (test_parquet_type_with_raw_identifier) and an
integration round-trip test (test_parquet_derive_raw_identifiers) covering a
struct with a raw-identifier field (r#type) alongside a normal field, asserting
the schema columns are named type/count. I verified both tests fail without
the fix (the writer emits a column named r#type) and pass with it.

Are there any user-facing changes?

Structs with raw-identifier fields now read and write columns named without the
r# prefix. This is a bug fix; there are no public API changes. Code that relied
on the previous r#-prefixed column names would change behavior, but such names
are unlikely to appear in files produced by other Parquet writers.

AI disclosure (per CONTRIBUTING.md): this change was developed with the
assistance of an AI coding tool. I reviewed every line, verified the fix against
the failing/passing tests described above, and own the change.

ParquetRecordReader and ParquetRecordWriter derives stringified struct field identifiers including the r# prefix, so a field declared as r#type was looked up (reader) and written to the schema (writer) as a column literally named "r#type" instead of "type". This made it impossible to read or write parquet columns whose names are Rust keywords. Unraw the identifier wherever it is used as a column name, while keeping the raw identifier for field access in the generated code.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

github-actions Bot added the parquet-derive label Jun 11, 2026


cbmixx wants to merge 1 commit into apache:main from cbmixx:fix-parquet-derive-raw-identifiers
