fix(parquet_derive): support raw identifiers as column names#10113
Open
cbmixx wants to merge 1 commit into
Open
fix(parquet_derive): support raw identifiers as column names#10113cbmixx wants to merge 1 commit into
cbmixx wants to merge 1 commit into
Conversation
ParquetRecordReader and ParquetRecordWriter derives stringified struct field identifiers including the r# prefix, so a field declared as r#type was looked up (reader) and written to the schema (writer) as a column literally named "r#type" instead of "type". This made it impossible to read or write parquet columns whose names are Rust keywords. Unraw the identifier wherever it is used as a column name, while keeping the raw identifier for field access in the generated code. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Which issue does this PR close?
Rationale for this change
#[derive(ParquetRecordReader)]and#[derive(ParquetRecordWriter)]could nothandle a Parquet column whose name is a Rust keyword (e.g.
type). The only wayto spell such a field in Rust is a raw identifier (
r#type), but the derivesstringified the identifier including the
r#prefix:name_to_index.get(stringify!(#field_names)),and
stringify!(r#type)yields"r#type", so reading failed withParquetError::General("column name 'r#type' is not found in parquet file!").Field::parquet_type()usedself.ident.to_string(), which keepsthe
r#prefix, so the written schema got a column literally namedr#type.This made it impossible to read or write Parquet columns whose names are Rust
keywords, e.g. files produced by other Parquet writers with a column named
type.What changes are included in this PR?
Unraw the identifier (via
syn::ext::IdentExt::unraw, already available throughthe existing
syndependency) wherever it is used as a column name, while keepingthe raw identifier for field access in the generated code:
parquet_derive/src/lib.rs: the reader derive builds a parallel list of unrawedfield-name strings for the
name_to_indexlookup and its error message.parquet_derive/src/parquet_field.rs:Field::parquet_type()usesself.ident.unraw().to_string()for the schema column name.Are these changes tested?
Yes. Added a unit test (
test_parquet_type_with_raw_identifier) and anintegration round-trip test (
test_parquet_derive_raw_identifiers) covering astruct with a raw-identifier field (
r#type) alongside a normal field, assertingthe schema columns are named
type/count. I verified both tests fail withoutthe fix (the writer emits a column named
r#type) and pass with it.Are there any user-facing changes?
Structs with raw-identifier fields now read and write columns named without the
r#prefix. This is a bug fix; there are no public API changes. Code that somehowrelied on the previous
r#-prefixed column names would change behavior, but suchnames could not be produced by any other Parquet writer.
AI disclosure (per CONTRIBUTING.md): this change was developed with the
assistance of an AI coding tool. I reviewed every line, verified the fix against
the failing/passing tests described above, and own the change.