native_datafusion: ParquetSchemaConvert error does not include the file path #4316

## Describe the bug

When the `native_datafusion` scan adapter rejects an incompatible Parquet column read, the resulting `SparkError::ParquetSchemaConvert` carries an empty `file_path`. The JVM shim translates this to a `SparkException` whose message reads:

```
Parquet column cannot be converted in file . Column: [a], Expected: int, Found: BINARY.
```

(Note the empty path between `in file` and `.`.) In Spark's vectorized reader, the catch block in `FileScanRDD` fills in this path from `currentFile.urlEncodedPath`, so the message reads, for example, `... in file file:/tmp/.../part-00000.parquet. Column: ...`.

This blocks several Spark SQL tests that extract the path from the message and re-open the file (e.g. `ParquetSchemaSuite > schema mismatch failure error message for parquet vectorized reader`).

## Where the gap is

`SparkPhysicalExprAdapter::replace_with_spark_cast` and the deferred `RejectOnNonEmpty` expression build the error with `file_path: String::new()` because `PhysicalExprAdapterFactory::create` does not receive the file path. Fixing this likely requires either:

- Capturing the file path when the per-file adapter is created (would need a DataFusion API extension), or
- Catching `ParquetSchemaConvert` at a higher layer with file context (e.g. the parquet `ScanExec`/`FileOpener` wrapper) and re-raising with the path filled in.
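Here is a rough, self-contained Rust sketch of the second option. It uses a simplified stand-in for `SparkError`. Only the `file_path` field name comes from this issue. The `column`, `expected`, and `found` fields and the `Other` variant are hypothetical. The helpers `attach_file_path` and `read_with_path_context` are also hypothetical names and do not exist in Comet or DataFusion. The idea is that the layer that knows which file is being read wraps each per-file read and fills in the path, but only when the path is still empty. In the real code path the error may arrive wrapped in a DataFusion error type and would need to be unwrapped first. That step is not modeled here.

```rust
use std::fmt;

/// Hypothetical, simplified stand-in for Comet's `SparkError`.
/// Only `file_path` is taken from the issue; the other fields are assumptions.
#[derive(Debug, Clone, PartialEq)]
pub enum SparkError {
    ParquetSchemaConvert {
        file_path: String,
        column: String,
        expected: String,
        found: String,
    },
    Other(String),
}

impl fmt::Display for SparkError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            // Mirrors the message shape quoted in this issue
            // (the real message is built on the JVM side).
            SparkError::ParquetSchemaConvert { file_path, column, expected, found } => write!(
                f,
                "Parquet column cannot be converted in file {}. Column: [{}], Expected: {}, Found: {}.",
                file_path, column, expected, found
            ),
            SparkError::Other(msg) => write!(f, "{}", msg),
        }
    }
}

/// Hypothetical helper: fill in the path on a `ParquetSchemaConvert` error
/// whose path is still empty. Every other error is returned unchanged.
pub fn attach_file_path(err: SparkError, path: &str) -> SparkError {
    match err {
        SparkError::ParquetSchemaConvert { file_path, column, expected, found }
            if file_path.is_empty() =>
        {
            SparkError::ParquetSchemaConvert { file_path: path.to_string(), column, expected, found }
        }
        other => other,
    }
}

/// Hypothetical wrapper for a file-aware layer (e.g. a `FileOpener`-style
/// wrapper): run the per-file read and re-raise any error with the path attached.
pub fn read_with_path_context<T>(
    path: &str,
    read: impl FnOnce() -> Result<T, SparkError>,
) -> Result<T, SparkError> {
    read().map_err(|e| attach_file_path(e, path))
}

fn main() {
    let path = "file:/tmp/t/part-00000.parquet";
    // Simulate the adapter rejecting a column with an empty `file_path`.
    let result: Result<(), SparkError> = read_with_path_context(path, || {
        Err(SparkError::ParquetSchemaConvert {
            file_path: String::new(),
            column: "a".to_string(),
            expected: "int".to_string(),
            found: "BINARY".to_string(),
        })
    });
    println!("{}", result.unwrap_err());
}
```

Running `main` prints the message with the path filled in, which matches the shape produced by Spark's vectorized reader. Because the path is only filled in when it is empty, the wrapper can safely be applied at more than one layer without overwriting a path that was already set.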

## Repro

In `./dev/diffs/3.4.3.diff`, the test is currently tagged with `IgnoreCometNativeDataFusion`, pointing at this issue. Remove the tag and run:

```
ENABLE_COMET=true ENABLE_COMET_ONHEAP=true build/sbt "sql/testOnly *ParquetSchemaSuite -- -z 'schema mismatch failure error message for parquet vectorized reader'"
```
