
[SPARK-55361][DOCS][PYTHON] suggest the use of to_arrow_schema to avoid specifying schema twice#54180

Open
casgie wants to merge 1 commit into apache:master from casgie:SPARK-55361-improve-docs-python-source

Conversation

@casgie

@casgie casgie commented Feb 6, 2026

suggest the use of to_arrow_schema to avoid specifying schema twice

What changes were proposed in this pull request?

python/docs/source/tutorial/sql/python_data_source.rst gives an example of using a PyArrow RecordBatch.

In the example, the schema is specified twice: once for Spark

def schema(self):
   return "key int, value string" 

and then again for PyArrow

def read(self, partition):
   ...
   schema = pa.schema([("key", pa.int32()), ("value", pa.string())]) 

I am proposing to change the documentation to specify only a Spark schema and to use
pyspark.sql.pandas.types.to_arrow_schema() to convert the Spark schema to an Arrow schema.

Why are the changes needed?

When using Python Data Sources in production, having to specify the schema twice is a hassle and a source of errors.

Does this PR introduce any user-facing change?

Yes, changes documentation.

How was this patch tested?

N/A

Was this patch authored or co-authored using generative AI tooling?

No

@github-actions

github-actions bot commented Feb 6, 2026

JIRA Issue Information

=== Documentation SPARK-55361 ===
Summary: add to_arrow_schema python_data_source.rst to avoid double-specifying schema
Assignee: None
Status: Open
Affected: ["4.0.0","4.0.1","4.1.1"]


This comment was automatically generated by GitHub Actions

