[SPARK-55229][PYTHON] Implement DataFrame.zipWithIndex in PySpark Classic by fangchenli · Pull Request #54195 · apache/spark

fangchenli · 2026-02-07T06:13:05Z

What changes were proposed in this pull request?

Implement DataFrame.zipWithIndex in PySpark Classic

Why are the changes needed?

This method was added to the Scala API earlier. We need to add it to PySpark Classic so that users can call it from PySpark.

Does this PR introduce any user-facing change?

Yes, users can see and use this API in PySpark.

How was this patch tested?

Unit tests added.

Was this patch authored or co-authored using generative AI tooling?

Generated-by: Claude Opus 4.6

github-actions · 2026-02-07T06:13:16Z

JIRA Issue Information

=== Sub-task SPARK-55229 ===
Summary: Implement DataFrame.zipWithIndex in PySpark Classic
Assignee: None
Status: Open
Affected: ["4.2.0"]

This comment was automatically generated by GitHub Actions

zhengruifeng · 2026-02-07T08:32:28Z

python/pyspark/sql/classic/dataframe.py


+    def zipWithIndex(self, indexColName: str = "index") -> ParentDataFrame:
+        return self.select(
+            F.col("*"), InternalFunction.distributed_sequence_id().alias(indexColName)


This function is dedicated to PS (pandas API on Spark), and I am making it differ from zipWithIndex with respect to the underlying RDD cache.

Basically, for methods in PySpark Classic, we directly invoke the corresponding JVM methods via py4j.
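
To illustrate the delegation pattern suggested in this review comment, here is a minimal, self-contained sketch. It does not use Spark: `FakeJavaDataFrame` is a hypothetical stand-in for the py4j proxy object, and the name and signature of the JVM-side `zipWithIndex(indexColName)` method, as well as its 0-based consecutive indexing, are assumptions made for illustration only. In PySpark Classic the wrapper would presumably also receive the Spark session, which is omitted here.

```python
class FakeJavaDataFrame:
    """Hypothetical stand-in for the py4j proxy of a JVM Dataset."""

    def __init__(self, columns, rows):
        self.columns = list(columns)
        self.rows = [tuple(r) for r in rows]

    def zipWithIndex(self, indexColName):
        # Assumed JVM-side behavior: append a 0-based consecutive index column.
        indexed = [row + (i,) for i, row in enumerate(self.rows)]
        return FakeJavaDataFrame(self.columns + [indexColName], indexed)


class DataFrame:
    """Minimal Python wrapper that holds a `_jdf` handle to the JVM object."""

    def __init__(self, jdf):
        self._jdf = jdf

    def zipWithIndex(self, indexColName="index"):
        # Delegate to the JVM method instead of composing Python-side
        # expressions such as InternalFunction.distributed_sequence_id().
        return DataFrame(self._jdf.zipWithIndex(indexColName))


df = DataFrame(FakeJavaDataFrame(["name"], [("a",), ("b",), ("c",)]))
result = df.zipWithIndex("idx")
```

With this shape, the Python method stays a thin wrapper, so any change to the JVM implementation (for example, how it caches the underlying RDD) is picked up without touching the Python side.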

fangchenli added 2 commits February 6, 2026 21:59

[SPARK-55229][PYTHON] Implement DataFrame.zipWithIndex in PySpark Classic

c40d37b

add error test

29b435e

github-actions bot added SQL PYTHON labels Feb 7, 2026

add zipWithIndex in connect

ca5e8d0

zhengruifeng reviewed Feb 7, 2026

fangchenli wants to merge 3 commits into apache:master from fangchenli:pyspark-zip-with-index