add spark runner documentation #367

Open
leroyjb wants to merge 20 commits into google:main from leroyjb:main

leroyjb commented Nov 19, 2025

No description provided.

leroyjb marked this pull request as ready for review on November 20, 2025, 13:18

RamSaw (Collaborator) left a comment

Looks good, thank you! Let's fix the small nits; I will then take another look and likely submit.

run: mvn compile exec:java -Dexec.mainClass=com.google.privacy.differentialprivacy.pipelinedp4j.examples.SparkDataFrameExample -Dexec.args="--inputFilePath=$(pwd)/../input.csv --outputFolder=output"
- name: Build Beam with Spark Runner
working-directory: examples/pipelinedp4j/beam
run: mvn package -Pspark-runner,spark-runner-embedeed
Collaborator

Is the comma (,) the correct syntax here?

Author

leroyjb, Jan 29, 2026

Yes, this is how you pass multiple profiles to Maven.

- name: Build Beam with Spark Runner
working-directory: examples/pipelinedp4j/beam
run: mvn package -Pspark-runner,spark-runner-embedeed
- name: Run Spark Runner Example
Collaborator

nit: "Run Beam Example with Spark Runner" would be a better, more complete name.

run: mvn package -Pspark-runner,spark-runner-embedeed
- name: Run Spark Runner Example
working-directory: examples/pipelinedp4j/beam
run: java --add-opens=java.base/sun.nio.ch=ALL-UNNAMED -jar target/beam-1.0-SNAPSHOT-shaded.jar --runner=SparkRunner --sparkMaster="local[*]" --inputFilePath=$(pwd)/../input.csv --outputFilePath=output-spark.txt
Collaborator

$(pwd)/../input.csv can be simplified to ../input.csv.

"com.fasterxml.jackson.module:jackson-module-scala_%s:%s" % (SCALA_TAG, JACKSON_TAG),
"org.scala-lang:scala-library:%s" % SCALA_LIBRARY_TAG,
"info.picocli:picocli:4.7.6",
# For Apache Spark Runner testing locally
Collaborator

"testing" is a little confusing here; maybe "running" is better?

],

)
maven_install(
Collaborator

Why should this be a separate maven_install?

"org.scala-lang:scala-library:%s" % SCALA_LIBRARY_TAG,
"info.picocli:picocli:4.7.6",
# For Apache Spark Runner testing locally
"org.apache.spark:spark-streaming_%s:%s" % (2.12, SPARK_TAG),
Collaborator

Why should this dependency be here while the other deps are in a separate maven_install?

</profile>
<profile>
<id>spark-runner</id>
<!-- Makes the DataflowRunner available when running a pipeline. -->
Collaborator

nit: indentation

Collaborator

"Dataflow" is incorrect here.


<profile>
<id>spark-runner-embedeed</id>
<!-- Makes the DataflowRunner available when running a pipeline. -->
Collaborator

Should this be updated? "Dataflow" is probably incorrect.

Collaborator

I would add that this profile is for running locally.
