add spark runner documentation by leroyjb · Pull Request #367 · google/differential-privacy

leroyjb · 2025-11-19T15:13:51Z

No description provided.

RamSaw

Looks good, thank you! Let's fix the small nits below; I will then take another look and likely submit.

RamSaw · 2025-12-03T13:14:28Z

.github/workflows/maven.yml

        run: mvn compile exec:java -Dexec.mainClass=com.google.privacy.differentialprivacy.pipelinedp4j.examples.SparkDataFrameExample -Dexec.args="--inputFilePath=$(pwd)/../input.csv --outputFolder=output"
+      - name: Build Beam with Spark Runner
+        working-directory: examples/pipelinedp4j/beam
+        run: mvn package -Pspark-runner,spark-runner-embedeed


Is the comma (,) the correct syntax here?

Yes, this is how you pass multiple profiles to Maven.
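For readers following the thread, here is a minimal sketch of the comma-separated `-P` form discussed above. The profile ids are copied as spelled in the diff under review, and the commands are only printed, on the assumption that no Maven checkout is at hand. `help:active-profiles` is the Maven Help Plugin goal that lists the profiles a given invocation activates, which is one way to confirm the syntax.

```shell
# Profile ids as they are spelled in the diff under review.
profiles="spark-runner spark-runner-embedeed"

# Maven's -P option takes a comma-separated list, so join the ids with commas.
profile_arg=$(printf '%s' "$profiles" | tr ' ' ',')

# Print the commands instead of running them (assumption: no project checkout here).
# Run from examples/pipelinedp4j/beam, the first lists the active profiles
# and the second is the build step from the workflow.
echo "mvn help:active-profiles -P${profile_arg}"
echo "mvn package -P${profile_arg}"
```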

RamSaw · 2025-12-03T13:14:57Z

.github/workflows/maven.yml

+      - name: Build Beam with Spark Runner
+        working-directory: examples/pipelinedp4j/beam
+        run: mvn package -Pspark-runner,spark-runner-embedeed
+      - name: Run Spark Runner Example


nit: rename this step to "Run Beam Example with Spark Runner"; it is a better, more complete name.

RamSaw · 2025-12-03T13:15:34Z

.github/workflows/maven.yml

+        run: mvn package -Pspark-runner,spark-runner-embedeed
+      - name: Run Spark Runner Example
+        working-directory: examples/pipelinedp4j/beam
+        run: java --add-opens=java.base/sun.nio.ch=ALL-UNNAMED -jar target/beam-1.0-SNAPSHOT-shaded.jar --runner=SparkRunner --sparkMaster="local[*]"  --inputFilePath=$(pwd)/../input.csv --outputFilePath=output-spark.txt


$(pwd)/../input.csv can be just ../input.csv
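A small sketch of why the shorter form should be equivalent, assuming the example resolves relative paths against the process working directory (the step's `working-directory`). The temporary directory layout and the sample file content are hypothetical stand-ins for `examples/pipelinedp4j`.

```shell
# Hypothetical layout mirroring examples/pipelinedp4j/{input.csv,beam/}.
workdir=$(mktemp -d)
mkdir -p "$workdir/pipelinedp4j/beam"
printf 'placeholder row\n' > "$workdir/pipelinedp4j/input.csv"

# The workflow step runs from the beam directory.
cd "$workdir/pipelinedp4j/beam"

absolute_form="$(pwd)/../input.csv"
relative_form="../input.csv"

# Both spellings open the same file when resolved from this directory.
if cmp -s "$absolute_form" "$relative_form"; then
  echo "both paths refer to the same content"
fi
```

The absolute form only matters if the program changes its working directory or hands the path to a process started elsewhere.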

examples/pipelinedp4j/README.md

RamSaw · 2025-12-03T13:21:36Z

examples/pipelinedp4j/WORKSPACE.bazel

        "com.fasterxml.jackson.module:jackson-module-scala_%s:%s" % (SCALA_TAG, JACKSON_TAG),
        "org.scala-lang:scala-library:%s" % SCALA_LIBRARY_TAG,
        "info.picocli:picocli:4.7.6",
+        # For Apache Spark Runner testing locally


"testing" is a little confusing here; maybe "running" is better?

RamSaw · 2025-12-03T13:22:19Z

examples/pipelinedp4j/WORKSPACE.bazel

    ],
+
+)
+maven_install(


Why should this be a separate maven_install?

RamSaw · 2025-12-03T13:23:18Z

examples/pipelinedp4j/WORKSPACE.bazel

        "org.scala-lang:scala-library:%s" % SCALA_LIBRARY_TAG,
        "info.picocli:picocli:4.7.6",
+        # For Apache Spark Runner testing locally
+        "org.apache.spark:spark-streaming_%s:%s" % (2.12, SPARK_TAG),


Why should this dependency be here while the other deps are in a separate maven_install?

RamSaw · 2025-12-03T13:24:33Z

examples/pipelinedp4j/beam/pom.xml

        </profile>
+        <profile>
+        <id>spark-runner</id>
+            <!-- Makes the DataflowRunner available when running a pipeline. -->


nit: indentation

The comment mentions DataflowRunner, which is incorrect for this profile.

RamSaw · 2025-12-03T13:25:15Z

examples/pipelinedp4j/beam/pom.xml

+
+        <profile>
+            <id>spark-runner-embedeed</id>
+            <!-- Makes the DataflowRunner available when running a pipeline. -->


Should this comment be updated? Dataflow is probably incorrect.

I would also add that this profile is for running locally.

leroyjb added 15 commits November 19, 2025 15:12

add spark runner documentation

02e515d

add maven.yaml support

f32709d

add bazel support

daf611e

update documentation

95f27c8

remove unnecessary job

8c96aaf

Revert removal of empty line

d862d44

delete unnecessary bazel file

883c0bf

remove unnecessary bazel file

d288a36

fix indent

5881c88

fix profile typo

78d56ff

fix type ( double quote instead of simple quote)

33cab36

indent xml file

b3622c2

revert schema location

07eb1f2

fix missing tag

a17030b

reformat pom.xml

f964929

leroyjb marked this pull request as ready for review November 20, 2025 13:18

RamSaw requested changes Dec 3, 2025


leroyjb and others added 5 commits January 29, 2026 13:57

Merge branch 'google:main' into main

94f4588

nit: typo, renaming, indentation

86e25de

nit: typo, renaming, indentation

9b5d30c

Merge branch 'google:main' into main

566e68c

Merge branch 'main' of github.com:leroyjb/differential-privacy

0c0e3fd


2 participants
