RamSaw
left a comment
Looks good, thank you! Let's fix the small nits; I will then take another look and likely submit.
| run: mvn compile exec:java -Dexec.mainClass=com.google.privacy.differentialprivacy.pipelinedp4j.examples.SparkDataFrameExample -Dexec.args="--inputFilePath=$(pwd)/../input.csv --outputFolder=output" | ||
| - name: Build Beam with Spark Runner | ||
| working-directory: examples/pipelinedp4j/beam | ||
| run: mvn package -Pspark-runner,spark-runner-embedeed |
Yes, this is how you pass multiple profiles with Maven.
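For reference, a minimal sketch of the multi-profile invocation discussed here. Maven's `-P` option takes a comma-delimited list of profile ids; the two ids are copied from the workflow step above, while the variable names `profiles` and `cmd` are only illustrative. The command is printed rather than executed so the sketch has no side effects.

```shell
# Profile ids copied from the workflow step under review.
profiles="spark-runner,spark-runner-embedeed"

# -P takes a comma-delimited list, so one flag activates both profiles.
cmd="mvn package -P${profiles}"

# Print the command instead of running it (illustration only).
echo "${cmd}"
```

To check which profiles actually end up active, running `mvn help:active-profiles` with the same `-P` argument from the module directory should list both, assuming the ids match those declared in pom.xml.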
.github/workflows/maven.yml
Outdated
| - name: Build Beam with Spark Runner | ||
| working-directory: examples/pipelinedp4j/beam | ||
| run: mvn package -Pspark-runner,spark-runner-embedeed | ||
| - name: Run Spark Runner Example |
nit: "Run Beam Example with Spark Runner" would be a better, more complete name.
.github/workflows/maven.yml
Outdated
| run: mvn package -Pspark-runner,spark-runner-embedeed | ||
| - name: Run Spark Runner Example | ||
| working-directory: examples/pipelinedp4j/beam | ||
| run: java --add-opens=java.base/sun.nio.ch=ALL-UNNAMED -jar target/beam-1.0-SNAPSHOT-shaded.jar --runner=SparkRunner --sparkMaster="local[*]" --inputFilePath=$(pwd)/../input.csv --outputFilePath=output-spark.txt |
$(pwd)/../input.csv can be just ../input.csv
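A small sketch of why the two forms should be interchangeable: both resolve to the same parent directory when evaluated from the step's working directory. This assumes the program resolves relative paths against the process's current directory, which is not verified here for the Spark runner. The variable names are illustrative.

```shell
# Directory reached through the absolute form used in the workflow step.
abs_dir="$(cd "$(pwd)/.." && pwd)"

# Directory reached through the shorter relative form suggested above.
rel_dir="$(cd .. && pwd)"

# Compare the two resolved directories.
if [ "${abs_dir}" = "${rel_dir}" ]; then
  result="same"
else
  result="different"
fi
echo "${result}"
```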
| "com.fasterxml.jackson.module:jackson-module-scala_%s:%s" % (SCALA_TAG, JACKSON_TAG), | ||
| "org.scala-lang:scala-library:%s" % SCALA_LIBRARY_TAG, | ||
| "info.picocli:picocli:4.7.6", | ||
| # For Apache Spark Runner testing locally |
"testing" is a little confusing here; maybe "running" is better?
| ], | ||
| ) | ||
| maven_install( |
Why should this be a separate maven_install?
| "org.scala-lang:scala-library:%s" % SCALA_LIBRARY_TAG, | ||
| "info.picocli:picocli:4.7.6", | ||
| # For Apache Spark Runner testing locally | ||
| "org.apache.spark:spark-streaming_%s:%s" % (2.12, SPARK_TAG), |
Why should this dependency be here while the other deps are in a separate maven_install?
examples/pipelinedp4j/beam/pom.xml
Outdated
| </profile> | ||
| <profile> | ||
| <id>spark-runner</id> | ||
| <!-- Makes the DataflowRunner available when running a pipeline. --> |
examples/pipelinedp4j/beam/pom.xml
Outdated
| <profile> | ||
| <id>spark-runner-embedeed</id> | ||
| <!-- Makes the DataflowRunner available when running a pipeline. --> |
Should this comment be updated? "DataflowRunner" is probably incorrect here.
I would add that this is for running locally