- Install MongoDB to store the data. Once installed, use the `mongod` command to start mongo. Create a new db named `newspark` with collection `news` and user `newspark` by:

      mongo
      use newspark
      db.createCollection("news")
      db.createUser( { user: "newspark", pwd: "newspark", roles: [ "readWrite", "dbAdmin" ] } )
- This PoC relies on Spark's MLlib, so go ahead and install Spark. You can use Tom's simple but effective installation guide.
- Go to the root of the project and run `sbt gen-idea` (if using IntelliJ IDEA).
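The MongoDB setup from the first step can also be kept in a script file so it is easy to replay on a fresh machine. The following is only a sketch: the file name `setup-newspark.js` is a hypothetical choice, and it assumes the legacy `mongo` shell is on your `PATH` (newer MongoDB releases ship `mongosh` instead). Inside a script file the interactive `use newspark` helper is not available, so the sketch switches databases with `db.getSiblingDB`.

```shell
# Write the setup commands to a script file (the file name is an assumption).
cat > setup-newspark.js <<'EOF'
// Equivalent of `use newspark` inside a script file.
db = db.getSiblingDB("newspark");
db.createCollection("news");
db.createUser({ user: "newspark", pwd: "newspark", roles: ["readWrite", "dbAdmin"] });
EOF

# With mongod running, the script can then be executed with:
#   mongo setup-newspark.js
```

Running the script twice will report an error for the second `createUser` call, since the user already exists by then.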
To store some test data:
- Outside the console, run:

      mongoimport --db newspark --collection news --drop --file path-to-repo/poc-lsa-model/src/main/resources/realdata/2016-08-15.json

- Run `sbt run` and select `Lemmatizer`.
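If you import test data often, the `mongoimport` call above can be wrapped in a small shell function. This is a minimal sketch, not part of the project: the function name `import_news` and the `DRY_RUN` variable are hypothetical. With `DRY_RUN=1` it only prints the command it would run, which is handy for checking the path first.

```shell
# Hypothetical helper: import one JSON file into newspark.news.
# Note that --drop empties the collection before each import.
import_news() {
  file="$1"
  # Rebuild the positional parameters as the full mongoimport command.
  set -- mongoimport --db newspark --collection news --drop --file "$file"
  if [ "${DRY_RUN:-0}" = "1" ]; then
    # Dry run: print the command instead of executing it.
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Print the command for the sample file without touching the database.
DRY_RUN=1 import_news path-to-repo/poc-lsa-model/src/main/resources/realdata/2016-08-15.json
```

Because `--drop` replaces the collection on every call, importing several files one after another with this helper keeps only the last one.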
    eval $(docker-machine env newspark-dev)
    ${docker-utils-dir}/newspark sbt assembly
    ${docker-utils-dir}/newspark run