Newspark-UTN/poc-lsa-model

poc-lsa-model

Installation (local)

  • Download and install Scala and SBT

  • Install MongoDB to store the data. Once installed, start the server with the mongod command. Then create a database named newspark, with a collection named news and a user named newspark, by running the following in the mongo shell:

    • mongo
    • use newspark
    • db.createCollection("news")
    • db.createUser(
        {
          user: "newspark",
          pwd: "newspark",
          roles: [ "readWrite", "dbAdmin" ]
        }
      )
  • This PoC relies on Spark's MLlib, which ships with Spark, so install Spark as well. You can use Tom's simple but effective installation guide.

  • Go to the root of the project and run sbt gen-idea (if using IntelliJ IDEA)
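
Before moving on, it can help to confirm that the tools from the steps above are on the PATH. The snippet below is a minimal sketch and not part of the repository: check_tool and check_prerequisites are made-up helper names, and spark-submit is an assumption about how the Spark installation exposes itself on the command line.

```shell
#!/bin/sh
# check_tool is a hypothetical helper, not something shipped with the repository.
# It reports whether a command can be found on the PATH.
check_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "found: $1"
    else
        echo "missing: $1"
        return 1
    fi
}

# Tools named in the installation steps. spark-submit is an assumption:
# it is only on the PATH if Spark's bin directory was added to it.
check_prerequisites() {
    status=0
    for tool in scala sbt mongod mongo spark-submit; do
        check_tool "$tool" || status=1
    done
    return "$status"
}
```

After sourcing the snippet, calling check_prerequisites prints one line per tool and returns a non-zero status if any of them is missing.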

Optional - Test data

To store some test data:

  • Outside the mongo shell, run mongoimport --db newspark --collection news --drop --file path-to-repo/poc-lsa-model/src/main/resources/realdata/2016-08-15.json
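
To check that the import worked, the number of documents in the news collection can be read back from the command line. This is a minimal sketch, assuming the same legacy mongo shell used during installation is on the PATH. The function name count_news and the MONGO_BIN override are illustrative and not part of the repository.

```shell
# count_news is a hypothetical helper that prints how many documents the
# news collection of the newspark database holds.
# MONGO_BIN is an assumed override for the shell binary; it defaults to mongo.
count_news() {
    "${MONGO_BIN:-mongo}" newspark --quiet --eval 'print(db.news.count())'
}
```

A count of zero after running mongoimport suggests the file path was wrong or the import targeted a different database. If access control is enabled on the server, the mongo call would also need the credentials created during installation.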

Tests

Lemmatization

  • Run sbt run and select Lemmatizer when prompted for the main class

Docker

  • eval $(docker-machine env newspark-dev)
  • ${docker-utils-dir}/newspark sbt assembly
  • ${docker-utils-dir}/newspark run
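
Note that ${docker-utils-dir} reads as a placeholder for the directory holding the docker utilities rather than a usable shell variable, because hyphens are not allowed in variable names. The sketch below shows what the shell does with it when pasted literally, and one way to wrap the commands behind a valid name. DOCKER_UTILS_DIR and run_newspark are hypothetical names chosen for illustration; the actual location of the utilities is not stated here.

```shell
# sh and bash parse ${docker-utils-dir} as "the value of $docker,
# or the text utils-dir if docker is unset".
unset docker
echo "${docker-utils-dir}"    # prints: utils-dir

# DOCKER_UTILS_DIR is a hypothetical, valid variable name standing in for
# the placeholder. The wrapper aborts with a message if it has not been set.
run_newspark() {
    "${DOCKER_UTILS_DIR:?set DOCKER_UTILS_DIR to the docker utilities directory}/newspark" "$@"
}
```

With DOCKER_UTILS_DIR exported, run_newspark sbt assembly and run_newspark run mirror the last two commands in the list above.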

About

Latent Semantic Analysis PoC using Spark
