Mantis Shrimp
Mantis Shrimp is kept as separate from Ocean as possible. It is a generic worker framework; currently, a tagging system is implemented on top of this architecture.
It is modelled as a set of Akka actors (Akka is a very popular Scala library for distributed programming) that pull items from RabbitMQ. For the current iteration I will use an NER tagger. The modules present in the current architecture are:
- Mantis Node - a node in the system.
- Mantis Tagger - an Akka actor that can receive news to tag, or be configured to pull news itself from Kafka (for speed). It pushes the tags to Kafka or sends them back to the requester.
- Mantis News Fetcher - a Python script that pulls items from Kafka and pushes them into Neo4j. Note: the Kafka/Neo4j split is important because we want to use Kafka for fast queries and inserts, such as user statistics.
- Mantis Master - an Akka actor with registered Mantis Nodes (it does not do much for now).
- Mantis News Dumper - not ready yet (lionfish issues).
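The two ways the Mantis Tagger can be driven (being sent news directly, or pulling news from a queue and publishing tags) can be sketched as follows. This is only an illustration in plain Python, not the actual Akka implementation: the `Tagger` and `News` classes, the `toy_tag` function, and the use of `queue.Queue` as a stand-in for Kafka topics are all assumptions made for the example, and the "tagger" simply picks capitalized words instead of running a real NER model.

```python
import queue
from dataclasses import dataclass


@dataclass
class News:
    news_id: str
    text: str


def toy_tag(text):
    """Placeholder for an NER tagger: treats capitalized words as tags."""
    words = (w.strip(".,;:!?\"'()") for w in text.split())
    return sorted({w for w in words if w[:1].isupper()})


class Tagger:
    """Hypothetical tagger supporting both modes described in the list above."""

    def __init__(self, inbox=None, outbox=None):
        self.inbox = inbox    # stands in for a Kafka topic with news
        self.outbox = outbox  # stands in for a Kafka topic with tags

    def tag(self, news):
        """Push mode: a requester sends news and receives the tags back."""
        return toy_tag(news.text)

    def poll_once(self):
        """Pull mode: take one news item from the inbox and publish its tags.

        Returns False when the inbox is empty, True otherwise.
        """
        try:
            news = self.inbox.get_nowait()
        except queue.Empty:
            return False
        self.outbox.put((news.news_id, self.tag(news)))
        return True
```

In push mode a caller would use `Tagger().tag(news)`; in pull mode the tagger is built with two queues and `poll_once()` is called in a loop.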
All nodes are connected in a tree; see the example configuration files in the mantis_shrimp directory to get a feel for what is going on.
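To make the tree topology concrete, here is a small sketch of such a tree and a walk over it. It does not reproduce the real configuration format (that lives in the mantis_shrimp directory); the `Node` class, the `walk` function, and all node names are hypothetical, and placing the master at the root is an assumption.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    children: list = field(default_factory=list)


def walk(node, depth=0):
    """Yield (depth, name) pairs in depth-first, parent-before-child order."""
    yield depth, node.name
    for child in node.children:
        yield from walk(child, depth + 1)


# Hypothetical layout: one master, two nodes, three taggers.
tree = Node("master", [
    Node("node_a", [Node("tagger_1"), Node("tagger_2")]),
    Node("node_b", [Node("tagger_3")]),
])

for depth, name in walk(tree):
    print("  " * depth + name)
```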
The world of NLP is currently thriving with new ideas. I would like to use knowledge graphs (Freebase, DBpedia) in our application. Many deep learning solutions are also widely available (see word2vec, just one example of a great neural language model).
The first thing to do would be document classification using:
- Knowledge Graph
- A simple algorithm already available in some library, such as LDA (maybe in Mahout, or Vowpal Wabbit from Microsoft)
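As a rough idea of what knowledge-graph-based classification could look like, the sketch below maps entities found by the tagger to categories and picks the most frequent one. Everything here is an assumption for illustration: `TOY_GRAPH` is a hand-written dictionary standing in for a real Freebase or DBpedia lookup, and `classify` is a hypothetical helper, not part of Mantis Shrimp.

```python
from collections import Counter

# Tiny stand-in for a knowledge graph: entity -> category.
TOY_GRAPH = {
    "Google": "technology",
    "Akka": "technology",
    "NASA": "science",
    "Mars": "science",
}


def classify(entities, graph=TOY_GRAPH):
    """Return the category most entities map to, or None if none are known."""
    votes = Counter(graph[e] for e in entities if e in graph)
    if not votes:
        return None
    # On a tie, the category that was seen first wins.
    return votes.most_common(1)[0][0]
```

A document whose tags are `["NASA", "Mars", "Google"]` would be classified as `"science"` under this toy graph, since two of its three known entities map to that category.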