Stanislaw Jastrzebski edited this page May 22, 2014 · 8 revisions

Don Corleone is a synchronization service. The main idea is that after modifying config.json you can run

sudo -E python2 run_node.py

and your services will be configured automatically.

Service

A service is a single program that can typically communicate with other services. A service can be local or global (a global service can be used by everyone in the cluster). A service's params, such as host or port, can be queried using the Don Corleone API. Details such as the handling of local services and whether a service is unary are taken care of automatically.

Config.json

Example JSON configuration:

{
"master":"http://ocean-don.no-ip.biz",
"master_local_url":"http://127.0.0.1:8881",
"master_local":true,
"node_responsibilities":[
    ["neo4j",{"port":7474, "host":"127.0.0.1", "local":true}],
    ["lionfish",{"port":7777, "host":"127.0.0.1", "local":true}],
    ["spidercrab_master", {
            "local": false,
            "update_interval_s": 10,
            "graph_worker_id": "oceanic_crab"
        }],
    ["spidercrab_slave", {
            "number":5,
            "local": true,
            "graph_worker_id": "oceanic_crab"
        }]
    ]
,
"node_id":"staszek",
"home":"/home/moje/Projekty/ocean/ocean",
"public_ssh_domain":"127.0.0.1",
"ssh-user":"staszek",
"ssh-reversed":true,
"ssh-port":2215
}

Detailed field descriptions:

  • master: URL of the master node. Can be empty if you are using a local Don Corleone.

  • master_local_url: URL of the local Don Corleone, if you are using one. Can be empty if you are using the master. Note: in most cases you won't have to change it.

  • master_local: true/false - which master Don Corleone should use. If it is set to true (local), running the node will check whether it should start Don Corleone automatically for you :)

  • node_responsibilities: the list of services you want to make public. Each service has the format [service_name, service_params]. All service params can be queried afterwards using the Don Corleone API. A subset of service_params can also be passed to the running service as parameters (more on this in the Adding new service section).

  • node_id: unique node id in the cluster

  • home: path to root

  • public_ssh_domain, ssh-user, ssh-reversed, ssh-port: Don Corleone uses ssh to communicate with your computer. You can use Don Corleone from behind NAT (set ssh-reversed to true). If you are not behind NAT, check your settings by running ssh ssh-user@public_ssh_domain -p ssh-port. If you are using a local Don Corleone, set the ssh domain to 127.0.0.1. Make sure the sshd daemon is running when using Don Corleone!
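
The field descriptions above can be turned into a quick sanity check before starting a node. The following is a minimal sketch, not part of Don Corleone itself: the function names and the REQUIRED_KEYS list are hypothetical, and treating every key that appears in the example configs as required is an assumption. It checks a parsed config.json and builds the manual ssh check command mentioned above.

```python
import json

# Keys that appear in the example configs on this page.
# Assumption: treating all of them as required is this sketch's choice.
REQUIRED_KEYS = [
    "master", "master_local_url", "master_local", "node_responsibilities",
    "node_id", "home", "public_ssh_domain", "ssh-user", "ssh-port",
]


def load_config(path="config.json"):
    """Parse config.json into a dict."""
    with open(path) as f:
        return json.load(f)


def check_config(config):
    """Return a list of human-readable problems found in a parsed config."""
    problems = []
    for key in REQUIRED_KEYS:
        if key not in config:
            problems.append("missing key: " + key)
    # Each service entry should be [service_name, service_params].
    for entry in config.get("node_responsibilities", []):
        if not (isinstance(entry, list) and len(entry) == 2
                and isinstance(entry[1], dict)):
            problems.append("bad service entry: %r" % (entry,))
    return problems


def ssh_check_command(config):
    """Build the command: ssh ssh-user@public_ssh_domain -p ssh-port."""
    target = "%s@%s" % (config["ssh-user"], config["public_ssh_domain"])
    return ["ssh", target, "-p", str(config["ssh-port"])]
```

A typical use would be to call check_config(load_config()) and print any problems, then run the command returned by ssh_check_command by hand if you are not behind NAT.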

Don Corleone API

For the functions accessible via the API, see don_utils.py. If Don Corleone is offline, you can still pull the configuration from config.json using the API.
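
To illustrate what pulling service params from config.json amounts to, here is a minimal sketch of a lookup over node_responsibilities. It does not use or reproduce the functions in don_utils.py; the names get_service_params and get_service_param are hypothetical and only show the shape of the data.

```python
import json


def load_config(path="config.json"):
    """Parse config.json into a dict."""
    with open(path) as f:
        return json.load(f)


def get_service_params(config, service_name):
    """Return the params dict of the first matching service, or None."""
    for name, params in config.get("node_responsibilities", []):
        if name == service_name:
            return params
    return None


def get_service_param(config, service_name, param, default=None):
    """Return a single param of a service, or `default` if it is not set."""
    params = get_service_params(config, service_name)
    if params is None:
        return default
    return params.get(param, default)
```

With the example configuration above, get_service_param(config, "neo4j", "port") would read the port entry of the neo4j service.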

Adding new service

To add a new service you have to do 4 things:

  1. Convert your service or create a running script. Your running script/program should accept parameters that link it to other services (like neo4j or lionfish). You can also use the Don Corleone API to discover services, but this couples your service tightly to Don Corleone and is advised against.

  2. Add it to KNOWN_SERVICES in don_corleone_constants.py (will change)

  3. Create running, terminate, and test scripts. See the example running scripts in the scripts/ directory.

  4. Create a config file in the config/ directory. See the example config files already there. Field descriptions:

    • params: a list of lists, where each inner list has length 1 or 2. These are the parameters passed to your program on start. For each parameter you can set the default to "do not pass" (list of length 1), set a default value (list of length 2), or set the default value to another service's parameter value using anchor syntax (see examples).
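
As a rough illustration of the params field, the sketch below turns a params list into command-line arguments. Several things here are assumptions, not Don Corleone behavior: the function name build_args is hypothetical, the --name=value output format is invented for the example, and values from service_params are assumed to override defaults. The anchor syntax is deliberately not handled, since its format is not described on this page.

```python
def build_args(param_specs, service_params):
    """Build argument strings from a params list and a service's params.

    param_specs: list of [name] or [name, default] entries.
    service_params: dict of values configured for the service in config.json.
    """
    args = []
    for spec in param_specs:
        name = spec[0]
        if name in service_params:
            # Assumption: a configured value overrides the default.
            value = service_params[name]
        elif len(spec) == 2:
            # Length-2 entry: fall back to the default value.
            value = spec[1]
        else:
            # Length-1 entry with no configured value: do not pass.
            continue
        # Assumption: the --name=value format is illustrative only.
        args.append("--%s=%s" % (name, value))
    return args
```

For example, with specs [["port", 7777], ["host"], ["config_path"]] and a service configured only with a host, the port default and the host are passed, while config_path is skipped.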

Supported services along with example configs

  1. Mantis setup local

This will run the mantis tagging job on your computer. To add sample tagging data, make sure the Reuters data is in the data/ folder and import it using mantis_shrimp/scripts/import_dataset_rabbitmq.py

{
"master":"http://ocean-don.no-ip.biz/don",
"master_local_url":"http://127.0.0.1:8881",
"master_local":false,
"node_responsibilities":[
    ["lionfish_scala", {"port":7777, "host":"localhost", "local":true}],
    ["rabbitmq", {"host":"localhost", "local": true}],
    ["logstash", {"host":"localhost","local":true}],
    ["mantis_shrimp_master", {"host":"localhost", "host_master":"localhost", "config_path":"mantis_tagging_cluster_master.conf", "logging_strategy":"stderr", "local":true}],
    ["mantis_shrimp", {"host":"localhost", "host_master":"localhost", "port":3345, "config_path":"mantis_tagging_cluster_slave.conf", "logging_strategy":"stderr", "local":true}]
]
,

"node_id":"staszek",
"home":"/home/moje/Projekty/ocean/ocean",
"public_ssh_domain":"ocean-db.no-ip.biz",
"ssh-user":"staszek",
"ssh-port":2215
}

Debugging

Don Corleone acts as a filter for integration. It is expected to fail often because it has hard-coded paths and similar assumptions (this might change in the future, but it is hard to change because it has to know, for instance, where particular services are stored in the repository).

After running your node, go to the error_logs folder and check the logs. A handy (Unix) command is tail -f log_name. I will add logstash log fetching from this directory in the future.
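
When several services write logs at once, it can help to glance at the end of every file in one go. The following is a small Python 3 sketch of that idea. It is not part of Don Corleone: the function names are hypothetical, and the error_logs default path assumes you run it from the directory that contains that folder.

```python
import os
from collections import deque


def tail_lines(path, count=20):
    """Return the last `count` lines of a text file."""
    with open(path, errors="replace") as f:
        # A bounded deque keeps only the final `count` lines in memory.
        return list(deque(f, maxlen=count))


def print_log_tails(log_dir="error_logs", count=20):
    """Print the last lines of every regular file in log_dir."""
    for name in sorted(os.listdir(log_dir)):
        path = os.path.join(log_dir, name)
        if os.path.isfile(path):
            print("==> %s <==" % name)
            for line in tail_lines(path, count):
                print(line.rstrip("\n"))
```

Unlike tail -f, this prints a one-off snapshot and does not follow the files as they grow.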
