
Antipode @ TrainTicket

In this repo you will find how Antipode fixes one of the inconsistencies found in the TrainTicket benchmark.

Developed as a testbed for replicating industrial faults, the TrainTicket benchmark is a microservice-based application that provides typical ticket booking functionalities, such as ticket reservation and payment. It is implemented in Java and consists of more than 40 services, including web servers, datastores and queues.

We focus on the F1 error (detailed in the paper by Xiang et al., referenced below) that occurs when a user cancels a ticket. This operation is split into two tasks: (a) changing the status of the ticket to cancelled, and (b) refunding the ticket price to the client.

These tasks are performed by different services. A violation happens when the refund (b) is delayed, which results in the customer not seeing the refunded amount right away. This scenario is identified by the fault analysis survey performed by the benchmark authors: "using asynchronous tasks within a request might result in events being processed in a different order, which might lead to incorrect application behavior".

Antipode was added by placing a barrier before returning the cancellation output to the user.

Prerequisites

You need to install the following dependencies before running:

  • Python 3.7+
  • The requirements for our maestro script: simply run pip install -r requirements.txt
  • Docker
  • Pull submodules: git submodule update --init --recursive
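The prerequisites above can be sanity-checked with a short script. This helper is ours, not part of maestro or the repo; it only reports which of the required tools are on the PATH:

```shell
#!/usr/bin/env bash
# Hypothetical sanity check for the prerequisites above (not part of maestro).
# Prints one "found:"/"missing:" line per required tool.
for tool in python3 pip docker git; do
  if command -v "$tool" >/dev/null 2>&1; then
    echo "found: $tool"
  else
    echo "missing: $tool"
  fi
done
```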

Prerequisites for Local deployment:

  • Docker Swarm

Prerequisites for GCP deployment:

Optional dependencies:

Usage

All deployment is controlled by our ./maestro script. Type ./maestro -h to see all the commands and options; each command also comes with its own help, e.g. ./maestro run -h.

Local Deployment

This environment is mainly used for development and debugging.

In preparation

GCP Deployment

Make sure you have created a Google Cloud Platform (GCP) account and have set up a new project. Go to the config.yml file and add your GCP project and user information, for example:

gcp:
  project_id: antipode
  default_ssh_user: jfloff

Set the following firewall rules on GCP web console (maestro will warn you of any missing keys):

  • portainer
    • IP Addresses: 0.0.0.0/0
    • TCP ports: 9000,8000,9090,9091,9100
  • swarm
    • IP Addresses: 0.0.0.0/0
    • TCP ports: 2376,2377,7946
    • UDP ports: 7946,4789
  • nodes
    • IP Addresses: 0.0.0.0/0
    • TCP ports: 3306,9001,5672,6379,8080,11178,11188,12031,12032,12340,12342,12345,12346,12347,12386,12862,14322,14567,14568,14569,14578,15672,15678,15679,15680,15681,16101,16108,16110,16111,16112,16113,16114,16115,16346,16579,16686,17001,17853,18673,18767,18855,18856,18885,18886,18888,18898,19001,2003,8089,5557
    • UDP ports: 5775,6831,6832
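The same rules can also be created from the command line instead of the web console. The sketch below is an assumption on our part (the README only describes the web console): it presumes an installed and authenticated gcloud CLI, the rule names mirror the list above, and the leading echo keeps it a dry run -- remove it to actually create the rules:

```shell
#!/usr/bin/env bash
# Dry-run sketch of the firewall rules above via the gcloud CLI (an assumption;
# adjust rule names and project to your setup). Remove "echo" to execute.
set -eu
SRC="0.0.0.0/0"

echo gcloud compute firewall-rules create portainer \
  --source-ranges="$SRC" \
  --allow=tcp:9000,tcp:8000,tcp:9090,tcp:9091,tcp:9100

echo gcloud compute firewall-rules create swarm \
  --source-ranges="$SRC" \
  --allow=tcp:2376,tcp:2377,tcp:7946,udp:7946,udp:4789

# Node ports copied verbatim from the list above; expand "p1,p2" to "tcp:p1,tcp:p2".
NODE_TCP="3306,9001,5672,6379,8080,11178,11188,12031,12032,12340,12342,12345,12346,12347,12386,12862,14322,14567,14568,14569,14578,15672,15678,15679,15680,15681,16101,16108,16110,16111,16112,16113,16114,16115,16346,16579,16686,17001,17853,18673,18767,18855,18856,18885,18886,18888,18898,19001,2003,8089,5557"
echo gcloud compute firewall-rules create nodes \
  --source-ranges="$SRC" \
  --allow="tcp:${NODE_TCP//,/,tcp:},udp:5775,udp:6831,udp:6832"
```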

Then run maestro to build and deploy the GCP instances:

./maestro --gcp build
./maestro --gcp deploy -config CONFIG_FILE -clients NUM_CLIENTS

You can either build your own deployment configuration file, or use an existing one. For instance, for the SOSP'23 plots you should use the configs/sosp23.yml config and 1 as the number of clients.

After deploy is done you can start the TrainTicket services with:

./maestro --gcp run -antipode

In order to run the original TrainTicket application, remove the -antipode parameter.

With our services started, we run the workload:

./maestro --gcp wkld -d DURATION_SEC -t NUM_THREADS

For instance, we can run the workload for 300 seconds (-d 300) with 12 concurrent threads (-t 12). At the end, the wkld command will make the evaluation results available in the gather folder, which is key for plotting later (see below).

At the end, you can clean your experiment and destroy your GCP instance with:

./maestro --gcp clean -strong

In order to keep your GCP instances and just undeploy the TrainTicket application, remove the -strong parameter.

Although maestro runs a single deployment end-to-end, in order to run all the workloads necessary for the plots you need to repeat these steps for several different combinations. To ease that process we provide maestrina, a convenience script that executes all the workload combinations to plot afterwards. We pre-populated maestrina with the combinations and rounds configuration for the SOSP'23 plots. There might be instances where maestrina is not able to run a specific endpoint; in those scenarios you might need to rerun it, or run maestro individually -- which is always the safest method.
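For reference, one round of what maestrina automates corresponds to the sequence of commands from the sections above (here with the SOSP'23 config and the 300-second / 12-thread workload example; echo keeps this a dry run -- remove it to actually execute each step):

```shell
#!/usr/bin/env bash
# One end-to-end GCP round, assembled from the commands above.
# "echo" makes this a dry run; remove it to run the real steps.
set -eu
echo ./maestro --gcp build
echo ./maestro --gcp deploy -config configs/sosp23.yml -clients 1
echo ./maestro --gcp run -antipode
echo ./maestro --gcp wkld -d 300 -t 12
echo ./maestro --gcp clean -strong
```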

There are other commands available (for details do -h), namely:

  • ./maestro --gcp seed, which seeds the targeted datastores with random data. This does not need to run every time, because we already generated a seed and restore it before running the workload.
  • ./maestro --gcp info, which offers multiple options, from links to admin panels to logs and others
  • ./maestro --gcp ssh, which is able to connect to multiple GCP instances via ssh

Plots

The first step to build plots is making sure that you have all the datapoints. After that, you need to set those datapoints in a plot config file, similar to the one provided at plots/configs/sample.yml.

With a configuration set with your datapoints, you simply run:

./plot CONFIG --plots throughput_latency_with_consistency_window

This generates a throughput/latency plot in combination with our consistency window metric. There is also an individual throughput/latency plot, available with the key throughput_latency. For more information type ./plot -h

Paper References

João Loff, Daniel Porto, João Garcia, Jonathan Mace, Rodrigo Rodrigues
Antipode: Enforcing Cross-Service Causal Consistency in Distributed Applications
SOSP 2023.
Paper

Xiang Zhou, Xin Peng, Tao Xie, Jun Sun, Chao Ji, Wenhai Li, and Dan Ding
Fault Analysis and Debugging of Microservice Systems: Industrial Survey, Benchmark System, and Empirical Study
IEEE Transactions on Software Engineering 2021
Paper
