
Duet: efficient and scalable hybriD neUral rElation undersTanding

This project is built on Naru's code: https://github.com/naru-project/naru. Huge thanks to its authors!

Prepare the Anaconda Environment

  1. We recommend Python 3.10.9 on Windows 10 or Ubuntu
  2. Run pip3 install -r requirements.txt

Install the Sampling Algorithm

python3 ./MySampler/setup.py

Dataset Preparation

  1. Download the DMV dataset used by Naru: https://github.com/naru-project/naru
  2. Download the Kddcup98 and Census datasets used by UAE: https://github.com/pagegitss/UAE
  3. Put Vehicle__Snowmobile__and_Boat_Registrations.csv, cup98.csv, and census.csv into ./datasets
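Once the files are downloaded, a quick sanity check can confirm nothing is missing. The helper below is hypothetical (not part of the repo); only the three file names come from the steps above.

```python
import os

# Hypothetical helper (not part of the repo): the three expected files
# come from the download steps above.
EXPECTED = [
    "Vehicle__Snowmobile__and_Boat_Registrations.csv",  # DMV
    "cup98.csv",                                        # Kddcup98
    "census.csv",                                       # Census
]

def missing_datasets(dataset_dir="./datasets", expected=EXPECTED):
    """Return the expected dataset files not yet present in dataset_dir."""
    return [name for name in expected
            if not os.path.isfile(os.path.join(dataset_dir, name))]

print(missing_datasets())  # empty list once all three CSVs are in place
```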

Workload Generation

  1. Run python3 generate_all_workload_gpu.py
  2. Run python3 generate_train_workload_gpu_npred.py for the queries used to evaluate scalability
  • Note that the 100,000-query workload with seed 42 is used for training; the 2,000-query workload with seed 42 serves as the In-Workload Queries for testing; and the 2,000-query workload with seed 1234 serves as the Random Queries for testing
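The seed mechanics above can be sketched as follows. `draw_queries` is an illustrative stand-in, not the repo's generator (which builds real predicates over table columns); the point is only that a fixed seed makes a workload reproducible, so seed 42 and seed 1234 yield two distinct test workloads.

```python
import numpy as np

# Illustrative sketch only: the repo's generators build real predicates
# over table columns, but the seeding behaves like this stand-in.
def draw_queries(num_queries, seed, num_cols=10):
    """Draw (column, value) pseudo-predicates from a seeded RNG,
    so the same seed always reproduces the same workload."""
    rng = np.random.RandomState(seed)
    return [(int(rng.randint(0, num_cols)), float(rng.rand()))
            for _ in range(num_queries)]

in_workload = draw_queries(2000, seed=42)    # In-Workload test queries
random_q    = draw_queries(2000, seed=1234)  # Random test queries

assert draw_queries(2000, seed=42) == in_workload  # same seed, same workload
assert random_q != in_workload                     # different seed, distinct workload
```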

Train Duet

  • For DMV, run python3 train_model.py --num-queries=100000 --dataset=dmv --epochs=50 --warmups=12000 --bs=2048 --expand-factor=4 --layers=0 --direct-io --input-encoding=binary --output-encoding=one_hot --multi_pred_embedding=mlp --use-workloads --tag=dmv_mlp_binary_Workloads --gpu-id=0
  • For Kddcup98, run python3 train_model.py --num-queries=100000 --dataset=cup98 --epochs=50 --warmups=12000 --bs=100 --expand-factor=4 --layers=2 --fc-hiddens=128 --residual --direct-io --input-encoding=binary --output-encoding=one_hot --multi_pred_embedding=mlp --use-workloads --tag=cup98_mlp_binary_Workloads --gpu-id=0
  • For Census, run python3 train_model.py --num-queries=100000 --dataset=census --epochs=50 --warmups=12000 --bs=100 --expand-factor=4 --layers=2 --fc-hiddens=128 --residual --direct-io --input-encoding=binary --output-encoding=one_hot --multi_pred_embedding=mlp --use-workloads --tag=census_mlp_binary_Workloads --gpu-id=0
  • For Duet's data-driven version, remove the --use-workloads option
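The three commands differ only in a few per-dataset flags, which can be captured in a small wrapper. The wrapper below is hypothetical, not part of the repo; all flag values are copied verbatim from the commands above.

```python
# Hypothetical wrapper (not part of the repo): flag values are copied
# verbatim from the training commands above.
COMMON = ["--num-queries=100000", "--epochs=50", "--warmups=12000",
          "--expand-factor=4", "--direct-io", "--input-encoding=binary",
          "--output-encoding=one_hot", "--multi_pred_embedding=mlp"]

PER_DATASET = {
    "dmv":    ["--bs=2048", "--layers=0"],
    "cup98":  ["--bs=100", "--layers=2", "--fc-hiddens=128", "--residual"],
    "census": ["--bs=100", "--layers=2", "--fc-hiddens=128", "--residual"],
}

def train_args(dataset, use_workloads=True, gpu_id=0):
    """Assemble the train_model.py argument list; dropping
    --use-workloads gives Duet's data-driven variant."""
    suffix = "Workloads" if use_workloads else "noWorkloads"
    args = (["python3", "train_model.py", f"--dataset={dataset}"]
            + COMMON + PER_DATASET[dataset])
    if use_workloads:
        args.append("--use-workloads")
    args += [f"--tag={dataset}_mlp_binary_{suffix}", f"--gpu-id={gpu_id}"]
    return args

print(" ".join(train_args("dmv", use_workloads=False)))
```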

Evaluate Duet

Scalability

  • Run python3 run_eval_npred.py
  • Run python3 draw_nfilter_curve.py to draw the scalability plot in the same format as in our paper

Accuracy

  • We provide code to evaluate the error at every epoch, as well as the result of the epoch at which the model achieves the minimum loss
  • Taking DMV as an example, run python3 eval_model.py --dataset=dmv --load_queries=dmv-2000queries-oracle-cards-seed1234.pkl --glob=dmv-16.3MB-data19.550-made-hidden512_256_512_128_1024-emb32-directIo-binaryInone_hotOut-inputNoEmbIfLeq-mlp-seed0 --layers=0 --direct_io --input_encoding=binary --output_encoding=one_hot --multi_pred_embedding=mlp --tag=dmv_mlp_binary_noWorkloads --gpu_id=0 --end_epoch=50
  • For the load_queries option, change the seed from 1234 to 42 to switch the test workload from Random Queries to In-Workload Queries
  • For the glob option, use the model's name, following the format shown above
  • Set the remaining options to match the training options above
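Cardinality estimators in the Naru line are typically evaluated with Q-error, the max of the over- and under-estimation ratios. Assuming eval_model.py reports a similar metric, a minimal version looks like this:

```python
# Minimal Q-error, the standard accuracy metric for cardinality
# estimation (>= 1, where 1 means an exact estimate). Whether
# eval_model.py reports exactly this form is an assumption.
def q_error(est, true, eps=1.0):
    """max(est/true, true/est), with both values clamped below by eps
    to avoid division by zero on empty results."""
    est, true = max(est, eps), max(true, eps)
    return max(est / true, true / est)

pairs = [(90, 100), (100, 100), (400, 100)]  # (estimate, true cardinality)
errors = sorted(q_error(e, t) for e, t in pairs)
median, worst = errors[len(errors) // 2], errors[-1]  # common summary stats
```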
