
Duet: efficient and scalable hybriD neUral rElation undersTanding

This project is built on Naru's code: https://github.com/naru-project/naru. Huge thanks to its authors!

Prepare the Anaconda Environment

  1. We recommend Python 3.10.9 on Windows 10 or Ubuntu
  2. Run pip3 install -r requirements.txt

Install the Sampling Algorithm

python3 ./MySampler/setup.py

Dataset Preparation

  1. Download the DMV dataset used by Naru: https://github.com/naru-project/naru
  2. Download the Kddcup98 and Census datasets used by UAE: https://github.com/pagegitss/UAE
  3. Put Vehicle__Snowmobile__and_Boat_Registrations.csv, cup98.csv, and census.csv into ./datasets
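Once the files are downloaded, a quick sanity check can confirm nothing is missing. The helper below is hypothetical (not part of the repo); only the three file names come from the steps above.

```python
import os

# Hypothetical helper (not part of the repo): the three expected files
# come from the download steps above.
EXPECTED = [
    "Vehicle__Snowmobile__and_Boat_Registrations.csv",  # DMV
    "cup98.csv",                                        # Kddcup98
    "census.csv",                                       # Census
]

def missing_datasets(dataset_dir="./datasets", expected=EXPECTED):
    """Return the expected dataset files not yet present in dataset_dir."""
    return [name for name in expected
            if not os.path.isfile(os.path.join(dataset_dir, name))]

print(missing_datasets())  # empty list once all three CSVs are in place
```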

Workload Generation

  1. Run python3 generate_all_workload_gpu.py
  2. Run python3 generate_train_workload_gpu_npred.py for the queries used to evaluate scalability
  • Note that the 100,000-query workload with seed 42 is used for training; the 2,000-query workload with seed 42 serves as the In-Workload Queries for testing; and the 2,000-query workload with seed 1234 serves as the Random Queries for testing
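The seed mechanics above can be sketched as follows. `draw_queries` is an illustrative stand-in, not the repo's generator (which builds real predicates over table columns); the point is only that a fixed seed makes a workload reproducible, so seed 42 and seed 1234 yield two distinct test workloads.

```python
import numpy as np

# Illustrative sketch only: the repo's generators build real predicates
# over table columns, but the seeding behaves like this stand-in.
def draw_queries(num_queries, seed, num_cols=10):
    """Draw (column, value) pseudo-predicates from a seeded RNG,
    so the same seed always reproduces the same workload."""
    rng = np.random.RandomState(seed)
    return [(int(rng.randint(0, num_cols)), float(rng.rand()))
            for _ in range(num_queries)]

in_workload = draw_queries(2000, seed=42)    # In-Workload test queries
random_q    = draw_queries(2000, seed=1234)  # Random test queries

assert draw_queries(2000, seed=42) == in_workload  # same seed, same workload
assert random_q != in_workload                     # different seed, distinct workload
```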

Train Duet

  • For DMV, run python3 train_model.py --num-queries=100000 --dataset=dmv --epochs=50 --warmups=12000 --bs=2048 --expand-factor=4 --layers=0 --direct-io --input-encoding=binary --output-encoding=one_hot --multi_pred_embedding=mlp --use-workloads --tag=dmv_mlp_binary_Workloads --gpu-id=0
  • For Kddcup98, run python3 train_model.py --num-queries=100000 --dataset=cup98 --epochs=50 --warmups=12000 --bs=100 --expand-factor=4 --layers=2 --fc-hiddens=128 --residual --direct-io --input-encoding=binary --output-encoding=one_hot --multi_pred_embedding=mlp --use-workloads --tag=cup98_mlp_binary_Workloads --gpu-id=0
  • For Census, run python3 train_model.py --num-queries=100000 --dataset=census --epochs=50 --warmups=12000 --bs=100 --expand-factor=4 --layers=2 --fc-hiddens=128 --residual --direct-io --input-encoding=binary --output-encoding=one_hot --multi_pred_embedding=mlp --use-workloads --tag=census_mlp_binary_Workloads --gpu-id=0
  • For Duet's data-driven version, remove the --use-workloads option
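The three commands differ only in a few per-dataset flags, which can be captured in a small wrapper. The wrapper below is hypothetical, not part of the repo; all flag values are copied verbatim from the commands above.

```python
# Hypothetical wrapper (not part of the repo): flag values are copied
# verbatim from the training commands above.
COMMON = ["--num-queries=100000", "--epochs=50", "--warmups=12000",
          "--expand-factor=4", "--direct-io", "--input-encoding=binary",
          "--output-encoding=one_hot", "--multi_pred_embedding=mlp"]

PER_DATASET = {
    "dmv":    ["--bs=2048", "--layers=0"],
    "cup98":  ["--bs=100", "--layers=2", "--fc-hiddens=128", "--residual"],
    "census": ["--bs=100", "--layers=2", "--fc-hiddens=128", "--residual"],
}

def train_args(dataset, use_workloads=True, gpu_id=0):
    """Assemble the train_model.py argument list; dropping
    --use-workloads gives Duet's data-driven variant."""
    suffix = "Workloads" if use_workloads else "noWorkloads"
    args = (["python3", "train_model.py", f"--dataset={dataset}"]
            + COMMON + PER_DATASET[dataset])
    if use_workloads:
        args.append("--use-workloads")
    args += [f"--tag={dataset}_mlp_binary_{suffix}", f"--gpu-id={gpu_id}"]
    return args

print(" ".join(train_args("dmv", use_workloads=False)))
```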

Evaluate Duet

Scalability

  • Run python3 run_eval_npred.py
  • Run python3 draw_nfilter_curve.py to draw the scalability plot in the same format as in our paper

Accuracy

  • We provide code to evaluate the error at every epoch, as well as the result of the epoch at which the model achieves the minimum loss
  • Taking DMV as an example, run python3 eval_model.py --dataset=dmv --load_queries=dmv-2000queries-oracle-cards-seed1234.pkl --glob=dmv-16.3MB-data19.550-made-hidden512_256_512_128_1024-emb32-directIo-binaryInone_hotOut-inputNoEmbIfLeq-mlp-seed0 --layers=0 --direct_io --input_encoding=binary --output_encoding=one_hot --multi_pred_embedding=mlp --tag=dmv_mlp_binary_noWorkloads --gpu_id=0 --end_epoch=50
  • For the load_queries option, change the seed from 1234 to 42 to switch the test workload from Random Queries to In-Workload Queries
  • For the glob option, use the model's name, following the format shown above
  • Set the remaining options to match the training options above
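Cardinality estimators in the Naru line are typically evaluated with Q-error, the max of the over- and under-estimation ratios. Assuming eval_model.py reports a similar metric, a minimal version looks like this:

```python
# Minimal Q-error, the standard accuracy metric for cardinality
# estimation (>= 1, where 1 means an exact estimate). Whether
# eval_model.py reports exactly this form is an assumption.
def q_error(est, true, eps=1.0):
    """max(est/true, true/est), with both values clamped below by eps
    to avoid division by zero on empty results."""
    est, true = max(est, eps), max(true, eps)
    return max(est / true, true / est)

pairs = [(90, 100), (100, 100), (400, 100)]  # (estimate, true cardinality)
errors = sorted(q_error(e, t) for e, t in pairs)
median, worst = errors[len(errors) // 2], errors[-1]  # common summary stats
```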
