changes to use hs2 interface and to run suite in a loop #5
|
| echo "Completed Running PerfData Collection Scripts" | ||
|
|
||
| zip -r $BENCH_HOME/$BENCHMARK/PerfData.zip $PERFDATA_OUTPUTDIR | ||
| zip -r $BENCH_HOME/$BENCHMARK/PerfData_$RUN_ID.zip $PERFDATA_OUTPUTDIR |
We currently store the full path in the zip (e.g. home/hdiuser/hive-testbench/PerfData_2/pat/tpch_query_2/.... ). Can we fix the zipping so that it does not include the unnecessary /hdiuser/hive-testbench/ prefix?
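One possible way to do this, as a rough sketch: change into the parent of the output directory before zipping, so entries start at the directory's own name. `zip_relative` is a hypothetical helper name, and the sketch assumes the archive path is absolute (as `$BENCH_HOME/...` appears to be).

```shell
# Hypothetical helper: archive a directory so that entries start at its
# basename instead of carrying the full filesystem path.
# Assumption: $1 (the archive path) is absolute, because we cd before zipping.
zip_relative() {
    archive=$1
    src_dir=$2
    # Run in a subshell so the caller's working directory is not changed.
    (cd "$(dirname "$src_dir")" && zip -r -q "$archive" "$(basename "$src_dir")")
}

# Possible call with the variables from the diff:
# zip_relative "$BENCH_HOME/$BENCHMARK/PerfData_$RUN_ID.zip" "$PERFDATA_OUTPUTDIR"
```

With this, the entries would look like PerfData_2/pat/tpch_query_2/.... rather than starting at home/.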
| chmod -R 777 $RESULT_DIR | ||
|
|
||
| LOG_DIR=$BENCH_HOME/$BENCHMARK/logs/ | ||
| LOG_DIR=$BENCH_HOME/$BENCHMARK/logs_$RUN_ID/ |
Can we include everything about one run under a single dir?
|
|
||
| RESULT_DIR=$BENCH_HOME/$BENCHMARK/results/ | ||
| RESULT_DIR=$BENCH_HOME/$BENCHMARK/results_$RUN_ID/ | ||
|
|
Can we include everything about one run under a single dir?
| @@ -0,0 +1,22 @@ | |||
| #!/bin/bash | |||
| #usage: ./RunSingleQueryLoop QUERY_NUMBER REPEAT_COUNT SCALE_FACTOR CLUSTER_SSH_PASSWORD | |||
|
|
||
| PLAN_DIR=$BENCH_HOME/$BENCHMARK/plans/ | ||
| PLAN_DIR=$BENCH_HOME/$BENCHMARK/plans_$RUN_ID/ | ||
|
|
Same as above. Can this go under a single per-run dir as well?
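To make the suggestion concrete, here is a minimal sketch of a per-run layout for the logs, results and plans directories. `setup_run_dirs`, `RUN_DIR` and the `run_<id>` naming are assumptions for illustration, not something already in the scripts.

```shell
# Hypothetical helper: create one directory per run and keep logs, results
# and plans inside it. RUN_DIR and the run_<id> naming are illustrative.
setup_run_dirs() {
    base_dir=$1   # e.g. $BENCH_HOME/$BENCHMARK
    run_id=$2     # e.g. $RUN_ID
    RUN_DIR=$base_dir/run_$run_id
    LOG_DIR=$RUN_DIR/logs/
    RESULT_DIR=$RUN_DIR/results/
    PLAN_DIR=$RUN_DIR/plans/
    mkdir -p "$LOG_DIR" "$RESULT_DIR" "$PLAN_DIR"
}

# Possible call from the run script:
# setup_run_dirs "$BENCH_HOME/$BENCHMARK" "$RUN_ID"
```

The PerfData zip could then be written into the same `$RUN_DIR`, so cleaning up or copying one run is a single directory operation.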
| fi | ||
|
|
||
| timeout ${TIMEOUT} hive -i ${HIVE_SETTING} --database ${DATABASE} -d EXPLAIN="" -f ${QUERY_DIR}/tpch_query${2}.sql > ${RESULT_DIR}/${DATABASE}_query${j}.txt 2>&1 | ||
| beeline -u ${CONNECTION_STRING} -i ${HIVE_SETTING} --hivevar EXPLAIN="" -f ${QUERY_DIR}/tpch_query${2}.sql > ${RESULT_DIR}/${DATABASE}_query${j}.txt 2>&1 |
nit: extra space at the start.
|
|
||
| hive -d DB=${DATABASE} -f gettpchtablecounts.sql > ${STATS_DIR}/tablecounts_${DATABASE}.txt ; | ||
| hive -d DB=${DATABASE} -f gettpchtableinfo.sql >> ${STATS_DIR}/tableinfo_${DATABASE}.txt ; | ||
| CONNECTION_STRING="jdbc:hive2://localhost:10001/${DATABASE};transportMode=http" |
Will this work in case of failover?
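If failover is a concern, one option might be the Hive JDBC ZooKeeper service discovery URL instead of a fixed localhost:10001. This is only a sketch: it assumes dynamic service discovery (`hive.server2.support.dynamic.service.discovery`) is enabled on the cluster, and the quorum hosts, the `hs2_zk_url` helper name and the default namespace are placeholders.

```shell
# Hypothetical helper: build a HiveServer2 JDBC URL that resolves the active
# server through ZooKeeper rather than a fixed host and port.
# Assumption: the cluster registers HS2 instances in ZooKeeper.
hs2_zk_url() {
    database=$1
    zk_quorum=$2                 # e.g. "zk1:2181,zk2:2181,zk3:2181" (placeholder hosts)
    namespace=${3:-hiveserver2}  # placeholder default; check the cluster config
    printf 'jdbc:hive2://%s/%s;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=%s\n' \
        "$zk_quorum" "$database" "$namespace"
}

# Possible use in the script:
# CONNECTION_STRING=$(hs2_zk_url "$DATABASE" "$ZK_QUORUM")
# beeline -u "$CONNECTION_STRING" ...
```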
| if [ $? -ne 0 ]; then | ||
| echo "Generating data at scale factor $SCALE." | ||
| (cd tpch-gen; hadoop jar target/*.jar -d ${DIR}/${SCALE}/ -s ${SCALE}) | ||
| (cd tpch-gen; hadoop jar target/*.jar -D mapreduce.map.memory.mb=8192 -d ${DIR}/${SCALE}/ -s ${SCALE}) |
We should not hard-code settings here. Maybe use a global variable or something similar if you really need it.
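For example, something along these lines, where the memory setting is only passed when a variable is set. `MAP_MEMORY_MB` and `tpch_gen_command` are hypothetical names, and the function only prints the command so the sketch can be checked without a cluster.

```shell
# Hypothetical sketch: take the map memory from an optional variable instead
# of hard-coding 8192. When MAP_MEMORY_MB is unset or empty, no -D option is added.
tpch_gen_command() {
    dir=$1
    scale=$2
    opts=""
    if [ -n "${MAP_MEMORY_MB:-}" ]; then
        opts="-D mapreduce.map.memory.mb=${MAP_MEMORY_MB} "
    fi
    # Print the command; the real script would run it inside (cd tpch-gen; ...).
    echo "hadoop jar target/*.jar ${opts}-d ${dir}/${scale}/ -s ${scale}"
}

# Possible use: MAP_MEMORY_MB=8192 ./tpch-setup.sh ...  (script name is illustrative)
```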
| runcommand "hive -i settings/load-flat.sql -f ddl-tpch/bin_flat/alltables.sql -d DB=tpch_text_${SCALE} -d LOCATION=${DIR}/${SCALE}" | ||
|
|
||
| DATABASE=tpch_text_${SCALE} | ||
| CONNECTION_STRING="jdbc:hive2://localhost:10001/$DATABASE;transportMode=http" |
Same as above.
Also, maybe we should keep all of these settings in a config file rather than repeating them every time. Repeating them is error-prone.
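A rough sketch of what such a shared config could look like, sourced by each script. The file name, the `HS2_*` variable names and the `connection_string` helper are assumptions; the default values are the ones currently repeated in the diff.

```shell
# Hypothetical shared config (e.g. saved as tpch.conf and sourced by each script).
# Defaults mirror the values currently repeated in the scripts.
HS2_HOST=${HS2_HOST:-localhost}
HS2_PORT=${HS2_PORT:-10001}
HS2_TRANSPORT=${HS2_TRANSPORT:-http}

# Build the JDBC URL for a given database from the settings above.
connection_string() {
    echo "jdbc:hive2://${HS2_HOST}:${HS2_PORT}/$1;transportMode=${HS2_TRANSPORT}"
}

# Possible use in a script:
# . ./tpch.conf
# CONNECTION_STRING=$(connection_string "tpch_text_${SCALE}")
```

That way the host, port and transport mode are defined once, and a change for failover or a different cluster only touches one file.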