Note we updated the code so that it pulls the data from hugging face. It makes the usability a bit better.
See slurm instructions below for how we parallelized the generation and evaluation.
pip install -r requirements.txt
python3 generate_gpt_codes.py --save /path/to/save_dir
python3 test_one_solution.py --save /path/to/save_dir
# because the above may fail on account of poorly generated python programs
# we suggest to run a for loop for each problem index against the "all_codes.json"
for i in {0..#Num_Problems#} ; do
python3 test_one_solution.py --save /path/to/save_dir -i $i ;
done
The above will output the accuracy but to run it again once the evaluations have completed execute the line below:
python3 test_one_solution.py --save /path/to/save_dir --print_results
python3 eval_bleu.py --save /path/to/save_dir
Note: Third step does not depend on the second step.
Need to modify the path to apps in submit_all_jobs.sh to point to the evaluation folder and any other paths in that file as necessary. Install the requirements.txt file if you haven't already.
cd sbatch
bash submit_all_jobs.sh
Once completed we provide a utility function to combine all of the smaller files into one larger file for ease of processing.
python3 merge_codes.py --root /path/to/slurm/save_files.json
After the smaller files are combined we can view our accuracy with the following:
python3 test_one_solution.py --save /path/to/save_dir --print_results