Description
Hi,
Thanks for this great work. I am having trouble understanding how parallel decoding is enabled for LLaDA.
In eval_gsm8k.sh, this is the suggested command for dual cache + parallel decoding:

```sh
# dual cache+parallel
accelerate launch eval_llada.py --tasks ${task} --num_fewshot ${num_fewshot} \
    --confirm_run_unsafe_code --model llada_dist \
    --model_args model_path=${model_path},gen_length=${length},steps=${length},block_length=${block_length},use_cache=True,dual_cache=True,threshold=0.9,show_speed=True
```

Note that here steps=${length}.
In eval.md, however, the command is:

```sh
accelerate launch eval_llada.py --tasks ${task} --num_fewshot ${num_fewshot} \
    --confirm_run_unsafe_code --model llada_dist \
    --model_args model_path='GSAI-ML/LLaDA-8B-Instruct',gen_length=${length},steps=${steps},block_length=${block_length},use_cache=True,dual_cache=True,threshold=0.9,show_speed=True
```
To the best of my knowledge, the steps setting is what should enable parallel decoding. However, when I set steps=${steps} for LLaDA, the generations make no sense. Do you have any suggestions on this? Thank you so much!
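To make sure I'm asking about the right mechanism: my current mental model of threshold-based parallel decoding is roughly the sketch below. This is a pure-Python simplification I wrote for illustration; `parallel_decode_step` and its signature are my own, not the repo's API, and the fallback rule is an assumption on my part.

```python
import math

def parallel_decode_step(logits, masked, threshold=0.9):
    """Hypothetical sketch of one confidence-thresholded decoding step.

    logits: list of per-position logit lists (seq_len x vocab).
    masked: list of bools, True where the position is still masked.
    Returns (predicted_tokens, accepted_flags): every masked position
    whose top-token probability reaches `threshold` is unmasked at once,
    which is what makes the step "parallel".
    """
    tokens, confs = [], []
    for pos_logits in logits:
        # numerically stable softmax over the vocabulary
        m = max(pos_logits)
        exps = [math.exp(x - m) for x in pos_logits]
        z = sum(exps)
        probs = [e / z for e in exps]
        conf = max(probs)
        tokens.append(probs.index(conf))
        confs.append(conf)
    accept = [msk and c >= threshold for msk, c in zip(masked, confs)]
    # assumed fallback: always unmask at least the single most confident
    # masked position so the sampler makes progress every step
    if any(masked) and not any(accept):
        best = max((i for i, msk in enumerate(masked) if msk),
                   key=lambda i: confs[i])
        accept[best] = True
    return tokens, accept
```

Under this model, threshold=0.9 alone would control how many tokens are committed per step, and steps would only cap the number of iterations, which is why the steps=${length} vs steps=${steps} discrepancy confuses me.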