Hi!
After seeing some issues related to OOM errors caused by long prompts, I was wondering: instead of GPU sharding, could decreasing the float precision be an option for generating sequences with Evo from longer prompts (>1 kb and so on)?
- I believe the precision is currently set to `bfloat16` (as in `model.backbone = model.backbone.to(torch.bfloat16)` in the `generation_to_folding.py` script), but would `float8` be an option? Is it compatible with Evo at all?
- If so, would you expect a big drop in generation quality, or do you already have data on precision vs. performance?
Thanks so much!