Hi!
After seeing some issues related to OOM errors caused by long prompts, I was wondering: instead of GPU sharding, could decreasing the float precision be an option for generating sequences with Evo from longer prompts (>1 kb and so on)?
- I believe the precision is currently set to `bfloat16` (as in `model.backbone = model.backbone.to(torch.bfloat16)` in the `generation_to_folding.py` script), but would `float8` be an option? Is it compatible with Evo at all?
- If so, would you expect a big drop in generation quality, or do you already have data on precision vs. performance?
Thanks so much!