Using DeepSpeed with "auto" values in the ZeRO optimization config (for example reduce_bucket_size in the zero_optimization section) requires model.hidden_size to be available. I'm guessing that config.decoder_embed_dim is the hidden size for this model.
So we'd just need to add the following to the model's __init__:
def __init__(self, config: RetNetConfig, embed_tokens: nn.Embedding = None):
    super().__init__(config)
    self.config = config
    self.dropout_module = torch.nn.Dropout(config.dropout)
    self.embed_dim = config.decoder_embed_dim
    self.embed_scale = 1.0 if config.no_scale_embedding else math.sqrt(self.embed_dim)
    ## NEW CODE FOR DEEPSPEED
    self.hidden_size = config.decoder_embed_dim
    ## NEW CODE FOR DEEPSPEED
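To illustrate why the attribute matters, here is a rough sketch of how "auto" entries could be resolved from a hidden size. Everything in it is an assumption for illustration: SketchConfig is a hypothetical stand-in for RetNetConfig, fill_auto_values is a made-up helper, and the formulas (hidden_size squared for reduce_bucket_size, and so on) reflect my understanding of what the Hugging Face Transformers DeepSpeed integration does, not something verified against this repo. It also shows an alternative to the change above: exposing hidden_size as a property on the config that aliases decoder_embed_dim.

```python
import copy
from dataclasses import dataclass


@dataclass
class SketchConfig:
    """Hypothetical stand-in for RetNetConfig; only the field needed here."""
    decoder_embed_dim: int = 512

    @property
    def hidden_size(self) -> int:
        # Alias so code that looks for `hidden_size` finds the embed dim.
        return self.decoder_embed_dim


def fill_auto_values(ds_config: dict, hidden_size: int) -> dict:
    """Return a copy of ds_config with "auto" ZeRO sizes replaced.

    The formulas are assumptions modelled on the Transformers integration.
    """
    resolved = copy.deepcopy(ds_config)
    zero = resolved.get("zero_optimization", {})
    candidates = {
        "reduce_bucket_size": hidden_size * hidden_size,
        "stage3_prefetch_bucket_size": int(0.9 * hidden_size * hidden_size),
        "stage3_param_persistence_threshold": 10 * hidden_size,
    }
    for key, value in candidates.items():
        # Only touch keys the user explicitly left as "auto".
        if zero.get(key) == "auto":
            zero[key] = value
    return resolved


ds_config = {
    "zero_optimization": {
        "stage": 3,
        "reduce_bucket_size": "auto",
        "stage3_prefetch_bucket_size": "auto",
        "stage3_param_persistence_threshold": "auto",
    }
}

config = SketchConfig(decoder_embed_dim=1024)
resolved = fill_auto_values(ds_config, config.hidden_size)
print(resolved["zero_optimization"])
```

If the integration reads the value from the config object rather than from the model, the property approach would cover that case without touching the model's __init__; I haven't checked which one applies here.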