As described in the title: PyTorch's `TransformerEncoderLayer` defaults to `batch_first=False`, so it expects input of shape `(seq_len, batch, features)`. If you feed it batch-major data of shape `(batch, seq_len, features)` without setting `batch_first=True`, no error is raised — the layer silently attends over the wrong dimension, which shows up as high training loss and test error.

[Plot: using default `batch_first=False`]

[Plot: passing `batch_first=True`]
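A minimal sketch of the fix (the dimensions here are illustrative, not from the original post): construct the layer with `batch_first=True` so it interprets the first dimension of the input as the batch axis.

```python
import torch
import torch.nn as nn

# Batch-major input: (batch, seq_len, d_model)
x = torch.randn(32, 10, 512)

# Default layer expects (seq_len, batch, d_model). Feeding batch-major
# data into it runs without error, but attention is computed along the
# wrong axis, so the model trains on scrambled sequences.
bad_layer = nn.TransformerEncoderLayer(d_model=512, nhead=8)

# Fix: tell the layer that the batch dimension comes first.
layer = nn.TransformerEncoderLayer(d_model=512, nhead=8, batch_first=True)
out = layer(x)
print(out.shape)  # torch.Size([32, 10, 512])
```

The shape mismatch is easy to miss precisely because both layouts are valid tensors of the same rank, so nothing fails at runtime; only the loss curves reveal the bug.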