Apply activation before gating in MLP layer #4

Open
souvikshanku wants to merge 1 commit into devvrit:main from souvikshanku:main

@souvikshanku

The current MLP implementation applies the activation function after the element-wise multiplication of the gate and up projections, i.e. it computes act(gate(x) * up(x)). This is inconsistent with how gated activations are usually applied (e.g., see here), where the activation is applied to the gate projection alone: act(gate(x)) * up(x). This fix applies the activation before the multiplication, which is the intended behavior.
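
To make the difference concrete, here is a minimal sketch in plain Python rather than the repository's actual code. The function names, the weight layout, and the choice of SiLU as the activation are all assumptions made for illustration; the PR itself does not name the activation.

```python
import math


def silu(v):
    """SiLU (swish) activation, v * sigmoid(v). Assumed here for illustration."""
    return v / (1.0 + math.exp(-v))


def matvec(W, x):
    """Multiply a matrix W (given as a list of rows) by a vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def mlp_act_after_gating(x, W_gate, W_up, W_down):
    """Ordering described as the current behavior: act(gate(x) * up(x))."""
    gate = matvec(W_gate, x)
    up = matvec(W_up, x)
    hidden = [silu(g * u) for g, u in zip(gate, up)]
    return matvec(W_down, hidden)


def mlp_act_before_gating(x, W_gate, W_up, W_down):
    """Ordering proposed by this PR: act(gate(x)) * up(x)."""
    gate = matvec(W_gate, x)
    up = matvec(W_up, x)
    hidden = [silu(g) * u for g, u in zip(gate, up)]
    return matvec(W_down, hidden)


# Hypothetical toy weights, chosen only so the two orderings disagree.
x = [1.0, -2.0]
W_gate = [[1.0, 0.0], [0.0, 1.0]]
W_up = [[2.0, 0.0], [0.0, 1.0]]
W_down = [[1.0, 1.0]]

out_after = mlp_act_after_gating(x, W_gate, W_up, W_down)
out_before = mlp_act_before_gating(x, W_gate, W_up, W_down)
print("activation after gating: ", out_after)
print("activation before gating:", out_before)
```

With a nonlinear activation the two orderings generally produce different outputs, so the change alters the function the layer computes, and weights trained under one ordering would not be expected to transfer unchanged to the other.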