* update modeling mixtral
* oups
* fix
* better naming?
* compute softmax and top_k inside the experts
* update minimax as well
* models that will need an update
* more models that need a fix
* stash
* fix mixtral
* update olmoe
* update
* update
* current changes
* nits
* molmoe is now fixed
* olmoe is good to go!
* refactor qwen2_moe
* fixes
* fixed moe
* fix qwen2 modular
* nit
* qwen2_moe test script works
* tricky rope !
* fix qwen3
* DeepSeek v3 MoE Standardization (#40538)
* DeepSeek-v3
Shared
Shared
* Dependents of DS3
* Standardize GLM4V MoE (#40539)
* up
* Standardize VitPose's MoE (#40549)
* VitPose
* outside
* outside
* outside
* fix
* update dbrx
* dbrx... the magix
* Refactor Ernie 4.5's MoE (#40547)
* Isolate Ernie fixes
* fix moe
---------
Co-authored-by: Vasqu <antonprogamer@gmail.com>
* fix style
* style
* fix copies
* style
* latest changes
* fixes
* had to stage
* current updaters
* up
* another modular
* modular graniteMoe
* some update
* draft another modular moe
* updaters
* up
* fix nit
* q3 nit
* fix phi moe
* we're going up up up up it's our mooooment
* fix switch transformers this time around
* up
* gptsan japanese is deprecated forget about it
* fix mixtral to not be a linear (gives us more freedom)
* update
* fix copies gone wrong try catch nothing
* fix mixtral
* new refactor again
* update aria as well
* up dbrx and deepseekv3
* nit
* fix phimoe?
* fix deepseek v3
* nits
* don't bother with this one please
* up olmoe
* ??
* fix olmoe
* yups
* fixup
* ish
* hot patch
* new qwen3
* updates
* up
* nit
* fix copies
* fix
* nits
* we're going up up up
* nits
* switch_transformers edge case
* lol modular gptsan?
* fix deepseek
* finally all modeling match modular
* update
* up
* up
* dang
* up
* up aria
* fix dbrx
* nits here and there
* finish fixing dbrx
* fix deepseek
* upd
* up
* fix flex olmo
* updated
* update jamba
* JAMBA is still a bit of a todo
* forward forward
* fix dots11
* update
* fix hunyuan
* fix some other
* update phimoe
* fuck you phimoe you are now submitted
* submit granitemoe as well
* try to fix some other models, reduces some of the failures
* fix olmoe and qwen2moe
* up
* up
* fix qwen2_moe
* update modular make it again, simpler
* nits
* up
* up
* fix
* some switch reductions
* up
* fix qwen3vl
* some fixes to jetmoe
* these should be shipped to the modular to fix jetmoe
* fix most of the nllb failures
* more nllb fixes
* fix the modular
* remove nllb modular as it sucks for now
* ?
* fix granitemoe
* granitemoehybrid doesn't have rope
* use rope when rope, no rope when no rope
* updates
* finish fixing dumb granite
* fix most of minimax
* fix
* update modular
* ?
* up
* up jetmoe still broken
* up
* fix, now align the moe
* fix jetmoe
* fix styling and qwen3 repo consistency
* update
* up up
* update ruff?
* nits
* modeling is good now for switch
* fix
* more fixes to switch!
* fix some switch tests
* ?
* ?
* up
* fix switch modular!
* nit?
* up
* subtest
* can't believe I wasted so much time on this...
* fix
* updates
* nits
* nit jamba is fucking annoying
* ?
* fix?
* oups
* good good
* styling
* up
* make sure qwen2 sliding works!
* fix dbrx small
* lol
* nits
* fix one test
* fix load balancing loss issue
* fix jamba
* fix nllbmoe
* fix jamba consistency and doc?
* up
* these are correct
* up
* up
* up
* some of the final cleanup
* update
* up
* fix some revert in granitemoe
* bring back attention multipliers for the granite family we'll see later on if they need removal
* small jamba fix docstring and typing
* fix phimoe
* yup
* fix unk returndict in granitemoes
* up
* fix qwen config
* fix phimoe check quality
* nits
* update based on caught non relative imports!
* fix dbrx
* Apply suggestions from code review
Co-authored-by: Cyril Vallez <cyril.vallez@huggingface.co>
* fix copies
* fixup
* fix dot1 regression!
* fix phimoe issue
* fix phi moe
* fix float() for some models
* fix jamba regression
* ui
* more dtype issues
* fix deepseek2 and 3?
* proper update
* fix modular deepseek!
* jamba jambaaaaaa
---------
Co-authored-by: Lysandre Debut <hi@lysand.re>
Co-authored-by: Vasqu <antonprogamer@gmail.com>
Co-authored-by: Cyril Vallez <cyril.vallez@huggingface.co>