refactor(quantization): modularize linear.py and loader.py, add FP8 K… #31
Closed
luozixin2 wants to merge 1 commit into SJTU-DENG-Lab:main
…V cache support
Major refactoring and feature additions:
Code Modularization:
FP8 KV Cache Support:
Compatibility Fixes:
Cleanup:
Net reduction: ~1200 lines, leaving cleaner, more maintainable code
Summary by CodeRabbit
Release Notes
New Features
Bug Fixes
Refactor