[Step1] new architecture for auto_round #1542
Open
n1ck-guo wants to merge 93 commits into main from hengguo/new_ar_arch
7698b93 init (n1ck-guo)
75b4141 Merge branch 'main' of https://github.com/intel/auto-round into hengg… (n1ck-guo)
ca17097 update (n1ck-guo)
a092e37 Merge branch 'main' of https://github.com/intel/auto-round into hengg… (n1ck-guo)
cec4ce4 Merge branch 'main' of https://github.com/intel/auto-round into hengg… (n1ck-guo)
e265b8f update (n1ck-guo)
868a82d merge main (n1ck-guo)
9dc930c [pre-commit.ci] auto fixes from pre-commit.com hooks (pre-commit-ci[bot])
70a2d02 add switch (n1ck-guo)
5998d44 code scan (n1ck-guo)
9412596 Merge branch 'hengguo/new_ar_arch' of https://github.com/intel/auto-r… (n1ck-guo)
394dcdd fix (n1ck-guo)
7024cad Merge branch 'main' of https://github.com/intel/auto-round into hengg… (n1ck-guo)
36daba0 fix (n1ck-guo)
6feed99 fix (n1ck-guo)
7bd3e62 fix qweight (n1ck-guo)
9b14918 fix ut and refactor code (n1ck-guo)
2ab9b51 fix ut (n1ck-guo)
dd5aec7 fix (n1ck-guo)
d65f1eb Merge branch 'main' of https://github.com/intel/auto-round into hengg… (n1ck-guo)
bde95c6 fix merge (n1ck-guo)
7b4e479 fix (n1ck-guo)
9b4cab7 update (n1ck-guo)
b602e00 merge main (n1ck-guo)
a1fe717 sync merge change (n1ck-guo)
b58d55a fix (n1ck-guo)
6a7ac60 fix ut (n1ck-guo)
64d4a57 Merge branch 'main' of https://github.com/intel/auto-round into hengg… (n1ck-guo)
b753bab decoupling quantization and refactor hadamard (n1ck-guo)
b32bc68 support multi rotation (n1ck-guo)
dbd1ab0 Merge branch 'main' of https://github.com/intel/auto-round into hengg… (n1ck-guo)
f4da8be sync compressors_new: add is_dynamic_afp8, is_block_wfp8, _get_safete… (n1ck-guo)
75a472a merge main (n1ck-guo)
01f6871 fix (n1ck-guo)
53bef7c Merge branch 'main' of https://github.com/intel/auto-round into hengg… (n1ck-guo)
20ade76 Merge branch 'main' of https://github.com/intel/auto-round into hengg… (n1ck-guo)
41e75bd fix (n1ck-guo)
92139d6 fix (n1ck-guo)
166b5b6 fix output dir (n1ck-guo)
31b2d2b update by comment (n1ck-guo)
4490a17 Merge branch 'main' of https://github.com/intel/auto-round into hengg… (n1ck-guo)
fdc92c2 update (n1ck-guo)
fb04613 fix (n1ck-guo)
4588279 fix by comment (n1ck-guo)
a313c26 fix output_dir (n1ck-guo)
19f95ed fix (n1ck-guo)
29d2b64 fix (n1ck-guo)
bfec842 merge (n1ck-guo)
1c9e529 fix (n1ck-guo)
7e7fdeb fix vlm ut (n1ck-guo)
4a035fb [pre-commit.ci] auto fixes from pre-commit.com hooks (pre-commit-ci[bot])
463bb6c fix ut (n1ck-guo)
f5d6ff4 Merge branch 'main' of https://github.com/intel/auto-round into hengg… (n1ck-guo)
755ab4e sync merge (n1ck-guo)
d661e0b fix by comment (n1ck-guo)
7a80deb merge (n1ck-guo)
08770cf fix (n1ck-guo)
709269a Merge branch 'main' of https://github.com/intel/auto-round into hengg… (n1ck-guo)
97b89dd fix (n1ck-guo)
0025256 performance (n1ck-guo)
1831126 fix (n1ck-guo)
8873eca fix (n1ck-guo)
a1a4244 fix (n1ck-guo)
bd75536 preformance (n1ck-guo)
f2940bd Merge branch 'main' of https://github.com/intel/auto-round into hengg… (n1ck-guo)
e4ce420 sync (n1ck-guo)
1286749 fix (n1ck-guo)
9914306 Merge branch 'main' of https://github.com/intel/auto-round into hengg… (n1ck-guo)
5c212b5 performance (n1ck-guo)
550158b Merge branch 'main' of https://github.com/intel/auto-round into hengg… (n1ck-guo)
4806d5a performance (n1ck-guo)
1f1fbd9 fix (n1ck-guo)
e4fdfe6 update (n1ck-guo)
ec45a1c fix: skip compile_func for FP8_STATIC on HPU + trim malloc at ModelCo… (n1ck-guo)
3cd3c73 fix(memory): reduce peak RSS for new arch via forced malloc_trim and … (n1ck-guo)
5d4a85d merge main (n1ck-guo)
14a59db fix(memory): reduce peak RAM via deferred ShardWriter, intermediate G… (n1ck-guo)
29969c8 fix (n1ck-guo)
654c733 merge main (n1ck-guo)
c7f21a7 update (n1ck-guo)
72c04f9 Merge branch 'main' of https://github.com/intel/auto-round into hengg… (n1ck-guo)
028bb06 sync hadamard transform changes from main branch to new architecture (n1ck-guo)
451008b fix sglang test: switch OPT to Qwen3-0.6B to avoid fused qkv_proj reg… (n1ck-guo)
8bdf054 fix: invalidate compiled block forward cache on block change; guard l… (n1ck-guo)
8f58039 Merge origin/main: resolve conflicts in test files (n1ck-guo)
d067b53 sync 8d7bb84c to new arch: enable immediate_saving for nv_fp/mx_fp, f… (n1ck-guo)
8d2f341 merge main (n1ck-guo)
cec0958 fix ut (n1ck-guo)
459435c fix HPU FP8_STATIC peak RAM: disable eager pipeline in new-arch, clea… (n1ck-guo)
463f58d [pre-commit.ci] auto fixes from pre-commit.com hooks (pre-commit-ci[bot])
6cbb156 test: add unit tests for HPU FP8_STATIC eager pipeline guard (n1ck-guo)
9f88982 fix ut (n1ck-guo)
50beadc Merge branch 'hengguo/new_ar_arch' of https://github.com/intel/auto-r… (n1ck-guo)

| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,13 @@ | ||
| # Copyright (c) 2026 Intel Corporation | ||
| # | ||
| # Licensed under the Apache License, Version 2.0 (the "License"); | ||
| # you may not use this file except in compliance with the License. | ||
| # You may obtain a copy of the License at | ||
| # | ||
| # http://www.apache.org/licenses/LICENSE-2.0 | ||
| # | ||
| # Unless required by applicable law or agreed to in writing, software | ||
| # distributed under the License is distributed on an "AS IS" BASIS, | ||
| # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. | ||
| # See the License for the specific language governing permissions and | ||
| # limitations under the License. | ||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,18 @@ | ||
| # Copyright (c) 2026 Intel Corporation | ||
| # | ||
| # Licensed under the Apache License, Version 2.0 (the "License"); | ||
| # you may not use this file except in compliance with the License. | ||
| # You may obtain a copy of the License at | ||
| # | ||
| # http://www.apache.org/licenses/LICENSE-2.0 | ||
| # | ||
| # Unless required by applicable law or agreed to in writing, software | ||
| # distributed under the License is distributed on an "AS IS" BASIS, | ||
| # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. | ||
| # See the License for the specific language governing permissions and | ||
| # limitations under the License. | ||
| | ||
| | ||
| class AlgConfig: | ||
|     def __init__(self): | ||
|         pass | ||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,17 @@ | ||
| # Copyright (c) 2026 Intel Corporation | ||
| # | ||
| # Licensed under the Apache License, Version 2.0 (the "License"); | ||
| # you may not use this file except in compliance with the License. | ||
| # You may obtain a copy of the License at | ||
| # | ||
| # http://www.apache.org/licenses/LICENSE-2.0 | ||
| # | ||
| # Unless required by applicable law or agreed to in writing, software | ||
| # distributed under the License is distributed on an "AS IS" BASIS, | ||
| # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. | ||
| # See the License for the specific language governing permissions and | ||
| # limitations under the License. | ||
| | ||
| | ||
| class BaseAlgorithm: | ||
|     pass | ||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,21 @@ | ||
| # Copyright (c) 2026 Intel Corporation | ||
| # | ||
| # Licensed under the Apache License, Version 2.0 (the "License"); | ||
| # you may not use this file except in compliance with the License. | ||
| # You may obtain a copy of the License at | ||
| # | ||
| # http://www.apache.org/licenses/LICENSE-2.0 | ||
| # | ||
| # Unless required by applicable law or agreed to in writing, software | ||
| # distributed under the License is distributed on an "AS IS" BASIS, | ||
| # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. | ||
| # See the License for the specific language governing permissions and | ||
| # limitations under the License. | ||
| | ||
| from auto_round.algorithms.quantization.base import BaseQuantizers | ||
| from auto_round.algorithms.quantization.config import QuantizationConfig | ||
| from auto_round.algorithms.quantization.sign_round.config import SignRoundConfig | ||
| from auto_round.algorithms.quantization.sign_round.quantizer import SignRoundQuantizer | ||
| from auto_round.algorithms.quantization.adam_round.adam import AdamRoundQuantizer | ||
| from auto_round.algorithms.quantization.rtn.config import RTNConfig | ||
| from auto_round.algorithms.quantization.rtn.quantizer import RTNQuantizer, OptimizedRTNQuantizer |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,13 @@ | ||
| # Copyright (c) 2026 Intel Corporation | ||
| # | ||
| # Licensed under the Apache License, Version 2.0 (the "License"); | ||
| # you may not use this file except in compliance with the License. | ||
| # You may obtain a copy of the License at | ||
| # | ||
| # http://www.apache.org/licenses/LICENSE-2.0 | ||
| # | ||
| # Unless required by applicable law or agreed to in writing, software | ||
| # distributed under the License is distributed on an "AS IS" BASIS, | ||
| # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. | ||
| # See the License for the specific language governing permissions and | ||
| # limitations under the License. |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,66 @@ | ||
| # Copyright (c) 2026 Intel Corporation | ||
| # | ||
| # Licensed under the Apache License, Version 2.0 (the "License"); | ||
| # you may not use this file except in compliance with the License. | ||
| # You may obtain a copy of the License at | ||
| # | ||
| # http://www.apache.org/licenses/LICENSE-2.0 | ||
| # | ||
| # Unless required by applicable law or agreed to in writing, software | ||
| # distributed under the License is distributed on an "AS IS" BASIS, | ||
| # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. | ||
| # See the License for the specific language governing permissions and | ||
| # limitations under the License. | ||
| from typing import Union | ||
| | ||
| import torch | ||
| | ||
| from auto_round.algorithms.quantization.sign_round.quantizer import SignRoundQuantizer | ||
| from auto_round.schemes import QuantizationScheme | ||
| from auto_round.utils import check_is_cpu, htcore, is_hpex_available | ||
| | ||
| | ||
| class AdamRoundQuantizer(SignRoundQuantizer): | ||
| | ||
|     def __init__(self, config): | ||
|         super().__init__(config) | ||
|         self.momentum = None  # AdamW handles momentum internally | ||
| | ||
|     def _get_optimizer(self, optimizer): | ||
|         if optimizer is None: | ||
|             optimizer = torch.optim.AdamW | ||
|         elif isinstance(optimizer, str): | ||
|             optimizer = getattr(torch.optim, optimizer) | ||
|         else: | ||
|             optimizer = optimizer | ||
|         return optimizer | ||
| | ||
|     def _get_scaler(self): | ||
|         scaler = None | ||
|         if self.model_context.amp and not check_is_cpu(self.compress_context.device): | ||
|             from torch.cuda.amp import GradScaler | ||
| | ||
|             scaler = GradScaler(init_scale=1024, growth_interval=100000) | ||
|         return scaler | ||
| | ||
|     def _scale_loss_and_backward(self, scaler, loss): | ||
|         if scaler is not None: | ||
|             loss = scaler.scale(loss) | ||
| | ||
|         loss.backward() | ||
|         if is_hpex_available(): | ||
|             htcore.mark_step() | ||
|         return loss | ||
| | ||
|     def _step(self, scaler, optimizer, lr_schedule): | ||
|         if scaler is not None: | ||
|             scaler.step(optimizer) | ||
|             optimizer.zero_grad() | ||
|             lr_schedule.step() | ||
|             scaler.update() | ||
|         else: | ||
|             optimizer.step() | ||
|             optimizer.zero_grad() | ||
|             lr_schedule.step() | ||
|         if is_hpex_available(): | ||
|             htcore.mark_step() | ||
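
The control flow of `_get_optimizer` and `_step` in `AdamRoundQuantizer` can be traced without PyTorch. The following is only a minimal sketch, not code from this PR: `resolve_optimizer`, `RecordingStub`, `step`, and `trace_step` are hypothetical names introduced for illustration, the `namespace` argument stands in for `torch.optim`, and the HPU `mark_step` call is left out. It shows the three ways an optimizer argument is resolved and the order in which the scaler, optimizer, and scheduler are called on each branch.

```python
from types import SimpleNamespace


def resolve_optimizer(optimizer, namespace, default_name="AdamW"):
    """Resolve None, a string name, or a class to an optimizer class.

    `namespace` plays the role of `torch.optim` in the diff; it is a
    parameter here (an assumption) so the sketch runs without PyTorch.
    """
    if optimizer is None:
        # No optimizer given: fall back to the default (AdamW in the diff).
        return getattr(namespace, default_name)
    if isinstance(optimizer, str):
        # A string is looked up by attribute name on the namespace.
        return getattr(namespace, optimizer)
    # Anything else is assumed to already be an optimizer class.
    return optimizer


class RecordingStub:
    """Hypothetical stand-in that logs every method call made on it."""

    def __init__(self, name, log):
        self._name = name
        self._log = log

    def __getattr__(self, method):
        # Only reached for attributes that do not exist, i.e. method calls.
        def call(*args, **kwargs):
            self._log.append(f"{self._name}.{method}")

        return call


def step(scaler, optimizer, lr_schedule):
    """Same branch structure as `_step` in the diff, minus `htcore.mark_step()`."""
    if scaler is not None:
        scaler.step(optimizer)
        optimizer.zero_grad()
        lr_schedule.step()
        scaler.update()
    else:
        optimizer.step()
        optimizer.zero_grad()
        lr_schedule.step()


def trace_step(with_scaler):
    """Run one step against stubs and return the ordered list of calls."""
    log = []
    optimizer = RecordingStub("optimizer", log)
    lr_schedule = RecordingStub("lr_schedule", log)
    scaler = RecordingStub("scaler", log) if with_scaler else None
    step(scaler, optimizer, lr_schedule)
    return log
```

Calling `trace_step(True)` and `trace_step(False)` returns the call sequence for the AMP and non-AMP branches respectively. On the AMP branch the optimizer step is delegated to the scaler, and `scaler.update()` runs last.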