Problem Description
With release v0.9.7, I was able to begin quantizing google/gemma-3-27b-it on a 48GB GPU (RTX 6000 Ada) even though the model does not fit entirely into VRAM. The exact same call in v0.10.0 now fails while the model is being moved to CPU:
2026-02-12 15:00:32 INFO autoround.py L165: using MLLM mode for multimodal model.
Loading weights: 100%|████████████████| 1247/1247 [00:00<00:00, 2167.15it/s, Materializing param=model.vision_tower.vision_model.post_layernorm.weight]
2026-02-12 15:01:03 INFO base.py L486: using torch.bfloat16 for quantization tuning
2026-02-12 15:01:03 WARNING formats.py L154: some layers are skipped quantization (shape not divisible by 32).
2026-02-12 15:01:03 INFO base.py L1739: start to cache block inputs
Traceback (most recent call last):
  File "/home/daved/project/code/convert_autoround.py", line 43, in <module>
    ar.quantize_and_save(output_dir = quant_path, format = "auto_round")
  File "/home/daved/miniforge3/envs/autoround/lib/python3.12/site-packages/auto_round/compressors/base.py", line 949, in quantize_and_save
    model, _ = self.quantize()
               ^^^^^^^^^^^^^^^
  File "/home/daved/miniforge3/envs/autoround/lib/python3.12/site-packages/auto_round/compressors/base.py", line 1740, in quantize
    all_inputs = self.try_cache_inter_data_gpucpu(all_first_block_names, self.nsamples, layer_names=layer_names)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/daved/miniforge3/envs/autoround/lib/python3.12/site-packages/torch/utils/_contextlib.py", line 124, in decorate_context
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/home/daved/miniforge3/envs/autoround/lib/python3.12/site-packages/auto_round/compressors/base.py", line 2209, in try_cache_inter_data_gpucpu
    new_max_memory[device] = max_memory[device] * 0.9
                             ~~~~~~~~~~^^^^^^^^
KeyError: 0
The error occurs here (auto-round/auto_round/compressors/base.py, line 2190 in 81c8ee4):
new_max_memory[device] = max_memory[device] * 0.9
The cause is a mismatch between the list of devices here (line 2172 in 81c8ee4):
devices = parse_available_devices(self.device_map)
and the max memory produced here (line 2174 in 81c8ee4):
max_memory = get_max_memory()
The former includes device 0 for the GPU, whereas the latter only picks up the CPU, so there is no 0 key when the failing line runs:
['cuda:0'] # devices
{'cpu': 143219032064} # max_memory
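To make the failure concrete, here is a minimal sketch with the two values above hard-coded; the cuda:0 -> 0 index conversion is my assumption, inferred from the KeyError: 0 in the traceback rather than copied from AutoRound's source:

devices = ["cuda:0"]                # parse_available_devices(self.device_map)
max_memory = {"cpu": 143219032064}  # get_max_memory() in the failing run

new_max_memory = {}
for dev in devices:
    # judging by KeyError: 0, AutoRound keys max_memory on the bare CUDA index
    device = int(dev.split(":")[1]) if dev.startswith("cuda") else dev
    new_max_memory[device] = max_memory[device] * 0.9  # raises KeyError: 0

On a healthy setup, get_max_memory() would return a 0 key for the GPU alongside 'cpu', and the lookup would succeed.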
Manually setting low_gpu_mem_usage to True avoids this KeyError, but that is suboptimal since it reduces performance. Note that patching max_memory to include the GPU key (0) led to additional, different errors.
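Until this is fixed, the workaround through the Python API is sketched below; it assumes the string-name constructor and scheme keyword of recent releases, and the output directory name is arbitrary:

from auto_round import AutoRound

ar = AutoRound(
    "google/gemma-3-27b-it",
    scheme="W4A16",
    low_gpu_mem_usage=True,  # avoids the KeyError, at the cost of slower tuning
)
ar.quantize_and_save(output_dir="gemma-3-27b-it-w4a16", format="auto_round")

The CLI should expose the same switch as --low_gpu_mem_usage.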
Reproduction Steps
- Run this command:
auto-round \
  --model model \
  --scheme "W4A16" \
  --format "auto_round"
- Where model is any model larger than available VRAM (e.g., google/gemma-3-27b-it for a 48GB GPU)
- Run the command as shown; the KeyError above is raised (a Python-API equivalent follows below)
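For completeness, the same failure also reproduces through the Python API. The sketch below reconstructs the convert_autoround.py call from the traceback; only the quantize_and_save line is verbatim, the constructor arguments are assumed:

from auto_round import AutoRound

quant_path = "gemma-3-27b-it-w4a16"  # hypothetical output directory
ar = AutoRound("google/gemma-3-27b-it", scheme="W4A16")  # assumed construction
ar.quantize_and_save(output_dir=quant_path, format="auto_round")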
Environment Information
- OS: Ubuntu 24.04
- Python version: 3.12
- AutoRound version: v0.10.0
- Hardware: RTX 6000 Ada 48GB
Error Logs
See the full traceback in the Problem Description above.
Additional Context
No response