
Adds support for int8 w8a8_gemlite quantization #34

Open
anm-ol wants to merge 19 commits into wp-1.5 from
quantization

Conversation


@anm-ol anm-ol commented Mar 20, 2026

No description provided.

@anm-ol anm-ol requested a review from lapp0 March 20, 2026 07:04
@lapp0 (Collaborator) left a comment


Nice work, requesting some cleanup changes. Please merge latest wp-1.5 first.

# "quant": "int8_weights",
"quant": None,
"taehv_ae": True,
}
Collaborator:

You can run with:

MODEL_URI="Overworld-Models/MR160k" pytest ./examples/benchmark.py

There is no need to specify all these overrides.

I recommend updating MODEL_OVERRIDES to:

MODEL_OVERRIDES = [
    {},  # default
    {"quant": "intw8a8"},
]
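To show how that override list could drive the benchmark, here is a hedged sketch; `build_engine_kwargs` is a hypothetical helper invented for illustration, and the real benchmark may wire overrides into the engine differently:

```python
import os

# Mirrors the MODEL_OVERRIDES list suggested above.
MODEL_OVERRIDES = [
    {},                    # default: no quantization
    {"quant": "intw8a8"},  # int8 weights + int8 activations
]

def build_engine_kwargs(overrides, model_uri=None):
    """Merge one override dict with the MODEL_URI environment variable.

    Hypothetical helper for this sketch only; it just shows how a
    parametrized benchmark could construct per-run engine kwargs.
    """
    kwargs = {"model_uri": model_uri or os.environ.get("MODEL_URI")}
    kwargs.update(overrides)
    return kwargs
```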


frame = cv2.imdecode(np.frombuffer(urllib.request.urlopen(url).read(), np.uint8), cv2.IMREAD_COLOR)
engine.append_frame(torch.from_numpy(np.repeat(frame[None], 4, axis=0)))
frame = cv2.resize(frame, (1024, 512))[:, :, ::-1]
Collaborator:

No resize needed after #33 merged

device="cuda")

total_linear_params = sum(mod.weight.numel() for _, mod in engine.model.named_modules() if isinstance(mod, torch.nn.Linear))
print(f"Total linear layer parameters: {total_linear_params:,}")
Collaborator:

No need to update gen_sample.py. Could you document the available quants in a brief section in README.md though?

Author:

Got it. Should quant be included by default in gen_sample.py?

# Create inference engine
engine = WorldEngine(sys.argv[1], quant="intw8a8", device="cuda")

Or

# Create inference engine
engine = WorldEngine(sys.argv[1], quant=None, device="cuda")

src/quantize.py Outdated
try:
from lmdeploy.pytorch.models.q_modules import QLinear
except ImportError:
QLinear = None
Collaborator:

Only gemlite works, so let's use it exclusively and call the mode intw8a8 or similar, please.
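For context on what an intw8a8 mode means, here is a plain-Python sketch of the w8a8 idea (weights and activations both quantized to int8, integer accumulate, float rescale). It is illustrative only and does not use gemlite's actual API:

```python
def quantize_sym(values, bits=8):
    """Symmetric quantization: returns (int values, float scale)."""
    qmax = 2 ** (bits - 1) - 1  # 127 for int8
    scale = max(abs(v) for v in values) / qmax or 1.0
    return [round(v / scale) for v in values], scale

def w8a8_dot(weights, activations):
    """w8a8 dot product: int8 weights x int8 activations, rescaled to float."""
    qw, sw = quantize_sym(weights)
    qa, sa = quantize_sym(activations)
    acc = sum(w * a for w, a in zip(qw, qa))  # integer accumulation
    return acc * sw * sa  # dequantize the result
```

A real kernel would keep the int8 tensors packed and fuse the rescale into the matmul epilogue; this sketch only shows the numerics.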

from .ae import get_ae
from .patch_model import apply_inference_patches
from .quantize import quantize_model
from .quantize import quantize_model, apply_ptq_model, apply_qat
Collaborator:

These imports don't exist.

pyproject.toml Outdated
"torchvision==0.25.0",
"torchaudio==2.10.0",
"torchao==0.16.0",
"flashinfer-python==0.6.6",
Collaborator:

Both of these are out of scope for this PR.

pyproject.toml Outdated
"torchao==0.16.0",
"flashinfer-python==0.6.6",
"fbgemm-gpu-genai==1.5.0; sys_platform == 'linux'",
"gemlite==0.5.1.post1"
Collaborator:

Can you check if this works on Windows?
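If gemlite turns out to be Linux-only, one option is to gate it both in pyproject.toml (with a `sys_platform == 'linux'` environment marker, as the fbgemm-gpu-genai entry above already does) and at import time. A sketch, where `load_gemlite` is a hypothetical helper:

```python
import sys

def load_gemlite():
    """Return the gemlite module if usable on this platform, else None."""
    if not sys.platform.startswith("linux"):
        return None  # dependency marker would skip install on non-Linux
    try:
        import gemlite  # only expected to be installed on Linux
    except ImportError:
        return None
    return gemlite
```

Callers can then fall back to the unquantized path whenever this returns None.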
