feat(mobilevit): align factories to paper Table-3 (L=[2,4,3], XXS expand=2) by runwangdl · Pull Request #20 · pulp-platform/Onnx4Deeploy

runwangdl · 2026-05-14T16:31:59Z

Summary

Brings the MobileViT factory functions in line with the MobileViT paper (Mehta & Rastegari, ICLR 2022) Table 3 / §3.3:

- num_transformer_blocks per stage is now L = [2, 4, 3] for every variant. Previously every MobileViT block used the default of 2, silently dropping 3 transformer layers compared with the paper (6 instead of 9).
- XXS now uses an MV2 expansion ratio of 2, matching the paper; XS / S keep the existing 4. Previously every variant used 4.
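The two changes can be summarized as plain Python data. This is a minimal sketch: the names PAPER_CONFIG, OLD_CONFIG and total_transformer_layers are hypothetical and do not come from the repository. Only the values are taken from this PR. The sketch counts transformer layers before and after the change.

```python
# Hypothetical names; the values are the ones stated in this PR.
PAPER_CONFIG = {
    "xxs": {"transformer_depths": [2, 4, 3], "mv2_expand_ratio": 2},
    "xs": {"transformer_depths": [2, 4, 3], "mv2_expand_ratio": 4},
    "s": {"transformer_depths": [2, 4, 3], "mv2_expand_ratio": 4},
}

# Before this PR: every MobileViT block got the default of 2 layers, every variant used 4.
OLD_CONFIG = {
    name: {"transformer_depths": [2, 2, 2], "mv2_expand_ratio": 4} for name in PAPER_CONFIG
}


def total_transformer_layers(cfg: dict) -> int:
    """Total transformer layers across the three MobileViT stages."""
    return sum(cfg["transformer_depths"])


for name in PAPER_CONFIG:
    before = total_transformer_layers(OLD_CONFIG[name])
    after = total_transformer_layers(PAPER_CONFIG[name])
    # prints "xxs: 6 -> 9 transformer layers", and likewise for xs and s
    print(f"{name}: {before} -> {after} transformer layers")
```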

Changes

onnx4deeploy/models/pytorch_models/mobilevit/mobilevit.py:

- Adds two new keyword arguments to MobileViT.__init__:
  - transformer_depths: list = [2, 4, 3]
  - mv2_expand_ratio: int = 4
- Threads them through to each MobileViTBlock (num_transformer_blocks=) and each InvertedResidual (expand_ratio=).
- Updates the mobile_vit_xxs / mobile_vit_xs / mobile_vit_s factories to pass transformer_depths=[2, 4, 3] and the per-variant mv2_expand_ratio (2 for XXS, 4 for XS / S).
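The threading described above can be sketched as follows. This is a simplified, torch-free illustration and not the repository's code. InvertedResidual and MobileViTBlock are hypothetical stand-ins that only record their arguments. The factory signatures are also hypothetical. The dims are the paper's transformer widths, and the channel lists are illustrative only.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass
class InvertedResidual:
    """Hypothetical stand-in for the MV2 block: it only records its configuration."""

    in_ch: int
    out_ch: int
    stride: int = 1
    expand_ratio: int = 4  # default every variant used before this PR


@dataclass
class MobileViTBlock:
    """Hypothetical stand-in for the MobileViT block: it only records its configuration."""

    dim: int
    channels: int
    num_transformer_blocks: int = 2  # default every stage used before this PR


class MobileViT:
    """Sketch of threading transformer_depths / mv2_expand_ratio through __init__."""

    def __init__(
        self,
        dims: Sequence[int],
        channels: Sequence[int],
        transformer_depths: Optional[Sequence[int]] = None,
        mv2_expand_ratio: int = 4,
    ) -> None:
        # None (not a list literal) avoids one mutable default shared between calls.
        depths = list(transformer_depths) if transformer_depths is not None else [2, 4, 3]
        if not (len(depths) == len(dims) == len(channels) - 1):
            raise ValueError("expected one depth and one width per MobileViT stage")
        self.mv2: List[InvertedResidual] = []
        self.mvit: List[MobileViTBlock] = []
        for i, (dim, depth) in enumerate(zip(dims, depths)):
            # Each stage: a stride-2 MV2 block, then a MobileViT block at the new width.
            self.mv2.append(
                InvertedResidual(
                    channels[i], channels[i + 1], stride=2, expand_ratio=mv2_expand_ratio
                )
            )
            self.mvit.append(MobileViTBlock(dim, channels[i + 1], num_transformer_blocks=depth))


# Hypothetical factories: only the depths and expand ratios mirror this PR.
def mobile_vit_xxs() -> MobileViT:
    return MobileViT(
        [64, 80, 96], [24, 48, 64, 80], transformer_depths=[2, 4, 3], mv2_expand_ratio=2
    )


def mobile_vit_xs() -> MobileViT:
    return MobileViT(
        [96, 120, 144], [48, 64, 80, 96], transformer_depths=[2, 4, 3], mv2_expand_ratio=4
    )


def mobile_vit_s() -> MobileViT:
    return MobileViT(
        [144, 192, 240], [64, 96, 128, 160], transformer_depths=[2, 4, 3], mv2_expand_ratio=4
    )
```

The sketch defaults transformer_depths to None to sidestep Python's shared-mutable-default pitfall. A literal `list = [2, 4, 3]` default, as described above, is harmless as long as the list is never mutated.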

Parameter count check

1 × 3 × 256 × 256 input, 1000 classes:

| Variant | Params before | Params after | Paper |
|---|---|---|---|
| XXS | 0.93 M | 1.04 M | ~1.3 M |
| XS | n/a | 2.07 M | ~2.3 M |
| S | n/a | 5.20 M | ~5.6 M |
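As a quick arithmetic check, the remaining gap to the paper can be computed directly from the table. The dictionary names are hypothetical and the counts are copied from the table.

```python
# Parameter counts in millions, copied from the table (after this PR, and the paper).
measured_m = {"XXS": 1.04, "XS": 2.07, "S": 5.20}
paper_m = {"XXS": 1.3, "XS": 2.3, "S": 5.6}

for variant, count in measured_m.items():
    gap = paper_m[variant] - count  # how far below the paper's count this version is
    share = count / paper_m[variant]  # fraction of the paper's count reached
    print(f"{variant}: {gap:.2f} M below the paper ({share:.0%} of its count)")
```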

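The remaining ~0.2–0.4 M gap per variant is attributed to implementation choices in this deploy version rather than to architectural deviation from Table 3. This version has no dropout, and it uses separate bias-less Q / K / V linears instead of a single combined QKV linear with bias. Dropout has no learnable parameters, so it does not affect the count by itself.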
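For scale, here is a back-of-the-envelope count of what the Q / K / V choice changes. It assumes, hypothetically, one attention module per transformer layer and projections that map the transformer width d to d. Under that assumption, a combined d → 3d linear with bias has 3d² + 3d parameters, while three bias-less d → d linears have 3d². The difference is 3d per layer. The widths used are the paper's transformer dimensions. This repository's attention width may differ, and the helper names are hypothetical.

```python
def qkv_params(d: int, fused_with_bias: bool) -> int:
    """Q/K/V projection parameters for one attention layer (each projection assumed d -> d)."""
    if fused_with_bias:
        return 3 * d * d + 3 * d  # one Linear(d, 3*d) with bias
    return 3 * d * d  # three Linear(d, d) without bias


def fused_bias_overhead(widths, depths) -> int:
    """Extra parameters a fused, biased QKV linear adds across all transformer layers."""
    return sum(
        depth * (qkv_params(d, True) - qkv_params(d, False)) for d, depth in zip(widths, depths)
    )


DEPTHS = [2, 4, 3]
# Paper transformer widths d for the three MobileViT stages of each variant.
WIDTHS = {"XXS": [64, 80, 96], "XS": [96, 120, 144], "S": [144, 192, 240]}

for variant, widths in WIDTHS.items():
    print(f"{variant}: {fused_bias_overhead(widths, DEPTHS)} parameters")
```

Under these assumptions, the fused bias accounts for 2,208 (XXS), 3,312 (XS) and 5,328 (S) parameters.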

…and=2)

The existing MobileViT factory functions left `num_transformer_blocks` at its default of 2 for every stage and used the InvertedResidual default `expand_ratio` of 4 for all variants. Per the MobileViT paper (Mehta & Rastegari, ICLR 2022, Table 3 / §3.3), all three variants use L = [2, 4, 3] transformer layers per MobileViT block, and XXS uses an MV2 expansion factor of 2 (vs. 4 for XS/S).

This commit:
- Adds `transformer_depths` and `mv2_expand_ratio` keyword arguments to MobileViT.__init__, threading them through to each MobileViTBlock and InvertedResidual.
- Updates the `mobile_vit_xxs`, `mobile_vit_xs` and `mobile_vit_s` factories to pass the paper-specified values.

Param-count check (1×3×256×256 input, 1000 classes):
XXS: 0.93 M -> 1.04 M (paper ~1.3 M)
XS: ---- -> 2.07 M (paper ~2.3 M)
S: ---- -> 5.20 M (paper ~5.6 M)

The remaining gap (~0.2–0.4 M per variant) comes from this deploy version's implementation choices (no dropout; separate bias-less Q/K/V projections instead of a single combined QKV linear with bias), not from architectural deviation.

The pre-commit hook in the pulp-platform/Onnx4Deeploy CI runs psf/black 24.1.1 with --line-length=100. The six new InvertedResidual(...) call sites that pass expand_ratio=mv2_expand_ratio exceeded 100 characters, so black wraps them across several lines. No semantic change.

runwangdl requested a review from Victor-Jung as a code owner May 14, 2026 16:32

runwangdl wants to merge 2 commits into pulp-platform:devel from runwangdl:feat/mobilevit-paper-arch
