feat(mobilevit): align factories to paper Table-3 (L=[2,4,3], XXS expand=2)#20

Open
runwangdl wants to merge 2 commits into pulp-platform:devel from runwangdl:feat/mobilevit-paper-arch

@runwangdl
Collaborator

Summary

Brings the MobileViT factory functions in line with the MobileViT paper (Mehta & Rastegari, ICLR 2022) Table 3 / §3.3:

  • num_transformer_blocks per stage is now L = [2, 4, 3] for every variant. Previously every MobileViT block used the default of 2, silently dropping 3 transformer layers in total (6 instead of 9) relative to the paper.
  • XXS now uses MV2 expansion ratio 2, matching the paper. XS / S keep the existing 4. Previously every variant used 4.

Changes

onnx4deeploy/models/pytorch_models/mobilevit/mobilevit.py:

  • Adds two new keyword arguments to MobileViT.__init__:
    • transformer_depths: list = [2, 4, 3]
    • mv2_expand_ratio: int = 4
  • Threads them through to each MobileViTBlock (num_transformer_blocks=) and each InvertedResidual (expand_ratio=).
  • Updates mobile_vit_xxs / mobile_vit_xs / mobile_vit_s factories to pass transformer_depths=[2, 4, 3] and the per-variant mv2_expand_ratio.
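
The threading described above can be sketched roughly as follows. This is a simplified illustration, not the code in mobilevit.py: the stand-in `InvertedResidual` and `MobileViTBlock` classes only record the argument they receive, the layout of one MV2 block plus one MobileViT block per stage is an assumption made for brevity, and the `mv2_blocks` / `vit_blocks` attribute names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class InvertedResidual:
    """Stand-in for the MV2 block: records only the expansion ratio."""

    expand_ratio: int = 4


@dataclass
class MobileViTBlock:
    """Stand-in for the MobileViT block: records only its transformer depth."""

    num_transformer_blocks: int = 2


class MobileViT:
    """Toy model: one MV2 block and one MobileViT block per stage (assumed layout)."""

    def __init__(
        self,
        transformer_depths: Sequence[int] = (2, 4, 3),
        mv2_expand_ratio: int = 4,
    ):
        # Every MV2 block receives the same per-variant expansion ratio.
        self.mv2_blocks: List[InvertedResidual] = [
            InvertedResidual(expand_ratio=mv2_expand_ratio) for _ in transformer_depths
        ]
        # Each MobileViT block receives the depth of its own stage.
        self.vit_blocks: List[MobileViTBlock] = [
            MobileViTBlock(num_transformer_blocks=depth) for depth in transformer_depths
        ]


def mobile_vit_xxs() -> MobileViT:
    return MobileViT(transformer_depths=[2, 4, 3], mv2_expand_ratio=2)


def mobile_vit_xs() -> MobileViT:
    return MobileViT(transformer_depths=[2, 4, 3], mv2_expand_ratio=4)


def mobile_vit_s() -> MobileViT:
    return MobileViT(transformer_depths=[2, 4, 3], mv2_expand_ratio=4)


model = mobile_vit_xxs()
print([b.num_transformer_blocks for b in model.vit_blocks])  # [2, 4, 3]
print({b.expand_ratio for b in model.mv2_blocks})  # {2}
```

The sketch uses a tuple as the default for `transformer_depths` so that no mutable list is shared between instances. That is a choice of this illustration, not necessarily what the PR does.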

Parameter count check

1 × 3 × 256 × 256 input, 1000 classes:

| Variant | Before | After | Paper |
|---------|--------|--------|--------|
| XXS | 0.93 M | 1.04 M | ~1.3 M |
| XS | (n/a) | 2.07 M | ~2.3 M |
| S | (n/a) | 5.20 M | ~5.6 M |

The remaining ~0.2–0.4 M gap per variant comes from this deploy version's implementation choices (no dropout; separate bias-less Q / K / V linears instead of a single combined QKV linear with bias), not from an architectural deviation from Table 3.
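
To make the Q / K / V difference concrete, the per-layer parameter counts of the two layouts can be compared with plain arithmetic. This is a minimal sketch under stated assumptions: the helper names are hypothetical, the projections are assumed to map `dim` to `dim`, and `dim = 64` is an illustrative value rather than one taken from this PR. It quantifies only the bias term of a single attention layer and does not attempt to reproduce the full gap in the table.

```python
def linear_params(in_features: int, out_features: int, bias: bool) -> int:
    """Parameter count of a dense layer: weight matrix plus optional bias vector."""
    return in_features * out_features + (out_features if bias else 0)


def combined_qkv_params(dim: int) -> int:
    """One linear layer producing Q, K and V at once, with bias."""
    return linear_params(dim, 3 * dim, bias=True)


def separate_qkv_params(dim: int) -> int:
    """Three independent bias-less linear layers for Q, K and V."""
    return 3 * linear_params(dim, dim, bias=False)


dim = 64  # illustrative embedding width (assumption)
print(combined_qkv_params(dim))  # 12480
print(separate_qkv_params(dim))  # 12288
print(combined_qkv_params(dim) - separate_qkv_params(dim))  # 192, i.e. 3 * dim
```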

…and=2)

The existing MobileViT factory functions left `num_transformer_blocks` at
its default of 2 for every stage and used the InvertedResidual default
expand_ratio of 4 for all variants. Per the MobileViT paper (Mehta &
Rastegari, ICLR 2022, Table 3 / §3.3) all three variants use
L = [2, 4, 3] transformer layers per MobileViT block, and XXS uses an MV2
expansion factor of 2 (vs. 4 for XS/S).

This commit:
- Adds `transformer_depths` and `mv2_expand_ratio` keyword arguments to
  MobileViT.__init__, threading them through to each MobileViTBlock and
  InvertedResidual.
- Updates `mobile_vit_xxs`, `mobile_vit_xs`, `mobile_vit_s` factories to
  pass the paper-specified values.

Param-count check (1×3×256×256 input, 1000 classes):
  XXS: 0.93 M  ->  1.04 M   (paper ~1.3 M)
  XS:  ----    ->  2.07 M   (paper ~2.3 M)
  S:   ----    ->  5.20 M   (paper ~5.6 M)

Remaining gap (~0.2–0.4 M per variant) comes from this deploy version's
implementation choices (no dropout, separate bias-less Q/K/V projections
instead of a single bias-ful combined QKV linear), not from architectural
deviation.

runwangdl requested a review from Victor-Jung as a code owner May 14, 2026 16:32

Pre-commit hook in pulp-platform/Onnx4Deeploy CI runs psf/black 24.1.1 with
--line-length=100. The six new InvertedResidual(...) call sites with
expand_ratio=mv2_expand_ratio exceeded 100 chars; black wants them
broken across lines. No semantic change.