Skip to content

Latest commit

 

History

History
127 lines (92 loc) · 2.98 KB

File metadata and controls

127 lines (92 loc) · 2.98 KB

Changelog

All notable changes to EASI are documented in this file.


[v0.2.2] – 2026-04-27

Added

  • 3 new models:
    • SenseNova-SI-1.4-InternVL3-8B, SenseNova-SI-1.5-InternVL3-8B, SenseNova-SI-1.3-Qwen3-VL-8B
  • Unified CLI workflow:
    • Added scripts/submissions/run_easi_eval.py for one-command EASI-8 evaluation across VLMEvalKit and lmms-eval backends.
    • Added automated result collection, packaging, and optional leaderboard submission.

Changed

  • Refactored the unified evaluation script with backend adapters for VLMEvalKit and lmms-eval.
  • Added CLI options for backend selection, model arguments, accelerate control, and rerun behavior.

Fixed

  • Improved unified CLI stability and robustness, including lmms-eval multi-GPU execution, result extraction, archive generation, and VLMEvalKit judge handling.

[v0.2.1] – 2026-02-09

Added

  • 2 new benchmarks:
    • ERIQ
    • OSI-Bench
  • More lmms-eval benchmark verification entries.

Fixed

  • VLMEvalKit evaluation issues on MuirBench.

[v0.2.0] – 2026-01-16

  • New Backend Support: Integrated lmms-eval alongside VLMEvalKit, offering flexible evaluation options.
  • Expanded benchmark support: Added DSR-Bench.

[v0.1.5] – 2026-01-09

Added

  • 1 new benchmark:
    • STI-Bench
  • 2 new models:
    • SenseNova-SI-1.1-BAGEL-7B-MoT, SenseNova-SI-1.3-InternVL3-8B
  • Add Benchmark Verification Info

[v0.1.4] – 2025-12-19

Added

  • 4 new benchmarks:
    • SPBench, MMSI-Video-Bench, VSI-SUPER-Recall, VSI-SUPER-Count.

[v0.1.3] – 2025-12-12

Added

  • 3 new image benchmarks:
    • ERQA, RefSpatial-Bench, RoboSpatial-Home
  • Docker support improvements:
    • Added a generic EASI runtime Dockerfile
    • Added model-specific Dockerfiles for Cambrian-S and VLM3R

[v0.1.2] – 2025-12-08

Added

  • 6 new models:
    • SenseNova-SI 1.1: Qwen2.5-VL-3B / Qwen2.5-VL-7B / Qwen3-VL-8B
    • SenseNova-SI 1.2: InternVL3-8B
    • VLM-3R
    • BAGEL-7B-MoT
  • 4 new image benchmarks:
    • STAR-Bench
    • OmniSpatial
    • Spatial-Visualization-Benchmark
    • SPAR-Bench

New Feature

  • LLM-based answer extraction for selected EASI benchmarks
    • Enable via: --judge gpt-4o-1120
    • Routes to gpt-4o-2024-11-20

[v0.1.1] – 2025-11-21

Added

  • 9 Spatial Intelligence models (total expanded from 7 → 16):
    • SenseNova-SI: InternVL3-8B / InternVL3-2B
    • SpaceR-7B
    • VST-3B-SFT / VST-7B-SFT
    • Cambrian-S: 0.5B / 1.5B / 3B / 7B
  • New benchmark:
    • VSI-Bench-Debiased

[v0.1.0] – 2025-11-07

Initial Public Release

Added

  • 7 Spatial Intelligence models:
    • SenseNova-SI (InternVL3-8B / InternVL3-2B)
    • MindCube Series (3 variants)
    • SpatialLadder-3B
    • SpatialMLLM-4B
  • 6 Spatial Intelligence benchmarks:
    • Image: MindCube, ViewSpatial, EmbSpatial, MMSI
    • Video: VSI-Bench, SITE-Bench

Introduced

  • EASI standardized evaluation protocol (as described in the paper)
  • Unified VLMEvalKit-based evaluation pipeline