Commit 84012c9: "update val" (1 parent: e700458)

1 file changed: README.md (47 additions & 5 deletions)
## 💡 News

- `2024/07/18`: We release the leaderboard of the `VAL` split. Download the dataset [here](https://huggingface.co/datasets/Kaining/MMT-Bench).
- `2024/06/25`: We release the `ALL` and `VAL` splits.
- `2024/06/25`: Evaluation of the `ALL` split is hosted on [EvalAI](https://eval.ai/web/challenges/challenge-page/2328/overview).
- `2024/06/17`: OpenCompass [VLMEvalKit](https://github.com/open-compass/VLMEvalKit) now supports MMT-Bench! **We strongly recommend using [VLMEvalKit](https://github.com/open-compass/VLMEvalKit) for its useful features and ready-to-use LVLM implementations.**
- `2024/05/01`: MMT-Bench is accepted by ICML 2024. See you in Vienna! 🇦🇹🇦🇹🇦🇹
- `2024/04/26`: We release the evaluation code and the `VAL` split.
- `2024/04/24`: The technical report of [MMT-Bench](https://arxiv.org/abs/2404.16006) is released! Check out our [project page](https://mmt-bench.github.io/)!
## Introduction
MMT-Bench is a comprehensive benchmark designed to assess LVLMs across massive multimodal tasks requiring expert knowledge and deliberate visual recognition, localization, reasoning, and planning. MMT-Bench comprises 31,325 meticulously curated multiple-choice visual questions from various multimodal scenarios such as vehicle driving and embodied navigation, covering 32 core meta-tasks and 162 subtasks in multimodal understanding.
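Since every question is multiple-choice, per-task scoring reduces to option accuracy. A minimal hypothetical sketch (the helper name and option-letter format are assumptions for illustration, not the official evaluation code):

```python
def accuracy(preds, answers):
    """Fraction of questions where the predicted option letter matches."""
    assert len(preds) == len(answers)
    return sum(p == a for p, a in zip(preds, answers)) / len(answers)

# hypothetical predictions vs. ground-truth option letters
print(accuracy(["A", "C", "B", "D"], ["A", "B", "B", "D"]))  # → 0.75
```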
## 🏆 Leaderboard

### Val Set

| Rank | Model                         | Score |
|------|-------------------------------|-------|
| 1    | InternVL2-40B                 | 66.9  |
| 2    | GPT4o                         | 65.4  |
| 3    | GeminiPro1-5                  | 64.5  |
| 4    | GPT4V-20240409-HIGH           | 64.3  |
| 4    | InternVL-Chat-V1-2            | 64.3  |
| 6    | Claude3-Opus                  | 62.5  |
| 7    | InternVL2-26B                 | 60.6  |
| 8    | LLavA-next-Yi-34B             | 60.4  |
| 9    | InternVL2-8B                  | 60.0  |
| 10   | QwenVLMax                     | 59.7  |
| 11   | GeminiProVision               | 59.1  |
| 12   | Mini-InternVL-Chat-4B-V1-5    | 58.4  |
| 13   | XComposer2                    | 56.3  |
| 14   | Yi-VL-6B                      | 54.7  |
| 15   | Phi-3-Vision                  | 54.5  |
| 15   | TransCore-M                   | 54.5  |
| 17   | deepseek-vl-7B                | 54.0  |
| 17   | Yi-VL-34B                     | 54.0  |
| 19   | LLavA-internlm2-7B            | 53.4  |
| 19   | Monkey-Chat                   | 53.4  |
| 21   | LLavA-next-vicuna-13B         | 52.4  |
| 22   | LLavA-v1.5-13B                | 52.1  |
| 23   | sharegpt4v-7B                 | 51.6  |
| 24   | LLavA-v1.5-13B-xtuner         | 50.7  |
| 25   | mPLUG-Owl2                    | 50.5  |
| 26   | LLavA-next-vicuna-7B          | 50.4  |
| 27   | LLavA-v1.5-7B                 | 49.6  |
| 28   | LLavA-v1.5-7B-xtuner          | 49.3  |
| 29   | LLavA-internlm-7B             | 48.3  |
| 30   | Qwen-Chat                     | 47.9  |
| 30   | sharecaptioner                | 47.9  |
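The tied entries above (e.g. two models at rank 4 and none at rank 5) follow standard competition ranking: equal scores share a rank and the next rank is skipped. A small sketch reproducing the rank column from a descending score list (illustrative only, not the official scoring code):

```python
def competition_rank(scores):
    """Assign "1224"-style competition ranks to scores sorted in descending order:
    equal scores share a rank, and the rank after a tie is skipped."""
    ranks = []
    for i, s in enumerate(scores):
        if i > 0 and s == scores[i - 1]:
            ranks.append(ranks[-1])  # tie: reuse previous rank
        else:
            ranks.append(i + 1)      # rank is 1-based position
    return ranks

# the top six scores from the Val Set table above
print(competition_rank([66.9, 65.4, 64.5, 64.3, 64.3, 62.5]))  # → [1, 2, 3, 4, 4, 6]
```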
### Full Set

| Rank | Model | Score |
