
DASH-B

Object Hallucination Benchmark for Vision Language Models (VLMs)

from the paper

DASH: Detection and Assessment of Systematic Hallucinations of VLMs

Leaderboard | Download | Model Evaluation | Hugging Face | Citation

Dataset

The benchmark consists of 2682 images covering 70 different objects. The query used is "Can you see a object in this image. Please answer only with yes or no.", where object is replaced by the name of the queried object. 1341 of the images do not contain the corresponding object but trigger object hallucinations; they were retrieved using the DASH pipeline. The remaining 1341 images do contain the objects.
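
For illustration, a minimal sketch of how the query can be built per object and how a model response can be mapped to a yes/no decision (the template is the one quoted above; the parsing rule is an assumption, not the benchmark's official scoring code):

QUERY_TEMPLATE = "Can you see a {object} in this image. Please answer only with yes or no."

def build_query(object_name: str) -> str:
    # Substitute the queried object into the benchmark template.
    return QUERY_TEMPLATE.format(object=object_name)

def parse_answer(response: str) -> bool:
    # Assumed rule: treat any response starting with "yes" as a yes.
    return response.strip().lower().startswith("yes")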

Examples of images that do not contain the object:

[Figure: examples from the benchmark]

Download images

You can either use Hugging Face Datasets:

from datasets import load_dataset

dataset = load_dataset("YanNeu/DASH-B")
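
For orientation, a minimal sketch of iterating over the dataset (the split name "test" and the column names image, object, and label are assumptions about the schema; check the dataset card on Hugging Face for the actual fields):

from datasets import load_dataset

dataset = load_dataset("YanNeu/DASH-B")

# Split and column names below are assumptions -- consult the dataset card.
for sample in dataset["test"]:
    image = sample["image"]          # PIL image
    object_name = sample["object"]   # queried object name
    label = sample["label"]          # whether the object is present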

Or download the archives and unzip them into the images folder:

cd images
wget https://nc.mlcloud.uni-tuebingen.de/index.php/s/HJKbBWpgLFz4rN5/download/neg.zip
wget https://nc.mlcloud.uni-tuebingen.de/index.php/s/ppZeYNayJXiogjP/download/pos.zip
unzip neg.zip
unzip pos.zip
rm neg.zip
rm pos.zip
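
After unzipping, the images should end up in two subfolders (assuming the archives unpack into folders named neg and pos):

images/
├── neg/   # images that do not contain the queried object
└── pos/   # images that do contain the queried object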

Evaluating VLMs on DASH-B

(a) Using Hugging Face Datasets

We provide a template script in src/evaluate_hf.py.

(b) Without Hugging Face Datasets (requires downloading the images manually)

You can easily evaluate a custom VLM by implementing the two functions

def load_vlm(self, *args, **kwargs):
    """
    Loads the model and processors/tokenizers required for inference.

    """
    # Implement loading of model and processors/tokenizers here
    raise NotImplementedError()

and

def evaluate_dataset(self, data_dicts, *args, **kwargs):
    """
    Args:
        data_dicts: list of dictionaries, each containing the following keys:
            - image_path: path to the image
            - prompt: text query 
            
    Returns a list of model response strings for each image-query in the dictionaries.
    """
    # Implement model evaluation here
    raise NotImplementedError()

in src/evaluate.py and running

CUDA_VISIBLE_DEVICES=0 python src/evaluate.py --vlm_name custom

See src/vlms for several implementations used in the paper.
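
For illustration, a minimal sketch of the two functions for a transformers-style VLM; the model ID llava-hf/llava-1.5-7b-hf and the chat-template handling are assumptions, adapt them to your model:

import torch
from PIL import Image
from transformers import AutoProcessor, LlavaForConditionalGeneration

def load_vlm(self, *args, **kwargs):
    # Assumed example checkpoint; replace with your own VLM.
    self.processor = AutoProcessor.from_pretrained("llava-hf/llava-1.5-7b-hf")
    self.model = LlavaForConditionalGeneration.from_pretrained(
        "llava-hf/llava-1.5-7b-hf", torch_dtype=torch.float16, device_map="auto"
    )

def evaluate_dataset(self, data_dicts, *args, **kwargs):
    responses = []
    for d in data_dicts:
        image = Image.open(d["image_path"]).convert("RGB")
        # Wrap the text query in the model's chat format.
        messages = [{"role": "user", "content": [
            {"type": "image"},
            {"type": "text", "text": d["prompt"]},
        ]}]
        text = self.processor.apply_chat_template(messages, add_generation_prompt=True)
        inputs = self.processor(images=image, text=text, return_tensors="pt").to(self.model.device)
        output = self.model.generate(**inputs, max_new_tokens=10, do_sample=False)
        # Decode only the newly generated tokens.
        new_tokens = output[0][inputs["input_ids"].shape[1]:]
        responses.append(self.processor.decode(new_tokens, skip_special_tokens=True).strip())
    return responses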

(c) Reproduce DASH results

Install the requirements as explained in the DASH repository.

Example for Paligemma-3B:

CUDA_VISIBLE_DEVICES=0 python src/evaluate.py --vlm_name dash_paligemma 

Available VLMs:

available_models = [
    "AIDC-AI/Ovis2-1B",
    "AIDC-AI/Ovis2-2B",
    "AIDC-AI/Ovis2-4B",
    "AIDC-AI/Ovis2-8B",
    "dash_paligemma",
    "dash_llava1.6vicuna",
    "dash_llava1.6mistral",
    "dash_llava1.6llama",
    "dash_llava_onevision",
    "dash_paligemma2-3b",
    "dash_paligemma2-10b",
    "OpenGVLab/InternVL2_5-8B",
    "OpenGVLab/InternVL2_5-26B",
    "OpenGVLab/InternVL2_5-38B",
    "OpenGVLab/InternVL2_5-78B",
    "OpenGVLab/InternVL2_5-8B-MPO",
    "OpenGVLab/InternVL2_5-26B-MPO",
]
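
The Hugging Face model IDs in this list appear to be passed directly as --vlm_name (an assumption based on the examples above), e.g.:

CUDA_VISIBLE_DEVICES=0 python src/evaluate.py --vlm_name OpenGVLab/InternVL2_5-8B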

Citation

@inproceedings{augustin2025dash,
    title={DASH: Detection and Assessment of Systematic Hallucinations of VLMs},
    author={Augustin, Maximilian and Neuhaus, Yannic and Hein, Matthias},
    booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision},
    year={2025}
}
