This repository contains the code and resources necessary to reproduce our experiments on detecting hallucinations in multi-modal large language models using evidential conflict and semantic entropy. Our research focuses on evaluating model uncertainty and its relation to hallucination occurrences in LLaVA and mPLUG-Owl series models.
This section covers hardware and software requirements.
Generally speaking, our experiments require modern hardware suited to Large Vision-Language Models (LVLMs); replicating them within a reasonable timeframe would be difficult without a GPU. We used NVIDIA RTX 3090 GPUs for all tests, evaluating LLaVA models at the 7B, 13B, and 34B scales as well as mPLUG-Owl2 and mPLUG-Owl3. Specifically, the 7B/13B models were LLaVA-v1.5, while the 34B model used LLaVA-v1.6 (v1.5 has no 34B variant). For faster inference, we recommend 4-bit quantization, which significantly reduces processing time while maintaining acceptable accuracy.
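As a sketch of a 4-bit setup, assuming the bitsandbytes-backed quantization support in recent Hugging Face Transformers (the checkpoint name and generic loader below are illustrative; the LLaVA and mPLUG-Owl repositories provide their own loading utilities):

```python
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# 4-bit quantization config (requires the bitsandbytes package);
# NF4 with double quantization is a common choice.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
)

# Illustrative only -- substitute the checkpoint and loader you are using.
model = AutoModelForCausalLM.from_pretrained(
    "liuhaotian/llava-v1.5-7b",
    quantization_config=bnb_config,
    device_map="auto",
)
```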
Our experiments utilized Python 3.10, PyTorch 2.0.1, and the Ubuntu 20.04.4 LTS operating system.
For mPLUG-Owl2 and mPLUG-Owl3, please refer to the installation guide at the following URL: https://github.com/X-PLUG/mPLUG-Owl/tree/main
For LLaVA, please refer to the installation guide at the following URL: https://github.com/haotian-liu/LLaVA/tree/main
All the required packages can be installed via conda.
To run inference with these models, execute the corresponding model's inference script in the infer directory.
For example:
python mplug_owl2_infer.py
When using Evidential Conflict to quantify the uncertainty of large language models, model weight files are required. For ease of computation, we have placed these weight files on Hugging Face at: https://huggingface.co/datasets/thuang5288/PRE-HAL/tree/main/model_weights.
Since our experiments involve extracting information from the Transformers decoder, reproducing our work requires modifying the Transformers source code so that inference returns both the model's output probability distribution and the decoder's final hidden-layer states.
Taking mPLUG-Owl2 as an example, first locate the Transformers source code. On our machine it is at
/data/username/miniconda3/envs/mplug_owl2/lib/python3.10/site-packages/transformers/generation/utils.py.
Next, determine which sampling method the model uses and modify that function. For example, mPLUG-Owl2 uses the sample function, in which we extract and return information such as the next-token probability, the full probability distribution, and the decoder's last hidden states. Which function to modify should be determined from the model's official documentation or by stepping into the call with a debugger.
Please refer to our provided transformers/generation/utils.py file for detailed modification.
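The shape of the modification can be illustrated with a toy greedy-decoding loop (this is not the actual patch; the decoder step below is a stand-in, and the real change goes inside the sample function of transformers/generation/utils.py as provided in this repository):

```python
import numpy as np

def softmax(logits):
    z = np.exp(logits - logits.max())
    return z / z.sum()

def toy_decoder_step(token_id, rng):
    """Stand-in for one decoder forward pass: returns logits over a
    toy 5-token vocabulary and a fake last-hidden-layer state."""
    logits = rng.standard_normal(5)
    hidden = rng.standard_normal(8)
    return logits, hidden

def sample_with_extras(max_new_tokens=4, seed=0):
    """Decoding loop that, like the modified `sample`, also collects the
    per-step probability distributions and final-layer hidden states."""
    rng = np.random.default_rng(seed)
    tokens, probs, hiddens = [], [], []
    tok = 0
    for _ in range(max_new_tokens):
        logits, hidden = toy_decoder_step(tok, rng)
        p = softmax(logits)
        tok = int(p.argmax())
        tokens.append(tok)
        probs.append(p)         # full distribution over the next token
        hiddens.append(hidden)  # decoder last hidden state at this step
    return tokens, np.stack(probs), np.stack(hiddens)

tokens, probs, hiddens = sample_with_extras()
print(len(tokens), probs.shape, hiddens.shape)
```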
This repository is divided into five folders: "infer", "infer_results", "measures", "models", and "model_weights". The "infer" folder contains the code for model inference. The "infer_results" folder stores the outputs produced by running the inference scripts. The "model_weights" folder stores the last-hidden-layer weights of each model. The "models" folder contains the files required to deploy each model. The "measures" folder contains code for processing model inference results, including code for verifying the correctness of model outputs using GPT-4o, code for calculating the various uncertainty metrics, and code for computing evaluation metrics such as AUROC and ACC on the final results.
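For reference, AUROC for hallucination detection can be computed from uncertainty scores and binary labels via the rank-sum (Mann-Whitney U) formulation; the sketch below is a minimal self-contained version (in practice a library routine such as scikit-learn's roc_auc_score does the same):

```python
import numpy as np

def auroc(scores, labels):
    """AUROC as the probability that a random positive (hallucinated)
    example receives a higher uncertainty score than a random negative."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=int)
    order = scores.argsort()
    ranks = np.empty(len(scores))
    ranks[order] = np.arange(1, len(scores) + 1)
    # average the ranks of tied scores
    for s in np.unique(scores):
        mask = scores == s
        ranks[mask] = ranks[mask].mean()
    n_pos = labels.sum()
    n_neg = len(labels) - n_pos
    u = ranks[labels == 1].sum() - n_pos * (n_pos + 1) / 2
    return u / (n_pos * n_neg)

# uncertainty scores vs. hallucination labels (1 = hallucinated)
print(auroc([0.9, 0.8, 0.3, 0.1], [1, 1, 0, 0]))  # perfect separation: 1.0
```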
If you find this work useful, please consider citing:
@article{huang2025visual,
  title={Visual hallucination detection in large vision-language models via evidential conflict},
  author={Huang, Tao and Liu, Zhekun and Wang, Rui and Zhang, Yang and Jing, Liping},
  journal={International Journal of Approximate Reasoning},
  pages={109507},
  year={2025},
  publisher={Elsevier}
}
If you have any questions, please feel free to email thuang@bjtu.edu.cn.