Visual Hallucination Detection in Large Vision-Language Models via Evidential Conflict

This repository contains the code and resources needed to reproduce our experiments on detecting hallucinations in multi-modal large language models using evidential conflict and semantic entropy. Our research focuses on evaluating model uncertainty and its relationship to hallucinations in the LLaVA and mPLUG-Owl model series.

System Requirements

Below we describe the hardware and software requirements.

Hardware Dependencies

Generally speaking, our experiments require modern hardware suited to running Large Vision-Language Models (LVLMs); replicating them within a reasonable timeframe would be challenging without a GPU. We used NVIDIA RTX 3090 GPUs for all experiments, evaluating LLaVA models at the 7B, 13B, and 34B scales as well as mPLUG-Owl2 and mPLUG-Owl3. Specifically, the 7B and 13B models were LLaVA-v1.5, while the 34B model was LLaVA-v1.6 (v1.5 has no 34B variant). For faster inference, we recommend 4-bit quantization, which significantly reduces processing time while maintaining acceptable accuracy.
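Although our experiments use the model repositories' own loading code (see the installation guides below), the following is a minimal sketch of what 4-bit loading might look like via Hugging Face transformers and bitsandbytes; the checkpoint name, dtype, and device placement are illustrative assumptions, not our exact configuration.

```python
# Hedged sketch: 4-bit quantized loading of a LLaVA checkpoint.
# "llava-hf/llava-1.5-7b-hf" is an assumed Hugging Face checkpoint name.
import torch
from transformers import BitsAndBytesConfig, LlavaForConditionalGeneration

quant_config = BitsAndBytesConfig(
    load_in_4bit=True,                     # store weights in 4 bits to cut memory
    bnb_4bit_compute_dtype=torch.float16,  # compute in fp16 on an RTX 3090
)

model = LlavaForConditionalGeneration.from_pretrained(
    "llava-hf/llava-1.5-7b-hf",
    quantization_config=quant_config,
    device_map="auto",                     # spread layers over available GPUs
)
```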

Software Dependencies

Our experiments utilized Python 3.10, PyTorch 2.0.1, and the Ubuntu 20.04.4 LTS operating system.

Installation Guide

mPLUG-Owl2

Please refer to the installation guide at the following URL: https://github.com/X-PLUG/mPLUG-Owl/tree/main

mPLUG-Owl3

Please refer to the installation guide at the following URL: https://github.com/X-PLUG/mPLUG-Owl/tree/main

LLaVA

Please refer to the installation guide at the following URL: https://github.com/haotian-liu/LLaVA/tree/main

All the required packages can be installed via conda.

To run inference with these models, execute the corresponding model's inference script in the infer directory. For example:

python mplug_owl2_infer.py

Quantifying the uncertainty of large language models with Evidential Conflict requires model weight files. For ease of computation, we have placed these weight files on Hugging Face at: https://huggingface.co/datasets/thuang5288/PRE-HAL/tree/main/model_weights.
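The weight files can also be fetched programmatically; a minimal sketch using huggingface_hub (the local directory name is an illustrative choice):

```python
# Hedged sketch: downloading the model_weights folder from the PRE-HAL
# dataset repository. "./model_weights" is an illustrative target directory.
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="thuang5288/PRE-HAL",
    repo_type="dataset",                 # the weights live in a dataset repo
    allow_patterns=["model_weights/*"],  # fetch only the weight files
    local_dir="./model_weights",
)
```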

Modifying the Transformers source code

Since our experiments involve extracting information from the Transformers decoder, reproducing our work requires modifying the Transformers source code so that the model's output probability distribution and the decoder's final hidden-layer states are returned together during inference.

Taking mPLUG-Owl2 as an example, first locate the Transformers source code. On our machine, it is at

/data/username/miniconda3/envs/mplug_owl2/lib/python3.10/site-packages/transformers/generation/utils.py.

Next, determine the sampling method the model uses and enter that function. For mPLUG-Owl2, for example, enter the sample function and extract the next-token probability, the full probability distribution, and the decoder's last hidden states so they can be returned. Which function to modify should be determined from the model's official documentation or by stepping into the code with a debugger.

Please refer to our provided transformers/generation/utils.py file for detailed modification.
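As a rough illustration of the information the patched code exposes (not a substitute for the provided utils.py), recent transformers versions can return similar tensors through flags on generate; `model` and `inputs` below are assumed to come from a model's own loading and preprocessing code:

```python
# Hedged sketch: recovering per-step probability distributions and the
# decoder's last hidden states via generate() flags. This only illustrates
# the kind of information our patched sample() returns.
import torch

outputs = model.generate(
    **inputs,
    do_sample=True,
    max_new_tokens=64,
    return_dict_in_generate=True,  # return a structured GenerateOutput
    output_scores=True,            # per-step next-token logits
    output_hidden_states=True,     # per-step hidden states for every layer
)

# Probability distribution over the vocabulary at each decoding step.
step_probs = [torch.softmax(s, dim=-1) for s in outputs.scores]

# Final decoder layer's hidden state at each step ([-1] picks the last layer).
last_hidden = [step[-1] for step in outputs.hidden_states]
```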

Repository Structure

This repository is divided into five folders: infer, infer_results, measures, models, and model_weights.

- infer: code for model inference.
- infer_results: results generated by running the inference scripts.
- model_weights: weights of the last hidden layer of each model.
- models: files required to deploy each model.
- measures: code for processing the inference results, including verifying the correctness of model outputs with GPT-4o, computing the various uncertainty metrics, and computing evaluation metrics such as AUROC and ACC on the final results.
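For reference, here is a minimal sketch of the final evaluation step, assuming binary hallucination labels and scalar uncertainty scores (both arrays are illustrative stand-ins for the outputs of the measures pipeline):

```python
# Hedged sketch: evaluating hallucination detection with AUROC and accuracy.
import numpy as np
from sklearn.metrics import accuracy_score, roc_auc_score

labels = np.array([0, 1, 0, 1, 1])            # 1 = hallucinated response
scores = np.array([0.1, 0.8, 0.3, 0.6, 0.9])  # uncertainty, higher = riskier

auroc = roc_auc_score(labels, scores)
acc = accuracy_score(labels, scores > 0.5)    # 0.5 threshold for illustration
print(f"AUROC={auroc:.3f}  ACC={acc:.3f}")
```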

Citation

If you find this work useful, please consider citing:

@article{huang2025visual,
  title={Visual hallucination detection in large vision-language models via evidential conflict},
  author={Huang, Tao and Liu, Zhekun and Wang, Rui and Zhang, Yang and Jing, Liping},
  journal={International Journal of Approximate Reasoning},
  pages={109507},
  year={2025},
  publisher={Elsevier}
}

Contact

Please feel free to email thuang@bjtu.edu.cn.
