Skip to content

deema-A/FactCheckmate

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

9 Commits
 
 
 
 
 
 

Repository files navigation

♟️FactCheckmate: Preemptively Detecting and Mitigating Hallucinations in LMs

Paper

📌 Overview

Language models (LMs) hallucinate. Can we detect and mitigate hallucinations before they occur? We introduce FactCheckmate, a method that preemptively detects hallucinations by classifying hidden states before decoding. If hallucination is predicted, FactCheckmate intervenes by adjusting hidden states to enhance factuality. It provides insights into LMs' internal mechanisms, operates efficiently with minimal overhead, and outperforms post-hoc methods. Evaluations across various LM families (Llama, Mistral, Qwen, Gemma) and QA datasets demonstrate over 70% detection accuracy and a 34.4% improvement in factuality.

📊 Main Results

Preemptive allucination detection test accuracy. I+O indicates a “reactive” baseline that classifies the LMs’ hidden states produced over both input questions and output answers, while I preemptively classifies hallucinations based on the hidden states over only the inputs. −n indicates that the classifier only sees a prefix of the input excluding the last n tokens.

LM NQ (I+O) NQ (I) NQ (-1) NQ (-2) NQ (-3) MMLU (I+O) MMLU (I) MMLU (-1) MMLU (-2) MMLU (-3) MedMCQA (I+O) MedMCQA (I) MedMCQA (-1) MedMCQA (-2) MedMCQA (-3) GSM8K (I+O) GSM8K (I) GSM8K (-1) GSM8K (-2) GSM8K (-3)
Llama2-7B 72.8 71.8 71.1 68.1 65.7 91.7 91.9 91.7 91.7 91.7 77.0 72.9 72.9 72.9 74.5 65.8 66.0 66.0 63.5 63.5
Llama2-13B 74.4 72.0 70.6 71.4 69.7 94.0 93.0 84.1 92.7 85.7 76.0 78.3 78.6 78.3 74.2 68.4 69.1 68.4 66.8 63.8
Llama3-8B 74.9 70.2 68.5 66.8 66.8 93.8 94.0 87.5 87.1 77.3 77.1 76.3 74.3 71.2 67.3 71.3 72.5 72.9 71.3 66.2
Llama3.1-8B 74.3 73.1 70.9 68.9 68.1 94.5 92.3 86.3 80.0 78.0 78.4 76.2 74.9 73.6 69.4 72.3 69.1 61.2 60.2 60.6
Mistral-7B 73.3 72.5 71.4 71.1 70.3 93.2 90.2 83.0 82.5 82.8 77.9 75.4 75.2 73.9 72.8 69.4 70.0 70.0 71.8 71.8
Gemma-7B 80.2 74.5 74.4 74.2 73.9 92.2 96.9 91.3 81.5 89.6 77.0 77.5 74.7 75.2 75.2 70.9 67.4 67.0 67.4 67.8
Qwen2.5-7B 76.1 74.5 72.7 71.2 69.2 94.3 94.0 71.3 85.4 74.4 78.9 76.6 76.9 75.7 74.9 67.0 67.2 67.2 67.2 67.0
Llama2-7B-chat 76.2 74.5 67.5 69.4 69.2 94.8 90.9 79.1 83.3 83.0 81.1 79.3 79.3 79.0 79.3 72.3 73.6 72.2 72.9 72.2
Llama2-13B-chat 74.8 72.4 70.7 70.9 68.5 93.8 93.9 80.1 78.6 92.1 81.3 73.3 70.8 70.0 71.9 72.3 71.9 71.9 71.9 71.9
Llama3-8B-Instruct 81.5 78.6 77.2 76.4 75.0 93.8 95.6 87.4 85.0 79.5 81.4 77.3 72.2 71.1 68.8 74.7 74.3 74.3 74.3 74.3
Llama3.1-8B-Instruct 83.3 74.5 71.3 70.9 66.7 93.1 91.8 86.4 85.4 80.1 81.7 78.8 76.5 71.7 70.1 76.2 78.4 78.0 78.0 78.4
Llama3-70B-Instruct 81.0 77.1 73.3 69.6 65.9 87.6 79.6 76.5 76.4 73.4 74.7 67.6 64.5 63.6 61.0 82.7 78.8 72.5 71.3 69.4

Citation

If you find our work useful, please feel free to cite our work:

@misc{alnuhait2024factcheckmatepreemptivelydetectingmitigating,
      title={FactCheckmate: Preemptively Detecting and Mitigating Hallucinations in LMs}, 
      author={Deema Alnuhait and Neeraja Kirtane and Muhammad Khalifa and Hao Peng},
      year={2024},
      eprint={2410.02899},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2410.02899}, 
}

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages