torch==2.5.1
transformers==4.47.0
datasets==3.1.0
numpy==1.26.4
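These pinned dependencies can be installed with pip, for example:

pip install torch==2.5.1 transformers==4.47.0 datasets==3.1.0 numpy==1.26.4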
Download the Hugging Face checkpoints of the LMMs and LLMs (Qwen2VL-7B-Instruct, Llava1.6-Mistral-7B, Qwen2.5-14B, Mistral-12B, Llama3.1-8B, and Qwen2.5-7B-Instruct) to the directory ./models/models_hf/xxx_hf/, e.g., ./models/models_hf/llama3_hf/8bf/, ./models/models_hf/qwen2vl_hf/7bf/, etc.
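One way to fetch a checkpoint into the expected local directory is with huggingface_hub; below is a minimal sketch for Llama3.1-8B (the repo_id shown is an assumption, so verify the exact repository name on the Hugging Face Hub before running):

from huggingface_hub import snapshot_download

# Download the Llama3.1-8B checkpoint into the directory layout described above.
# repo_id is an assumption -- check the exact name on the Hugging Face Hub.
snapshot_download(
    repo_id="meta-llama/Llama-3.1-8B",
    local_dir="./models/models_hf/llama3_hf/8bf/",
)

The other checkpoints can be downloaded the same way by swapping in the corresponding repo_id and target directory.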
- Download FHM to ./data/FHM/v1/ and place all images under ./data/FHM/v1/Images
- Download HarMeme annotations to ./data/HarMeme_V1/Annotations (including Harm-C and Harm-P) and images to ./data/HarMeme_V1/HarMeme_Images
- Download the CSV file of PrideMM to ./data/PrideMM/ and images to ./data/PrideMM/Images
- Download MultiOFF to ./data/MultiOFF/MultiOFF_Dataset
- Download the .tsv files of MAMI to ./data/MAMI/ and images to ./data/MAMI/MAMI_2022_images
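To sanity-check the layout before preprocessing, a small hypothetical helper like the one below can verify that the expected directories exist (the paths simply mirror the list above):

import os

# Expected dataset directories, mirroring the download instructions above.
EXPECTED_DIRS = [
    "./data/FHM/v1/Images",
    "./data/HarMeme_V1/Annotations",
    "./data/HarMeme_V1/HarMeme_Images",
    "./data/PrideMM/Images",
    "./data/MultiOFF/MultiOFF_Dataset",
    "./data/MAMI/MAMI_2022_images",
]

for path in EXPECTED_DIRS:
    status = "ok" if os.path.isdir(path) else "MISSING"
    print(f"{status:7s} {path}")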
With all data prepared, open preprocess.ipynb under each dataset directory and run the preprocessing code accordingly.
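The notebooks can also be executed non-interactively with nbconvert, assuming Jupyter is installed (the FHM path below is an assumption; substitute the notebook location for each dataset):

jupyter nbconvert --to notebook --execute --inplace ./data/FHM/preprocess.ipynb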
We provide shell-script templates in ./run_cmd/run_cmd.py for reproducing the results reported in our paper.
Simply run this command to evaluate all the small-scale LLMs on all the datasets used in the paper:
python ./run_cmd/run_cmd.py
Run this command to evaluate GPT-4-like models:
python ./run_cmd/run_gpt.py
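Evaluating GPT-4-like models presumably calls a hosted API, so an API key will likely be needed, e.g. via an environment variable such as:

export OPENAI_API_KEY=<your_key>

The exact variable name is an assumption; check ./run_cmd/run_gpt.py for how credentials are read.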
TBC.
