This repository contains the implementation and experiments for VQA4Mix, focusing on visual question answering (VQA) in mixed datasets. It includes training scripts, inference scripts, and the necessary steps to reproduce the reported results.
- Python >= 3.8
- PyTorch >= 1.9.0
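To confirm the environment meets these minimums before installing anything else, a small version-check sketch can help. This is an optional helper, not part of the repository; the `meets_min` function name is ours.

```python
import sys

def meets_min(version_str, minimum):
    """Compare a dotted version string (e.g. '1.13.1+cu117') against a
    minimum version given as a tuple of ints (e.g. (1, 9, 0))."""
    parts = []
    for piece in version_str.split("+")[0].split("."):
        digits = "".join(ch for ch in piece if ch.isdigit())
        parts.append(int(digits) if digits else 0)
    return tuple(parts) >= minimum

# Python >= 3.8
assert meets_min("%d.%d.%d" % sys.version_info[:3], (3, 8)), "Python >= 3.8 required"

# PyTorch >= 1.9.0 (skip the check if PyTorch is not installed yet)
try:
    import torch
    assert meets_min(torch.__version__, (1, 9, 0)), "PyTorch >= 1.9.0 required"
except ImportError:
    print("PyTorch not installed yet; install it before training.")
```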
- Clone this repository:
  ```shell
  git clone https://github.com/godlikeS97/VQA4Mix.git
  cd VQA4Mix
  ```
- Create and activate a virtual environment:
  ```shell
  python3 -m venv vqa4mix_env
  source vqa4mix_env/bin/activate
  ```
- Download the datasets:
  - Specify the datasets used (e.g., COCO, ArtCap).
- Unzip the downloaded datasets and place them in the `data/` directory:
  ```shell
  mkdir data
  mv <downloaded_dataset> data/
  ```
- Preprocess the data by running the preprocessing Jupyter notebook under the corresponding folder in `demo/`.
- Accuracy on multiple-choice questions (MCQ)
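As a reference for what MCQ accuracy measures, here is a minimal sketch: the fraction of questions where the model's chosen option matches the ground-truth answer. This is illustrative only and does not reproduce the repository's actual evaluation code; the function name and label format are assumptions.

```python
def mcq_accuracy(predictions, answers):
    """predictions, answers: equal-length lists of option labels (e.g. 'A'-'D').
    Returns the fraction of positions where the predicted label is correct."""
    if not predictions:
        return 0.0
    correct = sum(p == a for p, a in zip(predictions, answers))
    return correct / len(predictions)

# Example: 3 of 4 answers correct -> 0.75
print(mcq_accuracy(["A", "C", "B", "D"], ["A", "C", "D", "D"]))
```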
Evaluating the Limitations of Generative Models in Image Captioning
For questions or issues, please create an issue in this repository or contact the authors.
