PyTorch implementation of *Tips and Tricks for Visual Question Answering: Learnings from the 2017 Challenge* by Teney et al.
- Python 2.7+
- NumPy
- PyTorch
- tqdm (only for visualizing preprocessing progress)
- nltk (together with the Stanford Tokenizer, to tokenize questions)
- For questions and answers, go to the `data/` folder and execute `preproc.py` directly.
- You'll need to install the Stanford Tokenizer; follow the instructions on its page.
- The tokenizing step may take up to 36 hours to process the training questions (even on a Xeon E5 CPU); writing pure Java code to tokenize them should be a lot faster, since Python's nltk only calls the Java binary and Python itself is slow.
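For a quick sanity check before committing to the full Stanford run, a rough pure-Python tokenizer can approximate the output (this is an illustration only, not the Stanford Tokenizer that `preproc.py` expects; `simple_tokenize` is a hypothetical helper):

```python
import re

def simple_tokenize(question):
    # Rough approximation of the tokenized form: lowercase, then split
    # into alphanumeric runs and single punctuation characters.
    return re.findall(r"[a-z0-9]+|[^\sa-z0-9]", question.lower())

print(simple_tokenize("What color is the dog's collar?"))
```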
- For image features, slightly modify this code to convert the tsv file to a npy file `coco_features.npy` that contains a list of dictionaries, with the key being the image id and the value being the feature (shape: 36, 2048).
- Download and extract GloVe to the `data/` folder as well.
- Now we should be able to train. Make sure the `data/` folder contains at least:
  - glove.6B.300d.txt
  - vqa_train_final.json
  - coco_features.npy
  - train_q_dict.p
  - train_a_dict.p
- (Update) For convenience, here is the link to the tokenized questions `vqa_train_toked.json` and `vqa_val_toked.json`; make sure you run `data/preproc.py` to generate `vqa_train_final.json`, `train_q_dict.p`, etc.
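The tsv-to-npy conversion can be sketched as below. The column names follow the common bottom-up-attention tsv release and the base64-encoded float32 layout is an assumption; adjust both to match your actual dump:

```python
import base64
import csv
import sys

import numpy as np

# tsv rows can be very long (one base64 blob per image).
csv.field_size_limit(sys.maxsize)

# Assumed column layout of the bottom-up-attention tsv dump.
FIELDS = ['image_id', 'image_w', 'image_h', 'num_boxes', 'boxes', 'features']

def tsv_to_npy(tsv_path, out_path):
    entries = []
    with open(tsv_path) as f:
        for row in csv.DictReader(f, delimiter='\t', fieldnames=FIELDS):
            # Features are base64-encoded float32, one row per detected box.
            feat = np.frombuffer(base64.b64decode(row['features']),
                                 dtype=np.float32)
            feat = feat.reshape(int(row['num_boxes']), -1)  # e.g. (36, 2048)
            entries.append({int(row['image_id']): feat})
    np.save(out_path, entries)  # list of {image_id: feature} dicts
```

Reading the result back requires `np.load(out_path, allow_pickle=True)`, since the saved object is a list of dictionaries rather than a plain array.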
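Once extracted, `glove.6B.300d.txt` is plain text (one token followed by 300 floats per line), so it can be parsed directly; a minimal loader sketch (`load_glove` is a hypothetical helper, and `preproc.py` may already handle this):

```python
import numpy as np

def load_glove(path='data/glove.6B.300d.txt'):
    """Parse GloVe's text format: one token per line, then its vector."""
    embeddings = {}
    with open(path, encoding='utf-8') as f:
        for line in f:
            parts = line.rstrip().split(' ')
            embeddings[parts[0]] = np.asarray(parts[1:], dtype=np.float32)
    return embeddings
```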
Use default parameters:

```
python main.py --train
```

Train from a previous checkpoint:

```
python main.py --train --modelpath=/path/to/saved.pth.tar
```

Check out tunable parameters:

```
python main.py
```

Evaluate:

```
python main.py --eval
```

This will generate `result.json` (validation set only); the format follows the VQA evaluation format.
- The default classifier is a softmax classifier; a sigmoid multi-label classifier is also implemented, but I was not able to train successfully with it.
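The difference between the two heads, in a minimal NumPy sketch (illustrative only, not the training code):

```python
import numpy as np

def softmax(z):
    # Subtract the row max for numerical stability.
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

logits = np.array([[2.0, 0.5, -1.0]])
# Softmax head: scores sum to 1, so exactly one answer per question.
p_soft = softmax(logits)
# Sigmoid head: independent per-answer scores, matching the paper's
# multi-label formulation with soft ground-truth targets in [0, 1].
p_multi = sigmoid(logits)
```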
- Training for 50 epochs reaches around 64.42% training accuracy.
- For the output classifier, I did not use the pretrained weights, since they are hard to retrieve; instead I followed Eq. 5 in the paper.
- To prepare validation data, you need to uncomment some lines of code in `data/preproc.py`.
- `coco_features.npy` is a really fat file (34 GB including train + val image features); you can split it and modify the data loading mechanism in `loader.py`.
- This code is tested with train = train and eval = val; no test data is included.
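One way to avoid holding the 34 GB file in memory is to split it into one small file per image and load features lazily. This is only a sketch under the list-of-dicts layout described above; `split_features` and `load_feature` are hypothetical helpers, and `loader.py` would need matching changes:

```python
import os

import numpy as np

def split_features(npy_path, out_dir):
    """Split the monolithic coco_features.npy into one .npy per image id."""
    os.makedirs(out_dir, exist_ok=True)
    entries = np.load(npy_path, allow_pickle=True)  # list of {image_id: feat}
    for entry in entries:
        for image_id, feat in entry.items():
            np.save(os.path.join(out_dir, f'{image_id}.npy'), feat)

def load_feature(out_dir, image_id):
    # Called from the dataset's __getitem__, so only one image's
    # features are resident in memory at a time.
    return np.load(os.path.join(out_dir, f'{image_id}.npy'))
```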
- Issues are welcome!
