PyTorch implementation of *Tips and Tricks for Visual Question Answering: Learnings from the 2017 Challenge* by Teney et al.
- Python 2.7+
- NumPy
- PyTorch
- tqdm (only for visualizing preprocessing progress)
- nltk (together with the Stanford Tokenizer, to tokenize questions)
- For questions and answers, go to the `data/` folder and execute `preproc.py` directly.
- You'll need to install the Stanford Tokenizer; follow the instructions on its page.
- The tokenizing step may take up to 36 hours to process the training questions (even on a Xeon E5 CPU); writing pure Java code to tokenize them should be a lot faster, since Python's nltk only calls the Java binary and Python itself is slow.
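For a quick sanity check before committing to the full Stanford run, a rough pure-Python tokenizer can approximate the output (this is an illustration only, not the Stanford Tokenizer that `preproc.py` expects; `simple_tokenize` is a hypothetical helper):

```python
import re

def simple_tokenize(question):
    # Rough approximation of the tokenized form: lowercase, then split
    # into alphanumeric runs and single punctuation characters.
    return re.findall(r"[a-z0-9]+|[^\sa-z0-9]", question.lower())

print(simple_tokenize("What color is the dog's collar?"))
```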
- For image features, slightly modify this code to convert the tsv file to a npy file `coco_features.npy` that contains a list of dictionaries, with the key being the image id and the value being the feature (shape: 36, 2048).
- Download and extract GloVe to the `data/` folder as well.
- Now we should be able to train. Make sure the `data/` folder contains at least:
  - glove.6B.300d.txt
  - vqa_train_final.json
  - coco_features.npy
  - train_q_dict.p
  - train_a_dict.p
- (Update) For convenience, here is the link to the tokenized questions `vqa_train_toked.json` and `vqa_val_toked.json`; make sure you run `data/preproc.py` to generate `vqa_train_final.json`, `train_q_dict.p`, etc.
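The tsv-to-npy conversion can be sketched as below. The column names follow the common bottom-up-attention tsv release and the base64-encoded float32 layout is an assumption; adjust both to match your actual dump:

```python
import base64
import csv
import sys

import numpy as np

# tsv rows can be very long (one base64 blob per image).
csv.field_size_limit(sys.maxsize)

# Assumed column layout of the bottom-up-attention tsv dump.
FIELDS = ['image_id', 'image_w', 'image_h', 'num_boxes', 'boxes', 'features']

def tsv_to_npy(tsv_path, out_path):
    entries = []
    with open(tsv_path) as f:
        for row in csv.DictReader(f, delimiter='\t', fieldnames=FIELDS):
            # Features are base64-encoded float32, one row per detected box.
            feat = np.frombuffer(base64.b64decode(row['features']),
                                 dtype=np.float32)
            feat = feat.reshape(int(row['num_boxes']), -1)  # e.g. (36, 2048)
            entries.append({int(row['image_id']): feat})
    np.save(out_path, entries)  # list of {image_id: feature} dicts
```

Reading the result back requires `np.load(out_path, allow_pickle=True)`, since the saved object is a list of dictionaries rather than a plain array.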
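Once extracted, `glove.6B.300d.txt` is plain text (one token followed by 300 floats per line), so it can be parsed directly; a minimal loader sketch (`load_glove` is a hypothetical helper, and `preproc.py` may already handle this):

```python
import numpy as np

def load_glove(path='data/glove.6B.300d.txt'):
    """Parse GloVe's text format: one token per line, then its vector."""
    embeddings = {}
    with open(path, encoding='utf-8') as f:
        for line in f:
            parts = line.rstrip().split(' ')
            embeddings[parts[0]] = np.asarray(parts[1:], dtype=np.float32)
    return embeddings
```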
Use default parameters:

```
python main.py --train
```

Train from a previous checkpoint:

```
python main.py --train --modelpath=/path/to/saved.pth.tar
```

Check out tunable parameters:

```
python main.py
```

Evaluate:

```
python main.py --eval
```

This will generate `result.json` (validation set only); the format follows the VQA evaluation format.
- The default classifier is a softmax classifier; a sigmoid multi-label classifier is also implemented, but I was not able to train successfully with it.
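The difference between the two heads, in a minimal NumPy sketch (illustrative only, not the training code):

```python
import numpy as np

def softmax(z):
    # Subtract the row max for numerical stability.
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

logits = np.array([[2.0, 0.5, -1.0]])
# Softmax head: scores sum to 1, so exactly one answer per question.
p_soft = softmax(logits)
# Sigmoid head: independent per-answer scores, matching the paper's
# multi-label formulation with soft ground-truth targets in [0, 1].
p_multi = sigmoid(logits)
```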
- Training for 50 epochs reaches around 64.42% training accuracy.
- For the output classifier, I did not use the pretrained weights, since they are hard to retrieve; instead I followed Eq. 5 in the paper.
- To prepare validation data, you need to uncomment some lines of code in `data/preproc.py`.
- `coco_features.npy` is a really fat file (34 GB including train + val image features); you can split it and modify the data loading mechanism in `loader.py`.
- This code is tested with train = train and eval = val; no test data is included.
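One way to avoid holding the 34 GB file in memory is to split it into one small file per image and load features lazily. This is only a sketch under the list-of-dicts layout described above; `split_features` and `load_feature` are hypothetical helpers, and `loader.py` would need matching changes:

```python
import os

import numpy as np

def split_features(npy_path, out_dir):
    """Split the monolithic coco_features.npy into one .npy per image id."""
    os.makedirs(out_dir, exist_ok=True)
    entries = np.load(npy_path, allow_pickle=True)  # list of {image_id: feat}
    for entry in entries:
        for image_id, feat in entry.items():
            np.save(os.path.join(out_dir, f'{image_id}.npy'), feat)

def load_feature(out_dir, image_id):
    # Called from the dataset's __getitem__, so only one image's
    # features are resident in memory at a time.
    return np.load(os.path.join(out_dir, f'{image_id}.npy'))
```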
- Issues are welcome!
