In this baseline, a resnet model is fine-tuned to classify each person crop over 9 actions. After fine-tuning, the last layer features are pooled over all people and fed to a softmax classifier to recognize the group activity in each single frame.
This baseline is similar to the previous baseline (Person Classification) with one distinction. The resnet model is fine-tuned on each player crop to recognize person-level actions. Then, last_layer is pooled over all players to recognize group activities in a scene without any further fine-tuning of the resnet model. The rationale behind this baseline is to examine a scenario where person-level action annotations as well as group activity annotations are used in a deep learning model that does not model the temporal aspect of group activities. This is very similar to our two-stage model without the temporal modeling.
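
The pooling-then-classify step described above can be sketched in a few lines of plain Python. This is only an illustration, not the repository's code: the function names, the toy feature size, and the number of group activities are hypothetical, and element-wise max pooling is an assumption, since the text only says the features are "pooled". In practice the per-person features would come from the fine-tuned resnet rather than from random numbers.

```python
import math
import random


def pool_person_features(person_features):
    """Element-wise max over all people in one frame.

    Max pooling is an assumption; the description only says "pooled".
    """
    if not person_features:
        raise ValueError("need at least one person crop")
    return [max(values) for values in zip(*person_features)]


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    shift = max(logits)
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify_group_activity(person_features, weights, biases):
    """Linear layer plus softmax on the pooled frame descriptor."""
    pooled = pool_person_features(person_features)
    logits = [
        sum(w * x for w, x in zip(row, pooled)) + b
        for row, b in zip(weights, biases)
    ]
    return softmax(logits)


# Toy usage: random stand-ins for last-layer features (hypothetical sizes).
rng = random.Random(0)
FEATURE_DIM = 4           # hypothetical; real last-layer features are much larger
NUM_GROUP_ACTIVITIES = 3  # hypothetical number of group activity classes
NUM_PEOPLE = 5            # hypothetical number of person crops in the frame

frame = [[rng.random() for _ in range(FEATURE_DIM)] for _ in range(NUM_PEOPLE)]
weights = [
    [rng.uniform(-1.0, 1.0) for _ in range(FEATURE_DIM)]
    for _ in range(NUM_GROUP_ACTIVITIES)
]
biases = [0.0] * NUM_GROUP_ACTIVITIES

probs = classify_group_activity(frame, weights, biases)
predicted = max(range(len(probs)), key=probs.__getitem__)
print("group activity probabilities:", probs)
print("predicted class index:", predicted)
```

Because the pooled descriptor has a fixed size regardless of how many people appear in the frame, the same classifier can be applied to scenes with different numbers of players.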