This repository provides baselines for perceived emotion estimation. Perceived emotion estimation, also referred to as Affective Computing [1], aims to estimate the emotions that visual content is expected to evoke in viewers.
We welcome contributions! Feel free to submit pull requests.
Emotions can be described by arousal and valence (VA). Some papers use three values instead: arousal, valence, and dominance (VAD). Continuous Emotion Prediction estimates the VA or VAD values.
The performance of the baselines is as follows:
| Model | Dataset | MSE |
|---|---|---|
| ResNet50 | IAPS [3] | 0.01742 |
| ResNet50 | CGNA [2] | 0.02069 |
Note that both MSE and RMSE are calculated on normalized scores, i.e., scores scaled to the range [0, 1].
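The effect of normalization on the reported error can be illustrated with a minimal sketch. It assumes min-max scaling from a 1 to 9 rating scale; the scale bounds, the helper names (`normalize`, `mse`, `rmse`), and the sample ratings are all hypothetical and are not taken from this repository's code.

```python
import math


def normalize(score, lo=1.0, hi=9.0):
    """Min-max scale a raw rating from [lo, hi] to [0, 1].

    The 1-9 bounds are an assumption made for illustration only.
    """
    return (score - lo) / (hi - lo)


def mse(predictions, targets):
    """Mean squared error over two equal-length, non-empty sequences."""
    if len(predictions) != len(targets):
        raise ValueError("predictions and targets must have the same length")
    if not predictions:
        raise ValueError("at least one sample is required")
    squared = [(p - t) ** 2 for p, t in zip(predictions, targets)]
    return sum(squared) / len(squared)


def rmse(predictions, targets):
    """Root mean squared error, i.e. the square root of the MSE."""
    return math.sqrt(mse(predictions, targets))


# Hypothetical raw valence ratings, not real dataset values.
raw_targets = [1.0, 5.0, 9.0]
raw_predictions = [2.0, 5.0, 8.0]

# Scale both sides to [0, 1] before computing the error.
targets = [normalize(s) for s in raw_targets]
predictions = [normalize(s) for s in raw_predictions]

print(f"MSE:  {mse(predictions, targets):.5f}")
print(f"RMSE: {rmse(predictions, targets):.5f}")
```

Under this assumed scaling, the MSE on normalized scores is smaller than the MSE on raw 1 to 9 ratings by a factor of (9 - 1)² = 64, and the RMSE by a factor of 8, so normalized and raw errors should not be compared directly.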
Pretrained models can be downloaded from this link.
Add Discrete Emotion Classification for images.
Add models for videos.
[1] Zhao, Sicheng, et al. "Affective Computing for Large-scale Heterogeneous Multimedia Data: A Survey." ACM Transactions on Multimedia Computing, Communications, and Applications (TOMM) 15.3s (2019): 1-32.
[2] Kim, Hye-Rin, et al. "Building emotional machines: Recognizing image emotions through deep neural networks." IEEE Transactions on Multimedia 20.11 (2018): 2980-2992.
[3] Lang, Peter J., Margaret M. Bradley, and Bruce N. Cuthbert. "International affective picture system (IAPS): Technical manual and affective ratings." NIMH Center for the Study of Emotion and Attention 1 (1997): 39-58.