Facial expression recognition, which aims to recognize human expressions from camera images, has attracted growing interest owing to its significance and potential applications. However, existing expression recognition methods rely on stacks of convolution and pooling layers to expand the receptive field, ignoring interactions between pixels in the feature maps. To solve this problem, we propose a disentangled-region non-local (DRNL) neural network, which captures long-range dependencies directly by computing interactions between each pixel and entire regions, not only adjacent pixels, thereby preserving more information. We further decouple the DRNL block into two terms to extract clearer visual features: a whitened pairwise term that models the relationships between pixels and regions, and a unary term that represents the saliency of each pixel. We evaluate the proposed network on two public datasets, the real-world affective faces database (RAFDB) and the static facial expressions in the wild (SFEW) dataset, and illustrate its behavior with visualizations. Our method achieves accuracies of 90.450% on RAFDB and 63.855% on SFEW, respectively. Extensive experiments demonstrate state-of-the-art performance in comparison with previous methods.
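The decomposition described above can be sketched numerically. The following is a minimal illustration, not the authors' implementation: it assumes the attention weight for a pixel pair is the sum of a softmax over whitened (mean-subtracted) query-key dot products (the pairwise term) and a softmax over per-pixel saliency logits (the unary term). All function and variable names here (`disentangled_nonlocal`, `q`, `k`, `v`, `m`) are hypothetical.

```python
import numpy as np

def softmax(z, axis=-1):
    # Numerically stable softmax along the given axis.
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def disentangled_nonlocal(q, k, v, m):
    """Sketch of a disentangled non-local block (hypothetical names).

    q, k, v: (N, C) query/key/value features for N pixels.
    m:       (N,)   unary saliency logits, one per pixel.
    Returns: (N, C) aggregated features.
    """
    # Whitened pairwise term: subtract the mean before the dot product,
    # so the pairwise attention models pure pixel-region relationships.
    qw = q - q.mean(axis=0, keepdims=True)
    kw = k - k.mean(axis=0, keepdims=True)
    pairwise = softmax(qw @ kw.T, axis=1)        # (N, N), rows sum to 1

    # Unary term: per-pixel saliency, shared across all query positions.
    unary = softmax(m)[None, :]                  # (1, N), broadcast over rows

    attn = pairwise + unary                      # disentangled attention
    return attn @ v

# Usage: with constant values v, every output entry equals the attention
# row sum (= 2, since each term's softmax sums to 1).
rng = np.random.default_rng(0)
N, C = 6, 4
out = disentangled_nonlocal(rng.normal(size=(N, C)),
                            rng.normal(size=(N, C)),
                            np.ones((N, C)),
                            rng.normal(size=N))
```

Because each of the two softmax terms sums to one over the region, the combined attention distributes twice the total mass; practical implementations typically add learned projections and a residual connection around this aggregation.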