Efficient transformer for human-object interaction detection
11 October 2023
Nannan Yang, Yan Zheng, Xiaoming Guo
Proceedings Volume 12800, Sixth International Conference on Computer Information Science and Application Technology (CISAT 2023); 1280027 (2023) https://doi.org/10.1117/12.3003996
Event: 6th International Conference on Computer Information Science and Application Technology (CISAT 2023), 2023, Hangzhou, China
Abstract
This study addresses Human-Object Interaction (HOI) detection in images with an efficient Transformer-based approach. The Transformer decoder incorporates Spatially Modulated Co-Attention (SMCA), which presets the position of each target (human or object) in the image, narrowing the search range of the query vector and accelerating model convergence. To fuse multi-scale features and improve recognition accuracy, an intra-scale self-attention mechanism is introduced into the encoder as an auxiliary operator. The model adopts multiple Multilayer Perceptron (MLP) structures to locate and identify HOI instances. The loss function combines a classification loss with a bounding-box regression loss. The model achieves 27.19% mAP on the HICO-DET dataset and 53.2% mAP on the V-COCO dataset.
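The idea of spatially modulating co-attention can be illustrated with a minimal sketch. It assumes the commonly described SMCA formulation, in which the log of a Gaussian-like weight map centred on a predicted target position is added to the attention logits before the softmax. The function names, the `beta` bandwidth parameter, and the toy grid are illustrative assumptions, not the authors' implementation.

```python
import math


def gaussian_log_prior(height, width, center, scale, beta=1.0):
    """Log of a Gaussian-like weight map over an H x W grid (row-major).

    `center` = (cx, cy) and `scale` = (sx, sy) stand in for the values a
    decoder query would predict; here they are hypothetical inputs.
    """
    cx, cy = center
    sx, sy = scale
    prior = []
    for y in range(height):
        for x in range(width):
            prior.append(
                -((x - cx) ** 2) / (beta * sx ** 2)
                - ((y - cy) ** 2) / (beta * sy ** 2)
            )
    return prior


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def modulated_attention(query, keys, log_prior):
    """Scaled dot-product attention weights with an additive spatial prior."""
    d = len(query)
    logits = [
        sum(q * k for q, k in zip(query, key)) / math.sqrt(d) + p
        for key, p in zip(keys, log_prior)
    ]
    return softmax(logits)


# Toy example: a 3 x 3 feature grid whose keys are all identical, so the
# content term is uniform and only the spatial prior shapes the weights.
keys = [[0.0, 0.0] for _ in range(9)]
query = [1.0, 0.0]
prior = gaussian_log_prior(3, 3, center=(1, 1), scale=(1.0, 1.0))
weights = modulated_attention(query, keys, prior)
```

In this toy setting the attention mass concentrates around the assumed centre, which is the sense in which the prior narrows the region a query has to search.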
(2023) Published by SPIE. Downloading of the abstract is permitted for personal use only.
Nannan Yang, Yan Zheng, and Xiaoming Guo "Efficient transformer for human-object interaction detection", Proc. SPIE 12800, Sixth International Conference on Computer Information Science and Application Technology (CISAT 2023), 1280027 (11 October 2023); https://doi.org/10.1117/12.3003996
KEYWORDS
Transformers, Education and training, Target detection, Detection and tracking algorithms, Visualization, Feature extraction, Modulation