Visible-infrared person re-identification, which combines visible-light and thermal imaging data, enables more comprehensive and accurate monitoring under diverse environmental conditions. However, the large discrepancy between the two image modalities poses significant challenges in this field. Existing methods typically focus on learning and analyzing the differences and consistencies between the semantics of the two modalities, while overlooking the variability of background information and the considerable impact of surrounding noise on learning effectiveness. To address this, we propose a novel multimodal pedestrian re-identification model, the feature fusion and spatial information adaptive network (FSIANet). Specifically, we design an enhanced multi-feature aggregation module that mines semantic features at multiple scales while fusing shallow and deep features to exploit feature representations across different channels and spatial locations. We further design a spatial information adaptive module that mimics the way observers attend to regions of interest in a scene, adaptively identifying and focusing on areas with dense information distribution and thereby suppressing surrounding noise and irrelevant information. Extensive experiments on the SYSU-MM01 and RegDB datasets demonstrate the superiority of the proposed FSIANet over several state-of-the-art methods.
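The abstract does not specify the internals of the enhanced multi-feature aggregation module, so the following is only a minimal PyTorch sketch of the general idea it describes: projecting a shallow and a deep feature map to a common width, fusing them, and reweighting the fused map with channel and spatial attention. The class name, layer choices, and reduction ratio here are our assumptions, not the paper's design.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiFeatureAggregation(nn.Module):
    """Illustrative sketch only: fuse a shallow and a deep feature map,
    then apply channel and spatial attention. Names and hyperparameters
    are hypothetical; the paper's actual module may differ."""

    def __init__(self, shallow_ch: int, deep_ch: int, out_ch: int):
        super().__init__()
        # 1x1 convs project both inputs to a common channel width
        self.proj_shallow = nn.Conv2d(shallow_ch, out_ch, kernel_size=1)
        self.proj_deep = nn.Conv2d(deep_ch, out_ch, kernel_size=1)
        # Channel attention (squeeze-and-excitation style, ratio 4 assumed)
        self.channel_fc = nn.Sequential(
            nn.Linear(out_ch, out_ch // 4),
            nn.ReLU(inplace=True),
            nn.Linear(out_ch // 4, out_ch),
            nn.Sigmoid(),
        )
        # Spatial attention over pooled channel statistics
        self.spatial_conv = nn.Conv2d(2, 1, kernel_size=7, padding=3)

    def forward(self, shallow: torch.Tensor, deep: torch.Tensor) -> torch.Tensor:
        # Upsample the deep map to the shallow map's resolution and fuse
        deep = F.interpolate(deep, size=shallow.shape[-2:], mode="bilinear",
                             align_corners=False)
        fused = self.proj_shallow(shallow) + self.proj_deep(deep)

        # Channel attention: reweight channels by global average context
        b, c, _, _ = fused.shape
        w = self.channel_fc(fused.mean(dim=(2, 3))).view(b, c, 1, 1)
        fused = fused * w

        # Spatial attention: reweight locations by avg/max channel statistics
        stats = torch.cat([fused.mean(dim=1, keepdim=True),
                           fused.amax(dim=1, keepdim=True)], dim=1)
        return fused * torch.sigmoid(self.spatial_conv(stats))
```

In a re-ID backbone, a block like this would sit where shallow and deep stages meet, for example fusing an early and a late ResNet stage; where exactly FSIANet places it is not stated in the abstract.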
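Likewise, the spatial information adaptive module is only described at a high level, so the sketch below shows one plausible reading under our own assumptions: a small head predicts a per-location "information density" score, a softmax over all spatial positions turns it into a density distribution, and the features are rescaled toward the densest regions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SpatialAdaptiveModule(nn.Module):
    """Illustrative sketch only: reweight feature locations toward
    information-dense regions. The design is our assumption, not the
    paper's exact module."""

    def __init__(self, in_ch: int):
        super().__init__()
        # Small conv head predicts a per-location information-density score
        self.score = nn.Sequential(
            nn.Conv2d(in_ch, in_ch // 4, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(in_ch // 4, 1, kernel_size=1),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, _, h, w = x.shape
        # Softmax over all spatial positions yields a density distribution,
        # concentrating attention mass on the most informative regions
        attn = F.softmax(self.score(x).flatten(2), dim=-1).view(b, 1, h, w)
        # Rescale so a uniform density map (attn == 1/(h*w)) leaves the
        # features unchanged; peaked maps amplify informative regions and
        # attenuate background and surrounding noise
        return x * (h * w * attn)
```

Under this scaling the module only departs from the identity where the predicted density is non-uniform, which matches the abstract's stated goal of suppressing background noise without altering informative regions.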
Keywords: feature fusion, visible radiation, infrared imaging, infrared radiation, image fusion, semantics, feature extraction