Multi-sensor fusion algorithms combine information from multiple sensors to exceed the performance of any single sensor on a given task. In this work, we focus on fusing imagery from electro-optical (EO) and synthetic aperture radar (SAR) sensors for target identification. In addition to the imagery itself, large amounts of metadata, or “side information,” may also be available. This metadata can include important characteristics of the operating conditions (OCs) under which the images were taken. On its own, such metadata is insufficient for target identification, but it can potentially be leveraged to learn better representations of EO and SAR images and thereby improve classification performance. We assume that side information is available only during training and use it to build contextual, OC-aware deep representations of the target classes. At test time, we fuse the EO and SAR representations to classify the input images without accessing any metadata. We examine the impact of these OC-aware target representations on fusion performance under various forms of OC mismatch between training and testing, and show that fusing models trained with side information improves classification accuracy over classifiers trained without it, especially under larger train/test OC shifts. We also observe that including side information may reduce the effective capacity of the trained network, suggesting that it acts as a regularizer. To study this effect further, we empirically compare our approach to classifiers trained with weight decay and bottleneck layers and find that our approach achieves higher accuracy, implying that side information affects the learned representations beyond simple regularization.
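To make the train-time-only use of side information concrete, the following is a minimal sketch in PyTorch. It is an illustrative assumption, not the authors' exact method: the encoder architectures, the auxiliary metadata-prediction head, and all names here (e.g., SideInfoFusionModel, oc_head, lam) are hypothetical. The sketch shows one plausible realization in which an auxiliary loss encourages the fused EO/SAR features to be predictive of the OC metadata during training, while inference uses only the imagery.

```python
# Hypothetical sketch of train-time-only side information; names and
# architecture are illustrative assumptions, not the paper's method.
import torch
import torch.nn as nn

class Encoder(nn.Module):
    """Small CNN encoder mapping one image modality to a feature vector."""
    def __init__(self, in_channels: int, feat_dim: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_channels, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, feat_dim),
        )

    def forward(self, x):
        return self.net(x)

class SideInfoFusionModel(nn.Module):
    """EO/SAR fusion classifier with an auxiliary side-information head.

    The auxiliary head predicts OC metadata from the fused features during
    training, nudging the representations to be OC-aware; at test time the
    head is simply ignored, so no metadata is ever required.
    """
    def __init__(self, num_classes: int, num_oc_attrs: int, feat_dim: int = 128):
        super().__init__()
        self.eo_encoder = Encoder(in_channels=3, feat_dim=feat_dim)   # EO: RGB
        self.sar_encoder = Encoder(in_channels=1, feat_dim=feat_dim)  # SAR: 1 channel
        self.classifier = nn.Linear(2 * feat_dim, num_classes)
        self.oc_head = nn.Linear(2 * feat_dim, num_oc_attrs)          # train-time only

    def forward(self, eo, sar):
        fused = torch.cat([self.eo_encoder(eo), self.sar_encoder(sar)], dim=1)
        return self.classifier(fused), self.oc_head(fused)

model = SideInfoFusionModel(num_classes=10, num_oc_attrs=4)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
cls_loss_fn = nn.CrossEntropyLoss()
oc_loss_fn = nn.MSELoss()  # assumes continuous OC attributes (e.g., depression angle)

def train_step(eo, sar, labels, oc_meta, lam=0.5):
    # Classification loss plus a weighted auxiliary OC-prediction loss.
    logits, oc_pred = model(eo, sar)
    loss = cls_loss_fn(logits, labels) + lam * oc_loss_fn(oc_pred, oc_meta)
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()

@torch.no_grad()
def predict(eo, sar):
    # Test time: only imagery is fused and classified; metadata is untouched.
    logits, _ = model(eo, sar)
    return logits.argmax(dim=1)
```

Note that in this formulation the auxiliary head adds parameters only during training; discarding it at inference matches the assumption that metadata is unavailable at test time, and the weight lam (hypothetical) controls how strongly the OC metadata shapes the shared representation.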