Open Access Paper
28 December 2022 Alleviating shortcut learning behavior of VQA model with context augmentation and adaptive loss adjustment
Zerong Zeng, Ruifang Liu, Huan Wang
Proceedings Volume 12506, Third International Conference on Computer Science and Communication Technology (ICCSCT 2022); 125062A (2022) https://doi.org/10.1117/12.2661996
Event: International Conference on Computer Science and Communication Technology (ICCSCT 2022), 2022, Beijing, China
Abstract
Despite impressive progress in Visual Question Answering (VQA), avoiding spurious correlations between textual content and answers remains a challenge. Previous research has shown that, because of language bias in VQA datasets, VQA models tend to capture superficial statistical correlations and generalize poorly to out-of-distribution data. To alleviate the bias introduced by the language modality, we propose a method that combines context augmentation with adaptive loss adjustment to reduce the shortcut learning behavior of VQA models. First, because language bias arises from the high co-occurrence frequency between categories and the words in the question, we use paraphrase generation to produce paraphrases with diverse contexts and thereby weaken this correlation. Second, we use adaptive loss adjustment to reweight samples, reducing the importance of bias-aligned samples and increasing the importance of bias-conflicting samples, so as to guide the model toward the intrinsic attributes that benefit generalization. Experiments demonstrate the feasibility and validity of our method on a variety of VQA models.
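The abstract does not give the exact form of the loss adjustment, so the following is only an illustrative sketch of one common way to reweight samples, not the authors' formulation. It assumes a hypothetical question-only "bias" model whose probability for the ground-truth answer, p_bias, indicates how bias-aligned a sample is, and it scales each sample's cross-entropy by (1 - p_bias)^gamma. The function names, the weighting formula, and the gamma parameter are all assumptions made for illustration.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def reweighted_loss(model_logits, bias_logits, targets, gamma=2.0):
    """Mean cross-entropy with per-sample weights (1 - p_bias) ** gamma.

    Assumption (not from the paper): p_bias is the probability that a
    question-only model assigns to the ground-truth answer. A high p_bias
    marks a bias-aligned sample and yields a small weight; a low p_bias
    marks a bias-conflicting sample and yields a weight close to 1.
    """
    weights, losses = [], []
    for logits, b_logits, y in zip(model_logits, bias_logits, targets):
        p_bias = softmax(b_logits)[y]
        w = (1.0 - p_bias) ** gamma
        ce = -math.log(softmax(logits)[y])
        weights.append(w)
        losses.append(w * ce)
    return sum(losses) / len(losses), weights


# Toy batch with two answer classes. The bias model strongly prefers class 0,
# so the first sample (target 0) is bias-aligned and the second (target 1)
# is bias-conflicting.
model_logits = [[2.0, 0.5], [2.0, 0.5]]
bias_logits = [[3.0, 0.0], [3.0, 0.0]]
targets = [0, 1]

loss, weights = reweighted_loss(model_logits, bias_logits, targets)
print(weights[0] < weights[1])  # the bias-conflicting sample gets more weight
```

In this sketch, setting gamma to 0 gives every sample a weight of 1 and recovers the plain mean cross-entropy, while larger values of gamma suppress bias-aligned samples more aggressively.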
© (2022) COPYRIGHT Society of Photo-Optical Instrumentation Engineers (SPIE). Downloading of the abstract is permitted for personal use only.
Zerong Zeng, Ruifang Liu, and Huan Wang "Alleviating shortcut learning behavior of VQA model with context augmentation and adaptive loss adjustment", Proc. SPIE 12506, Third International Conference on Computer Science and Communication Technology (ICCSCT 2022), 125062A (28 December 2022); https://doi.org/10.1117/12.2661996
KEYWORDS
Data modeling
Statistical modeling
Performance modeling
Visualization
Process modeling
Visual process modeling
Artificial intelligence