Alleviating shortcut learning behavior of VQA model with context augmentation and adaptive loss adjustment

Zerong Zeng; Ruifang Liu; Huan Wang

doi:10.1117/12.2661996

28 December 2022

Proceedings Volume 12506, Third International Conference on Computer Science and Communication Technology (ICCSCT 2022); 125062A (2022) https://doi.org/10.1117/12.2661996
Event: International Conference on Computer Science and Communication Technology (ICCSCT 2022), 2022, Beijing, China

Abstract

Despite impressive progress in Visual Question Answering (VQA), avoiding spurious correlations between the textual content of a question and its answer remains a challenge. Previous research has shown that, because of language bias in VQA datasets, VQA models tend to capture superficial statistical correlations and generalize poorly to out-of-distribution data. To alleviate the bias introduced by the language modality, we propose a method that combines context augmentation with adaptive loss adjustment to reduce the shortcut learning behavior of VQA models. First, because language bias arises from the high co-occurrence frequency between categories and the words in the question, we use paraphrase generation to produce paraphrases with diverse contexts and thereby weaken this correlation. Second, we use adaptive loss adjustment to change the importance of samples, reducing the importance of bias-aligned samples and increasing that of bias-conflicting samples, so as to guide the model toward the intrinsic attributes that benefit generalization. Experiments demonstrate the feasibility and validity of our method on a variety of VQA models.
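The abstract does not give the formula used for adaptive loss adjustment, so the following is only a minimal sketch of the general idea of sample reweighting, not the authors' method. It assumes a hypothetical question-only "bias" predictor whose probability for the ground-truth answer serves as a proxy for how bias-aligned a sample is; the weight `(1 - p_bias) ** gamma`, the function names, and the `gamma` parameter are all illustrative assumptions.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def adaptive_weighted_loss(model_logits, bias_probs, labels, gamma=1.0):
    """Weighted cross-entropy over a batch.

    ASSUMPTION (not from the paper): a sample's weight is
    (1 - p_bias[y]) ** gamma, where p_bias[y] is the probability that a
    hypothetical question-only model assigns to the ground-truth answer y.
    Bias-aligned samples (high p_bias[y]) get small weights;
    bias-conflicting samples (low p_bias[y]) get large weights.
    """
    weights, losses = [], []
    for logits, p_bias, y in zip(model_logits, bias_probs, labels):
        ce = -math.log(softmax(logits)[y])   # per-sample cross-entropy
        weights.append((1.0 - p_bias[y]) ** gamma)
        losses.append(ce)
    # Normalize by the weight sum; the floor avoids division by zero.
    total_w = max(sum(weights), 1e-12)
    loss = sum(w * l for w, l in zip(weights, losses)) / total_w
    return loss, weights


# Two samples with identical model logits and identical bias predictions.
# Sample 0 has the answer the bias model favors (bias-aligned);
# sample 1 has the other answer (bias-conflicting).
model_logits = [[2.0, 0.5], [2.0, 0.5]]
bias_probs = [[0.9, 0.1], [0.9, 0.1]]
labels = [0, 1]

loss, weights = adaptive_weighted_loss(model_logits, bias_probs, labels)
print("weights:", weights)
print("weighted loss:", loss)
```

In this toy batch the bias-conflicting sample dominates the loss, which is the qualitative behavior the abstract describes. Under the same assumption, a larger `gamma` would suppress bias-aligned samples more aggressively.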

Citation

Zerong Zeng, Ruifang Liu, and Huan Wang "Alleviating shortcut learning behavior of VQA model with context augmentation and adaptive loss adjustment", Proc. SPIE 12506, Third International Conference on Computer Science and Communication Technology (ICCSCT 2022), 125062A (28 December 2022); https://doi.org/10.1117/12.2661996

PROCEEDINGS
7 PAGES

KEYWORDS

Data modeling

Statistical modeling

Performance modeling

Visualization

Process modeling

Visual process modeling

Artificial intelligence
