27 September 2022 Graph neural network-based visual relationship and multilevel attention for image captioning
Himanshu Sharma, Swati Srivastava
Abstract

With the remarkable success of image captioning, visual attention methods have become a vital part of captioning models. However, most attention-based image captioning methods do not consider the relationships among image regions, which play a significant role in image understanding. We propose an image captioning method based on a local relation network that uses a multilevel attention approach with a graph neural network. It not only fully explores the relationship between the object and the image regions but also generates significant, context-based features for every region in the image. The attention employed in our work enhances the image representation capability of our method by focusing on a given image region and its related image regions. Addressing the relevant contextual information, spatial locations, and deep visual features in this way leads to improved caption generation. We verified the effectiveness of the proposed model through extensive experiments on three benchmark datasets: Flickr30k, MSCOCO, and nocaps. The results show the superiority of the proposed method over existing methods, both quantitatively and qualitatively. Detailed ablation studies show how each component contributes to the final performance.
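The idea of attending to a given image region together with its related regions can be illustrated with a minimal sketch. This is not the model from the paper: it assumes plain scaled dot-product attention restricted to graph neighbours, and the function name `relation_attention`, the toy feature vectors, and the adjacency matrix are all hypothetical choices made for illustration.

```python
import math


def relation_attention(features, adjacency):
    """Mix each region feature with its related regions via attention.

    features:  list of N region feature vectors (equal-length lists of floats)
    adjacency: N x N list of 0/1 flags; adjacency[i][j] == 1 means region j
               is treated as related to region i (hypothetical relation graph)
    """
    n = len(features)
    dim = len(features[0])
    scale = math.sqrt(dim)
    updated = []
    for i in range(n):
        # A region always attends to itself, plus any region the graph marks as related.
        neighbours = [j for j in range(n) if j == i or adjacency[i][j]]
        # Scaled dot-product scores between region i and each neighbour.
        scores = [
            sum(a * b for a, b in zip(features[i], features[j])) / scale
            for j in neighbours
        ]
        # Numerically stable softmax over the neighbourhood only.
        top = max(scores)
        exps = [math.exp(s - top) for s in scores]
        total = sum(exps)
        weights = [e / total for e in exps]
        # Context-aware feature: attention-weighted sum of neighbour features.
        updated.append([
            sum(w * features[j][d] for w, j in zip(weights, neighbours))
            for d in range(dim)
        ])
    return updated


# Toy input (assumed values): three regions, where regions 0 and 2 are related.
regions = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
adjacency = [[0, 0, 1],
             [0, 0, 0],
             [1, 0, 0]]
context = relation_attention(regions, adjacency)
```

In this sketch a region with no related regions keeps its original feature, while connected regions receive a blend of their own and their neighbours' features. A multilevel variant would presumably repeat or stack such steps, but the abstract does not specify how, so that part is left out.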

© 2022 SPIE and IS&T
Himanshu Sharma and Swati Srivastava "Graph neural network-based visual relationship and multilevel attention for image captioning," Journal of Electronic Imaging 31(5), 053022 (27 September 2022). https://doi.org/10.1117/1.JEI.31.5.053022
Received: 26 March 2022; Accepted: 12 September 2022; Published: 27 September 2022
KEYWORDS
Visualization

Image enhancement

Neural networks

Data modeling

Image understanding

Visual process modeling

Performance modeling
