Topic-Guided Local-Global Graph Neural Network for Image Captioning

Conference Paper


Abstract


  • Image captioning generates a textual description of a given image by analyzing its visual semantics. It has numerous applications, such as surveillance, where automatically generated image descriptions enable a more efficient workflow. However, accurate descriptions require modeling the interactions among visual objects and semantics, which have not yet been adequately exploited. Therefore, a novel architecture, the topic-guided local-global graph neural network, is proposed to address these interactions in a two-level scheme. Local information is characterized through visual objects, and semantic graphs are introduced to formulate their relations. Global information is characterized with a topic graph, which analyzes the captioning context and guides the semantic graphs for captioning. In particular, graph convolutions and graph transformers with a connection between the adjacency matrices are explored. Experimental results on the MS-COCO dataset demonstrate the effectiveness of the proposed method.
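
As a rough illustration of the graph-convolution building block the abstract mentions, the sketch below applies one convolution step over a small graph of detected objects. It is not the authors' implementation: it assumes the widely used symmetric normalization with self-loops, H' = ReLU(D^-1/2 (A + I) D^-1/2 H W), and the names `normalize_adjacency`, `graph_conv`, `adj`, `features`, and `weights` are hypothetical, as is the toy data.

```python
import math


def normalize_adjacency(adj):
    """Return D^-1/2 (A + I) D^-1/2 for a square, non-negative adjacency matrix."""
    n = len(adj)
    # Add self-loops so every node also keeps its own features.
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
             for i in range(n)]
    # Degree of each node, counting the self-loop (always > 0 here).
    inv_sqrt = [1.0 / math.sqrt(sum(row)) for row in a_hat]
    return [[inv_sqrt[i] * a_hat[i][j] * inv_sqrt[j] for j in range(n)]
            for i in range(n)]


def matmul(a, b):
    """Plain matrix product for lists of lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def graph_conv(adj, features, weights):
    """One graph-convolution step: aggregate neighbours, project, apply ReLU."""
    a_norm = normalize_adjacency(adj)
    projected = matmul(matmul(a_norm, features), weights)
    return [[max(0.0, v) for v in row] for row in projected]


# Toy graph (assumed): three objects, with edges 0-1 and 1-2.
adj = [[0.0, 1.0, 0.0],
       [1.0, 0.0, 1.0],
       [0.0, 1.0, 0.0]]
# Two-dimensional feature vector per object (assumed values).
features = [[1.0, 0.0],
            [0.0, 1.0],
            [1.0, 1.0]]
# Identity weights keep the example easy to check by hand.
weights = [[1.0, 0.0],
           [0.0, 1.0]]

out = graph_conv(adj, features, weights)
```

Each row of `out` mixes an object's own features with those of its neighbours, weighted by the normalized adjacency matrix. How the paper links the adjacency matrices of the topic graph and the semantic graphs is not specified in the abstract and is not modeled here.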

Publication Date


  • 2021

Citation


  • Kan, J., Hu, K., Wang, Z., Wu, Q., Hagenbuchner, M., & Tsoi, A. C. (2021). Topic-Guided Local-Global Graph Neural Network for Image Captioning. In 2021 IEEE International Conference on Multimedia and Expo Workshops, ICMEW 2021. doi:10.1109/ICMEW53276.2021.9455991

Scopus EID


  • 2-s2.0-85130724708
