Sign Language Translation with Hierarchical Spatio-Temporal Graph Neural Network

Conference Paper


Abstract


  • Sign language translation (SLT), which generates text in a spoken language from visual content in a sign language, is important for assisting the hard-of-hearing community in communication. Inspired by neural machine translation (NMT), most existing SLT studies adopt a general sequence-to-sequence learning strategy. However, SLT differs significantly from general NMT tasks because sign languages convey messages through multiple visual-manual aspects. In this paper, these unique characteristics of sign languages are therefore formulated as hierarchical spatio-temporal graph representations, comprising high-level and fine-level graphs in which a vertex characterizes a specified body part and an edge represents the interaction between two parts. In particular, high-level graphs represent patterns in regions such as the hands and face, while fine-level graphs consider the joints of the hands and the landmarks of facial regions. To learn these graph patterns, a novel deep learning architecture, the hierarchical spatio-temporal graph neural network (HST-GNN), is proposed. Graph convolutions and graph self-attention with neighborhood context are introduced to characterize both local and global graph properties. Experimental results on benchmark datasets demonstrate the effectiveness of the proposed method.
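
The two-level graph idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' HST-GNN: the vertex names, edge lists, feature values, and the helpers `normalized_adjacency`, `graph_conv`, and `pool_mean` are all hypothetical, and the sketch covers a single frame with a plain mean-aggregation graph convolution, leaving out temporal edges and graph self-attention. It assumes that fine-level joint features are averaged into one feature vector per high-level region, which is only one of several plausible ways to link the two levels.

```python
# Illustrative sketch only: all names, edges and features below are assumptions,
# not the configuration used in the paper.

HIGH_LEVEL_VERTICES = ["left_hand", "right_hand", "face"]  # hypothetical regions
HIGH_LEVEL_EDGES = [(0, 1), (0, 2), (1, 2)]                # every region interacts
FINE_HAND_EDGES = [(0, 1), (1, 2), (2, 3)]                 # hypothetical 4-joint chain


def normalized_adjacency(num_vertices, edges):
    """Build a symmetric adjacency with self-loops, then normalize each row to sum to 1."""
    adj = [[0.0] * num_vertices for _ in range(num_vertices)]
    for i in range(num_vertices):
        adj[i][i] = 1.0
    for u, v in edges:
        adj[u][v] = 1.0
        adj[v][u] = 1.0
    return [[a / sum(row) for a in row] for row in adj]


def graph_conv(adj, features, weight):
    """One graph convolution step: ReLU(adj @ features @ weight)."""
    n, d_in, d_out = len(features), len(weight), len(weight[0])
    # Each vertex averages the features of itself and its neighbours.
    aggregated = [
        [sum(adj[i][j] * features[j][k] for j in range(n)) for k in range(d_in)]
        for i in range(n)
    ]
    # Linear projection followed by ReLU.
    return [
        [max(0.0, sum(aggregated[i][k] * weight[k][c] for k in range(d_in)))
         for c in range(d_out)]
        for i in range(n)
    ]


def pool_mean(features):
    """Average vertex features into a single vector (fine level -> one region)."""
    n = len(features)
    return [sum(row[k] for row in features) / n for k in range(len(features[0]))]


# Fine level: made-up 2-D features for four hand joints in one frame.
identity = [[1.0, 0.0], [0.0, 1.0]]
joint_features = [[0.1, 0.2], [0.3, 0.1], [0.5, 0.4], [0.2, 0.6]]
fine_adj = normalized_adjacency(len(joint_features), FINE_HAND_EDGES)
hand_summary = pool_mean(graph_conv(fine_adj, joint_features, identity))

# High level: one vector per region; both hands reuse the same summary for brevity.
region_features = [hand_summary, hand_summary, [0.7, 0.3]]
high_adj = normalized_adjacency(len(HIGH_LEVEL_VERTICES), HIGH_LEVEL_EDGES)
region_output = graph_conv(high_adj, region_features, identity)
```

In this sketch the convolution only mixes information between directly connected vertices, which corresponds to the local graph properties mentioned in the abstract; capturing global properties would need an additional mechanism such as attention over all vertices.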

Publication Date


  • 2022

Citation


  • Kan, J., Hu, K., Hagenbuchner, M., Tsoi, A. C., Bennamoun, M., & Wang, Z. (2022). Sign Language Translation with Hierarchical Spatio-Temporal Graph Neural Network. In Proceedings - 2022 IEEE/CVF Winter Conference on Applications of Computer Vision, WACV 2022 (pp. 2131-2140). doi:10.1109/WACV51458.2022.00219

Scopus Eid


  • 2-s2.0-85126096691

Start Page


  • 2131

End Page


  • 2140
