
Self-attention guided deep features for action recognition

Conference Paper


Abstract


  • Skeleton-based human action recognition is an important task in computer vision. However, it is very challenging due to the complex spatio-temporal variations of skeleton joints. In this work, we propose an end-to-end trainable network consisting of a Deep Convolutional Model (DCM) and a Self-Attention Model (SAM) for human action recognition from skeleton data. Specifically, skeleton sequences are encoded into color images and fed into the DCM to extract deep features. In the SAM, handcrafted features representing the motion degree of joints are extracted, and the attention weights are learned by a simple yet effective linear mapping. The effectiveness of the proposed method has been verified on the NTU RGB+D, SYSU-3D and UTD-MHAD datasets, where it achieved state-of-the-art results.

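The abstract describes a two-branch design: deep features from a CNN over skeleton sequences encoded as color images, modulated by attention weights obtained from handcrafted motion-degree features through a linear mapping. The following is a minimal conceptual sketch of that idea in PyTorch; the class names, layer choices, joint/frame counts and the sigmoid gating are assumptions for illustration, not the authors' implementation.

```python
import torch
import torch.nn as nn

# Assumed sizes for illustration (e.g. NTU RGB+D has 25 joints and 60 action classes).
NUM_JOINTS, NUM_FRAMES, NUM_CLASSES = 25, 64, 60

class DCM(nn.Module):
    """Deep Convolutional Model: a small CNN over the encoded skeleton 'color image'."""
    def __init__(self, feat_dim=256):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 64, 3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(64, 128, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.fc = nn.Linear(128, feat_dim)

    def forward(self, x):                 # x: (B, 3, NUM_FRAMES, NUM_JOINTS)
        h = self.backbone(x).flatten(1)   # global-pooled deep features
        return self.fc(h)                 # (B, feat_dim)

class SAM(nn.Module):
    """Self-Attention Model: a simple linear mapping from handcrafted
    motion-degree features of the joints to attention weights."""
    def __init__(self, feat_dim=256):
        super().__init__()
        self.linear = nn.Linear(NUM_JOINTS, feat_dim)

    def forward(self, motion_degree):     # motion_degree: (B, NUM_JOINTS)
        # Sigmoid gating is an assumption; the paper only states a linear mapping.
        return torch.sigmoid(self.linear(motion_degree))

class SelfAttentionGuidedNet(nn.Module):
    """End-to-end trainable combination of DCM and SAM."""
    def __init__(self, feat_dim=256):
        super().__init__()
        self.dcm, self.sam = DCM(feat_dim), SAM(feat_dim)
        self.classifier = nn.Linear(feat_dim, NUM_CLASSES)

    def forward(self, skeleton_image, motion_degree):
        feats = self.dcm(skeleton_image)
        attn = self.sam(motion_degree)
        return self.classifier(feats * attn)   # attention-weighted deep features

# Usage with random tensors standing in for an encoded skeleton sequence
# and its handcrafted motion-degree features:
net = SelfAttentionGuidedNet()
logits = net(torch.randn(2, 3, NUM_FRAMES, NUM_JOINTS), torch.rand(2, NUM_JOINTS))
print(logits.shape)  # torch.Size([2, 60])
```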
UOW Authors


  •   Xiao, Renyi (external author)
  •   Hou, Yonghong (external author)
  •   Guo, Zihui (external author)
  •   Li, Chuankun (external author)
  •   Wang, Pichao (external author)
  •   Li, Wanqing

Publication Date


  • 2019

Citation


  • Xiao, R., Hou, Y., Guo, Z., Li, C., Wang, P. & Li, W. (2019). Self-attention guided deep features for action recognition. IEEE International Conference on Multimedia and Expo (ICME) 2019 (pp. 1060-1065). United States: IEEE.

Scopus Eid


  • 2-s2.0-85070966958

Start Page


  • 1060

End Page


  • 1065

Place Of Publication


  • United States
