
Learning attentive dynamic maps (ADMs) for Understanding Human Actions

Journal Article


Abstract


  • This paper presents a novel end-to-end trainable deep architecture that learns an attentive dynamic map (ADM) for understanding human motion from skeleton data. An ADM is intended not only to capture the dynamic information over the period of a human motion, referred to as an action, as a conventional dynamic image/map does, but also to embed in it the spatio-temporal attention needed to classify the action. Specifically, skeleton sequences are encoded into sequences of Skeleton Joint Maps (STMs), where each STM encodes both the joint locations (i.e., spatial) and the relative temporal order (i.e., temporal) of the skeleton in the sequence. The STM sequences are fed into a customized 3DConvLSTM to extract local and global spatio-temporal information, from which a dynamic map is learned. This dynamic map is subsequently used to learn the spatio-temporal attention at each time-stamp. ADMs are then generated from the learned attention weights and all hidden states of the 3DConvLSTM, and are used for action classification. The proposed method achieves performance competitive with state-of-the-art results on the Large Scale Combined, MSRC-12, and NTU RGB+D datasets.
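The final step the abstract describes, generating an ADM from attention weights over all 3DConvLSTM hidden states, can be sketched as a weighted pooling over time. The sketch below is a hypothetical NumPy illustration of that idea only: the tensor shapes, the dot-product scoring against the dynamic map, and the softmax over timesteps are all assumptions for illustration, not the paper's exact formulation.

```python
import numpy as np

def attentive_dynamic_map(hidden_states, dynamic_map):
    """Illustrative ADM-style pooling (assumed shapes, not the paper's exact model).

    hidden_states: (T, H, W, C) per-timestep hidden states of a recurrent encoder.
    dynamic_map:   (H, W, C) learned summary map used to score each timestep.
    """
    # Score each hidden state by its similarity to the dynamic map
    # (a simple dot product is an assumption made for this sketch).
    scores = np.array([np.sum(h * dynamic_map) for h in hidden_states])
    # Softmax over time yields normalized attention weights.
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    # The ADM is the attention-weighted combination of all hidden states.
    adm = np.tensordot(weights, hidden_states, axes=(0, 0))
    return adm, weights
```

In this sketch the returned map has the same spatial shape as a single hidden state, so it could feed a standard classifier head, which mirrors the abstract's statement that ADMs are "used for action classification".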

UOW Authors


  •   Li, Chuankun (external author)
  •   Hou, Yonghong (external author)
  •   Li, Wanqing
  •   Wang, Pichao (external author)

Publication Date


  • 2019

Citation


  • Li, C., Hou, Y., Li, W. & Wang, P. (2019). Learning attentive dynamic maps (ADMs) for Understanding Human Actions. Journal of Visual Communication and Image Representation, 65, 102640-1-102640-10.

Scopus EID


  • 2-s2.0-85073620107

RO Metadata URL


  • http://ro.uow.edu.au/eispapers1/3304

Start Page


  • 102640-1

End Page


  • 102640-10

Volume


  • 65

Place Of Publication


  • United States
