Investigation of different skeleton features for CNN-based 3D action recognition

Conference Paper


Abstract


  • © 2017 IEEE. Deep learning techniques are being used in skeleton-based action recognition tasks, and outstanding performance has been reported. Compared with RNN-based methods, which tend to overemphasize temporal information, CNN-based approaches can jointly capture spatio-temporal information from texture color images encoded from skeleton sequences. Several skeleton-based features have proven effective in RNN-based and handcrafted-feature-based methods; however, it remains unknown whether they are suitable for CNN-based approaches. This paper proposes to encode five spatial skeleton features into images using different encoding methods. In addition, the performance implications of the joints selected for feature extraction are studied. The proposed method achieved state-of-the-art performance on the NTU RGB+D dataset for 3D human action analysis, and an accuracy of 75.32% was achieved in the Large Scale 3D Human Activity Analysis Challenge in Depth Videos.
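
The abstract's core idea, mapping a skeleton sequence onto a color image so that a CNN can learn spatio-temporal patterns jointly, can be illustrated with a minimal sketch. This is not the paper's exact encoding of the five spatial features; it only assumes a NumPy array of per-frame (x, y, z) joint coordinates, and the function name skeleton_to_color_image and the normalization scheme are illustrative assumptions.

```python
import numpy as np

def skeleton_to_color_image(seq):
    """Encode a skeleton sequence as a pseudo-color image.

    seq: float array of shape (num_frames, num_joints, 3) holding the
    (x, y, z) coordinates of each joint in each frame (assumed layout).
    Returns a uint8 image of shape (num_joints, num_frames, 3), with the
    three coordinate axes mapped to the R, G and B channels.
    """
    seq = np.asarray(seq, dtype=np.float64)
    # Normalize each coordinate axis independently to [0, 255].
    mins = seq.min(axis=(0, 1), keepdims=True)
    maxs = seq.max(axis=(0, 1), keepdims=True)
    norm = (seq - mins) / np.maximum(maxs - mins, 1e-8)
    img = (255.0 * norm).astype(np.uint8)
    # Rows = joints, columns = frames, channels = (x, y, z).
    return img.transpose(1, 0, 2)

# Example: 60 frames of the 25-joint NTU RGB+D skeleton (random data).
dummy = np.random.rand(60, 25, 3)
image = skeleton_to_color_image(dummy)
print(image.shape)  # (25, 60, 3) -- resize and feed to a CNN
```

In this family of methods, the resulting image is typically resized to the CNN's input resolution, so each action sequence is classified as a single image.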

Publication Date


  • 2017

Citation


  • Ding, Z., Wang, P., Ogunbona, P. & Li, W. (2017). Investigation of different skeleton features for CNN-based 3D action recognition. 2017 IEEE International Conference on Multimedia and Expo Workshops, ICMEW 2017 (pp. 617-622). United States: IEEE Computer Society.

Scopus EID


  • 2-s2.0-85031670034

RO Metadata URL


  • http://ro.uow.edu.au/eispapers1/939

Start Page


  • 617

End Page


  • 622

Place Of Publication


  • United States
