
Large-scale multimodal gesture recognition using heterogeneous networks

Conference Paper


Abstract


  • © 2017 IEEE. This paper presents the method designed for the 2017 ChaLearn LAP Large-scale Gesture Recognition Challenge. Through bidirectional rank pooling, the proposed method converts a video sequence into multiple body-level and hand-level dynamic images that serve as inputs to Convolutional Neural Networks (ConvNets), and it adopts Convolutional LSTM networks (ConvLSTM) to learn long-term spatiotemporal features from the short-term spatiotemporal features extracted by a 3D convolutional neural network (3DCNN) at both body and hand level. Such a heterogeneous network system effectively learns different, complementary levels of spatiotemporal features, substantially improving recognition accuracy. The method has been evaluated on the 2017 isolated and continuous ChaLearn LAP Large-scale Gesture Recognition Challenge datasets, and its results rank among the top performances.
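The dynamic images mentioned in the abstract collapse a video clip into a single frame-sized image via rank pooling. As a rough illustration only (the paper's exact pooling formulation is not reproduced here), the sketch below uses the well-known approximate rank pooling weights of Bilen et al., alpha_t = 2t - T - 1, which emphasize later frames and de-emphasize earlier ones; the "bidirectional" variant simply pools the clip in both temporal orders:

```python
import numpy as np

def dynamic_image(frames):
    """Collapse a video array (T, H, W) into one 'dynamic image'.

    Illustrative approximate rank pooling: frame t (1-indexed) gets
    weight alpha_t = 2t - T - 1, so the weights sum to zero and the
    result encodes temporal change rather than static appearance.
    This is a generic sketch, not the paper's exact formulation.
    """
    T = frames.shape[0]
    alphas = 2.0 * np.arange(1, T + 1) - T - 1
    # Weighted sum over the time axis -> (H, W) image.
    return np.tensordot(alphas, frames, axes=1)

def bidirectional_dynamic_images(frames):
    """Forward and backward dynamic images (bidirectional pooling)."""
    return dynamic_image(frames), dynamic_image(frames[::-1])
```

With these particular weights the backward image is the negation of the forward one, so in practice the two directions are computed on normalized or nonlinearly transformed frames before being fed to separate ConvNet streams.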

UOW Authors


  •   Wang, Huogen (external author)
  •   Wang, Pichao (external author)
  •   Song, Zhanjie (external author)
  •   Li, Wanqing

Publication Date


  • 2018

Citation


  • Wang, H., Wang, P., Song, Z. & Li, W. (2018). Large-scale multimodal gesture recognition using heterogeneous networks. 2017 IEEE International Conference on Computer Vision Workshops, ICCVW 2017 (pp. 3129-3137). IEEE Xplore: IEEE.

Scopus Eid


  • 2-s2.0-85046292749

Ro Metadata Url


  • http://ro.uow.edu.au/eispapers1/1374

Start Page


  • 3129

End Page


  • 3137

Place Of Publication


  • IEEE Xplore
