Human interaction recognition using low-rank matrix approximation and super descriptor tensor decomposition

Conference Paper


Abstract


  • Audio-visual recognition systems rely on efficient feature extraction. Many spatio-temporal interest point detectors for visual feature extraction are either too sparse, leading to loss of information, or too dense, resulting in noisy and redundant information. Furthermore, interest point detectors designed for a controlled environment can be affected by camera motion. In this paper, a salient spatio-temporal interest point detector is proposed based on a low-rank and group-sparse matrix approximation. The detector handles camera motion through a short-window video stabilization. The multimodal audio-visual features from multiple descriptors are represented by a super descriptor, from which a compact set of features is extracted through tensor decomposition and feature selection. This tensor decomposition retains the spatio-temporal structure among features obtained from multiple descriptors. Experimental validation is conducted using two benchmark human interaction recognition datasets: TVHID and Parliament. The experimental results show that the proposed approach outperforms many state-of-the-art methods, achieving classification rates of 74.7% and 88.5% on the TVHID and Parliament datasets, respectively.
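
The abstract does not specify the low-rank and group-sparse approximation, so the following is only a rough illustration of the general idea, not the authors' detector. It assumes a video stored as a pixels-by-frames matrix, uses a plain truncated SVD as a stand-in for the low-rank (background) part, and hard-thresholds the residual to get a sparse (salient) part. The function name `low_rank_sparse_split`, the `rank` and `threshold` parameters, and the synthetic data are all hypothetical.

```python
import numpy as np


def low_rank_sparse_split(frames, rank=1, threshold=0.5):
    """Split a (pixels x frames) matrix into a low-rank part and a sparse residual.

    Hypothetical helper: truncated SVD plus hard thresholding is a simplified
    stand-in, not the low-rank and group-sparse method of the paper.
    """
    u, s, vt = np.linalg.svd(frames, full_matrices=False)
    # Keep only the leading `rank` singular triplets (the slowly varying background).
    low_rank = (u[:, :rank] * s[:rank]) @ vt[:rank, :]
    residual = frames - low_rank
    # Keep only residual entries whose magnitude exceeds the threshold.
    sparse = np.where(np.abs(residual) > threshold, residual, 0.0)
    return low_rank, sparse


# Synthetic "video": a static background repeated over 10 frames,
# plus one bright pixel at (pixel 5, frame 3) playing the role of motion.
background = np.linspace(0.1, 1.0, 200)
video = np.tile(background[:, None], (1, 10))
video[5, 3] += 5.0

low_rank, sparse = low_rank_sparse_split(video, rank=1, threshold=0.5)

# Location of the strongest salient response in (pixel, frame) coordinates.
peak = np.unravel_index(np.argmax(np.abs(sparse)), sparse.shape)
print("strongest salient entry at", peak)
```

In this toy setting the static background is absorbed by the rank-1 term and the injected pixel dominates the residual. A real detector would also need the group-sparsity structure and the short-window stabilization described in the abstract, neither of which is modelled here.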

Publication Date


  • 2017

Citation


  • M. Khokher, A. Bouzerdoum & S. Phung, "Human interaction recognition using low-rank matrix approximation and super descriptor tensor decomposition," in 2017 IEEE International Conference on Acoustics, Speech, and Signal Processing, 2017, pp. 1847-1851.

Scopus Eid


  • 2-s2.0-85023752750

Ro Metadata Url


  • http://ro.uow.edu.au/eispapers1/476

Start Page


  • 1847

End Page


  • 1851

Place Of Publication


  • New York, United States
