A part-based spatial and temporal aggregation method for dynamic scene recognition

Journal Article


Abstract


  • Existing methods for dynamic scene recognition mostly use global features extracted from the entire video frame or a video segment. In this paper, a part-based method is proposed to aggregate local features from video frames. A pre-trained Fast R-CNN model is used to extract local convolutional features from the regions of interest of training images. These features are clustered to locate representative parts. A set cover problem is then formulated to select the discriminative parts, which are further refined by fine-tuning the Fast R-CNN model. Local features from a video segment are extracted at different layers of the fine-tuned Fast R-CNN model and aggregated both spatially and temporally. Extensive experimental results show that the proposed method is very competitive with state-of-the-art approaches.
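The part-selection step in the abstract is framed as a set cover problem, but the record does not give the exact formulation. As a rough illustration only, the sketch below uses the standard greedy heuristic for set cover: each candidate part is assumed to "cover" the set of training images in which it fires, and parts are picked until every coverable image is covered. The function name `greedy_set_cover`, the `coverage` mapping, and the toy data are hypothetical and are not taken from the paper.

```python
from typing import Dict, List, Set


def greedy_set_cover(coverage: Dict[str, Set[int]], universe: Set[int]) -> List[str]:
    """Greedy heuristic for set cover (illustrative, not the authors' method).

    coverage: hypothetical mapping from a part id to the image ids it covers.
    universe: the image ids that should be covered.
    """
    uncovered = set(universe)
    selected: List[str] = []
    while uncovered:
        # Pick the part covering the most still-uncovered images.
        # Iterating in sorted order makes tie-breaking deterministic.
        best = max(sorted(coverage), key=lambda p: len(coverage[p] & uncovered))
        gain = coverage[best] & uncovered
        if not gain:
            break  # the remaining images are not covered by any part
        selected.append(best)
        uncovered -= gain
    return selected


if __name__ == "__main__":
    # Toy data: four candidate parts over five training images.
    parts = {"a": {1, 2, 3}, "b": {3, 4}, "c": {4, 5}, "d": {1}}
    print(greedy_set_cover(parts, {1, 2, 3, 4, 5}))
```

In this toy case the heuristic first takes the part covering the most images and then the part that covers the most of what remains. A discriminative variant would presumably also weigh how class-specific each part is, which this sketch does not model.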

Publication Date


  • 2021

Citation


  • Peng, X., Bouzerdoum, A., & Phung, S. L. (2021). A part-based spatial and temporal aggregation method for dynamic scene recognition. Neural Computing and Applications, 33(13), 7353-7370. doi:10.1007/s00521-020-05415-3

Scopus Eid


  • 2-s2.0-85092733678

Start Page


  • 7353

End Page


  • 7370

Volume


  • 33

Issue


  • 13
