Part-Based Feature Aggregation Method for Dynamic Scene Recognition

Conference Paper


Abstract


  • Existing methods for dynamic scene recognition mostly use global features extracted from the entire video frame or a video segment. In this paper, a part-based method is proposed for aggregating local features from multiple video frames. A pre-trained Fast R-CNN model is used to extract local convolutional layer features from the regions of interest (ROIs) of training images. These features are then clustered to locate representative parts. A set cover problem is formulated to select the discriminative parts, which are further refined by fine-tuning the Fast R-CNN. Local convolutional layer features and fully-connected layer features are extracted using the fine-tuned Fast R-CNN model, and then aggregated separately from a video segment to form two feature representations. They are concatenated into a global feature representation. Experimental results show that the proposed method outperforms several state-of-the-art features on two dynamic scene datasets.
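
The abstract names two steps that are easy to illustrate in isolation: selecting parts by set cover, and aggregating per-frame features over a video segment before concatenating them. The sketch below is not the authors' implementation. It assumes a standard greedy heuristic for set cover and mean pooling for aggregation, neither of which the abstract specifies. The function names, the part labels, and the toy numbers are all hypothetical.

```python
from typing import Dict, List, Set


def greedy_set_cover(universe: Set[int], coverage: Dict[str, Set[int]]) -> List[str]:
    """Greedily pick parts until every coverable training image is covered.

    `coverage[p]` is the set of training-image ids on which candidate part `p`
    is detected. The greedy rule is an assumption, not the paper's solver.
    """
    uncovered = set(universe)
    chosen: List[str] = []
    while uncovered:
        # Take the part covering the most still-uncovered images
        # (ties broken by name so the result is deterministic).
        best = max(sorted(coverage), key=lambda p: len(coverage[p] & uncovered))
        gained = coverage[best] & uncovered
        if not gained:
            break  # the remaining images are not covered by any candidate part
        chosen.append(best)
        uncovered -= gained
    return chosen


def mean_pool(frames: List[List[float]]) -> List[float]:
    """Average per-frame feature vectors over a video segment (assumed pooling)."""
    n = len(frames)
    return [sum(column) / n for column in zip(*frames)]


def aggregate_and_concatenate(
    conv_feats: List[List[float]], fc_feats: List[List[float]]
) -> List[float]:
    """Pool the two feature streams separately, then join them into one vector."""
    return mean_pool(conv_feats) + mean_pool(fc_feats)


if __name__ == "__main__":
    parts = {"part_a": {0, 1, 2}, "part_b": {2, 3}, "part_c": {3}, "part_d": {1}}
    print(greedy_set_cover({0, 1, 2, 3}, parts))
    print(aggregate_and_concatenate([[1.0, 3.0], [3.0, 5.0]], [[0.0], [2.0]]))
```

The two feature streams are pooled separately and only then joined, mirroring the abstract's description of two representations concatenated into one global representation.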

Publication Date


  • 2019

Citation


  • Peng, X., & Bouzerdoum, A. (2019). Part-Based Feature Aggregation Method for Dynamic Scene Recognition. In 2019 Digital Image Computing: Techniques and Applications, DICTA 2019. doi:10.1109/DICTA47822.2019.8946036

Scopus EID


  • 2-s2.0-85078699046
