Abstract
-
Natural language processing techniques have achieved notable success in many applications, such as dialogue generation, machine translation, and document summarization. Among these, document summarization is an active research area that aims to generate a concise version of the original document while preserving its most informative content. However, existing work fails to consider the interactions among raw data (in terms of contextual similarity and difference), which leads to inaccurate and even conflicting outcomes. In this paper, a novel context-aware extractive summarization algorithm is proposed based on the concept of multi-view learning. Multi-view data are represented as latent topics by parameterizing multiple Gaussian distributions with trainable hyper-parameters. The extracted latent topics are then fused to produce the final summary. Experiments on one of the largest cartoon data resources available online demonstrate the superior performance of the proposed algorithm, which achieves state-of-the-art summarization results compared with other contemporary approaches.