Conference Papers
Topic-based Multi-Document Summarization with Probabilistic Latent Semantic Analysis
Author | Leonhard Hennig
Source | Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP 2009)
We consider the problem of query-focused multi-document summarization, where a summary containing the information most relevant to a user's information need is produced from a set of topic-related documents. We propose a new method based on probabilistic latent semantic analysis, which allows us to represent sentences and queries as probability distributions over latent topics. Our approach combines query-focused and thematic features computed in the latent topic space to estimate the summary-relevance of sentences. In addition, we evaluate several different similarity measures for computing sentence-level feature scores. Experimental results show that our approach outperforms the best reported results on DUC 2006 data, and also compares well on DUC 2007 data.
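To make the idea of comparing sentences and queries in a latent topic space more concrete, the following is a minimal sketch, not the paper's implementation. It assumes that topic distributions for sentences and for the query have already been obtained (for example from a trained PLSA model) and simply ranks sentences by their similarity to the query. The abstract does not name the similarity measures that were evaluated; cosine similarity and a Jensen-Shannon-based similarity are used here purely as illustrative assumptions, and all function names and the toy distributions are hypothetical. The thematic features mentioned in the abstract are not modeled.

```python
import math


def cosine_similarity(p, q):
    """Cosine of the angle between two topic vectors."""
    dot = sum(a * b for a, b in zip(p, q))
    norm = math.sqrt(sum(a * a for a in p)) * math.sqrt(sum(b * b for b in q))
    return dot / norm if norm > 0 else 0.0


def kl_divergence(p, q):
    """KL divergence in bits; terms with p_i == 0 contribute nothing."""
    return sum(a * math.log2(a / b) for a, b in zip(p, q) if a > 0)


def js_similarity(p, q):
    """1 minus the Jensen-Shannon divergence (base 2), so the value lies in [0, 1]."""
    m = [(a + b) / 2 for a, b in zip(p, q)]
    jsd = 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)
    return 1.0 - jsd


def rank_sentences(sentence_topics, query_topics, similarity):
    """Return sentence indices ordered from most to least similar to the query."""
    scored = [(similarity(s, query_topics), i) for i, s in enumerate(sentence_topics)]
    return [i for _, i in sorted(scored, key=lambda pair: -pair[0])]


# Hypothetical toy data: three sentences and one query over three latent topics.
# In practice these distributions would come from a trained topic model.
query = [0.7, 0.2, 0.1]
sentences = [
    [0.1, 0.1, 0.8],
    [0.6, 0.3, 0.1],
    [0.3, 0.4, 0.3],
]

cosine_ranking = rank_sentences(sentences, query, cosine_similarity)
js_ranking = rank_sentences(sentences, query, js_similarity)
print(cosine_ranking, js_ranking)
```

Passing the similarity function as a parameter mirrors the abstract's point that different measures can be swapped in and compared; a fuller system would combine such query-focused scores with further features before selecting summary sentences.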