
Publication

Interactive Topic Graph Extraction and Exploration of Web Content

Günter Neumann; Sven Schmeier
In: T. Poibeau; H. Saggion; J. Piskorski; R. Yangarber. Multi-source, Multilingual Information Extraction and Summarization. Pages 1-24, Theory and Applications of Natural Language Processing, ISBN 978-3-642-28568-4, Springer, 6/2012.

Abstract

We present an approach to exploring web content through interactive topic graph extraction. A user issues the initial information request online, in the form of a query topic description. The system then constructs the topic graph from N web snippets returned by a standard search engine. We treat topic graph extraction as a specific empirical collocation extraction task in which collocations are extracted between chunks. Our measure of association strength is based on the pointwise mutual information between chunk pairs and explicitly takes their distance into account. Users can then analyze the topic graph further and, by selecting interesting nodes and pairs of nodes, request additional background information: explicit relationships extracted from Wikipedia or extracted automatically from additional web content, as well as conceptual information about the topic in the form of semantically oriented clusters of descriptive phrases. This information is presented to the users, who can investigate the identified information nuggets to refine their search. An initial user evaluation shows that our approach is especially helpful for finding new and interesting information on topics about which the user has only a vague idea or no idea at all.
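To make the association measure more concrete, the following is a minimal sketch of a distance-aware, PMI-style score over chunk pairs. It is not the authors' formula, which the abstract does not spell out. It assumes that each snippet has already been split into a list of chunks, that a co-occurrence at distance d contributes a weight of 1/d, and that pairs further apart than a cutoff are ignored. The names `distance_weighted_pmi`, `topic_graph`, `max_distance`, and `top_k` are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def distance_weighted_pmi(snippets, max_distance=3):
    """Score chunk pairs with a PMI-style measure.

    Assumption (not from the paper): a pair seen at chunk distance d
    adds 1/d to its co-occurrence weight, so closer pairs count more.
    """
    chunk_counts = Counter()
    pair_weights = Counter()
    for chunks in snippets:
        chunk_counts.update(chunks)
        # combinations() yields index pairs with i < j, so distance >= 1.
        for (i, a), (j, b) in combinations(enumerate(chunks), 2):
            distance = j - i
            if a != b and distance <= max_distance:
                pair_weights[tuple(sorted((a, b)))] += 1.0 / distance

    total_chunks = sum(chunk_counts.values())
    total_weight = sum(pair_weights.values())
    scores = {}
    for (a, b), weight in pair_weights.items():
        p_pair = weight / total_weight
        p_a = chunk_counts[a] / total_chunks
        p_b = chunk_counts[b] / total_chunks
        scores[(a, b)] = math.log2(p_pair / (p_a * p_b))
    return scores


def topic_graph(scores, top_k=5):
    """Keep the top_k strongest pairs as weighted edges (a, b, score)."""
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return [(a, b, score) for (a, b), score in ranked[:top_k]]
```

Called on chunked snippets, `distance_weighted_pmi` returns a dictionary keyed by alphabetically sorted chunk pairs, and `topic_graph` turns the strongest of these into an edge list. In this sketch the nodes of the graph are the chunks and the edge weights are the association scores.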
