Large-Scale Learning of Relation-Extraction Rules with Distant Supervision from the Web

Sebastian Krause, Hong Li, Hans Uszkoreit, Feiyu Xu

In: Proceedings of the 11th International Semantic Web Conference. International Semantic Web Conference (ISWC-2012), November 11-15, Boston, Massachusetts, United States. Springer, 11/2012.


We present a large-scale, domain-adaptive relation extraction (RE) system that learns grammar-based RE rules from the Web, using large numbers of known relation instances as seeds. The system detects not only binary relations but also n-ary relations such as events. Our goal is to discover rule sets large enough to cover the actual range of linguistic variation, thus solving the notorious long-tail problem of real-world applications for the Semantic Web. The system uses distant supervision, taking Freebase as the seed source and the Web as the learning corpus. With a novel variant of distant supervision, many relations are learned in parallel, which enables a new method of rule filtering. In the experiments, 39 semantic relations are targeted with 2.8m seed instances extracted from Freebase. 3m sentences extracted from 20m web pages serve as the basis for learning an average of 40k distinctive rules for each relation. Given an efficient dependency parser, the average running time for each relation is only 19 hours. Evaluation on the ACE '05 data and a specially annotated corpus shows high recall. A comparison with a baseline system learning from a smaller corpus shows that, even with bootstrapping and the same massive seed, the recall of Web-based learning cannot be matched. Rule filtering effectively improves precision.
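
The distant-supervision idea in the abstract can be illustrated with a toy sketch. It is not the system described in the paper: the seed instances, the sentences, and all function names are hypothetical, a plain string pattern stands in for a dependency-grammar rule, and the filter criterion (keep a pattern for a relation only if no other relation yields it more often) is an assumed, simplified reading of filtering across relations learned in parallel.

```python
from collections import Counter, defaultdict

# Hypothetical seed instances: relation name -> list of argument tuples.
SEEDS = {
    "marriage": [("Alice Smith", "Bob Jones")],
    "acquisition": [("Acme", "Widgetco")],
}

# Hypothetical corpus sentences.
SENTENCES = [
    "Alice Smith married Bob Jones in spring.",
    "Acme bought Widgetco last year.",
    "Alice Smith met Bob Jones at work.",
]


def match_sentences(seeds, sentences):
    """Pair each sentence with every seed instance whose arguments all occur in it."""
    matches = []
    for relation, instances in seeds.items():
        for args in instances:
            for sentence in sentences:
                if all(arg in sentence for arg in args):
                    matches.append((relation, args, sentence))
    return matches


def to_pattern(args, sentence):
    """Replace argument mentions with slots (a stand-in for a parse-based rule)."""
    pattern = sentence
    for i, arg in enumerate(args):
        pattern = pattern.replace(arg, f"<ARG{i}>")
    return pattern


def count_patterns(matches):
    """Count how often each pattern was learned for each relation."""
    counts = defaultdict(Counter)
    for relation, args, sentence in matches:
        counts[relation][to_pattern(args, sentence)] += 1
    return counts


def filter_patterns(counts):
    """Assumed criterion: keep a pattern only where no other relation has it more often."""
    kept = defaultdict(set)
    for relation, patterns in counts.items():
        for pattern, n in patterns.items():
            if all(n >= other[pattern] for r, other in counts.items() if r != relation):
                kept[relation].add(pattern)
    return kept


if __name__ == "__main__":
    learned = count_patterns(match_sentences(SEEDS, SENTENCES))
    for relation, patterns in filter_patterns(learned).items():
        print(relation, sorted(patterns))
```

Note that the third sentence matches the marriage seed without expressing a marriage. Distant supervision produces this kind of noisy pattern, which is why a filtering step is needed to improve precision.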


distantly_supervised_dare.pdf (PDF, 521 KB)

German Research Center for Artificial Intelligence