Publication

Interlinking Slovene Language Datasets

Lenka Bajčetić; Thierry Declerck

In: Zoe Gavriilidou; Maria Mitsiaki; Asimakis Fliatouras (Eds.). Proceedings of XIX EURALEX Congress. EURALEX International Congress (EURALEX-2020), Lexicography for Inclusion, September 7-11, Pages 73-80, Vol. 1, ISBN 978-618-85138-1-5, Euralex, 11/2020.

Abstract

We present the current implementation state of our work on interlinking the language data and linguistic information contained in different types of Slovene language resources. The types of resources we currently deal with are a lexical database (which also contains collocations and example sentences), a morphological lexicon, and the Slovene WordNet. We first transform the encoding of the original data into the OntoLex-Lemon model and map the different descriptors used in the original sources onto the LexInfo vocabulary. This harmonization step enables the interlinking of the various types of information included in the different resources, using relations defined in OntoLex-Lemon. As a result, we obtain a partial merging of the information that was originally distributed over different resources, which leads to a cross-enrichment of the original data sources. The final goal of the presented work is to publish the linked and merged Slovene linguistic datasets in the Linguistic Linked Open Data cloud.
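
The harmonization and interlinking steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the base URI, the source part-of-speech labels, the sample lemma, the synset identifier, the LexInfo namespace version, and the choice of ontolex:evokes as the linking relation are all assumptions made for illustration. Only the OntoLex-Lemon and LexInfo term names come from those vocabularies.

```python
# Illustrative sketch only: BASE, the source labels, and all sample data are hypothetical.
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
ONTOLEX = "http://www.w3.org/ns/lemon/ontolex#"
LEXINFO = "http://www.lexinfo.net/ontology/3.0/lexinfo#"  # assumed LexInfo version
BASE = "http://example.org/slovene/"  # hypothetical base URI

# Assumed source descriptors mapped onto LexInfo part-of-speech values.
POS_TO_LEXINFO = {
    "samostalnik": "noun",
    "glagol": "verb",
    "pridevnik": "adjective",
}


def entry_triples(lemma, source_pos, synset_id=None):
    """Return (subject, predicate, object) triples for one lexical entry."""
    entry = f"{BASE}entry/{lemma}"
    form = f"{entry}#form"
    triples = [
        (entry, RDF_TYPE, ONTOLEX + "LexicalEntry"),
        (entry, ONTOLEX + "canonicalForm", form),
        (form, RDF_TYPE, ONTOLEX + "Form"),
        (form, ONTOLEX + "writtenRep", f'"{lemma}"@sl'),
        # Harmonization: the source label is replaced by a LexInfo value.
        (entry, LEXINFO + "partOfSpeech", LEXINFO + POS_TO_LEXINFO[source_pos]),
    ]
    if synset_id is not None:
        # Interlinking: treat the WordNet synset as an ontolex:LexicalConcept
        # that the entry evokes (one possible choice of relation).
        concept = f"{BASE}wordnet/{synset_id}"
        triples.append((concept, RDF_TYPE, ONTOLEX + "LexicalConcept"))
        triples.append((entry, ONTOLEX + "evokes", concept))
    return triples


if __name__ == "__main__":
    for s, p, o in entry_triples("miza", "samostalnik", "synset-0001"):
        print(s, p, o)
```

Once every resource is expressed with the same entry URIs and the same LexInfo values, triples produced from the lexical database, the morphological lexicon, and the WordNet can be merged simply by taking their union.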

Projects

Pret-a-LLOD - Scalable Open Linked Data environment

EURALEX2020_ProceedingsBook-p073-080.pdf (pdf, 696 KB)