Knowledge graphs (KGs) have recently gained attention due to their flexible data model, which reduces the effort needed for integration across different, possibly heterogeneous, data sources. In this tutorial, we learn how to access scientific data stored in a relational database through the virtual knowledge graph (VKG) approach. In such an approach, the data are exposed as a KG and enriched with semantic information coming from a domain ontology. The KG is “virtual” in the sense that the data are not replicated but stay within the data sources and are accessed at query time.
We demonstrate the approach on scientific data from the biomedical domain, using the open-source VKG system Ontop. Since legacy data are exposed as a KG, users can access the data by means of a more convenient vocabulary provided by the domain ontology, benefit from automated reasoning capabilities, and do not need to focus on how the data are actually stored. Furthermore, the virtual approach allows KGs to be used even in contexts where the user neither owns the data nor is granted the right to copy them.
By relying on existing federation tools, the approach described here for accessing scientific data can also be used to integrate multiple, heterogeneous, and possibly semi-structured and unstructured data sources.
Summary: In this tutorial, we learn how to set up and exploit the virtual knowledge graph (VKG) approach to access data stored in relational legacy systems and to enrich such data with domain knowledge coming from different heterogeneous (biomedical) resources. The VKG approach is based on an ontology that describes a domain of interest in terms of a vocabulary familiar to the user and exposes a high-level conceptual view of the data. Users can access the data by exploiting the conceptual view, and in this way they do not need to be aware of low-level storage details. They can easily integrate ontologies coming from different sources and can obtain richer answers thanks to the interaction between data and domain knowledge.
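The idea of a virtual graph enriched by an ontology can be illustrated with a minimal sketch. It is a toy under stated assumptions, not Ontop's API and not the tutorial's actual setup: the `patient` table, the `ex:` names, the `SUBCLASS_OF` dictionary, and the helper functions are all hypothetical. The sketch simply computes triples from the rows whenever a query is asked, whereas a VKG system such as Ontop translates the query into SQL over the source.

```python
import sqlite3

# Hypothetical legacy table; all names here are illustrative assumptions.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE patient (id INTEGER PRIMARY KEY, name TEXT, diagnosis TEXT)"
)
conn.executemany(
    "INSERT INTO patient VALUES (?, ?, ?)",
    [(1, "Ann", "LungCancer"), (2, "Bob", "Asthma")],
)

# Toy ontology: direct subclass axioms (child -> parent).
SUBCLASS_OF = {"LungCancer": "Cancer", "Cancer": "Disease", "Asthma": "Disease"}


def superclasses(cls):
    """Return cls together with all of its ancestors in the toy ontology."""
    result = [cls]
    while cls in SUBCLASS_OF:
        cls = SUBCLASS_OF[cls]
        result.append(cls)
    return result


def virtual_triples(connection):
    """Yield triples computed from the rows at call time; nothing is stored."""
    rows = connection.execute(
        "SELECT id, name, diagnosis FROM patient ORDER BY id"
    )
    for pid, name, diagnosis in rows:
        subject = f"ex:patient/{pid}"
        yield (subject, "ex:name", name)
        # The ontology adds the more general conditions implied by the data.
        for cls in superclasses(diagnosis):
            yield (subject, "ex:hasCondition", f"ex:{cls}")


def patients_with(connection, cls):
    """Return the sorted subjects that have the given condition class."""
    target = f"ex:{cls}"
    return sorted(
        {
            s
            for s, p, o in virtual_triples(connection)
            if p == "ex:hasCondition" and o == target
        }
    )


print(patients_with(conn, "Cancer"))
```

The database stores only `LungCancer` for the first patient, yet a query for `Cancer` still finds that patient because the subclass axiom supplies the missing link. Since the triples are derived at query time, a row added to the table later is visible to the next query without any reloading step.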
Elsevier, 2021. Vol. 2, no. 10, article ID 100346