WP4 - Bio-Ontology Management

WP4 focuses on Bio-Ontology management, with emphasis on integrating and enriching the experimental data collected in the common data model with
ontological knowledge, either extracted from ontological sources or produced by experiments, so as to make a large body of accumulated scientific knowledge available to biologists.

TASK 4.1 EXTRACTING AND MAINTAINING GENDATA 2020 ONTOLOGIES
This task is concerned with techniques and methods for building and maintaining ontological knowledge. Whenever a use case or domain calls for
ontological knowledge, the corresponding body of knowledge will be represented as an ontology, acting as a standardized reference model that
supports knowledge sharing, integration and exploitation within that domain. Methods will semi-automatically “bootstrap” the required ontology from
suitable fragments of existing ontologies, such as the Open Biological and Biomedical Ontologies collection.
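
As an illustration of what bootstrapping from a fragment could look like, the following minimal sketch extracts from a toy is_a hierarchy the sub-hierarchy formed by a set of seed terms and all their ancestors. The term names, the dictionary representation and the function extract_fragment are assumptions made for this example; they are not taken from the project or from any real OBO ontology.

```python
from collections import deque

# Toy is_a hierarchy (child -> parents). Term names are illustrative only,
# not identifiers from a real OBO ontology.
TOY_ONTOLOGY = {
    "biological_process": [],
    "cellular_process": ["biological_process"],
    "cell_cycle": ["cellular_process"],
    "mitosis": ["cell_cycle"],
    "metabolic_process": ["biological_process"],
    "dna_repair": ["cellular_process", "metabolic_process"],
}


def extract_fragment(ontology, seeds):
    """Return the sub-hierarchy containing the seed terms and all their ancestors."""
    fragment = {}
    queue = deque(seeds)
    while queue:
        term = queue.popleft()
        if term in fragment:
            continue  # already visited via another path
        if term not in ontology:
            raise KeyError(f"unknown term: {term}")
        fragment[term] = list(ontology[term])
        queue.extend(ontology[term])  # walk upwards along is_a links
    return fragment


# Bootstrap a small ontology for a use case that only needs "mitosis".
fragment = extract_fragment(TOY_ONTOLOGY, ["mitosis"])
for term, parents in fragment.items():
    print(term, "is_a", parents)
```

The upward closure keeps every seed term connected to the root, so the fragment remains a well-formed hierarchy; a curator would still need to review and extend it, which is why the process is only semi-automatic.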

TASK 4.2 MAPPING DATA TO THE GENDATA 2020 ONTOLOGIES
GenData 2020 adheres to the ontology-based data access (OBDA) paradigm, in which the ontology acts as a semantic mediator for accessing the data
stored in a set of heterogeneous sources, and the relationships between the data and the concepts in the ontology are represented as a set of mappings. Moreover, new
data sources may be added over time, which requires the mappings to be extended and adjusted. The management of large, evolving sets of mappings is an
engineering problem of the same complexity as ontology management.
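
The role of mappings can be sketched as follows, assuming two hypothetical sources (lab_a and lab_b) that store the same kind of data under different schemas. The source names, field names, the concept Gene and the property locatedOn are all invented for illustration. For brevity the sketch materializes triples from the sources; an OBDA system would more typically rewrite ontology-level queries into source queries instead.

```python
# Two hypothetical sources describing genes with different schemas.
SOURCES = {
    "lab_a": [{"gene_symbol": "TP53", "chrom": "17"}],
    "lab_b": [{"name": "BRCA1", "chromosome": "17"}],
}

# Each mapping relates one source to one ontology concept and its properties.
MAPPINGS = [
    {"source": "lab_a", "concept": "Gene", "id_field": "gene_symbol",
     "properties": {"locatedOn": "chrom"}},
    {"source": "lab_b", "concept": "Gene", "id_field": "name",
     "properties": {"locatedOn": "chromosome"}},
]


def materialize(sources, mappings):
    """Apply every mapping to its source and return (subject, predicate, object) triples."""
    triples = []
    for mapping in mappings:
        for row in sources[mapping["source"]]:
            subject = row[mapping["id_field"]]
            triples.append((subject, "type", mapping["concept"]))
            for prop, field in mapping["properties"].items():
                triples.append((subject, prop, row[field]))
    return triples


def instances_of(concept, triples):
    """Answer an ontology-level question without knowing the source schemas."""
    return sorted(s for s, p, o in triples if p == "type" and o == concept)


triples = materialize(SOURCES, MAPPINGS)
print(instances_of("Gene", triples))
```

Adding a third source in this sketch means adding one more entry to MAPPINGS, leaving the query side untouched, which is the maintenance burden the task refers to.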

TASK 4.3 ONTOLOGY-BASED QUERYING
The ontology management system should be able to answer queries posed in terms of the ontology's concepts and relationships. The query language is specifically
designed for an optimal compromise between expressive power and the feasibility of reasoning over large extensional levels. An important feature is that genomic data
is ordered, e.g., by position in the sequence, by physical proximity in chromosomal loops, by recency of the sequence, by precision of the DNA sequencer, by data
provenance, etc. Moreover, answers are often required to be ordered as well: prototypical queries ask for highly correlated similarities of DNA
sub-sequences and phenotypes, also taking into account recency and sequencer precision.
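
A minimal sketch of such an ordered answer is given below. It assumes each candidate answer carries a correlation score, the year the sequence was produced and a sequencer precision value, and it ranks answers lexicographically by those three criteria. The Match record, its fields, the threshold and the choice of a lexicographic order are assumptions made for this example, not the project's query semantics.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Match:
    region: str         # hypothetical identifier of a DNA sub-sequence
    correlation: float  # similarity/correlation with the phenotype, in [0, 1]
    year: int           # year the sequence was produced (recency)
    precision: float    # precision of the sequencer, in [0, 1]


def rank(matches, min_correlation=0.0):
    """Keep sufficiently correlated matches and order them best first.

    Ordering is lexicographic: higher correlation, then more recent,
    then higher sequencer precision.
    """
    kept = [m for m in matches if m.correlation >= min_correlation]
    return sorted(kept, key=lambda m: (-m.correlation, -m.year, -m.precision))


matches = [
    Match("r1", 0.90, 2012, 0.990),
    Match("r2", 0.95, 2010, 0.980),
    Match("r3", 0.90, 2013, 0.970),
    Match("r4", 0.40, 2013, 0.999),
]

ranked = rank(matches, min_correlation=0.5)
for m in ranked:
    print(m.region, m.correlation, m.year, m.precision)
```

A weighted score combining the three criteria would be an equally plausible design; the lexicographic order is used here only because it is the simplest to read.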