WP1 - Genomic Data Modeling
The objective of WP1 is to design a multi-level GenData 2020 data model that expresses the relevant features embedded in the produced
biomolecular data, in the correlated phenotypic data that can be extracted from clinical databases, and in the information inferred by applying data analysis
methods to them.
TASK 1.1 MODEL DESIGN
The GenData 2020 data model will be abstract, flexible and extensible, enabling the dynamic addition of new data types and their relationships, thereby
accommodating the rapid evolution of DNA-related research. Physical data will support post-alignment data formats and formats produced by next-generation
DNA sequencing hardware. Conceptual data will be organized through views, expressing: the specific biomolecular experiment (the genome fragment, the sequencing
technology being used, and so on), the genomic tracts (e.g., genetic or epigenetic properties of the considered fragment), and the specific pathology or research
protocol that motivated the experiment (integrating clinical data).
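As an illustration only, the three conceptual views could be sketched as follows. This is a minimal sketch, not the GenData 2020 model itself: every class and field name (ExperimentView, GenomicFeature, ClinicalView, Dataset, attributes) is a hypothetical placeholder, and the open-ended attribute dictionary is just one possible way to allow new data types to be added dynamically.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class ExperimentView:
    """The specific biomolecular experiment (hypothetical fields)."""
    genome_fragment: str
    sequencing_technology: str


@dataclass
class GenomicFeature:
    """A genetic or epigenetic property of the considered fragment."""
    name: str
    start: int
    end: int
    # Open-ended attributes stand in for dynamically added data types.
    attributes: Dict[str, Any] = field(default_factory=dict)


@dataclass
class ClinicalView:
    """The pathology or research protocol that motivated the experiment."""
    pathology: str
    protocol: str


@dataclass
class Dataset:
    """Ties the three views together for one experiment."""
    experiment: ExperimentView
    clinical: ClinicalView
    features: List[GenomicFeature] = field(default_factory=list)

    def add_feature(self, feature: GenomicFeature) -> None:
        self.features.append(feature)


# Usage with made-up values.
dataset = Dataset(
    experiment=ExperimentView("chr1:1000-2000", "example-sequencer"),
    clinical=ClinicalView("example-pathology", "example-protocol"),
)
dataset.add_feature(GenomicFeature("methylation", 1200, 1300, {"level": 0.8}))
```

Keeping the views as separate types reflects the separation described above; the actual model may organize them differently.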
TASK 1.2 QUERY LANGUAGE DESIGN
The query language will support selective data extraction and hierarchical reduction, choosing the level of observation best suited to expressing micro vs. macro
properties. The language will be high-level, with effective tools that support the specification of queries through non-textual interaction whenever possible. At the
same time, its syntax will be fully defined, so that complex queries can also be specified.
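The two operations named above, selective extraction and hierarchical reduction, can be illustrated with a small sketch. It assumes, purely for illustration, that data are (chromosome, position, value) records and that reduction means averaging values over fixed-size bins; the functions select and reduce_to_bins are hypothetical and do not represent the syntax of the query language.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List, Tuple

# Assumed record shape: (chromosome, position, value).
Record = Tuple[str, int, float]


def select(records: Iterable[Record],
           predicate: Callable[[Record], bool]) -> List[Record]:
    """Selective extraction: keep only the records matching the predicate."""
    return [r for r in records if predicate(r)]


def reduce_to_bins(records: Iterable[Record],
                   bin_size: int) -> Dict[Tuple[str, int], float]:
    """Hierarchical reduction: average values per (chromosome, bin index)."""
    sums: Dict[Tuple[str, int], float] = defaultdict(float)
    counts: Dict[Tuple[str, int], int] = defaultdict(int)
    for chrom, pos, value in records:
        key = (chrom, pos // bin_size)
        sums[key] += value
        counts[key] += 1
    return {key: sums[key] / counts[key] for key in sums}


# Usage with made-up values: micro level (positions) to macro level (bins).
records = [("chr1", 10, 1.0), ("chr1", 20, 3.0),
           ("chr1", 150, 5.0), ("chr2", 10, 7.0)]
chr1_only = select(records, lambda r: r[0] == "chr1")
macro = reduce_to_bins(chr1_only, bin_size=100)
```

A larger bin size gives a coarser, macro-level view of the same data, while a bin size of 1 leaves the micro-level records unaggregated.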
TASK 1.3 MODEL STANDARDIZATION
The GenData 2020 model and protocol will be proposed as standards to Web standards bodies (e.g., W3C and OMG), drawing on the experience gained in the
standardization of the WebML language (www.webml.org, implemented within the tool WebRatio, www.webratio.com). The application to OMG/W3C may take
several years and may therefore extend beyond the end of the project.