WP5 - Genomic Data Analytics

The objective of WP5 is to define analysis services for genomic data that exploit the GenData 2020 data model. These services will provide analysis tools that allow
scientists to analyze all the available heterogeneous information thoroughly, selecting the appropriate granularity level and dimensions.

TASK 5.1 DIMENSIONAL REDUCTION
In highly complex systems, characterized by an extremely high number of observed features relative to the number of sample units, the application of traditional dimensional
reduction methods poses new scientific challenges. Noise suppression can be conveniently carried out on the basis of ad hoc knowledge of the problem, by
identifying effective model-driven strategies that discard spurious correlations and ensure that only real correlations emerge. All the methods considered are
challenged by the very high dimensionality of the statistical units that characterize gene expression and its regulation, especially at the genomic and multi-organism levels.
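
As an illustration of the "many features, few samples" setting described above, the following minimal sketch extracts the first principal component by working on the small samples-by-samples Gram matrix instead of the features-by-features covariance matrix. It is not a method prescribed by the task: the choice of PCA, the power iteration, the synthetic data and all function names (make_toy_data, first_component, ...) are assumptions made only for this example.

```python
import random


def make_toy_data(n_per_group=3, p=300, n_informative=50, shift=4.0, seed=1):
    """Synthetic matrix (rows = samples): two groups that differ only in the
    first `n_informative` features; all other features are pure noise."""
    rng = random.Random(seed)
    X = []
    for group in (0, 1):
        for _ in range(n_per_group):
            row = [rng.gauss(0.0, 1.0) for _ in range(p)]
            if group == 1:
                for j in range(n_informative):
                    row[j] += shift
            X.append(row)
    return X


def center_columns(X):
    """Subtract the per-feature mean from every sample."""
    n, p = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(p)]
    return [[row[j] - means[j] for j in range(p)] for row in X]


def gram_matrix(Xc):
    """n x n matrix of inner products between centered samples."""
    return [[sum(a * b for a, b in zip(ri, rj)) for rj in Xc] for ri in Xc]


def top_eigenpair(G, iters=500, seed=0):
    """Power iteration for the dominant eigenpair of a symmetric PSD matrix."""
    rng = random.Random(seed)
    n = len(G)
    v = [rng.gauss(0.0, 1.0) for _ in range(n)]
    for _ in range(iters):
        w = [sum(G[i][k] * v[k] for k in range(n)) for i in range(n)]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    # Rayleigh quotient of the unit vector v gives the eigenvalue estimate.
    lam = sum(v[i] * sum(G[i][k] * v[k] for k in range(n)) for i in range(n))
    return lam, v


def first_component(X):
    """Return (sample scores on the first principal component,
    fraction of total variance carried by that component)."""
    G = gram_matrix(center_columns(X))
    lam, v = top_eigenpair(G)
    trace = sum(G[i][i] for i in range(len(G)))
    scores = [lam ** 0.5 * x for x in v]
    return scores, lam / trace


if __name__ == "__main__":
    scores, ratio = first_component(make_toy_data())
    print("scores on first component:", [round(s, 2) for s in scores])
    print("explained variance ratio:", round(ratio, 2))
```

With n samples and p features, the Gram matrix has only n x n entries, so the cost of the eigen-decomposition depends on the (small) number of samples rather than on the (huge) number of features. A model-driven strategy as envisaged by the task would additionally use problem knowledge to decide which components reflect real rather than spurious correlations; that step is not shown here.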

TASK 5.2 SMART INDEXING FOR LARGE SCALE DATA ANALYSIS
Traditional data mining approaches typically explore data in main memory, and therefore can hardly cope with very large databases. To overcome the limits of current in-core
data mining algorithms, the research activity will address large scale data mining for databases that cannot feasibly be analyzed in main memory with current
state-of-the-art algorithms. These issues may be addressed by defining compact and persistent physical data representations stored in secondary memory. Specialized
techniques, which greedily explore reduced portions of the data, are then exploited as building blocks for different mining algorithms.
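
One possible reading of a "compact and persistent physical data representation" is a disk-resident vertical index, in which each item is stored as a list of the identifiers of the transactions that contain it. The sketch below is only an assumption about how such a building block might look: the file layout, the function names (build_vertical_index, read_tidlist, support) and the toy item labels are hypothetical. It counts the support of an itemset by reading from secondary memory only the lists of the items involved.

```python
import os
import struct
import tempfile


def build_vertical_index(transactions, path):
    """Write one ascending list of transaction ids per item to `path`.
    Return a small in-memory directory: item -> (byte offset, list length)."""
    tidlists = {}
    for tid, items in enumerate(transactions):
        for item in set(items):
            tidlists.setdefault(item, []).append(tid)
    directory = {}
    with open(path, "wb") as f:
        for item in sorted(tidlists):
            tids = tidlists[item]  # ascending, because tid only grows
            directory[item] = (f.tell(), len(tids))
            f.write(struct.pack(f"<{len(tids)}I", *tids))  # 32-bit unsigned ids
    return directory


def read_tidlist(f, directory, item):
    """Read the id list of a single item; unknown items yield an empty list."""
    offset, count = directory.get(item, (0, 0))
    if count == 0:
        return []
    f.seek(offset)
    return list(struct.unpack(f"<{count}I", f.read(4 * count)))


def support(path, directory, itemset):
    """Number of transactions containing every item of a non-empty itemset."""
    if not itemset:
        raise ValueError("itemset must not be empty")
    # Start from the rarest item so that the running intersection stays small.
    ordered = sorted(itemset, key=lambda it: directory.get(it, (0, 0))[1])
    with open(path, "rb") as f:
        current = set(read_tidlist(f, directory, ordered[0]))
        for item in ordered[1:]:
            if not current:
                break  # no transaction can match any more
            current &= set(read_tidlist(f, directory, item))
    return len(current)


if __name__ == "__main__":
    # Toy transactions with placeholder item labels.
    data = [["a", "b"], ["b", "c"], ["a", "b", "c"], ["c"]]
    index_path = os.path.join(tempfile.mkdtemp(), "tidlists.bin")
    directory = build_vertical_index(data, index_path)
    print("support of {a, b}:", support(index_path, directory, {"a", "b"}))
```

Only the directory is kept in main memory; the id lists stay on disk and are fetched on demand, so the memory footprint of a support query depends on the items queried rather than on the size of the whole database.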

TASK 5.3 ANALYSIS SERVICES
This task focuses on integrating both OLAP and data mining technologies in the On-Line Analytical Mining (OLAM) paradigm. Devising an OLAM framework can
be highly beneficial in the genomic area. On the one hand, it will provide facilities for seamlessly mining at different levels of abstraction, by drilling, pivoting, dicing,
and slicing on a data cube and on the patterns resulting from the application of mining algorithms. On the other hand, it will give flexibility in selecting desired mining
functions and effectiveness in exploring their results. Mining algorithms will be semantics-aware, i.e., capable of exploiting background knowledge (e.g., taxonomies,
ontologies) both to produce intensional answers to complex queries and to improve the quality and usability of the retrieved knowledge.
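
The cube operations mentioned above can be sketched on a tiny in-memory fact table. The example is purely illustrative and does not describe the OLAM framework to be devised: the fact table, the gene-to-pathway taxonomy, the labels (g1, P1, ...) and the function names are all hypothetical. It shows slicing, dicing, and a roll-up that uses a taxonomy as background knowledge to move from the gene level to the pathway level.

```python
from collections import defaultdict

# Hypothetical fact table: one expression value per (gene, tissue, condition).
FACTS = [
    {"gene": "g1", "tissue": "liver", "condition": "tumor", "expr": 8.0},
    {"gene": "g1", "tissue": "liver", "condition": "normal", "expr": 4.0},
    {"gene": "g2", "tissue": "liver", "condition": "tumor", "expr": 6.0},
    {"gene": "g2", "tissue": "lung", "condition": "tumor", "expr": 4.0},
    {"gene": "g3", "tissue": "lung", "condition": "tumor", "expr": 10.0},
    {"gene": "g3", "tissue": "lung", "condition": "normal", "expr": 6.0},
]

# Hypothetical taxonomy used as background knowledge: gene -> pathway.
GENE_TO_PATHWAY = {"g1": "P1", "g2": "P1", "g3": "P2"}


def slice_cube(facts, dim, value):
    """Slice: fix one dimension to a single value."""
    return [f for f in facts if f[dim] == value]


def dice_cube(facts, allowed):
    """Dice: keep facts whose value lies in the allowed set on every listed dimension."""
    return [f for f in facts if all(f[d] in vals for d, vals in allowed.items())]


def roll_up(facts, dim, hierarchy):
    """Roll-up: replace each value of `dim` with its parent in `hierarchy`."""
    return [{**f, dim: hierarchy[f[dim]]} for f in facts]


def aggregate(facts, dims, measure="expr"):
    """Mean of `measure` for every combination of the grouping dimensions.
    Changing the order of `dims` corresponds to pivoting the result."""
    groups = defaultdict(list)
    for f in facts:
        groups[tuple(f[d] for d in dims)].append(f[measure])
    return {key: sum(vals) / len(vals) for key, vals in groups.items()}


if __name__ == "__main__":
    by_pathway = aggregate(roll_up(FACTS, "gene", GENE_TO_PATHWAY),
                           ["gene", "condition"])
    for key, mean_expr in sorted(by_pathway.items()):
        print(key, mean_expr)
```

In an OLAM setting the same navigation operators would be applied not only to aggregated measures, as here, but also to the patterns returned by mining algorithms at each level of abstraction.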