WP2 - Publishing, Crawling, Searching Genomic Data
The objective of WP2 is to support the publishing, crawling, and searching of genomic data. A paradigm shift in genomic data search will empower scientists and
boost their productivity, much as Web search technology did in the revolution that characterized the beginning of this millennium.
TASK 2.1 QUERY-ENABLED GENOME BROWSERS
New-generation genome browsers will be visual tools for querying the genome. Scientists will “draw” or select the peaks and patterns that they would
like to see, e.g., by analogy with the genome areas that they are observing, and the browser will point to other genome regions with such peaks and patterns, supporting
exploratory “query by example” analysis. Sophisticated correlation and clustering queries, involving multiple data tracks and variables that represent distant genome
locations, will also be expressed graphically.
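As a rough illustration of the “query by example” idea, the sketch below assumes that a data track is a plain list of signal values and that similarity between regions is measured by Pearson correlation over a sliding window. Both are assumptions made for this example only; the function names pearson and query_by_example are hypothetical and do not describe the browser's actual design.

```python
from math import sqrt


def pearson(a, b):
    """Pearson correlation of two equal-length sequences (0.0 if either is flat)."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / sqrt(var_a * var_b)


def query_by_example(track, start, end, top_k=3):
    """Rank windows of the track by similarity to the selected region [start, end).

    Windows that overlap the selected region are skipped, so the result
    points to *other* regions with a similar shape.
    Returns a list of (window_start, window_end, score), best first.
    """
    example = track[start:end]
    width = end - start
    hits = []
    for pos in range(len(track) - width + 1):
        if pos < end and pos + width > start:  # overlaps the example itself
            continue
        score = pearson(example, track[pos:pos + width])
        hits.append((score, pos))
    hits.sort(key=lambda h: (-h[0], h[1]))
    return [(pos, pos + width, score) for score, pos in hits[:top_k]]


# Hypothetical signal track with a small peak (selected) and a larger one.
track = [0, 0, 1, 5, 1, 0, 0, 0, 0, 2, 10, 2, 0, 0]
best = query_by_example(track, start=2, end=5, top_k=1)
```

Correlation is scale-invariant, so a taller peak with the same shape is still reported as a close match; a real browser would likely need to combine several tracks and a more robust similarity measure.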
TASK 2.2 PUBLISHING GENOME DATA
New-generation browsers will require an efficient underlying data exchange protocol (an extension or adaptation of HTTP). As on today's Internet, the protocol will
assign each genome a unique identifier and will support queries that extract genome data (or portions of it) for loading into the GenData 2020 model
and immediate viewing in the browser.
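One way such identifier-based requests could look is sketched below, assuming a plain HTTP URL scheme in which the genome identifier is a path segment and the requested portion is given by query parameters. The base URL, the /genomes/ path, and the parameter names chrom, start, and end are all hypothetical; the protocol itself is still to be defined by this task.

```python
from urllib.parse import parse_qs, quote, unquote, urlencode, urlsplit


def build_region_url(base_url, genome_id, chrom=None, start=None, end=None):
    """Build a URL addressing a whole genome or a portion of it (hypothetical layout)."""
    url = f"{base_url.rstrip('/')}/genomes/{quote(genome_id, safe='')}"
    if chrom is not None:
        params = {"chrom": chrom}
        if start is not None and end is not None:
            if not 0 <= start < end:
                raise ValueError("expected 0 <= start < end")
            params.update(start=start, end=end)
        url += "?" + urlencode(params)
    return url


def parse_region_url(url):
    """Recover the genome identifier and the requested region from a URL."""
    parts = urlsplit(url)
    query = parse_qs(parts.query)

    def first(name, convert=str):
        values = query.get(name)
        return convert(values[0]) if values else None

    return {
        "genome_id": unquote(parts.path.rsplit("/", 1)[-1]),
        "chrom": first("chrom"),
        "start": first("start", int),
        "end": first("end", int),
    }


# example.org and "sample-42" are placeholders, not real endpoints or identifiers.
url = build_region_url("https://example.org/api", "sample-42", "chr1", 1000, 2000)
request = parse_region_url(url)
```

Keeping the identifier in the path and the region in the query string means that the same identifier addresses both the full genome and any portion of it.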
TASK 2.3 GENOME CRAWLING AND INDEXING FOR SUPPORTING SEARCH
Once the network of genomes is established, genomic search queries will become possible, enabling scientists to locate genomes with given features and hence
to replicate their lab's queries on the world of interconnected genomes. Genome crawlers will gather information, and genome search systems will index it according
to query patterns. This research will identify the most relevant queries for genome crawling, based on the classical interactions performed by
biologists. Queries will be classified by the type of data that they address (e.g., their scope over the logical views of the data model), their complexity, and their
popularity.
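The crawl-then-index step can be pictured with a minimal sketch, assuming that a crawler reports, for each genome identifier, a list of feature labels, and that the search system keeps an inverted index from feature to genomes. The record layout, the feature labels, and the names build_index and search are illustrative assumptions, not part of the project's specification.

```python
from collections import defaultdict


def build_index(crawled):
    """Build an inverted index: feature label -> set of genome identifiers."""
    index = defaultdict(set)
    for genome_id, features in crawled.items():
        for feature in features:
            index[feature].add(genome_id)
    return index


def search(index, required_features):
    """Return the sorted identifiers of genomes that have every required feature."""
    required = list(required_features)
    if not required:
        return []
    result = set(index.get(required[0], set()))
    for feature in required[1:]:
        result &= index.get(feature, set())
    return sorted(result)


# Made-up crawler output: genome identifier -> observed feature labels.
crawled = {
    "genome-A": ["peak:chr1", "variant:BRCA1"],
    "genome-B": ["peak:chr1"],
    "genome-C": ["variant:BRCA1", "peak:chr1"],
}
index = build_index(crawled)
matches = search(index, ["peak:chr1", "variant:BRCA1"])
```

Which features are worth indexing would follow from the query classification described above, since the most popular query patterns determine what the crawler needs to extract.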