The results of a data analysis are only as reliable as the sources they come from.
One of the most important tasks when dealing with genomic data is identifying the appropriate data sources. But how can this be done in a context where new databases are constantly being created? Furthermore, how can we decide which of more than a thousand repositories are reliable when the quality of the information they store varies so much?
The SILE project faces this challenge from two different angles:
This approach helps us ensure that no potentially relevant repository is missed in our workflow.
To ensure the highest coverage, our team performs a systematic analysis and classification of the biological resources available at:
Separating the wheat from the chaff
We live in an age where data acquisition is no longer a problem; the real challenge is determining which information is the right basis for important and sometimes difficult decisions. Genomic data sources store information that is sometimes conflicting, redundant and difficult to interconnect. In such a complex scenario, researchers face uncertainty as to whether they are using the appropriate data sources, on top of the effort required to determine the validity of the information those sources store.
SILE provides a solution to these problems in two different ways:
The information managed with SILE is focused on purpose, insights and resulting outcomes, which makes it valuable and cross-functional for daily work.
The maximum value of the data is obtained only when the storage infrastructure works in harmony to exploit the available data, at any scale.
Genomic data storage is not just about loading information into a database. Different technologies can ensure the persistence of the information, but in the end the choice of the right one depends on the volume of data to be managed, its structure, and what will be done with the data once it is loaded into the database.
Our team prepares the information for further exploitation and selects the most suitable storage infrastructure depending on the data analysis requirements:
Data alone is power, but well-presented data is wisdom
Data by itself is meaningless in a clinical context. To gain insights from genomic data, it must be appropriately analyzed and visualized. But which kinds of analysis and visualization are most useful in each situation? And in what combinations is that data most enlightening?
SILE provides the mechanisms to analyze and represent the information in a way that makes the discovery of new insights no longer a challenge.