“If you want to answer complex biological questions, you often need to combine data that are scattered on the web”. In this in silico talk, SIB’s Tarcisio Mendes de Farias, at the University of Lausanne, presents an approach, and a tool, to address this challenge. Called BioQuery, the interface he developed with his colleagues enables biologists to quickly run predefined natural-language queries across multiple data sources from a single entry point. The tool, described in a paper published in the journal Database, currently relies on data integration across several leading databases, including SIB Resources, and already provides users with new biological insights.

About the in silico talks series – The latest in bioinformatics by SIB Scientists

The in silico talks online series aims to inform bioinformaticians, life scientists and clinicians about the latest advances led by SIB Scientists on a wide range of topics in bioinformatics methods, research and resources. Stay abreast of the latest developments, get exclusive insights into recent papers, and discover how these advances might help you in your work or research, by subscribing to the in silico talks mailing list.

Complex bioinformatics databases hold enormous amounts of knowledge, but retrieving it usually requires in-depth technical know-how. The recent study presented here enables easy access to the wealth of complementary information contained in different resources, through editable template queries in natural language.

An example? In his talk, Tarcisio takes a typical research question that a molecular biologist studying a certain type of brain cancer may have: “what are the human genes associated with the disease, for which orthologs exist in the rat and which are expressed in its brain?”

The answer to this question would allow her to: 1) identify all the genes involved in the disease in humans (information available in UniProtKB), 2) find out which are the “corresponding” (orthologous) genes in a model species such as the rat (information available in OMA), and from these, 3) identify those that are specifically expressed in its brain (information available in Bgee).
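The three steps above amount to chaining lookups across three resources. The following is a minimal sketch of that logic only, not the tool's implementation: the small in-memory tables stand in for UniProtKB, OMA and Bgee, and every identifier in them (gene names, disease name, function name) is a made-up placeholder rather than real data. For simplicity it checks only whether a gene is expressed in the organ, not whether that expression is specific to it.

```python
from typing import Dict, List, Set, Tuple

# Toy stand-ins for the three resources. All identifiers are hypothetical.
DISEASE_GENES: Dict[str, Set[str]] = {   # role played by UniProtKB
    "example_disease": {"HUMAN_GENE_A", "HUMAN_GENE_B", "HUMAN_GENE_C"},
}
RAT_ORTHOLOGS: Dict[str, str] = {        # role played by OMA
    "HUMAN_GENE_A": "RAT_GENE_A",
    "HUMAN_GENE_B": "RAT_GENE_B",
}
EXPRESSED_IN: Dict[str, Set[str]] = {    # role played by Bgee
    "RAT_GENE_A": {"brain", "liver"},
    "RAT_GENE_B": {"liver"},
}


def genes_for_question(disease: str, organ: str) -> List[Tuple[str, str]]:
    """Return (human gene, rat ortholog) pairs answering the question."""
    results = []
    # Step 1: human genes associated with the disease.
    for human_gene in sorted(DISEASE_GENES.get(disease, set())):
        # Step 2: keep only genes that have a rat ortholog.
        rat_gene = RAT_ORTHOLOGS.get(human_gene)
        if rat_gene is None:
            continue
        # Step 3: keep only orthologs expressed in the organ of interest.
        if organ in EXPRESSED_IN.get(rat_gene, set()):
            results.append((human_gene, rat_gene))
    return results


print(genes_for_question("example_disease", "brain"))
```

In this toy data only one gene survives all three filters: one candidate has no rat ortholog, and another has an ortholog that is not expressed in the brain. In practice, each step would be a query against the corresponding database instead of a dictionary lookup.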

Listen to Tarcisio as he presents the approach he and his colleagues took to integrate the data from these different resources, and see the tool they developed in action: it allows researchers without in-depth technical know-how to ask this very same question and retrieve the answer in a short time.

The study was powered by the BioSODA project, supported by the National Research Programme “Big Data” (NRP75).

Reference(s)

Sima A C, Mendes de Farias T et al. Enabling semantic queries across federated bioinformatics databases. Database (2019).

Mendes de Farias T et al. VoIDext: Vocabulary and Patterns for Enhancing Interoperable Datasets with Virtual Links. In: On the Move to Meaningful Internet Systems: OTM 2019 Conferences. OTM 2019. Lecture Notes in Computer Science, vol 11877. Springer, Cham.