Can technologies like ChatGPT support life science researchers in exploring data they are not familiar with? This is the question our new Knowledge Representation unit investigated, through concrete examples from SIB’s leading open databases and software tools. They show the potential of conversational AI to describe datasets, as well as to generate and explain complex queries across datasets, i.e., federated knowledge graphs. Find out how these technologies can assist life science researchers in benefiting from the wealth of open data, how they can help make these data FAIR (Findable, Accessible, Interoperable, and Reusable), and reasons why caution should still be exercised in the process.

What are knowledge graphs?

A knowledge graph is a type of graph database that stores information about entities (e.g. proteins, genes, organs) and their relationships to one another (e.g. ‘is expressed in’, ‘codes for’). Entities are represented by nodes, and relationships between entities by edges. Knowledge graphs allow users to better understand complex data. They make it possible to link various interoperable databases, for example, by performing federated queries across them to reveal new biological insights.
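As a rough illustration of the node-and-edge structure described above, the sketch below stores a handful of relationships as (subject, relation, object) triples and looks up the neighbours of one entity. The gene, protein and organ names are illustrative placeholders rather than records from any SIB database, and the helper functions `build_graph` and `neighbours` are hypothetical names chosen for this example.

```python
from collections import defaultdict

# Toy triples (subject, relation, object). The identifiers are
# illustrative placeholders, not entries from a real database.
TRIPLES = [
    ("geneA", "codes for", "proteinA"),
    ("geneA", "is expressed in", "brain"),
    ("geneB", "codes for", "proteinB"),
    ("geneB", "is expressed in", "liver"),
]


def build_graph(triples):
    """Index edges by source node: node -> list of (relation, target)."""
    graph = defaultdict(list)
    for subject, relation, obj in triples:
        graph[subject].append((relation, obj))
    return graph


def neighbours(graph, node, relation):
    """Return the targets reachable from `node` via edges labelled `relation`."""
    return [target for rel, target in graph.get(node, []) if rel == relation]


graph = build_graph(TRIPLES)
print(neighbours(graph, "geneA", "is expressed in"))  # prints ['brain']
```

Production knowledge graphs are typically stored in dedicated graph databases and queried with a language such as SPARQL, but the underlying idea of labelled edges between entities is the same.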

Democratizing access to knowledge representation

“Knowledge graphs are a simple, yet powerful way to organize and connect information in an intuitive manner,” says Ana Claudia Sima, who leads the new Knowledge Representation unit with Tarcisio Mendes de Farias, as part of SIB’s Vital-IT Group. “In recent years, they have seen increasing adoption across academia and industry, with a wide range of applications including in search engines, improved diagnostics or drug repurposing,” she explains. However, retrieving information from knowledge graphs is still beyond the expertise of most users, since it requires familiarity with technical query languages. Together, they co-authored a paper in which they reflect on the role of artificial intelligence (AI) chatbots, such as ChatGPT, in facilitating access to the data held in complex knowledge graphs.

Conversational AI to bring data closer to the users

Using some of SIB’s leading Open Science databases and software (Bgee, OMA and UniProt), the team shows how an AI chatbot can be used to accelerate the FAIRification of datasets, leveraging both existing public documentation and expert input. For instance, by accurately summarizing datasets in a high-level description that end users can understand, it contributes to data Findability. And by generating federated queries across public knowledge graphs based on natural language questions provided by users (e.g. “Provide me with the list of human genes associated with cancer and their orthologs expressed in the rat brain”), it facilitates Accessibility and Reuse. The team also discusses the limits of current conversational AI technologies and the caution required when using them.  
The preliminary overview provided in the preprint has been accepted at a workshop on Semantic Web solutions for analysing biomedical data. It will be further expanded to more use cases and discussed in the bioinformatics community in the coming months.
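To give a feel for what a federated query looks like, the sketch below assembles a SPARQL 1.1 query in which each `SERVICE` clause delegates part of the example question above to a different endpoint. It is a minimal illustration under stated assumptions, not the query from the paper: the endpoint URLs, the `ex:` vocabulary and the helper names `service_block` and `federated_query` are all hypothetical placeholders, and real queries against Bgee, OMA or UniProt would use each resource's own endpoint and data model.

```python
# Hypothetical vocabulary and endpoints, used only for illustration.
# Real resources publish their own endpoint URLs and predicates.
PREFIXES = {"ex": "https://example.org/vocab/"}

ENDPOINTS = {
    "disease": "https://example.org/disease-annotations/sparql",
    "orthology": "https://example.org/orthology/sparql",
    "expression": "https://example.org/expression/sparql",
}


def service_block(endpoint, patterns):
    """Wrap triple patterns in a SPARQL 1.1 SERVICE clause for one endpoint."""
    lines = [f"  SERVICE <{endpoint}> {{"]
    lines += [f"    {pattern}" for pattern in patterns]
    lines.append("  }")
    return "\n".join(lines)


def federated_query(variables, blocks):
    """Combine prefixes, a SELECT line and SERVICE blocks into one query string."""
    prefix_lines = [f"PREFIX {name}: <{iri}>" for name, iri in PREFIXES.items()]
    select_line = "SELECT DISTINCT " + " ".join(variables) + " WHERE {"
    return "\n".join(prefix_lines + [select_line] + blocks + ["}"])


# Human genes associated with cancer, and their orthologs expressed in rat brain.
query = federated_query(
    ["?humanGene", "?ratGene"],
    [
        service_block(ENDPOINTS["disease"],
                      ["?humanGene ex:associatedWith ex:Cancer ."]),
        service_block(ENDPOINTS["orthology"],
                      ["?humanGene ex:orthologousTo ?ratGene ."]),
        service_block(ENDPOINTS["expression"],
                      ["?ratGene ex:expressedIn ex:RatBrain ."]),
    ],
)
print(query)
```

The shared variables `?humanGene` and `?ratGene` are what join the results returned by the three services. Getting such joins and predicates right is the kind of expertise a chatbot would need to supply, which is one reason generated queries still need to be checked by someone who knows the data.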

Explore the growing catalogue of interoperable bioinformatics knowledge graphs at SIB

Find out more about SIB’s service offering in FAIRification

Reference

Sima A.C. and de Farias T.M., On the potential of artificial intelligence chatbots for data exploration of federated bioinformatics knowledge graphs, SeWebMeDa’23: 6th workshop on semantic web solutions for large-scale biomedical data analytics. Preprint available on arXiv.