A team led by Joana Pereira and Torsten Schwede at the Biozentrum of the University of Basel and SIB has uncovered a treasure trove of uncharacterized proteins. Embracing the recent deep learning revolution and benefiting from a dedicated kickstarter grant from SIB, they discovered hundreds of new protein families and a novel predicted protein fold. The study has now been published in the journal Nature, together with another study on a related topic from the SIB Group of Pedro Beltrao at ETH Zurich.
In recent years, AlphaFold has revolutionized protein science. This artificial intelligence (AI) tool was trained on protein data collected by life scientists for over 50 years and is able to predict the 3D shape of proteins with high accuracy. Its success prompted the modelling of an astounding 215 million proteins last year, providing insights into the shapes of almost any protein. This is particularly interesting for proteins that have not yet been studied experimentally, since experimental characterization is a complex and time-consuming process.
“There are now many sources of protein information, enclosing valuable insights into how proteins evolve and work,” says Joana Pereira, the leader of the study. Nevertheless, research has long been faced with a data jungle. The research team led by Torsten Schwede, Group Leader at the Biozentrum, University of Basel, has now succeeded in decrypting some of the concealed information.
A bird’s eye view reveals new protein families and folds
The researchers constructed an interactive network of 53 million proteins with high-quality AlphaFold structures. “This network serves as a valuable source for theoretically predicting unknown protein families and their functions on a large scale,” underlines Janani Durairaj, the first author. The team was able to identify 290 new protein families and one new protein fold that resembles the shape of a flower.
Building on the expertise of the Schwede group in developing and maintaining the leading software SWISS-MODEL, supported by SIB, they made the network available as an interactive web resource, termed the “Protein Universe Atlas”.
One of two breakthrough papers by SIB Groups to explore the protein dark matter
The paper described here is one of two led by groups at SIB to be published on the topic. Using a complementary approach, based on protein structure rather than sequences, the team of Pedro Beltrao at ETH Zurich and the team of Martin Steinegger at Seoul National University propose an efficient method to cluster the proteins present in the AlphaFold database. Their key outcomes were determining how many different types of protein shapes there are and how many of these are understudied, and estimating when in evolution they may have arisen. Both papers were highlighted in an interview by Nature.
Artificial intelligence as a valuable tool in research
The team has employed deep learning-based tools for finding novelties in this network, paving the way to innovations in life sciences, from basic to applied research. “Understanding the structure and function of proteins is typically one of the first steps to develop a new drug, or modify their functions by protein engineering, for example”, says Pereira. The work was supported by a ‘kickstarter’ grant from SIB to encourage the adoption of AI in life science resources. It underscores the transformative potential of deep learning and intelligent algorithms in research.
With the Protein Universe Atlas, scientists can now learn more about proteins relevant to their research. “We hope this resource will help not only researchers and biocurators but also students and teachers by providing a new platform for learning about protein diversity, from structure, to function, to evolution”, says Durairaj.
The interactive network is freely accessible worldwide.
References
Durairaj J et al. Uncovering new families and folds in the natural protein universe. Nature. Published online 13 September 2023.
Barrio-Hernandez I et al. Clustering predicted structures at the scale of the known protein universe. Nature. Published online 13 September 2023.