Focus on the group's mission

The Swiss-Prot group, co-led by Alan Bridge and Paul Thomas, excels in generating machine-readable knowledge of biology from the ever-growing body of scientific publications. The team harnesses the power of deep learning methods to accelerate literature triage and data extraction, thus delivering the most accurate and comprehensive information to users in a timely manner. 

Annual key figures

  • Over 150,000 resource citations in scientific publications
  • Over 180,000 resource mentions in patents
  • Over 8 million resource users annually

Sources and user definitions: lens.org, Web of Science, PubMed, Google Analytics, Matomo

Enabling discoveries through renowned biodata resources

The expertly curated, interlinked knowledge resources developed by the group’s biocurators and software developers are internationally recognized for their fundamental importance to life-science research and innovation – and used in applications from bioremediation of contaminated soils to drug discovery and development.

  • UniProtKB/Swiss-Prot is the most widely used protein information resource in the world, and recognized as a SIB Resource, ELIXIR Core Data Resource, and Global Core Data Resource.
  • Rhea, a database of biochemical reactions, is recognized as a SIB Resource, ELIXIR Core Data Resource, and Global Core Data Resource.
  • Gene Ontology (GO) is a key source of information on gene function and is recognized as a Global Core Data Resource.
  • SwissLipids, a database of lipid structures and biological knowledge, is recognized as a SIB Resource.
  • HAMAP and PROSITE are widely used databases of protein families and domains.
  • The widely used ENZYME database provides information on enzyme nomenclature.
  • ViralZone provides information on all viral genera and families.
  • SwissBioPics offers a library of interactive cell images.

The Swiss-Prot group also supports the custom development of tools and resources for researchers and clinicians.

See more about our biocuration and software development services

Supporting AI with machine-readable biological knowledge

Life science knowledgebases are also an essential part of the AI ecosystem. The collective biological knowledge they contain – in the form of molecular sequences and structures, biochemical pathways, and the relationships between these – can be used to train and fine-tune AI systems that reveal complex biological mechanisms and generate actionable insights.

Knowledgebases developed by the Swiss-Prot group have contributed to:

  • the 2024 Nobel-winning AlphaFold model for predicting protein structure, which learned to identify relationships between amino acid sequences and 3D structures by analysing hundreds of millions of proteins in UniProt (see AI focus in the SIB Profile 2025);
  • a large language model (LLM) for improving mRNA vaccine delivery, which was trained using SwissLipids (source: Bioinformatics, 2024); and
  • an LLM to design new proteins with desirable functions, which was trained using the expertly curated UniProtKB/Swiss-Prot section of UniProt (source: Nature Biotechnology, 2023).

The group also developed EnzChemRED, a benchmarking dataset for fine-tuning LLMs for specialized data curation, which was recognized as a SIB Remarkable Output. The team is applying the dataset to guide biocuration efforts for the UniProt and Rhea knowledgebases.

Members

View our group members here