The SIB Swiss Institute of Bioinformatics was recently invited to the AI-Bioscience Collaborative Summit in Washington DC in recognition of its leading role in high-quality data provision. SIB's role as co-lead organization of UniProt, the world's leading open-access, high-quality database of protein sequence and functional information, was highlighted throughout the event. The Summit underscored the critical role curated databases play in advancing AI models, with UniProt cited as an essential resource that enabled the groundbreaking structural biology work recognized by the recent Nobel Prize in Chemistry.
UniProt is a consortium that brings together teams at the SIB Swiss Institute of Bioinformatics, the European Bioinformatics Institute (EMBL-EBI), and the Protein Information Resource (PIR) in the US*. Across these institutes, over 100 experts are involved in tasks ranging from curating reliable protein data to software development and user support. UniProt is one of the most widely used knowledge bases in the life sciences, with more than 5.5 million users and 2,000 citations annually. Thanks to its high-quality datasets, it supports AI models and accelerates discoveries and applications across diverse fields, from medicine to environmental preservation.
* Listed in order of number of staff involved
More on SIB Resource's impact
Themes of convergence, open data, and collaboration
A central focus of the Summit was the convergence of AI and biotechnology, with a strong emphasis on large, open datasets—an area directly relevant to SIB’s mission.
The discussion on the importance of collaboration across disciplines and countries to develop AI echoed the positioning of our institute. SIB is bridging all fields of life sciences with resources, services and training to enable the integration of a large diversity of datasets and the application of discoveries in a wide variety of domains.
High-quality reference datasets as catalysts for Nobel-winning discoveries
A recurring theme was the urgent need for comprehensive datasets to advance AI models, together with the risk that siloing data in proprietary collections stifles innovation. The recent Nobel Prize in Chemistry (AlphaFold and RoseTTAFold) exemplifies AI's potential in biosciences and biotechnology. High-quality reference datasets like UniProt were highlighted as vital enablers of these achievements.
Funding challenges for data resources at the forefront
SIB has long advocated for sustainable funding models to support essential research resources. At the Summit, participants underscored the critical importance of this issue. Despite their proven value in AI-driven bioscience advancements and their long-standing role as accelerators of research, many data resources face funding shortages and lack incentives for long-term maintenance.
The need for benchmarking and evaluation
Benchmarking and evaluating AI models, fields where SIB is also active and recognized, are essential for validating model performance and for building trust and acceptance among potential users. Challenges on the data resource side, such as data heterogeneity, limited machine readability, and complex metadata needs, present significant barriers to dataset interoperability. The lack of standardized evaluation methods for machine learning models adds further complexity to AI integration efforts. CASP and CAMEO (the latter developed at SIB) were cited multiple times as instrumental in advancing AI reliability and performance in structural bioinformatics.
A call for standardized data sharing and security frameworks
The AI-Bioscience Summit was an inspiring platform to reinforce the call for standardized data sharing and security frameworks, which are essential for transparency and cross-border collaboration. SIB is confident that AI will play a crucial role in solving complex biological questions. The Summit was an invaluable opportunity to share perspectives on advancing towards this goal and to emphasize the indispensable role of curated data in AI.
Global participation and high-level support
The Summit was co-organized by the US Department of State, Microsoft, the US National Academies, the US National Science Foundation, and the National Institute of Standards and Technology (NIST). Attendees included representatives from academic institutions, industry, and science offices from Europe, the United Kingdom, France, Germany, Brazil, India, Japan, Korea, Israel, and South Africa. SIB was invited for its world-leading role in data resource provision. The Summit’s importance was underscored by an address from US Secretary of State Antony Blinken on Day 2.