A scientific collective, led by the Earth BioGenome Project, aims to sequence and assemble the genomes of 1.8 million animal species by 2035. The initiative will produce data that is freely accessible, supported by public cloud infrastructures, and available to researchers worldwide. These reference genomes are crucial for developing effective conservation strategies and combating biodiversity decline. SIB contributes its extensive experience in data science to ensure the development of best practices, the provisioning of computational infrastructure, dissemination of results and training. 

Sequencing to better preserve biodiversity

Biodiversity has been declining at an alarming rate in recent years, primarily due to human activities. Producing the reference genome for a species lays the foundation for insights into its evolutionary history, genetic diversity, and unique adaptations. This information is crucial for developing more effective conservation strategies, such as identifying genetically relevant individuals for breeding programs, restoring genetic diversity, and understanding vulnerabilities to diseases and environmental changes.

The Earth BioGenome Project (EBP) aims to sequence, catalog and characterize the genomes of all of Earth’s eukaryotic biodiversity. To optimize the genome assembly processes and devise best practices, two genome projects – the Vertebrate Genomes Project (VGP) and the European Reference Genome Atlas (ERGA) – combined forces with the Galaxy Platform. As one of the coordinating partners, SIB’s Environmental Bioinformatics group offers its expertise in best practices development, computational infrastructure provisioning, dissemination and training. 

About the Galaxy framework

Galaxy is an open-source, web-based platform for biodata analysis that allows users to execute complex workflows on thousands of datasets and terabytes of data, either through a graphical user interface or programmatically through scripts that call its application programming interface (API). Major global Galaxy instances in the United States, the European Union and Australia are freely accessible to researchers worldwide and are supported by public cloud infrastructures, so users do not need to install any tools or procure any infrastructure.

A pipeline to democratize biodiversity genomics

Researchers came together to develop a digital pipeline within the Galaxy ecosystem – an open-source platform for FAIR data analysis – to generate near-complete reference genome assemblies. To streamline the assembly process and ensure quality, this bioinformatics workflow includes extensive quality control functions at every step. In the future, integration of complementary sequencing technologies will make the pipeline even more effective at generating complete and accurate reference genomes for a wide variety of species.

In a recent publication, the authors describe the approach as designed to be useful across the full spectrum of user skill levels and analysis scenarios. To that end, they created dedicated tutorials distributed via the Galaxy Training Network portal. These tutorials provide an in-depth overview of the assembly process and include a simplified version designed to facilitate immediate use of the workflows.

For Robert Waterhouse, Director of SIB’s Environmental Bioinformatics group and chair of ERGA, this collaboration “brought together bioinformaticians with genomics experts to build state-of-the-art genome assembly and quality control workflows and make them freely accessible to researchers worldwide”.

Reference(s)

Larivière, D., Abueg, L., Brajuka, N. et al. Scalable, accessible and reproducible reference genome assembly and evaluation in Galaxy. Nat Biotechnol 42, 367–370 (2024). 

Image legend: Phylogenetic tree and assembly statistics of genomes assembled using the Galaxy assembly pipeline.
