What we do

In the Text Mining Group, we carry out text analytics activities applied to health and life sciences. Previously hosted by the University Hospitals of Geneva, our group moved to the University of Applied Sciences Geneva (HES-SO - HEG Geneva) in 2008, where we conduct research to support a wide range of biomedical professionals in fields such as oncology (e.g. decision-support instruments to interpret somatic variants), radiology (e.g. automatic classification of disease progression) and toxicology/drug development (e.g. detection of adverse effects). We are used to working with public content (e.g. social media, biomedical and biodiversity literature, chemical patents), as well as with more sensitive data (e.g. electronic health records).

At SIB, the team also maintains a portfolio of curation-support tools. We maintain SIBiLS, a mirror of MEDLINE and PMC that also includes additional content (e.g. ClinicalTrials.gov, CORD-19 preprints), as well as a set of more specialized custom search tools such as Variomes or CovidTriage.

Find out more about the Group’s activities

Highlights 2021

In 2021, in response to the SARS-CoV-2 pandemic, we developed the CovidTriage custom search engine, which is powered by the COVoc ontology, in collaboration with Helen Parkinson's team at the EBI.
We also updated the UPCLASS deep learning classifier, which supports the curation of UniProt.
Further, with the support of SPHN, we released Variomes, a high-recall search engine that helps Swiss-Prot curators evaluate the pathogenicity and actionability of somatic variants.
Finally, we started a new European project in the field of biodiversity to establish bridges between the biomedical and biodiversity worlds.

Main publications 2021

  • The UniProt Consortium
    UniProt: the universal protein knowledgebase in 2021
    Nucleic Acids Research, 10.1093/nar/gkaa1100
  • Pasche E et al.
    Variomes: a high recall search engine to support the curation of genomic variants
    Bioinformatics, 10.1101/2021.05.29.446224
  • Naderi N et al.
    Ensemble of Deep Masked Language Models for Effective Named Entity Recognition in Health and Life Science Corpora
    Front. Res. Metr. Anal., 10.3389/frma.2021.689803


University geneva

Patrick Ruch
Text Mining
HES-SO - Geneva School of Business Administration (HEG)
Group Webpage

Domain(s) of activity:

  • Text mining and machine learning
  • Machine learning
  • Ontology
  • Oncology
  • Semantic web format

Domain(s) of application:

  • Medicine and health