Skip to content
Snippets Groups Projects
Forked from mcariou / 2020_dginn_covid19
Source project has a limited visibility.

Evolutionary history of SARS-CoV-2 interactome in bats and primates identifies key virus-host interfaces and conflicts

Introduction

The current COVID-19 pandemic is caused by a novel coronavirus strain, SARS-CoV-2. It originated from the cross-species transmission of a coronavirus from the bat reservoir, directly or through an intermediate host to humans. This catastrophic spillover underlines the necessity to better understand how viruses and hosts have shaped one another over evolutionary time.

Pathogenic viruses put a selective pressure on the host-viral interacting proteins. Identifying which host genes bear signatures of such evolutionary conflict (e.g. positive selection) can lead to the identification of the proteins that have been the most relevant in the response to a virus family. Here, we have used this evolutionary framework to decipher which interactions between the SARS-CoV-2-like viruses and our cells have been important in vivo. In addition, identifying traces of positive selection in different hosts phylogenetic lineages also sheds lights on ancient epidemics and how virus-host determinants may be species specific. This may help to understand differences in susceptibility and pathogenicity to SARS-CoV-like viruses between hosts.

To achieve this, we characterized the evolutionary history of the SARS-CoV-2 interactome identified in in vitro studies: 332 host proteins identified by mass-spectrometry by Gordon and collaborators [1], as well as two essential SARS-CoV-2 entry factors, the angiotensin converting enzyme 2 (ACE2) and the transmembrane serine protease 2 (TMPRSS2) genes. We characterized their evolution in primates (tracing the human history) and in bats (the natural viral reservoir). To do so, we used DGINN [2], a novel computational pipeline to Detect Genetic INNovations in protein-coding genes, which embeds gold-standard methods to perform phylogenetic and positive selection analyses in a high-throughput manner.

Data formating

requisite R packages: formatR, tinytex

Script to merge DGINN outputs from different batch of analysis and included or correct rows corresponding to genes ran on corrected alignmenents.

rnw_scripts/

Input tables in data/.

Output tables in out_tab

The tables output from this script will be used for the following analysis steps.

Primates and bats

Dataset comparison