Forked from mcariou / 2020_dginn_covid19

Evolutionary history of SARS-CoV-2 interactome in bats and primates identifies key virus-host interfaces and conflicts

Introduction

The current COVID-19 pandemic is caused by a novel coronavirus strain, SARS-CoV-2. It originated from the cross-species transmission of a coronavirus from the bat reservoir to humans, either directly or through an intermediate host. This catastrophic spillover underlines the need to better understand how viruses and hosts have shaped one another over evolutionary time.

Pathogenic viruses exert selective pressure on the host proteins they interact with. Identifying which host genes bear signatures of such evolutionary conflict (e.g. positive selection) can point to the proteins that have been most relevant in the response to a virus family. Here, we used this evolutionary framework to decipher which interactions between SARS-CoV-2-like viruses and our cells have been important in vivo. In addition, identifying traces of positive selection in different host phylogenetic lineages sheds light on ancient epidemics and on how virus-host determinants may be species-specific. This may help explain differences in susceptibility to, and pathogenicity of, SARS-CoV-like viruses between hosts.

To achieve this, we characterized the evolutionary history of the SARS-CoV-2 interactome identified in in vitro studies: the 332 host proteins identified by mass spectrometry by Gordon and collaborators, plus two essential SARS-CoV-2 entry factors, angiotensin-converting enzyme 2 (ACE2) and transmembrane serine protease 2 (TMPRSS2). We characterized their evolution in primates (tracing human history) and in bats (the natural viral reservoir). To do so, we used DGINN, a novel computational pipeline to Detect Genetic INNovations in protein-coding genes, which embeds gold-standard methods to perform phylogenetic and positive selection analyses in a high-throughput manner.
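Positive selection analyses of this kind generally rest on codon models that estimate the ratio ω of nonsynonymous (d_N) to synonymous (d_S) substitution rates, and test models that allow ω > 1 against nested models that do not. The block below is a generic reminder of that logic, not a description of the exact models or thresholds run by DGINN.

```latex
\omega = \frac{d_N}{d_S}
\qquad
\begin{cases}
\omega < 1 & \text{purifying selection} \\
\omega \approx 1 & \text{neutral evolution} \\
\omega > 1 & \text{positive (diversifying) selection}
\end{cases}

\mathrm{LRT} = 2\,(\ell_1 - \ell_0) \;\overset{H_0}{\sim}\; \chi^2_{k}
```

Here ℓ₀ and ℓ₁ are the maximized log-likelihoods of the null model (no site class with ω > 1) and of the alternative model, and k is the difference in their numbers of free parameters; the χ² distribution is an asymptotic approximation. For instance, the codeml site models M7 and M8 are conventionally compared this way with k = 2.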

Data formatting

Requisite R packages: formatR, tinytex

~

Script to merge DGINN outputs from different analysis batches, and to add or correct the rows corresponding to genes rerun on corrected alignments.

rnw_scripts/covid_comp_script0_table.pdf

Input tables in data/.

Output tables in out_tab/.

The tables produced by this script are the inputs of the following analysis steps.
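The merge-and-correct step described in this section can be sketched as follows. This is a hypothetical illustration in Python, not the covid_comp_script0_table code: the gene identifier column name (`Gene`), the `PosSel` column and the gene names are assumptions chosen for the example, and the values are placeholders rather than results.

```python
# Hypothetical sketch, not the covid_comp_script0_table code: merge per-batch
# DGINN result tables, then let rows from genes rerun on corrected alignments
# replace (or add to) the batch rows. The "Gene" column name is an assumption.
import csv
import io
from typing import Dict, Iterable, List

GENE_KEY = "Gene"  # assumed name of the gene identifier column

Row = Dict[str, str]


def read_table(text: str, delimiter: str = "\t") -> List[Row]:
    """Parse a delimited table with a header row into a list of dicts."""
    return list(csv.DictReader(io.StringIO(text), delimiter=delimiter))


def merge_batches(batches: Iterable[List[Row]], corrected: List[Row]) -> List[Row]:
    """Merge batch tables keyed on GENE_KEY; corrected rows are applied last."""
    merged: Dict[str, Row] = {}
    for batch in batches:
        for row in batch:
            merged[row[GENE_KEY]] = row  # a later batch overrides an earlier one
    for row in corrected:
        merged[row[GENE_KEY]] = row  # fix an existing gene or include a new one
    return [merged[gene] for gene in sorted(merged)]


# Placeholder tables (hypothetical genes and values, not real results).
batch1 = read_table("Gene\tPosSel\nGENE_A\tN\nGENE_B\tN\n")
batch2 = read_table("Gene\tPosSel\nGENE_C\tN\n")
corrected = read_table("Gene\tPosSel\nGENE_B\tY\nGENE_D\tN\n")
merged = merge_batches([batch1, batch2], corrected)
```

Keying the rows on the gene name makes the "add or correct" rule explicit: the last row seen for a gene wins, so applying the corrected rows after all batches guarantees they replace the original ones. The merged rows could then be written back with `csv.DictWriter`.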

Primates Results

Requisite R packages: Mondrian, UpSetR

~

Script to compare the primate screen with the positive selection analysis of Gordon et al.

rnw_scripts/covid_comp_dataset.pdf

Main input tables in out_tab/ and Young's result table in data/.

Output tables in figure/1_xxx
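A comparison between two positive-selection screens typically starts from the overlap of their gene lists. The sketch below is hypothetical and is not taken from covid_comp_dataset: it reports shared and exclusive genes, the Jaccard index, and a one-sided hypergeometric tail probability, which is one common (assumed) way to ask whether the overlap exceeds chance. Gene sets and the universe size are placeholders.

```python
# Hypothetical sketch: compare the set of genes flagged by the primate screen
# with the set flagged in another positive-selection analysis. The gene sets
# and universe size are placeholders; the hypergeometric test is an assumed,
# common choice, not necessarily what covid_comp_dataset does.
from math import comb
from typing import Dict, Set


def overlap_summary(a: Set[str], b: Set[str]) -> Dict[str, object]:
    """Genes shared by, or exclusive to, each set, plus the Jaccard index."""
    union = a | b
    return {
        "shared": sorted(a & b),
        "only_a": sorted(a - b),
        "only_b": sorted(b - a),
        "jaccard": len(a & b) / len(union) if union else 0.0,
    }


def hypergeom_sf(k: int, universe: int, n_a: int, n_b: int) -> float:
    """One-sided P(overlap >= k) when n_b genes are drawn at random, without
    replacement, from `universe` genes of which n_a are in the other set."""
    total = comb(universe, n_b)
    return sum(
        comb(n_a, x) * comb(universe - n_a, n_b - x)
        for x in range(k, min(n_a, n_b) + 1)
    ) / total


primate_hits = {"GENE_A", "GENE_B", "GENE_C"}  # placeholder gene sets
other_hits = {"GENE_B", "GENE_C", "GENE_D"}
summary = overlap_summary(primate_hits, other_hits)
p_value = hypergeom_sf(len(primate_hits & other_hits), 10,
                       len(primate_hits), len(other_hits))
```

The universe should be the set of genes actually analysed in both screens, since genes absent from either one cannot contribute to the overlap. With SciPy available, `scipy.stats.hypergeom.sf(k - 1, M, n, N)` gives the same tail probability.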

Comparison between the primate and bat datasets

Requisite R packages: Mondrian, UpSetR, dendextend, ggraph, igraph, tidyverse, viridis.

~

Script to compare the bat and primate screens.

rnw_scripts/covid_comp_dataset.pdf

Input tables in out_tab/.

Output tables in figure/1_xxx
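Since this comparison relies on UpSetR, it may help to see what the bars of an UpSet plot count. The hypothetical sketch below computes exclusive intersection sizes, i.e. the number of genes belonging to exactly each combination of sets and to no other set. The set names and members are placeholders, not screen results.

```python
# Hypothetical sketch: the exclusive intersection sizes shown as bars in an
# UpSet plot (as drawn e.g. by UpSetR), computed for placeholder gene sets.
from collections import Counter
from typing import Dict, FrozenSet, Set


def exclusive_intersections(sets: Dict[str, Set[str]]) -> Dict[FrozenSet[str], int]:
    """Count elements that belong to exactly each combination of named sets."""
    membership: Dict[str, Set[str]] = {}
    for name, members in sets.items():
        for element in members:
            membership.setdefault(element, set()).add(name)
    # Each element is counted once, under the full combination of its sets.
    return dict(Counter(frozenset(names) for names in membership.values()))


screens = {
    "primates": {"GENE_A", "GENE_B", "GENE_C"},  # placeholder gene sets
    "bats": {"GENE_B", "GENE_C", "GENE_D"},
}
for combo, size in sorted(exclusive_intersections(screens).items(),
                          key=lambda item: -item[1]):
    print(" & ".join(sorted(combo)), size)
```

Because every gene is assigned to a single combination, the bar heights sum to the number of distinct genes across all sets, which makes the plot easy to sanity-check against the input tables.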

Comparison with MAIC scores and pancorona analysis

Script to compare the DGINN screen results to MAIC scores and to pancorona data.

rnw_scripts/covid_comp_maic_pancorona.pdf

Input tables in out_tab/.

Output tables in figure/2_xxx
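One way to relate a per-gene score such as the MAIC score to the screen is to ask whether genes flagged by DGINN tend to score higher than the other genes. The sketch below is hypothetical: the choice of the Mann-Whitney U statistic is an assumption, not the method of covid_comp_maic_pancorona, and all gene names and scores are placeholders.

```python
# Hypothetical sketch: compare a per-gene score (e.g. a MAIC score) between
# genes flagged by the DGINN screen and the other genes, using the
# Mann-Whitney U statistic. The test choice and all values are assumptions.
from typing import Dict, List, Set, Tuple


def mann_whitney_u(x: List[float], y: List[float]) -> float:
    """U statistic of x versus y: pairs with x > y count 1, ties count 0.5."""
    u = 0.0
    for xi in x:
        for yj in y:
            if xi > yj:
                u += 1.0
            elif xi == yj:
                u += 0.5
    return u


def compare_scores(scores: Dict[str, float], flagged: Set[str]) -> Tuple[float, float]:
    """Return (U, U / (n_flagged * n_other)); the ratio estimates
    P(score of a flagged gene > score of another gene), ties counted as half.
    Only genes present in `scores` are used."""
    x = [s for gene, s in scores.items() if gene in flagged]
    y = [s for gene, s in scores.items() if gene not in flagged]
    if not x or not y:
        raise ValueError("both groups need at least one scored gene")
    u = mann_whitney_u(x, y)
    return u, u / (len(x) * len(y))


maic = {"GENE_A": 3.0, "GENE_B": 2.0, "GENE_C": 1.0, "GENE_D": 2.0}  # placeholders
u_stat, prob_higher = compare_scores(maic, flagged={"GENE_A", "GENE_B"})
```

The pairwise loop is quadratic, which is fine for a few hundred genes; for p-values or larger gene sets, a rank-based implementation such as `scipy.stats.mannwhitneyu` would be the more practical choice.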