Skip to content
Snippets Groups Projects

Compare revisions

Changes are shown as if the source revision was being merged into the target revision. Learn more about comparing revisions.

Source

Select target project
No results found
Select Git revision
  • master
1 result

Target

Select target project
  • mcariou/2020_dginn_covid19
  • ciri/ps_sars-cov-2/2021_dginn_covid19
2 results
Select Git revision
  • master
1 result
Show changes
Commits on Source (61)
Showing
with 943 additions and 4432 deletions
Source diff could not be displayed: it is too large. Options to address this: view the blob.
Source diff could not be displayed: it is too large. Options to address this: view the blob.
# Evolutionary history of SARS-CoV-2 interactome in bats and primates identifies key virus-host interfaces and conflicts
## Introduction
The current COVID-19 pandemic is caused by a novel coronavirus strain, SARS-CoV-2. It originated from the cross-species transmission of a coronavirus from the bat reservoir, directly or through an intermediate host to humans. This catastrophic spillover underlines the necessity to better understand how viruses and hosts have shaped one another over evolutionary time.
Pathogenic viruses put a selective pressure on the host-viral interacting proteins. Identifying which host genes bear signatures of such evolutionary conflict (e.g. positive selection) can lead to the identification of the proteins that have been the most relevant in the response to a virus family. Here, we have used this evolutionary framework to decipher which interactions between the SARS-CoV-2-like viruses and our cells have been important in vivo. In addition, identifying traces of positive selection in different hosts phylogenetic lineages also sheds lights on ancient epidemics and how virus-host determinants may be species specific. This may help to understand differences in susceptibility and pathogenicity to SARS-CoV-like viruses between hosts.
To achieve this, we characterized the evolutionary history of the SARS-CoV-2 interactome identified in in vitro studies: 332 host proteins identified by mass-spectrometry by [Gordon and collaborators](https://www.nature.com/articles/s41586-020-2286-9), as well as two essential SARS-CoV-2 entry factors, the angiotensin converting enzyme 2 (ACE2) and the transmembrane serine protease 2 (TMPRSS2) genes. We characterized their evolution in primates (tracing the human history) and in bats (the natural viral reservoir). To do so, we used [DGINN](https://academic.oup.com/nar/article/48/18/e103/5907962?login=true), a novel computational pipeline to Detect Genetic INNovations in protein-coding genes, which embeds gold-standard methods to perform phylogenetic and positive selection analyses in a high-throughput manner.
## Data formating
**Requisite R packages**: *formatR, tinytex*
Script to merge DGINN outputs from different batch of analysis and included or correct rows corresponding to genes ran on corrected alignmenents.
```
rnw_scripts/covid_comp_script0_table.pdf
```
Input tables in **data/**.
Output tables in **out_tab/**
The tables output from this script will be used for the following analysis steps.
## Primates and Bats Results
**Requisite R packages**: *openxlsx, tools, gplots, Mondrian, UpSetR*
Script to compare primates screen with Gordon et al.'s positive selection analysis.
```
rnw_scripts/covid_comp_dataset.pdf
```
Main input tables in **out_tab/** and Young's result table in **data/**.
Figures in **figure/1_xxx**
## Comparison between datasets primates and bats
**Requisite R packages**: *Mondrian, UpSetR, dendextend, ggraph, igraph, tidyverse, viridis.*
Script to compare bats and primates screen.
```
rnw_scripts/covid_comp_dataset.pdf
```
Input tables in **out_tab/**.
Figures in **figure/2_xxx**
## Comparison with MAIC score and pancorona analysis
Script to compare the DGINN screen results to [MAIC](https://www.nature.com/articles/s41598-020-79033-3) score and [pancorona data](https://translational-medicine.biomedcentral.com/articles/10.1186/s12967-020-02480-z).
```
rnw_scripts/covid_comp_maic_pancorona.pdf
```
Input tables in **out_tab/**.
Figures in **figure/3_xxx**
## GWAS analysis
Get sites under positive selection in FYCO1.
```
rnw_scripts/covid_comp_gwas.pdf
```
Input tables in **out_tab/**.
Figures in **figure/4_xxx**
\documentclass[11pt, oneside]{article} % use "amsart" instead of "article" for AMSLaTeX format
%\usepackage{geometry} % See geometry.pdf to learn the layout options. There are lots.
%\geometry{letterpaper} % ... or a4paper or a5paper or ...
%\geometry{landscape} % Activate for for rotated page geometry
%\usepackage[parfill]{parskip} % Activate to begin paragraphs with an empty line rather than an indent
%\usepackage{graphicx} % Use pdf, png, jpg, or eps with pdflatex; use eps in DVI mode
% TeX will automatically convert eps --> pdf in pdflatex
%\usepackage{amssymb}
\usepackage[utf8]{inputenc}
%\usepackage[cyr]{aeguill}
%\usepackage[francais]{babel}
%\usepackage{hyperref}
\title{Positive selection on genes interacting with SARS-Cov2, comparison of different analysis}
\author{Marie Cariou}
\date{Mars 2021} % Activate to display a given date or no date
\begin{document}
\maketitle
\tableofcontents
\newpage
\section{Data}
Analysis were formatted by the script covid\_comp\_script0\_table.Rnw.
<<>>=
home<-"/home/adminmarie/Documents/"
workdir<-paste0(home,"CIRI_BIBS_projects/2020_05_Etienne_covid/")
tab<-read.delim(paste0(workdir,
"covid_comp/covid_comp_complete.txt"), h=T, sep="\t")
dim(tab)
tab$Gene.name<-as.character(tab$Gene.name.x)
tab$Gene.name[tab$PreyGene=="MTARC1"]<-"MTARC1"
@
\section{Comparison Bats}
\subsection{Cooper-bats results VS DGINN-bats results}
<<omegaM7M8bats>>=
tab$bats_omegaM0codeml[tab$bats_omegaM0codeml=="na"]<-NA
plot(tab$cooper.batsAverage_dNdS,
as.numeric(as.character(tab$bats_omegaM0codeml)),
xlab="Omega Cooper-bats",
ylab="Omega DGINN-bats")
abline(0,1)
abline(lm(as.numeric(as.character(tab$bats_omegaM0codeml))~
tab$cooper.batsAverage_dNdS),
col="red")
outlier<-tab[tab$cooper.batsAverage_dNdS>0.35 &
as.numeric(as.character(tab$bats_omegaM0codeml))<0.3,]
text(x=outlier$cooper.batsAverage_dNdS,
y=as.numeric(as.character(outlier$bats_omegaM0codeml)),
outlier$Gene.name)
@
\section{Overlap}
\subsection{Data}
<<subbats>>=
tmp<-na.omit(tab[,c("Gene.name", "bats_codemlM7M8_p.value",
"hawkins_Positive.Selection..M8vM8a.p.value",
"cooper.batsM7.M8_p_value", "bats_BUSTED",
"bats_BppM1M2", "bats_BppM7M8", "bats_codemlM1M2",
"bats_codemlM7M8")])
tmp$bats_codemlM7M8_p.value[tmp$bats_codemlM7M8_p.value=="na"]<-NA
tmp$bats_codemlM7M8_p.value<-as.numeric(
as.character(tmp$bats_codemlM7M8_p.value))
dim(tmp)
@
170 genes (present in the 3 experiments)
\subsection{Mondrian}
<<mondrianbats>>=
library(Mondrian)
monddata<-as.data.frame(tmp$Gene.name)
monddata$bats_hawkins<-ifelse(
tmp$hawkins_Positive.Selection..M8vM8a.p.value<0.05, 1, 0)
monddata$bats_cooper<-ifelse(
tmp$cooper.batsM7.M8_p_value<0.05, 1, 0)
dginntmp<-rowSums(cbind(tmp$bats_codemlM1M2=="Y",
tmp$bats_codemlM7M8=="Y",
tmp$bats_BppM1M2=="Y",
tmp$bats_BppM7M8=="Y",
tmp$bats_BUSTED=="Y"))
monddata$bats_dginn<-ifelse(dginntmp>=3, 1,0)
mondrian(monddata[,2:4],
labels=c("DGINN >=3", "hawkins", "Cooper"))
monddata$bats_dginn<-ifelse(dginntmp>=4, 1,0)
mondrian(monddata[,2:4],
labels=c("DGINN >=4", "hawkins", "Cooper"))
@
\subsection{subsetR}
<<subsetbats>>=
library(UpSetR)
upsetdata<-as.data.frame(tmp$Gene.name)
upsetdata$bats_hawkins<-ifelse(
tmp$hawkins_Positive.Selection..M8vM8a.p.value<0.05, 1, 0)
upsetdata$bats_cooper<-ifelse(
tmp$cooper.batsM7.M8_p_value<0.05, 1, 0)
upsetdata$bats_dginn<-ifelse(dginntmp>=3, 1,0)
upset(upsetdata, nsets = 3, matrix.color = "#DC267F",
main.bar.color = "#648FFF", sets.bar.color = "#FE6100")
upsetdata$bats_dginn<-ifelse(dginntmp>=4, 1,0)
upset(upsetdata, nsets = 3, matrix.color = "#DC267F",
main.bar.color = "#648FFF", sets.bar.color = "#FE6100")
@
<<>>=
source("covid_comp_shiny.R")
df<-read.delim(paste0(workdir,
"/data/DGINN_202005281649summary_cleaned.csv"),
fill=T, h=T, sep=",")
names(df)
dftmp<-tab[,c("bats_File", "bats_Name",
"Gene.name", "bats_GeneSize",
"bats_NbSpecies", "bats_omegaM0Bpp",
"bats_omegaM0codeml", "bats_BUSTED",
"bats_BUSTED_p.value", "bats_MEME_NbSites",
"bats_MEME_PSS", "bats_BppM1M2",
"bats_BppM1M2_p.value", "bats_BppM1M2_NbSites",
"bats_BppM1M2_PSS", "bats_BppM7M8",
"bats_BppM7M8_p.value", "bats_BppM7M8_NbSites",
"bats_BppM7M8_PSS", "bats_codemlM1M2",
"bats_codemlM1M2_p.value", "bats_codemlM1M2_NbSites",
"bats_codemlM1M2_PSS", "bats_codemlM7M8",
"bats_codemlM7M8_p.value", "bats_codemlM7M8_NbSites" ,
"bats_codemlM7M8_PSS")]
names(dftmp)<-names(df)
makeFig1(dftmp)
@
\end{document}
File deleted
\documentclass[11pt, oneside]{article}\usepackage[]{graphicx}\usepackage[]{color}
% maxwidth is the original width if it is less than linewidth
% otherwise use linewidth (to make sure the graphics do not exceed the margin)
\makeatletter
\def\maxwidth{ %
\ifdim\Gin@nat@width>\linewidth
\linewidth
\else
\Gin@nat@width
\fi
}
\makeatother
\definecolor{fgcolor}{rgb}{0.345, 0.345, 0.345}
\newcommand{\hlnum}[1]{\textcolor[rgb]{0.686,0.059,0.569}{#1}}%
\newcommand{\hlstr}[1]{\textcolor[rgb]{0.192,0.494,0.8}{#1}}%
\newcommand{\hlcom}[1]{\textcolor[rgb]{0.678,0.584,0.686}{\textit{#1}}}%
\newcommand{\hlopt}[1]{\textcolor[rgb]{0,0,0}{#1}}%
\newcommand{\hlstd}[1]{\textcolor[rgb]{0.345,0.345,0.345}{#1}}%
\newcommand{\hlkwa}[1]{\textcolor[rgb]{0.161,0.373,0.58}{\textbf{#1}}}%
\newcommand{\hlkwb}[1]{\textcolor[rgb]{0.69,0.353,0.396}{#1}}%
\newcommand{\hlkwc}[1]{\textcolor[rgb]{0.333,0.667,0.333}{#1}}%
\newcommand{\hlkwd}[1]{\textcolor[rgb]{0.737,0.353,0.396}{\textbf{#1}}}%
\let\hlipl\hlkwb
\usepackage{framed}
\makeatletter
\newenvironment{kframe}{%
\def\at@end@of@kframe{}%
\ifinner\ifhmode%
\def\at@end@of@kframe{\end{minipage}}%
\begin{minipage}{\columnwidth}%
\fi\fi%
\def\FrameCommand##1{\hskip\@totalleftmargin \hskip-\fboxsep
\colorbox{shadecolor}{##1}\hskip-\fboxsep
% There is no \\@totalrightmargin, so:
\hskip-\linewidth \hskip-\@totalleftmargin \hskip\columnwidth}%
\MakeFramed {\advance\hsize-\width
\@totalleftmargin\z@ \linewidth\hsize
\@setminipage}}%
{\par\unskip\endMakeFramed%
\at@end@of@kframe}
\makeatother
\definecolor{shadecolor}{rgb}{.97, .97, .97}
\definecolor{messagecolor}{rgb}{0, 0, 0}
\definecolor{warningcolor}{rgb}{1, 0, 1}
\definecolor{errorcolor}{rgb}{1, 0, 0}
\newenvironment{knitrout}{}{} % an empty environment to be redefined in TeX
\usepackage{alltt} % use "amsart" instead of "article" for AMSLaTeX format
%\usepackage{geometry} % See geometry.pdf to learn the layout options. There are lots.
%\geometry{letterpaper} % ... or a4paper or a5paper or ...
%\geometry{landscape} % Activate for for rotated page geometry
%\usepackage[parfill]{parskip} % Activate to begin paragraphs with an empty line rather than an indent
%\usepackage{graphicx} % Use pdf, png, jpg, or eps with pdflatex; use eps in DVI mode
% TeX will automatically convert eps --> pdf in pdflatex
%\usepackage{amssymb}
\usepackage[utf8]{inputenc}
%\usepackage[cyr]{aeguill}
%\usepackage[francais]{babel}
%\usepackage{hyperref}
\title{Positive selection on genes interacting with SARS-Cov2, comparison of different analysis}
\author{Marie Cariou}
\date{Mars 2021} % Activate to display a given date or no date
\IfFileExists{upquote.sty}{\usepackage{upquote}}{}
\begin{document}
\maketitle
\tableofcontents
\newpage
\section{Data}
Analysis were formatted by the script covid\_comp\_script0\_table.Rnw.
\begin{knitrout}
\definecolor{shadecolor}{rgb}{0.969, 0.969, 0.969}\color{fgcolor}\begin{kframe}
\begin{alltt}
\hlstd{home}\hlkwb{<-}\hlstr{"/home/adminmarie/Documents/"}
\hlstd{workdir}\hlkwb{<-}\hlkwd{paste0}\hlstd{(home,}\hlstr{"CIRI_BIBS_projects/2020_05_Etienne_covid/"}\hlstd{)}
\hlstd{tab}\hlkwb{<-}\hlkwd{read.delim}\hlstd{(}\hlkwd{paste0}\hlstd{(workdir,}
\hlstr{"covid_comp/covid_comp_complete.txt"}\hlstd{),} \hlkwc{h}\hlstd{=T,} \hlkwc{sep}\hlstd{=}\hlstr{"\textbackslash{}t"}\hlstd{)}
\hlkwd{dim}\hlstd{(tab)}
\end{alltt}
\begin{verbatim}
## [1] 332 141
\end{verbatim}
\begin{alltt}
\hlstd{tab}\hlopt{$}\hlstd{Gene.name}\hlkwb{<-}\hlkwd{as.character}\hlstd{(tab}\hlopt{$}\hlstd{Gene.name.x)}
\hlstd{tab}\hlopt{$}\hlstd{Gene.name[tab}\hlopt{$}\hlstd{PreyGene}\hlopt{==}\hlstr{"MTARC1"}\hlstd{]}\hlkwb{<-}\hlstr{"MTARC1"}
\end{alltt}
\end{kframe}
\end{knitrout}
\section{Comparison Bats}
\subsection{Cooper-bats results VS DGINN-bats results}
\begin{knitrout}
\definecolor{shadecolor}{rgb}{0.969, 0.969, 0.969}\color{fgcolor}\begin{kframe}
\begin{alltt}
\hlstd{tab}\hlopt{$}\hlstd{bats_omegaM0codeml[tab}\hlopt{$}\hlstd{bats_omegaM0codeml}\hlopt{==}\hlstr{"na"}\hlstd{]}\hlkwb{<-}\hlnum{NA}
\hlkwd{plot}\hlstd{(tab}\hlopt{$}\hlstd{cooper.batsAverage_dNdS,}
\hlkwd{as.numeric}\hlstd{(}\hlkwd{as.character}\hlstd{(tab}\hlopt{$}\hlstd{bats_omegaM0codeml)),}
\hlkwc{xlab}\hlstd{=}\hlstr{"Omega Cooper-bats"}\hlstd{,}
\hlkwc{ylab}\hlstd{=}\hlstr{"Omega DGINN-bats"}\hlstd{)}
\hlkwd{abline}\hlstd{(}\hlnum{0}\hlstd{,}\hlnum{1}\hlstd{)}
\hlkwd{abline}\hlstd{(}\hlkwd{lm}\hlstd{(}\hlkwd{as.numeric}\hlstd{(}\hlkwd{as.character}\hlstd{(tab}\hlopt{$}\hlstd{bats_omegaM0codeml))}\hlopt{~}
\hlstd{tab}\hlopt{$}\hlstd{cooper.batsAverage_dNdS),}
\hlkwc{col}\hlstd{=}\hlstr{"red"}\hlstd{)}
\hlstd{outlier}\hlkwb{<-}\hlstd{tab[tab}\hlopt{$}\hlstd{cooper.batsAverage_dNdS}\hlopt{>}\hlnum{0.35} \hlopt{&}
\hlkwd{as.numeric}\hlstd{(}\hlkwd{as.character}\hlstd{(tab}\hlopt{$}\hlstd{bats_omegaM0codeml))}\hlopt{<}\hlnum{0.3}\hlstd{,]}
\hlkwd{text}\hlstd{(}\hlkwc{x}\hlstd{=outlier}\hlopt{$}\hlstd{cooper.batsAverage_dNdS,}
\hlkwc{y}\hlstd{=}\hlkwd{as.numeric}\hlstd{(}\hlkwd{as.character}\hlstd{(outlier}\hlopt{$}\hlstd{bats_omegaM0codeml)),}
\hlstd{outlier}\hlopt{$}\hlstd{Gene.name)}
\end{alltt}
\end{kframe}
\includegraphics[width=\maxwidth]{figure/omegaM7M8bats-1}
\end{knitrout}
\subsection{Cooper-bats VS Hawkins-bats and DGINN-bats VS Hawkins-bats}
\section{Overlap}
\subsection{Data}
\begin{knitrout}
\definecolor{shadecolor}{rgb}{0.969, 0.969, 0.969}\color{fgcolor}\begin{kframe}
\begin{alltt}
\hlstd{tmp}\hlkwb{<-}\hlkwd{na.omit}\hlstd{(tab[,}\hlkwd{c}\hlstd{(}\hlstr{"Gene.name"}\hlstd{,} \hlstr{"bats_codemlM7M8_p.value"}\hlstd{,}
\hlstr{"hawkins_Positive.Selection..M8vM8a.p.value"}\hlstd{,}
\hlstr{"cooper.batsM7.M8_p_value"}\hlstd{,} \hlstr{"bats_BUSTED"}\hlstd{,}
\hlstr{"bats_BppM1M2"}\hlstd{,} \hlstr{"bats_BppM7M8"}\hlstd{,} \hlstr{"bats_codemlM1M2"}\hlstd{,}
\hlstr{"bats_codemlM7M8"}\hlstd{)])}
\hlstd{tmp}\hlopt{$}\hlstd{bats_codemlM7M8_p.value[tmp}\hlopt{$}\hlstd{bats_codemlM7M8_p.value}\hlopt{==}\hlstr{"na"}\hlstd{]}\hlkwb{<-}\hlnum{NA}
\hlstd{tmp}\hlopt{$}\hlstd{bats_codemlM7M8_p.value}\hlkwb{<-}\hlkwd{as.numeric}\hlstd{(}
\hlkwd{as.character}\hlstd{(tmp}\hlopt{$}\hlstd{bats_codemlM7M8_p.value))}
\hlkwd{dim}\hlstd{(tmp)}
\end{alltt}
\begin{verbatim}
## [1] 174 9
\end{verbatim}
\end{kframe}
\end{knitrout}
170 genes (present in the 3 experiments)
\subsection{Mondrian}
\begin{knitrout}
\definecolor{shadecolor}{rgb}{0.969, 0.969, 0.969}\color{fgcolor}\begin{kframe}
\begin{alltt}
\hlkwd{library}\hlstd{(Mondrian)}
\hlstd{monddata}\hlkwb{<-}\hlkwd{as.data.frame}\hlstd{(tmp}\hlopt{$}\hlstd{Gene.name)}
\hlstd{monddata}\hlopt{$}\hlstd{bats_hawkins}\hlkwb{<-}\hlkwd{ifelse}\hlstd{(}
\hlstd{tmp}\hlopt{$}\hlstd{hawkins_Positive.Selection..M8vM8a.p.value}\hlopt{<}\hlnum{0.05}\hlstd{,} \hlnum{1}\hlstd{,} \hlnum{0}\hlstd{)}
\hlstd{monddata}\hlopt{$}\hlstd{bats_cooper}\hlkwb{<-}\hlkwd{ifelse}\hlstd{(}
\hlstd{tmp}\hlopt{$}\hlstd{cooper.batsM7.M8_p_value}\hlopt{<}\hlnum{0.05}\hlstd{,} \hlnum{1}\hlstd{,} \hlnum{0}\hlstd{)}
\hlstd{dginntmp}\hlkwb{<-}\hlkwd{rowSums}\hlstd{(}\hlkwd{cbind}\hlstd{(tmp}\hlopt{$}\hlstd{bats_codemlM1M2}\hlopt{==}\hlstr{"Y"}\hlstd{,}
\hlstd{tmp}\hlopt{$}\hlstd{bats_codemlM7M8}\hlopt{==}\hlstr{"Y"}\hlstd{,}
\hlstd{tmp}\hlopt{$}\hlstd{bats_BppM1M2}\hlopt{==}\hlstr{"Y"}\hlstd{,}
\hlstd{tmp}\hlopt{$}\hlstd{bats_BppM7M8}\hlopt{==}\hlstr{"Y"}\hlstd{,}
\hlstd{tmp}\hlopt{$}\hlstd{bats_BUSTED}\hlopt{==}\hlstr{"Y"}\hlstd{))}
\hlstd{monddata}\hlopt{$}\hlstd{bats_dginn}\hlkwb{<-}\hlkwd{ifelse}\hlstd{(dginntmp}\hlopt{>=}\hlnum{3}\hlstd{,} \hlnum{1}\hlstd{,}\hlnum{0}\hlstd{)}
\hlkwd{mondrian}\hlstd{(monddata[,}\hlnum{2}\hlopt{:}\hlnum{4}\hlstd{],}
\hlkwc{labels}\hlstd{=}\hlkwd{c}\hlstd{(}\hlstr{"DGINN >=3"}\hlstd{,} \hlstr{"hawkins"}\hlstd{,} \hlstr{"Cooper"}\hlstd{))}
\end{alltt}
\end{kframe}
\includegraphics[width=\maxwidth]{figure/mondrianbats-1}
\begin{kframe}\begin{alltt}
\hlstd{monddata}\hlopt{$}\hlstd{bats_dginn}\hlkwb{<-}\hlkwd{ifelse}\hlstd{(dginntmp}\hlopt{>=}\hlnum{4}\hlstd{,} \hlnum{1}\hlstd{,}\hlnum{0}\hlstd{)}
\hlkwd{mondrian}\hlstd{(monddata[,}\hlnum{2}\hlopt{:}\hlnum{4}\hlstd{],}
\hlkwc{labels}\hlstd{=}\hlkwd{c}\hlstd{(}\hlstr{"DGINN >=4"}\hlstd{,} \hlstr{"hawkins"}\hlstd{,} \hlstr{"Cooper"}\hlstd{))}
\end{alltt}
\end{kframe}
\includegraphics[width=\maxwidth]{figure/mondrianbats-2}
\end{knitrout}
\subsection{subsetR}
\begin{knitrout}
\definecolor{shadecolor}{rgb}{0.969, 0.969, 0.969}\color{fgcolor}\begin{kframe}
\begin{alltt}
\hlkwd{library}\hlstd{(UpSetR)}
\hlstd{upsetdata}\hlkwb{<-}\hlkwd{as.data.frame}\hlstd{(tmp}\hlopt{$}\hlstd{Gene.name)}
\hlstd{upsetdata}\hlopt{$}\hlstd{bats_hawkins}\hlkwb{<-}\hlkwd{ifelse}\hlstd{(}
\hlstd{tmp}\hlopt{$}\hlstd{hawkins_Positive.Selection..M8vM8a.p.value}\hlopt{<}\hlnum{0.05}\hlstd{,} \hlnum{1}\hlstd{,} \hlnum{0}\hlstd{)}
\hlstd{upsetdata}\hlopt{$}\hlstd{bats_cooper}\hlkwb{<-}\hlkwd{ifelse}\hlstd{(}
\hlstd{tmp}\hlopt{$}\hlstd{cooper.batsM7.M8_p_value}\hlopt{<}\hlnum{0.05}\hlstd{,} \hlnum{1}\hlstd{,} \hlnum{0}\hlstd{)}
\hlstd{upsetdata}\hlopt{$}\hlstd{bats_dginn}\hlkwb{<-}\hlkwd{ifelse}\hlstd{(dginntmp}\hlopt{>=}\hlnum{3}\hlstd{,} \hlnum{1}\hlstd{,}\hlnum{0}\hlstd{)}
\hlkwd{upset}\hlstd{(upsetdata,} \hlkwc{nsets} \hlstd{=} \hlnum{3}\hlstd{,} \hlkwc{matrix.color} \hlstd{=} \hlstr{"#DC267F"}\hlstd{,}
\hlkwc{main.bar.color} \hlstd{=} \hlstr{"#648FFF"}\hlstd{,} \hlkwc{sets.bar.color} \hlstd{=} \hlstr{"#FE6100"}\hlstd{)}
\end{alltt}
\end{kframe}
\includegraphics[width=\maxwidth]{figure/subsetbats-1}
\begin{kframe}\begin{alltt}
\hlstd{upsetdata}\hlopt{$}\hlstd{bats_dginn}\hlkwb{<-}\hlkwd{ifelse}\hlstd{(dginntmp}\hlopt{>=}\hlnum{4}\hlstd{,} \hlnum{1}\hlstd{,}\hlnum{0}\hlstd{)}
\hlkwd{upset}\hlstd{(upsetdata,} \hlkwc{nsets} \hlstd{=} \hlnum{3}\hlstd{,} \hlkwc{matrix.color} \hlstd{=} \hlstr{"#DC267F"}\hlstd{,}
\hlkwc{main.bar.color} \hlstd{=} \hlstr{"#648FFF"}\hlstd{,} \hlkwc{sets.bar.color} \hlstd{=} \hlstr{"#FE6100"}\hlstd{)}
\end{alltt}
\end{kframe}
\includegraphics[width=\maxwidth]{figure/subsetbats-2}
\end{knitrout}
\begin{knitrout}
\definecolor{shadecolor}{rgb}{0.969, 0.969, 0.969}\color{fgcolor}\begin{kframe}
\begin{alltt}
\hlkwd{source}\hlstd{(}\hlstr{"covid_comp_shiny.R"}\hlstd{)}
\hlstd{df}\hlkwb{<-}\hlkwd{read.delim}\hlstd{(}\hlkwd{paste0}\hlstd{(workdir,}
\hlstr{"/data/DGINN_202005281649summary_cleaned.csv"}\hlstd{),}
\hlkwc{fill}\hlstd{=T,} \hlkwc{h}\hlstd{=T,} \hlkwc{sep}\hlstd{=}\hlstr{","}\hlstd{)}
\hlkwd{names}\hlstd{(df)}
\end{alltt}
\begin{verbatim}
## [1] "File" "Name" "Gene"
## [4] "GeneSize" "NbSpecies" "omegaM0Bpp"
## [7] "omegaM0codeml" "BUSTED" "BUSTED.p.value"
## [10] "MEME.NbSites" "MEME.PSS" "BppM1M2"
## [13] "BppM1M2.p.value" "BppM1M2.NbSites" "BppM1M2.PSS"
## [16] "BppM7M8" "BppM7M8.p.value" "BppM7M8.NbSites"
## [19] "BppM7M8.PSS" "codemlM1M2" "codemlM1M2.p.value"
## [22] "codemlM1M2.NbSites" "codemlM1M2.PSS" "codemlM7M8"
## [25] "codemlM7M8.p.value" "codemlM7M8.NbSites" "codemlM7M8.PSS"
\end{verbatim}
\begin{alltt}
\hlstd{dftmp}\hlkwb{<-}\hlstd{tab[,}\hlkwd{c}\hlstd{(}\hlstr{"bats_File"}\hlstd{,} \hlstr{"bats_Name"}\hlstd{,}
\hlstr{"Gene.name"}\hlstd{,} \hlstr{"bats_GeneSize"}\hlstd{,}
\hlstr{"bats_NbSpecies"}\hlstd{,} \hlstr{"bats_omegaM0Bpp"}\hlstd{,}
\hlstr{"bats_omegaM0codeml"}\hlstd{,} \hlstr{"bats_BUSTED"}\hlstd{,}
\hlstr{"bats_BUSTED_p.value"}\hlstd{,} \hlstr{"bats_MEME_NbSites"}\hlstd{,}
\hlstr{"bats_MEME_PSS"}\hlstd{,} \hlstr{"bats_BppM1M2"}\hlstd{,}
\hlstr{"bats_BppM1M2_p.value"}\hlstd{,} \hlstr{"bats_BppM1M2_NbSites"}\hlstd{,}
\hlstr{"bats_BppM1M2_PSS"}\hlstd{,} \hlstr{"bats_BppM7M8"}\hlstd{,}
\hlstr{"bats_BppM7M8_p.value"}\hlstd{,} \hlstr{"bats_BppM7M8_NbSites"}\hlstd{,}
\hlstr{"bats_BppM7M8_PSS"}\hlstd{,} \hlstr{"bats_codemlM1M2"}\hlstd{,}
\hlstr{"bats_codemlM1M2_p.value"}\hlstd{,} \hlstr{"bats_codemlM1M2_NbSites"}\hlstd{,}
\hlstr{"bats_codemlM1M2_PSS"}\hlstd{,} \hlstr{"bats_codemlM7M8"}\hlstd{,}
\hlstr{"bats_codemlM7M8_p.value"}\hlstd{,} \hlstr{"bats_codemlM7M8_NbSites"} \hlstd{,}
\hlstr{"bats_codemlM7M8_PSS"}\hlstd{)]}
\hlkwd{names}\hlstd{(dftmp)}\hlkwb{<-}\hlkwd{names}\hlstd{(df)}
\hlkwd{makeFig1}\hlstd{(dftmp)}
\end{alltt}
\end{kframe}
\includegraphics[width=\maxwidth]{figure/unnamed-chunk-2-1}
\end{knitrout}
\end{document}
Source diff could not be displayed: it is too large. Options to address this: view the blob.
File deleted
\documentclass[11pt, oneside]{article}\usepackage[]{graphicx}\usepackage[]{color}
% maxwidth is the original width if it is less than linewidth
% otherwise use linewidth (to make sure the graphics do not exceed the margin)
\makeatletter
\def\maxwidth{ %
\ifdim\Gin@nat@width>\linewidth
\linewidth
\else
\Gin@nat@width
\fi
}
\makeatother
\definecolor{fgcolor}{rgb}{0.345, 0.345, 0.345}
\makeatletter
\@ifundefined{AddToHook}{}{\AddToHook{package/xcolor/after}{\definecolor{fgcolor}{rgb}{0.345, 0.345, 0.345}}}
\makeatother
\newcommand{\hlnum}[1]{\textcolor[rgb]{0.686,0.059,0.569}{#1}}%
\newcommand{\hlstr}[1]{\textcolor[rgb]{0.192,0.494,0.8}{#1}}%
\newcommand{\hlcom}[1]{\textcolor[rgb]{0.678,0.584,0.686}{\textit{#1}}}%
\newcommand{\hlopt}[1]{\textcolor[rgb]{0,0,0}{#1}}%
\newcommand{\hlstd}[1]{\textcolor[rgb]{0.345,0.345,0.345}{#1}}%
\newcommand{\hlkwa}[1]{\textcolor[rgb]{0.161,0.373,0.58}{\textbf{#1}}}%
\newcommand{\hlkwb}[1]{\textcolor[rgb]{0.69,0.353,0.396}{#1}}%
\newcommand{\hlkwc}[1]{\textcolor[rgb]{0.333,0.667,0.333}{#1}}%
\newcommand{\hlkwd}[1]{\textcolor[rgb]{0.737,0.353,0.396}{\textbf{#1}}}%
\let\hlipl\hlkwb
\usepackage{framed}
\makeatletter
\newenvironment{kframe}{%
\def\at@end@of@kframe{}%
\ifinner\ifhmode%
\def\at@end@of@kframe{\end{minipage}}%
\begin{minipage}{\columnwidth}%
\fi\fi%
\def\FrameCommand##1{\hskip\@totalleftmargin \hskip-\fboxsep
\colorbox{shadecolor}{##1}\hskip-\fboxsep
% There is no \\@totalrightmargin, so:
\hskip-\linewidth \hskip-\@totalleftmargin \hskip\columnwidth}%
\MakeFramed {\advance\hsize-\width
\@totalleftmargin\z@ \linewidth\hsize
\@setminipage}}%
{\par\unskip\endMakeFramed%
\at@end@of@kframe}
\makeatother
\definecolor{shadecolor}{rgb}{.97, .97, .97}
\definecolor{messagecolor}{rgb}{0, 0, 0}
\definecolor{warningcolor}{rgb}{1, 0, 1}
\definecolor{errorcolor}{rgb}{1, 0, 0}
\makeatletter
\@ifundefined{AddToHook}{}{\AddToHook{package/xcolor/after}{
\definecolor{shadecolor}{rgb}{.97, .97, .97}
\definecolor{messagecolor}{rgb}{0, 0, 0}
\definecolor{warningcolor}{rgb}{1, 0, 1}
\definecolor{errorcolor}{rgb}{1, 0, 0}
}}
\makeatother
\newenvironment{knitrout}{}{} % an empty environment to be redefined in TeX
\usepackage{alltt} % use "amsart" instead of "article" for AMSLaTeX format
%\usepackage{geometry} % See geometry.pdf to learn the layout options. There are lots.
%\geometry{letterpaper} % ... or a4paper or a5paper or ...
%\geometry{landscape} % Activate for for rotated page geometry
%\usepackage[parfill]{parskip} % Activate to begin paragraphs with an empty line rather than an indent
%\usepackage{graphicx} % Use pdf, png, jpg, or eps with pdflatex; use eps in DVI mode
% TeX will automatically convert eps --> pdf in pdflatex
%\usepackage{amssymb}
\usepackage[utf8]{inputenc}
%\usepackage[cyr]{aeguill}
%\usepackage[francais]{babel}
%\usepackage{hyperref}
\title{Positive selection on genes interacting with SARS-Cov2, comparing GWAS and PS on GWAS}
\author{Marie Cariou}
\date{March 2021} % Activate to display a given date or no date
\IfFileExists{upquote.sty}{\usepackage{upquote}}{}
\begin{document}
\maketitle
\tableofcontents
\newpage
\section{Data}
\begin{knitrout}
\definecolor{shadecolor}{rgb}{0.969, 0.969, 0.969}\color{fgcolor}\begin{kframe}
\begin{alltt}
\hlkwd{library}\hlstd{(shape)}
\end{alltt}
{\ttfamily\noindent\bfseries\color{errorcolor}{\#\# Error in library(shape): there is no package called 'shape'}}\end{kframe}
\end{knitrout}
\subsection{GWAS 1}
-Article:
https://www.nejm.org/doi/full/10.1056/NEJMoa2020283
-Data:
https://ikmb.shinyapps.io/COVID-19\_GWAS\_Browser/
-Locally:
\begin{knitrout}
\definecolor{shadecolor}{rgb}{0.969, 0.969, 0.969}\color{fgcolor}\begin{kframe}
\begin{alltt}
\hlstd{home}\hlkwb{<-}\hlstr{"/home/cariou/Documents/CIRI2020-2021/"}
\hlstd{datapath}\hlkwb{<-}\hlkwd{paste0}\hlstd{(home,} \hlstr{"Etienne_2020/data/GWAS/"}\hlstd{)}
\hlstd{gwas1}\hlkwb{<-}\hlkwd{paste0}\hlstd{(datapath,} \hlstr{"meta_analysis_II.hg38.gwascatalogformat.tsv"}\hlstd{)}
\end{alltt}
\end{kframe}
\end{knitrout}
\begin{knitrout}
\definecolor{shadecolor}{rgb}{0.969, 0.969, 0.969}\color{fgcolor}\begin{kframe}
\begin{alltt}
\hlstd{home}\hlkwb{<-}\hlstr{"/home/adminmarie/Documents/CIRI_BIBS_projects/"}
\hlstd{datapath}\hlkwb{<-}\hlkwd{paste0}\hlstd{(home,} \hlstr{"2020_05_Etienne_covid/data/GWAS/"}\hlstd{)}
\hlstd{gwas1}\hlkwb{<-}\hlkwd{paste0}\hlstd{(datapath,} \hlstr{"meta_analysis_II.hg38.gwascatalogformat.tsv"}\hlstd{)}
\end{alltt}
\end{kframe}
\end{knitrout}
\subsection{GWAS 2}
-Article:
Genetic mechanisms of critical illness in COVID-19
https://www.nature.com/articles/s41586-020-03065-y
-Data:
https://genomicc.org/data/
-Locally:
\begin{knitrout}
\definecolor{shadecolor}{rgb}{0.969, 0.969, 0.969}\color{fgcolor}\begin{kframe}
\begin{alltt}
\hlstd{gwas2}\hlkwb{<-}\hlkwd{list.files}\hlstd{(datapath,} \hlkwc{pattern}\hlstd{=}\hlstr{"genomicc"}\hlstd{)}
\hlstd{gwas2}\hlkwb{<-}\hlkwd{paste0}\hlstd{(datapath, gwas2)}
\end{alltt}
\end{kframe}
\end{knitrout}
\subsection{GWAS 2}
-Article:
Mapping the human genetic architecture of COVID-19
COVID-19 Host Genetics Initiative
https://www.nature.com/articles/s41586-021-03767-x
-Data:
https://www.covid19hg.org/results/r6/
-Locally:
\begin{knitrout}
\definecolor{shadecolor}{rgb}{0.969, 0.969, 0.969}\color{fgcolor}\begin{kframe}
\begin{alltt}
\hlstd{gwas3}\hlkwb{<-}\hlkwd{list.files}\hlstd{(datapath,} \hlkwc{pattern}\hlstd{=}\hlstr{"COVID19"}\hlstd{)}
\hlstd{gwas3}\hlkwb{<-}\hlkwd{paste0}\hlstd{(datapath, gwas3)}
\end{alltt}
\end{kframe}
\end{knitrout}
\subsection{DGINN data}
\begin{knitrout}
\definecolor{shadecolor}{rgb}{0.969, 0.969, 0.969}\color{fgcolor}\begin{kframe}
\begin{alltt}
\hlstd{home}\hlkwb{<-}\hlstr{"/home/cariou/Documents/CIRI2020-2021/"}
\hlstd{tabpath}\hlkwb{<-}\hlkwd{paste0}\hlstd{(home,} \hlstr{"Etienne_2020/2020_dginn_covid19/"}\hlstd{)}
\end{alltt}
\end{kframe}
\end{knitrout}
\begin{knitrout}
\definecolor{shadecolor}{rgb}{0.969, 0.969, 0.969}\color{fgcolor}\begin{kframe}
\begin{alltt}
\hlcom{#table}
\hlcom{#tab<-read.delim(paste0(tabpath, }
\hlcom{# "out_tab/covid_comp_alldginn.txt"), h=T, sep="\textbackslash{}t")}
\hlcom{#dim(tab)}
\hlstd{tab}\hlkwb{<-}\hlkwd{read.delim}\hlstd{(}\hlkwd{paste0}\hlstd{(tabpath,}
\hlstr{"data/DGINN_202102210716_summary.txt"}\hlstd{),} \hlkwc{h}\hlstd{=T,} \hlkwc{sep}\hlstd{=}\hlstr{"\textbackslash{}t"}\hlstd{)}
\hlcom{# fasta}
\hlstd{fasta}\hlkwb{<-}\hlkwd{list.files}\hlstd{(datapath,} \hlkwc{pattern}\hlstd{=}\hlstr{"FYCO"}\hlstd{)}
\end{alltt}
\end{kframe}
\end{knitrout}
\subsection{FYCO1}
https://www.ensembl.org/Homo\_sapiens/Gene/Summary?g=ENSG00000163820;r=3:45917899-45995824
coordinate: Chromosome 3: 45,917,899-45,995,824
\subsection{Objectif}
Table :
\begin{itemize}
\item pos dans genome de ref
\item pos dans l'aln
\item under PS oui/non
\item GWAS1 oui/non
\end{itemize}
\section{Get coordinates}
\subsection{Get GWAS coordinates}
\begin{knitrout}
\definecolor{shadecolor}{rgb}{0.969, 0.969, 0.969}\color{fgcolor}\begin{kframe}
\begin{alltt}
\hlstd{deb}\hlkwb{<-}\hlnum{45917899}
\hlstd{end}\hlkwb{<-}\hlnum{45995824}
\hlstd{len}\hlkwb{<-}\hlstd{end}\hlopt{-}\hlstd{deb}
\hlstd{thres}\hlkwb{<-}\hlnum{1e-8}
\hlstd{posref}\hlkwb{<-}\hlstd{deb}\hlopt{:}\hlstd{end}
\hlstd{fyco1tab}\hlkwb{<-}\hlkwd{as.data.frame}\hlstd{(posref)}
\hlcom{# GWAS 1}
\hlstd{cmd}\hlkwb{<-}\hlkwd{paste0}\hlstd{(}\hlstr{"cat "}\hlstd{, gwas1,} \hlstr{" | grep '^3' > tmp"}\hlstd{)}
\hlkwd{system}\hlstd{(cmd)}
\hlstd{gwastab}\hlkwb{<-}\hlkwd{read.table}\hlstd{(}\hlstr{"tmp"}\hlstd{,} \hlkwc{h}\hlstd{=}\hlnum{FALSE}\hlstd{)}
\hlkwd{system}\hlstd{(}\hlstr{"rm tmp"}\hlstd{)}
\hlkwd{names}\hlstd{(gwastab)}\hlkwb{<-}\hlkwd{c}\hlstd{(}\hlstr{"chromosome"}\hlstd{,} \hlstr{"base_pair_location"}\hlstd{,} \hlstr{"variant_id"}\hlstd{,}
\hlstr{"effect_allele"}\hlstd{,} \hlstr{"other_allele"}\hlstd{,} \hlstr{"beta"}\hlstd{,} \hlstr{"standard_error"}\hlstd{,} \hlstr{"p_value"}\hlstd{)}
\hlstd{gwastab}\hlkwb{<-}\hlstd{gwastab[(gwastab}\hlopt{$}\hlstd{chromosome}\hlopt{==}\hlnum{3}
\hlopt{&} \hlstd{gwastab}\hlopt{$}\hlstd{base_pair_location}\hlopt{>}\hlstd{deb}
\hlopt{&} \hlstd{gwastab}\hlopt{$}\hlstd{base_pair_location}\hlopt{<}\hlstd{end),]}
\hlcom{## filtrer sur pvalue et créer colonne F/T GWAS}
\hlstd{gwas1pos}\hlkwb{<-}\hlstd{gwastab}\hlopt{$}\hlstd{base_pair_location[gwastab}\hlopt{$}\hlstd{p_value}\hlopt{<}\hlstd{thres]}
\hlstd{fyco1tab}\hlopt{$}\hlstd{gwas1}\hlkwb{<-}\hlkwd{ifelse}\hlstd{(posref} \hlopt{%in%} \hlstd{gwas1pos,} \hlnum{TRUE}\hlstd{,} \hlnum{FALSE}\hlstd{)}
\hlcom{## Same for GWAS 2}
\hlstd{gwastab}\hlkwb{<-}\hlkwd{read.table}\hlstd{(gwas2[}\hlnum{1}\hlstd{],}\hlkwc{h}\hlstd{=T,} \hlkwc{sep}\hlstd{=}\hlstr{"\textbackslash{}t"}\hlstd{)}
\hlstd{gwastab}\hlkwb{<-}\hlstd{gwastab[(gwastab}\hlopt{$}\hlstd{CHR}\hlopt{==}\hlnum{3} \hlopt{&} \hlstd{gwastab}\hlopt{$}\hlstd{POS}\hlopt{>}\hlstd{deb} \hlopt{&} \hlstd{gwastab}\hlopt{$}\hlstd{POS}\hlopt{<}\hlstd{end),]}
\hlstd{gwas1pos}\hlkwb{<-}\hlstd{gwastab}\hlopt{$}\hlstd{POS[gwastab}\hlopt{$}\hlstd{Pval}\hlopt{<}\hlstd{thres]}
\hlstd{file1}\hlkwb{<-}\hlkwd{ifelse}\hlstd{(posref} \hlopt{%in%} \hlstd{gwas1pos,} \hlnum{TRUE}\hlstd{,} \hlnum{FALSE}\hlstd{)}
\hlkwd{sum}\hlstd{(file1)}
\end{alltt}
\begin{verbatim}
## [1] 0
\end{verbatim}
\begin{alltt}
\hlstd{gwastab}\hlkwb{<-}\hlkwd{read.table}\hlstd{(gwas2[}\hlnum{2}\hlstd{],}\hlkwc{h}\hlstd{=T,} \hlkwc{sep}\hlstd{=}\hlstr{"\textbackslash{}t"}\hlstd{)}
\hlstd{gwastab}\hlkwb{<-}\hlstd{gwastab[(gwastab}\hlopt{$}\hlstd{CHR}\hlopt{==}\hlnum{3} \hlopt{&} \hlstd{gwastab}\hlopt{$}\hlstd{POS}\hlopt{>}\hlstd{deb} \hlopt{&} \hlstd{gwastab}\hlopt{$}\hlstd{POS}\hlopt{<}\hlstd{end),]}
\hlstd{gwas1pos}\hlkwb{<-}\hlstd{gwastab}\hlopt{$}\hlstd{POS[gwastab}\hlopt{$}\hlstd{Pval}\hlopt{<}\hlstd{thres]}
\hlstd{file2}\hlkwb{<-}\hlkwd{ifelse}\hlstd{(posref} \hlopt{%in%} \hlstd{gwas1pos,} \hlnum{TRUE}\hlstd{,} \hlnum{FALSE}\hlstd{)}
\hlkwd{sum}\hlstd{(file2)}
\end{alltt}
\begin{verbatim}
## [1] 0
\end{verbatim}
\begin{alltt}
\hlstd{gwastab}\hlkwb{<-}\hlkwd{read.table}\hlstd{(gwas2[}\hlnum{3}\hlstd{],}\hlkwc{h}\hlstd{=T,} \hlkwc{sep}\hlstd{=}\hlstr{"\textbackslash{}t"}\hlstd{)}
\hlstd{gwastab}\hlkwb{<-}\hlstd{gwastab[(gwastab}\hlopt{$}\hlstd{CHR}\hlopt{==}\hlnum{3} \hlopt{&} \hlstd{gwastab}\hlopt{$}\hlstd{POS}\hlopt{>}\hlstd{deb} \hlopt{&} \hlstd{gwastab}\hlopt{$}\hlstd{POS}\hlopt{<}\hlstd{end),]}
\hlstd{gwas1pos}\hlkwb{<-}\hlstd{gwastab}\hlopt{$}\hlstd{POS[gwastab}\hlopt{$}\hlstd{Pval}\hlopt{<}\hlstd{thres]}
\hlstd{file3}\hlkwb{<-}\hlkwd{ifelse}\hlstd{(posref} \hlopt{%in%} \hlstd{gwas1pos,} \hlnum{TRUE}\hlstd{,} \hlnum{FALSE}\hlstd{)}
\hlkwd{sum}\hlstd{(file3)}
\end{alltt}
\begin{verbatim}
## [1] 17
\end{verbatim}
\begin{alltt}
\hlstd{gwastab}\hlkwb{<-}\hlkwd{read.table}\hlstd{(gwas2[}\hlnum{4}\hlstd{],}\hlkwc{h}\hlstd{=T,} \hlkwc{sep}\hlstd{=}\hlstr{"\textbackslash{}t"}\hlstd{)}
\hlstd{gwastab}\hlkwb{<-}\hlstd{gwastab[(gwastab}\hlopt{$}\hlstd{CHR}\hlopt{==}\hlnum{3} \hlopt{&} \hlstd{gwastab}\hlopt{$}\hlstd{POS}\hlopt{>}\hlstd{deb} \hlopt{&} \hlstd{gwastab}\hlopt{$}\hlstd{POS}\hlopt{<}\hlstd{end),]}
\hlstd{gwas1pos}\hlkwb{<-}\hlstd{gwastab}\hlopt{$}\hlstd{POS[gwastab}\hlopt{$}\hlstd{Pval}\hlopt{<}\hlstd{thres]}
\hlstd{file4}\hlkwb{<-}\hlkwd{ifelse}\hlstd{(posref} \hlopt{%in%} \hlstd{gwas1pos,} \hlnum{TRUE}\hlstd{,} \hlnum{FALSE}\hlstd{)}
\hlkwd{sum}\hlstd{(file4)}
\end{alltt}
\begin{verbatim}
## [1] 0
\end{verbatim}
\end{kframe}
\end{knitrout}
Significant SNPs are all in genomicc.EUR.PLINK2.txt.
\begin{knitrout}
\definecolor{shadecolor}{rgb}{0.969, 0.969, 0.969}\color{fgcolor}\begin{kframe}
\begin{alltt}
\hlstd{fyco1tab}\hlopt{$}\hlstd{gwas2}\hlkwb{<-}\hlstd{((file1} \hlopt{|} \hlstd{file2)} \hlopt{|} \hlstd{file3 )} \hlopt{|} \hlstd{file4}
\hlcom{## Voir cohérence entre les 2}
\hlkwd{table}\hlstd{(fyco1tab[,}\hlnum{2}\hlopt{:}\hlnum{3}\hlstd{])}
\end{alltt}
\begin{verbatim}
## gwas2
## gwas1 FALSE TRUE
## FALSE 77909 17
\end{verbatim}
\end{kframe}
\end{knitrout}
GWAS3
\begin{knitrout}
\definecolor{shadecolor}{rgb}{0.969, 0.969, 0.969}\color{fgcolor}\begin{kframe}
\begin{alltt}
\hlstd{gwastab}\hlkwb{<-}\hlkwd{read.table}\hlstd{(gwas3,} \hlkwc{h}\hlstd{=F,} \hlkwc{sep}\hlstd{=}\hlstr{"\textbackslash{}t"}\hlstd{)}
\hlkwd{names}\hlstd{(gwastab)}\hlkwb{<-}\hlkwd{c}\hlstd{(}\hlstr{"CHR"}\hlstd{,} \hlstr{"POS"}\hlstd{,} \hlstr{"REF"}\hlstd{,} \hlstr{"ALT"}\hlstd{,} \hlstr{"SNP"}\hlstd{,} \hlstr{"all_meta_N"}\hlstd{,}
\hlstr{"all_inv_var_meta_beta"}\hlstd{,} \hlstr{"all_inv_var_meta_sebeta"}\hlstd{,}
\hlstr{"all_inv_var_meta_p"}\hlstd{,} \hlstr{"all_inv_var_meta_cases total"}\hlstd{,}
\hlstr{"all_inv_var_meta_controls"}\hlstd{,} \hlstr{"all_inv_var_meta_effective"}\hlstd{,}
\hlstr{"all_inv_var_het_p"}\hlstd{,} \hlstr{"all_meta_AF"}\hlstd{,} \hlstr{"rsid"}\hlstd{)}
\hlstd{gwastab}\hlkwb{<-}\hlstd{gwastab[(gwastab}\hlopt{$}\hlstd{CHR}\hlopt{==}\hlnum{3} \hlopt{&} \hlstd{gwastab}\hlopt{$}\hlstd{POS}\hlopt{>}\hlstd{deb} \hlopt{&} \hlstd{gwastab}\hlopt{$}\hlstd{POS}\hlopt{<}\hlstd{end),]}
\hlstd{gwas1pos}\hlkwb{<-}\hlstd{gwastab}\hlopt{$}\hlstd{POS[gwastab}\hlopt{$}\hlstd{all_inv_var_meta_p}\hlopt{<}\hlstd{thres]}
\hlstd{gwas3tab}\hlkwb{<-}\hlstd{gwastab}
\hlcom{#gwastab[gwastab$all_inv_var_meta_p<thres & gwastab$POS==45967995,]}
\hlstd{file1}\hlkwb{<-}\hlkwd{ifelse}\hlstd{(posref} \hlopt{%in%} \hlstd{gwas1pos,} \hlnum{TRUE}\hlstd{,} \hlnum{FALSE}\hlstd{)}
\hlkwd{sum}\hlstd{(file1)}
\end{alltt}
\begin{verbatim}
## [1] 62
\end{verbatim}
\begin{alltt}
\hlstd{fyco1tab}\hlopt{$}\hlstd{gwas3}\hlkwb{<-}\hlstd{file1}
\end{alltt}
\end{kframe}
\end{knitrout}
\subsection{Coordinates aln/genome}
How to convert coordinates from alignments to genomic coordinate to the multi species alignment coordinates?
1. I downloaded the cds sequence to know for which transcript I should refer to
FYCO1-205 is the right length. ENST00000535325.5, le cds de référence fait 4497 bp, l'alignement fait 4503 pb
FYCO1-201 has one exon missing, as moste of the sequences in the alignment
I will get the coordinates of exons within this transcript ENST00000535325.5
"cat /Xnfs/ciridb/shared\_data/Genomes/Ensembl/Hs/GRCh38/Annot/v77/Homo\_sapiens.GRCh38.77.gtf | grep "ENST00000535325" | awk '{print $1 " " $3 " " $4 " " $5 " " $10 " " $14 " " $17 " " $18}' > human\_fyco1.gtf"
There might be a max 6bp discrepency between positions, we will see...
~
I aligned the sequences of the FYCO1-205 cds to the primates alignement
\subsection{Make table from gtf}
Le gene est en antisens co
\begin{knitrout}
\definecolor{shadecolor}{rgb}{0.969, 0.969, 0.969}\color{fgcolor}\begin{kframe}
\begin{alltt}
\hlstd{gtf}\hlkwb{<-}\hlkwd{read.table}\hlstd{(}\hlkwd{paste0}\hlstd{(datapath,} \hlstr{"/human_fyco1.gtf"}\hlstd{),} \hlkwc{h}\hlstd{=F)}
\hlcom{#gtf$V3=gtf$V3-45900000}
\hlcom{#gtf$V4=gtf$V4-45900000}
\hlstd{cds}\hlkwb{<-}\hlstd{gtf[gtf}\hlopt{$}\hlstd{V2} \hlopt{%in%} \hlkwd{c}\hlstd{(}\hlstr{"CDS"}\hlstd{),]}
\hlstd{cds}
\end{alltt}
\begin{verbatim}
## V1 V2 V3 V4 V5 V6 V7 V8 V9 V10 V11
## 4 3 CDS 45984856 45984910 ENSG00000163820 ; ENST00000535325 ; exon_number 2 ;
## 7 3 CDS 45981570 45981676 ENSG00000163820 ; ENST00000535325 ; exon_number 3 ;
## 9 3 CDS 45979705 45979830 ENSG00000163820 ; ENST00000535325 ; exon_number 4 ;
## 11 3 CDS 45975239 45975345 ENSG00000163820 ; ENST00000535325 ; exon_number 5 ;
## 13 3 CDS 45973088 45973231 ENSG00000163820 ; ENST00000535325 ; exon_number 6 ;
## 15 3 CDS 45969675 45969765 ENSG00000163820 ; ENST00000535325 ; exon_number 7 ;
## 17 3 CDS 45966277 45968703 ENSG00000163820 ; ENST00000535325 ; exon_number 8 ;
## 19 3 CDS 45965033 45965125 ENSG00000163820 ; ENST00000535325 ; exon_number 9 ;
## 21 3 CDS 45964336 45964454 ENSG00000163820 ; ENST00000535325 ; exon_number 10 ;
## 23 3 CDS 45962225 45962392 ENSG00000163820 ; ENST00000535325 ; exon_number 11 ;
## 25 3 CDS 45959393 45959542 ENSG00000163820 ; ENST00000535325 ; exon_number 12 ;
## 27 3 CDS 45958408 45958619 ENSG00000163820 ; ENST00000535325 ; exon_number 13 ;
## 29 3 CDS 45955249 45955393 ENSG00000163820 ; ENST00000535325 ; exon_number 14 ;
## 31 3 CDS 45938196 45938255 ENSG00000163820 ; ENST00000535325 ; exon_number 15 ;
## 33 3 CDS 45936448 45936543 ENSG00000163820 ; ENST00000535325 ; exon_number 16 ;
## 35 3 CDS 45931071 45931281 ENSG00000163820 ; ENST00000535325 ; exon_number 17 ;
## 37 3 CDS 45923656 45923765 ENSG00000163820 ; ENST00000535325 ; exon_number 18 ;
## 39 3 CDS 45921768 45921840 ENSG00000163820 ; ENST00000535325 ; exon_number 19 ;
\end{verbatim}
\begin{alltt}
\hlstd{cds}\hlopt{$}\hlstd{len}\hlkwb{=}\hlstd{cds}\hlopt{$}\hlstd{V4}\hlopt{-}\hlstd{cds}\hlopt{$}\hlstd{V3}
\hlkwd{sum}\hlstd{(cds}\hlopt{$}\hlstd{len)}
\end{alltt}
\begin{verbatim}
## [1] 4476
\end{verbatim}
\begin{alltt}
\hlcom{# I think there is an exon missing outdated gtf?}
\hlcom{#make a vector with positions}
\hlstd{cat}\hlkwb{<-}\hlkwd{numeric}\hlstd{()}
\hlkwa{for} \hlstd{(i} \hlkwa{in} \hlnum{1}\hlopt{:}\hlkwd{nrow}\hlstd{(cds))\{}
\hlstd{tmp}\hlkwb{<-}\hlstd{cds}\hlopt{$}\hlstd{V3[i]}\hlopt{:}\hlstd{cds}\hlopt{$}\hlstd{V4[i]}
\hlstd{cat}\hlkwb{<-}\hlkwd{c}\hlstd{(cat,tmp)}
\hlstd{\}}
\hlstd{cat}\hlkwb{<-}\hlkwd{sort}\hlstd{(cat)}
\hlstd{aln}\hlkwb{<-}\hlkwd{c}\hlstd{((}\hlkwd{length}\hlstd{(cat)}\hlopt{-}\hlnum{6}\hlstd{)}\hlopt{:}\hlnum{1831}\hlstd{,} \hlnum{1827}\hlopt{:}\hlnum{1078}\hlstd{,} \hlnum{1074}\hlopt{:}\hlnum{1}\hlstd{)}
\hlstd{cds_tab}\hlkwb{<-}\hlkwd{cbind}\hlstd{(cat,} \hlkwd{c}\hlstd{((}\hlkwd{length}\hlstd{(cat)}\hlopt{+}\hlnum{6}\hlstd{)}\hlopt{:}\hlnum{1831}\hlstd{,} \hlnum{1827}\hlopt{:}\hlnum{1078}\hlstd{,} \hlnum{1074}\hlopt{:}\hlnum{1}\hlstd{))}
\hlkwd{colnames}\hlstd{(cds_tab)}\hlkwb{<-}\hlkwd{c}\hlstd{(}\hlstr{"posref"}\hlstd{,} \hlstr{"pos_aln"}\hlstd{)}
\hlstd{cds_tab}\hlkwb{<-}\hlstd{cds_tab[}\hlkwd{order}\hlstd{(cds_tab[,}\hlnum{2}\hlstd{]),]}
\end{alltt}
\end{kframe}
\end{knitrout}
After visual examination of the alignment between the reference cds and the original alignment, their is 2 3bp insertion in the alignment compared to the reference sequence.
On the reference, I manually added 3 gaps between 1075 and 1078 and 1828 and 1825.
\begin{knitrout}
\definecolor{shadecolor}{rgb}{0.969, 0.969, 0.969}\color{fgcolor}\begin{kframe}
\begin{alltt}
\hlstd{fyco1tab}\hlkwb{<-}\hlkwd{merge}\hlstd{(fyco1tab, cds_tab,} \hlkwc{by}\hlstd{=}\hlstr{"posref"}\hlstd{,} \hlkwc{all.x}\hlstd{=T)}
\end{alltt}
\end{kframe}
\end{knitrout}
\subsection{Get PS coordinates}
\begin{knitrout}
\definecolor{shadecolor}{rgb}{0.969, 0.969, 0.969}\color{fgcolor}\begin{kframe}
\begin{alltt}
\hlstd{PS1}\hlkwb{<-}\hlstd{tab[tab}\hlopt{$}\hlstd{File}\hlopt{==}\hlstr{"FYCO1_pri_DGINN_select_cut_add_mafft_prank"}\hlstd{,}
\hlstr{"MEME_PSS"}\hlstd{]}
\hlstd{PS2}\hlkwb{<-}\hlstd{tab[tab}\hlopt{$}\hlstd{File}\hlopt{==}\hlstr{"FYCO1_pri_DGINN_select_cut_add_mafft_prank"}\hlstd{,}
\hlstr{"BppM1M2_PSS"}\hlstd{]}
\hlstd{PS3}\hlkwb{<-}\hlstd{tab[tab}\hlopt{$}\hlstd{File}\hlopt{==}\hlstr{"FYCO1_pri_DGINN_select_cut_add_mafft_prank"}\hlstd{,}
\hlstr{"BppM7M8_PSS"}\hlstd{]}
\hlstd{PS4}\hlkwb{<-}\hlstd{tab[tab}\hlopt{$}\hlstd{File}\hlopt{==}\hlstr{"FYCO1_pri_DGINN_select_cut_add_mafft_prank"}\hlstd{,}
\hlstr{"codemlM1M2_PSS"}\hlstd{]}
\hlstd{PS5}\hlkwb{<-}\hlstd{tab[tab}\hlopt{$}\hlstd{File}\hlopt{==}\hlstr{"FYCO1_pri_DGINN_select_cut_add_mafft_prank"}\hlstd{,}
\hlstr{"codemlM7M8_PSS"}\hlstd{]}
\hlstd{PS1}\hlkwb{<-}\hlkwd{as.numeric}\hlstd{(}\hlkwd{unlist}\hlstd{(}\hlkwd{strsplit}\hlstd{(}\hlkwd{as.character}\hlstd{(PS1),} \hlkwc{split} \hlstd{=} \hlstr{", "}\hlstd{)))}
\hlstd{PS2}\hlkwb{<-}\hlkwd{as.numeric}\hlstd{(}\hlkwd{unlist}\hlstd{(}\hlkwd{strsplit}\hlstd{(}\hlkwd{as.character}\hlstd{(PS2),} \hlkwc{split} \hlstd{=} \hlstr{", "}\hlstd{)))}
\hlstd{PS3}\hlkwb{<-}\hlkwd{as.numeric}\hlstd{(}\hlkwd{unlist}\hlstd{(}\hlkwd{strsplit}\hlstd{(}\hlkwd{as.character}\hlstd{(PS3),} \hlkwc{split} \hlstd{=} \hlstr{", "}\hlstd{)))}
\hlstd{PS5}\hlkwb{<-}\hlkwd{as.numeric}\hlstd{(}\hlkwd{unlist}\hlstd{(}\hlkwd{strsplit}\hlstd{(}\hlkwd{as.character}\hlstd{(PS5),} \hlkwc{split} \hlstd{=} \hlstr{", "}\hlstd{)))}
\hlstd{PS}\hlkwb{<-}\hlkwd{as.numeric}\hlstd{(}\hlkwd{names}\hlstd{(}\hlkwd{table}\hlstd{(}\hlkwd{c}\hlstd{(PS1,PS2,PS3,PS5))[}\hlkwd{table}\hlstd{(}\hlkwd{c}\hlstd{(PS1,PS2,PS3,PS5))}\hlopt{>}\hlnum{1}\hlstd{]))}
\hlcom{## manually}
\hlstd{PS}\hlkwb{<-}\hlkwd{c}\hlstd{(}\hlnum{447}\hlstd{,} \hlnum{552}\hlstd{,} \hlnum{800}\hlstd{,} \hlnum{928}\hlstd{)}
\hlcom{## This is codon positions :)}
\hlstd{PS}\hlkwb{<-}\hlstd{PS}\hlopt{*}\hlnum{3}
\end{alltt}
\end{kframe}
\end{knitrout}
Updated positions under PS
\begin{knitrout}
\definecolor{shadecolor}{rgb}{0.969, 0.969, 0.969}\color{fgcolor}\begin{kframe}
\begin{alltt}
\hlstd{PS}\hlkwb{<-}\hlkwd{c}\hlstd{(}\hlnum{448}\hlstd{,} \hlnum{472}\hlstd{,} \hlnum{553}\hlstd{,} \hlnum{930}\hlstd{)}
\hlstd{PS}\hlkwb{<-}\hlstd{PS}\hlopt{*}\hlnum{3}
\end{alltt}
\end{kframe}
\end{knitrout}
\section{Plot}
\subsection{Plot Fyco1 on absolute coordinates}
\begin{knitrout}
\definecolor{shadecolor}{rgb}{0.969, 0.969, 0.969}\color{fgcolor}\begin{kframe}
\begin{alltt}
\hlkwd{head}\hlstd{(fyco1tab)}
\end{alltt}
\begin{verbatim}
## posref gwas1 gwas2 gwas3 pos_aln
## 1 45917899 FALSE FALSE FALSE NA
## 2 45917900 FALSE FALSE FALSE NA
## 3 45917901 FALSE FALSE FALSE NA
## 4 45917902 FALSE FALSE FALSE NA
## 5 45917903 FALSE FALSE FALSE NA
## 6 45917904 FALSE FALSE FALSE NA
\end{verbatim}
\begin{alltt}
\hlstd{fyco1tab}\hlopt{$}\hlstd{PS}\hlkwb{<-}\hlnum{FALSE}
\hlstd{fyco1tab}\hlopt{$}\hlstd{PS[fyco1tab}\hlopt{$}\hlstd{pos_aln} \hlopt{%in%} \hlstd{PS]}\hlkwb{<-}\hlnum{TRUE}
\end{alltt}
\end{kframe}
\end{knitrout}
\begin{knitrout}
\definecolor{shadecolor}{rgb}{0.969, 0.969, 0.969}\color{fgcolor}\begin{kframe}
\begin{alltt}
\hlkwd{plot}\hlstd{(fyco1tab}\hlopt{$}\hlstd{posref,} \hlkwd{rep}\hlstd{(}\hlnum{1}\hlstd{,} \hlkwd{nrow}\hlstd{(fyco1tab)),} \hlkwc{cex}\hlstd{=}\hlnum{0.2}\hlstd{,}
\hlkwc{xlab}\hlstd{=}\hlstr{"chromosome 3"}\hlstd{,} \hlkwc{ylab}\hlstd{=}\hlstr{""}\hlstd{)}
\hlkwd{points}\hlstd{(fyco1tab}\hlopt{$}\hlstd{posref[}\hlkwd{is.na}\hlstd{(fyco1tab}\hlopt{$}\hlstd{pos_aln)}\hlopt{==}\hlnum{FALSE}\hlstd{],}
\hlkwd{rep}\hlstd{(}\hlnum{1}\hlstd{,} \hlkwd{length}\hlstd{(fyco1tab}\hlopt{$}\hlstd{posref[}\hlkwd{is.na}\hlstd{(fyco1tab}\hlopt{$}\hlstd{pos_aln)}\hlopt{==}\hlnum{FALSE}\hlstd{])),}
\hlkwc{cex}\hlstd{=}\hlnum{0.5}\hlstd{,} \hlkwc{pch}\hlstd{=}\hlnum{15}\hlstd{,} \hlkwc{col}\hlstd{=}\hlstr{"lightblue"}\hlstd{)}
\hlstd{posgwas2}\hlkwb{<-}\hlstd{fyco1tab}\hlopt{$}\hlstd{posref[fyco1tab}\hlopt{$}\hlstd{gwas2}\hlopt{==}\hlnum{TRUE}\hlstd{]}
\hlkwd{Arrows}\hlstd{(}\hlkwc{x0}\hlstd{=posgwas2,} \hlkwc{y0}\hlstd{=}\hlkwd{rep}\hlstd{(}\hlnum{0.94}\hlstd{,} \hlkwd{length}\hlstd{(posgwas2)),}
\hlkwc{x1}\hlstd{=posgwas2,} \hlkwc{y1}\hlstd{=}\hlkwd{rep}\hlstd{(}\hlnum{0.96}\hlstd{,} \hlkwd{length}\hlstd{(posgwas2)),}
\hlkwc{arr.type}\hlstd{=}\hlstr{"triangle"}\hlstd{,} \hlkwc{arr.length}\hlstd{=}\hlnum{0.6}\hlstd{)}
\hlkwd{text}\hlstd{(}\hlnum{45930000}\hlstd{,} \hlnum{1.2}\hlstd{,} \hlstr{"PS"}\hlstd{)}
\hlstd{posps}\hlkwb{<-}\hlstd{fyco1tab}\hlopt{$}\hlstd{posref[fyco1tab}\hlopt{$}\hlstd{PS}\hlopt{==}\hlnum{TRUE}\hlstd{]}
\hlkwd{Arrows}\hlstd{(}\hlkwc{x0}\hlstd{=posps,} \hlkwc{y0}\hlstd{=}\hlkwd{rep}\hlstd{(}\hlnum{1.06}\hlstd{,} \hlkwd{length}\hlstd{(posps)),}
\hlkwc{x1}\hlstd{=posps,} \hlkwc{y1}\hlstd{=}\hlkwd{rep}\hlstd{(}\hlnum{1.04}\hlstd{,} \hlkwd{length}\hlstd{(posps)),}
\hlkwc{arr.type}\hlstd{=}\hlstr{"triangle"}\hlstd{,} \hlkwc{arr.length}\hlstd{=}\hlnum{0.6}\hlstd{)}
\hlkwd{text}\hlstd{(}\hlnum{45930000}\hlstd{,} \hlnum{0.8}\hlstd{,} \hlstr{"GWAS, pval<0.01, Pairo-Castineira et al."}\hlstd{)}
\end{alltt}
\end{kframe}
\end{knitrout}
Nouvelle version avec des barres.
\begin{knitrout}
\definecolor{shadecolor}{rgb}{0.969, 0.969, 0.969}\color{fgcolor}\begin{kframe}
\begin{alltt}
\hlkwd{library}\hlstd{(ggplot2)}
\end{alltt}
{\ttfamily\noindent\bfseries\color{errorcolor}{\#\# Error in library(ggplot2): there is no package called 'ggplot2'}}\begin{alltt}
\hlkwd{library}\hlstd{(gggenes)}
\end{alltt}
{\ttfamily\noindent\bfseries\color{errorcolor}{\#\# Error in library(gggenes): there is no package called 'gggenes'}}\end{kframe}
\end{knitrout}
\begin{knitrout}
\definecolor{shadecolor}{rgb}{0.969, 0.969, 0.969}\color{fgcolor}\begin{kframe}
\begin{alltt}
\hlstd{example_fyco1}\hlkwb{<-}\hlstd{gtf[gtf}\hlopt{$}\hlstd{V2}\hlopt{==}\hlstr{"CDS"}\hlstd{,} \hlnum{3}\hlopt{:}\hlnum{4}\hlstd{]}
\hlkwd{names}\hlstd{(example_fyco1)}\hlkwb{<-}\hlkwd{c}\hlstd{(}\hlstr{"start"}\hlstd{,} \hlstr{"end"}\hlstd{)}
\hlstd{example_fyco1}\hlopt{$}\hlstd{gene}\hlkwb{<-}\hlstr{"fyco1"}
\hlstd{example_fyco1}\hlopt{$}\hlstd{start}\hlkwb{<-}\hlkwd{min}\hlstd{(example_fyco1}\hlopt{$}\hlstd{start)}
\hlstd{example_fyco1}\hlopt{$}\hlstd{end}\hlkwb{<-}\hlkwd{max}\hlstd{(example_fyco1}\hlopt{$}\hlstd{end)}
\hlstd{example_fyco1}\hlkwb{<-}\hlstd{example_fyco1[}\hlnum{1}\hlstd{,]}
\hlstd{g}\hlkwb{<-} \hlkwd{ggplot}\hlstd{(example_fyco1,} \hlkwd{aes}\hlstd{(}\hlkwc{xmin} \hlstd{= start,} \hlkwc{xmax} \hlstd{= end,} \hlkwc{y} \hlstd{= gene,} \hlkwc{fill} \hlstd{= gene))}
\end{alltt}
{\ttfamily\noindent\bfseries\color{errorcolor}{\#\# Error in ggplot(example\_fyco1, aes(xmin = start, xmax = end, y = gene, : impossible de trouver la fonction "{}ggplot"{}}}\begin{alltt}
\hlstd{g} \hlkwb{<-} \hlstd{g} \hlopt{+} \hlkwd{geom_gene_arrow}\hlstd{()}
\end{alltt}
{\ttfamily\noindent\bfseries\color{errorcolor}{\#\# Error in eval(expr, envir, enclos): objet 'g' introuvable}}\begin{alltt}
\hlcom{#g}
\end{alltt}
\end{kframe}
\end{knitrout}
En fait quand on dessine les exons avec une largeur de traits normal, les introns sont beaucoup trop gros.
\subsection{Plot Fyco1 on CDS coordinates}
Conversion des coordonnées dans example\_fyco1
\begin{knitrout}
\definecolor{shadecolor}{rgb}{0.969, 0.969, 0.969}\color{fgcolor}\begin{kframe}
\begin{alltt}
\hlkwd{names}\hlstd{(example_fyco1)}\hlkwb{<-}\hlkwd{c}\hlstd{(}\hlstr{"start_abs"}\hlstd{,} \hlstr{"end_abs"}\hlstd{,} \hlstr{"gene"}\hlstd{)}
\hlstd{example_fyco1}\hlopt{$}\hlstd{end}\hlkwb{<-}\hlkwd{sapply}\hlstd{(example_fyco1}\hlopt{$}\hlstd{start_abs,} \hlkwa{function}\hlstd{(}\hlkwc{x}\hlstd{)}
\hlstd{fyco1tab}\hlopt{$}\hlstd{pos_aln[fyco1tab}\hlopt{$}\hlstd{posref}\hlopt{==}\hlstd{x])}
\hlstd{example_fyco1}\hlopt{$}\hlstd{start}\hlkwb{<-}\hlkwd{sapply}\hlstd{(example_fyco1}\hlopt{$}\hlstd{end_abs,} \hlkwa{function}\hlstd{(}\hlkwc{x}\hlstd{)}
\hlstd{fyco1tab}\hlopt{$}\hlstd{pos_aln[fyco1tab}\hlopt{$}\hlstd{posref}\hlopt{==}\hlstd{x])}
\hlcom{#example_fyco1$gene<-paste0("exon", 1:nrow(example_fyco1))}
\hlstd{example_fyco1_multi}\hlkwb{<-}\hlkwd{rbind}\hlstd{(example_fyco1, example_fyco1, example_fyco1)}
\hlstd{example_fyco1_multi}\hlopt{$}\hlstd{molecule}\hlkwb{<-}\hlkwd{rep}\hlstd{(}\hlkwd{c}\hlstd{(}\hlstr{"PS"}\hlstd{,} \hlstr{"GWAS3"}\hlstd{,} \hlstr{"GWAS2"}\hlstd{),}
\hlkwc{each}\hlstd{=}\hlkwd{nrow}\hlstd{(example_fyco1))}
\hlstd{g}\hlkwb{<-} \hlkwd{ggplot}\hlstd{(example_fyco1_multi,}
\hlkwd{aes}\hlstd{(}\hlkwc{xmin} \hlstd{= start,} \hlkwc{xmax} \hlstd{= end,} \hlkwc{y} \hlstd{= molecule,} \hlkwc{fill} \hlstd{= gene))}
\end{alltt}
{\ttfamily\noindent\bfseries\color{errorcolor}{\#\# Error in ggplot(example\_fyco1\_multi, aes(xmin = start, xmax = end, y = molecule, : impossible de trouver la fonction "{}ggplot"{}}}\begin{alltt}
\hlstd{g} \hlkwb{<-} \hlstd{g} \hlopt{+} \hlkwd{geom_gene_arrow}\hlstd{(}\hlkwc{arrowhead_height} \hlstd{=} \hlkwd{unit}\hlstd{(}\hlnum{3}\hlstd{,} \hlstr{"mm"}\hlstd{),}
\hlkwc{arrowhead_width} \hlstd{=} \hlkwd{unit}\hlstd{(}\hlnum{0.5}\hlstd{,} \hlstr{"mm"}\hlstd{))}
\end{alltt}
{\ttfamily\noindent\bfseries\color{errorcolor}{\#\# Error in eval(expr, envir, enclos): objet 'g' introuvable}}\begin{alltt}
\hlstd{g} \hlkwb{<-} \hlstd{g} \hlopt{+} \hlkwd{theme_genes}\hlstd{()}
\end{alltt}
{\ttfamily\noindent\bfseries\color{errorcolor}{\#\# Error in eval(expr, envir, enclos): objet 'g' introuvable}}\begin{alltt}
\hlcom{#g}
\end{alltt}
\end{kframe}
\end{knitrout}
\begin{knitrout}
\definecolor{shadecolor}{rgb}{0.969, 0.969, 0.969}\color{fgcolor}\begin{kframe}
\begin{alltt}
\hlstd{PS_subgene}\hlkwb{<-}\hlkwd{sort}\hlstd{(fyco1tab[fyco1tab}\hlopt{$}\hlstd{PS}\hlopt{==}\hlnum{TRUE}\hlstd{,}\hlstr{"pos_aln"}\hlstd{])}
\hlstd{subgene_PS}\hlkwb{<-}\hlkwd{as.data.frame}\hlstd{(}\hlkwd{t}\hlstd{(}\hlkwd{sapply}\hlstd{(PS_subgene,} \hlkwa{function}\hlstd{(}\hlkwc{x}\hlstd{)\{}
\hlstd{line}\hlkwb{<-}\hlkwd{c}\hlstd{(example_fyco1[example_fyco1}\hlopt{$}\hlstd{start}\hlopt{<=}\hlstd{x} \hlopt{&} \hlstd{example_fyco1}\hlopt{$}\hlstd{end}\hlopt{>}\hlstd{x,], x, x)}
\hlkwd{names}\hlstd{(line)}\hlkwb{<-}\hlkwd{c}\hlstd{(}\hlstr{"start_abs"}\hlstd{,} \hlstr{"end_abs"}\hlstd{,} \hlstr{"gene"}\hlstd{,} \hlstr{"start"}\hlstd{,} \hlstr{"end"}\hlstd{,} \hlstr{"from"}\hlstd{,} \hlstr{"to"}\hlstd{)}
\hlkwd{return}\hlstd{(line)}
\hlstd{\})))}
\hlstd{subgene_PS}\hlkwb{<-}\hlstd{subgene_PS[,}\hlkwd{c}\hlstd{(}\hlstr{"gene"}\hlstd{,} \hlstr{"start"}\hlstd{,} \hlstr{"end"}\hlstd{,} \hlstr{"from"}\hlstd{,} \hlstr{"to"}\hlstd{)]}
\hlstd{subgene_PS}\hlopt{$}\hlstd{gene}\hlkwb{<-}\hlkwd{unlist}\hlstd{(subgene_PS}\hlopt{$}\hlstd{gene)}
\hlstd{subgene_PS}\hlopt{$}\hlstd{start}\hlkwb{<-}\hlkwd{unlist}\hlstd{(subgene_PS}\hlopt{$}\hlstd{start)}
\hlstd{subgene_PS}\hlopt{$}\hlstd{end}\hlkwb{<-}\hlkwd{unlist}\hlstd{(subgene_PS}\hlopt{$}\hlstd{end)}
\hlstd{subgene_PS}\hlopt{$}\hlstd{from}\hlkwb{<-}\hlkwd{unlist}\hlstd{(subgene_PS}\hlopt{$}\hlstd{from)}
\hlstd{subgene_PS}\hlopt{$}\hlstd{to}\hlkwb{<-}\hlkwd{unlist}\hlstd{(subgene_PS}\hlopt{$}\hlstd{to)}
\hlstd{subgene_PS}\hlopt{$}\hlstd{molecule}\hlkwb{<-}\hlstr{"PS"}
\hlcom{###########}
\hlstd{gwas_subgene}\hlkwb{<-}\hlkwd{sort}\hlstd{(fyco1tab[fyco1tab}\hlopt{$}\hlstd{gwas2}\hlopt{==}\hlnum{TRUE}\hlstd{,}\hlstr{"pos_aln"}\hlstd{])}
\hlstd{subgene_gwas}\hlkwb{<-}\hlkwd{as.data.frame}\hlstd{(}\hlkwd{t}\hlstd{(}\hlkwd{sapply}\hlstd{(gwas_subgene,} \hlkwa{function}\hlstd{(}\hlkwc{x}\hlstd{)\{}
\hlstd{line}\hlkwb{<-}\hlkwd{c}\hlstd{(example_fyco1[example_fyco1}\hlopt{$}\hlstd{start}\hlopt{<=}\hlstd{x} \hlopt{&} \hlstd{example_fyco1}\hlopt{$}\hlstd{end}\hlopt{>}\hlstd{x,], x, x)}
\hlkwd{names}\hlstd{(line)}\hlkwb{<-}\hlkwd{c}\hlstd{(}\hlstr{"start_abs"}\hlstd{,} \hlstr{"end_abs"}\hlstd{,} \hlstr{"gene"}\hlstd{,} \hlstr{"start"}\hlstd{,} \hlstr{"end"}\hlstd{,} \hlstr{"from"}\hlstd{,} \hlstr{"to"}\hlstd{)}
\hlkwd{return}\hlstd{(line)}
\hlstd{\})))}
\hlstd{subgene_gwas}\hlkwb{<-}\hlstd{subgene_gwas[,}\hlkwd{c}\hlstd{(}\hlstr{"gene"}\hlstd{,} \hlstr{"start"}\hlstd{,} \hlstr{"end"}\hlstd{,} \hlstr{"from"}\hlstd{,} \hlstr{"to"}\hlstd{)]}
\hlstd{subgene_gwas}\hlopt{$}\hlstd{gene}\hlkwb{<-}\hlkwd{unlist}\hlstd{(subgene_gwas}\hlopt{$}\hlstd{gene)}
\hlstd{subgene_gwas}\hlopt{$}\hlstd{start}\hlkwb{<-}\hlkwd{unlist}\hlstd{(subgene_gwas}\hlopt{$}\hlstd{start)}
\hlstd{subgene_gwas}\hlopt{$}\hlstd{end}\hlkwb{<-}\hlkwd{unlist}\hlstd{(subgene_gwas}\hlopt{$}\hlstd{end)}
\hlstd{subgene_gwas}\hlopt{$}\hlstd{from}\hlkwb{<-}\hlkwd{unlist}\hlstd{(subgene_gwas}\hlopt{$}\hlstd{from)}
\hlstd{subgene_gwas}\hlopt{$}\hlstd{to}\hlkwb{<-}\hlkwd{unlist}\hlstd{(subgene_gwas}\hlopt{$}\hlstd{to)}
\hlstd{subgene_gwas}\hlopt{$}\hlstd{molecule}\hlkwb{<-}\hlstr{"GWAS2"}
\hlcom{###########}
\hlstd{gwas_subgene}\hlkwb{<-}\hlkwd{sort}\hlstd{(fyco1tab[fyco1tab}\hlopt{$}\hlstd{gwas3}\hlopt{==}\hlnum{TRUE}\hlstd{,}\hlstr{"pos_aln"}\hlstd{])}
\hlstd{subgene_gwas3}\hlkwb{<-}\hlkwd{as.data.frame}\hlstd{(}\hlkwd{t}\hlstd{(}\hlkwd{sapply}\hlstd{(gwas_subgene,} \hlkwa{function}\hlstd{(}\hlkwc{x}\hlstd{)\{}
\hlstd{line}\hlkwb{<-}\hlkwd{c}\hlstd{(example_fyco1[example_fyco1}\hlopt{$}\hlstd{start}\hlopt{<=}\hlstd{x} \hlopt{&} \hlstd{example_fyco1}\hlopt{$}\hlstd{end}\hlopt{>}\hlstd{x,], x, x)}
\hlkwd{names}\hlstd{(line)}\hlkwb{<-}\hlkwd{c}\hlstd{(}\hlstr{"start_abs"}\hlstd{,} \hlstr{"end_abs"}\hlstd{,} \hlstr{"gene"}\hlstd{,} \hlstr{"start"}\hlstd{,} \hlstr{"end"}\hlstd{,} \hlstr{"from"}\hlstd{,} \hlstr{"to"}\hlstd{)}
\hlkwd{return}\hlstd{(line)}
\hlstd{\})))}
\hlstd{subgene_gwas3}\hlkwb{<-}\hlstd{subgene_gwas3[,}\hlkwd{c}\hlstd{(}\hlstr{"gene"}\hlstd{,} \hlstr{"start"}\hlstd{,} \hlstr{"end"}\hlstd{,} \hlstr{"from"}\hlstd{,} \hlstr{"to"}\hlstd{)]}
\hlstd{subgene_gwas3}\hlopt{$}\hlstd{gene}\hlkwb{<-}\hlkwd{unlist}\hlstd{(subgene_gwas3}\hlopt{$}\hlstd{gene)}
\hlstd{subgene_gwas3}\hlopt{$}\hlstd{start}\hlkwb{<-}\hlkwd{unlist}\hlstd{(subgene_gwas3}\hlopt{$}\hlstd{start)}
\hlstd{subgene_gwas3}\hlopt{$}\hlstd{end}\hlkwb{<-}\hlkwd{unlist}\hlstd{(subgene_gwas3}\hlopt{$}\hlstd{end)}
\hlstd{subgene_gwas3}\hlopt{$}\hlstd{from}\hlkwb{<-}\hlkwd{unlist}\hlstd{(subgene_gwas3}\hlopt{$}\hlstd{from)}
\hlstd{subgene_gwas3}\hlopt{$}\hlstd{to}\hlkwb{<-}\hlkwd{unlist}\hlstd{(subgene_gwas3}\hlopt{$}\hlstd{to)}
\hlstd{subgene_gwas3}\hlopt{$}\hlstd{molecule}\hlkwb{<-}\hlstr{"GWAS3"}
\hlcom{###########}
\hlstd{subgene_fyco1}\hlkwb{<-}\hlkwd{as.data.frame}\hlstd{(}\hlkwd{rbind}\hlstd{(subgene_PS, subgene_gwas, subgene_gwas3))}
\hlstd{g}\hlkwb{<-} \hlkwd{ggplot}\hlstd{(example_fyco1_multi,}
\hlkwd{aes}\hlstd{(}\hlkwc{xmin} \hlstd{= start,} \hlkwc{xmax} \hlstd{= end,} \hlkwc{y} \hlstd{= molecule,} \hlkwc{fill} \hlstd{=} \hlstr{"white"}\hlstd{))}
\end{alltt}
{\ttfamily\noindent\bfseries\color{errorcolor}{\#\# Error in ggplot(example\_fyco1\_multi, aes(xmin = start, xmax = end, y = molecule, : impossible de trouver la fonction "{}ggplot"{}}}\begin{alltt}
\hlstd{g} \hlkwb{<-} \hlstd{g} \hlopt{+} \hlkwd{geom_gene_arrow}\hlstd{(}\hlkwc{arrowhead_height} \hlstd{=} \hlkwd{unit}\hlstd{(}\hlnum{3}\hlstd{,} \hlstr{"mm"}\hlstd{),}
\hlkwc{arrowhead_width} \hlstd{=} \hlkwd{unit}\hlstd{(}\hlnum{0}\hlstd{,} \hlstr{"mm"}\hlstd{),} \hlkwc{fill}\hlstd{=}\hlstr{"white"}\hlstd{)}
\end{alltt}
{\ttfamily\noindent\bfseries\color{errorcolor}{\#\# Error in eval(expr, envir, enclos): objet 'g' introuvable}}\begin{alltt}
\hlstd{g} \hlkwb{<-} \hlstd{g} \hlopt{+} \hlkwd{geom_subgene_arrow}\hlstd{(}\hlkwc{data} \hlstd{= subgene_fyco1,}
\hlkwc{arrowhead_height} \hlstd{=} \hlkwd{unit}\hlstd{(}\hlnum{3}\hlstd{,} \hlstr{"mm"}\hlstd{),} \hlkwc{arrowhead_width} \hlstd{=} \hlkwd{unit}\hlstd{(}\hlnum{0}\hlstd{,} \hlstr{"mm"}\hlstd{),}
\hlkwd{aes}\hlstd{(}\hlkwc{xmin} \hlstd{= start,} \hlkwc{xmax} \hlstd{= end,} \hlkwc{y} \hlstd{= molecule,} \hlkwc{fill} \hlstd{=} \hlstr{"white"}\hlstd{,}
\hlkwc{xsubmin} \hlstd{= from,} \hlkwc{xsubmax} \hlstd{= to),} \hlkwc{color}\hlstd{=}\hlstr{"red"}\hlstd{,} \hlkwc{alpha}\hlstd{=}\hlnum{1}\hlstd{)}
\end{alltt}
{\ttfamily\noindent\bfseries\color{errorcolor}{\#\# Error in eval(expr, envir, enclos): objet 'g' introuvable}}\begin{alltt}
\hlstd{g} \hlkwb{<-} \hlstd{g} \hlopt{+} \hlkwd{theme_genes}\hlstd{()}
\end{alltt}
{\ttfamily\noindent\bfseries\color{errorcolor}{\#\# Error in eval(expr, envir, enclos): objet 'g' introuvable}}\begin{alltt}
\hlstd{g}
\end{alltt}
{\ttfamily\noindent\bfseries\color{errorcolor}{\#\# Error in eval(expr, envir, enclos): objet 'g' introuvable}}\end{kframe}
\end{knitrout}
\section{Do position match between analysis?}
\begin{knitrout}
\definecolor{shadecolor}{rgb}{0.969, 0.969, 0.969}\color{fgcolor}\begin{kframe}
\begin{alltt}
\hlstd{fyco1tab[}\hlkwd{is.na}\hlstd{(fyco1tab}\hlopt{$}\hlstd{pos_aln)}\hlopt{==}\hlnum{FALSE} \hlopt{&}
\hlstd{(fyco1tab}\hlopt{$}\hlstd{PS}\hlopt{==}\hlnum{TRUE} \hlopt{|} \hlstd{fyco1tab}\hlopt{$}\hlstd{gwas2}\hlopt{==}\hlnum{TRUE} \hlopt{|} \hlstd{fyco1tab}\hlopt{$}\hlstd{gwas3}\hlopt{==}\hlnum{TRUE}\hlstd{),]}
\end{alltt}
\begin{verbatim}
## posref gwas1 gwas2 gwas3 pos_aln PS
## 48433 45966331 FALSE FALSE TRUE 3009 FALSE
## 48435 45966333 FALSE FALSE TRUE 3007 FALSE
## 48652 45966550 FALSE FALSE FALSE 2790 TRUE
## 48697 45966595 FALSE FALSE TRUE 2745 FALSE
## 49780 45967678 FALSE FALSE FALSE 1659 TRUE
## 50023 45967921 FALSE FALSE FALSE 1416 TRUE
## 50095 45967993 FALSE FALSE FALSE 1344 TRUE
## 50097 45967995 FALSE FALSE TRUE 1342 FALSE
## 50145 45968043 FALSE TRUE FALSE 1294 FALSE
## 50617 45968515 FALSE FALSE TRUE 819 FALSE
\end{verbatim}
\begin{alltt}
\hlstd{gwastab[gwastab}\hlopt{$}\hlstd{all_inv_var_meta_p}\hlopt{<}\hlstd{thres} \hlopt{&} \hlstd{gwastab}\hlopt{$}\hlstd{POS} \hlopt{%in%} \hlkwd{c}\hlstd{(}\hlnum{45966331}\hlstd{,} \hlnum{45966333}\hlstd{,} \hlnum{45966595}\hlstd{,} \hlnum{45967995}\hlstd{,} \hlnum{45968515}\hlstd{),]}
\end{alltt}
\begin{verbatim}
## CHR POS REF ALT SNP all_meta_N all_inv_var_meta_beta all_inv_var_meta_sebeta
## 1428785 3 45966331 G T 3:45966331:G:T 23 0.38699 0.027678
## 1428786 3 45966333 T C 3:45966333:T:C 23 0.37298 0.027352
## 1428788 3 45966595 G A 3:45966595:G:A 23 0.38420 0.027682
## 1428791 3 45967995 G A 3:45967995:G:A 23 0.37943 0.027617
## 1428794 3 45968515 T C 3:45968515:T:C 23 0.37288 0.027339
## all_inv_var_meta_p all_inv_var_meta_cases\ttotal all_inv_var_meta_controls
## 1428785 2.0106e-44 8475 999425
## 1428786 2.4293e-42 8475 999425
## 1428788 8.4910e-44 8475 999425
## 1428791 5.9274e-43 8475 999425
## 1428794 2.3510e-42 8475 999425
## all_inv_var_meta_effective all_inv_var_het_p all_meta_AF rsid
## 1428785 7110 0.00081500 0.1221 rs13079478
## 1428786 7110 0.00107150 0.1230 rs13059238
## 1428788 7110 0.00155590 0.1222 rs13079869
## 1428791 7110 0.00039328 0.1224 rs33910087
## 1428794 7110 0.00111540 0.1230 rs13071283
\end{verbatim}
\end{kframe}
\end{knitrout}
\subsection{Amino-acid change}
\textbf{Position 273}
3-45968515-T-C : https://gnomad.broadinstitute.org/variant/3-45968515-T-C?dataset=gnomad\_r3
Gln273Gln
dbSNP (rs13071283)
\textbf{Position 447}
3-45967995-G-A : https://gnomad.broadinstitute.org/variant/3-45967995-G-A?dataset=gnomad\_r3
Arg447Cys
dbSNP (rs33910087)
\textbf{Position 915}
3-45966595-G-A : https://gnomad.broadinstitute.org/variant/3-45966595-G-A?dataset=gnomad\_r3
Cys913Cys
dbSNP (rs13079869)
\textbf{Position 1002}
3-45966333-T-C : https://gnomad.broadinstitute.org/variant/3-45966333-T-C?dataset=gnomad\_r3
Asn1001Asp
dbSNP (rs13059238)
\textbf{Position 1003}
3-45966331-G-T : https://gnomad.broadinstitute.org/variant/3-45966331-G-T?dataset=gnomad\_r3
Asn1001Lys
dbSNP (rs13079478)
\begin{knitrout}
\definecolor{shadecolor}{rgb}{0.969, 0.969, 0.969}\color{fgcolor}\begin{kframe}
\begin{alltt}
\hlstd{fyco1tab[fyco1tab}\hlopt{$}\hlstd{posref}\hlopt{==}\hlnum{45966556}\hlstd{,]}
\end{alltt}
\begin{verbatim}
## posref gwas1 gwas2 gwas3 pos_aln PS
## 48658 45966556 FALSE FALSE FALSE 2784 FALSE
\end{verbatim}
\begin{alltt}
\hlkwd{dim}\hlstd{(gwastab)}
\end{alltt}
\begin{verbatim}
## [1] 214 15
\end{verbatim}
\begin{alltt}
\hlstd{gwastab}\hlkwb{<-}\hlkwd{read.table}\hlstd{(gwas3,} \hlkwc{h}\hlstd{=F,} \hlkwc{sep}\hlstd{=}\hlstr{"\textbackslash{}t"}\hlstd{)}
\hlkwd{names}\hlstd{(gwastab)}\hlkwb{<-}\hlkwd{c}\hlstd{(}\hlstr{"CHR"}\hlstd{,} \hlstr{"POS"}\hlstd{,} \hlstr{"REF"}\hlstd{,} \hlstr{"ALT"}\hlstd{,} \hlstr{"SNP"}\hlstd{,} \hlstr{"all_meta_N"}\hlstd{,}
\hlstr{"all_inv_var_meta_beta"}\hlstd{,} \hlstr{"all_inv_var_meta_sebeta"}\hlstd{,}
\hlstr{"all_inv_var_meta_p"}\hlstd{,} \hlstr{"all_inv_var_meta_cases total"}\hlstd{,}
\hlstr{"all_inv_var_meta_controls"}\hlstd{,} \hlstr{"all_inv_var_meta_effective"}\hlstd{,}
\hlstr{"all_inv_var_het_p"}\hlstd{,} \hlstr{"all_meta_AF"}\hlstd{,} \hlstr{"rsid"}\hlstd{)}
\hlstd{gwastab}\hlkwb{<-}\hlstd{gwastab[(gwastab}\hlopt{$}\hlstd{CHR}\hlopt{==}\hlnum{3} \hlopt{&} \hlstd{gwastab}\hlopt{$}\hlstd{POS}\hlopt{>}\hlstd{deb} \hlopt{&} \hlstd{gwastab}\hlopt{$}\hlstd{POS}\hlopt{<}\hlstd{end),]}
\hlstd{gwastab[gwastab}\hlopt{$}\hlstd{POS} \hlopt{%in%} \hlkwd{c}\hlstd{(}\hlnum{45966556}\hlstd{),]}
\end{alltt}
\begin{verbatim}
## [1] CHR POS REF
## [4] ALT SNP all_meta_N
## [7] all_inv_var_meta_beta all_inv_var_meta_sebeta all_inv_var_meta_p
## [10] all_inv_var_meta_cases\ttotal all_inv_var_meta_controls all_inv_var_meta_effective
## [13] all_inv_var_het_p all_meta_AF rsid
## <0 rows> (or 0-length row.names)
\end{verbatim}
\begin{alltt}
\hlstd{gwastab[(gwastab}\hlopt{$}\hlstd{POS}\hlopt{>}\hlnum{45966500} \hlopt{&} \hlstd{gwastab}\hlopt{$}\hlstd{POS}\hlopt{<}\hlnum{45966600}\hlstd{),]}
\end{alltt}
\begin{verbatim}
## CHR POS REF ALT SNP all_meta_N all_inv_var_meta_beta all_inv_var_meta_sebeta
## 1428788 3 45966595 G A 3:45966595:G:A 23 0.3842 0.027682
## all_inv_var_meta_p all_inv_var_meta_cases\ttotal all_inv_var_meta_controls
## 1428788 8.491e-44 8475 999425
## all_inv_var_meta_effective all_inv_var_het_p all_meta_AF rsid
## 1428788 7110 0.0015559 0.1222 rs13079869
\end{verbatim}
\end{kframe}
\end{knitrout}
\end{document}
File deleted
\documentclass[11pt, oneside]{article}\usepackage[]{graphicx}\usepackage[]{color}
% maxwidth is the original width if it is less than linewidth
% otherwise use linewidth (to make sure the graphics do not exceed the margin)
\makeatletter
\def\maxwidth{ %
\ifdim\Gin@nat@width>\linewidth
\linewidth
\else
\Gin@nat@width
\fi
}
\makeatother
\definecolor{fgcolor}{rgb}{0.345, 0.345, 0.345}
\newcommand{\hlnum}[1]{\textcolor[rgb]{0.686,0.059,0.569}{#1}}%
\newcommand{\hlstr}[1]{\textcolor[rgb]{0.192,0.494,0.8}{#1}}%
\newcommand{\hlcom}[1]{\textcolor[rgb]{0.678,0.584,0.686}{\textit{#1}}}%
\newcommand{\hlopt}[1]{\textcolor[rgb]{0,0,0}{#1}}%
\newcommand{\hlstd}[1]{\textcolor[rgb]{0.345,0.345,0.345}{#1}}%
\newcommand{\hlkwa}[1]{\textcolor[rgb]{0.161,0.373,0.58}{\textbf{#1}}}%
\newcommand{\hlkwb}[1]{\textcolor[rgb]{0.69,0.353,0.396}{#1}}%
\newcommand{\hlkwc}[1]{\textcolor[rgb]{0.333,0.667,0.333}{#1}}%
\newcommand{\hlkwd}[1]{\textcolor[rgb]{0.737,0.353,0.396}{\textbf{#1}}}%
\let\hlipl\hlkwb
\usepackage{framed}
\makeatletter
\newenvironment{kframe}{%
\def\at@end@of@kframe{}%
\ifinner\ifhmode%
\def\at@end@of@kframe{\end{minipage}}%
\begin{minipage}{\columnwidth}%
\fi\fi%
\def\FrameCommand##1{\hskip\@totalleftmargin \hskip-\fboxsep
\colorbox{shadecolor}{##1}\hskip-\fboxsep
% There is no \\@totalrightmargin, so:
\hskip-\linewidth \hskip-\@totalleftmargin \hskip\columnwidth}%
\MakeFramed {\advance\hsize-\width
\@totalleftmargin\z@ \linewidth\hsize
\@setminipage}}%
{\par\unskip\endMakeFramed%
\at@end@of@kframe}
\makeatother
\definecolor{shadecolor}{rgb}{.97, .97, .97}
\definecolor{messagecolor}{rgb}{0, 0, 0}
\definecolor{warningcolor}{rgb}{1, 0, 1}
\definecolor{errorcolor}{rgb}{1, 0, 0}
\newenvironment{knitrout}{}{} % an empty environment to be redefined in TeX
\usepackage{alltt} % use "amsart" instead of "article" for AMSLaTeX format
%\usepackage{geometry} % See geometry.pdf to learn the layout options. There are lots.
%\geometry{letterpaper} % ... or a4paper or a5paper or ...
%\geometry{landscape} % Activate for for rotated page geometry
%\usepackage[parfill]{parskip} % Activate to begin paragraphs with an empty line rather than an indent
%\usepackage{graphicx} % Use pdf, png, jpg, or eps with pdflatex; use eps in DVI mode
% TeX will automatically convert eps --> pdf in pdflatex
%\usepackage{amssymb}
\usepackage[utf8]{inputenc}
%\usepackage[cyr]{aeguill}
%\usepackage[francais]{babel}
%\usepackage{hyperref}
\title{Positive selection on genes interacting with SARS-Cov2, comparison of different analysis, maic}
\author{Marie Cariou}
\date{March 2021} % Activate to display a given date or no date
\IfFileExists{upquote.sty}{\usepackage{upquote}}{}
\begin{document}
\maketitle
\tableofcontents
\newpage
\section{Data}
output from covid\_comp\_dataset.
\begin{knitrout}
\definecolor{shadecolor}{rgb}{0.969, 0.969, 0.969}\color{fgcolor}\begin{kframe}
\begin{alltt}
\hlstd{tablo}\hlkwb{<-}\hlkwd{read.table}\hlstd{(}\hlstr{"primatesVbats.csv"}\hlstd{,}
\hlkwc{h}\hlstd{=T,} \hlkwc{sep}\hlstd{=}\hlstr{","}\hlstd{)}
\end{alltt}
\end{kframe}
\end{knitrout}
Output MAIC formatted by Léa. This table includes the DGINN "score".
\begin{knitrout}
\definecolor{shadecolor}{rgb}{0.969, 0.969, 0.969}\color{fgcolor}\begin{kframe}
\begin{alltt}
\hlstd{home}\hlkwb{<-}\hlstr{"/home/adminmarie/Documents/"}
\hlstd{workdir}\hlkwb{<-}\hlkwd{paste0}\hlstd{(home,} \hlstr{"CIRI_BIBS_projects/2020_05_Etienne_covid/"}\hlstd{)}
\hlstd{maic}\hlkwb{<-}\hlkwd{read.table}\hlstd{(}\hlkwd{paste0}\hlstd{(workdir,} \hlstr{"data/covid_comp_maic.txt"}\hlstd{),}
\hlkwc{h}\hlstd{=T)}
\end{alltt}
\end{kframe}
\end{knitrout}
\section{MAIC}
\subsection{Boxplot}
\begin{knitrout}
\definecolor{shadecolor}{rgb}{0.969, 0.969, 0.969}\color{fgcolor}\begin{kframe}
\begin{alltt}
\hlkwd{par}\hlstd{(}\hlkwc{mfrow}\hlstd{=}\hlkwd{c}\hlstd{(}\hlnum{2}\hlstd{,}\hlnum{1}\hlstd{))}
\hlkwd{boxplot}\hlstd{(maic}\hlopt{$}\hlstd{rank}\hlopt{~}\hlstd{maic}\hlopt{$}\hlstd{nbats,} \hlkwc{notch}\hlstd{=}\hlnum{TRUE}\hlstd{,} \hlkwc{varwidth}\hlstd{=}\hlnum{TRUE}\hlstd{,} \hlkwc{xlab}\hlstd{=}\hlstr{"score DGINN"}\hlstd{,} \hlkwc{ylab}\hlstd{=}\hlstr{"rank MAIC"}\hlstd{,} \hlkwc{main}\hlstd{=}\hlstr{"Bats"}\hlstd{)}
\end{alltt}
{\ttfamily\noindent\color{warningcolor}{\#\# Warning in bxp(list(stats = structure(c(21, 825, 1664, 2860, 5392, 15, 625.5, : some notches went outside hinges ('box'): maybe set notch=FALSE}}\begin{alltt}
\hlkwd{stripchart}\hlstd{(maic}\hlopt{$}\hlstd{rank}\hlopt{~}\hlstd{maic}\hlopt{$}\hlstd{nbats,} \hlkwc{method}\hlstd{=}\hlstr{"jitter"}\hlstd{,} \hlkwc{vertical}\hlstd{=}\hlnum{TRUE}\hlstd{,} \hlkwc{pch}\hlstd{=}\hlnum{1}\hlstd{,} \hlkwc{cex}\hlstd{=}\hlnum{0.3}\hlstd{,} \hlkwc{add}\hlstd{=}\hlnum{TRUE}\hlstd{)}
\hlkwd{boxplot}\hlstd{(maic}\hlopt{$}\hlstd{rank}\hlopt{~}\hlstd{maic}\hlopt{$}\hlstd{nprimates,} \hlkwc{notch}\hlstd{=}\hlnum{TRUE}\hlstd{,} \hlkwc{xlab}\hlstd{=}\hlstr{"score DGINN"}\hlstd{,} \hlkwc{ylab}\hlstd{=}\hlstr{"rank MAIC"}\hlstd{,} \hlkwc{main}\hlstd{=}\hlstr{"Primates"}\hlstd{)}
\hlkwd{stripchart}\hlstd{(maic}\hlopt{$}\hlstd{rank}\hlopt{~}\hlstd{maic}\hlopt{$}\hlstd{nprimates,} \hlkwc{method}\hlstd{=}\hlstr{"jitter"}\hlstd{,} \hlkwc{vertical}\hlstd{=}\hlnum{TRUE}\hlstd{,} \hlkwc{pch}\hlstd{=}\hlnum{1}\hlstd{,} \hlkwc{cex}\hlstd{=}\hlnum{0.3}\hlstd{,} \hlkwc{add}\hlstd{=}\hlnum{TRUE}\hlstd{)}
\end{alltt}
\end{kframe}
\includegraphics[width=\maxwidth]{figure/boxplot-1}
\end{knitrout}
\subsection{Dotchart}
\begin{knitrout}
\definecolor{shadecolor}{rgb}{0.969, 0.969, 0.969}\color{fgcolor}\begin{kframe}
\begin{alltt}
\hlstd{tmp}\hlkwb{<-}\hlstd{maic[maic}\hlopt{$}\hlstd{nbats}\hlopt{>=}\hlnum{3}\hlstd{,} \hlkwd{c}\hlstd{(}\hlstr{"gene"}\hlstd{,} \hlstr{"rank"}\hlstd{,} \hlstr{"nbats"}\hlstd{)]}
\hlstd{tmp}\hlkwb{<-}\hlstd{tmp[}\hlkwd{order}\hlstd{(tmp}\hlopt{$}\hlstd{rank,} \hlkwc{decreasing} \hlstd{=} \hlnum{TRUE}\hlstd{),]}
\hlstd{tmp}\hlopt{$}\hlstd{col}\hlkwb{<-}\hlstr{"black"}
\hlstd{tmp}\hlopt{$}\hlstd{col[tmp}\hlopt{$}\hlstd{gene}\hlopt{==}\hlstr{"ACE2"}\hlstd{]}\hlkwb{<-}\hlstr{"red"}
\hlstd{tmp}\hlopt{$}\hlstd{col[tmp}\hlopt{$}\hlstd{gene}\hlopt{==}\hlstr{"TMPRSS2"}\hlstd{]}\hlkwb{<-}\hlstr{"red"}
\hlstd{tmp}\hlopt{$}\hlstd{pch[tmp}\hlopt{$}\hlstd{nbats}\hlopt{==}\hlnum{5}\hlstd{]}\hlkwb{<-}\hlnum{1}
\hlstd{tmp}\hlopt{$}\hlstd{pch[tmp}\hlopt{$}\hlstd{nbats}\hlopt{==}\hlnum{4}\hlstd{]}\hlkwb{<-}\hlnum{20}
\hlstd{tmp}\hlopt{$}\hlstd{pch[tmp}\hlopt{$}\hlstd{nbats}\hlopt{==}\hlnum{3}\hlstd{]}\hlkwb{<-}\hlnum{4}
\hlkwd{dotchart}\hlstd{(tmp}\hlopt{$}\hlstd{rank,} \hlkwc{main}\hlstd{=}\hlstr{"Bats DGINN >=3"}\hlstd{,} \hlkwc{xlab}\hlstd{=}\hlstr{"rank MAIC"}\hlstd{,} \hlkwc{labels}\hlstd{=tmp}\hlopt{$}\hlstd{gene,} \hlkwc{pch}\hlstd{=tmp}\hlopt{$}\hlstd{pch,} \hlkwc{col}\hlstd{=tmp}\hlopt{$}\hlstd{col)}
\hlkwd{legend}\hlstd{(}\hlstr{"topright"}\hlstd{,} \hlkwd{c}\hlstd{(}\hlstr{"5 (score DGINN)"}\hlstd{,} \hlstr{"4"}\hlstd{,} \hlstr{"3"}\hlstd{),} \hlkwc{pch}\hlstd{=}\hlkwd{c}\hlstd{(}\hlnum{1}\hlstd{,}\hlnum{20}\hlstd{,}\hlnum{4}\hlstd{))}
\end{alltt}
\end{kframe}
\includegraphics[width=\maxwidth]{figure/dotbats-1}
\end{knitrout}
\begin{knitrout}
\definecolor{shadecolor}{rgb}{0.969, 0.969, 0.969}\color{fgcolor}\begin{kframe}
\begin{alltt}
\hlstd{tmp}\hlkwb{<-}\hlstd{maic[maic}\hlopt{$}\hlstd{nprimates}\hlopt{>=}\hlnum{3}\hlstd{,} \hlkwd{c}\hlstd{(}\hlstr{"gene"}\hlstd{,} \hlstr{"rank"}\hlstd{,} \hlstr{"nprimates"}\hlstd{)]}
\hlstd{tmp}\hlkwb{<-}\hlstd{tmp[}\hlkwd{order}\hlstd{(tmp}\hlopt{$}\hlstd{rank,} \hlkwc{decreasing} \hlstd{=} \hlnum{TRUE}\hlstd{),]}
\hlstd{tmp}\hlopt{$}\hlstd{pch[tmp}\hlopt{$}\hlstd{nprimates}\hlopt{==}\hlnum{5}\hlstd{]}\hlkwb{<-}\hlnum{1}
\hlstd{tmp}\hlopt{$}\hlstd{pch[tmp}\hlopt{$}\hlstd{nprimates}\hlopt{==}\hlnum{4}\hlstd{]}\hlkwb{<-}\hlnum{20}
\hlstd{tmp}\hlopt{$}\hlstd{pch[tmp}\hlopt{$}\hlstd{nprimates}\hlopt{==}\hlnum{3}\hlstd{]}\hlkwb{<-}\hlnum{4}
\hlstd{tmp}\hlopt{$}\hlstd{col}\hlkwb{<-}\hlstr{"black"}
\hlstd{tmp}\hlopt{$}\hlstd{col[tmp}\hlopt{$}\hlstd{gene}\hlopt{==}\hlstr{"ACE2"}\hlstd{]}\hlkwb{<-}\hlstr{"red"}
\hlstd{tmp}\hlopt{$}\hlstd{col[tmp}\hlopt{$}\hlstd{gene}\hlopt{==}\hlstr{"TMPRSS2"}\hlstd{]}\hlkwb{<-}\hlstr{"red"}
\hlkwd{dotchart}\hlstd{(tmp}\hlopt{$}\hlstd{rank,} \hlkwc{main}\hlstd{=}\hlstr{"Primates DGINN >=3"}\hlstd{,} \hlkwc{xlab}\hlstd{=}\hlstr{"rank MAIC"}\hlstd{,} \hlkwc{labels}\hlstd{=tmp}\hlopt{$}\hlstd{gene,} \hlkwc{pch}\hlstd{=tmp}\hlopt{$}\hlstd{pch,} \hlkwc{cex}\hlstd{=}\hlnum{0.8}\hlstd{,} \hlkwc{col}\hlstd{=tmp}\hlopt{$}\hlstd{col)}
\hlkwd{legend}\hlstd{(}\hlstr{"topright"}\hlstd{,} \hlkwd{c}\hlstd{(}\hlstr{"5 (score DGINN)"}\hlstd{,} \hlstr{"4"}\hlstd{,} \hlstr{"3"}\hlstd{),} \hlkwc{pch}\hlstd{=}\hlkwd{c}\hlstd{(}\hlnum{1}\hlstd{,}\hlnum{20}\hlstd{,}\hlnum{4}\hlstd{))}
\end{alltt}
\end{kframe}
\includegraphics[width=\maxwidth]{figure/dotprimates-1}
\end{knitrout}
\section{Pan Corona}
\begin{knitrout}
\definecolor{shadecolor}{rgb}{0.969, 0.969, 0.969}\color{fgcolor}\begin{kframe}
\begin{alltt}
\hlstd{pancorona}\hlkwb{<-}\hlkwd{read.table}\hlstd{(}\hlkwd{paste0}\hlstd{(workdir,} \hlstr{"data/pancorona_S5.csv"}\hlstd{),}
\hlkwc{h}\hlstd{=T,} \hlkwc{fill} \hlstd{=} \hlnum{TRUE}\hlstd{,} \hlkwc{sep}\hlstd{=}\hlstr{"\textbackslash{}t"}\hlstd{)}
\hlkwd{names}\hlstd{(pancorona)}\hlkwb{<-}\hlkwd{c}\hlstd{(}\hlstr{"tmp.Gene.name"}\hlstd{,} \hlkwd{names}\hlstd{(pancorona)[}\hlopt{-}\hlnum{1}\hlstd{])}
\hlcom{# Genes en commun}
\hlstd{pancorona}\hlopt{$}\hlstd{tmp.Gene.name[pancorona}\hlopt{$}\hlstd{tmp.Gene.name} \hlopt{%in%} \hlstd{tablo}\hlopt{$}\hlstd{tmp.Gene.name]}
\end{alltt}
\begin{verbatim}
## [1] TBK1 MARK3 GIGYF2 MARK2 G3BP1 LARP1 ACE2 PABPC1
## [9] TMPRSS2 AP3B1 CLCC1 CSDE1 HECTD1 MARK1 MEPCE PDE4DIP
## [17] POR PRKAR2B RAB5C RTN4 SRP54 UBAP2 UBAP2L UBXN8
## [25] SPART BZW2 EIF4E2 SMOC1 STOML2 DDX21 FAM98A G3BP2
## [33] MOV10 PABPC4 UPF1
## 105 Levels: ACE2 ANPEP AP3B1 ATXN2L BTF3 BZW2 CKAP5 CLCC1 ... YTHDF2
\end{verbatim}
\begin{alltt}
\hlcom{# Uniquement dans le tableau pancorona}
\hlkwd{sort}\hlstd{(pancorona}\hlopt{$}\hlstd{tmp.Gene.name[(pancorona}\hlopt{$}\hlstd{tmp.Gene.name} \hlopt{%in%} \hlstd{tablo}\hlopt{$}\hlstd{tmp.Gene.name)}\hlopt{==}\hlnum{FALSE}\hlstd{])}
\end{alltt}
\begin{verbatim}
## [1] ANPEP ATXN2L BTF3 CKAP5 CTSB CTSL
## [7] CYB5R3 DDX1 DDX5 DDX58 DHX9 DNM1L
## [13] EEF1A1 EIF2A EIF3F EIF4B EZR FLNA
## [19] FURIN FUS GSK3A GSK3B HDLBP HNRNPA1
## [25] HNRNPD HNRNPF HNRNPU IFIH1 IGF2BP1 IKBKB
## [31] IKBKE IRF3 ISG15 KPNA3 KPNB1 MYH9
## [37] NCL POLD1 POLR2B PRKRA RBM14 RCHY1
## [43] RPL13A RPL26 RPS13 RPS17 RPS19 RPS9
## [49] SDCBP SERBP1 SGTA SLC1A5 SNAP47 SSB
## [55] STING1 SYNCRIP TANC1 TBCB TMPRSS11D TRAF3
## [61] TUBA4A TUBB2A TUBB4A TUBB6 USP10 VPS36
## [67] XRCC5 XRCC6 YBX1 YTHDF2
## 105 Levels: ACE2 ANPEP AP3B1 ATXN2L BTF3 BZW2 CKAP5 CLCC1 ... YTHDF2
\end{verbatim}
\begin{alltt}
\hlcom{## Uniquement dans tableau }
\hlkwd{sort}\hlstd{(tablo}\hlopt{$}\hlstd{tmp.Gene.name[(tablo}\hlopt{$}\hlstd{tmp.Gene.name} \hlopt{%in%} \hlstd{pancorona}\hlopt{$}\hlstd{tmp.Gene.name)}\hlopt{==}\hlnum{FALSE}\hlstd{])}
\end{alltt}
\begin{verbatim}
## [1] AAR2 AASS AATF ABCC1 ACAD9 ACADM
## [7] ACSL3 ADAM9 ADAMTS1 AGPS AKAP8 AKAP8L
## [13] AKAP9 ALG11 ALG5 ALG8 ANO6 AP2A2
## [19] AP2M1 ARF6 ATE1 ATP13A3 ATP1B1 ATP6AP1
## [25] ATP6V1A BAG5 BCKDK BRD2 BRD4 CCDC86
## [31] CDK5RAP2 CENPF CEP112 CEP135 CEP250 CEP350
## [37] CEP68 CHMP2A CHPF CHPF2 CISD3 CIT
## [43] CLIP4 CNTRL COL6A1 COLGALT1 COMT COQ8B
## [49] CRTC3 CSNK2A2 CSNK2B CUL2 CWC27 CYB5B
## [55] DCAF7 DCAKD DCTPP1 DDX10 DNAJC11 DNAJC19
## [61] DNMT1 DPH5 DPY19L1 ECSIT EDEM3 EIF4H
## [67] ELOC EMC1 ERC1 ERGIC1 ERLEC1 ERMP1
## [73] ERO1B ERP44 ETFA EXOSC2 EXOSC3 EXOSC5
## [79] EXOSC8 F2RL1 FAM162A FAM8A1 FAR2 FASTKD5
## [85] FBLN5 FBN1 FBN2 FBXL12 FKBP10 FKBP15
## [91] FKBP7 FOXRED2 FYCO1 GCC1 GCC2 GDF15
## [97] GFER GGCX GGH GHITM GLA GNB1
## [103] GNG5 GOLGA2 GOLGA3 GOLGA7 GOLGB1 GORASP1
## [109] GPAA1 GPX1 GRIPAP1 GRPEL1 GTF2F2 HDAC2
## [115] HEATR3 HMOX1 HOOK1 HS2ST1 HS6ST2 HSBP1
## [121] HYOU1 IDE IL17RA IMPDH2 INHBE INTS4
## [127] ITGB1 JAKMIP1 KDELC1 KDELC2 LARP4B LARP7
## [133] LMAN2 LOX MAP7D1 MARC1 MAT2B MDN1
## [139] MIB1 MIPOL1 MOGS MPHOSPH10 MRPS2 MRPS25
## [145] MRPS27 MRPS5 MTCH1 MYCBP2 NARS2 NAT14
## [151] NDFIP2 NDUFAF1 NDUFAF2 NDUFB9 NEK9 NEU1
## [157] NGDN NGLY1 NIN NINL NLRX1 NOL10
## [163] NPC2 NPTX1 NSD2 NUP210 NUP214 NUP54
## [169] NUP58 NUP62 NUP88 NUP98 NUTF2 OS9
## [175] PCNT PCSK5 PCSK6 PDZD11 PIGO PIGS
## [181] PITRM1 PKP2 PLAT PLD3 PLEKHA5 PLEKHF2
## [187] PLOD2 PMPCA PMPCB POFUT1 POLA1 POLA2
## [193] PPIL3 PPT1 PRIM1 PRIM2 PRKACA PRKAR2A
## [199] PRRC2B PSMD8 PTBP2 PTGES2 PUSL1 PVR
## [205] QSOX2 RAB10 RAB14 RAB18 RAB1A RAB2A
## [211] RAB7A RAB8A RAE1 RALA RAP1GDS1 RBM28
## [217] RBM41 RBX1 RDX REEP5 REEP6 RETREG3
## [223] RHOA RIPK1 RNF41 RPL36 RRP9 SAAL1
## [229] SBNO1 SCAP SCARB1 SCCPDH SDF2 SELENOS
## [235] SEPSECS SIL1 SIRT5 SLC25A21 SLC27A2 SLC30A6
## [241] SLC30A7 SLC30A9 SLC44A2 SLC9A3R1 SLU7 SNIP1
## [247] SRP19 SRP72 STC2 STOM SUN2 TAPT1
## [253] TARS2 TBCA TBKBP1 TCF12 THTPA TIMM10
## [259] TIMM10B TIMM29 TIMM8B TIMM9 TLE1 TLE3
## [265] TM2D3 TMED5 TMEM39B TMEM97 TOMM70 TOR1A
## [271] TOR1AIP1 TRIM59 TRMT1 TUBGCP2 TUBGCP3 TYSND1
## [277] UGGT2 USP54 VPS11 VPS39 WASHC4 WFS1
## [283] YIF1A ZC3H18 ZC3H7A ZDHHC5 ZNF318 ZNF503
## [289] ZYG11B
## 324 Levels: AAR2 AASS AATF ABCC1 ACAD9 ACADM ACE2 ACSL3 ... ZYG11B
\end{verbatim}
\end{kframe}
\end{knitrout}
\begin{knitrout}
\definecolor{shadecolor}{rgb}{0.969, 0.969, 0.969}\color{fgcolor}\begin{kframe}
\begin{alltt}
\hlstd{pancorona}\hlkwb{<-}\hlstd{pancorona[,}\hlkwd{c}\hlstd{(}\hlstr{"tmp.Gene.name"}\hlstd{,} \hlstr{"TOTAL"}\hlstd{)]}
\hlstd{pandginn}\hlkwb{<-}\hlkwd{na.omit}\hlstd{(}\hlkwd{merge}\hlstd{(pancorona, tablo,} \hlkwc{by}\hlstd{=}\hlstr{"tmp.Gene.name"}\hlstd{,} \hlkwc{all.x}\hlstd{=}\hlnum{TRUE}\hlstd{))}
\hlstd{pandginn}\hlkwb{<-}\hlstd{pandginn[}\hlkwd{order}\hlstd{(pandginn}\hlopt{$}\hlstd{nprimates),]}
\hlstd{pandginn}\hlkwb{<-}\hlstd{pandginn[}\hlkwd{order}\hlstd{(pandginn}\hlopt{$}\hlstd{TOTAL),]}
\hlkwd{dotchart}\hlstd{(}\hlkwd{as.matrix}\hlstd{(pandginn[,}\hlnum{2}\hlstd{]),} \hlkwc{labels} \hlstd{= pandginn}\hlopt{$}\hlstd{tmp.Gene.name,} \hlkwc{xlim}\hlstd{=}\hlkwd{c}\hlstd{(}\hlnum{0}\hlstd{,}\hlnum{5}\hlstd{))}
\hlkwd{points}\hlstd{(pandginn[,}\hlnum{4}\hlstd{],} \hlnum{1}\hlopt{:}\hlkwd{nrow}\hlstd{(pandginn),} \hlkwc{col}\hlstd{=}\hlstr{"blue"}\hlstd{,} \hlkwc{pch}\hlstd{=}\hlnum{20}\hlstd{,} \hlkwc{cex}\hlstd{=}\hlnum{0.7}\hlstd{)}
\hlkwd{points}\hlstd{(pandginn[,}\hlnum{3}\hlstd{],} \hlnum{1}\hlopt{:}\hlkwd{nrow}\hlstd{(pandginn),} \hlkwc{col}\hlstd{=}\hlstr{"blue"}\hlstd{,} \hlkwc{pch}\hlstd{=}\hlnum{4}\hlstd{)}
\hlkwd{legend}\hlstd{(}\hlstr{"bottomright"}\hlstd{,} \hlkwd{c}\hlstd{(}\hlstr{"pancorona score"}\hlstd{,} \hlstr{"dginn primate score"}\hlstd{,} \hlstr{"dginn bats score"}\hlstd{),} \hlkwc{pch}\hlstd{=}\hlkwd{c}\hlstd{(}\hlnum{1}\hlstd{,}\hlnum{20}\hlstd{,}\hlnum{4}\hlstd{),} \hlkwc{col}\hlstd{=}\hlkwd{c}\hlstd{(}\hlstr{"black"}\hlstd{,} \hlstr{"blue"}\hlstd{,} \hlstr{"blue"}\hlstd{))}
\end{alltt}
\end{kframe}
\includegraphics[width=\maxwidth]{figure/pancorona-1}
\end{knitrout}
\end{document}
\documentclass[11pt, oneside]{article} % use "amsart" instead of "article" for AMSLaTeX format
%\usepackage{geometry} % See geometry.pdf to learn the layout options. There are lots.
%\geometry{letterpaper} % ... or a4paper or a5paper or ...
%\geometry{landscape} % Activate for for rotated page geometry
%\usepackage[parfill]{parskip} % Activate to begin paragraphs with an empty line rather than an indent
%\usepackage{graphicx} % Use pdf, png, jpg, or eps with pdflatex; use eps in DVI mode
% TeX will automatically convert eps --> pdf in pdflatex
%\usepackage{amssymb}
\usepackage[utf8]{inputenc}
%\usepackage[cyr]{aeguill}
%\usepackage[francais]{babel}
%\usepackage{hyperref}
\title{Positive selection on genes interacting with SARS-Cov2, comparison of different analysis}
\author{Marie Cariou}
\date{March 2021} % Activate to display a given date or no date
\begin{document}
\maketitle
\tableofcontents
\newpage
\section{Files manipulations}
\subsection{Complete table}
<<>>=
home<-"/home/adminmarie/Documents/"
workdir<-paste0(home, "CIRI_BIBS_projects/2020_05_Etienne_covid/")
tab<-read.delim(paste0(workdir,
"covid_comp/covid_comp_complete.txt"), h=T, sep="\t")
dim(tab)
tab$Gene.name<-as.character(tab$Gene.name.x)
tab$Gene.name[tab$PreyGene=="MARC1"]<-"MARC1"
@
\subsection{Read DGINN Young table}
DGINN-Young-primate table correspond to DGINN results, on the SAME alignment as Young-primate.
I will merge the 2 tables.
<<>>=
dginnY<-read.delim(paste0(workdir,
"data/summary_primate_young.res"),
fill=T, h=T)
dim(dginnY)
names(dginnY)
@
<<>>=
add_col<-function(method="PamlM1M2"){
tmp<-dginnY[dginnY$Method==method,
c("Gene", "Omega", "PosSel", "PValue", "NbSites", "PSS")]
names(tmp)<-c("Gene.name", paste0("Omega_", method),
paste0("PosSel_", method), paste0("PValue_", method),
paste0("NbSites_", method), paste0("PSS_", method))
tab<-merge(tab, tmp, by="Gene.name")
return(tab)
}
tab<-add_col("PamlM1M2")
tab<-add_col("PamlM7M8")
tab<-add_col("BppM1M2")
tab<-add_col("BppM7M8")
# Manip pour la colonne BUSTED
tmp<-dginnY[dginnY$Method=="BUSTED",c("Gene", "Omega", "PosSel", "PValue")]
names(tmp)<-c("Gene.name", "Omega_BUSTED", "PosSel_BUSTED", "PValue_BUSTED")
tab<-merge(tab, tmp, by="Gene.name")
tmp<-dginnY[dginnY$Method=="MEME",c("Gene", "NbSites", "PSS")]
names(tmp)<-c("Gene.name", "NbSites_MEME", "PSS_MEME")
tab<-merge(tab, tmp, by="Gene.name")
dim(tab)
@
\section{Comparisons Primates}
\subsection{DGINN results on Janet Young's alignments (DGINN-Young-primate) VS Janet Young's results}
Comparaison des Omega: colonne L "whole.gene.dN.dS.model.0" VS colonne "omega" dans la sortie de dginn.
<<omegaM7M8>>=
tab$whole.gene.dN.dS.model.0<-as.numeric(
as.character(tab$whole.gene.dN.dS.model.0))
plot(tab$whole.gene.dN.dS.model.0, tab$Omega_PamlM7M8,
xlab="Omega Young-primate", ylab="Omega DGINN-Young-primate")
abline(0,1)
outlier<-tab[tab$whole.gene.dN.dS.model.0<0.2 & tab$Omega_PamlM7M8>0.4,]
text(x=outlier$whole.gene.dN.dS.model.0,
y=(outlier$Omega_PamlM7M8+0.01),
outlier$Gene.name)
outlier<-tab[tab$whole.gene.dN.dS.model.0<0.6 & tab$Omega_PamlM7M8>0.7,]
text(x=outlier$whole.gene.dN.dS.model.0,
y=(outlier$Omega_PamlM7M8+0.01),
outlier$Gene.name)
@
\subsection{DGINN results on Janet Young's alignments (DGINN-Young-primate) VS DGINN-full's results}
Comparaison des Omega: colonne L "whole.gene.dN.dS.model.0" VS colonne "omega" dans la sortie de dginn.
<<omegaM7M8_2>>=
tab$'dginn.primate_omegaM0Bpp'<-as.numeric(
as.character(tab$'dginn.primate_omegaM0Bpp'))
plot(tab$'dginn.primate_omegaM0Bpp', tab$Omega_PamlM7M8,
xlab="DGINN-full's", ylab="Omega DGINN-Young-primate")
abline(0,1)
outlier<-tab[tab$'dginn.primate_omegaM0Bpp'>0.4 & tab$Omega_PamlM7M8<0.2,]
text(x=outlier$'dginn.primate_omegaM0Bpp',
y=(outlier$Omega_PamlM7M8+0.01),
outlier$Gene.name)
outlier<-tab[tab$'dginn.primate_omegaM0Bpp'>0.5 & tab$Omega_PamlM7M8<0.4,]
text(x=outlier$'dginn.primate_omegaM0Bpp',
y=(outlier$Omega_PamlM7M8+0.01),
outlier$Gene.name)
outlier<-tab[tab$'dginn.primate_omegaM0Bpp'>0.2 & tab$Omega_PamlM7M8>0.6,]
text(x=outlier$'dginn.primate_omegaM0Bpp',
y=(outlier$Omega_PamlM7M8+0.01),
outlier$Gene.name)
@
\subsection{Janet Young's results (Young-primate) VS DGINN-full's results}
Comparaison des Omega: colonne L "whole.gene.dN.dS.model.0" VS colonne "omega" dans la sortie de dginn.
<<omegaM7M8_3>>=
plot(tab$whole.gene.dN.dS.model.0,
as.numeric(as.character(tab$'dginn.primate_omegaM0Bpp')),
xlab="Omega Young-primate", ylab="DGINN-full's")
abline(0,1)
outlier<-tab[tab$whole.gene.dN.dS.model.0<0.4 &
as.numeric(as.character(tab$'dginn.primate_omegaM0Bpp'))>0.5,]
text(x=outlier$whole.gene.dN.dS.model.0,
y=outlier$'dginn.primate_omegaM0Bpp',
outlier$Gene.name)
outlier<-tab[tab$whole.gene.dN.dS.model.0>0.7 &
as.numeric(as.character(tab$'dginn.primate_omegaM0Bpp'))>0,]
text(x=outlier$whole.gene.dN.dS.model.0,
y=outlier$'dginn.primate_omegaM0Bpp',
outlier$Gene.name)
outlier<-tab[tab$whole.gene.dN.dS.model.0<0.1 &
as.numeric(as.character(tab$'dginn.primate_omegaM0Bpp'))>0.3,]
text(x=outlier$whole.gene.dN.dS.model.0+0.03,
y=outlier$'dginn.primate_omegaM0Bpp',
outlier$Gene.name)
@
\section{Overlap}
\subsection{Mondrian}
<<mondrianprimates>>=
library(Mondrian)
monddata<-as.data.frame(tab$Gene.name)
dim(monddata)
dginnyoungtmp<-rowSums(cbind(tab$PosSel_PamlM1M2=="Y",
tab$PosSel_PamlM7M8=="Y",
tab$PosSel_BppM1M2=="Y",
tab$PosSel_BppM7M8=="Y",
tab$PosSel_BUSTED=="Y"))
dginnfulltmp<-rowSums(cbind(tab$'dginn.primate_BUSTED'=="Y",
tab$'dginn.primate_BppM1M2'=="Y",
tab$'dginn.primate_BppM7M8'=="Y",
tab$'dginn.primate_codemlM1M2'=="Y",
tab$'dginn.primate_codemlM7M8'=="Y"))
monddata$primates_young<-ifelse(tab$pVal.M8vsM7<0.05, 1, 0)
monddata$primates_dginn_young<-ifelse(dginnyoungtmp>=3, 1,0)
monddata$primates_dginn_full<-ifelse(dginnfulltmp>=3, 1,0)
mondrian(na.omit(monddata[,2:4]),
labels=c("Young", "DGINN-Young >=3", "DGINN-full >=3" ))
monddata$primates_dginn_young<-ifelse(dginnyoungtmp>=4, 1,0)
monddata$primates_dginn_full<-ifelse(dginnfulltmp>=4, 1,0)
mondrian(na.omit(monddata[,2:4]),
labels=c("Young", "DGINN-Young >=4", "DGINN-full >=4"))
@
Comparison of results with the same method.
<<>>=
monddata$primates_dginn_young<-tab$PosSel_BppM7M8=="Y"
monddata$primates_dginn_full<-tab$'dginn.primate_codemlM7M8'=="Y"
mondrian(na.omit(monddata[,2:4]),
labels=c("Young", "DGINN-Young", "DGINN-full"),
main="posel codeml M7M8")
@
\subsection{subsetR}
Just another representation of the same result, for now, I focuse on the gene positive in 3 methodes for DGINN analysis.
<<subsetprimates>>=
library(UpSetR)
upsetdata<-as.data.frame(tab$Gene.name)
upsetdata$primates_young<-ifelse(tab$pVal.M8vsM7<0.05, 1, 0)
###
upsetdata$primates_dginn_young<-ifelse(dginnyoungtmp>=3, 1,0)
upsetdata$primates_dginn_full<-ifelse(dginnfulltmp>=3, 1,0)
upset(na.omit(upsetdata), nsets = 3, matrix.color = "#DC267F",
main.bar.color = "#648FFF", sets.bar.color = "#FE6100")
@
\section{Gene List}
<<setup, include=FALSE, cache=FALSE, tidy=TRUE>>=
options(tidy=TRUE, width=70)
@
List of the 34 genes found under positive selection in all analysis.
<<>>=
upsetdata$`tab$Gene.name`[(upsetdata$primates_young==TRUE &
upsetdata$primates_dginn_young==TRUE &
upsetdata$primates_dginn_full==TRUE)]
@
List of the 13 genes found under positive selection in both Young analysis and DGINN-Young alignments (but not full-DGINN).
<<>>=
upsetdata$`tab$Gene.name`[(upsetdata$primates_young==TRUE &
upsetdata$primates_dginn_young==TRUE &
upsetdata$primates_dginn_full==FALSE)]
@
List of the 1 gene found under positive selection in both DGINN analysis, but not Young.
<<>>=
upsetdata$`tab$Gene.name`[(upsetdata$primates_young==FALSE &
upsetdata$primates_dginn_young==TRUE &
upsetdata$primates_dginn_full==TRUE)]
@
List of the 8 genes found under positive selection in both Young analysis and full-DGINN, but not DGINN-young.
<<>>=
upsetdata$`tab$Gene.name`[(upsetdata$primates_young==TRUE &
upsetdata$primates_dginn_young==FALSE &
upsetdata$primates_dginn_full==TRUE)]
@
List of the 18 genes found under positive selection ONLY in Young analysis.
<<>>=
upsetdata$`tab$Gene.name`[(upsetdata$primates_young==TRUE &
upsetdata$primates_dginn_young==FALSE &
upsetdata$primates_dginn_full==FALSE)]
@
List of the 1 genes found under positive selection ONLY in DGINN-Young.
<<>>=
upsetdata$`tab$Gene.name`[(upsetdata$primates_young==FALSE &
upsetdata$primates_dginn_young==TRUE &
upsetdata$primates_dginn_full==FALSE)]
@
List of the 44 genes found under positive selection ONLY in full-DGINN.
<<>>=
upsetdata$`tab$Gene.name`[(upsetdata$primates_young==FALSE &
upsetdata$primates_dginn_young==FALSE &
upsetdata$primates_dginn_full==TRUE)]
@
<<echo=FALSE, results="hide">>=
dginnfulltmp<-rowSums(cbind(tab$'dginn.primate_BUSTED'=="Y",
tab$'dginn.primate_BppM1M2'=="Y",
tab$'dginn.primate_BppM7M8'=="Y",
tab$'dginn.primate_codemlM1M2'=="Y",
tab$'dginn.primate_codemlM7M8'=="Y"))
tab$Gene.name[dginnfulltmp>=4 & is.na(dginnfulltmp)==F]
tab$Gene.name[dginnfulltmp>=3 & is.na(dginnfulltmp)==F]
tmp<-tab[dginnfulltmp>=4 & is.na(dginnfulltmp)==F,
c("Gene.name","dginn.primate_BUSTED", "dginn.primate_BppM1M2",
"dginn.primate_BppM7M8","dginn.primate_codemlM1M2","dginn.primate_codemlM7M8")]
write.table(tmp, "geneList_DGINN_full_primate_pos4.txt", row.names=F, quote=F)
@
<<shiny, fig.height=11, echo=FALSE, results="hide", fig="hide">>=
makeFig1 <- function(df){
# prepare data for colors etc
colMethods <- c("deepskyblue4", "darkorange" , "deepskyblue3" , "mediumseagreen" , "yellow3" , "black")
nameMethods <- c("BUSTED", "BppM1M2", "BppM7M8", "codemlM1M2", "codemlM7M8", "MEME")
metColor <- data.frame(Name = nameMethods , Col = colMethods , stringsAsFactors = FALSE)
# subset for this specific figure
#df <- df[df$nbY >= 1, ] # to drop genes found by 0 methods (big datasets)
xt <- df[, c("BUSTED", "BppM1M2", "BppM7M8", "codemlM1M2", "codemlM7M8")]
xt$Gene <- df$Gene
nbrMeth <- 5
# reverse order of dataframe so that genes with the most Y are at the bottom (to be on top of the barplot)
xt[,1:5] <- ifelse(xt[,1:5] == "Y", 1, 0)
# sort and Filter the 0 lines
xt<-xt[order(rowSums(xt[,1:5])),]
xt<-xt[rowSums(xt[,1:5])>2,]
row.names(xt)<-xt$Gene
xt<-xt[,1:5]
colFig1 <- metColor[which(metColor$Name %in% colnames(xt)) , ]
##### PART 1 : NUMBER OF METHODS
par(xpd = NA , mar=c(2,7,4,0) , oma = c(0,0,0,0) , mgp = c(3,0.3,0))
h = barplot(
t(xt),
border = NA ,
axes = F ,
col = adjustcolor(colFig1$Col, alpha.f = 1),
horiz = T ,
las = 2 ,
main = "Methods detecting positive selection" ,
cex.main = 0.85,
cex.names = min(50/nrow(xt), 1.5)
)
axis(3, line = 0, at = c(0:nbrMeth), label = c("0", rep("", nbrMeth -1), nbrMeth), tck = 0.02)
legend("bottomleft",
horiz = T,
border = colFig1$Col,
legend = colFig1$Name,
fill = colFig1$Col,
cex = 0.8,
bty = "n",
xpd = NA
)
}
df<-read.delim(paste0(workdir,
"/data/DGINN_202005281649summary_cleaned.csv"),
fill=T, h=T, sep=",")
#makeFig1(df)
@
\end{document}
File deleted
\documentclass[11pt, oneside]{article}\usepackage[]{graphicx}\usepackage[]{color}
% maxwidth is the original width if it is less than linewidth
% otherwise use linewidth (to make sure the graphics do not exceed the margin)
\makeatletter
\def\maxwidth{ %
\ifdim\Gin@nat@width>\linewidth
\linewidth
\else
\Gin@nat@width
\fi
}
\makeatother
\definecolor{fgcolor}{rgb}{0.345, 0.345, 0.345}
\newcommand{\hlnum}[1]{\textcolor[rgb]{0.686,0.059,0.569}{#1}}%
\newcommand{\hlstr}[1]{\textcolor[rgb]{0.192,0.494,0.8}{#1}}%
\newcommand{\hlcom}[1]{\textcolor[rgb]{0.678,0.584,0.686}{\textit{#1}}}%
\newcommand{\hlopt}[1]{\textcolor[rgb]{0,0,0}{#1}}%
\newcommand{\hlstd}[1]{\textcolor[rgb]{0.345,0.345,0.345}{#1}}%
\newcommand{\hlkwa}[1]{\textcolor[rgb]{0.161,0.373,0.58}{\textbf{#1}}}%
\newcommand{\hlkwb}[1]{\textcolor[rgb]{0.69,0.353,0.396}{#1}}%
\newcommand{\hlkwc}[1]{\textcolor[rgb]{0.333,0.667,0.333}{#1}}%
\newcommand{\hlkwd}[1]{\textcolor[rgb]{0.737,0.353,0.396}{\textbf{#1}}}%
\let\hlipl\hlkwb
\usepackage{framed}
\makeatletter
\newenvironment{kframe}{%
\def\at@end@of@kframe{}%
\ifinner\ifhmode%
\def\at@end@of@kframe{\end{minipage}}%
\begin{minipage}{\columnwidth}%
\fi\fi%
\def\FrameCommand##1{\hskip\@totalleftmargin \hskip-\fboxsep
\colorbox{shadecolor}{##1}\hskip-\fboxsep
% There is no \\@totalrightmargin, so:
\hskip-\linewidth \hskip-\@totalleftmargin \hskip\columnwidth}%
\MakeFramed {\advance\hsize-\width
\@totalleftmargin\z@ \linewidth\hsize
\@setminipage}}%
{\par\unskip\endMakeFramed%
\at@end@of@kframe}
\makeatother
\definecolor{shadecolor}{rgb}{.97, .97, .97}
\definecolor{messagecolor}{rgb}{0, 0, 0}
\definecolor{warningcolor}{rgb}{1, 0, 1}
\definecolor{errorcolor}{rgb}{1, 0, 0}
\newenvironment{knitrout}{}{} % an empty environment to be redefined in TeX
\usepackage{alltt} % use "amsart" instead of "article" for AMSLaTeX format
%\usepackage{geometry} % See geometry.pdf to learn the layout options. There are lots.
%\geometry{letterpaper} % ... or a4paper or a5paper or ...
%\geometry{landscape} % Activate for for rotated page geometry
%\usepackage[parfill]{parskip} % Activate to begin paragraphs with an empty line rather than an indent
%\usepackage{graphicx} % Use pdf, png, jpg, or eps with pdflatex; use eps in DVI mode
% TeX will automatically convert eps --> pdf in pdflatex
%\usepackage{amssymb}
\usepackage[utf8]{inputenc}
%\usepackage[cyr]{aeguill}
%\usepackage[francais]{babel}
%\usepackage{hyperref}
\title{Positive selection on genes interacting with SARS-Cov2, comparison of different analysis}
\author{Marie Cariou}
\date{March 2021} % Activate to display a given date or no date
\IfFileExists{upquote.sty}{\usepackage{upquote}}{}
\begin{document}
\maketitle
\tableofcontents
\newpage
\section{Files manipulations}
\subsection{Complete table}
\begin{knitrout}
\definecolor{shadecolor}{rgb}{0.969, 0.969, 0.969}\color{fgcolor}\begin{kframe}
\begin{alltt}
\hlstd{home}\hlkwb{<-}\hlstr{"/home/adminmarie/Documents/"}
\hlstd{workdir}\hlkwb{<-}\hlkwd{paste0}\hlstd{(home,} \hlstr{"CIRI_BIBS_projects/2020_05_Etienne_covid/"}\hlstd{)}
\hlstd{tab}\hlkwb{<-}\hlkwd{read.delim}\hlstd{(}\hlkwd{paste0}\hlstd{(workdir,}
\hlstr{"covid_comp/covid_comp_complete.txt"}\hlstd{),} \hlkwc{h}\hlstd{=T,} \hlkwc{sep}\hlstd{=}\hlstr{"\textbackslash{}t"}\hlstd{)}
\hlkwd{dim}\hlstd{(tab)}
\end{alltt}
\begin{verbatim}
## [1] 332 141
\end{verbatim}
\begin{alltt}
\hlstd{tab}\hlopt{$}\hlstd{Gene.name}\hlkwb{<-}\hlkwd{as.character}\hlstd{(tab}\hlopt{$}\hlstd{Gene.name.x)}
\hlstd{tab}\hlopt{$}\hlstd{Gene.name[tab}\hlopt{$}\hlstd{PreyGene}\hlopt{==}\hlstr{"MARC1"}\hlstd{]}\hlkwb{<-}\hlstr{"MARC1"}
\end{alltt}
\end{kframe}
\end{knitrout}
\subsection{Read DGINN Young table}
DGINN-Young-primate table correspond to DGINN results, on the SAME alignment as Young-primate.
I will merge the 2 tables.
\begin{knitrout}
\definecolor{shadecolor}{rgb}{0.969, 0.969, 0.969}\color{fgcolor}\begin{kframe}
\begin{alltt}
\hlstd{dginnY}\hlkwb{<-}\hlkwd{read.delim}\hlstd{(}\hlkwd{paste0}\hlstd{(workdir,}
\hlstr{"data/summary_primate_young.res"}\hlstd{),}
\hlkwc{fill}\hlstd{=T,} \hlkwc{h}\hlstd{=T)}
\hlkwd{dim}\hlstd{(dginnY)}
\end{alltt}
\begin{verbatim}
## [1] 1992 7
\end{verbatim}
\begin{alltt}
\hlkwd{names}\hlstd{(dginnY)}
\end{alltt}
\begin{verbatim}
## [1] "Gene" "Omega" "Method" "PosSel" "PValue" "NbSites"
## [7] "PSS"
\end{verbatim}
\end{kframe}
\end{knitrout}
\begin{knitrout}
\definecolor{shadecolor}{rgb}{0.969, 0.969, 0.969}\color{fgcolor}\begin{kframe}
\begin{alltt}
\hlstd{add_col}\hlkwb{<-}\hlkwa{function}\hlstd{(}\hlkwc{method}\hlstd{=}\hlstr{"PamlM1M2"}\hlstd{)\{}
\hlstd{tmp}\hlkwb{<-}\hlstd{dginnY[dginnY}\hlopt{$}\hlstd{Method}\hlopt{==}\hlstd{method,}
\hlkwd{c}\hlstd{(}\hlstr{"Gene"}\hlstd{,} \hlstr{"Omega"}\hlstd{,} \hlstr{"PosSel"}\hlstd{,} \hlstr{"PValue"}\hlstd{,} \hlstr{"NbSites"}\hlstd{,} \hlstr{"PSS"}\hlstd{)]}
\hlkwd{names}\hlstd{(tmp)}\hlkwb{<-}\hlkwd{c}\hlstd{(}\hlstr{"Gene.name"}\hlstd{,} \hlkwd{paste0}\hlstd{(}\hlstr{"Omega_"}\hlstd{, method),}
\hlkwd{paste0}\hlstd{(}\hlstr{"PosSel_"}\hlstd{, method),} \hlkwd{paste0}\hlstd{(}\hlstr{"PValue_"}\hlstd{, method),}
\hlkwd{paste0}\hlstd{(}\hlstr{"NbSites_"}\hlstd{, method),} \hlkwd{paste0}\hlstd{(}\hlstr{"PSS_"}\hlstd{, method))}
\hlstd{tab}\hlkwb{<-}\hlkwd{merge}\hlstd{(tab, tmp,} \hlkwc{by}\hlstd{=}\hlstr{"Gene.name"}\hlstd{)}
\hlkwd{return}\hlstd{(tab)}
\hlstd{\}}
\hlstd{tab}\hlkwb{<-}\hlkwd{add_col}\hlstd{(}\hlstr{"PamlM1M2"}\hlstd{)}
\hlstd{tab}\hlkwb{<-}\hlkwd{add_col}\hlstd{(}\hlstr{"PamlM7M8"}\hlstd{)}
\hlstd{tab}\hlkwb{<-}\hlkwd{add_col}\hlstd{(}\hlstr{"BppM1M2"}\hlstd{)}
\hlstd{tab}\hlkwb{<-}\hlkwd{add_col}\hlstd{(}\hlstr{"BppM7M8"}\hlstd{)}
\hlcom{# Manip pour la colonne BUSTED}
\hlstd{tmp}\hlkwb{<-}\hlstd{dginnY[dginnY}\hlopt{$}\hlstd{Method}\hlopt{==}\hlstr{"BUSTED"}\hlstd{,}\hlkwd{c}\hlstd{(}\hlstr{"Gene"}\hlstd{,} \hlstr{"Omega"}\hlstd{,} \hlstr{"PosSel"}\hlstd{,} \hlstr{"PValue"}\hlstd{)]}
\hlkwd{names}\hlstd{(tmp)}\hlkwb{<-}\hlkwd{c}\hlstd{(}\hlstr{"Gene.name"}\hlstd{,} \hlstr{"Omega_BUSTED"}\hlstd{,} \hlstr{"PosSel_BUSTED"}\hlstd{,} \hlstr{"PValue_BUSTED"}\hlstd{)}
\hlstd{tab}\hlkwb{<-}\hlkwd{merge}\hlstd{(tab, tmp,} \hlkwc{by}\hlstd{=}\hlstr{"Gene.name"}\hlstd{)}
\hlstd{tmp}\hlkwb{<-}\hlstd{dginnY[dginnY}\hlopt{$}\hlstd{Method}\hlopt{==}\hlstr{"MEME"}\hlstd{,}\hlkwd{c}\hlstd{(}\hlstr{"Gene"}\hlstd{,} \hlstr{"NbSites"}\hlstd{,} \hlstr{"PSS"}\hlstd{)]}
\hlkwd{names}\hlstd{(tmp)}\hlkwb{<-}\hlkwd{c}\hlstd{(}\hlstr{"Gene.name"}\hlstd{,} \hlstr{"NbSites_MEME"}\hlstd{,} \hlstr{"PSS_MEME"}\hlstd{)}
\hlstd{tab}\hlkwb{<-}\hlkwd{merge}\hlstd{(tab, tmp,} \hlkwc{by}\hlstd{=}\hlstr{"Gene.name"}\hlstd{)}
\hlkwd{dim}\hlstd{(tab)}
\end{alltt}
\begin{verbatim}
## [1] 332 167
\end{verbatim}
\end{kframe}
\end{knitrout}
\section{Comparisons Primates}
\subsection{DGINN results on Janet Young's alignments (DGINN-Young-primate) VS Janet Young's results}
Comparaison des Omega: colonne L "whole.gene.dN.dS.model.0" VS colonne "omega" dans la sortie de dginn.
\begin{knitrout}
\definecolor{shadecolor}{rgb}{0.969, 0.969, 0.969}\color{fgcolor}\begin{kframe}
\begin{alltt}
\hlstd{tab}\hlopt{$}\hlstd{whole.gene.dN.dS.model.0}\hlkwb{<-}\hlkwd{as.numeric}\hlstd{(}
\hlkwd{as.character}\hlstd{(tab}\hlopt{$}\hlstd{whole.gene.dN.dS.model.0))}
\hlkwd{plot}\hlstd{(tab}\hlopt{$}\hlstd{whole.gene.dN.dS.model.0, tab}\hlopt{$}\hlstd{Omega_PamlM7M8,}
\hlkwc{xlab}\hlstd{=}\hlstr{"Omega Young-primate"}\hlstd{,} \hlkwc{ylab}\hlstd{=}\hlstr{"Omega DGINN-Young-primate"}\hlstd{)}
\hlkwd{abline}\hlstd{(}\hlnum{0}\hlstd{,}\hlnum{1}\hlstd{)}
\hlstd{outlier}\hlkwb{<-}\hlstd{tab[tab}\hlopt{$}\hlstd{whole.gene.dN.dS.model.0}\hlopt{<}\hlnum{0.2} \hlopt{&} \hlstd{tab}\hlopt{$}\hlstd{Omega_PamlM7M8}\hlopt{>}\hlnum{0.4}\hlstd{,]}
\hlkwd{text}\hlstd{(}\hlkwc{x}\hlstd{=outlier}\hlopt{$}\hlstd{whole.gene.dN.dS.model.0,}
\hlkwc{y}\hlstd{=(outlier}\hlopt{$}\hlstd{Omega_PamlM7M8}\hlopt{+}\hlnum{0.01}\hlstd{),}
\hlstd{outlier}\hlopt{$}\hlstd{Gene.name)}
\hlstd{outlier}\hlkwb{<-}\hlstd{tab[tab}\hlopt{$}\hlstd{whole.gene.dN.dS.model.0}\hlopt{<}\hlnum{0.6} \hlopt{&} \hlstd{tab}\hlopt{$}\hlstd{Omega_PamlM7M8}\hlopt{>}\hlnum{0.7}\hlstd{,]}
\hlkwd{text}\hlstd{(}\hlkwc{x}\hlstd{=outlier}\hlopt{$}\hlstd{whole.gene.dN.dS.model.0,}
\hlkwc{y}\hlstd{=(outlier}\hlopt{$}\hlstd{Omega_PamlM7M8}\hlopt{+}\hlnum{0.01}\hlstd{),}
\hlstd{outlier}\hlopt{$}\hlstd{Gene.name)}
\end{alltt}
\end{kframe}
\includegraphics[width=\maxwidth]{figure/omegaM7M8-1}
\end{knitrout}
\subsection{DGINN results on Janet Young's alignments (DGINN-Young-primate) VS DGINN-full's results}
Comparaison des Omega: colonne L "whole.gene.dN.dS.model.0" VS colonne "omega" dans la sortie de dginn.
\begin{knitrout}
\definecolor{shadecolor}{rgb}{0.969, 0.969, 0.969}\color{fgcolor}\begin{kframe}
\begin{alltt}
\hlstd{tab}\hlopt{$}\hlstr{'dginn.primate_omegaM0Bpp'}\hlkwb{<-}\hlkwd{as.numeric}\hlstd{(}
\hlkwd{as.character}\hlstd{(tab}\hlopt{$}\hlstr{'dginn.primate_omegaM0Bpp'}\hlstd{))}
\end{alltt}
{\ttfamily\noindent\color{warningcolor}{\#\# Warning: NAs introduits lors de la conversion automatique}}\begin{alltt}
\hlkwd{plot}\hlstd{(tab}\hlopt{$}\hlstr{'dginn.primate_omegaM0Bpp'}\hlstd{, tab}\hlopt{$}\hlstd{Omega_PamlM7M8,}
\hlkwc{xlab}\hlstd{=}\hlstr{"DGINN-full's"}\hlstd{,} \hlkwc{ylab}\hlstd{=}\hlstr{"Omega DGINN-Young-primate"}\hlstd{)}
\hlkwd{abline}\hlstd{(}\hlnum{0}\hlstd{,}\hlnum{1}\hlstd{)}
\hlstd{outlier}\hlkwb{<-}\hlstd{tab[tab}\hlopt{$}\hlstr{'dginn.primate_omegaM0Bpp'}\hlopt{>}\hlnum{0.4} \hlopt{&} \hlstd{tab}\hlopt{$}\hlstd{Omega_PamlM7M8}\hlopt{<}\hlnum{0.2}\hlstd{,]}
\hlkwd{text}\hlstd{(}\hlkwc{x}\hlstd{=outlier}\hlopt{$}\hlstr{'dginn.primate_omegaM0Bpp'}\hlstd{,}
\hlkwc{y}\hlstd{=(outlier}\hlopt{$}\hlstd{Omega_PamlM7M8}\hlopt{+}\hlnum{0.01}\hlstd{),}
\hlstd{outlier}\hlopt{$}\hlstd{Gene.name)}
\hlstd{outlier}\hlkwb{<-}\hlstd{tab[tab}\hlopt{$}\hlstr{'dginn.primate_omegaM0Bpp'}\hlopt{>}\hlnum{0.5} \hlopt{&} \hlstd{tab}\hlopt{$}\hlstd{Omega_PamlM7M8}\hlopt{<}\hlnum{0.4}\hlstd{,]}
\hlkwd{text}\hlstd{(}\hlkwc{x}\hlstd{=outlier}\hlopt{$}\hlstr{'dginn.primate_omegaM0Bpp'}\hlstd{,}
\hlkwc{y}\hlstd{=(outlier}\hlopt{$}\hlstd{Omega_PamlM7M8}\hlopt{+}\hlnum{0.01}\hlstd{),}
\hlstd{outlier}\hlopt{$}\hlstd{Gene.name)}
\hlstd{outlier}\hlkwb{<-}\hlstd{tab[tab}\hlopt{$}\hlstr{'dginn.primate_omegaM0Bpp'}\hlopt{>}\hlnum{0.2} \hlopt{&} \hlstd{tab}\hlopt{$}\hlstd{Omega_PamlM7M8}\hlopt{>}\hlnum{0.6}\hlstd{,]}
\hlkwd{text}\hlstd{(}\hlkwc{x}\hlstd{=outlier}\hlopt{$}\hlstr{'dginn.primate_omegaM0Bpp'}\hlstd{,}
\hlkwc{y}\hlstd{=(outlier}\hlopt{$}\hlstd{Omega_PamlM7M8}\hlopt{+}\hlnum{0.01}\hlstd{),}
\hlstd{outlier}\hlopt{$}\hlstd{Gene.name)}
\end{alltt}
\end{kframe}
\includegraphics[width=\maxwidth]{figure/omegaM7M8_2-1}
\end{knitrout}
\subsection{Janet Young's results (Young-primate) VS DGINN-full's results}
Comparaison des Omega: colonne L "whole.gene.dN.dS.model.0" VS colonne "omega" dans la sortie de dginn.
\begin{knitrout}
\definecolor{shadecolor}{rgb}{0.969, 0.969, 0.969}\color{fgcolor}\begin{kframe}
\begin{alltt}
\hlkwd{plot}\hlstd{(tab}\hlopt{$}\hlstd{whole.gene.dN.dS.model.0,}
\hlkwd{as.numeric}\hlstd{(}\hlkwd{as.character}\hlstd{(tab}\hlopt{$}\hlstr{'dginn.primate_omegaM0Bpp'}\hlstd{)),}
\hlkwc{xlab}\hlstd{=}\hlstr{"Omega Young-primate"}\hlstd{,} \hlkwc{ylab}\hlstd{=}\hlstr{"DGINN-full's"}\hlstd{)}
\hlkwd{abline}\hlstd{(}\hlnum{0}\hlstd{,}\hlnum{1}\hlstd{)}
\hlstd{outlier}\hlkwb{<-}\hlstd{tab[tab}\hlopt{$}\hlstd{whole.gene.dN.dS.model.0}\hlopt{<}\hlnum{0.4} \hlopt{&}
\hlkwd{as.numeric}\hlstd{(}\hlkwd{as.character}\hlstd{(tab}\hlopt{$}\hlstr{'dginn.primate_omegaM0Bpp'}\hlstd{))}\hlopt{>}\hlnum{0.5}\hlstd{,]}
\hlkwd{text}\hlstd{(}\hlkwc{x}\hlstd{=outlier}\hlopt{$}\hlstd{whole.gene.dN.dS.model.0,}
\hlkwc{y}\hlstd{=outlier}\hlopt{$}\hlstr{'dginn.primate_omegaM0Bpp'}\hlstd{,}
\hlstd{outlier}\hlopt{$}\hlstd{Gene.name)}
\hlstd{outlier}\hlkwb{<-}\hlstd{tab[tab}\hlopt{$}\hlstd{whole.gene.dN.dS.model.0}\hlopt{>}\hlnum{0.7} \hlopt{&}
\hlkwd{as.numeric}\hlstd{(}\hlkwd{as.character}\hlstd{(tab}\hlopt{$}\hlstr{'dginn.primate_omegaM0Bpp'}\hlstd{))}\hlopt{>}\hlnum{0}\hlstd{,]}
\hlkwd{text}\hlstd{(}\hlkwc{x}\hlstd{=outlier}\hlopt{$}\hlstd{whole.gene.dN.dS.model.0,}
\hlkwc{y}\hlstd{=outlier}\hlopt{$}\hlstr{'dginn.primate_omegaM0Bpp'}\hlstd{,}
\hlstd{outlier}\hlopt{$}\hlstd{Gene.name)}
\hlstd{outlier}\hlkwb{<-}\hlstd{tab[tab}\hlopt{$}\hlstd{whole.gene.dN.dS.model.0}\hlopt{<}\hlnum{0.1} \hlopt{&}
\hlkwd{as.numeric}\hlstd{(}\hlkwd{as.character}\hlstd{(tab}\hlopt{$}\hlstr{'dginn.primate_omegaM0Bpp'}\hlstd{))}\hlopt{>}\hlnum{0.3}\hlstd{,]}
\hlkwd{text}\hlstd{(}\hlkwc{x}\hlstd{=outlier}\hlopt{$}\hlstd{whole.gene.dN.dS.model.0}\hlopt{+}\hlnum{0.03}\hlstd{,}
\hlkwc{y}\hlstd{=outlier}\hlopt{$}\hlstr{'dginn.primate_omegaM0Bpp'}\hlstd{,}
\hlstd{outlier}\hlopt{$}\hlstd{Gene.name)}
\end{alltt}
\end{kframe}
\includegraphics[width=\maxwidth]{figure/omegaM7M8_3-1}
\end{knitrout}
\section{Overlap}
\subsection{Mondrian}
\begin{knitrout}
\definecolor{shadecolor}{rgb}{0.969, 0.969, 0.969}\color{fgcolor}\begin{kframe}
\begin{alltt}
\hlkwd{library}\hlstd{(Mondrian)}
\hlstd{monddata}\hlkwb{<-}\hlkwd{as.data.frame}\hlstd{(tab}\hlopt{$}\hlstd{Gene.name)}
\hlkwd{dim}\hlstd{(monddata)}
\end{alltt}
\begin{verbatim}
## [1] 332 1
\end{verbatim}
\begin{alltt}
\hlstd{dginnyoungtmp}\hlkwb{<-}\hlkwd{rowSums}\hlstd{(}\hlkwd{cbind}\hlstd{(tab}\hlopt{$}\hlstd{PosSel_PamlM1M2}\hlopt{==}\hlstr{"Y"}\hlstd{,}
\hlstd{tab}\hlopt{$}\hlstd{PosSel_PamlM7M8}\hlopt{==}\hlstr{"Y"}\hlstd{,}
\hlstd{tab}\hlopt{$}\hlstd{PosSel_BppM1M2}\hlopt{==}\hlstr{"Y"}\hlstd{,}
\hlstd{tab}\hlopt{$}\hlstd{PosSel_BppM7M8}\hlopt{==}\hlstr{"Y"}\hlstd{,}
\hlstd{tab}\hlopt{$}\hlstd{PosSel_BUSTED}\hlopt{==}\hlstr{"Y"}\hlstd{))}
\hlstd{dginnfulltmp}\hlkwb{<-}\hlkwd{rowSums}\hlstd{(}\hlkwd{cbind}\hlstd{(tab}\hlopt{$}\hlstr{'dginn.primate_BUSTED'}\hlopt{==}\hlstr{"Y"}\hlstd{,}
\hlstd{tab}\hlopt{$}\hlstr{'dginn.primate_BppM1M2'}\hlopt{==}\hlstr{"Y"}\hlstd{,}
\hlstd{tab}\hlopt{$}\hlstr{'dginn.primate_BppM7M8'}\hlopt{==}\hlstr{"Y"}\hlstd{,}
\hlstd{tab}\hlopt{$}\hlstr{'dginn.primate_codemlM1M2'}\hlopt{==}\hlstr{"Y"}\hlstd{,}
\hlstd{tab}\hlopt{$}\hlstr{'dginn.primate_codemlM7M8'}\hlopt{==}\hlstr{"Y"}\hlstd{))}
\hlstd{monddata}\hlopt{$}\hlstd{primates_young}\hlkwb{<-}\hlkwd{ifelse}\hlstd{(tab}\hlopt{$}\hlstd{pVal.M8vsM7}\hlopt{<}\hlnum{0.05}\hlstd{,} \hlnum{1}\hlstd{,} \hlnum{0}\hlstd{)}
\hlstd{monddata}\hlopt{$}\hlstd{primates_dginn_young}\hlkwb{<-}\hlkwd{ifelse}\hlstd{(dginnyoungtmp}\hlopt{>=}\hlnum{3}\hlstd{,} \hlnum{1}\hlstd{,}\hlnum{0}\hlstd{)}
\hlstd{monddata}\hlopt{$}\hlstd{primates_dginn_full}\hlkwb{<-}\hlkwd{ifelse}\hlstd{(dginnfulltmp}\hlopt{>=}\hlnum{3}\hlstd{,} \hlnum{1}\hlstd{,}\hlnum{0}\hlstd{)}
\hlkwd{mondrian}\hlstd{(}\hlkwd{na.omit}\hlstd{(monddata[,}\hlnum{2}\hlopt{:}\hlnum{4}\hlstd{]),}
\hlkwc{labels}\hlstd{=}\hlkwd{c}\hlstd{(}\hlstr{"Young"}\hlstd{,} \hlstr{"DGINN-Young >=3"}\hlstd{,} \hlstr{"DGINN-full >=3"} \hlstd{))}
\end{alltt}
\end{kframe}
\includegraphics[width=\maxwidth]{figure/mondrianprimates-1}
\begin{kframe}\begin{alltt}
\hlstd{monddata}\hlopt{$}\hlstd{primates_dginn_young}\hlkwb{<-}\hlkwd{ifelse}\hlstd{(dginnyoungtmp}\hlopt{>=}\hlnum{4}\hlstd{,} \hlnum{1}\hlstd{,}\hlnum{0}\hlstd{)}
\hlstd{monddata}\hlopt{$}\hlstd{primates_dginn_full}\hlkwb{<-}\hlkwd{ifelse}\hlstd{(dginnfulltmp}\hlopt{>=}\hlnum{4}\hlstd{,} \hlnum{1}\hlstd{,}\hlnum{0}\hlstd{)}
\hlkwd{mondrian}\hlstd{(}\hlkwd{na.omit}\hlstd{(monddata[,}\hlnum{2}\hlopt{:}\hlnum{4}\hlstd{]),}
\hlkwc{labels}\hlstd{=}\hlkwd{c}\hlstd{(}\hlstr{"Young"}\hlstd{,} \hlstr{"DGINN-Young >=4"}\hlstd{,} \hlstr{"DGINN-full >=4"}\hlstd{))}
\end{alltt}
\end{kframe}
\includegraphics[width=\maxwidth]{figure/mondrianprimates-2}
\end{knitrout}
Comparison of results with the same method.
\begin{knitrout}
\definecolor{shadecolor}{rgb}{0.969, 0.969, 0.969}\color{fgcolor}\begin{kframe}
\begin{alltt}
\hlstd{monddata}\hlopt{$}\hlstd{primates_dginn_young}\hlkwb{<-}\hlstd{tab}\hlopt{$}\hlstd{PosSel_BppM7M8}\hlopt{==}\hlstr{"Y"}
\hlstd{monddata}\hlopt{$}\hlstd{primates_dginn_full}\hlkwb{<-}\hlstd{tab}\hlopt{$}\hlstr{'dginn.primate_codemlM7M8'}\hlopt{==}\hlstr{"Y"}
\hlkwd{mondrian}\hlstd{(}\hlkwd{na.omit}\hlstd{(monddata[,}\hlnum{2}\hlopt{:}\hlnum{4}\hlstd{]),}
\hlkwc{labels}\hlstd{=}\hlkwd{c}\hlstd{(}\hlstr{"Young"}\hlstd{,} \hlstr{"DGINN-Young"}\hlstd{,} \hlstr{"DGINN-full"}\hlstd{),}
\hlkwc{main}\hlstd{=}\hlstr{"posel codeml M7M8"}\hlstd{)}
\end{alltt}
\end{kframe}
\includegraphics[width=\maxwidth]{figure/unnamed-chunk-4-1}
\end{knitrout}
\subsection{subsetR}
Just another representation of the same result, for now, I focuse on the gene positive in 3 methodes for DGINN analysis.
\begin{knitrout}
\definecolor{shadecolor}{rgb}{0.969, 0.969, 0.969}\color{fgcolor}\begin{kframe}
\begin{alltt}
\hlkwd{library}\hlstd{(UpSetR)}
\hlstd{upsetdata}\hlkwb{<-}\hlkwd{as.data.frame}\hlstd{(tab}\hlopt{$}\hlstd{Gene.name)}
\hlstd{upsetdata}\hlopt{$}\hlstd{primates_young}\hlkwb{<-}\hlkwd{ifelse}\hlstd{(tab}\hlopt{$}\hlstd{pVal.M8vsM7}\hlopt{<}\hlnum{0.05}\hlstd{,} \hlnum{1}\hlstd{,} \hlnum{0}\hlstd{)}
\hlcom{###}
\hlstd{upsetdata}\hlopt{$}\hlstd{primates_dginn_young}\hlkwb{<-}\hlkwd{ifelse}\hlstd{(dginnyoungtmp}\hlopt{>=}\hlnum{3}\hlstd{,} \hlnum{1}\hlstd{,}\hlnum{0}\hlstd{)}
\hlstd{upsetdata}\hlopt{$}\hlstd{primates_dginn_full}\hlkwb{<-}\hlkwd{ifelse}\hlstd{(dginnfulltmp}\hlopt{>=}\hlnum{3}\hlstd{,} \hlnum{1}\hlstd{,}\hlnum{0}\hlstd{)}
\hlkwd{upset}\hlstd{(}\hlkwd{na.omit}\hlstd{(upsetdata),} \hlkwc{nsets} \hlstd{=} \hlnum{3}\hlstd{,} \hlkwc{matrix.color} \hlstd{=} \hlstr{"#DC267F"}\hlstd{,}
\hlkwc{main.bar.color} \hlstd{=} \hlstr{"#648FFF"}\hlstd{,} \hlkwc{sets.bar.color} \hlstd{=} \hlstr{"#FE6100"}\hlstd{)}
\end{alltt}
\end{kframe}
\includegraphics[width=\maxwidth]{figure/subsetprimates-1}
\end{knitrout}
\section{Gene List}
List of the 34 genes found under positive selection in all analysis.
\begin{knitrout}
\definecolor{shadecolor}{rgb}{0.969, 0.969, 0.969}\color{fgcolor}\begin{kframe}
\begin{alltt}
\hlstd{upsetdata}\hlopt{$}\hlstd{`tab$Gene.name`[(upsetdata}\hlopt{$}\hlstd{primates_young}\hlopt{==}\hlnum{TRUE} \hlopt{&}
\hlstd{upsetdata}\hlopt{$}\hlstd{primates_dginn_young}\hlopt{==}\hlnum{TRUE} \hlopt{&}
\hlstd{upsetdata}\hlopt{$}\hlstd{primates_dginn_full}\hlopt{==}\hlnum{TRUE}\hlstd{)]}
\end{alltt}
\begin{verbatim}
## [1] ACADM ATE1 BCS1L BRD4 CDK5RAP2 CEP68
## [7] CNTRL DNMT1 EDEM3 FYCO1 GCC2 GHITM
## [13] GIGYF2 GOLGB1 GORASP1 ITGB1 MDN1 MPHOSPH10
## [19] MRPS5 NDUFAF2 PCNT PLAT POLA1 PRIM2
## [25] PVR SAAL1 SEPSECS SIRT5 SLC25A21 SLC27A2
## [31] TOR1AIP1 UGGT2 USP54 ZNF318
## 332 Levels: AAR2 AASS AATF ABCC1 ACAD9 ACADM ACSL3 ADAM9 ... ZYG11B
\end{verbatim}
\end{kframe}
\end{knitrout}
List of the 13 genes found under positive selection in both Young analysis and DGINN-Young alignments (but not full-DGINN).
\begin{knitrout}
\definecolor{shadecolor}{rgb}{0.969, 0.969, 0.969}\color{fgcolor}\begin{kframe}
\begin{alltt}
\hlstd{upsetdata}\hlopt{$}\hlstd{`tab$Gene.name`[(upsetdata}\hlopt{$}\hlstd{primates_young}\hlopt{==}\hlnum{TRUE} \hlopt{&}
\hlstd{upsetdata}\hlopt{$}\hlstd{primates_dginn_young}\hlopt{==}\hlnum{TRUE} \hlopt{&}
\hlstd{upsetdata}\hlopt{$}\hlstd{primates_dginn_full}\hlopt{==}\hlnum{FALSE}\hlstd{)]}
\end{alltt}
\begin{verbatim}
## [1] ABCC1 ALG8 CEP250 CEP350 ERLEC1 FASTKD5 GOLGA2 MRPS27
## [9] NINL PDE4DIP PRRC2B RAB18 WFS1
## 332 Levels: AAR2 AASS AATF ABCC1 ACAD9 ACADM ACSL3 ADAM9 ... ZYG11B
\end{verbatim}
\end{kframe}
\end{knitrout}
List of the 1 gene found under positive selection in both DGINN analysis, but not Young.
\begin{knitrout}
\definecolor{shadecolor}{rgb}{0.969, 0.969, 0.969}\color{fgcolor}\begin{kframe}
\begin{alltt}
\hlstd{upsetdata}\hlopt{$}\hlstd{`tab$Gene.name`[(upsetdata}\hlopt{$}\hlstd{primates_young}\hlopt{==}\hlnum{FALSE} \hlopt{&}
\hlstd{upsetdata}\hlopt{$}\hlstd{primates_dginn_young}\hlopt{==}\hlnum{TRUE} \hlopt{&}
\hlstd{upsetdata}\hlopt{$}\hlstd{primates_dginn_full}\hlopt{==}\hlnum{TRUE}\hlstd{)]}
\end{alltt}
\begin{verbatim}
## [1] MARK1
## 332 Levels: AAR2 AASS AATF ABCC1 ACAD9 ACADM ACSL3 ADAM9 ... ZYG11B
\end{verbatim}
\end{kframe}
\end{knitrout}
List of the 8 genes found under positive selection in both Young analysis and full-DGINN, but not DGINN-young.
\begin{knitrout}
\definecolor{shadecolor}{rgb}{0.969, 0.969, 0.969}\color{fgcolor}\begin{kframe}
\begin{alltt}
\hlstd{upsetdata}\hlopt{$}\hlstd{`tab$Gene.name`[(upsetdata}\hlopt{$}\hlstd{primates_young}\hlopt{==}\hlnum{TRUE} \hlopt{&}
\hlstd{upsetdata}\hlopt{$}\hlstd{primates_dginn_young}\hlopt{==}\hlnum{FALSE} \hlopt{&}
\hlstd{upsetdata}\hlopt{$}\hlstd{primates_dginn_full}\hlopt{==}\hlnum{TRUE}\hlstd{)]}
\end{alltt}
\begin{verbatim}
## [1] <NA> LMAN2 NDUFB9 RIPK1 STOM TMEM39B TRMT1 UBAP2
## [9] VPS39
## 332 Levels: AAR2 AASS AATF ABCC1 ACAD9 ACADM ACSL3 ADAM9 ... ZYG11B
\end{verbatim}
\end{kframe}
\end{knitrout}
List of the 18 genes found under positive selection ONLY in Young analysis.
\begin{knitrout}
\definecolor{shadecolor}{rgb}{0.969, 0.969, 0.969}\color{fgcolor}\begin{kframe}
\begin{alltt}
\hlstd{upsetdata}\hlopt{$}\hlstd{`tab$Gene.name`[(upsetdata}\hlopt{$}\hlstd{primates_young}\hlopt{==}\hlnum{TRUE} \hlopt{&}
\hlstd{upsetdata}\hlopt{$}\hlstd{primates_dginn_young}\hlopt{==}\hlnum{FALSE} \hlopt{&}
\hlstd{upsetdata}\hlopt{$}\hlstd{primates_dginn_full}\hlopt{==}\hlnum{FALSE}\hlstd{)]}
\end{alltt}
\begin{verbatim}
## [1] AKAP9 ALG11 ALG5 C19orf52 CENPF CHMP2A COLGALT1
## [8] DCTPP1 DDX21 FBN1 FBXL12 JAKMIP1 <NA> NLRX1
## [15] NUP210 NUP98 PCSK6 PUSL1 ZYG11B
## 332 Levels: AAR2 AASS AATF ABCC1 ACAD9 ACADM ACSL3 ADAM9 ... ZYG11B
\end{verbatim}
\end{kframe}
\end{knitrout}
List of the 1 genes found under positive selection ONLY in DGINN-Young.
\begin{knitrout}
\definecolor{shadecolor}{rgb}{0.969, 0.969, 0.969}\color{fgcolor}\begin{kframe}
\begin{alltt}
\hlstd{upsetdata}\hlopt{$}\hlstd{`tab$Gene.name`[(upsetdata}\hlopt{$}\hlstd{primates_young}\hlopt{==}\hlnum{FALSE} \hlopt{&}
\hlstd{upsetdata}\hlopt{$}\hlstd{primates_dginn_young}\hlopt{==}\hlnum{TRUE} \hlopt{&}
\hlstd{upsetdata}\hlopt{$}\hlstd{primates_dginn_full}\hlopt{==}\hlnum{FALSE}\hlstd{)]}
\end{alltt}
\begin{verbatim}
## [1] FBN2
## 332 Levels: AAR2 AASS AATF ABCC1 ACAD9 ACADM ACSL3 ADAM9 ... ZYG11B
\end{verbatim}
\end{kframe}
\end{knitrout}
List of the 44 genes found under positive selection ONLY in full-DGINN.
\begin{knitrout}
\definecolor{shadecolor}{rgb}{0.969, 0.969, 0.969}\color{fgcolor}\begin{kframe}
\begin{alltt}
\hlstd{upsetdata}\hlopt{$}\hlstd{`tab$Gene.name`[(upsetdata}\hlopt{$}\hlstd{primates_young}\hlopt{==}\hlnum{FALSE} \hlopt{&}
\hlstd{upsetdata}\hlopt{$}\hlstd{primates_dginn_young}\hlopt{==}\hlnum{FALSE} \hlopt{&}
\hlstd{upsetdata}\hlopt{$}\hlstd{primates_dginn_full}\hlopt{==}\hlnum{TRUE}\hlstd{)]}
\end{alltt}
\begin{verbatim}
## [1] ADAM9 <NA> AP2A2 <NA> <NA> BZW2 <NA>
## [8] CEP135 CLIP4 DPH5 EIF4E2 EMC1 ERO1LB EXOSC2
## [15] GGH GLA GOLGA7 HDAC2 HECTD1 HS6ST2 IDE
## [22] <NA> LARP1 LARP4B LARP7 <NA> MIPOL1 MOV10
## [29] MYCBP2 NAT14 NGLY1 NPC2 NUPL1 PITRM1 PLOD2
## [36] PMPCB POR PRKAR2A PTBP2 RAB14 RAB1A RAB2A
## [43] RAP1GDS1 RBX1 REEP6 RPL36 SCCPDH <NA> <NA>
## [50] TIMM8B TRIM59 TUBGCP2 <NA>
## 332 Levels: AAR2 AASS AATF ABCC1 ACAD9 ACADM ACSL3 ADAM9 ... ZYG11B
\end{verbatim}
\end{kframe}
\end{knitrout}
\end{document}
\documentclass[11pt, oneside]{article} % use "amsart" instead of "article" for AMSLaTeX format
%\usepackage{geometry} % See geometry.pdf to learn the layout options. There are lots.
%\geometry{letterpaper} % ... or a4paper or a5paper or ...
%\geometry{landscape} % Activate for for rotated page geometry
%\usepackage[parfill]{parskip} % Activate to begin paragraphs with an empty line rather than an indent
%\usepackage{graphicx} % Use pdf, png, jpg, or eps with pdflatex; use eps in DVI mode
% TeX will automatically convert eps --> pdf in pdflatex
%\usepackage{amssymb}
\usepackage[utf8]{inputenc}
%\usepackage[cyr]{aeguill}
%\usepackage[francais]{babel}
%\usepackage{hyperref}
\title{Positive selection on genes interacting with SARS-Cov2, comparison of different analysis}
\author{Marie Cariou}
\date{October 2020} % Activate to display a given date or no date
\begin{document}
\maketitle
\tableofcontents
\newpage
\section{Files manipulations}
\subsection{Read Janet Young's table}
<<>>=
workdir<-"/home/adminmarie/Documents/CIRI_BIBS_projects/2020_05_Etienne_covid/"
tab<-read.delim(paste0(workdir,
"data/COVID_PAMLresults_332hits_plusBatScreens_2020_Apr14.csv"),
fill=T, h=T, dec=",")
dim(tab)
#names(tab)
@
\subsection{Read DGINN Young table}
DGINN-Young-primate table correspond to DGINN results, on the SAME alignment as Young-primate.
I will merge the 2 tables.
<<>>=
dginnY<-read.delim(paste0(workdir,
"data/summary_primate_young.res"),
fill=T, h=T)
dim(dginnY)
names(dginnY)
@
\subsection{Joining Young and DGINN Young table}
\textit{I hide some code corresponding to verifications of gene names coherence between tables}
<<results="hide", echo=FALSE>>=
head(tab)[,1:5]
# gene avec un nom bizar dans certaines colomne
tab[158,1:10]
#
length(unique(dginnY$Gene))
length(unique(tab$PreyGene))
length(unique(tab$Gene.name))
#quelle paire de colonne contient le plus de noms identiques
sum(unique(dginnY$Gene) %in% unique(tab$PreyGene))
sum(unique(dginnY$Gene) %in% unique(tab$Gene.name))
# dginn$Gene et tab$Gene.name presque identiques sauf 1 ligne.
# Je soupçonne que c'est celle là:
tab[158,1:10]
# Verif:
tab[,1:10][(tab$Gene.name %in% unique(dginnY$Gene))==F,]
# yep
# Remplacement manuel par
as.character(unique(dginnY$Gene)[(unique(dginnY$Gene) %in% tab$Gene.name)==F])
# dans le tableau de Janet
val_remp=as.character(unique(dginnY$Gene)[(unique(dginnY$Gene) %in% tab$Gene.name)==F])
tab$Gene.name<-as.character(tab$Gene.name)
tab$Gene.name[158]<-val_remp
sum(unique(dginnY$Gene) %in% unique(tab$Gene.name))
@
<<>>=
add_col<-function(method="PamlM1M2"){
tmp<-dginnY[dginnY$Method==method,
c("Gene", "Omega", "PosSel", "PValue", "NbSites", "PSS")]
names(tmp)<-c("Gene.name", paste0("Omega_", method),
paste0("PosSel_", method), paste0("PValue_", method),
paste0("NbSites_", method), paste0("PSS_", method))
tab<-merge(tab, tmp, by="Gene.name")
return(tab)
}
tab<-add_col("PamlM1M2")
tab<-add_col("PamlM7M8")
tab<-add_col("BppM1M2")
tab<-add_col("BppM7M8")
# Manip pour la colonne BUSTED
tmp<-dginnY[dginnY$Method=="BUSTED",c("Gene", "Omega", "PosSel", "PValue")]
names(tmp)<-c("Gene.name", "Omega_BUSTED", "PosSel_BUSTED", "PValue_BUSTED")
tab<-merge(tab, tmp, by="Gene.name")
tmp<-dginnY[dginnY$Method=="MEME",c("Gene", "NbSites", "PSS")]
names(tmp)<-c("Gene.name", "NbSites_MEME", "PSS_MEME")
tab<-merge(tab, tmp, by="Gene.name")
@
\subsection{Read DGINN Table}
<<>>=
dginnT<-read.delim(paste0(workdir,
"/data/DGINN_202005281649summary_cleaned.csv"),
fill=T, h=T, sep=",")
dim(dginnT)
names(dginnT)
# Number of genes in dginn-primate output not present in the original table
dginnT[(dginnT$Gene %in% tab$Gene.name)==F,"Gene"]
# This includes paralogs, recombinations found by DGINN
# and additionnal genes included on purpose
# Number of genes from the original list not present in DGINN output
tab[(tab$Gene.name %in% dginnT$Gene)==F,"Gene.name"]
names(dginnT)<-c("File", "Name", "Gene.name", "GeneSize", "dginn-primate_NbSpecies", "dginn-primate_omegaM0Bpp",
"dginn-primate_omegaM0codeml", "dginn-primate_BUSTED", "dginn-primate_BUSTED.p.value",
"dginn-primate_MEME.NbSites", "dginn-primate_MEME.PSS", "dginn-primate_BppM1M2",
"dginn-primate_BppM1M2.p.value", "dginn-primate_BppM1M2.NbSites", "dginn-primate_BppM1M2.PSS",
"dginn-primate_BppM7M8", "dginn-primate_BppM7M8.p.value", "dginn-primate_BppM7M8.NbSites",
"dginn-primate_BppM7M8.PSS", "dginn-primate_codemlM1M2", "dginn-primate_codemlM1M2.p.value",
"dginn-primate_codemlM1M2.NbSites", "dginn-primate_codemlM1M2.PSS", "dginn-primate_codemlM7M8",
"dginn-primate_codemlM7M8.p.value", "dginn-primate_codemlM7M8.NbSites", "dginn-primate_codemlM7M8.PSS")
@
\subsection{Join Table and DGINN table}
<<>>=
tab<-merge(tab,dginnT, by="Gene.name", all.x=T)
@
\subsection{Write new table}
<<>>=
write.table(tab,
"COVID_PAMLresults_332hits_plusBatScreens_plusDGINN_20201014.txt",
row.names=F, quote=F, sep="\t")
@
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{Comparisons Primates}
\subsection{DGINN results on Janet Young's alignments (DGINN-Young-primate) VS Janet Young's results}
Comparaison des Omega: colonne L "whole.gene.dN.dS.model.0" VS colonne "omega" dans la sortie de dginn.
<<omegaM7M8>>=
plot(tab$whole.gene.dN.dS.model.0, tab$Omega_PamlM7M8,
xlab="Omega Young-primate", ylab="Omega DGINN-Young-primate")
abline(0,1)
outlier<-tab[tab$whole.gene.dN.dS.model.0<0.2 & tab$Omega_PamlM7M8>0.4,]
text(x=outlier$whole.gene.dN.dS.model.0,
y=(outlier$Omega_PamlM7M8+0.01),
outlier$Gene.name)
outlier<-tab[tab$whole.gene.dN.dS.model.0<0.6 & tab$Omega_PamlM7M8>0.7,]
text(x=outlier$whole.gene.dN.dS.model.0,
y=(outlier$Omega_PamlM7M8+0.01),
outlier$Gene.name)
@
\subsection{DGINN results on Janet Young's alignments (DGINN-Young-primate) VS DGINN-full's results}
Comparaison des Omega: colonne L "whole.gene.dN.dS.model.0" VS colonne "omega" dans la sortie de dginn.
<<omegaM7M8_2>>=
tab$'dginn-primate_omegaM0Bpp'<-as.numeric(as.character(tab$'dginn-primate_omegaM0Bpp'))
plot(tab$'dginn-primate_omegaM0Bpp', tab$Omega_PamlM7M8,
xlab="DGINN-full's", ylab="Omega DGINN-Young-primate")
abline(0,1)
outlier<-tab[tab$'dginn-primate_omegaM0Bpp'>0.4 & tab$Omega_PamlM7M8<0.2,]
text(x=outlier$'dginn-primate_omegaM0Bpp',
y=(outlier$Omega_PamlM7M8+0.01),
outlier$Gene.name)
outlier<-tab[tab$'dginn-primate_omegaM0Bpp'>0.5 & tab$Omega_PamlM7M8<0.4,]
text(x=outlier$'dginn-primate_omegaM0Bpp',
y=(outlier$Omega_PamlM7M8+0.01),
outlier$Gene.name)
@
\subsection{Janet Young's results (Young-primate) VS DGINN-full's results}
Comparaison des Omega: colonne L "whole.gene.dN.dS.model.0" VS colonne "omega" dans la sortie de dginn.
<<omegaM7M8_3>>=
plot(tab$whole.gene.dN.dS.model.0, as.numeric(as.character(tab$'dginn-primate_omegaM0Bpp')),
xlab="Omega Young-primate", ylab="DGINN-full's")
abline(0,1)
outlier<-tab[tab$whole.gene.dN.dS.model.0<0.4 & tab$'dginn-primate_omegaM0Bpp'>0.5,]
text(x=outlier$whole.gene.dN.dS.model.0,
y=outlier$'dginn-primate_omegaM0Bpp',
outlier$Gene.name)
@
\section{Overlap}
\subsection{Mondrian}
<<mondrianprimates>>=
library(Mondrian)
#######
monddata<-as.data.frame(tab$Gene.name)
dim(monddata)
dginnyoungtmp<-rowSums(cbind(tab$PosSel_PamlM1M2=="Y", tab$PosSel_PamlM7M8=="Y",
tab$PosSel_BppM1M2=="Y", tab$PosSel_BppM7M8=="Y", tab$PosSel_BUSTED=="Y"))
#monddata$primates_dginn_young<-ifelse(tmp$PosSel_PamlM7M8=="Y", 1,0)
dginnfulltmp<-rowSums(cbind(tab$'dginn-primate_BUSTED'=="Y", tab$'dginn-primate_BppM1M2'=="Y",
tab$'dginn-primate_BppM7M8'=="Y", tab$'dginn-primate_codemlM1M2'=="Y", tab$'dginn-primate_codemlM7M8'=="Y"))
monddata$primates_young<-ifelse(tab$pVal.M8vsM7<0.05, 1, 0)
#monddata$primates_cooper<-ifelse(tab$cooper.primates.M7.M8_p_val<0.05, 1, 0)
monddata$primates_dginn_young<-ifelse(dginnyoungtmp>=3, 1,0)
monddata$primates_dginn_full<-ifelse(dginnfulltmp>=3, 1,0)
mondrian(na.omit(monddata[,2:4]), labels=c("Young", "DGINN-Young >=3", "DGINN-full >=3" ))
#####
monddata$primates_dginn_young<-ifelse(dginnyoungtmp>=4, 1,0)
monddata$primates_dginn_full<-ifelse(dginnfulltmp>=4, 1,0)
mondrian(na.omit(monddata[,2:4]), labels=c("Young", "DGINN-Young >=4", "DGINN-full >=4"))
@
Comparison of results with the same method.
<<>>=
#####
monddata$primates_dginn_young<-tab$PosSel_BppM7M8=="Y"
monddata$primates_dginn_full<-tab$'dginn-primate_codemlM7M8'=="Y"
mondrian(na.omit(monddata[,2:4]), labels=c("Young", "DGINN-Young", "DGINN-full"), main="posel codeml M7M8")
@
\subsection{subsetR}
Just another representation of the same result.
<<subsetprimates>>=
library(UpSetR)
upsetdata<-as.data.frame(tab$Gene.name)
upsetdata$primates_young<-ifelse(tab$pVal.M8vsM7<0.05, 1, 0)
###
upsetdata$primates_dginn_young<-ifelse(dginnyoungtmp>=3, 1,0)
upsetdata$primates_dginn_full<-ifelse(dginnfulltmp>=3, 1,0)
upset(na.omit(upsetdata), nsets = 3, matrix.color = "#DC267F",
main.bar.color = "#648FFF", sets.bar.color = "#FE6100")
###
upsetdata$primates_dginn_young<-ifelse(dginnyoungtmp>=4, 1,0)
upsetdata$primates_dginn_full<-ifelse(dginnfulltmp>=4, 1,0)
upset(na.omit(upsetdata), nsets = 3, matrix.color = "#DC267F",
main.bar.color = "#648FFF", sets.bar.color = "#FE6100")
@
\section{Gene List}
Genes under positive selection for at least 4 methods.
<<>>=
dginnfulltmp<-rowSums(cbind(tab$'dginn-primate_BUSTED'=="Y",
tab$'dginn-primate_BppM1M2'=="Y",
tab$'dginn-primate_BppM7M8'=="Y",
tab$'dginn-primate_codemlM1M2'=="Y",
tab$'dginn-primate_codemlM7M8'=="Y"))
tab$Gene.name[dginnfulltmp>=4 & is.na(dginnfulltmp)==F]
tab$Gene.name[dginnfulltmp>=3 & is.na(dginnfulltmp)==F]
tmp<-tab[dginnfulltmp>=4 & is.na(dginnfulltmp)==F,
c("Gene.name","dginn-primate_BUSTED", "dginn-primate_BppM1M2",
"dginn-primate_BppM7M8","dginn-primate_codemlM1M2","dginn-primate_codemlM7M8")]
write.table(tmp, "geneList_DGINN_full_primate_pos4.txt", row.names=F, quote=F)
@
\section{Shiny like}
<<shiny, fig.height=11>>=
makeFig1 <- function(df){
# prepare data for colors etc
colMethods <- c("deepskyblue4", "darkorange" , "deepskyblue3" , "mediumseagreen" , "yellow3" , "black")
nameMethods <- c("BUSTED", "BppM1M2", "BppM7M8", "codemlM1M2", "codemlM7M8", "MEME")
metColor <- data.frame(Name = nameMethods , Col = colMethods , stringsAsFactors = FALSE)
# subset for this specific figure
#df <- df[df$nbY >= 1, ] # to drop genes found by 0 methods (big datasets)
xt <- df[, c("BUSTED", "BppM1M2", "BppM7M8", "codemlM1M2", "codemlM7M8")]
xt$Gene <- df$Gene
nbrMeth <- 5
# reverse order of dataframe so that genes with the most Y are at the bottom (to be on top of the barplot)
xt[,1:5] <- ifelse(xt[,1:5] == "Y", 1, 0)
# sort and Filter the 0 lines
xt<-xt[order(rowSums(xt[,1:5])),]
xt<-xt[rowSums(xt[,1:5])>2,]
row.names(xt)<-xt$Gene
xt<-xt[,1:5]
colFig1 <- metColor[which(metColor$Name %in% colnames(xt)) , ]
##### PART 1 : NUMBER OF METHODS
par(xpd = NA , mar=c(2,7,4,0) , oma = c(0,0,0,0) , mgp = c(3,0.3,0))
h = barplot(
t(xt),
border = NA ,
axes = F ,
col = adjustcolor(colFig1$Col, alpha.f = 1),
horiz = T ,
las = 2 ,
main = "Methods detecting positive selection" ,
cex.main = 0.85,
cex.names = min(50/nrow(xt), 1.5)
)
axis(3, line = 0, at = c(0:nbrMeth), label = c("0", rep("", nbrMeth -1), nbrMeth), tck = 0.02)
legend("bottomleft",
horiz = T,
border = colFig1$Col,
legend = colFig1$Name,
fill = colFig1$Col,
cex = 0.8,
bty = "n",
xpd = NA
)
}
df<-read.delim(paste0(workdir,
"/data/DGINN_202005281649summary_cleaned.csv"),
fill=T, h=T, sep=",")
makeFig1(df)
@
\end{document}
File deleted
\documentclass[11pt, oneside]{article}\usepackage[]{graphicx}\usepackage[]{color}
% maxwidth is the original width if it is less than linewidth
% otherwise use linewidth (to make sure the graphics do not exceed the margin)
\makeatletter
\def\maxwidth{ %
\ifdim\Gin@nat@width>\linewidth
\linewidth
\else
\Gin@nat@width
\fi
}
\makeatother
\definecolor{fgcolor}{rgb}{0.345, 0.345, 0.345}
\newcommand{\hlnum}[1]{\textcolor[rgb]{0.686,0.059,0.569}{#1}}%
\newcommand{\hlstr}[1]{\textcolor[rgb]{0.192,0.494,0.8}{#1}}%
\newcommand{\hlcom}[1]{\textcolor[rgb]{0.678,0.584,0.686}{\textit{#1}}}%
\newcommand{\hlopt}[1]{\textcolor[rgb]{0,0,0}{#1}}%
\newcommand{\hlstd}[1]{\textcolor[rgb]{0.345,0.345,0.345}{#1}}%
\newcommand{\hlkwa}[1]{\textcolor[rgb]{0.161,0.373,0.58}{\textbf{#1}}}%
\newcommand{\hlkwb}[1]{\textcolor[rgb]{0.69,0.353,0.396}{#1}}%
\newcommand{\hlkwc}[1]{\textcolor[rgb]{0.333,0.667,0.333}{#1}}%
\newcommand{\hlkwd}[1]{\textcolor[rgb]{0.737,0.353,0.396}{\textbf{#1}}}%
\let\hlipl\hlkwb
\usepackage{framed}
\makeatletter
\newenvironment{kframe}{%
\def\at@end@of@kframe{}%
\ifinner\ifhmode%
\def\at@end@of@kframe{\end{minipage}}%
\begin{minipage}{\columnwidth}%
\fi\fi%
\def\FrameCommand##1{\hskip\@totalleftmargin \hskip-\fboxsep
\colorbox{shadecolor}{##1}\hskip-\fboxsep
% There is no \\@totalrightmargin, so:
\hskip-\linewidth \hskip-\@totalleftmargin \hskip\columnwidth}%
\MakeFramed {\advance\hsize-\width
\@totalleftmargin\z@ \linewidth\hsize
\@setminipage}}%
{\par\unskip\endMakeFramed%
\at@end@of@kframe}
\makeatother
\definecolor{shadecolor}{rgb}{.97, .97, .97}
\definecolor{messagecolor}{rgb}{0, 0, 0}
\definecolor{warningcolor}{rgb}{1, 0, 1}
\definecolor{errorcolor}{rgb}{1, 0, 0}
\newenvironment{knitrout}{}{} % an empty environment to be redefined in TeX
\usepackage{alltt} % use "amsart" instead of "article" for AMSLaTeX format
%\usepackage{geometry} % See geometry.pdf to learn the layout options. There are lots.
%\geometry{letterpaper} % ... or a4paper or a5paper or ...
%\geometry{landscape} % Activate for for rotated page geometry
%\usepackage[parfill]{parskip} % Activate to begin paragraphs with an empty line rather than an indent
%\usepackage{graphicx} % Use pdf, png, jpg, or eps with pdflatex; use eps in DVI mode
% TeX will automatically convert eps --> pdf in pdflatex
%\usepackage{amssymb}
\usepackage[utf8]{inputenc}
%\usepackage[cyr]{aeguill}
%\usepackage[francais]{babel}
%\usepackage{hyperref}
\title{Positive selection on genes interacting with SARS-Cov2, comparison of different analysis}
\author{Marie Cariou}
\date{October 2020} % Activate to display a given date or no date
\IfFileExists{upquote.sty}{\usepackage{upquote}}{}
\begin{document}
\maketitle
\tableofcontents
\newpage
\section{Files manipulations}
\subsection{Read Janet Young's table}
\begin{knitrout}
\definecolor{shadecolor}{rgb}{0.969, 0.969, 0.969}\color{fgcolor}\begin{kframe}
\begin{alltt}
\hlstd{workdir}\hlkwb{<-}\hlstr{"/home/adminmarie/Documents/CIRI_BIBS_projects/2020_05_Etienne_covid/"}
\hlstd{tab}\hlkwb{<-}\hlkwd{read.delim}\hlstd{(}\hlkwd{paste0}\hlstd{(workdir,}
\hlstr{"data/COVID_PAMLresults_332hits_plusBatScreens_2020_Apr14.csv"}\hlstd{),}
\hlkwc{fill}\hlstd{=T,} \hlkwc{h}\hlstd{=T,} \hlkwc{dec}\hlstd{=}\hlstr{","}\hlstd{)}
\hlkwd{dim}\hlstd{(tab)}
\end{alltt}
\begin{verbatim}
## [1] 332 84
\end{verbatim}
\begin{alltt}
\hlcom{#names(tab)}
\end{alltt}
\end{kframe}
\end{knitrout}
\subsection{Read DGINN Young table}
DGINN-Young-primate table correspond to DGINN results, on the SAME alignment as Young-primate.
I will merge the 2 tables.
\begin{knitrout}
\definecolor{shadecolor}{rgb}{0.969, 0.969, 0.969}\color{fgcolor}\begin{kframe}
\begin{alltt}
\hlstd{dginnY}\hlkwb{<-}\hlkwd{read.delim}\hlstd{(}\hlkwd{paste0}\hlstd{(workdir,}
\hlstr{"data/summary_primate_young.res"}\hlstd{),}
\hlkwc{fill}\hlstd{=T,} \hlkwc{h}\hlstd{=T)}
\hlkwd{dim}\hlstd{(dginnY)}
\end{alltt}
\begin{verbatim}
## [1] 1992 7
\end{verbatim}
\begin{alltt}
\hlkwd{names}\hlstd{(dginnY)}
\end{alltt}
\begin{verbatim}
## [1] "Gene" "Omega" "Method" "PosSel" "PValue" "NbSites" "PSS"
\end{verbatim}
\end{kframe}
\end{knitrout}
\subsection{Joining Young and DGINN Young table}
\textit{I hide some code corresponding to verifications of gene names coherence between tables}
\begin{knitrout}
\definecolor{shadecolor}{rgb}{0.969, 0.969, 0.969}\color{fgcolor}\begin{kframe}
\begin{alltt}
\hlstd{add_col}\hlkwb{<-}\hlkwa{function}\hlstd{(}\hlkwc{method}\hlstd{=}\hlstr{"PamlM1M2"}\hlstd{)\{}
\hlstd{tmp}\hlkwb{<-}\hlstd{dginnY[dginnY}\hlopt{$}\hlstd{Method}\hlopt{==}\hlstd{method,}
\hlkwd{c}\hlstd{(}\hlstr{"Gene"}\hlstd{,} \hlstr{"Omega"}\hlstd{,} \hlstr{"PosSel"}\hlstd{,} \hlstr{"PValue"}\hlstd{,} \hlstr{"NbSites"}\hlstd{,} \hlstr{"PSS"}\hlstd{)]}
\hlkwd{names}\hlstd{(tmp)}\hlkwb{<-}\hlkwd{c}\hlstd{(}\hlstr{"Gene.name"}\hlstd{,} \hlkwd{paste0}\hlstd{(}\hlstr{"Omega_"}\hlstd{, method),}
\hlkwd{paste0}\hlstd{(}\hlstr{"PosSel_"}\hlstd{, method),} \hlkwd{paste0}\hlstd{(}\hlstr{"PValue_"}\hlstd{, method),}
\hlkwd{paste0}\hlstd{(}\hlstr{"NbSites_"}\hlstd{, method),} \hlkwd{paste0}\hlstd{(}\hlstr{"PSS_"}\hlstd{, method))}
\hlstd{tab}\hlkwb{<-}\hlkwd{merge}\hlstd{(tab, tmp,} \hlkwc{by}\hlstd{=}\hlstr{"Gene.name"}\hlstd{)}
\hlkwd{return}\hlstd{(tab)}
\hlstd{\}}
\hlstd{tab}\hlkwb{<-}\hlkwd{add_col}\hlstd{(}\hlstr{"PamlM1M2"}\hlstd{)}
\hlstd{tab}\hlkwb{<-}\hlkwd{add_col}\hlstd{(}\hlstr{"PamlM7M8"}\hlstd{)}
\hlstd{tab}\hlkwb{<-}\hlkwd{add_col}\hlstd{(}\hlstr{"BppM1M2"}\hlstd{)}
\hlstd{tab}\hlkwb{<-}\hlkwd{add_col}\hlstd{(}\hlstr{"BppM7M8"}\hlstd{)}
\hlcom{# Manip pour la colonne BUSTED}
\hlstd{tmp}\hlkwb{<-}\hlstd{dginnY[dginnY}\hlopt{$}\hlstd{Method}\hlopt{==}\hlstr{"BUSTED"}\hlstd{,}\hlkwd{c}\hlstd{(}\hlstr{"Gene"}\hlstd{,} \hlstr{"Omega"}\hlstd{,} \hlstr{"PosSel"}\hlstd{,} \hlstr{"PValue"}\hlstd{)]}
\hlkwd{names}\hlstd{(tmp)}\hlkwb{<-}\hlkwd{c}\hlstd{(}\hlstr{"Gene.name"}\hlstd{,} \hlstr{"Omega_BUSTED"}\hlstd{,} \hlstr{"PosSel_BUSTED"}\hlstd{,} \hlstr{"PValue_BUSTED"}\hlstd{)}
\hlstd{tab}\hlkwb{<-}\hlkwd{merge}\hlstd{(tab, tmp,} \hlkwc{by}\hlstd{=}\hlstr{"Gene.name"}\hlstd{)}
\hlstd{tmp}\hlkwb{<-}\hlstd{dginnY[dginnY}\hlopt{$}\hlstd{Method}\hlopt{==}\hlstr{"MEME"}\hlstd{,}\hlkwd{c}\hlstd{(}\hlstr{"Gene"}\hlstd{,} \hlstr{"NbSites"}\hlstd{,} \hlstr{"PSS"}\hlstd{)]}
\hlkwd{names}\hlstd{(tmp)}\hlkwb{<-}\hlkwd{c}\hlstd{(}\hlstr{"Gene.name"}\hlstd{,} \hlstr{"NbSites_MEME"}\hlstd{,} \hlstr{"PSS_MEME"}\hlstd{)}
\hlstd{tab}\hlkwb{<-}\hlkwd{merge}\hlstd{(tab, tmp,} \hlkwc{by}\hlstd{=}\hlstr{"Gene.name"}\hlstd{)}
\end{alltt}
\end{kframe}
\end{knitrout}
\subsection{Read DGINN Table}
\begin{knitrout}
\definecolor{shadecolor}{rgb}{0.969, 0.969, 0.969}\color{fgcolor}\begin{kframe}
\begin{alltt}
\hlstd{dginnT}\hlkwb{<-}\hlkwd{read.delim}\hlstd{(}\hlkwd{paste0}\hlstd{(workdir,}
\hlstr{"/data/DGINN_202005281649summary_cleaned.csv"}\hlstd{),}
\hlkwc{fill}\hlstd{=T,} \hlkwc{h}\hlstd{=T,} \hlkwc{sep}\hlstd{=}\hlstr{","}\hlstd{)}
\hlkwd{dim}\hlstd{(dginnT)}
\end{alltt}
\begin{verbatim}
## [1] 412 27
\end{verbatim}
\begin{alltt}
\hlkwd{names}\hlstd{(dginnT)}
\end{alltt}
\begin{verbatim}
## [1] "File" "Name" "Gene" "GeneSize" "NbSpecies" "omegaM0Bpp"
## [7] "omegaM0codeml" "BUSTED" "BUSTED.p.value" "MEME.NbSites" "MEME.PSS" "BppM1M2"
## [13] "BppM1M2.p.value" "BppM1M2.NbSites" "BppM1M2.PSS" "BppM7M8" "BppM7M8.p.value" "BppM7M8.NbSites"
## [19] "BppM7M8.PSS" "codemlM1M2" "codemlM1M2.p.value" "codemlM1M2.NbSites" "codemlM1M2.PSS" "codemlM7M8"
## [25] "codemlM7M8.p.value" "codemlM7M8.NbSites" "codemlM7M8.PSS"
\end{verbatim}
\begin{alltt}
\hlcom{# Number of genes in dginn-primate output not present in the original table}
\hlstd{dginnT[(dginnT}\hlopt{$}\hlstd{Gene} \hlopt{%in%} \hlstd{tab}\hlopt{$}\hlstd{Gene.name)}\hlopt{==}\hlstd{F,}\hlstr{"Gene"}\hlstd{]}
\end{alltt}
\begin{verbatim}
## [1] ACE2 ADAM9[0-3120] ADAM9[3119-3927] ATP5MGL C1H1ORF50 CEP135[0-3264] CEP135[3263-3678]
## [8] CEP43 COQ8B COQ8A CSNK2A1 CSNK2B[0-609] CSNK2B[608-2568] CYB5R1
## [15] DDX21[0-717] DDX21[716-2538] DDX50 DNAJC15 DPH5[0-702] DPH5[701-1326] DPY19L2
## [22] ELOC ERO1B EXOSC3[0-1446] EXOSC3[1445-1980] FBN3 GNB4 GNB2
## [29] GNB3 GOLGA7[0-312] GOLGA7[311-549] GPX1[0-1218] GPX1[1217-2946] HDAC1 HS6ST3
## [36] IMPDH1 ITGB1[0-2328] ITGB1[2327-2844] LMAN2L MRPS5[0-1569] MRPS5[1568-3783] MARC2
## [43] MGRN1 NDFIP2[0-768] NDFIP2[767-1314] NDUFAF2[0-258] NDUFAF2[257-744] NSD2 NUP58
## [50] NUP58[0-1824] NUP58[1823-2367] PABPC3 POTPABPC1 PABPC4L PABPC5 PCSK5
## [57] PRIM2[0-1071] PRIM2[1070-1902] PRKACB PRKACG PTGES2[0-1587] PTGES2[1586-2202] RAB8B
## [64] RAB13 RAB18[0-855] RAB18[854-1815] RAB2B RAB5A RAB5B RAB15
## [71] RALB EZR EZR[0-1458] EZR[1457-3771] MSN RETREG3 RHOB
## [78] RHOC SLC44A2[0-2577] SLC44A2[2576-3657] SPART SRP72[0-2604] SRP72[2603-3417] STOM[0-1047]
## [85] STOM[1046-1800] STOML3 TIMM29 TLE4 TLE2 TLE2[0-1302] TLE2[1301-3987]
## [92] TMPRSS2 TOMM70 TOR1B WASHC4 WFS1[0-2346] WFS1[2345-3216] YIF1B
## 411 Levels: AAR2 AASS AATF ABCC1 ACAD9 ACADM ACE2 ACSL3 ADAM9 ADAM9[0-3120] ADAM9[3119-3927] ADAMTS1 AES AGPS AKAP8 AKAP8L AKAP9 ... ZYG11B
\end{verbatim}
\begin{alltt}
\hlcom{# This includes paralogs, recombinations found by DGINN }
\hlcom{# and additionnal genes included on purpose}
\hlcom{# Number of genes from the original list not present in DGINN output}
\hlstd{tab[(tab}\hlopt{$}\hlstd{Gene.name} \hlopt{%in%} \hlstd{dginnT}\hlopt{$}\hlstd{Gene)}\hlopt{==}\hlstd{F,}\hlstr{"Gene.name"}\hlstd{]}
\end{alltt}
\begin{verbatim}
## [1] "ADCK4" "ARL6IP6" "ATP5L" "C19orf52" "C1orf50" "ERO1LB" "FAM134C" "FGFR1OP" "KIAA1033" "MFGE8" "NUPL1" "SIGMAR1"
## [13] "SPG20" "TCEB1" "TCEB2" "TOMM70A" "USP13" "VIMP" "WHSC1"
\end{verbatim}
\begin{alltt}
\hlkwd{names}\hlstd{(dginnT)}\hlkwb{<-}\hlkwd{c}\hlstd{(}\hlstr{"File"}\hlstd{,} \hlstr{"Name"}\hlstd{,} \hlstr{"Gene.name"}\hlstd{,} \hlstr{"GeneSize"}\hlstd{,} \hlstr{"dginn-primate_NbSpecies"}\hlstd{,} \hlstr{"dginn-primate_omegaM0Bpp"}\hlstd{,}
\hlstr{"dginn-primate_omegaM0codeml"}\hlstd{,} \hlstr{"dginn-primate_BUSTED"}\hlstd{,} \hlstr{"dginn-primate_BUSTED.p.value"}\hlstd{,}
\hlstr{"dginn-primate_MEME.NbSites"}\hlstd{,} \hlstr{"dginn-primate_MEME.PSS"}\hlstd{,} \hlstr{"dginn-primate_BppM1M2"}\hlstd{,}
\hlstr{"dginn-primate_BppM1M2.p.value"}\hlstd{,} \hlstr{"dginn-primate_BppM1M2.NbSites"}\hlstd{,} \hlstr{"dginn-primate_BppM1M2.PSS"}\hlstd{,}
\hlstr{"dginn-primate_BppM7M8"}\hlstd{,} \hlstr{"dginn-primate_BppM7M8.p.value"}\hlstd{,} \hlstr{"dginn-primate_BppM7M8.NbSites"}\hlstd{,}
\hlstr{"dginn-primate_BppM7M8.PSS"}\hlstd{,} \hlstr{"dginn-primate_codemlM1M2"}\hlstd{,} \hlstr{"dginn-primate_codemlM1M2.p.value"}\hlstd{,}
\hlstr{"dginn-primate_codemlM1M2.NbSites"}\hlstd{,} \hlstr{"dginn-primate_codemlM1M2.PSS"}\hlstd{,} \hlstr{"dginn-primate_codemlM7M8"}\hlstd{,}
\hlstr{"dginn-primate_codemlM7M8.p.value"}\hlstd{,} \hlstr{"dginn-primate_codemlM7M8.NbSites"}\hlstd{,} \hlstr{"dginn-primate_codemlM7M8.PSS"}\hlstd{)}
\end{alltt}
\end{kframe}
\end{knitrout}
\subsection{Join Table and DGINN table}
\begin{knitrout}
\definecolor{shadecolor}{rgb}{0.969, 0.969, 0.969}\color{fgcolor}\begin{kframe}
\begin{alltt}
\hlstd{tab}\hlkwb{<-}\hlkwd{merge}\hlstd{(tab,dginnT,} \hlkwc{by}\hlstd{=}\hlstr{"Gene.name"}\hlstd{,} \hlkwc{all.x}\hlstd{=T)}
\end{alltt}
\end{kframe}
\end{knitrout}
\subsection{Write new table}
\begin{knitrout}
\definecolor{shadecolor}{rgb}{0.969, 0.969, 0.969}\color{fgcolor}\begin{kframe}
\begin{alltt}
\hlkwd{write.table}\hlstd{(tab,}
\hlstr{"COVID_PAMLresults_332hits_plusBatScreens_plusDGINN_20201014.txt"}\hlstd{,}
\hlkwc{row.names}\hlstd{=F,} \hlkwc{quote}\hlstd{=F,} \hlkwc{sep}\hlstd{=}\hlstr{"\textbackslash{}t"}\hlstd{)}
\end{alltt}
\end{kframe}
\end{knitrout}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{Comparisons Primates}
\subsection{DGINN results on Janet Young's alignments (DGINN-Young-primate) VS Janet Young's results}
Comparaison des Omega: colonne L "whole.gene.dN.dS.model.0" VS colonne "omega" dans la sortie de dginn.
\begin{knitrout}
\definecolor{shadecolor}{rgb}{0.969, 0.969, 0.969}\color{fgcolor}\begin{kframe}
\begin{alltt}
\hlkwd{plot}\hlstd{(tab}\hlopt{$}\hlstd{whole.gene.dN.dS.model.0, tab}\hlopt{$}\hlstd{Omega_PamlM7M8,}
\hlkwc{xlab}\hlstd{=}\hlstr{"Omega Young-primate"}\hlstd{,} \hlkwc{ylab}\hlstd{=}\hlstr{"Omega DGINN-Young-primate"}\hlstd{)}
\hlkwd{abline}\hlstd{(}\hlnum{0}\hlstd{,}\hlnum{1}\hlstd{)}
\hlstd{outlier}\hlkwb{<-}\hlstd{tab[tab}\hlopt{$}\hlstd{whole.gene.dN.dS.model.0}\hlopt{<}\hlnum{0.2} \hlopt{&} \hlstd{tab}\hlopt{$}\hlstd{Omega_PamlM7M8}\hlopt{>}\hlnum{0.4}\hlstd{,]}
\hlkwd{text}\hlstd{(}\hlkwc{x}\hlstd{=outlier}\hlopt{$}\hlstd{whole.gene.dN.dS.model.0,}
\hlkwc{y}\hlstd{=(outlier}\hlopt{$}\hlstd{Omega_PamlM7M8}\hlopt{+}\hlnum{0.01}\hlstd{),}
\hlstd{outlier}\hlopt{$}\hlstd{Gene.name)}
\hlstd{outlier}\hlkwb{<-}\hlstd{tab[tab}\hlopt{$}\hlstd{whole.gene.dN.dS.model.0}\hlopt{<}\hlnum{0.6} \hlopt{&} \hlstd{tab}\hlopt{$}\hlstd{Omega_PamlM7M8}\hlopt{>}\hlnum{0.7}\hlstd{,]}
\hlkwd{text}\hlstd{(}\hlkwc{x}\hlstd{=outlier}\hlopt{$}\hlstd{whole.gene.dN.dS.model.0,}
\hlkwc{y}\hlstd{=(outlier}\hlopt{$}\hlstd{Omega_PamlM7M8}\hlopt{+}\hlnum{0.01}\hlstd{),}
\hlstd{outlier}\hlopt{$}\hlstd{Gene.name)}
\end{alltt}
\end{kframe}
\includegraphics[width=\maxwidth]{figure/omegaM7M8-1}
\end{knitrout}
\subsection{DGINN results on Janet Young's alignments (DGINN-Young-primate) VS DGINN-full's results}
Comparaison des Omega: colonne L "whole.gene.dN.dS.model.0" VS colonne "omega" dans la sortie de dginn.
\begin{knitrout}
\definecolor{shadecolor}{rgb}{0.969, 0.969, 0.969}\color{fgcolor}\begin{kframe}
\begin{alltt}
\hlstd{tab}\hlopt{$}\hlstr{'dginn-primate_omegaM0Bpp'}\hlkwb{<-}\hlkwd{as.numeric}\hlstd{(}\hlkwd{as.character}\hlstd{(tab}\hlopt{$}\hlstr{'dginn-primate_omegaM0Bpp'}\hlstd{))}
\end{alltt}
{\ttfamily\noindent\color{warningcolor}{\#\# Warning: NAs introduits lors de la conversion automatique}}\begin{alltt}
\hlkwd{plot}\hlstd{(tab}\hlopt{$}\hlstr{'dginn-primate_omegaM0Bpp'}\hlstd{, tab}\hlopt{$}\hlstd{Omega_PamlM7M8,}
\hlkwc{xlab}\hlstd{=}\hlstr{"DGINN-full's"}\hlstd{,} \hlkwc{ylab}\hlstd{=}\hlstr{"Omega DGINN-Young-primate"}\hlstd{)}
\hlkwd{abline}\hlstd{(}\hlnum{0}\hlstd{,}\hlnum{1}\hlstd{)}
\hlstd{outlier}\hlkwb{<-}\hlstd{tab[tab}\hlopt{$}\hlstr{'dginn-primate_omegaM0Bpp'}\hlopt{>}\hlnum{0.4} \hlopt{&} \hlstd{tab}\hlopt{$}\hlstd{Omega_PamlM7M8}\hlopt{<}\hlnum{0.2}\hlstd{,]}
\hlkwd{text}\hlstd{(}\hlkwc{x}\hlstd{=outlier}\hlopt{$}\hlstr{'dginn-primate_omegaM0Bpp'}\hlstd{,}
\hlkwc{y}\hlstd{=(outlier}\hlopt{$}\hlstd{Omega_PamlM7M8}\hlopt{+}\hlnum{0.01}\hlstd{),}
\hlstd{outlier}\hlopt{$}\hlstd{Gene.name)}
\hlstd{outlier}\hlkwb{<-}\hlstd{tab[tab}\hlopt{$}\hlstr{'dginn-primate_omegaM0Bpp'}\hlopt{>}\hlnum{0.5} \hlopt{&} \hlstd{tab}\hlopt{$}\hlstd{Omega_PamlM7M8}\hlopt{<}\hlnum{0.4}\hlstd{,]}
\hlkwd{text}\hlstd{(}\hlkwc{x}\hlstd{=outlier}\hlopt{$}\hlstr{'dginn-primate_omegaM0Bpp'}\hlstd{,}
\hlkwc{y}\hlstd{=(outlier}\hlopt{$}\hlstd{Omega_PamlM7M8}\hlopt{+}\hlnum{0.01}\hlstd{),}
\hlstd{outlier}\hlopt{$}\hlstd{Gene.name)}
\end{alltt}
\end{kframe}
\includegraphics[width=\maxwidth]{figure/omegaM7M8_2-1}
\end{knitrout}
\subsection{Janet Young's results (Young-primate) VS DGINN-full's results}
Comparaison des Omega: colonne L "whole.gene.dN.dS.model.0" VS colonne "omega" dans la sortie de dginn.
\begin{knitrout}
\definecolor{shadecolor}{rgb}{0.969, 0.969, 0.969}\color{fgcolor}\begin{kframe}
\begin{alltt}
\hlkwd{plot}\hlstd{(tab}\hlopt{$}\hlstd{whole.gene.dN.dS.model.0,} \hlkwd{as.numeric}\hlstd{(}\hlkwd{as.character}\hlstd{(tab}\hlopt{$}\hlstr{'dginn-primate_omegaM0Bpp'}\hlstd{)),}
\hlkwc{xlab}\hlstd{=}\hlstr{"Omega Young-primate"}\hlstd{,} \hlkwc{ylab}\hlstd{=}\hlstr{"DGINN-full's"}\hlstd{)}
\hlkwd{abline}\hlstd{(}\hlnum{0}\hlstd{,}\hlnum{1}\hlstd{)}
\hlstd{outlier}\hlkwb{<-}\hlstd{tab[tab}\hlopt{$}\hlstd{whole.gene.dN.dS.model.0}\hlopt{<}\hlnum{0.4} \hlopt{&} \hlstd{tab}\hlopt{$}\hlstr{'dginn-primate_omegaM0Bpp'}\hlopt{>}\hlnum{0.5}\hlstd{,]}
\hlkwd{text}\hlstd{(}\hlkwc{x}\hlstd{=outlier}\hlopt{$}\hlstd{whole.gene.dN.dS.model.0,}
\hlkwc{y}\hlstd{=outlier}\hlopt{$}\hlstr{'dginn-primate_omegaM0Bpp'}\hlstd{,}
\hlstd{outlier}\hlopt{$}\hlstd{Gene.name)}
\end{alltt}
\end{kframe}
\includegraphics[width=\maxwidth]{figure/omegaM7M8_3-1}
\end{knitrout}
\section{Overlap}
\subsection{Mondrian}
\begin{knitrout}
\definecolor{shadecolor}{rgb}{0.969, 0.969, 0.969}\color{fgcolor}\begin{kframe}
\begin{alltt}
\hlkwd{library}\hlstd{(Mondrian)}
\hlcom{#######}
\hlstd{monddata}\hlkwb{<-}\hlkwd{as.data.frame}\hlstd{(tab}\hlopt{$}\hlstd{Gene.name)}
\hlkwd{dim}\hlstd{(monddata)}
\end{alltt}
\begin{verbatim}
## [1] 333 1
\end{verbatim}
\begin{alltt}
\hlstd{dginnyoungtmp}\hlkwb{<-}\hlkwd{rowSums}\hlstd{(}\hlkwd{cbind}\hlstd{(tab}\hlopt{$}\hlstd{PosSel_PamlM1M2}\hlopt{==}\hlstr{"Y"}\hlstd{, tab}\hlopt{$}\hlstd{PosSel_PamlM7M8}\hlopt{==}\hlstr{"Y"}\hlstd{,}
\hlstd{tab}\hlopt{$}\hlstd{PosSel_BppM1M2}\hlopt{==}\hlstr{"Y"}\hlstd{, tab}\hlopt{$}\hlstd{PosSel_BppM7M8}\hlopt{==}\hlstr{"Y"}\hlstd{, tab}\hlopt{$}\hlstd{PosSel_BUSTED}\hlopt{==}\hlstr{"Y"}\hlstd{))}
\hlcom{#monddata$primates_dginn_young<-ifelse(tmp$PosSel_PamlM7M8=="Y", 1,0)}
\hlstd{dginnfulltmp}\hlkwb{<-}\hlkwd{rowSums}\hlstd{(}\hlkwd{cbind}\hlstd{(tab}\hlopt{$}\hlstr{'dginn-primate_BUSTED'}\hlopt{==}\hlstr{"Y"}\hlstd{, tab}\hlopt{$}\hlstr{'dginn-primate_BppM1M2'}\hlopt{==}\hlstr{"Y"}\hlstd{,}
\hlstd{tab}\hlopt{$}\hlstr{'dginn-primate_BppM7M8'}\hlopt{==}\hlstr{"Y"}\hlstd{, tab}\hlopt{$}\hlstr{'dginn-primate_codemlM1M2'}\hlopt{==}\hlstr{"Y"}\hlstd{, tab}\hlopt{$}\hlstr{'dginn-primate_codemlM7M8'}\hlopt{==}\hlstr{"Y"}\hlstd{))}
\hlstd{monddata}\hlopt{$}\hlstd{primates_young}\hlkwb{<-}\hlkwd{ifelse}\hlstd{(tab}\hlopt{$}\hlstd{pVal.M8vsM7}\hlopt{<}\hlnum{0.05}\hlstd{,} \hlnum{1}\hlstd{,} \hlnum{0}\hlstd{)}
\hlcom{#monddata$primates_cooper<-ifelse(tab$cooper.primates.M7.M8_p_val<0.05, 1, 0)}
\hlstd{monddata}\hlopt{$}\hlstd{primates_dginn_young}\hlkwb{<-}\hlkwd{ifelse}\hlstd{(dginnyoungtmp}\hlopt{>=}\hlnum{3}\hlstd{,} \hlnum{1}\hlstd{,}\hlnum{0}\hlstd{)}
\hlstd{monddata}\hlopt{$}\hlstd{primates_dginn_full}\hlkwb{<-}\hlkwd{ifelse}\hlstd{(dginnfulltmp}\hlopt{>=}\hlnum{3}\hlstd{,} \hlnum{1}\hlstd{,}\hlnum{0}\hlstd{)}
\hlkwd{mondrian}\hlstd{(}\hlkwd{na.omit}\hlstd{(monddata[,}\hlnum{2}\hlopt{:}\hlnum{4}\hlstd{]),} \hlkwc{labels}\hlstd{=}\hlkwd{c}\hlstd{(}\hlstr{"Young"}\hlstd{,} \hlstr{"DGINN-Young >=3"}\hlstd{,} \hlstr{"DGINN-full >=3"} \hlstd{))}
\end{alltt}
\end{kframe}
\includegraphics[width=\maxwidth]{figure/mondrianprimates-1}
\begin{kframe}\begin{alltt}
\hlcom{#####}
\hlstd{monddata}\hlopt{$}\hlstd{primates_dginn_young}\hlkwb{<-}\hlkwd{ifelse}\hlstd{(dginnyoungtmp}\hlopt{>=}\hlnum{4}\hlstd{,} \hlnum{1}\hlstd{,}\hlnum{0}\hlstd{)}
\hlstd{monddata}\hlopt{$}\hlstd{primates_dginn_full}\hlkwb{<-}\hlkwd{ifelse}\hlstd{(dginnfulltmp}\hlopt{>=}\hlnum{4}\hlstd{,} \hlnum{1}\hlstd{,}\hlnum{0}\hlstd{)}
\hlkwd{mondrian}\hlstd{(}\hlkwd{na.omit}\hlstd{(monddata[,}\hlnum{2}\hlopt{:}\hlnum{4}\hlstd{]),} \hlkwc{labels}\hlstd{=}\hlkwd{c}\hlstd{(}\hlstr{"Young"}\hlstd{,} \hlstr{"DGINN-Young >=4"}\hlstd{,} \hlstr{"DGINN-full >=4"}\hlstd{))}
\end{alltt}
\end{kframe}
\includegraphics[width=\maxwidth]{figure/mondrianprimates-2}
\end{knitrout}
Comparison of results with the same method.
\begin{knitrout}
\definecolor{shadecolor}{rgb}{0.969, 0.969, 0.969}\color{fgcolor}\begin{kframe}
\begin{alltt}
\hlcom{#####}
\hlstd{monddata}\hlopt{$}\hlstd{primates_dginn_young}\hlkwb{<-}\hlstd{tab}\hlopt{$}\hlstd{PosSel_BppM7M8}\hlopt{==}\hlstr{"Y"}
\hlstd{monddata}\hlopt{$}\hlstd{primates_dginn_full}\hlkwb{<-}\hlstd{tab}\hlopt{$}\hlstr{'dginn-primate_codemlM7M8'}\hlopt{==}\hlstr{"Y"}
\hlkwd{mondrian}\hlstd{(}\hlkwd{na.omit}\hlstd{(monddata[,}\hlnum{2}\hlopt{:}\hlnum{4}\hlstd{]),} \hlkwc{labels}\hlstd{=}\hlkwd{c}\hlstd{(}\hlstr{"Young"}\hlstd{,} \hlstr{"DGINN-Young"}\hlstd{,} \hlstr{"DGINN-full"}\hlstd{),} \hlkwc{main}\hlstd{=}\hlstr{"posel codeml M7M8"}\hlstd{)}
\end{alltt}
\end{kframe}
\includegraphics[width=\maxwidth]{figure/unnamed-chunk-8-1}
\end{knitrout}
\subsection{subsetR}
Just another representation of the same result.
\begin{knitrout}
\definecolor{shadecolor}{rgb}{0.969, 0.969, 0.969}\color{fgcolor}\begin{kframe}
\begin{alltt}
\hlkwd{library}\hlstd{(UpSetR)}
\hlstd{upsetdata}\hlkwb{<-}\hlkwd{as.data.frame}\hlstd{(tab}\hlopt{$}\hlstd{Gene.name)}
\hlstd{upsetdata}\hlopt{$}\hlstd{primates_young}\hlkwb{<-}\hlkwd{ifelse}\hlstd{(tab}\hlopt{$}\hlstd{pVal.M8vsM7}\hlopt{<}\hlnum{0.05}\hlstd{,} \hlnum{1}\hlstd{,} \hlnum{0}\hlstd{)}
\hlcom{###}
\hlstd{upsetdata}\hlopt{$}\hlstd{primates_dginn_young}\hlkwb{<-}\hlkwd{ifelse}\hlstd{(dginnyoungtmp}\hlopt{>=}\hlnum{3}\hlstd{,} \hlnum{1}\hlstd{,}\hlnum{0}\hlstd{)}
\hlstd{upsetdata}\hlopt{$}\hlstd{primates_dginn_full}\hlkwb{<-}\hlkwd{ifelse}\hlstd{(dginnfulltmp}\hlopt{>=}\hlnum{3}\hlstd{,} \hlnum{1}\hlstd{,}\hlnum{0}\hlstd{)}
\hlkwd{upset}\hlstd{(}\hlkwd{na.omit}\hlstd{(upsetdata),} \hlkwc{nsets} \hlstd{=} \hlnum{3}\hlstd{,} \hlkwc{matrix.color} \hlstd{=} \hlstr{"#DC267F"}\hlstd{,}
\hlkwc{main.bar.color} \hlstd{=} \hlstr{"#648FFF"}\hlstd{,} \hlkwc{sets.bar.color} \hlstd{=} \hlstr{"#FE6100"}\hlstd{)}
\end{alltt}
\end{kframe}
\includegraphics[width=\maxwidth]{figure/subsetprimates-1}
\begin{kframe}\begin{alltt}
\hlcom{###}
\hlstd{upsetdata}\hlopt{$}\hlstd{primates_dginn_young}\hlkwb{<-}\hlkwd{ifelse}\hlstd{(dginnyoungtmp}\hlopt{>=}\hlnum{4}\hlstd{,} \hlnum{1}\hlstd{,}\hlnum{0}\hlstd{)}
\hlstd{upsetdata}\hlopt{$}\hlstd{primates_dginn_full}\hlkwb{<-}\hlkwd{ifelse}\hlstd{(dginnfulltmp}\hlopt{>=}\hlnum{4}\hlstd{,} \hlnum{1}\hlstd{,}\hlnum{0}\hlstd{)}
\hlkwd{upset}\hlstd{(}\hlkwd{na.omit}\hlstd{(upsetdata),} \hlkwc{nsets} \hlstd{=} \hlnum{3}\hlstd{,} \hlkwc{matrix.color} \hlstd{=} \hlstr{"#DC267F"}\hlstd{,}
\hlkwc{main.bar.color} \hlstd{=} \hlstr{"#648FFF"}\hlstd{,} \hlkwc{sets.bar.color} \hlstd{=} \hlstr{"#FE6100"}\hlstd{)}
\end{alltt}
\end{kframe}
\includegraphics[width=\maxwidth]{figure/subsetprimates-2}
\end{knitrout}
\section{Gene List}
Genes under positive selection for at least 4 methods.
\begin{knitrout}
\definecolor{shadecolor}{rgb}{0.969, 0.969, 0.969}\color{fgcolor}\begin{kframe}
\begin{alltt}
\hlstd{dginnfulltmp}\hlkwb{<-}\hlkwd{rowSums}\hlstd{(}\hlkwd{cbind}\hlstd{(tab}\hlopt{$}\hlstr{'dginn-primate_BUSTED'}\hlopt{==}\hlstr{"Y"}\hlstd{,}
\hlstd{tab}\hlopt{$}\hlstr{'dginn-primate_BppM1M2'}\hlopt{==}\hlstr{"Y"}\hlstd{,}
\hlstd{tab}\hlopt{$}\hlstr{'dginn-primate_BppM7M8'}\hlopt{==}\hlstr{"Y"}\hlstd{,}
\hlstd{tab}\hlopt{$}\hlstr{'dginn-primate_codemlM1M2'}\hlopt{==}\hlstr{"Y"}\hlstd{,}
\hlstd{tab}\hlopt{$}\hlstr{'dginn-primate_codemlM7M8'}\hlopt{==}\hlstr{"Y"}\hlstd{))}
\hlstd{tab}\hlopt{$}\hlstd{Gene.name[dginnfulltmp}\hlopt{>=}\hlnum{4} \hlopt{&} \hlkwd{is.na}\hlstd{(dginnfulltmp)}\hlopt{==}\hlstd{F]}
\end{alltt}
\begin{verbatim}
## [1] "ACADM" "BCS1L" "BRD4" "CDK5RAP2" "CEP135" "CEP68" "CLIP4" "DNMT1" "DPH5" "EMC1" "FYCO1"
## [12] "GCC2" "GGH" "GHITM" "GIGYF2" "GLA" "GOLGA7" "HECTD1" "IDE" "ITGB1" "LARP1" "LARP4B"
## [23] "LMAN2" "MARK1" "MIPOL1" "MPHOSPH10" "MYCBP2" "NDUFAF2" "NDUFB9" "PCNT" "POLA1" "PRIM2" "PRKAR2A"
## [34] "PVR" "REEP6" "RIPK1" "SAAL1" "SEPSECS" "SIRT5" "SLC25A21" "SLC27A2" "TMEM39B" "TOR1AIP1" "TUBGCP2"
## [45] "UBAP2" "UGGT2" "VPS39" "ZNF318"
\end{verbatim}
\begin{alltt}
\hlstd{tab}\hlopt{$}\hlstd{Gene.name[dginnfulltmp}\hlopt{>=}\hlnum{3} \hlopt{&} \hlkwd{is.na}\hlstd{(dginnfulltmp)}\hlopt{==}\hlstd{F]}
\end{alltt}
\begin{verbatim}
## [1] "ACADM" "ADAM9" "AP2A2" "ATE1" "BCS1L" "BRD4" "BZW2" "CDK5RAP2" "CEP135" "CEP68" "CLIP4"
## [12] "CNTRL" "DNMT1" "DPH5" "EDEM3" "EIF4E2" "EMC1" "EXOSC2" "FYCO1" "GCC2" "GGH" "GHITM"
## [23] "GIGYF2" "GLA" "GOLGA7" "GOLGB1" "GORASP1" "HDAC2" "HECTD1" "HS6ST2" "IDE" "ITGB1" "LARP1"
## [34] "LARP4B" "LARP7" "LMAN2" "MARK1" "MDN1" "MIPOL1" "MOV10" "MPHOSPH10" "MRPS5" "MYCBP2" "NAT14"
## [45] "NDUFAF2" "NDUFB9" "NGLY1" "NPC2" "PCNT" "PITRM1" "PLAT" "PLOD2" "PMPCB" "POLA1" "POR"
## [56] "PRIM2" "PRKAR2A" "PTBP2" "PVR" "RAB14" "RAB1A" "RAB2A" "RAP1GDS1" "RBX1" "REEP6" "RIPK1"
## [67] "RPL36" "SAAL1" "SCCPDH" "SEPSECS" "SIRT5" "SLC25A21" "SLC27A2" "STOM" "TIMM8B" "TMEM39B" "TOR1AIP1"
## [78] "TRIM59" "TRMT1" "TUBGCP2" "UBAP2" "UGGT2" "USP54" "VPS39" "ZNF318"
\end{verbatim}
\begin{alltt}
\hlstd{tmp}\hlkwb{<-}\hlstd{tab[dginnfulltmp}\hlopt{>=}\hlnum{4} \hlopt{&} \hlkwd{is.na}\hlstd{(dginnfulltmp)}\hlopt{==}\hlstd{F,}
\hlkwd{c}\hlstd{(}\hlstr{"Gene.name"}\hlstd{,}\hlstr{"dginn-primate_BUSTED"}\hlstd{,} \hlstr{"dginn-primate_BppM1M2"}\hlstd{,}
\hlstr{"dginn-primate_BppM7M8"}\hlstd{,}\hlstr{"dginn-primate_codemlM1M2"}\hlstd{,}\hlstr{"dginn-primate_codemlM7M8"}\hlstd{)]}
\hlkwd{write.table}\hlstd{(tmp,} \hlstr{"geneList_DGINN_full_primate_pos4.txt"}\hlstd{,} \hlkwc{row.names}\hlstd{=F,} \hlkwc{quote}\hlstd{=F)}
\end{alltt}
\end{kframe}
\end{knitrout}
\section{Shiny like}
\begin{knitrout}
\definecolor{shadecolor}{rgb}{0.969, 0.969, 0.969}\color{fgcolor}\begin{kframe}
\begin{alltt}
\hlstd{makeFig1} \hlkwb{<-} \hlkwa{function}\hlstd{(}\hlkwc{df}\hlstd{)\{}
\hlcom{# prepare data for colors etc}
\hlstd{colMethods} \hlkwb{<-} \hlkwd{c}\hlstd{(}\hlstr{"deepskyblue4"}\hlstd{,} \hlstr{"darkorange"} \hlstd{,} \hlstr{"deepskyblue3"} \hlstd{,} \hlstr{"mediumseagreen"} \hlstd{,} \hlstr{"yellow3"} \hlstd{,} \hlstr{"black"}\hlstd{)}
\hlstd{nameMethods} \hlkwb{<-} \hlkwd{c}\hlstd{(}\hlstr{"BUSTED"}\hlstd{,} \hlstr{"BppM1M2"}\hlstd{,} \hlstr{"BppM7M8"}\hlstd{,} \hlstr{"codemlM1M2"}\hlstd{,} \hlstr{"codemlM7M8"}\hlstd{,} \hlstr{"MEME"}\hlstd{)}
\hlstd{metColor} \hlkwb{<-} \hlkwd{data.frame}\hlstd{(}\hlkwc{Name} \hlstd{= nameMethods ,} \hlkwc{Col} \hlstd{= colMethods ,} \hlkwc{stringsAsFactors} \hlstd{=} \hlnum{FALSE}\hlstd{)}
\hlcom{# subset for this specific figure}
\hlcom{#df <- df[df$nbY >= 1, ] # to drop genes found by 0 methods (big datasets)}
\hlstd{xt} \hlkwb{<-} \hlstd{df[,} \hlkwd{c}\hlstd{(}\hlstr{"BUSTED"}\hlstd{,} \hlstr{"BppM1M2"}\hlstd{,} \hlstr{"BppM7M8"}\hlstd{,} \hlstr{"codemlM1M2"}\hlstd{,} \hlstr{"codemlM7M8"}\hlstd{)]}
\hlstd{xt}\hlopt{$}\hlstd{Gene} \hlkwb{<-} \hlstd{df}\hlopt{$}\hlstd{Gene}
\hlstd{nbrMeth} \hlkwb{<-} \hlnum{5}
\hlcom{# reverse order of dataframe so that genes with the most Y are at the bottom (to be on top of the barplot)}
\hlstd{xt[,}\hlnum{1}\hlopt{:}\hlnum{5}\hlstd{]} \hlkwb{<-} \hlkwd{ifelse}\hlstd{(xt[,}\hlnum{1}\hlopt{:}\hlnum{5}\hlstd{]} \hlopt{==} \hlstr{"Y"}\hlstd{,} \hlnum{1}\hlstd{,} \hlnum{0}\hlstd{)}
\hlcom{# sort and Filter the 0 lines}
\hlstd{xt}\hlkwb{<-}\hlstd{xt[}\hlkwd{order}\hlstd{(}\hlkwd{rowSums}\hlstd{(xt[,}\hlnum{1}\hlopt{:}\hlnum{5}\hlstd{])),]}
\hlstd{xt}\hlkwb{<-}\hlstd{xt[}\hlkwd{rowSums}\hlstd{(xt[,}\hlnum{1}\hlopt{:}\hlnum{5}\hlstd{])}\hlopt{>}\hlnum{2}\hlstd{,]}
\hlkwd{row.names}\hlstd{(xt)}\hlkwb{<-}\hlstd{xt}\hlopt{$}\hlstd{Gene}
\hlstd{xt}\hlkwb{<-}\hlstd{xt[,}\hlnum{1}\hlopt{:}\hlnum{5}\hlstd{]}
\hlstd{colFig1} \hlkwb{<-} \hlstd{metColor[}\hlkwd{which}\hlstd{(metColor}\hlopt{$}\hlstd{Name} \hlopt{%in%} \hlkwd{colnames}\hlstd{(xt)) , ]}
\hlcom{##### PART 1 : NUMBER OF METHODS}
\hlkwd{par}\hlstd{(}\hlkwc{xpd} \hlstd{=} \hlnum{NA} \hlstd{,} \hlkwc{mar}\hlstd{=}\hlkwd{c}\hlstd{(}\hlnum{2}\hlstd{,}\hlnum{7}\hlstd{,}\hlnum{4}\hlstd{,}\hlnum{0}\hlstd{) ,} \hlkwc{oma} \hlstd{=} \hlkwd{c}\hlstd{(}\hlnum{0}\hlstd{,}\hlnum{0}\hlstd{,}\hlnum{0}\hlstd{,}\hlnum{0}\hlstd{) ,} \hlkwc{mgp} \hlstd{=} \hlkwd{c}\hlstd{(}\hlnum{3}\hlstd{,}\hlnum{0.3}\hlstd{,}\hlnum{0}\hlstd{))}
\hlstd{h} \hlkwb{=} \hlkwd{barplot}\hlstd{(}
\hlkwd{t}\hlstd{(xt),}
\hlkwc{border} \hlstd{=} \hlnum{NA} \hlstd{,}
\hlkwc{axes} \hlstd{= F ,}
\hlkwc{col} \hlstd{=} \hlkwd{adjustcolor}\hlstd{(colFig1}\hlopt{$}\hlstd{Col,} \hlkwc{alpha.f} \hlstd{=} \hlnum{1}\hlstd{),}
\hlkwc{horiz} \hlstd{= T ,}
\hlkwc{las} \hlstd{=} \hlnum{2} \hlstd{,}
\hlkwc{main} \hlstd{=} \hlstr{"Methods detecting positive selection"} \hlstd{,}
\hlkwc{cex.main} \hlstd{=} \hlnum{0.85}\hlstd{,}
\hlkwc{cex.names} \hlstd{=} \hlkwd{min}\hlstd{(}\hlnum{50}\hlopt{/}\hlkwd{nrow}\hlstd{(xt),} \hlnum{1.5}\hlstd{)}
\hlstd{)}
\hlkwd{axis}\hlstd{(}\hlnum{3}\hlstd{,} \hlkwc{line} \hlstd{=} \hlnum{0}\hlstd{,} \hlkwc{at} \hlstd{=} \hlkwd{c}\hlstd{(}\hlnum{0}\hlopt{:}\hlstd{nbrMeth),} \hlkwc{label} \hlstd{=} \hlkwd{c}\hlstd{(}\hlstr{"0"}\hlstd{,} \hlkwd{rep}\hlstd{(}\hlstr{""}\hlstd{, nbrMeth} \hlopt{-}\hlnum{1}\hlstd{), nbrMeth),} \hlkwc{tck} \hlstd{=} \hlnum{0.02}\hlstd{)}
\hlkwd{legend}\hlstd{(}\hlstr{"bottomleft"}\hlstd{,}
\hlkwc{horiz} \hlstd{= T,}
\hlkwc{border} \hlstd{= colFig1}\hlopt{$}\hlstd{Col,}
\hlkwc{legend} \hlstd{= colFig1}\hlopt{$}\hlstd{Name,}
\hlkwc{fill} \hlstd{= colFig1}\hlopt{$}\hlstd{Col,}
\hlkwc{cex} \hlstd{=} \hlnum{0.8}\hlstd{,}
\hlkwc{bty} \hlstd{=} \hlstr{"n"}\hlstd{,}
\hlkwc{xpd} \hlstd{=} \hlnum{NA}
\hlstd{)}
\hlstd{\}}
\hlstd{df}\hlkwb{<-}\hlkwd{read.delim}\hlstd{(}\hlkwd{paste0}\hlstd{(workdir,}
\hlstr{"/data/DGINN_202005281649summary_cleaned.csv"}\hlstd{),}
\hlkwc{fill}\hlstd{=T,} \hlkwc{h}\hlstd{=T,} \hlkwc{sep}\hlstd{=}\hlstr{","}\hlstd{)}
\hlkwd{makeFig1}\hlstd{(df)}
\end{alltt}
\end{kframe}
\includegraphics[width=\maxwidth]{figure/shiny-1}
\end{knitrout}
\end{document}
\documentclass[11pt, oneside]{article} % use "amsart" instead of "article" for AMSLaTeX format
%\usepackage{geometry} % See geometry.pdf to learn the layout options. There are lots.
%\geometry{letterpaper} % ... or a4paper or a5paper or ...
%\geometry{landscape} % Activate for for rotated page geometry
%\usepackage[parfill]{parskip} % Activate to begin paragraphs with an empty line rather than an indent
%\usepackage{graphicx} % Use pdf, png, jpg, or eps with pdflatex; use eps in DVI mode
% TeX will automatically convert eps --> pdf in pdflatex
%\usepackage{amssymb}
\usepackage[utf8]{inputenc}
%\usepackage[cyr]{aeguill}
%\usepackage[francais]{babel}
%\usepackage{hyperref}
\title{Positive selection on genes interacting with SARS-Cov2, comparison of different analysis}
\author{Marie Cariou}
\date{October 2020} % Activate to display a given date or no date
\begin{document}
\maketitle
\tableofcontents
\newpage
\section{Data}
Analysis were formatted by the script covid\_comp\_script0\_table.Rnw.
<<>>=
home<-"/home/adminmarie/Documents/"
workdir<-paste0(home, "CIRI_BIBS_projects/2020_05_Etienne_covid/")
tab<-read.delim(paste0(workdir,
"covid_comp/covid_comp_complete.txt"), h=T, sep="\t")
dim(tab)
tab$Gene.name<-as.character(tab$Gene.name.x)
tab$Gene.name[tab$PreyGene=="MTARC1"]<-"MTARC1"
@
\section{Comparisons Primates}
\subsection{Janet Young's results (Young-primate) VS DGINN-full's results}
Comparaison des Omega: colonne L "whole.gene.dN.dS.model.0" VS colonne "omega" dans la sortie de dginn.
<<omegaM7M8_1>>=
tab$dginn.primate_omegaM0Bpp[tab$dginn.primate_omegaM0Bpp=="na"]<-NA
tab$dginn.primate_omegaM0Bpp<-as.numeric(as.character(
tab$dginn.primate_omegaM0Bpp))
plot(tab$whole.gene.dN.dS.model.0,
tab$dginn.primate_omegaM0Bpp,
xlab="Omega Young-primate",
ylab="DGINN-full's",
cex=0.3)
abline(0,1)
abline(lm(tab$dginn.primate_omegaM0Bpp~tab$whole.gene.dN.dS.model.0),
col="red")
outlier<-tab[tab$whole.gene.dN.dS.model.0<0.4 &
tab$dginn.primate_omegaM0Bpp>0.5,]
text(x=outlier$whole.gene.dN.dS.model.0,
y=outlier$dginn.primate_omegaM0Bpp,
outlier$Gene.name)
outlier<-tab[tab$whole.gene.dN.dS.model.0<0.1 &
tab$dginn.primate_omegaM0Bpp>0.3,]
text(x=outlier$whole.gene.dN.dS.model.0,
y=outlier$dginn.primate_omegaM0Bpp,
outlier$Gene.name)
outlier<-tab[tab$whole.gene.dN.dS.model.0>0.33 &
tab$dginn.primate_omegaM0Bpp<0.2,]
text(x=outlier$whole.gene.dN.dS.model.0,
y=outlier$dginn.primate_omegaM0Bpp,
outlier$Gene.name)
outlier<-tab[tab$whole.gene.dN.dS.model.0>0.6 &
tab$dginn.primate_omegaM0Bpp<0.6,]
text(x=outlier$whole.gene.dN.dS.model.0,
y=outlier$dginn.primate_omegaM0Bpp,
outlier$Gene.name)
@
\subsection{Janet Young's results (Young-primate) VS Cooper's result}
Comparaison des Omega: colonne L "whole.gene.dN.dS.model.0" VS colonne "cooper.primates.Average\_dNdS".
<<omegaM7M8_2>>=
tab$cooper.primates.Average_dNdS<-as.numeric(as.character(
tab$cooper.primates.Average_dNdS))
plot(tab$whole.gene.dN.dS.model.0,
tab$cooper.primates.Average_dNdS,
xlab="Omega Young-primate",
ylab="Omega Cooper-primate",
cex=0.3)
abline(0,1)
abline(lm(tab$cooper.primates.Average_dNdS~tab$whole.gene.dN.dS.model.0),
col="red")
outlier<-tab[tab$whole.gene.dN.dS.model.0<0.15 &
tab$cooper.primates.Average_dNdS>0.4,]
text(x=outlier$whole.gene.dN.dS.model.0,
y=outlier$cooper.primates.Average_dNdS,
outlier$Gene.name, cex=0.5)
outlier<-tab[tab$whole.gene.dN.dS.model.0<0.3 &
tab$cooper.primates.Average_dNdS>0.5,]
text(x=outlier$whole.gene.dN.dS.model.0,
y=outlier$cooper.primates.Average_dNdS,
outlier$Gene.name, cex=0.5)
outlier<-tab[tab$whole.gene.dN.dS.model.0>0.3 &
tab$cooper.primates.Average_dNdS<0.1,]
text(x=outlier$whole.gene.dN.dS.model.0,
y=outlier$cooper.primates.Average_dNdS,
outlier$Gene.name, cex=0.5)
@
\subsection{Cooper's results (Cooper-primate) VS DGINN-full's results}
Comparaison des Omega: colonne "cooper.primates.Average\_dNdS" VS colonne "omega" dans la sortie de dginn.
<<omegaM7M8_3>>=
plot(tab$cooper.primates.Average_dNd,
tab$dginn.primate_omegaM0Bpp,
xlab="Omega Cooper-primate",
ylab="DGINN-full's",
cex=0.3)
abline(0,1)
abline(lm(tab$dginn.primate_omegaM0Bpp~tab$cooper.primates.Average_dNd), col="red")
outlier<-tab[tab$cooper.primates.Average_dNd<0.4 &
tab$dginn.primate_omegaM0Bpp>0.5,]
text(x=outlier$cooper.primates.Average_dNd,
y=outlier$dginn.primate_omegaM0Bpp,
outlier$Gene.name, cex=0.5)
outlier<-tab[tab$cooper.primates.Average_dNd<0.1 &
tab$dginn.primate_omegaM0Bpp>0.3,]
text(x=outlier$cooper.primates.Average_dNd,
y=outlier$dginn.primate_omegaM0Bpp,
outlier$Gene.name, cex=0.5)
outlier<-tab[tab$cooper.primates.Average_dNd>0.7 &
tab$dginn.primate_omegaM0Bpp<0.3,]
text(x=outlier$cooper.primates.Average_dNd,
y=outlier$dginn.primate_omegaM0Bpp,
outlier$Gene.name, cex=0.5)
outlier<-tab[tab$cooper.primates.Average_dNd>0.45 &
tab$dginn.primate_omegaM0Bpp<0.2,]
text(x=outlier$cooper.primates.Average_dNd,
y=outlier$dginn.primate_omegaM0Bpp,
outlier$Gene.name, cex=0.5)
@
\section{Overlap}
\subsection{Mondrian}
<<mondrianprimates>>=
library(Mondrian)
monddata<-as.data.frame(tab$Gene.name)
dim(monddata)
dginnfulltmp<-rowSums(cbind(tab$dginn.primate_BUSTED=="Y",
tab$dginn.primate_BppM1M2=="Y",
tab$dginn.primate_BppM7M8=="Y",
tab$dginn.primate_codemlM1M2=="Y",
tab$dginn.primate_codemlM7M8=="Y"))
monddata$primates_young<-ifelse(
tab$pVal.M8vsM7<0.05, 1, 0)
monddata$primate_cooper<-ifelse(
tab$cooper.primates.M7.M8_p_value<0.05, 1, 0)
monddata$primates_dginn_full<-ifelse(
dginnfulltmp>=3, 1,0)
mondrian(na.omit(monddata[,2:4]),
labels=c("Young", "Cooper", "DGINN-full >=3" ))
monddata$primates_dginn_full<-ifelse(
dginnfulltmp>=4, 1,0)
mondrian(na.omit(monddata[,2:4]),
labels=c("Young", "Cooper", "DGINN-full >=4"))
@
\subsection{subsetR}
Just another representation of the same result.
<<subsetprimates>>=
library(UpSetR)
upsetdata<-as.data.frame(tab$Gene.name)
upsetdata$primates_young<-ifelse(tab$pVal.M8vsM7<0.05, 1, 0)
upsetdata$primate_cooper<-ifelse(
tab$cooper.primates.M7.M8_p_value<0.05, 1, 0)
upsetdata$primates_dginn_full<-ifelse(dginnfulltmp>=3, 1,0)
upset(na.omit(upsetdata), nsets = 3, matrix.color = "#DC267F",
main.bar.color = "#648FFF", sets.bar.color = "#FE6100")
###
upsetdata$primates_dginn_full<-ifelse(dginnfulltmp>=4, 1,0)
upset(na.omit(upsetdata), nsets = 3, matrix.color = "#DC267F",
main.bar.color = "#648FFF", sets.bar.color = "#FE6100")
@
\section{Gene List}
Genes under positive selection for at least 4 methods.
<<>>=
dginnfulltmp<-rowSums(cbind(tab$dginn.primate_BUSTED=="Y",
tab$dginn.primate_BppM1M2=="Y",
tab$dginn.primate_BppM7M8=="Y",
tab$dginn.primate_codemlM1M2=="Y",
tab$dginn.primate_codemlM7M8=="Y"))
tab$Gene.name[dginnfulltmp>=4 & is.na(dginnfulltmp)==F]
tab$Gene.name[dginnfulltmp>=3 & is.na(dginnfulltmp)==F]
tmp<-tab[dginnfulltmp>=4 & is.na(dginnfulltmp)==F,
c("Gene.name","dginn.primate_BUSTED", "dginn.primate_BppM1M2",
"dginn.primate_BppM7M8","dginn.primate_codemlM1M2","dginn.primate_codemlM7M8")]
write.table(tmp, "geneList_DGINN_full_primate_pos4.txt", row.names=F, quote=F)
@
\section{Shiny like}
<<shiny, fig.height=11>>=
makeFig1 <- function(df){
# prepare data for colors etc
colMethods <- c("deepskyblue4", "darkorange" , "deepskyblue3" , "mediumseagreen" , "yellow3" , "black")
nameMethods <- c("BUSTED", "BppM1M2", "BppM7M8", "codemlM1M2", "codemlM7M8", "MEME")
metColor <- data.frame(Name = nameMethods , Col = colMethods , stringsAsFactors = FALSE)
# subset for this specific figure
#df <- df[df$nbY >= 1, ] # to drop genes found by 0 methods (big datasets)
xt <- df[, c("BUSTED", "BppM1M2", "BppM7M8", "codemlM1M2", "codemlM7M8")]
xt$Gene <- df$Gene
nbrMeth <- 5
# reverse order of dataframe so that genes with the most Y are at the bottom (to be on top of the barplot)
xt[,1:5] <- ifelse(xt[,1:5] == "Y", 1, 0)
# sort and Filter the 0 lines
xt<-xt[order(rowSums(xt[,1:5])),]
xt<-xt[rowSums(xt[,1:5])>2,]
row.names(xt)<-xt$Gene
xt<-xt[,1:5]
colFig1 <- metColor[which(metColor$Name %in% colnames(xt)) , ]
##### PART 1 : NUMBER OF METHODS
par(xpd = NA , mar=c(2,7,4,0) , oma = c(0,0,0,0) , mgp = c(3,0.3,0))
h = barplot(
t(xt),
border = NA ,
axes = F ,
col = adjustcolor(colFig1$Col, alpha.f = 1),
horiz = T ,
las = 2 ,
main = "Methods detecting positive selection" ,
cex.main = 0.85,
cex.names = min(50/nrow(xt), 1.5)
)
axis(3, line = 0, at = c(0:nbrMeth), label = c("0", rep("", nbrMeth -1), nbrMeth), tck = 0.02)
legend("bottomleft",
horiz = T,
border = colFig1$Col,
legend = colFig1$Name,
fill = colFig1$Col,
cex = 0.8,
bty = "n",
xpd = NA
)
}
@
<<>>=
source("covid_comp_shiny.R")
df<-read.delim(paste0(workdir,
"/data/DGINN_202005281649summary_cleaned.csv"),
fill=T, h=T, sep=",")
names(df)
dftmp<-tab[,c("File", "Name", "Gene.name",
"GeneSize", "dginn.primate_NbSpecies", "dginn.primate_omegaM0Bpp",
"dginn.primate_omegaM0codeml", "dginn.primate_BUSTED", "dginn.primate_BUSTED.p.value",
"dginn.primate_MEME.NbSites", "dginn.primate_MEME.PSS", "dginn.primate_BppM1M2",
"dginn.primate_BppM1M2.p.value", "dginn.primate_BppM1M2.NbSites", "dginn.primate_BppM1M2.PSS",
"dginn.primate_BppM7M8", "dginn.primate_BppM7M8.p.value", "dginn.primate_BppM7M8.NbSites",
"dginn.primate_BppM7M8.PSS", "dginn.primate_codemlM1M2", "dginn.primate_codemlM1M2.p.value",
"dginn.primate_codemlM1M2.NbSites","dginn.primate_codemlM1M2.PSS", "dginn.primate_codemlM7M8",
"dginn.primate_codemlM7M8.p.value", "dginn.primate_codemlM7M8.NbSites" , "dginn.primate_codemlM7M8.PSS")]
names(dftmp)<-names(df)
makeFig1(dftmp)
@
\end{document}
File deleted
\documentclass[11pt, oneside]{article}\usepackage[]{graphicx}\usepackage[]{color}
% maxwidth is the original width if it is less than linewidth
% otherwise use linewidth (to make sure the graphics do not exceed the margin)
\makeatletter
\def\maxwidth{ %
\ifdim\Gin@nat@width>\linewidth
\linewidth
\else
\Gin@nat@width
\fi
}
\makeatother
\definecolor{fgcolor}{rgb}{0.345, 0.345, 0.345}
\newcommand{\hlnum}[1]{\textcolor[rgb]{0.686,0.059,0.569}{#1}}%
\newcommand{\hlstr}[1]{\textcolor[rgb]{0.192,0.494,0.8}{#1}}%
\newcommand{\hlcom}[1]{\textcolor[rgb]{0.678,0.584,0.686}{\textit{#1}}}%
\newcommand{\hlopt}[1]{\textcolor[rgb]{0,0,0}{#1}}%
\newcommand{\hlstd}[1]{\textcolor[rgb]{0.345,0.345,0.345}{#1}}%
\newcommand{\hlkwa}[1]{\textcolor[rgb]{0.161,0.373,0.58}{\textbf{#1}}}%
\newcommand{\hlkwb}[1]{\textcolor[rgb]{0.69,0.353,0.396}{#1}}%
\newcommand{\hlkwc}[1]{\textcolor[rgb]{0.333,0.667,0.333}{#1}}%
\newcommand{\hlkwd}[1]{\textcolor[rgb]{0.737,0.353,0.396}{\textbf{#1}}}%
\let\hlipl\hlkwb
\usepackage{framed}
\makeatletter
\newenvironment{kframe}{%
\def\at@end@of@kframe{}%
\ifinner\ifhmode%
\def\at@end@of@kframe{\end{minipage}}%
\begin{minipage}{\columnwidth}%
\fi\fi%
\def\FrameCommand##1{\hskip\@totalleftmargin \hskip-\fboxsep
\colorbox{shadecolor}{##1}\hskip-\fboxsep
% There is no \\@totalrightmargin, so:
\hskip-\linewidth \hskip-\@totalleftmargin \hskip\columnwidth}%
\MakeFramed {\advance\hsize-\width
\@totalleftmargin\z@ \linewidth\hsize
\@setminipage}}%
{\par\unskip\endMakeFramed%
\at@end@of@kframe}
\makeatother
\definecolor{shadecolor}{rgb}{.97, .97, .97}
\definecolor{messagecolor}{rgb}{0, 0, 0}
\definecolor{warningcolor}{rgb}{1, 0, 1}
\definecolor{errorcolor}{rgb}{1, 0, 0}
\newenvironment{knitrout}{}{} % an empty environment to be redefined in TeX
\usepackage{alltt} % use "amsart" instead of "article" for AMSLaTeX format
%\usepackage{geometry} % See geometry.pdf to learn the layout options. There are lots.
%\geometry{letterpaper} % ... or a4paper or a5paper or ...
%\geometry{landscape} % Activate for for rotated page geometry
%\usepackage[parfill]{parskip} % Activate to begin paragraphs with an empty line rather than an indent
%\usepackage{graphicx} % Use pdf, png, jpg, or eps with pdflatex; use eps in DVI mode
% TeX will automatically convert eps --> pdf in pdflatex
%\usepackage{amssymb}
\usepackage[utf8]{inputenc}
%\usepackage[cyr]{aeguill}
%\usepackage[francais]{babel}
%\usepackage{hyperref}
\title{Positive selection on genes interacting with SARS-Cov2, comparison of different analysis}
\author{Marie Cariou}
\date{October 2020} % Activate to display a given date or no date
\IfFileExists{upquote.sty}{\usepackage{upquote}}{}
\begin{document}
\maketitle
\tableofcontents
\newpage
\section{Data}
Analysis were formatted by the script covid\_comp\_script0\_table.Rnw.
\begin{knitrout}
\definecolor{shadecolor}{rgb}{0.969, 0.969, 0.969}\color{fgcolor}\begin{kframe}
\begin{alltt}
\hlstd{home}\hlkwb{<-}\hlstr{"/home/adminmarie/Documents/"}
\hlstd{workdir}\hlkwb{<-}\hlkwd{paste0}\hlstd{(home,} \hlstr{"CIRI_BIBS_projects/2020_05_Etienne_covid/"}\hlstd{)}
\hlstd{tab}\hlkwb{<-}\hlkwd{read.delim}\hlstd{(}\hlkwd{paste0}\hlstd{(workdir,}
\hlstr{"covid_comp/covid_comp_complete.txt"}\hlstd{),} \hlkwc{h}\hlstd{=T,} \hlkwc{sep}\hlstd{=}\hlstr{"\textbackslash{}t"}\hlstd{)}
\hlkwd{dim}\hlstd{(tab)}
\end{alltt}
\begin{verbatim}
## [1] 332 141
\end{verbatim}
\begin{alltt}
\hlstd{tab}\hlopt{$}\hlstd{Gene.name}\hlkwb{<-}\hlkwd{as.character}\hlstd{(tab}\hlopt{$}\hlstd{Gene.name.x)}
\hlstd{tab}\hlopt{$}\hlstd{Gene.name[tab}\hlopt{$}\hlstd{PreyGene}\hlopt{==}\hlstr{"MTARC1"}\hlstd{]}\hlkwb{<-}\hlstr{"MTARC1"}
\end{alltt}
\end{kframe}
\end{knitrout}
\section{Comparisons Primates}
\subsection{Janet Young's results (Young-primate) VS DGINN-full's results}
Comparaison des Omega: colonne L "whole.gene.dN.dS.model.0" VS colonne "omega" dans la sortie de dginn.
\begin{knitrout}
\definecolor{shadecolor}{rgb}{0.969, 0.969, 0.969}\color{fgcolor}\begin{kframe}
\begin{alltt}
\hlstd{tab}\hlopt{$}\hlstd{dginn.primate_omegaM0Bpp[tab}\hlopt{$}\hlstd{dginn.primate_omegaM0Bpp}\hlopt{==}\hlstr{"na"}\hlstd{]}\hlkwb{<-}\hlnum{NA}
\hlstd{tab}\hlopt{$}\hlstd{dginn.primate_omegaM0Bpp}\hlkwb{<-}\hlkwd{as.numeric}\hlstd{(}\hlkwd{as.character}\hlstd{(}
\hlstd{tab}\hlopt{$}\hlstd{dginn.primate_omegaM0Bpp))}
\hlkwd{plot}\hlstd{(tab}\hlopt{$}\hlstd{whole.gene.dN.dS.model.0,}
\hlstd{tab}\hlopt{$}\hlstd{dginn.primate_omegaM0Bpp,}
\hlkwc{xlab}\hlstd{=}\hlstr{"Omega Young-primate"}\hlstd{,}
\hlkwc{ylab}\hlstd{=}\hlstr{"DGINN-full's"}\hlstd{,}
\hlkwc{cex}\hlstd{=}\hlnum{0.3}\hlstd{)}
\hlkwd{abline}\hlstd{(}\hlnum{0}\hlstd{,}\hlnum{1}\hlstd{)}
\hlkwd{abline}\hlstd{(}\hlkwd{lm}\hlstd{(tab}\hlopt{$}\hlstd{dginn.primate_omegaM0Bpp}\hlopt{~}\hlstd{tab}\hlopt{$}\hlstd{whole.gene.dN.dS.model.0),}
\hlkwc{col}\hlstd{=}\hlstr{"red"}\hlstd{)}
\hlstd{outlier}\hlkwb{<-}\hlstd{tab[tab}\hlopt{$}\hlstd{whole.gene.dN.dS.model.0}\hlopt{<}\hlnum{0.4} \hlopt{&}
\hlstd{tab}\hlopt{$}\hlstd{dginn.primate_omegaM0Bpp}\hlopt{>}\hlnum{0.5}\hlstd{,]}
\hlkwd{text}\hlstd{(}\hlkwc{x}\hlstd{=outlier}\hlopt{$}\hlstd{whole.gene.dN.dS.model.0,}
\hlkwc{y}\hlstd{=outlier}\hlopt{$}\hlstd{dginn.primate_omegaM0Bpp,}
\hlstd{outlier}\hlopt{$}\hlstd{Gene.name)}
\hlstd{outlier}\hlkwb{<-}\hlstd{tab[tab}\hlopt{$}\hlstd{whole.gene.dN.dS.model.0}\hlopt{<}\hlnum{0.1} \hlopt{&}
\hlstd{tab}\hlopt{$}\hlstd{dginn.primate_omegaM0Bpp}\hlopt{>}\hlnum{0.3}\hlstd{,]}
\hlkwd{text}\hlstd{(}\hlkwc{x}\hlstd{=outlier}\hlopt{$}\hlstd{whole.gene.dN.dS.model.0,}
\hlkwc{y}\hlstd{=outlier}\hlopt{$}\hlstd{dginn.primate_omegaM0Bpp,}
\hlstd{outlier}\hlopt{$}\hlstd{Gene.name)}
\hlstd{outlier}\hlkwb{<-}\hlstd{tab[tab}\hlopt{$}\hlstd{whole.gene.dN.dS.model.0}\hlopt{>}\hlnum{0.33} \hlopt{&}
\hlstd{tab}\hlopt{$}\hlstd{dginn.primate_omegaM0Bpp}\hlopt{<}\hlnum{0.2}\hlstd{,]}
\hlkwd{text}\hlstd{(}\hlkwc{x}\hlstd{=outlier}\hlopt{$}\hlstd{whole.gene.dN.dS.model.0,}
\hlkwc{y}\hlstd{=outlier}\hlopt{$}\hlstd{dginn.primate_omegaM0Bpp,}
\hlstd{outlier}\hlopt{$}\hlstd{Gene.name)}
\hlstd{outlier}\hlkwb{<-}\hlstd{tab[tab}\hlopt{$}\hlstd{whole.gene.dN.dS.model.0}\hlopt{>}\hlnum{0.6} \hlopt{&}
\hlstd{tab}\hlopt{$}\hlstd{dginn.primate_omegaM0Bpp}\hlopt{<}\hlnum{0.6}\hlstd{,]}
\hlkwd{text}\hlstd{(}\hlkwc{x}\hlstd{=outlier}\hlopt{$}\hlstd{whole.gene.dN.dS.model.0,}
\hlkwc{y}\hlstd{=outlier}\hlopt{$}\hlstd{dginn.primate_omegaM0Bpp,}
\hlstd{outlier}\hlopt{$}\hlstd{Gene.name)}
\end{alltt}
\end{kframe}
\includegraphics[width=\maxwidth]{figure/omegaM7M8_1-1}
\end{knitrout}
\subsection{Janet Young's results (Young-primate) VS Cooper's result}
Comparaison des Omega: colonne L "whole.gene.dN.dS.model.0" VS colonne "cooper.primates.Average\_dNdS".
\begin{knitrout}
\definecolor{shadecolor}{rgb}{0.969, 0.969, 0.969}\color{fgcolor}\begin{kframe}
\begin{alltt}
\hlstd{tab}\hlopt{$}\hlstd{cooper.primates.Average_dNdS}\hlkwb{<-}\hlkwd{as.numeric}\hlstd{(}\hlkwd{as.character}\hlstd{(}
\hlstd{tab}\hlopt{$}\hlstd{cooper.primates.Average_dNdS))}
\hlkwd{plot}\hlstd{(tab}\hlopt{$}\hlstd{whole.gene.dN.dS.model.0,}
\hlstd{tab}\hlopt{$}\hlstd{cooper.primates.Average_dNdS,}
\hlkwc{xlab}\hlstd{=}\hlstr{"Omega Young-primate"}\hlstd{,}
\hlkwc{ylab}\hlstd{=}\hlstr{"Omega Cooper-primate"}\hlstd{,}
\hlkwc{cex}\hlstd{=}\hlnum{0.3}\hlstd{)}
\hlkwd{abline}\hlstd{(}\hlnum{0}\hlstd{,}\hlnum{1}\hlstd{)}
\hlkwd{abline}\hlstd{(}\hlkwd{lm}\hlstd{(tab}\hlopt{$}\hlstd{cooper.primates.Average_dNdS}\hlopt{~}\hlstd{tab}\hlopt{$}\hlstd{whole.gene.dN.dS.model.0),}
\hlkwc{col}\hlstd{=}\hlstr{"red"}\hlstd{)}
\hlstd{outlier}\hlkwb{<-}\hlstd{tab[tab}\hlopt{$}\hlstd{whole.gene.dN.dS.model.0}\hlopt{<}\hlnum{0.15} \hlopt{&}
\hlstd{tab}\hlopt{$}\hlstd{cooper.primates.Average_dNdS}\hlopt{>}\hlnum{0.4}\hlstd{,]}
\hlkwd{text}\hlstd{(}\hlkwc{x}\hlstd{=outlier}\hlopt{$}\hlstd{whole.gene.dN.dS.model.0,}
\hlkwc{y}\hlstd{=outlier}\hlopt{$}\hlstd{cooper.primates.Average_dNdS,}
\hlstd{outlier}\hlopt{$}\hlstd{Gene.name,} \hlkwc{cex}\hlstd{=}\hlnum{0.5}\hlstd{)}
\hlstd{outlier}\hlkwb{<-}\hlstd{tab[tab}\hlopt{$}\hlstd{whole.gene.dN.dS.model.0}\hlopt{<}\hlnum{0.3} \hlopt{&}
\hlstd{tab}\hlopt{$}\hlstd{cooper.primates.Average_dNdS}\hlopt{>}\hlnum{0.5}\hlstd{,]}
\hlkwd{text}\hlstd{(}\hlkwc{x}\hlstd{=outlier}\hlopt{$}\hlstd{whole.gene.dN.dS.model.0,}
\hlkwc{y}\hlstd{=outlier}\hlopt{$}\hlstd{cooper.primates.Average_dNdS,}
\hlstd{outlier}\hlopt{$}\hlstd{Gene.name,} \hlkwc{cex}\hlstd{=}\hlnum{0.5}\hlstd{)}
\hlstd{outlier}\hlkwb{<-}\hlstd{tab[tab}\hlopt{$}\hlstd{whole.gene.dN.dS.model.0}\hlopt{>}\hlnum{0.3} \hlopt{&}
\hlstd{tab}\hlopt{$}\hlstd{cooper.primates.Average_dNdS}\hlopt{<}\hlnum{0.1}\hlstd{,]}
\hlkwd{text}\hlstd{(}\hlkwc{x}\hlstd{=outlier}\hlopt{$}\hlstd{whole.gene.dN.dS.model.0,}
\hlkwc{y}\hlstd{=outlier}\hlopt{$}\hlstd{cooper.primates.Average_dNdS,}
\hlstd{outlier}\hlopt{$}\hlstd{Gene.name,} \hlkwc{cex}\hlstd{=}\hlnum{0.5}\hlstd{)}
\end{alltt}
\end{kframe}
\includegraphics[width=\maxwidth]{figure/omegaM7M8_2-1}
\end{knitrout}
\subsection{Cooper's results (Cooper-primate) VS DGINN-full's results}
Comparaison des Omega: colonne "cooper.primates.Average\_dNdS" VS colonne "omega" dans la sortie de dginn.
\begin{knitrout}
\definecolor{shadecolor}{rgb}{0.969, 0.969, 0.969}\color{fgcolor}\begin{kframe}
\begin{alltt}
\hlkwd{plot}\hlstd{(tab}\hlopt{$}\hlstd{cooper.primates.Average_dNd,}
\hlstd{tab}\hlopt{$}\hlstd{dginn.primate_omegaM0Bpp,}
\hlkwc{xlab}\hlstd{=}\hlstr{"Omega Cooper-primate"}\hlstd{,}
\hlkwc{ylab}\hlstd{=}\hlstr{"DGINN-full's"}\hlstd{,}
\hlkwc{cex}\hlstd{=}\hlnum{0.3}\hlstd{)}
\hlkwd{abline}\hlstd{(}\hlnum{0}\hlstd{,}\hlnum{1}\hlstd{)}
\hlkwd{abline}\hlstd{(}\hlkwd{lm}\hlstd{(tab}\hlopt{$}\hlstd{dginn.primate_omegaM0Bpp}\hlopt{~}\hlstd{tab}\hlopt{$}\hlstd{cooper.primates.Average_dNd),} \hlkwc{col}\hlstd{=}\hlstr{"red"}\hlstd{)}
\hlstd{outlier}\hlkwb{<-}\hlstd{tab[tab}\hlopt{$}\hlstd{cooper.primates.Average_dNd}\hlopt{<}\hlnum{0.4} \hlopt{&}
\hlstd{tab}\hlopt{$}\hlstd{dginn.primate_omegaM0Bpp}\hlopt{>}\hlnum{0.5}\hlstd{,]}
\hlkwd{text}\hlstd{(}\hlkwc{x}\hlstd{=outlier}\hlopt{$}\hlstd{cooper.primates.Average_dNd,}
\hlkwc{y}\hlstd{=outlier}\hlopt{$}\hlstd{dginn.primate_omegaM0Bpp,}
\hlstd{outlier}\hlopt{$}\hlstd{Gene.name,} \hlkwc{cex}\hlstd{=}\hlnum{0.5}\hlstd{)}
\hlstd{outlier}\hlkwb{<-}\hlstd{tab[tab}\hlopt{$}\hlstd{cooper.primates.Average_dNd}\hlopt{<}\hlnum{0.1} \hlopt{&}
\hlstd{tab}\hlopt{$}\hlstd{dginn.primate_omegaM0Bpp}\hlopt{>}\hlnum{0.3}\hlstd{,]}
\hlkwd{text}\hlstd{(}\hlkwc{x}\hlstd{=outlier}\hlopt{$}\hlstd{cooper.primates.Average_dNd,}
\hlkwc{y}\hlstd{=outlier}\hlopt{$}\hlstd{dginn.primate_omegaM0Bpp,}
\hlstd{outlier}\hlopt{$}\hlstd{Gene.name,} \hlkwc{cex}\hlstd{=}\hlnum{0.5}\hlstd{)}
\hlstd{outlier}\hlkwb{<-}\hlstd{tab[tab}\hlopt{$}\hlstd{cooper.primates.Average_dNd}\hlopt{>}\hlnum{0.7} \hlopt{&}
\hlstd{tab}\hlopt{$}\hlstd{dginn.primate_omegaM0Bpp}\hlopt{<}\hlnum{0.3}\hlstd{,]}
\hlkwd{text}\hlstd{(}\hlkwc{x}\hlstd{=outlier}\hlopt{$}\hlstd{cooper.primates.Average_dNd,}
\hlkwc{y}\hlstd{=outlier}\hlopt{$}\hlstd{dginn.primate_omegaM0Bpp,}
\hlstd{outlier}\hlopt{$}\hlstd{Gene.name,} \hlkwc{cex}\hlstd{=}\hlnum{0.5}\hlstd{)}
\hlstd{outlier}\hlkwb{<-}\hlstd{tab[tab}\hlopt{$}\hlstd{cooper.primates.Average_dNd}\hlopt{>}\hlnum{0.45} \hlopt{&}
\hlstd{tab}\hlopt{$}\hlstd{dginn.primate_omegaM0Bpp}\hlopt{<}\hlnum{0.2}\hlstd{,]}
\hlkwd{text}\hlstd{(}\hlkwc{x}\hlstd{=outlier}\hlopt{$}\hlstd{cooper.primates.Average_dNd,}
\hlkwc{y}\hlstd{=outlier}\hlopt{$}\hlstd{dginn.primate_omegaM0Bpp,}
\hlstd{outlier}\hlopt{$}\hlstd{Gene.name,} \hlkwc{cex}\hlstd{=}\hlnum{0.5}\hlstd{)}
\end{alltt}
\end{kframe}
\includegraphics[width=\maxwidth]{figure/omegaM7M8_3-1}
\end{knitrout}
\section{Overlap}
\subsection{Mondrian}
\begin{knitrout}
\definecolor{shadecolor}{rgb}{0.969, 0.969, 0.969}\color{fgcolor}\begin{kframe}
\begin{alltt}
\hlkwd{library}\hlstd{(Mondrian)}
\hlstd{monddata}\hlkwb{<-}\hlkwd{as.data.frame}\hlstd{(tab}\hlopt{$}\hlstd{Gene.name)}
\hlkwd{dim}\hlstd{(monddata)}
\end{alltt}
\begin{verbatim}
## [1] 332 1
\end{verbatim}
\begin{alltt}
\hlstd{dginnfulltmp}\hlkwb{<-}\hlkwd{rowSums}\hlstd{(}\hlkwd{cbind}\hlstd{(tab}\hlopt{$}\hlstd{dginn.primate_BUSTED}\hlopt{==}\hlstr{"Y"}\hlstd{,}
\hlstd{tab}\hlopt{$}\hlstd{dginn.primate_BppM1M2}\hlopt{==}\hlstr{"Y"}\hlstd{,}
\hlstd{tab}\hlopt{$}\hlstd{dginn.primate_BppM7M8}\hlopt{==}\hlstr{"Y"}\hlstd{,}
\hlstd{tab}\hlopt{$}\hlstd{dginn.primate_codemlM1M2}\hlopt{==}\hlstr{"Y"}\hlstd{,}
\hlstd{tab}\hlopt{$}\hlstd{dginn.primate_codemlM7M8}\hlopt{==}\hlstr{"Y"}\hlstd{))}
\hlstd{monddata}\hlopt{$}\hlstd{primates_young}\hlkwb{<-}\hlkwd{ifelse}\hlstd{(}
\hlstd{tab}\hlopt{$}\hlstd{pVal.M8vsM7}\hlopt{<}\hlnum{0.05}\hlstd{,} \hlnum{1}\hlstd{,} \hlnum{0}\hlstd{)}
\hlstd{monddata}\hlopt{$}\hlstd{primate_cooper}\hlkwb{<-}\hlkwd{ifelse}\hlstd{(}
\hlstd{tab}\hlopt{$}\hlstd{cooper.primates.M7.M8_p_value}\hlopt{<}\hlnum{0.05}\hlstd{,} \hlnum{1}\hlstd{,} \hlnum{0}\hlstd{)}
\hlstd{monddata}\hlopt{$}\hlstd{primates_dginn_full}\hlkwb{<-}\hlkwd{ifelse}\hlstd{(}
\hlstd{dginnfulltmp}\hlopt{>=}\hlnum{3}\hlstd{,} \hlnum{1}\hlstd{,}\hlnum{0}\hlstd{)}
\hlkwd{mondrian}\hlstd{(}\hlkwd{na.omit}\hlstd{(monddata[,}\hlnum{2}\hlopt{:}\hlnum{4}\hlstd{]),}
\hlkwc{labels}\hlstd{=}\hlkwd{c}\hlstd{(}\hlstr{"Young"}\hlstd{,} \hlstr{"Cooper"}\hlstd{,} \hlstr{"DGINN-full >=3"} \hlstd{))}
\end{alltt}
\end{kframe}
\includegraphics[width=\maxwidth]{figure/mondrianprimates-1}
\begin{kframe}\begin{alltt}
\hlstd{monddata}\hlopt{$}\hlstd{primates_dginn_full}\hlkwb{<-}\hlkwd{ifelse}\hlstd{(}
\hlstd{dginnfulltmp}\hlopt{>=}\hlnum{4}\hlstd{,} \hlnum{1}\hlstd{,}\hlnum{0}\hlstd{)}
\hlkwd{mondrian}\hlstd{(}\hlkwd{na.omit}\hlstd{(monddata[,}\hlnum{2}\hlopt{:}\hlnum{4}\hlstd{]),}
\hlkwc{labels}\hlstd{=}\hlkwd{c}\hlstd{(}\hlstr{"Young"}\hlstd{,} \hlstr{"Cooper"}\hlstd{,} \hlstr{"DGINN-full >=4"}\hlstd{))}
\end{alltt}
\end{kframe}
\includegraphics[width=\maxwidth]{figure/mondrianprimates-2}
\end{knitrout}
\subsection{subsetR}
Just another representation of the same result.
\begin{knitrout}
\definecolor{shadecolor}{rgb}{0.969, 0.969, 0.969}\color{fgcolor}\begin{kframe}
\begin{alltt}
\hlkwd{library}\hlstd{(UpSetR)}
\hlstd{upsetdata}\hlkwb{<-}\hlkwd{as.data.frame}\hlstd{(tab}\hlopt{$}\hlstd{Gene.name)}
\hlstd{upsetdata}\hlopt{$}\hlstd{primates_young}\hlkwb{<-}\hlkwd{ifelse}\hlstd{(tab}\hlopt{$}\hlstd{pVal.M8vsM7}\hlopt{<}\hlnum{0.05}\hlstd{,} \hlnum{1}\hlstd{,} \hlnum{0}\hlstd{)}
\hlstd{upsetdata}\hlopt{$}\hlstd{primate_cooper}\hlkwb{<-}\hlkwd{ifelse}\hlstd{(}
\hlstd{tab}\hlopt{$}\hlstd{cooper.primates.M7.M8_p_value}\hlopt{<}\hlnum{0.05}\hlstd{,} \hlnum{1}\hlstd{,} \hlnum{0}\hlstd{)}
\hlstd{upsetdata}\hlopt{$}\hlstd{primates_dginn_full}\hlkwb{<-}\hlkwd{ifelse}\hlstd{(dginnfulltmp}\hlopt{>=}\hlnum{3}\hlstd{,} \hlnum{1}\hlstd{,}\hlnum{0}\hlstd{)}
\hlkwd{upset}\hlstd{(}\hlkwd{na.omit}\hlstd{(upsetdata),} \hlkwc{nsets} \hlstd{=} \hlnum{3}\hlstd{,} \hlkwc{matrix.color} \hlstd{=} \hlstr{"#DC267F"}\hlstd{,}
\hlkwc{main.bar.color} \hlstd{=} \hlstr{"#648FFF"}\hlstd{,} \hlkwc{sets.bar.color} \hlstd{=} \hlstr{"#FE6100"}\hlstd{)}
\end{alltt}
\end{kframe}
\includegraphics[width=\maxwidth]{figure/subsetprimates-1}
\begin{kframe}\begin{alltt}
\hlcom{###}
\hlstd{upsetdata}\hlopt{$}\hlstd{primates_dginn_full}\hlkwb{<-}\hlkwd{ifelse}\hlstd{(dginnfulltmp}\hlopt{>=}\hlnum{4}\hlstd{,} \hlnum{1}\hlstd{,}\hlnum{0}\hlstd{)}
\hlkwd{upset}\hlstd{(}\hlkwd{na.omit}\hlstd{(upsetdata),} \hlkwc{nsets} \hlstd{=} \hlnum{3}\hlstd{,} \hlkwc{matrix.color} \hlstd{=} \hlstr{"#DC267F"}\hlstd{,}
\hlkwc{main.bar.color} \hlstd{=} \hlstr{"#648FFF"}\hlstd{,} \hlkwc{sets.bar.color} \hlstd{=} \hlstr{"#FE6100"}\hlstd{)}
\end{alltt}
\end{kframe}
\includegraphics[width=\maxwidth]{figure/subsetprimates-2}
\end{knitrout}
\section{Gene List}
Genes under positive selection for at least 4 methods.
\begin{knitrout}
\definecolor{shadecolor}{rgb}{0.969, 0.969, 0.969}\color{fgcolor}\begin{kframe}
\begin{alltt}
\hlstd{dginnfulltmp}\hlkwb{<-}\hlkwd{rowSums}\hlstd{(}\hlkwd{cbind}\hlstd{(tab}\hlopt{$}\hlstd{dginn.primate_BUSTED}\hlopt{==}\hlstr{"Y"}\hlstd{,}
\hlstd{tab}\hlopt{$}\hlstd{dginn.primate_BppM1M2}\hlopt{==}\hlstr{"Y"}\hlstd{,}
\hlstd{tab}\hlopt{$}\hlstd{dginn.primate_BppM7M8}\hlopt{==}\hlstr{"Y"}\hlstd{,}
\hlstd{tab}\hlopt{$}\hlstd{dginn.primate_codemlM1M2}\hlopt{==}\hlstr{"Y"}\hlstd{,}
\hlstd{tab}\hlopt{$}\hlstd{dginn.primate_codemlM7M8}\hlopt{==}\hlstr{"Y"}\hlstd{))}
\hlstd{tab}\hlopt{$}\hlstd{Gene.name[dginnfulltmp}\hlopt{>=}\hlnum{4} \hlopt{&} \hlkwd{is.na}\hlstd{(dginnfulltmp)}\hlopt{==}\hlstd{F]}
\end{alltt}
\begin{verbatim}
## [1] "ACADM" "BCS1L" "BRD4" "CDK5RAP2" "CEP135"
## [6] "CEP68" "CLIP4" "DNMT1" "DPH5" "EMC1"
## [11] "ERO1LB" "FYCO1" "GCC2" "GGH" "GHITM"
## [16] "GIGYF2" "GLA" "GOLGA7" "HECTD1" "IDE"
## [21] "ITGB1" "LARP1" "LARP4B" "LMAN2" "MARK1"
## [26] "MIPOL1" "MPHOSPH10" "MYCBP2" "NDUFAF2" "NDUFB9"
## [31] "NUPL1" "PCNT" "POLA1" "PRIM2" "PRKAR2A"
## [36] "PVR" "REEP6" "RIPK1" "SAAL1" "SEPSECS"
## [41] "SIRT5" "SLC25A21" "SLC27A2" "TMEM39B" "TOR1AIP1"
## [46] "TUBGCP2" "UBAP2" "UGGT2" "VPS39" "ZNF318"
\end{verbatim}
\begin{alltt}
\hlstd{tab}\hlopt{$}\hlstd{Gene.name[dginnfulltmp}\hlopt{>=}\hlnum{3} \hlopt{&} \hlkwd{is.na}\hlstd{(dginnfulltmp)}\hlopt{==}\hlstd{F]}
\end{alltt}
\begin{verbatim}
## [1] "ACADM" "ADAM9" "AP2A2" "ATE1" "BCS1L"
## [6] "BRD4" "BZW2" "CDK5RAP2" "CEP135" "CEP68"
## [11] "CLIP4" "CNTRL" "DNMT1" "DPH5" "EDEM3"
## [16] "EIF4E2" "EMC1" "ERO1LB" "EXOSC2" "FYCO1"
## [21] "GCC2" "GGH" "GHITM" "GIGYF2" "GLA"
## [26] "GOLGA7" "GOLGB1" "GORASP1" "HDAC2" "HECTD1"
## [31] "HS6ST2" "IDE" "ITGB1" "LARP1" "LARP4B"
## [36] "LARP7" "LMAN2" "MARK1" "MDN1" "MIPOL1"
## [41] "MOV10" "MPHOSPH10" "MRPS5" "MYCBP2" "NAT14"
## [46] "NDUFAF2" "NDUFB9" "NGLY1" "NPC2" "NUPL1"
## [51] "PCNT" "PITRM1" "PLAT" "PLOD2" "PMPCB"
## [56] "POLA1" "POR" "PRIM2" "PRKAR2A" "PTBP2"
## [61] "PVR" "RAB14" "RAB1A" "RAB2A" "RAP1GDS1"
## [66] "RBX1" "REEP6" "RIPK1" "RPL36" "SAAL1"
## [71] "SCCPDH" "SEPSECS" "SIRT5" "SLC25A21" "SLC27A2"
## [76] "STOM" "TIMM8B" "TMEM39B" "TOR1AIP1" "TRIM59"
## [81] "TRMT1" "TUBGCP2" "UBAP2" "UGGT2" "USP54"
## [86] "VPS39" "ZNF318"
\end{verbatim}
\begin{alltt}
\hlstd{tmp}\hlkwb{<-}\hlstd{tab[dginnfulltmp}\hlopt{>=}\hlnum{4} \hlopt{&} \hlkwd{is.na}\hlstd{(dginnfulltmp)}\hlopt{==}\hlstd{F,}
\hlkwd{c}\hlstd{(}\hlstr{"Gene.name"}\hlstd{,}\hlstr{"dginn.primate_BUSTED"}\hlstd{,} \hlstr{"dginn.primate_BppM1M2"}\hlstd{,}
\hlstr{"dginn.primate_BppM7M8"}\hlstd{,}\hlstr{"dginn.primate_codemlM1M2"}\hlstd{,}\hlstr{"dginn.primate_codemlM7M8"}\hlstd{)]}
\hlkwd{write.table}\hlstd{(tmp,} \hlstr{"geneList_DGINN_full_primate_pos4.txt"}\hlstd{,} \hlkwc{row.names}\hlstd{=F,} \hlkwc{quote}\hlstd{=F)}
\end{alltt}
\end{kframe}
\end{knitrout}
\section{Shiny like}
\begin{knitrout}
\definecolor{shadecolor}{rgb}{0.969, 0.969, 0.969}\color{fgcolor}\begin{kframe}
\begin{alltt}
\hlstd{makeFig1} \hlkwb{<-} \hlkwa{function}\hlstd{(}\hlkwc{df}\hlstd{)\{}
\hlcom{# prepare data for colors etc}
\hlstd{colMethods} \hlkwb{<-} \hlkwd{c}\hlstd{(}\hlstr{"deepskyblue4"}\hlstd{,} \hlstr{"darkorange"} \hlstd{,} \hlstr{"deepskyblue3"} \hlstd{,} \hlstr{"mediumseagreen"} \hlstd{,} \hlstr{"yellow3"} \hlstd{,} \hlstr{"black"}\hlstd{)}
\hlstd{nameMethods} \hlkwb{<-} \hlkwd{c}\hlstd{(}\hlstr{"BUSTED"}\hlstd{,} \hlstr{"BppM1M2"}\hlstd{,} \hlstr{"BppM7M8"}\hlstd{,} \hlstr{"codemlM1M2"}\hlstd{,} \hlstr{"codemlM7M8"}\hlstd{,} \hlstr{"MEME"}\hlstd{)}
\hlstd{metColor} \hlkwb{<-} \hlkwd{data.frame}\hlstd{(}\hlkwc{Name} \hlstd{= nameMethods ,} \hlkwc{Col} \hlstd{= colMethods ,} \hlkwc{stringsAsFactors} \hlstd{=} \hlnum{FALSE}\hlstd{)}
\hlcom{# subset for this specific figure}
\hlcom{#df <- df[df$nbY >= 1, ] # to drop genes found by 0 methods (big datasets)}
\hlstd{xt} \hlkwb{<-} \hlstd{df[,} \hlkwd{c}\hlstd{(}\hlstr{"BUSTED"}\hlstd{,} \hlstr{"BppM1M2"}\hlstd{,} \hlstr{"BppM7M8"}\hlstd{,} \hlstr{"codemlM1M2"}\hlstd{,} \hlstr{"codemlM7M8"}\hlstd{)]}
\hlstd{xt}\hlopt{$}\hlstd{Gene} \hlkwb{<-} \hlstd{df}\hlopt{$}\hlstd{Gene}
\hlstd{nbrMeth} \hlkwb{<-} \hlnum{5}
\hlcom{# reverse order of dataframe so that genes with the most Y are at the bottom (to be on top of the barplot)}
\hlstd{xt[,}\hlnum{1}\hlopt{:}\hlnum{5}\hlstd{]} \hlkwb{<-} \hlkwd{ifelse}\hlstd{(xt[,}\hlnum{1}\hlopt{:}\hlnum{5}\hlstd{]} \hlopt{==} \hlstr{"Y"}\hlstd{,} \hlnum{1}\hlstd{,} \hlnum{0}\hlstd{)}
\hlcom{# sort and Filter the 0 lines}
\hlstd{xt}\hlkwb{<-}\hlstd{xt[}\hlkwd{order}\hlstd{(}\hlkwd{rowSums}\hlstd{(xt[,}\hlnum{1}\hlopt{:}\hlnum{5}\hlstd{])),]}
\hlstd{xt}\hlkwb{<-}\hlstd{xt[}\hlkwd{rowSums}\hlstd{(xt[,}\hlnum{1}\hlopt{:}\hlnum{5}\hlstd{])}\hlopt{>}\hlnum{2}\hlstd{,]}
\hlkwd{row.names}\hlstd{(xt)}\hlkwb{<-}\hlstd{xt}\hlopt{$}\hlstd{Gene}
\hlstd{xt}\hlkwb{<-}\hlstd{xt[,}\hlnum{1}\hlopt{:}\hlnum{5}\hlstd{]}
\hlstd{colFig1} \hlkwb{<-} \hlstd{metColor[}\hlkwd{which}\hlstd{(metColor}\hlopt{$}\hlstd{Name} \hlopt{%in%} \hlkwd{colnames}\hlstd{(xt)) , ]}
\hlcom{##### PART 1 : NUMBER OF METHODS}
\hlkwd{par}\hlstd{(}\hlkwc{xpd} \hlstd{=} \hlnum{NA} \hlstd{,} \hlkwc{mar}\hlstd{=}\hlkwd{c}\hlstd{(}\hlnum{2}\hlstd{,}\hlnum{7}\hlstd{,}\hlnum{4}\hlstd{,}\hlnum{0}\hlstd{) ,} \hlkwc{oma} \hlstd{=} \hlkwd{c}\hlstd{(}\hlnum{0}\hlstd{,}\hlnum{0}\hlstd{,}\hlnum{0}\hlstd{,}\hlnum{0}\hlstd{) ,} \hlkwc{mgp} \hlstd{=} \hlkwd{c}\hlstd{(}\hlnum{3}\hlstd{,}\hlnum{0.3}\hlstd{,}\hlnum{0}\hlstd{))}
\hlstd{h} \hlkwb{=} \hlkwd{barplot}\hlstd{(}
\hlkwd{t}\hlstd{(xt),}
\hlkwc{border} \hlstd{=} \hlnum{NA} \hlstd{,}
\hlkwc{axes} \hlstd{= F ,}
\hlkwc{col} \hlstd{=} \hlkwd{adjustcolor}\hlstd{(colFig1}\hlopt{$}\hlstd{Col,} \hlkwc{alpha.f} \hlstd{=} \hlnum{1}\hlstd{),}
\hlkwc{horiz} \hlstd{= T ,}
\hlkwc{las} \hlstd{=} \hlnum{2} \hlstd{,}
\hlkwc{main} \hlstd{=} \hlstr{"Methods detecting positive selection"} \hlstd{,}
\hlkwc{cex.main} \hlstd{=} \hlnum{0.85}\hlstd{,}
\hlkwc{cex.names} \hlstd{=} \hlkwd{min}\hlstd{(}\hlnum{50}\hlopt{/}\hlkwd{nrow}\hlstd{(xt),} \hlnum{1.5}\hlstd{)}
\hlstd{)}
\hlkwd{axis}\hlstd{(}\hlnum{3}\hlstd{,} \hlkwc{line} \hlstd{=} \hlnum{0}\hlstd{,} \hlkwc{at} \hlstd{=} \hlkwd{c}\hlstd{(}\hlnum{0}\hlopt{:}\hlstd{nbrMeth),} \hlkwc{label} \hlstd{=} \hlkwd{c}\hlstd{(}\hlstr{"0"}\hlstd{,} \hlkwd{rep}\hlstd{(}\hlstr{""}\hlstd{, nbrMeth} \hlopt{-}\hlnum{1}\hlstd{), nbrMeth),} \hlkwc{tck} \hlstd{=} \hlnum{0.02}\hlstd{)}
\hlkwd{legend}\hlstd{(}\hlstr{"bottomleft"}\hlstd{,}
\hlkwc{horiz} \hlstd{= T,}
\hlkwc{border} \hlstd{= colFig1}\hlopt{$}\hlstd{Col,}
\hlkwc{legend} \hlstd{= colFig1}\hlopt{$}\hlstd{Name,}
\hlkwc{fill} \hlstd{= colFig1}\hlopt{$}\hlstd{Col,}
\hlkwc{cex} \hlstd{=} \hlnum{0.8}\hlstd{,}
\hlkwc{bty} \hlstd{=} \hlstr{"n"}\hlstd{,}
\hlkwc{xpd} \hlstd{=} \hlnum{NA}
\hlstd{)}
\hlstd{\}}
\end{alltt}
\end{kframe}
\end{knitrout}
\begin{knitrout}
\definecolor{shadecolor}{rgb}{0.969, 0.969, 0.969}\color{fgcolor}\begin{kframe}
\begin{alltt}
\hlkwd{source}\hlstd{(}\hlstr{"covid_comp_shiny.R"}\hlstd{)}
\hlstd{df}\hlkwb{<-}\hlkwd{read.delim}\hlstd{(}\hlkwd{paste0}\hlstd{(workdir,}
\hlstr{"/data/DGINN_202005281649summary_cleaned.csv"}\hlstd{),}
\hlkwc{fill}\hlstd{=T,} \hlkwc{h}\hlstd{=T,} \hlkwc{sep}\hlstd{=}\hlstr{","}\hlstd{)}
\hlkwd{names}\hlstd{(df)}
\end{alltt}
\begin{verbatim}
## [1] "File" "Name" "Gene"
## [4] "GeneSize" "NbSpecies" "omegaM0Bpp"
## [7] "omegaM0codeml" "BUSTED" "BUSTED.p.value"
## [10] "MEME.NbSites" "MEME.PSS" "BppM1M2"
## [13] "BppM1M2.p.value" "BppM1M2.NbSites" "BppM1M2.PSS"
## [16] "BppM7M8" "BppM7M8.p.value" "BppM7M8.NbSites"
## [19] "BppM7M8.PSS" "codemlM1M2" "codemlM1M2.p.value"
## [22] "codemlM1M2.NbSites" "codemlM1M2.PSS" "codemlM7M8"
## [25] "codemlM7M8.p.value" "codemlM7M8.NbSites" "codemlM7M8.PSS"
\end{verbatim}
\begin{alltt}
\hlstd{dftmp}\hlkwb{<-}\hlstd{tab[,}\hlkwd{c}\hlstd{(}\hlstr{"File"}\hlstd{,} \hlstr{"Name"}\hlstd{,} \hlstr{"Gene.name"}\hlstd{,}
\hlstr{"GeneSize"}\hlstd{,} \hlstr{"dginn.primate_NbSpecies"}\hlstd{,} \hlstr{"dginn.primate_omegaM0Bpp"}\hlstd{,}
\hlstr{"dginn.primate_omegaM0codeml"}\hlstd{,} \hlstr{"dginn.primate_BUSTED"}\hlstd{,} \hlstr{"dginn.primate_BUSTED.p.value"}\hlstd{,}
\hlstr{"dginn.primate_MEME.NbSites"}\hlstd{,} \hlstr{"dginn.primate_MEME.PSS"}\hlstd{,} \hlstr{"dginn.primate_BppM1M2"}\hlstd{,}
\hlstr{"dginn.primate_BppM1M2.p.value"}\hlstd{,} \hlstr{"dginn.primate_BppM1M2.NbSites"}\hlstd{,} \hlstr{"dginn.primate_BppM1M2.PSS"}\hlstd{,}
\hlstr{"dginn.primate_BppM7M8"}\hlstd{,} \hlstr{"dginn.primate_BppM7M8.p.value"}\hlstd{,} \hlstr{"dginn.primate_BppM7M8.NbSites"}\hlstd{,}
\hlstr{"dginn.primate_BppM7M8.PSS"}\hlstd{,} \hlstr{"dginn.primate_codemlM1M2"}\hlstd{,} \hlstr{"dginn.primate_codemlM1M2.p.value"}\hlstd{,}
\hlstr{"dginn.primate_codemlM1M2.NbSites"}\hlstd{,}\hlstr{"dginn.primate_codemlM1M2.PSS"}\hlstd{,} \hlstr{"dginn.primate_codemlM7M8"}\hlstd{,}
\hlstr{"dginn.primate_codemlM7M8.p.value"}\hlstd{,} \hlstr{"dginn.primate_codemlM7M8.NbSites"} \hlstd{,} \hlstr{"dginn.primate_codemlM7M8.PSS"}\hlstd{)]}
\hlkwd{names}\hlstd{(dftmp)}\hlkwb{<-}\hlkwd{names}\hlstd{(df)}
\hlkwd{makeFig1}\hlstd{(dftmp)}
\end{alltt}
\end{kframe}
\includegraphics[width=\maxwidth]{figure/unnamed-chunk-3-1}
\end{knitrout}
\end{document}