For a detailed summary of the analysis, see the \textit{README.md} file.
~
Briefly, Illumina paired-end reads from two Ebola strains, Ebola-800 and Ebola-911, were analysed.
The aim of this study is to identify all positions at which the Ebola-800 and Ebola-911 sequences differ from the reference sequence.
Two approaches are possible: first, to assemble the genomes de novo and then compare the assembled sequences with the reference genome sequence; second, to align the reads against the reference genome and use software that identifies variants directly from the read alignment.
After quality checking and read cleaning with Trimmomatic, I (1) assembled both genomes de novo using SPAdes, (2) mapped the reads against a reference genome (Bowtie) and called variants using two programs (bcftools and FreeBayes), and (3) reassembled each genome with SPAdes using only the mapped reads.
~
The second assembly, built only from reads that mapped against the reference genome, was of higher quality, so it is the one used in this analysis.
~
The aim of this report is to identify and compare the variations found using these different approaches.
\section{Comparison of genome assemblies with reference genome}
Sequences from the reference genome and the two SPAdes assemblies (second version, mapped reads only) were manually aligned. See the report on the GP gene, \textit{Ebola\_assembly.pdf}.
\item Print the variable positions between the reference, Ebola-800 and Ebola-911:
\end{itemize}
<<>>=
var
@
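
The way a table such as \texttt{var} can be derived from an alignment is sketched below. This is only an illustration, assuming the aligned sequences are held as equal-length character strings; the function \texttt{variable\_positions} and the toy sequences are hypothetical and are not the code or data used for this report.

<<eval=FALSE>>=
# Hypothetical helper (NOT the code used to build `var` in this report):
# returns a character matrix restricted to the alignment columns
# in which at least two sequences differ.
variable_positions <- function(seqs) {
  # All sequences must have the same length, i.e. be aligned
  stopifnot(length(unique(nchar(seqs))) == 1)
  # One row per sequence, one column per alignment position
  aln <- do.call(rbind, strsplit(seqs, ""))
  rownames(aln) <- names(seqs)
  colnames(aln) <- seq_len(ncol(aln))
  # A column is variable if it holds more than one distinct character
  is_var <- apply(aln, 2, function(col) length(unique(col)) > 1)
  aln[, is_var, drop = FALSE]
}

# Toy alignment (made-up sequences; "-" marks a gap)
toy <- c(ref      = "acgt-acg",
         ebola800 = "acgtaacg",
         ebola911 = "atgt-acg")
res <- variable_positions(toy)
print(res)   # keeps alignment columns 2 and 5
@

The column names of the result are alignment coordinates, so a gap column in the reference row counts as a variable position in the same way as a substitution.
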
Remark: the alignment step introduced an insertion at position 6918, so all positions $>$6918 in the alignment are shifted by +1 relative to the reference sequence alone.
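
The coordinate shift described in this remark can be written as a small conversion function. The sketch below assumes a single one-base insertion and that the gap column itself is alignment position 6918; the name \texttt{aln\_to\_ref} is hypothetical and is not part of the analysis code.

<<eval=FALSE>>=
# Hypothetical helper: convert alignment coordinates to reference coordinates,
# assuming one single-base insertion located at alignment column `ins`.
aln_to_ref <- function(pos, ins = 6918) {
  # Columns after the insertion are shifted back by one
  ref <- ifelse(pos > ins, pos - 1L, pos)
  # The inserted column itself has no counterpart in the reference
  ref[pos == ins] <- NA
  ref
}

aln_to_ref(c(6185, 6918, 6949, 18454))
# 6185 is unchanged, 6918 gives NA, 6949 maps to 6948, 18454 maps to 18453
@

Converting the alignment positions in this way puts them in the same coordinate system as variants called directly against the reference.
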
\section{Comparison of SNP calling with reference genome}
Variants identified from the reads aligned directly against the reference genome are now analysed, to see whether the same positions are recovered.
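
Checking whether the same positions are recovered amounts to comparing two sets of coordinates. A minimal sketch follows, assuming both sets are numeric vectors expressed in reference coordinates; the function \texttt{compare\_positions} and the example values are made up for illustration and are not results of this analysis.

<<eval=FALSE>>=
# Hypothetical helper: split variant positions into those found by both
# approaches and those found by only one of them.
compare_positions <- function(assembly_pos, calling_pos) {
  list(
    shared        = intersect(assembly_pos, calling_pos),  # found by both
    assembly_only = setdiff(assembly_pos, calling_pos),    # assembly only
    calling_only  = setdiff(calling_pos, assembly_pos)     # SNP calling only
  )
}

# Made-up positions, for illustration only
cmp <- compare_positions(c(100, 250, 400), c(100, 400, 975))
str(cmp)
@

Positions in \texttt{assembly\_only} or \texttt{calling\_only} are the ones worth inspecting by hand, since they indicate a disagreement between the two approaches.
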
\subsection{bcftools}
The first program used for variant calling is bcftools.
\section{Comparison of genome assemblies with reference genome}
Sequences from the reference genome and the two SPAdes assemblies (second version, mapped reads only) were manually aligned. See the report on the GP gene, \textit{Ebola\_assembly.pdf}.
## 444548928185220402410241125512762618261856918
## ref "g" "c" "a" "t" "c" "c" "t" "t" "g" "a" "c" "c" "-"
## ebola800 "t" "t" "g" "c" "t" "t" "t" "t" "a" "c" "a" "t" "a"
## ebola911 "g" "c" "a" "c" "t" "t" "c" "c" "a" "c" "a" "t" "-"
##          6949 9555 10558 10785 10905 14188 18139 18454
## ref      "a"  "c"  "g"   "t"   "c"   "g"   "g"   "c"
## ebola800 "a"  "t"  "a"   "c"   "t"   "a"   "a"   "t"
## ebola911 "c"  "t"  "a"   "c"   "t"   "a"   "a"   "c"
\end{verbatim}
\end{kframe}
\end{knitrout}