Commit 07473fe9 authored by mcariou

update README

parent 321a98f2
......@@ -10,6 +10,8 @@ Create a phylogenetic pipeline for legionella genes.
### 1.1. Tools
-[emboss](http://emboss.sourceforge.net/apps/release/6.6/emboss/apps/transeq.html) to translate the CDS and build the protein BLAST database
-blast and related tools
-efetch: e.g. on Ubuntu
......@@ -19,12 +21,12 @@ sudo apt-get install -y efetch
sudo apt install ncbi-entrez-direct
```
-aligner (prank?)
-aligner ([prank](http://wasabiapp.org/software/prank/)?)
```
sudo apt install prank
```
-trimal
-[trimal](https://github.com/scapella/trimal)
```
git clone https://github.com/scapella/trimal.git
cd trimal/source
......@@ -32,7 +34,7 @@ make
cp trimal /home/adminmarie/bin/
```
-phyml
-[phyml](https://github.com/stephaneguindon/phyml)
```
git clone https://github.com/stephaneguindon/phyml.git
cd phyml
......@@ -106,6 +108,8 @@ This is a proteic blast, thus the nucleotidic sequences are extracted using blas
## 4. BLAST table parsing
### 4.1. Generic script
Parse the results:
add species (org)
......@@ -144,6 +148,22 @@ This script take as input a blastn output and filter criteria, get species names
*Instead of keeping **maxseqpertax** randomly chosen sequences, it would be better to keep non-redundant sequences, for example with [Treemmer](https://github.com/fmenardo/Treemmer)*
### 4.2. PSI BLAST and reference phylogeny
```
~/script/4_parse_PSIblast.sh ~/2021_legio/out_blastn/78Lp_uniprot.psiblast \
~/2021_legio/phylolegio/doc/tabAss.txt \
~/2021_legio/fasta/78Lp \
~/2021_legio/genes/78Lp_uniprot.fasta \
0.0001 0.3 0.5 3

USAGE: ./4_parse_PSIblast.sh $1=PSIblast_out $2=assembly table
$3=fasta_out_directory $4=fasta_of_query
$5=evalue_threshold $6=percid $7=percoverlap
$8=maxseqpertax
```
For each query, the script extracts the PSI-BLAST hits, selects the best hit for each species, and writes the corresponding nucleotide sequence.
Nucleotide sequences are extracted from the nucleotide database using blastdbcmd (the BLAST output corresponds to protein databases, but the sequence identifiers are the same).
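
The filtering and best-hit selection described here can be illustrated with a minimal sketch. It is not the content of `4_parse_PSIblast.sh`, and it rests on several assumptions: the hits are tab-separated with the columns `qseqid sseqid pident length qlen evalue bitscore`, the species table is a hypothetical two-column file (`sseqid`, species) whose layout may differ from `tabAss.txt`, identity and overlap thresholds are fractions, overlap is computed as alignment length over query length, and the "best" hit is the one with the highest bit score. The function name `best_hits` is illustrative, and the `maxseqpertax` limit is not handled.

```shell
#!/usr/bin/env bash
# Sketch only: column layout and species table format are assumptions.
set -euo pipefail

# best_hits HITS_TSV SPECIES_TSV EVALUE PERCID PERCOVERLAP
# HITS_TSV columns (assumed): qseqid sseqid pident length qlen evalue bitscore
# SPECIES_TSV columns (assumed): sseqid species
# Prints one line per (query, species): qseqid, species, best sseqid.
best_hits() {
  awk -F'\t' -v OFS='\t' -v emax="$3" -v idmin="$4" -v covmin="$5" '
    NR == FNR { species[$1] = $2; next }   # first file: sseqid -> species
    {
      if (!($2 in species)) next           # skip hits with no known species
      if ($6 + 0 > emax + 0) next          # e-value above threshold
      if ($3 / 100 < idmin + 0) next       # percent identity as a fraction
      if ($4 / $5 < covmin + 0) next       # alignment length / query length
      key = $1 OFS species[$2]
      # keep the hit with the highest bit score for this query and species
      if (!(key in best) || $7 + 0 > best[key] + 0) {
        best[key] = $7
        id[key] = $2
      }
    }
    END { for (k in best) print k, id[k] }
  ' "$2" "$1" | sort
}
```

The third column of the output can then be saved to a file of identifiers and passed to `blastdbcmd` with `-db`, `-entry_batch` and `-out` to retrieve the nucleotide sequences, provided the nucleotide database was built with parsed sequence identifiers.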
## 5. Align selected sequences and make phylogeny
......