Commit 3afc5354 authored by mcariou

update README

parent 78503659
......@@ -165,36 +165,35 @@ USAGE: ./4_parse_PSIblast.sh $1=PSIblast_out $2=tableau d'assemblage
For each query, the script extracts the PSIblast hits, selects the best hit for each species, and writes the corresponding nucleotide sequence.
Nucleotide sequences are extracted from the nucleotide database using blastdbcmd (the blast output corresponds to protein databases, but the sequence identifiers are the same).
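The selection and extraction described above can be sketched as follows. This is only an illustration, not the code of 4_parse_PSIblast.sh: the three-column table layout (sequence ID, species, bit score) and the names `hits.tsv`, `best_ids.txt` and `nucl_db` are assumptions. `-db`, `-entry_batch` and `-out` are standard blastdbcmd options.

```shell
#!/bin/bash
# Keep the highest-scoring hit per species from a tab-separated table.
# Assumed (hypothetical) columns: 1=sequence ID, 2=species, 3=bit score.
best_hit_per_species() {
  tab=$(printf '\t')
  # Sort by species, then by decreasing score; keep the first line per species.
  sort -t "$tab" -k2,2 -k3,3nr "$1" |
    awk -F '\t' '!seen[$2]++ { print $1 }'
}

# Possible use (placeholder file and database names):
#   best_hit_per_species hits.tsv > best_ids.txt
#   blastdbcmd -db nucl_db -entry_batch best_ids.txt -out best_hits.fasta
```

Because the protein and nucleotide databases share sequence identifiers, the IDs selected from the protein hits can be passed to blastdbcmd unchanged.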
## 5. Align selected sequences and make phylogeny
```
~/script/5_aln_phy.sh ~/fasta/ lp0952_ortho
```
Run prank, trimal and phyml on the FASTA output from step 4.
### 4.3. How to set up taxonomizr?
A prerequisite for this step is to have installed taxonomizr and prepared its database.
## 6. To do
```
install.packages("taxonomizr")
library(taxonomizr)
prepareDatabase("/home/mcariou/2021_legio/accessionTaxa.sql")
```
The path to accessionTaxa.sql must then be provided to the 4th script.
22/11/2021
- Warning: R script 4 only works on the cluster.
There are hard-coded paths for the taxonomizr call and for the database.
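One possible way to lift the hard-coded database path is to resolve it at run time. The sketch below is an assumption, not part of the current scripts: the `TAXA_DB` environment variable and the `resolve_taxa_db` helper are hypothetical names, and the fallback is simply the path shown in this README.

```shell
#!/bin/bash
# Resolve the taxonomizr database path: first argument, then the TAXA_DB
# environment variable (hypothetical), then the path given in this README.
resolve_taxa_db() {
  printf '%s\n' "${1:-${TAXA_DB:-/home/mcariou/2021_legio/accessionTaxa.sql}}"
}

SQL_PATH=$(resolve_taxa_db "${1:-}")
echo "taxonomizr database: $SQL_PATH"
```

The resolved path could then be handed to the R script as a command-line argument instead of being written into it.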
## 5. Align selected sequences and make phylogeny
###
```
~/script/5_cat_aln_phy.sh ~/fasta/78Lp/
```
Concatenate the alignments from all genes, then run trimal and phyml on the FASTA output from step 4.
Not sure whether this works for single genes.
- Adapt the pipeline to use more than one gene, build concatenated alignments, and format presence/absence for all genes.
=> problem: it currently does not allow keeping several pneumophila strains
## 6. To do
=> build the concatenated alignment.
- problem: it currently does not allow keeping several pneumophila strains
- Integration into Nextflow
=> make the pipeline for the phylogeny.
=> include possible branching
=> beautiful output
- beautiful output
- Revise the script that generates the protein database to make it more generally usable.
......
#!/bin/bash
echo "USAGE: ./5_aln_phy.sh \$1=fasta_path \$2=fasta_names(wo extension)"
echo "--------------------------------------------------"
##################################################################################################################
### Local
#./5_aln_phy.sh ~/Documents/CIRI_BIBS_projects/2021_04_Doublet/pipeline/fasta/ lp0952_ortho
### PSMN
#./
##################################################################################################################
# variable
OUT=$1
FASTA="$1/$2.fasta"
ALN="$1/$2"
PHYLIP="$1/${2}_aln.phylip"
if [ -e "$FASTA" ] ; then
  echo "$FASTA exists"
  # prank is disabled here, so $ALN.best.fas must already exist from an earlier prank run
  #prank -d="$FASTA" +F -o="$ALN"
  echo "$ALN"
  ## Convert to phylip
  trimal -in "$ALN.best.fas" -out "$PHYLIP" -phylip
  ## phylogeny
  phyml -i "$PHYLIP" -d nt -m HKY85 -a e -c 4 -s NNI -b -1
else
  echo "$FASTA does not exist. Incorrect input"
fi
# end
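Since the script depends on three external programs, a pre-flight check can make failures easier to read. The following is a minimal sketch under that assumption. The `missing_tools` helper is a hypothetical name and is not part of 5_aln_phy.sh.

```shell
#!/bin/bash
# Print the name of every listed tool that is not found on PATH.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

# Report missing programs before starting the alignment and phylogeny steps.
missing=$(missing_tools prank trimal phyml)
if [ -n "$missing" ]; then
  # $missing is left unquoted on purpose so the names print on one line.
  echo "Missing tools:" $missing >&2
fi
```

The check only reports. Whether the script should stop at this point is left to the caller.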