From 978f757690627090436143479bc5a52bda75c810 Mon Sep 17 00:00:00 2001
From: Laurent Modolo <laurent@modolo.fr>
Date: Tue, 23 Oct 2018 15:44:32 +0200
Subject: [PATCH] TP_experimental_biologists.md: update with new nf path

---
 doc/TP_experimental_biologists.md | 47 +++++++++++++++++++------------
 1 file changed, 29 insertions(+), 18 deletions(-)

diff --git a/doc/TP_experimental_biologists.md b/doc/TP_experimental_biologists.md
index 8fddce85..4c8e39f2 100644
--- a/doc/TP_experimental_biologists.md
+++ b/doc/TP_experimental_biologists.md
@@ -42,11 +42,12 @@ You are now on the main page of your fork of the [pipelines/nextflow](https://gi

 The [README.md](https://gitlab.biologie.ens-lyon.fr/pipelines/nextflow/blob/master/README.md) file contains instructions to run your pipeline and test its installation.

-The [CONTRIBUTING.md](https://gitlab.biologie.ens-lyon.fr/pipelines/nextflow/blob/master/CONTRIBUTING.md) file contains guidelines to follow if you want to contribute to the [pipelines/nextflow](https://gitlab.biologie.ens-lyon.fr/pipelines/nextflow) (making a merge request for example).
+The [CONTRIBUTING.md](https://gitlab.biologie.ens-lyon.fr/pipelines/nextflow/blob/master/CONTRIBUTING.md) file contains guidelines if you want to contribute to the [pipelines/nextflow](https://gitlab.biologie.ens-lyon.fr/pipelines/nextflow) (making a merge request for example).

 The [data](https://gitlab.biologie.ens-lyon.fr/pipelines/nextflow/tree/master/data) folder will be the place where you store the raw data for your analysis.
 The [results](https://gitlab.biologie.ens-lyon.fr/pipelines/nextflow/tree/master/results) folder will be the place where you store the results of your analysis.
-Note that the content of these two folders should never be saved on git.
+
+> Note that the content of `data` and `results` folders should never be saved on git.

 The [doc](https://gitlab.biologie.ens-lyon.fr/pipelines/nextflow/tree/master/doc) folder contains the documentation of this practical course.

@@ -94,7 +95,7 @@ The [README.md](https://gitlab.biologie.ens-lyon.fr/PSMN/modules/blob/master/REA

 ### `nf_modules`

-The `src/nf_modules` folder contains templates of [nextflow](https://www.nextflow.io/) wrappers for the tools available in [Docker](https://www.docker.com/what-docker) and [SGE](http://www.ens-lyon.fr/PSMN/doku.php?id=documentation:tools:sge). The details of the [nextflow](https://www.nextflow.io/) wrapper will be presented in the next section. Alongside the `.nf` and `.config` there is a `tests` folder that contains a `tests.sh` script to run test on the tool.
+The `src/nf_modules` folder contains templates of [nextflow](https://www.nextflow.io/) wrappers for the tools available in [Docker](https://www.docker.com/what-docker) and [SGE](http://www.ens-lyon.fr/PSMN/doku.php?id=documentation:tools:sge). The details of the [nextflow](https://www.nextflow.io/) wrapper will be presented in the next section. Alongside the `.nf` and `.config` files, there is a `tests.sh` script to run test on the tool.

 # Nextflow pipeline

@@ -152,7 +153,7 @@ Channel
   .set { fasta_file }
 ```

-Here we defined the channel `fasta_file` that is going to send every fasta file from the folder `data/fasta/` into the process that take it as input.
+Here we defined the channel `fasta_file` that is going to send every fasta file from the folder `data/tiny_dataset/fasta/` into the process that take it as input.

 Add the definition of the channel to the `src/fasta_sampler.nf` file and commit to your repository.

@@ -164,7 +165,8 @@ After writing this first pipeline, you may want to test it. To do that, first cl
 You can then run the following commands to download your project on your computer:

 ```sh
-git clone -c http.sslVerify=false https://gitlab.biologie.ens-lyon.fr/<usr_name>/nextflow.git
+git config --global http.sslVerify false
+git clone https://gitlab.biologie.ens-lyon.fr/<usr_name>/nextflow.git
 cd nextflow
 src/install_nextflow.sh
 ```
@@ -173,7 +175,7 @@ We also need data to run our pipeline:

 ```
 cd data
-git clone -c http.sslVerify=false https://gitlab.biologie.ens-lyon.fr/LBMC/tiny_dataset.git
+git clone https://gitlab.biologie.ens-lyon.fr/LBMC/tiny_dataset.git
 cd ..
 ```

@@ -246,7 +248,7 @@ For this practical, we are going to need the following tools:

 To initialize these tools, follow the **Installing** section of the [README.md](https://gitlab.biologie.ens-lyon.fr/pipelines/nextflow/blob/master/README.md) file.

-If you are using a CBP computer don’t forget to clean up your docker containers at the end of the practical with the following command:
+**If you are using a CBP computer don’t forget to clean up your docker containers at the end of the practical with the following command:**

 ```sh
 docker rm $(docker stop $(docker ps -aq))
@@ -255,9 +257,9 @@ docker rmi $(docker images -qf "dangling=true")

 ## Cutadapt

-The first step of the pipeline is to remove any Illumina adaptor left in your read files.
+The first step of the pipeline is to remove any Illumina adaptors left in your read files.

-Open the WebIDE and create a `src/RNASeq.nf` file. Browse for [src/nf_modules/cutadapt/cutadapt.nf](https://gitlab.biologie.ens-lyon.fr/pipelines/nextflow/blob/master/src/nf_modules/cutadapt/cutadapt.nf), this file contains examples for cutadapt. We are interested in the *Illumina adaptor removal*, *for paired-end data* section of the code. Copy this code in your pipeline and commit.
+Open the WebIDE and create a `src/RNASeq.nf` file. Browse for [src/nf_modules/cutadapt/adaptor_removal_paired.nf](https://gitlab.biologie.ens-lyon.fr/pipelines/nextflow/blob/master/src/nf_modules/cutadapt/adaptor_removal_paired.nf), this file contains examples for cutadapt. We are interested in the *Illumina adaptor removal*, *for paired-end data* section of the code. Copy this code in your pipeline and commit.

 Compared to before, we have few new lines:

@@ -293,7 +295,7 @@ For the `fastq_sampler.nf` pipeline we used the command `head` present in most b
 - launch the process in a Docker container that has cutadapt installed
 - launch the process with SGE while loading the correct module to have cutadapt available

-We are not going to use the first option which requires no configuration for nextflow but tedious tools installations. Instead, we are going to use existing *wrappers* and tell nextflow about it. This is what the [src/nf_modules/cutadapt/cutadapt.config](https://gitlab.biologie.ens-lyon.fr/pipelines/nextflow/blob/master/src/nf_modules/cutadapt/cutadapt.config) is used for.
+We are not going to use the first option which requires no configuration for nextflow but tedious tools installations. Instead, we are going to use existing *wrappers* and tell nextflow about it. This is what the [src/nf_modules/cutadapt/adaptor_removal_paired.config](https://gitlab.biologie.ens-lyon.fr/pipelines/nextflow/blob/master/src/nf_modules/cutadapt/adaptor_removal_paired.config) is used for.

 Copy the content of this config file to an `src/RNASeq.config` file. This file is structured in process blocks. Here we are only interested in configuring `adaptor_removal` process not `trimming` process. So you can remove the `trimming` block and commit.

@@ -308,7 +310,7 @@ You can test your pipeline with the following command:

 The second step of the pipeline is to trim reads by quality.

-Browse for [src/nf_modules/UrQt/urqt.nf](https://gitlab.biologie.ens-lyon.fr/pipelines/nextflow/blob/master/src/nf_modules/UrQt/urqt.nf), this file contains examples for UrQt. We are interested in the *for paired-end data* section of the code. Copy the process section code in your pipeline and commit.
+Browse for [src/nf_modules/UrQt/trimming_paired.nf](https://gitlab.biologie.ens-lyon.fr/pipelines/nextflow/blob/master/src/nf_modules/UrQt/trimming_paired.nf), this file contains examples for UrQt. We are interested in the *for paired-end data* section of the code. Copy the process section code in your pipeline and commit.

 This code won’t work if you try to run it: the `fastq_file` channel is already consumed by the `adaptor_removal` process. In nextflow once a channel is used by a process, it ceases to exist. Moreover, we don’t want to trim the input fastq, we want to trim the fastq that comes from the `adaptor_removal` process.

@@ -326,7 +328,7 @@ set pair_id, file(reads) from fastq_files_cut

 The two processes are now connected by the channel `fastq_files_cut`.

-Add the content of the [src/nf_modules/UrQt/urqt.config](https://gitlab.biologie.ens-lyon.fr/pipelines/nextflow/blob/master/src/nf_modules/UrQt/urqt.config) file to your `src/RNASeq.config` file and commit.
+Add the content of the [src/nf_modules/UrQt/trimming_paired.config](https://gitlab.biologie.ens-lyon.fr/pipelines/nextflow/blob/master/src/nf_modules/UrQt/trimming_paired.config) file to your `src/RNASeq.config` file and commit.

 You can test your pipeline.

@@ -334,7 +336,7 @@ You can test your pipeline.

 Kallisto need the sequences of the transcripts that need to be quantified. We are going to extract these sequences from the reference `data/tiny_dataset/fasta/tiny_v2.fasta` with the `bed` annotation `data/tiny_dataset/annot/tiny.bed`.

-You can copy to your `src/RNASeq.nf` file the content of [src/nf_modules/BEDtools/bedtools.nf](https://gitlab.biologie.ens-lyon.fr/pipelines/nextflow/blob/master/src/nf_modules/BEDtools/bedtools.nf) and to your `src/RNASeq.config` file the content of [src/nf_modules/BEDtools/bedtools.config](https://gitlab.biologie.ens-lyon.fr/pipelines/nextflow/blob/master/src/nf_modules/BEDtools/bedtools.config).
+You can copy to your `src/RNASeq.nf` file the content of [src/nf_modules/BEDtools/fasta_from_bed.nf](https://gitlab.biologie.ens-lyon.fr/pipelines/nextflow/blob/master/src/nf_modules/BEDtools/fasta_from_bed.nf) and to your `src/RNASeq.config` file the content of [src/nf_modules/BEDtools/fasta_from_bed.config](https://gitlab.biologie.ens-lyon.fr/pipelines/nextflow/blob/master/src/nf_modules/BEDtools/fasta_from_bed.config).

 Commit your work and test your pipeline with the following command:

@@ -346,7 +348,7 @@ Commit your work and test your pipeline with the following command:

 Kallisto run in two steps: the indexation of the reference and the quantification on this index.

-You can copy to your `src/RNASeq.nf` file the relevant content of [src/nf_modules/Kallisto/kallisto.nf](https://gitlab.biologie.ens-lyon.fr/pipelines/nextflow/blob/master/src/nf_modules/Kallisto/kallisto.nf) and to your `src/RNASeq.config` file the content of [src/nf_modules/Kallisto/kallisto.config](https://gitlab.biologie.ens-lyon.fr/pipelines/nextflow/blob/master/src/nf_modules/Kallisto/kallisto.config).
+You can copy to your `src/RNASeq.nf` file the content of the files [src/nf_modules/Kallisto/indexing.nf](https://gitlab.biologie.ens-lyon.fr/pipelines/nextflow/blob/master/src/nf_modules/Kallisto/indexing.nf) and [src/nf_modules/Kallisto/mapping_paired.nf](https://gitlab.biologie.ens-lyon.fr/pipelines/nextflow/blob/master/src/nf_modules/Kallisto/mapping_paired.nf). You can add to your file `src/RNASeq.config` file the content of the files [src/nf_modules/Kallisto/indexing.config](https://gitlab.biologie.ens-lyon.fr/pipelines/nextflow/blob/master/src/nf_modules/Kallisto/indexing.config) and [src/nf_modules/Kallisto/mapping_paired.config](https://gitlab.biologie.ens-lyon.fr/pipelines/nextflow/blob/master/src/nf_modules/Kallisto/mapping_paired.config).

 We are going to work with paired-end so only copy the relevant processes. The `index_fasta` process needs to take as input the output of your `fasta_from_bed` process. The `fastq` input of your `mapping_fastq` process needs to take as input the output of your `index_fasta` process and the `trimming` process.

@@ -384,12 +386,21 @@ ln -s /Xnfs/lbmcdb/common/modules/modulefiles ~/privatemodules
 echo "module use ~/privatemodules" >> .bashrc
 ```

+Create and go to your `scratch` folder:
+
+```sh
+mkdir -p /scratch/<login>
+cd /scratch/<login>
+echo "module use ~/privatemodules" >> .bashrc
+```
+
 Then you need to clone your pipeline and get the data:

 ```sh
-git clone -c http.sslVerify=false https://gitlab.biologie.ens-lyon.fr/lmodolo/nextflow.git
+git config --global http.sslVerify false
+git clone https://gitlab.biologie.ens-lyon.fr/lmodolo/nextflow.git
 cd nextflow/data
-git clone -c http.sslVerify=false https://gitlab.biologie.ens-lyon.fr/LBMC/tiny_dataset.git
+git clone https://gitlab.biologie.ens-lyon.fr/LBMC/tiny_dataset.git
 cd ..
 ```

@@ -400,13 +411,13 @@ As we don’t want nextflow to be killed in case of disconnection, we start by l
 ```sh
 tmux
 module load nextflow/0.28.2
-nextflow src/RNASeq.nf -c src/RNASeq.config -profile sge --fastq "data/tiny_dataset/fastq/*_R{1,2}.fastq" --fasta "data/tiny_dataset/fasta/tiny_v2.fasta" --bed "data/tiny_dataset/annot/tiny.bed"
+nextflow src/RNASeq.nf -c src/RNASeq.config -profile sge --fastq "data/tiny_dataset/fastq/*_R{1,2}.fastq" --fasta "data/tiny_dataset/fasta/tiny_v2.fasta" --bed "data/tiny_dataset/annot/tiny.bed" -w /scratch/<login>
 ```

 To use the scratch for nextflow computations add the option :

 ```sh
--w /scratch/login
+-w /scratch/<login>
 ```

 You just ran your pipeline on the PSMN!
-- 
GitLab