TP_experimental_biologists.md: update with new nf path

Commit 978f7576 authored 6 years ago by Laurent Modolo
--- a/doc/TP_experimental_biologists.md
+++ b/doc/TP_experimental_biologists.md
@@ -42,11 +42,12 @@ You are now on the main page of your fork of the [pipelines/nextflow](https://gi

 The [README.md](https://gitlab.biologie.ens-lyon.fr/pipelines/nextflow/blob/master/README.md) file contains instructions to run your pipeline and test its installation.

-The [CONTRIBUTING.md](https://gitlab.biologie.ens-lyon.fr/pipelines/nextflow/blob/master/CONTRIBUTING.md) file contains guidelines to follow if you want to contribute to the [pipelines/nextflow](https://gitlab.biologie.ens-lyon.fr/pipelines/nextflow) (making a merge request for example).
+The [CONTRIBUTING.md](https://gitlab.biologie.ens-lyon.fr/pipelines/nextflow/blob/master/CONTRIBUTING.md) file contains guidelines for contributing to the [pipelines/nextflow](https://gitlab.biologie.ens-lyon.fr/pipelines/nextflow) project (for example, by making a merge request).

 The [data](https://gitlab.biologie.ens-lyon.fr/pipelines/nextflow/tree/master/data) folder will be the place where you store the raw data for your analysis.
 The [results](https://gitlab.biologie.ens-lyon.fr/pipelines/nextflow/tree/master/results) folder will be the place where you store the results of your analysis.
-Note that the content of these two folders should never be saved on git.
+
+> Note that the contents of the `data` and `results` folders should never be committed to git.

 The [doc](https://gitlab.biologie.ens-lyon.fr/pipelines/nextflow/tree/master/doc) folder contains the documentation of this practical course.

@@ -94,7 +95,7 @@ The [README.md](https://gitlab.biologie.ens-lyon.fr/PSMN/modules/blob/master/REA

 ### `nf_modules`

-The `src/nf_modules` folder contains templates of [nextflow](https://www.nextflow.io/) wrappers for the tools available in [Docker](https://www.docker.com/what-docker) and [SGE](http://www.ens-lyon.fr/PSMN/doku.php?id=documentation:tools:sge). The details of the [nextflow](https://www.nextflow.io/) wrapper will be presented in the next section. Alongside the `.nf` and `.config` there is a `tests` folder that contains a `tests.sh` script to run test on the tool.
+The `src/nf_modules` folder contains templates of [nextflow](https://www.nextflow.io/) wrappers for the tools available in [Docker](https://www.docker.com/what-docker) and [SGE](http://www.ens-lyon.fr/PSMN/doku.php?id=documentation:tools:sge). The details of the [nextflow](https://www.nextflow.io/) wrapper will be presented in the next section. Alongside the `.nf` and `.config` files, there is a `tests.sh` script to run tests on the tool.

 # Nextflow pipeline

@@ -152,7 +153,7 @@ Channel
  .set { fasta_file }
 ```

-Here we defined the channel `fasta_file` that is going to send every fasta file from the folder `data/fasta/` into the process that take it as input.
+Here we defined the channel `fasta_file` that is going to send every fasta file from the folder `data/tiny_dataset/fasta/` into the process that takes it as input.

 Add the definition of the channel to the `src/fasta_sampler.nf` file and commit to your repository.

@@ -164,7 +165,8 @@ After writing this first pipeline, you may want to test it. To do that, first cl
 You can then run the following commands to download your project on your computer:

 ```sh
-git clone -c http.sslVerify=false https://gitlab.biologie.ens-lyon.fr/<usr_name>/nextflow.git
+git config --global http.sslVerify false
+git clone https://gitlab.biologie.ens-lyon.fr/<usr_name>/nextflow.git
 cd nextflow
 src/install_nextflow.sh
 ```
@@ -173,7 +175,7 @@ We also need data to run our pipeline:

 ```
 cd data
-git clone -c http.sslVerify=false https://gitlab.biologie.ens-lyon.fr/LBMC/tiny_dataset.git
+git clone https://gitlab.biologie.ens-lyon.fr/LBMC/tiny_dataset.git
 cd ..
 ```

@@ -246,7 +248,7 @@ For this practical, we are going to need the following tools:

 To initialize these tools, follow the **Installing** section of the [README.md](https://gitlab.biologie.ens-lyon.fr/pipelines/nextflow/blob/master/README.md) file.

-If you are using a CBP computer don’t forget to clean up your docker containers at the end of the practical with the following command:
+**If you are using a CBP computer, don’t forget to clean up your Docker containers at the end of the practical with the following commands:**

 ```sh
 docker rm $(docker stop $(docker ps -aq))
@@ -255,9 +257,9 @@ docker rmi $(docker images -qf "dangling=true")

 ## Cutadapt

-The first step of the pipeline is to remove any Illumina adaptor left in your read files.
+The first step of the pipeline is to remove any Illumina adaptors left in your read files.

-Open the WebIDE and create a `src/RNASeq.nf` file. Browse for [src/nf_modules/cutadapt/cutadapt.nf](https://gitlab.biologie.ens-lyon.fr/pipelines/nextflow/blob/master/src/nf_modules/cutadapt/cutadapt.nf), this file contains examples for cutadapt. We are interested in the *Illumina adaptor removal*, *for paired-end data* section of the code. Copy this code in your pipeline and commit.
+Open the WebIDE and create a `src/RNASeq.nf` file. Browse to [src/nf_modules/cutadapt/adaptor_removal_paired.nf](https://gitlab.biologie.ens-lyon.fr/pipelines/nextflow/blob/master/src/nf_modules/cutadapt/adaptor_removal_paired.nf): this file contains examples for cutadapt. We are interested in the *Illumina adaptor removal*, *for paired-end data* section of the code. Copy this code into your pipeline and commit.

 Compared to before, we have few new lines:

@@ -293,7 +295,7 @@ For the `fastq_sampler.nf` pipeline we used the command `head` present in most b
 - launch the process in a Docker container that has cutadapt installed
 - launch the process with SGE while loading the correct module to have cutadapt available

-We are not going to use the first option which requires no configuration for nextflow but tedious tools installations. Instead, we are going to use existing *wrappers* and tell nextflow about it. This is what the [src/nf_modules/cutadapt/cutadapt.config](https://gitlab.biologie.ens-lyon.fr/pipelines/nextflow/blob/master/src/nf_modules/cutadapt/cutadapt.config) is used for.
+We are not going to use the first option, which requires no nextflow configuration but tedious tool installations. Instead, we are going to use existing *wrappers* and tell nextflow about them. This is what the [src/nf_modules/cutadapt/adaptor_removal_paired.config](https://gitlab.biologie.ens-lyon.fr/pipelines/nextflow/blob/master/src/nf_modules/cutadapt/adaptor_removal_paired.config) file is used for.

 Copy the content of this config file to an `src/RNASeq.config` file. This file is structured in process blocks. Here we are only interested in configuring `adaptor_removal` process not `trimming` process. So you can remove the `trimming` block and commit.

@@ -308,7 +310,7 @@ You can test your pipeline with the following command:

 The second step of the pipeline is to trim reads by quality.

-Browse for [src/nf_modules/UrQt/urqt.nf](https://gitlab.biologie.ens-lyon.fr/pipelines/nextflow/blob/master/src/nf_modules/UrQt/urqt.nf), this file contains examples for UrQt. We are interested in the *for paired-end data* section of the code. Copy the process section code in your pipeline and commit.
+Browse to [src/nf_modules/UrQt/trimming_paired.nf](https://gitlab.biologie.ens-lyon.fr/pipelines/nextflow/blob/master/src/nf_modules/UrQt/trimming_paired.nf): this file contains examples for UrQt. We are interested in the *for paired-end data* section of the code. Copy the process section of the code into your pipeline and commit.

 This code won’t work if you try to run it: the `fastq_file` channel is already consumed by the `adaptor_removal` process. In nextflow once a channel is used by a process, it ceases to exist. Moreover, we don’t want to trim the input fastq, we want to trim the fastq that comes from the `adaptor_removal` process.

@@ -326,7 +328,7 @@ set pair_id, file(reads) from fastq_files_cut

 The two processes are now connected by the channel `fastq_files_cut`.

-Add the content of the [src/nf_modules/UrQt/urqt.config](https://gitlab.biologie.ens-lyon.fr/pipelines/nextflow/blob/master/src/nf_modules/UrQt/urqt.config) file to your `src/RNASeq.config` file and commit.
+Add the content of the [src/nf_modules/UrQt/trimming_paired.config](https://gitlab.biologie.ens-lyon.fr/pipelines/nextflow/blob/master/src/nf_modules/UrQt/trimming_paired.config) file to your `src/RNASeq.config` file and commit.

 You can test your pipeline.

@@ -334,7 +336,7 @@ You can test your pipeline.

 Kallisto need the sequences of the transcripts that need to be quantified. We are going to extract these sequences from the reference `data/tiny_dataset/fasta/tiny_v2.fasta` with the `bed` annotation `data/tiny_dataset/annot/tiny.bed`.

-You can copy to your `src/RNASeq.nf` file the content of [src/nf_modules/BEDtools/bedtools.nf](https://gitlab.biologie.ens-lyon.fr/pipelines/nextflow/blob/master/src/nf_modules/BEDtools/bedtools.nf) and to your `src/RNASeq.config` file the content of [src/nf_modules/BEDtools/bedtools.config](https://gitlab.biologie.ens-lyon.fr/pipelines/nextflow/blob/master/src/nf_modules/BEDtools/bedtools.config).
+Copy the content of [src/nf_modules/BEDtools/fasta_from_bed.nf](https://gitlab.biologie.ens-lyon.fr/pipelines/nextflow/blob/master/src/nf_modules/BEDtools/fasta_from_bed.nf) to your `src/RNASeq.nf` file and the content of [src/nf_modules/BEDtools/fasta_from_bed.config](https://gitlab.biologie.ens-lyon.fr/pipelines/nextflow/blob/master/src/nf_modules/BEDtools/fasta_from_bed.config) to your `src/RNASeq.config` file.

 Commit your work and test your pipeline with the following command:

@@ -346,7 +348,7 @@ Commit your work and test your pipeline with the following command:

 Kallisto run in two steps: the indexation of the reference and the quantification on this index.

-You can copy to your `src/RNASeq.nf` file the relevant content of [src/nf_modules/Kallisto/kallisto.nf](https://gitlab.biologie.ens-lyon.fr/pipelines/nextflow/blob/master/src/nf_modules/Kallisto/kallisto.nf) and to your `src/RNASeq.config` file the content of [src/nf_modules/Kallisto/kallisto.config](https://gitlab.biologie.ens-lyon.fr/pipelines/nextflow/blob/master/src/nf_modules/Kallisto/kallisto.config).
+Copy the content of the files [src/nf_modules/Kallisto/indexing.nf](https://gitlab.biologie.ens-lyon.fr/pipelines/nextflow/blob/master/src/nf_modules/Kallisto/indexing.nf) and [src/nf_modules/Kallisto/mapping_paired.nf](https://gitlab.biologie.ens-lyon.fr/pipelines/nextflow/blob/master/src/nf_modules/Kallisto/mapping_paired.nf) to your `src/RNASeq.nf` file. Then add the content of the files [src/nf_modules/Kallisto/indexing.config](https://gitlab.biologie.ens-lyon.fr/pipelines/nextflow/blob/master/src/nf_modules/Kallisto/indexing.config) and [src/nf_modules/Kallisto/mapping_paired.config](https://gitlab.biologie.ens-lyon.fr/pipelines/nextflow/blob/master/src/nf_modules/Kallisto/mapping_paired.config) to your `src/RNASeq.config` file.

 We are going to work with paired-end so only copy the relevant processes. The `index_fasta` process needs to take as input the output of your `fasta_from_bed` process. The `fastq` input of your `mapping_fastq` process needs to take as input the output of your `index_fasta` process and the `trimming` process.

@@ -384,12 +386,21 @@ ln -s /Xnfs/lbmcdb/common/modules/modulefiles ~/privatemodules
 echo "module use ~/privatemodules" >> .bashrc
 ```

+Create and go to your `scratch` folder:
+
+```sh
+mkdir -p /scratch/<login>
+cd /scratch/<login>
+```
+
 Then you need to clone your pipeline and get the data:

 ```sh
-git clone -c http.sslVerify=false https://gitlab.biologie.ens-lyon.fr/lmodolo/nextflow.git
+git config --global http.sslVerify false
+git clone https://gitlab.biologie.ens-lyon.fr/<usr_name>/nextflow.git
 cd nextflow/data
-git clone -c http.sslVerify=false https://gitlab.biologie.ens-lyon.fr/LBMC/tiny_dataset.git
+git clone https://gitlab.biologie.ens-lyon.fr/LBMC/tiny_dataset.git
 cd ..
 ```

@@ -400,13 +411,13 @@ As we don’t want nextflow to be killed in case of disconnection, we start by l
 ```sh
 tmux
 module load nextflow/0.28.2
-nextflow src/RNASeq.nf -c src/RNASeq.config -profile sge --fastq "data/tiny_dataset/fastq/*_R{1,2}.fastq" --fasta "data/tiny_dataset/fasta/tiny_v2.fasta" --bed "data/tiny_dataset/annot/tiny.bed"
+nextflow src/RNASeq.nf -c src/RNASeq.config -profile sge --fastq "data/tiny_dataset/fastq/*_R{1,2}.fastq" --fasta "data/tiny_dataset/fasta/tiny_v2.fasta" --bed "data/tiny_dataset/annot/tiny.bed" -w /scratch/<login>
 ```

 To use the scratch for nextflow computations add the option :

 ```sh
-w /scratch/login
+-w /scratch/<login>
 ```

 You just ran your pipeline on the PSMN!