Skip to content
Snippets Groups Projects
Unverified Commit 978f7576 authored by Laurent Modolo's avatar Laurent Modolo
Browse files

TP_experimental_biologists.md: update with new nf path

parent 35e3593b
No related branches found
No related tags found
No related merge requests found
......@@ -42,11 +42,12 @@ You are now on the main page of your fork of the [pipelines/nextflow](https://gi
The [README.md](https://gitlab.biologie.ens-lyon.fr/pipelines/nextflow/blob/master/README.md) file contains instructions to run your pipeline and test its installation.
The [CONTRIBUTING.md](https://gitlab.biologie.ens-lyon.fr/pipelines/nextflow/blob/master/CONTRIBUTING.md) file contains guidelines to follow if you want to contribute to the [pipelines/nextflow](https://gitlab.biologie.ens-lyon.fr/pipelines/nextflow) (making a merge request for example).
The [CONTRIBUTING.md](https://gitlab.biologie.ens-lyon.fr/pipelines/nextflow/blob/master/CONTRIBUTING.md) file contains guidelines if you want to contribute to the [pipelines/nextflow](https://gitlab.biologie.ens-lyon.fr/pipelines/nextflow) (making a merge request for example).
The [data](https://gitlab.biologie.ens-lyon.fr/pipelines/nextflow/tree/master/data) folder will be the place where you store the raw data for your analysis.
The [results](https://gitlab.biologie.ens-lyon.fr/pipelines/nextflow/tree/master/results) folder will be the place where you store the results of your analysis.
Note that the content of these two folders should never be saved on git.
> Note that the content of `data` and `results` folders should never be saved on git.
The [doc](https://gitlab.biologie.ens-lyon.fr/pipelines/nextflow/tree/master/doc) folder contains the documentation of this practical course.
......@@ -94,7 +95,7 @@ The [README.md](https://gitlab.biologie.ens-lyon.fr/PSMN/modules/blob/master/REA
### `nf_modules`
The `src/nf_modules` folder contains templates of [nextflow](https://www.nextflow.io/) wrappers for the tools available in [Docker](https://www.docker.com/what-docker) and [SGE](http://www.ens-lyon.fr/PSMN/doku.php?id=documentation:tools:sge). The details of the [nextflow](https://www.nextflow.io/) wrapper will be presented in the next section. Alongside the `.nf` and `.config` there is a `tests` folder that contains a `tests.sh` script to run test on the tool.
The `src/nf_modules` folder contains templates of [nextflow](https://www.nextflow.io/) wrappers for the tools available in [Docker](https://www.docker.com/what-docker) and [SGE](http://www.ens-lyon.fr/PSMN/doku.php?id=documentation:tools:sge). The details of the [nextflow](https://www.nextflow.io/) wrapper will be presented in the next section. Alongside the `.nf` and `.config` files, there is a `tests.sh` script to run test on the tool.
# Nextflow pipeline
......@@ -152,7 +153,7 @@ Channel
.set { fasta_file }
```
Here we defined the channel `fasta_file` that is going to send every fasta file from the folder `data/fasta/` into the process that take it as input.
Here we defined the channel `fasta_file` that is going to send every fasta file from the folder `data/tiny_dataset/fasta/` into the process that take it as input.
Add the definition of the channel to the `src/fasta_sampler.nf` file and commit to your repository.
......@@ -164,7 +165,8 @@ After writing this first pipeline, you may want to test it. To do that, first cl
You can then run the following commands to download your project on your computer:
```sh
git clone -c http.sslVerify=false https://gitlab.biologie.ens-lyon.fr/<usr_name>/nextflow.git
git config --global http.sslVerify false
git clone https://gitlab.biologie.ens-lyon.fr/<usr_name>/nextflow.git
cd nextflow
src/install_nextflow.sh
```
......@@ -173,7 +175,7 @@ We also need data to run our pipeline:
```
cd data
git clone -c http.sslVerify=false https://gitlab.biologie.ens-lyon.fr/LBMC/tiny_dataset.git
git clone https://gitlab.biologie.ens-lyon.fr/LBMC/tiny_dataset.git
cd ..
```
......@@ -246,7 +248,7 @@ For this practical, we are going to need the following tools:
To initialize these tools, follow the **Installing** section of the [README.md](https://gitlab.biologie.ens-lyon.fr/pipelines/nextflow/blob/master/README.md) file.
If you are using a CBP computer don’t forget to clean up your docker containers at the end of the practical with the following command:
**If you are using a CBP computer don’t forget to clean up your docker containers at the end of the practical with the following command:**
```sh
docker rm $(docker stop $(docker ps -aq))
......@@ -255,9 +257,9 @@ docker rmi $(docker images -qf "dangling=true")
## Cutadapt
The first step of the pipeline is to remove any Illumina adaptor left in your read files.
The first step of the pipeline is to remove any Illumina adaptors left in your read files.
Open the WebIDE and create a `src/RNASeq.nf` file. Browse for [src/nf_modules/cutadapt/cutadapt.nf](https://gitlab.biologie.ens-lyon.fr/pipelines/nextflow/blob/master/src/nf_modules/cutadapt/cutadapt.nf), this file contains examples for cutadapt. We are interested in the *Illumina adaptor removal*, *for paired-end data* section of the code. Copy this code in your pipeline and commit.
Open the WebIDE and create a `src/RNASeq.nf` file. Browse for [src/nf_modules/cutadapt/adaptor_removal_paired.nf](https://gitlab.biologie.ens-lyon.fr/pipelines/nextflow/blob/master/src/nf_modules/cutadapt/adaptor_removal_paired.nf), this file contains examples for cutadapt. We are interested in the *Illumina adaptor removal*, *for paired-end data* section of the code. Copy this code in your pipeline and commit.
Compared to before, we have few new lines:
......@@ -293,7 +295,7 @@ For the `fastq_sampler.nf` pipeline we used the command `head` present in most b
- launch the process in a Docker container that has cutadapt installed
- launch the process with SGE while loading the correct module to have cutadapt available
We are not going to use the first option which requires no configuration for nextflow but tedious tools installations. Instead, we are going to use existing *wrappers* and tell nextflow about it. This is what the [src/nf_modules/cutadapt/cutadapt.config](https://gitlab.biologie.ens-lyon.fr/pipelines/nextflow/blob/master/src/nf_modules/cutadapt/cutadapt.config) is used for.
We are not going to use the first option which requires no configuration for nextflow but tedious tools installations. Instead, we are going to use existing *wrappers* and tell nextflow about it. This is what the [src/nf_modules/cutadapt/adaptor_removal_paired.config](https://gitlab.biologie.ens-lyon.fr/pipelines/nextflow/blob/master/src/nf_modules/cutadapt/adaptor_removal_paired.config) is used for.
Copy the content of this config file to an `src/RNASeq.config` file. This file is structured in process blocks. Here we are only interested in configuring `adaptor_removal` process not `trimming` process. So you can remove the `trimming` block and commit.
......@@ -308,7 +310,7 @@ You can test your pipeline with the following command:
The second step of the pipeline is to trim reads by quality.
Browse for [src/nf_modules/UrQt/urqt.nf](https://gitlab.biologie.ens-lyon.fr/pipelines/nextflow/blob/master/src/nf_modules/UrQt/urqt.nf), this file contains examples for UrQt. We are interested in the *for paired-end data* section of the code. Copy the process section code in your pipeline and commit.
Browse for [src/nf_modules/UrQt/trimming_paired.nf](https://gitlab.biologie.ens-lyon.fr/pipelines/nextflow/blob/master/src/nf_modules/UrQt/trimming_paired.nf), this file contains examples for UrQt. We are interested in the *for paired-end data* section of the code. Copy the process section code in your pipeline and commit.
This code won’t work if you try to run it: the `fastq_file` channel is already consumed by the `adaptor_removal` process. In nextflow once a channel is used by a process, it ceases to exist. Moreover, we don’t want to trim the input fastq, we want to trim the fastq that comes from the `adaptor_removal` process.
......@@ -326,7 +328,7 @@ set pair_id, file(reads) from fastq_files_cut
The two processes are now connected by the channel `fastq_files_cut`.
Add the content of the [src/nf_modules/UrQt/urqt.config](https://gitlab.biologie.ens-lyon.fr/pipelines/nextflow/blob/master/src/nf_modules/UrQt/urqt.config) file to your `src/RNASeq.config` file and commit.
Add the content of the [src/nf_modules/UrQt/trimming_paired.config](https://gitlab.biologie.ens-lyon.fr/pipelines/nextflow/blob/master/src/nf_modules/UrQt/trimming_paired.config) file to your `src/RNASeq.config` file and commit.
You can test your pipeline.
......@@ -334,7 +336,7 @@ You can test your pipeline.
Kallisto need the sequences of the transcripts that need to be quantified. We are going to extract these sequences from the reference `data/tiny_dataset/fasta/tiny_v2.fasta` with the `bed` annotation `data/tiny_dataset/annot/tiny.bed`.
You can copy to your `src/RNASeq.nf` file the content of [src/nf_modules/BEDtools/bedtools.nf](https://gitlab.biologie.ens-lyon.fr/pipelines/nextflow/blob/master/src/nf_modules/BEDtools/bedtools.nf) and to your `src/RNASeq.config` file the content of [src/nf_modules/BEDtools/bedtools.config](https://gitlab.biologie.ens-lyon.fr/pipelines/nextflow/blob/master/src/nf_modules/BEDtools/bedtools.config).
You can copy to your `src/RNASeq.nf` file the content of [src/nf_modules/BEDtools/fasta_from_bed.nf](https://gitlab.biologie.ens-lyon.fr/pipelines/nextflow/blob/master/src/nf_modules/BEDtools/fasta_from_bed.nf) and to your `src/RNASeq.config` file the content of [src/nf_modules/BEDtools/fasta_from_bed.config](https://gitlab.biologie.ens-lyon.fr/pipelines/nextflow/blob/master/src/nf_modules/BEDtools/fasta_from_bed.config).
Commit your work and test your pipeline with the following command:
......@@ -346,7 +348,7 @@ Commit your work and test your pipeline with the following command:
Kallisto run in two steps: the indexation of the reference and the quantification on this index.
You can copy to your `src/RNASeq.nf` file the relevant content of [src/nf_modules/Kallisto/kallisto.nf](https://gitlab.biologie.ens-lyon.fr/pipelines/nextflow/blob/master/src/nf_modules/Kallisto/kallisto.nf) and to your `src/RNASeq.config` file the content of [src/nf_modules/Kallisto/kallisto.config](https://gitlab.biologie.ens-lyon.fr/pipelines/nextflow/blob/master/src/nf_modules/Kallisto/kallisto.config).
You can copy to your `src/RNASeq.nf` file the content of the files [src/nf_modules/Kallisto/indexing.nf](https://gitlab.biologie.ens-lyon.fr/pipelines/nextflow/blob/master/src/nf_modules/Kallisto/indexing.nf) and [src/nf_modules/Kallisto/mapping_paired.nf](https://gitlab.biologie.ens-lyon.fr/pipelines/nextflow/blob/master/src/nf_modules/Kallisto/mapping_paired.nf). You can add to your file `src/RNASeq.config` file the content of the files [src/nf_modules/Kallisto/indexing.config](https://gitlab.biologie.ens-lyon.fr/pipelines/nextflow/blob/master/src/nf_modules/Kallisto/indexing.config) and [src/nf_modules/Kallisto/mapping_paired.config](https://gitlab.biologie.ens-lyon.fr/pipelines/nextflow/blob/master/src/nf_modules/Kallisto/mapping_paired.config).
We are going to work with paired-end so only copy the relevant processes. The `index_fasta` process needs to take as input the output of your `fasta_from_bed` process. The `fastq` input of your `mapping_fastq` process needs to take as input the output of your `index_fasta` process and the `trimming` process.
......@@ -384,12 +386,21 @@ ln -s /Xnfs/lbmcdb/common/modules/modulefiles ~/privatemodules
echo "module use ~/privatemodules" >> .bashrc
```
Create and go to your `scratch` folder:
```sh
mkdir -p /scratch/<login>
cd /scratch/<login>
echo "module use ~/privatemodules" >> .bashrc
```
Then you need to clone your pipeline and get the data:
```sh
git clone -c http.sslVerify=false https://gitlab.biologie.ens-lyon.fr/lmodolo/nextflow.git
git config --global http.sslVerify false
git clone https://gitlab.biologie.ens-lyon.fr/lmodolo/nextflow.git
cd nextflow/data
git clone -c http.sslVerify=false https://gitlab.biologie.ens-lyon.fr/LBMC/tiny_dataset.git
git clone https://gitlab.biologie.ens-lyon.fr/LBMC/tiny_dataset.git
cd ..
```
......@@ -400,13 +411,13 @@ As we don’t want nextflow to be killed in case of disconnection, we start by l
```sh
tmux
module load nextflow/0.28.2
nextflow src/RNASeq.nf -c src/RNASeq.config -profile sge --fastq "data/tiny_dataset/fastq/*_R{1,2}.fastq" --fasta "data/tiny_dataset/fasta/tiny_v2.fasta" --bed "data/tiny_dataset/annot/tiny.bed"
nextflow src/RNASeq.nf -c src/RNASeq.config -profile sge --fastq "data/tiny_dataset/fastq/*_R{1,2}.fastq" --fasta "data/tiny_dataset/fasta/tiny_v2.fasta" --bed "data/tiny_dataset/annot/tiny.bed" -w /scratch/<login>
```
To use the scratch for nextflow computations add the option :
```sh
-w /scratch/login
-w /scratch/<login>
```
You just ran your pipeline on the PSMN!
0% Loading or .
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment