From 8a2d1d85639ad2a7e05efce9aa52cf828bc913e5 Mon Sep 17 00:00:00 2001
From: Laurent Modolo <laurent.modolo@ens-lyon.fr>
Date: Tue, 5 Jun 2018 20:39:15 +0200
Subject: [PATCH] TP.md: fasta_sampler section

---
 doc/TP_experimental_biologists.md | 84 ++++++++++++++++++++++++++++---
 1 file changed, 77 insertions(+), 7 deletions(-)

diff --git a/doc/TP_experimental_biologists.md b/doc/TP_experimental_biologists.md
index e235a93..96b30cf 100644
--- a/doc/TP_experimental_biologists.md
+++ b/doc/TP_experimental_biologists.md
@@ -96,11 +96,7 @@ The [README.md](https://gitlab.biologie.ens-lyon.fr/PSMN/modules/blob/master/REA
 
 The `src/nf_modules` folder contains templates of [nextflow](https://www.nextflow.io/) wrapper for the tools available in [Docker](https://www.docker.com/what-docker) and [SGE](http://www.ens-lyon.fr/PSMN/doku.php?id=documentation:tools:sge). The details of the [nextflow](https://www.nextflow.io/) wrapper will be presented in the next section. Alongside the `.nf` and `.config` there is a `tests` folder that contains a `tests.sh` script to run test on the tool.
 
-# Build your own RNASeq pipeline
-
-In this section you are going to build your own pipeline for RNASeq analysis from the code available in the `src/nf_modules` folder.
-
-## Nextflow pipeline
+# Nextflow pipeline
 
 A pipeline is a succession of **process**. Each process has data input(s) and optional data output(s). Data flow are modeled as **channels**.
 
@@ -142,6 +138,8 @@ At the end of the script, a file named `sample.fasta` is found in the root the f
 
 Using the WebIDE of Gitlab create a file `src/fasta_sampler.nf` with this process and commit to your repository.
 
+![webide](img/webide.png)
+
 ### Channels
 
 Why bother with channels ? In the above example, the advantages of channels are not really clear. We could have just given the `fasta` file to the process. But what if we have many fasta file to process ? What if we have sub processes to run on each of the sampled fasta files ? Nextflow can easily deal with these problems with the help of channels.
@@ -159,7 +157,11 @@ Here we defined a channel `fasta_file` that is going to send every fasta file fr
 Add the definition of the channel to the `src/fasta_sampler.nf` file and commit to your repository.
 
 
-# Run your pipeline locally
+### Run your pipeline locally
+
+After writing this first pipeline, you may want to test it. To do that, first clone your repository. To make cloning easy, set the visibility level to *public* in the settings of your project.
+
+You can then run the following commands to download your project to your computer:
 
 ```sh
 git clone -c http.sslVerify=false https://gitlab.biologie.ens-lyon.fr/<usr_name>/nextflow.git
@@ -167,13 +169,80 @@ cd nextflow
 src/install_nextflow.sh
 ```
 
+We also need data to run our pipeline:
+
+```sh
+cd data
+git clone -c http.sslVerify=false https://gitlab.biologie.ens-lyon.fr/LBMC/tiny_dataset.git
+cd ..
+```
+
+We can run our pipeline with the following command:
+
+```sh
+./nextflow src/fasta_sampler.nf
+```
+
+### Getting our results
+
+Our pipeline seems to work, but we don't know where the `sample.fasta` file is. To get results out of a process, we need to tell nextflow to write them somewhere (we may not need every intermediate file in our results).
+
+To do that we need to add the following line before the `input:` section:
+
+```Groovy
+  publishDir "results/sampling/", mode: 'copy'
+```
+
+Every file described in the `output:` section will be copied from nextflow's working directory to the folder `results/sampling/`.
+
+Add this to your `src/fasta_sampler.nf` file with the WebIDE and commit to your repository.
+Pull your modifications locally with the command:
+
+```sh
+git pull origin master
+```
+
+You can run your pipeline again and check the content of the folder `results/sampling`.
+
+### Fasta everywhere
+
+We ran our pipeline on one fasta file. How would nextflow handle 100 of them? To test that, we need to duplicate the `tiny_v2.fasta` file:
+
+```sh
+for i in {1..100}
+do
+ cp data/tiny_dataset/fasta/tiny_v2.fasta data/tiny_dataset/fasta/tiny_v2_${i}.fasta
+done
+```
+
+You can run your pipeline again and check the content of the folder `results/sampling`.
+
+Every `fasta_sampler` process writes a `sample.fasta` file, so the published copies overwrite each other in `results/sampling`. We need to make the name of the output file depend on the name of the input file.
+
+```Groovy
+  output:
+    file "*_sample.fasta" into fasta_sample
+
+  script:
+"""
+head ${fasta} > ${fasta.baseName}_sample.fasta
+"""
+```
+
+Add this to your `src/fasta_sampler.nf` file with the WebIDE and commit to your repository before pulling your modifications locally.
+You can run your pipeline again and check the content of the folder `results/sampling`.
+
+# Build your own RNASeq pipeline
+
+In this section you are going to build your own pipeline for RNASeq analysis from the code available in the `src/nf_modules` folder.
+
 ## Create your Docker containers
 
 For this  practical, we are going to need the following tools :
 
 - For Illumina adaptor removal : cutadapt
 - For reads trimming by quality : UrQt
-- For mapping and quantifying reads : Kallisto, RSEM and Bowtie2
+- For mapping and quantifying reads : Kallisto and Bowtie2
 
 To initialize these tools, follow the **Installing** section of the [README.md](https://gitlab.biologie.ens-lyon.fr/pipelines/nextflow/blob/master/README.md) file.
 
@@ -182,6 +251,7 @@ To initialize these tools, follow the **Installing** section of the [README.md](
 
 
 
+
  
 
 
-- 
GitLab