Commit e90c027d authored by Laurent Modolo

update guide to build pipeline

parent 91ec481f
......@@ -196,7 +196,7 @@ You can the use tests on `read.size()` to define conditional `script` block:
--json ${file_prefix}_fastp.json \
--report_title ${file_prefix}
"""
else if (reads.size() == 1)
else
"""
fastp --thread ${task.cpus} \
${params.fastp} \
......
---
title: "TP for experimental biologists"
author: Laurent Modolo [laurent.modolo@ens-lyon.fr](mailto:laurent.modolo@ens-lyon.fr)
date: 6 Jun 2018
output:
pdf_document:
toc: true
toc_depth: 3
number_sections: true
highlight: tango
latex_engine: xelatex
---
The goal of this practical is to learn how to build your own pipeline with nextflow, using tools that are already *wrapped*.
For this we are going to build a small RNASeq analysis pipeline that should run the following steps:
- remove Illumina adaptors
- trim reads by quality
- build the index of a reference genome
- estimate the amount of RNA fragments mapping to the transcripts of this genome
**To do this practical you will need to have [Docker](https://www.docker.com/) installed and running on your computer**
# Initialize your own project
You are going to build a pipeline for you or your team. So the first step is to create your own project.
## Forking
Instead of reinventing the wheel, you can use the [LBMC/nextflow](https://gitbio.ens-lyon.fr/LBMC/nextflow) as a template.
To do so, go to the [LBMC/nextflow](https://gitbio.ens-lyon.fr/LBMC/nextflow) repository and click on the [**fork**](https://gitbio.ens-lyon.fr/LBMC/nextflow/forks/new) button (you need to log in).
![fork button](img/fork.png)
In git, the [action of forking](https://git-scm.com/book/en/v2/GitHub-Contributing-to-a-Project) means that you are going to make your own private copy of a repository. You can then make modifications in your project and, if they are of interest for the source repository (here [LBMC/nextflow](https://gitbio.ens-lyon.fr/LBMC/nextflow)), create a merge request. Merge requests are sent to the source repository to ask the maintainers to integrate modifications.
![merge request button](img/merge_request.png)
## Project organisation
This project (and yours) follows the [guide of good practices for the LBMC](http://www.ens-lyon.fr/LBMC/intranet/services-communs/pole-bioinformatique/ressources/good_practice_LBMC).
You are now on the main page of your fork of the [LBMC/nextflow](https://gitbio.ens-lyon.fr/LBMC/nextflow). You can explore this project; all the code in it is under the CeCILL licence (see the [LICENCE](https://gitbio.ens-lyon.fr/LBMC/nextflow/blob/master/LICENSE) file).
The [README.md](https://gitbio.ens-lyon.fr/LBMC/nextflow/blob/master/README.md) file contains instructions to run your pipeline and test its installation.
The [CONTRIBUTING.md](https://gitbio.ens-lyon.fr/LBMC/nextflow/blob/master/CONTRIBUTING.md) file contains guidelines if you want to contribute to the [LBMC/nextflow](https://gitbio.ens-lyon.fr/LBMC/nextflow) (making a merge request for example).
The [data](https://gitbio.ens-lyon.fr/LBMC/nextflow/tree/master/data) folder will be the place where you store the raw data for your analysis.
The [results](https://gitbio.ens-lyon.fr/LBMC/nextflow/tree/master/results) folder will be the place where you store the results of your analysis.
> **The content of `data` and `results` folders should never be saved on git.**
The [doc](https://gitbio.ens-lyon.fr/LBMC/nextflow/tree/master/doc) folder contains the documentation of this practical course.
And most interestingly for you, the [src](https://gitbio.ens-lyon.fr/LBMC/nextflow/tree/master/src) folder contains code to wrap tools. This folder contains one visible subdirectory, `nf_modules`, some pipeline examples and other hidden files.
### `nf_modules`
The `src/nf_modules` folder contains templates of [nextflow](https://www.nextflow.io/) wrappers for the tools available in [Docker](https://www.docker.com/what-docker). The details of the [nextflow](https://www.nextflow.io/) wrapper will be presented in the next section. Alongside the `.nf` and `.config` files, there is a `tests.sh` script to run tests on the tool.
# Nextflow pipeline
A pipeline is a succession of **processes**. Each process has data input(s) and optional data output(s). Data flows are modeled as **channels**.
## Processes
Here is an example of **process**:
```Groovy
process sample_fasta {
input:
file fasta from fasta_file
output:
file "sample.fasta" into fasta_sample
script:
"""
head ${fasta} > sample.fasta
"""
}
```
We have the process `sample_fasta` that takes a `fasta_file` **channel** as input and produces a `fasta_sample` **channel** as output. The process itself is defined in the `script:` block, within the `"""`.
```Groovy
input:
file fasta from fasta_file
```
When we zoom in on the `input:` block, we see that we define a variable `fasta` of type `file` from the `fasta_file` **channel**. This means that nextflow is going to write a file, named after the content of the variable `fasta`, in the root of the folder where `script:` is executed.
```Groovy
output:
file "sample.fasta" into fasta_sample
```
At the end of the script, a file named `sample.fasta` is found in the root of the folder where `script:` is executed and is sent into the **channel** `fasta_sample`.
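Outside of nextflow, the body of this process is a plain shell command. The following sketch reproduces it on a made-up file (the temporary directory and the content of `example.fasta` are assumptions for illustration only): `head` keeps the first 10 lines of its input by default.

```sh
# Hypothetical stand-in for the folder where script: is executed
workdir=$(mktemp -d)

# Build a made-up fasta file of 15 records (30 lines): not real data
i=1
while [ "$i" -le 15 ]; do
  printf '>seq%s\nACGTACGT\n' "$i"
  i=$((i + 1))
done > "$workdir/example.fasta"

# Same command as in the script: block, with ${fasta} replaced by a file name
head "$workdir/example.fasta" > "$workdir/sample.fasta"

# Count the lines kept by head
n_lines=$(wc -l < "$workdir/sample.fasta" | tr -d ' ')
echo "sample.fasta has ${n_lines} lines"
rm -rf "$workdir"
```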
Using the WebIDE of Gitlab, create a file `src/fasta_sampler.nf` with this process and commit it to your repository.
![webide](img/webide.png)
## Channels
Why bother with channels? In the above example, the advantages of channels are not really clear. We could have just given the `fasta` file to the process. But what if we have many fasta files to process? What if we have sub processes to run on each of the sampled fasta files? Nextflow can easily deal with these problems with the help of channels.
> **Channels** are streams of items that are emitted by a source and consumed by a process. A process with a channel as input will be run on every item sent through the channel.
```Groovy
Channel
.fromPath( "data/tiny_dataset/fasta/*.fasta" )
.set { fasta_file }
```
Here we defined the channel `fasta_file` that is going to send every fasta file from the folder `data/tiny_dataset/fasta/` into the process that takes it as input.
Add the definition of the channel to the `src/fasta_sampler.nf` file and commit it to your repository.
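To get a feeling for what the channel will emit, you can list the files matched by the same kind of glob pattern in a shell. The sketch below uses a temporary folder with made-up file names (assumptions, not the content of the `tiny_dataset`); each matching file corresponds to one item of the channel, and therefore to one run of the process.

```sh
# Hypothetical folder standing in for data/tiny_dataset/fasta/
fasta_dir=$(mktemp -d)
touch "$fasta_dir/a.fasta" "$fasta_dir/b.fasta" "$fasta_dir/c.fasta" "$fasta_dir/notes.txt"

# Count the files matched by *.fasta: one channel item per file
n_items=0
for f in "$fasta_dir"/*.fasta; do
  echo "item: $(basename "$f")"
  n_items=$((n_items + 1))
done
echo "${n_items} items would be sent to the process"
rm -rf "$fasta_dir"
```

The file `notes.txt` does not match the pattern, so it would never reach the process.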
## Run your pipeline locally
After writing this first pipeline, you may want to test it. To do that, first clone your repository. To make this easier, set the visibility level to *public* in the settings/General/Permissions page of your project.
You can then run the following commands to download your project on your computer:
```sh
git clone git@gitbio.ens-lyon.fr:<usr_name>/nextflow.git
cd nextflow
src/install_nextflow.sh
```
We also need data to run our pipeline:
```sh
cd data
git clone git@gitbio.ens-lyon.fr:LBMC/hub/tiny_dataset.git
cd ..
```
We can run our pipeline with the following command:
```sh
./nextflow src/fasta_sampler.nf
```
## Getting your results
Our pipeline seems to work, but we don’t know where the `sample.fasta` file is. To get results out of a process, we need to tell nextflow to write them somewhere (we may not need every intermediate file in our results).
To do that we need to add the following line before the `input:` section:
```Groovy
publishDir "results/sampling/", mode: 'copy'
```
Every file described in the `output:` section will be copied from the nextflow working directory to the folder `results/sampling/`.
Add this to your `src/fasta_sampler.nf` file with the WebIDE and commit to your repository.
Pull your modifications locally with the command:
```sh
git pull origin master
```
You can run your pipeline again and check the content of the folder `results/sampling`.
## Fasta everywhere
We ran our pipeline on one fasta file. How would nextflow handle 100 of them? To test that we need to duplicate the `tiny_v2.fasta` file:
```sh
for i in {1..100}
do
cp data/tiny_dataset/fasta/tiny_v2.fasta data/tiny_dataset/fasta/tiny_v2_${i}.fasta
done
```
You can run your pipeline again and check the content of the folder `results/sampling`.
Every `sample_fasta` process writes a file named `sample.fasta`, so the results overwrite one another in `results/sampling`. We need to make the name of the output file depend on the name of the input file.
```Groovy
output:
file "*_sample.fasta" into fasta_sample
script:
"""
head ${fasta} > ${fasta.baseName}_sample.fasta
"""
```
Add this to your `src/fasta_sampler.nf` file with the WebIDE and commit it to your repository before pulling your modifications locally.
You can run your pipeline again and check the content of the folder `results/sampling`.
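The effect of this change can be sketched in plain shell. Here `basename "$f" .fasta` plays the role of `${fasta.baseName}`; the temporary folders and the three input files are made up for the illustration. With a fixed output name every run overwrites the same file, whereas a name derived from the input gives one result per input.

```sh
in_dir=$(mktemp -d)    # hypothetical input folder
out_dir=$(mktemp -d)   # hypothetical stand-in for results/sampling
for i in 1 2 3; do
  printf '>seq%s\nACGT\n' "$i" > "$in_dir/tiny_v2_${i}.fasta"
done

for f in "$in_dir"/*.fasta; do
  # Fixed name: each iteration overwrites the previous result
  head "$f" > "$out_dir/sample.fasta"
  # Name derived from the input: one file per input
  prefix=$(basename "$f" .fasta)
  head "$f" > "$out_dir/${prefix}_sample.fasta"
done

n_fixed=$(ls "$out_dir" | grep -c '^sample\.fasta$')
n_unique=$(ls "$out_dir" | grep -c '_sample\.fasta$')
echo "fixed name: ${n_fixed} file, derived names: ${n_unique} files"
rm -rf "$in_dir" "$out_dir"
```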
# Build your own RNASeq pipeline
In this section you are going to build your own pipeline for RNASeq analysis from the code available in the `src/nf_modules` folder.
## Cutadapt
The first step of the pipeline is to remove any Illumina adaptors left in your read files.
Open the WebIDE and create a `src/RNASeq.nf` file. Browse for [src/nf_modules/cutadapt/adaptor_removal_paired.nf](https://gitbio.ens-lyon.fr/LBMC/nextflow/blob/master/src/nf_modules/cutadapt/adaptor_removal_paired.nf), this file contains examples for cutadapt. We are interested in the *Illumina adaptor removal*, *for paired-end data* section of the code. Copy this code in your pipeline and commit it.
Compared to before, we have a few new lines:
```Groovy
params.fastq = "$baseDir/data/fastq/*_{1,2}.fastq"
```
We declare a variable that contains the path of the fastq file to look for. The advantage of using `params.fastq` is that the option `--fastq` is now a parameter of your pipeline.
Thus, you can call your pipeline with the `--fastq` option:
```sh
./nextflow src/RNASeq.nf --fastq "data/tiny_dataset/fastq/*_R{1,2}.fastq"
```
```Groovy
log.info "fastq files: ${params.fastq}"
```
This line simply displays the value of the variable.
```Groovy
Channel
.fromFilePairs( params.fastq )
```
As we are working with paired-end RNASeq data, we tell nextflow to send pairs of fastq in the `fastq_file` channel.
### cutadapt.config
For the `fasta_sampler.nf` pipeline we used the command `head`, which is present in most base UNIX systems. Here we want to use `cutadapt`, which is not. Therefore, we have three main options:
- install cutadapt locally so nextflow can use it
- launch the process in a [Docker](https://www.docker.com/) container that has cutadapt installed
- launch the process in a [Singularity](https://singularity.lbl.gov/) container (what we do on the PSMN and CCIN2P3)
We are not going to use the first option, which requires no configuration for nextflow but tedious tool installations. Instead, we are going to use existing *wrappers* and tell nextflow about them. This is what the [src/nf_modules/cutadapt/adaptor_removal_paired.config](https://gitbio.ens-lyon.fr/LBMC/nextflow/blob/master/src/nf_modules/cutadapt/adaptor_removal_paired.config) file is used for.
Copy the content of this config file to an `src/RNASeq.config` file. This file is structured in process blocks. Here we are only interested in configuring the `adaptor_removal` process, not the `trimming` process. So you can remove the `trimming` block and commit your changes.
You can test your pipeline with the following command:
```sh
./nextflow src/RNASeq.nf -c src/RNASeq.config -profile docker --fastq "data/tiny_dataset/fastq/*_R{1,2}.fastq"
```
## UrQt
The second step of the pipeline is to trim reads by quality.
Browse for [src/nf_modules/urqt/trimming_paired.nf](https://gitbio.ens-lyon.fr/LBMC/nextflow/blob/master/src/nf_modules/urqt/trimming_paired.nf), this file contains examples for UrQt. We are interested in the *for paired-end data* section of the code. Copy the process section code in your pipeline and commit it.
This code won’t work if you try to run it: the `fastq_file` channel is already consumed by the `adaptor_removal` process. In nextflow once a channel is used by a process, it ceases to exist. Moreover, we don’t want to trim the input fastq, we want to trim the fastq that comes from the `adaptor_removal` process.
Therefore, you need to change the line:
```Groovy
set pair_id, file(reads) from fastq_files
```
In the `trimming` process to:
```Groovy
set pair_id, file(reads) from fastq_files_cut
```
The two processes are now connected by the channel `fastq_files_cut`.
Add the content of the [src/nf_modules/urqt/trimming_paired.config](https://gitbio.ens-lyon.fr/LBMC/nextflow/blob/master/src/nf_modules/urqt/trimming_paired.config) file to your `src/RNASeq.config` file and commit it.
You can test your pipeline.
## BEDtools
Kallisto needs the sequences of the transcripts that need to be quantified. We are going to extract these sequences from the reference `data/tiny_dataset/fasta/tiny_v2.fasta` with the `bed` annotation `data/tiny_dataset/annot/tiny.bed`.
You can copy to your `src/RNASeq.nf` file the content of [src/nf_modules/bedtools/fasta_from_bed.nf](https://gitbio.ens-lyon.fr/LBMC/nextflow/blob/master/src/nf_modules/bedtools/fasta_from_bed.nf) and to your `src/RNASeq.config` file the content of [src/nf_modules/bedtools/fasta_from_bed.config](https://gitbio.ens-lyon.fr/LBMC/nextflow/blob/master/src/nf_modules/bedtools/fasta_from_bed.config).
Commit your work and test your pipeline with the following command:
```sh
./nextflow src/RNASeq.nf -c src/RNASeq.config -profile docker --fastq "data/tiny_dataset/fastq/*_R{1,2}.fastq" --fasta "data/tiny_dataset/fasta/tiny_v2.fasta" --bed "data/tiny_dataset/annot/tiny.bed"
```
## Kallisto
Kallisto runs in two steps: the indexing of the reference and the quantification on this index.
You can copy to your `src/RNASeq.nf` file the content of the files [src/nf_modules/kallisto/indexing.nf](https://gitbio.ens-lyon.fr/LBMC/nextflow/blob/master/src/nf_modules/kallisto/indexing.nf) and [src/nf_modules/kallisto/mapping_paired.nf](https://gitbio.ens-lyon.fr/LBMC/nextflow/blob/master/src/nf_modules/kallisto/mapping_paired.nf). You can add to your `src/RNASeq.config` file the content of the files [src/nf_modules/kallisto/indexing.config](https://gitbio.ens-lyon.fr/LBMC/nextflow/blob/master/src/nf_modules/kallisto/indexing.config) and [src/nf_modules/kallisto/mapping_paired.config](https://gitbio.ens-lyon.fr/LBMC/nextflow/blob/master/src/nf_modules/kallisto/mapping_paired.config).
We are going to work with paired-end data, so only copy the relevant processes. The `index_fasta` process needs to take as input the output of your `fasta_from_bed` process. Your `mapping_fastq` process needs to take as input both the output of your `index_fasta` process and the output of the `trimming` process.
Commit your work and test your pipeline.
You now have an RNASeq analysis pipeline that can run locally with Docker!
## Additional nextflow options
With nextflow you can restart the computation of a pipeline and get a trace of the processes with the following options:
```sh
-resume -with-dag results/RNASeq_dag.pdf -with-timeline results/RNASeq_timeline
```
# Run your RNASeq pipeline on the PSMN
First you need to connect to the PSMN:
```sh
ssh login@allo-psmn
```
Then once connected to `allo-psmn`, you can connect to `e5-2667v4comp1`:
```sh
ssh login@e5-2667v4comp1
```
## Set your environment
Create and go to your `scratch` folder:
```sh
mkdir -p /scratch/Bio/<login>
cd /scratch/Bio/<login>
```
Then you need to clone your pipeline and get the data:
```sh
git clone https://gitbio.ens-lyon.fr/<usr_name>/nextflow.git
cd nextflow/data
git clone https://gitbio.ens-lyon.fr/LBMC/hub/tiny_dataset.git
cd ..
```
## Run nextflow
As we don’t want nextflow to be killed in case of disconnection, we start by launching `tmux`. In case of disconnection, you can restore your session with the command `tmux a`, and you can detach from a session with `ctrl + b` then `d`.
```sh
tmux
src/install_nextflow.sh
./nextflow src/RNASeq.nf -c src/RNASeq.config -profile psmn --fastq "data/tiny_dataset/fastq/*_R{1,2}.fastq" --fasta "data/tiny_dataset/fasta/tiny_v2.fasta" --bed "data/tiny_dataset/annot/tiny.bed" -w /scratch/Bio/<login>
```
To use the scratch for nextflow computations, add the `-w` option (already included in the command above):
```sh
-w /scratch/Bio/<login>
```
You just ran your pipeline on the PSMN!
......@@ -2,10 +2,10 @@
The goal of this practical is to walk you through the nextflow pipeline building process. You will learn:
- How to use this [git repository (LBMC/nextflow)](https://gitbio.ens-lyon.fr/LBMC/nextflow) as a template for your project.
- The basis of [Nextflow](https://www.nextflow.io/) the pipeline manager that we use at the lab.
- How to build a simple pipeline for the transcript level quantification of RNASeq data
- How to run the exact same pipeline on a computing center ([PSMN](http://www.ens-lyon.fr/PSMN/doku.php))
1. How to use this [git repository (LBMC/nextflow)](https://gitbio.ens-lyon.fr/LBMC/nextflow) as a template for your project.
2. The basis of [Nextflow](https://www.nextflow.io/) the pipeline manager that we use at the lab.
3. How to build a simple pipeline for the transcript level quantification of RNASeq data
4. How to run the exact same pipeline on a computing center ([PSMN](http://www.ens-lyon.fr/PSMN/doku.php))
This guide assumes that you followed the [Git basis, training course](https://gitbio.ens-lyon.fr/LBMC/hub/formations/git_basis).
......@@ -87,10 +87,18 @@ file "sample.fasta", emit: fasta_sample
At the end of the script, a file named `sample.fasta` is found in the root of the folder where `script:` is executed and will be emitted as `fasta_sample`.
Using the WebIDE of Gitlab, create a file `src/fasta_sampler.nf` with this process and commit it to your repository.
Using the WebIDE of Gitlab, create a file `src/fasta_sampler.nf`
![webide](img/webide.png)
The first line that you need to add is:
```Groovy
nextflow.enable.dsl=2
```
Then add the `sample_fasta` process and commit it to your repository.
## Workflow
In Nextflow, `process` blocks are chained together within a `workflow` block.
......@@ -98,26 +106,15 @@ For the time beeing, we only have one `process` so `workflow` may look like an u
```
workflow {
take:
fasta_files
main:
sample_fasta(fasta_file)
sample_fasta(fasta_file)
}
```
Like `process` blocks, `workflow` can take some inputs:
```
take:
fasta_files
```
Like `process` blocks, `workflow` can take some inputs: `fasta_files`
and transmit this input to `process`es
```
main:
sample_fasta(fasta_file)
sample_fasta(fasta_file)
```
The `main:` block is where we are going to call our `process`(es)
......@@ -219,3 +216,249 @@ Add this to your `src/fasta_sampler.nf` file with the WebIDE and commit it to yo
You can run your pipeline again and check the content of the folder `results/sampling`.
# Build your own RNASeq pipeline
In this section you are going to build your own pipeline for RNASeq analysis from the code available in the `src/nf_modules` folder.
Open the WebIDE and create a `src/RNASeq.nf` file.
The first line that we are going to add is:
```Groovy
nextflow.enable.dsl=2
```
## fastp
The first step of the pipeline is to remove any Illumina adaptors left in your read files and to trim your reads by quality.
The [LBMC/nextflow](https://gitbio.ens-lyon.fr/LBMC/nextflow) template provides you with many tools for which you can find predefined `process` blocks.
You can find a list of these tools in the [`src/nf_modules`](./src/nf_modules) folder.
You can also ask for a new tool by creating a [new issue for it](https://gitbio.ens-lyon.fr/LBMC/nextflow/-/issues/new) in the [LBMC/nextflow](https://gitbio.ens-lyon.fr/LBMC/nextflow) project.
We are going to include the [`src/nf_modules/fastp/main.nf`](./src/nf_modules/fastp/main.nf) in our `src/RNASeq.nf` file:
```Groovy
include { fastp } from "./nf_modules/fastp/main.nf"
```
With this line we can call the `fastp` block in our future `workflow` without having to write it!
If we check the content of the file [`src/nf_modules/fastp/main.nf`](./src/nf_modules/fastp/main.nf), we can see that by including `fastp`, we are including a sub-`workflow` (we will come back to this object later).
This sub-`workflow` takes a `fastq` `channel`. We need to make one.
The path `./nf_modules/fastp/main.nf` is relative to the `src/RNASeq.nf` file; this is why we don't include the `src/` part of the path.
```Groovy
channel
.fromFilePairs( "data/tiny_dataset/fastq/*_R{1,2}.fastq", size: -1)
.set { fastq_files }
```
The `.fromFilePairs()` factory can create a `channel` of pairs of fastq files. Therefore, the items emitted by the `fastq_files` channel are going to be pairs of fastq files for paired-end data.
The option `size: -1` allows arbitrary number of associated files. Therefore, we can use the same `channel` creation for single-end data.
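The grouping performed by `.fromFilePairs()` can be pictured with a small shell sketch. This is only an illustration of the idea (files sharing the part of the name matched by `*` end up in the same item), not how Nextflow implements it; the file names are made up.

```sh
fastq_dir=$(mktemp -d)   # hypothetical folder of fastq files
# Two paired-end samples (s1, s2) and one single-end sample (s3)
touch "$fastq_dir/s1_R1.fastq" "$fastq_dir/s1_R2.fastq" \
      "$fastq_dir/s2_R1.fastq" "$fastq_dir/s2_R2.fastq" \
      "$fastq_dir/s3_R1.fastq"

# List the distinct prefixes (the part of the name before _R1 / _R2)
prefixes=$(ls "$fastq_dir" | sed 's/_R[12]\.fastq$//' | sort -u)

# Count the files attached to each prefix: one group per prefix
n_groups=0
for p in $prefixes; do
  n_files=$(ls "$fastq_dir" | grep -c "^${p}_R[12]\.fastq$")
  echo "${p}: ${n_files} file(s)"
  n_groups=$((n_groups + 1))
done
rm -rf "$fastq_dir"
```

By default `fromFilePairs` expects exactly two files per item; `size: -1` lifts this constraint, which is what lets a group with a single file, like `s3` here, go through the same `channel`.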
We can now add the `workflow` definition to our `src/RNASeq.nf` file, passing the `fastq_files` `channel` to `fastp`:
```Groovy
workflow {
fastp(fastq_files)
}
```
You can commit your `src/RNASeq.nf` file, `pull` your modifications locally and run your pipeline with the command:
```sh
./nextflow src/RNASeq.nf
```
What is happening?
## Nextflow `-profile`
Nextflow tells you the following error: `fastp: command not found`. You don't have `fastp` installed on your computer.
Tool installation can be a tedious process, and reinstalling old versions of those tools to reproduce old analyses can be very difficult.
Container technologies like [Docker](https://www.docker.com/) or [Singularity](https://sylabs.io/singularity/) allow us to create small virtual environments where we can install a given version of a software with all its dependencies. This environment can be saved and shared, to give access to this exact working version of the software.
> Why two different systems?
> Docker is easy to use and can be installed on Windows / MacOS / GNU/Linux, but needs admin rights.
> Singularity can only be used on GNU/Linux, but doesn't need admin rights and can be used in shared environments.
The [LBMC/nextflow](https://gitbio.ens-lyon.fr/LBMC/nextflow) template provides you with [4 different `-profile`s to run your pipeline](https://gitbio.ens-lyon.fr/LBMC/nextflow/-/blob/master/doc/getting_started.md#nextflow-profile).
Profiles are defined in the [`src/nextflow.config`](./src/nextflow.config), which is the default configuration file for your pipeline (you don't have to edit this file).
To run the pipeline locally you can use the profile `singularity` or `docker`:
```sh
./nextflow src/RNASeq.nf -profile singularity
```
The `fastp` `singularity` or `docker` image is downloaded automatically and the fastq files are processed.
## Pipeline `--` arguments
We have defined the fastq file path within our `src/RNASeq.nf` file.
But what if we want to share our pipeline with someone who doesn't want to analyse the `tiny_dataset`, but other fastq files?
We can define a variable instead of fixing the path.
```Groovy
params.fastq = "data/fastq/*_{1,2}.fastq"
channel
.fromFilePairs( params.fastq, size: -1)
.set { fastq_files }
```
We declare a variable that contains the path of the fastq file to look for. The advantage of using `params.fastq` is that the option `--fastq` is now a parameter of your pipeline.
Thus, you can call your pipeline with the `--fastq` option:
```sh
./nextflow src/RNASeq.nf -profile singularity --fastq "data/tiny_dataset/fastq/*_R{1,2}.fastq"
```
We can also add the following line:
```Groovy
log.info "fastq files: ${params.fastq}"
```
This line simply displays the value of the variable.
## BEDtools
We need the sequences of the transcripts that need to be quantified. We are going to extract these sequences from the reference `data/tiny_dataset/fasta/tiny_v2.fasta` with the `bed` annotation `data/tiny_dataset/annot/tiny.bed`.
You can include the `fasta_from_bed` process from the [src/nf_modules/bedtools/main.nf](https://gitbio.ens-lyon.fr/LBMC/nextflow/blob/master/src/nf_modules/bedtools/main.nf) file in your `src/RNASeq.nf` file.
You need to be able to input a `fasta_files` `channel` and a `bed_files` `channel`.
```Groovy
log.info "fasta file : ${params.fasta}"
log.info "bed file : ${params.bed}"
channel
.fromPath( params.fasta )
.ifEmpty { error "Cannot find any fasta files matching: ${params.fasta}" }
.map { it -> [it.simpleName, it]}
.set { fasta_files }
channel
.fromPath( params.bed )
.ifEmpty { error "Cannot find any bed files matching: ${params.bed}" }
.map { it -> [it.simpleName, it]}
.set { bed_files }
```
We introduce 2 new channel operators:
- `.ifEmpty { error "Cannot find any fasta files matching: ${params.fasta}" }` to throw an error if the path of the file is not right
- `.map { it -> [it.simpleName, it]}` to transform our `channel` to a format compatible with the [`CONTRIBUTING`](../CONTRIBUTING.md) rules
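As an illustration of what `it.simpleName` returns, the shell sketch below mimics it with parameter expansion: `simpleName` is the file name without any extension, whereas `baseName` only drops the last one. The file name `tiny_v2.fasta.gz` is a made-up example chosen to show the difference.

```sh
path="data/tiny_dataset/fasta/tiny_v2.fasta.gz"   # hypothetical file
name=$(basename "$path")

simple_name=${name%%.*}   # drop everything after the first dot, like simpleName
base_name=${name%.*}      # drop only the last extension, like baseName

echo "simpleName: ${simple_name}"   # tiny_v2
echo "baseName:   ${base_name}"     # tiny_v2.fasta
```

For the file `tiny_v2.fasta`, each item of the channel is therefore a pair made of the identifier `tiny_v2` and the file itself.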
We can add the `fasta_from_bed` step to our `workflow`:
```Groovy
workflow {
fastp(fastq_files)
fasta_from_bed(fasta_files, bed_files)
}
```
Commit your work and test your pipeline with the following command:
```sh
./nextflow src/RNASeq.nf -profile singularity --fastq "data/tiny_dataset/fastq/*_R{1,2}.fastq" --fasta "data/tiny_dataset/fasta/tiny_v2.fasta" --bed "data/tiny_dataset/annot/tiny.bed"
```
## Kallisto
Kallisto runs in two steps: the indexing of the reference and the quantification on this index.
You can include two `process`es with the following syntax:
```Groovy
include { index_fasta; mapping_fastq } from './nf_modules/kallisto/main.nf'
```
The `index_fasta` process needs to take as input the output of your `fasta_from_bed` `process`.
The `mapping_fastq` `process` needs to take as input both the output of your `index_fasta` `process` and the output of the `fastp` `process`.
The output of a `process` is accessible through `<process_name>.out`.
When a `process` output is declared with `emit: <channel_name>`, we can access the corresponding channel with `<process_name>.out.<channel_name>`:
```Groovy
workflow {
fastp(fastq_files)
fasta_from_bed(fasta_files, bed_files)
index_fasta(fasta_from_bed.out.fasta)
mapping_fastq(index_fasta.out.index.collect(), fastp.out.fastq)
}
```
Commit your work and test your pipeline.
## Returning results
By default, none of the `process`es defined in `src/nf_modules` use the `publishDir` instruction.
You can specify their `publishDir` directory by setting the corresponding parameter:
```Groovy
params.<process_name>_out = "path"
```
Where "path" describes a path within the `results` folder.