Laurent Modolo authored
Building your own pipeline
The goal of this guide is to walk you through the process of building a Nextflow pipeline. You will learn:
- How to use this git repository (LBMC/nextflow) as a template for your project.
- The basics of Nextflow, the pipeline manager that we use at the lab.
- How to build a simple pipeline for the transcript-level quantification of RNASeq data.
- How to run the exact same pipeline on a computing center (PSMN).
This guide assumes that you have followed the Git basics training course.
Initialize your own project
You are going to build a pipeline for yourself or your team, so the first step is to create your own project.
Forking
Instead of reinventing the wheel, you can use LBMC/nextflow as a template. To do so, go to the LBMC/nextflow repository and click on the fork button (you need to be logged in).
In git, forking means making your own personal copy of a repository. This copy keeps a link with the original LBMC/nextflow project, from which you will be able to:
- get updates from the LBMC/nextflow repository
- propose updates (see the contributing guide)
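On the command line, getting updates from the original project usually goes through a second remote. The sketch below is a self-contained illustration and makes several assumptions: `upstream_project` is a local stand-in for the real LBMC/nextflow repository (you would use its URL instead), `my_fork` stands for your fork, the branch is called `main`, and `upstream` is only the conventional name for that remote.

```shell
# Work in a throwaway directory so nothing outside it is touched.
demo_dir="$(mktemp -d)"
cd "$demo_dir"

# Hypothetical stand-in for the original project, with one commit on "main".
git init --quiet upstream_project
git -C upstream_project symbolic-ref HEAD refs/heads/main
git -C upstream_project config user.name "Demo User"
git -C upstream_project config user.email "demo@example.org"
echo "v1" > upstream_project/README.md
git -C upstream_project add README.md
git -C upstream_project commit --quiet -m "first version"

# Your fork starts as a copy of the original project.
git clone --quiet upstream_project my_fork

# Register the original project as a second remote named "upstream".
git -C my_fork remote add upstream "$demo_dir/upstream_project"

# The original project moves on ...
echo "v2" > upstream_project/README.md
git -C upstream_project commit --quiet -am "update the README"

# ... and the fork picks up the change.
git -C my_fork fetch --quiet upstream
git -C my_fork merge --quiet --ff-only upstream/main

cat my_fork/README.md
```

The `--ff-only` option makes the merge refuse to do anything if your fork has diverged from the original project, which is a cautious default while you are still learning.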
Project organization
This project (and yours) follows the guide of good practices for the LBMC.
You are now on the main page of your fork of LBMC/nextflow. You can explore this project; all the code in it is under the CeCILL licence (in the LICENCE file).
The README.md file contains instructions to run your pipeline and test its installation.
The CONTRIBUTING.md file contains guidelines if you want to contribute to LBMC/nextflow.
The data folder will be the place where you store the raw data for your analysis. The results folder will be the place where you store the results of your analysis.
The content of the data and results folders should never be saved on git.
The doc folder contains the documentation and this guide.
And most interestingly for you, the src folder contains code to wrap tools. This folder contains one visible subdirectory, nf_modules, some pipeline examples, and other hidden folders and files.
Nextflow pipeline
A pipeline is a succession of processes. Each process has data input(s) and optional data output(s). Data flows are modeled as channels.
Processes
Here is an example of process:
process sample_fasta {
  input:
  path fasta

  output:
  path "sample.fasta", emit: fasta_sample

  script:
  """
  head ${fasta} > sample.fasta
  """
}
We have the process sample_fasta, which takes a fasta path as input and gives a fasta path as output. The task of the process itself is defined in the script: block, between the two """ delimiters.
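To get a feel for what the script: block does, the same command can be tried by hand in a terminal. The sketch below is only an illustration: the file name `genome.fasta` and its content are made up, and the variable `fasta` is set manually here, whereas in the pipeline Nextflow fills it in for you.

```shell
# Work in a throwaway directory.
cd "$(mktemp -d)"

# Hypothetical input: 6 short sequences, that is 12 lines of FASTA.
for i in 1 2 3 4 5 6; do
  printf '>seq%s\nACGTACGT\n' "$i"
done > genome.fasta

# In the pipeline, Nextflow substitutes ${fasta} before the script runs.
# Here we set the variable by hand.
fasta="genome.fasta"

# Same command as in the script: block (quoted to be safe with spaces).
head "${fasta}" > sample.fasta

# Number of lines, then number of sequences, kept in the sample.
wc -l < sample.fasta
grep -c '>' sample.fasta
```

Without options, `head` keeps the first 10 lines of a file, not the first 10 sequences. With this toy input, the sample therefore holds 10 lines, that is 5 of the 6 sequences.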
  input:
  path fasta
When we zoom in on the input: block, we see that it defines a variable fasta of type path.
This means that the sample_fasta process is going to receive a stream of fasta file(s).
Nextflow is going to place a file, named after the content of the variable fasta, in the root of the folder where the script: block is executed.
  output:
  path "sample.fasta", emit: fasta_sample
At the end of the script, a file named sample.fasta is found in the root of the folder where the script: block is executed and will be emitted as fasta_sample.
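What happens around the script: block can be mimicked by hand. The following is a rough sketch, not what Nextflow literally runs: the directory names and the input file are invented, and it assumes the default behaviour where input files are made available in the task folder as symbolic links.

```shell
# Throwaway project directory with a hypothetical input file.
project_dir="$(mktemp -d)"
mkdir -p "$project_dir/data" "$project_dir/task_dir"
printf '>seq1\nACGT\n>seq2\nTTGA\n' > "$project_dir/data/genome.fasta"

# input: the file is made available at the root of the task folder,
# under its own name, and the variable fasta holds that name.
cd "$project_dir/task_dir"
ln -s "$project_dir/data/genome.fasta" genome.fasta
fasta="genome.fasta"

# script: the command runs inside the task folder.
head "${fasta}" > sample.fasta

# output: the declared file must exist once the script is done.
if [ -f sample.fasta ]; then
  echo "sample.fasta found, it can be emitted as fasta_sample"
else
  echo "sample.fasta is missing" >&2
  exit 1
fi
```

If the script did not create sample.fasta, Nextflow would report the task as failed rather than emit anything, which is why the name in the output: block has to match the name written by the script.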