The [`src/nf_modules/<tool_name>`](./src/nf_modules/fastp/) folder should contain a [`main.nf`](./src/nf_modules/fastp/main.nf) file.
### Container information
The first two lines of [`main.nf`](./src/nf_modules/fastp/main.nf) should define two variables:
```Groovy
version = "0.20.1"
container_url = "lbmc/fastp:${version}"
```
...
...
In addition to the `container` directive, each `process` should have one of the following `label`s:
- `small_mem_mono_cpus`
- `small_mem_multi_cpus`
```Groovy
process fastp {
container = "${container_url}"
label "big_mem_multi_cpus"
...
...
Before each process, you should declare at least two `params.` variables:
- A `params.<process_name>` defaulting to `""` (empty string), to allow the user to add command-line options to your process without rewriting the process definition
- A `params.<process_name>_out` defaulting to `""` (empty string), that defines the `results/` subfolder to which the process output should be copied if the user wants to save it
```Groovy
params.fastp = ""
params.fastp_out = ""
process fastp {
...
...
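Assembled, a complete `process` using these two variables might look like the following sketch. The `publishDir` pattern and the `fastp` command line here are illustrative assumptions, not the exact module code:

```Groovy
params.fastp = ""
params.fastp_out = ""
process fastp {
  container = "${container_url}"
  label "big_mem_multi_cpus"
  // copy outputs to results/ only if the user set params.fastp_out
  if (params.fastp_out != "") {
    publishDir "results/${params.fastp_out}", mode: 'copy'
  }

  input:
    tuple val(file_id), path(reads)

  output:
    tuple val(file_id), path("*_trim.fastq.gz"), emit: fastq

  script:
"""
fastp --thread ${task.cpus} ${params.fastp} \
  -i ${reads[0]} -I ${reads[1]} \
  -o ${file_id}_R1_trim.fastq.gz -O ${file_id}_R2_trim.fastq.gz
"""
}
```

Note how `${params.fastp}` is interpolated into the command line, so any extra options the user sets are passed straight to the tool.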
You should always use `tuple` for the input and output channel format, with at least:
- a `path` for the file(s) that you want to process
for example:
```Groovy
process fastp {
container = "${container_url}"
label "big_mem_multi_cpus"
...
...
So you have to keep that in mind if you want to use it to define output file names.
If you want to use information within the `file_id` to name outputs in your `script` section, you can use the following snippet:
```Groovy
script:
if (file_id instanceof List){
file_prefix = file_id[0]
...
...
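A complete version of this snippet might read as follows (assuming the `else` branch simply reuses `file_id` as-is):

```Groovy
script:
if (file_id instanceof List){
  file_prefix = file_id[0]
} else {
  file_prefix = file_id
}
// file_prefix can now be used safely to name outputs,
// e.g. -o ${file_prefix}_trim.fastq.gz
```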
and use the `file_prefix` variable.
This also means that channels emitting `path` items should be transformed with at least the following `map` function:
```Groovy
.map { it -> [it.simpleName, it] }
```
For example:
```Groovy
channel
.fromPath( params.fasta )
.ifEmpty { error "Cannot find any fasta files matching: ${params.fasta}" }
...
...
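Completed, such a channel declaration might look like the following sketch (the final `.set { fasta_files }` name is an illustrative assumption):

```Groovy
channel
  .fromPath( params.fasta )
  .ifEmpty { error "Cannot find any fasta files matching: ${params.fasta}" }
  .map { it -> [it.simpleName, it] }
  .set { fasta_files }
```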
The rationale behind taking a `file_id` and emitting the same `file_id` is to facilitate chaining `process`es within `workflow`s.
When opening fastq files with `channel.fromFilePairs( params.fastq )`, items in the channel have the following shape:
```Groovy
[file_id, [read_1_file, read_2_file]]
```
To make this call more generic, we can use the `size: -1` option to accept an arbitrary number of associated fastq files:
```Groovy
channel.fromFilePairs( params.fastq, size: -1 )
```
...
...
will thus give `[file_id, [read_1_file, read_2_file]]` for paired-end data and `[file_id, [read_file]]` for single-end data.
You can then use tests on `reads.size()` to define a conditional `script` block:
```Groovy
...
script:
if (file_id instanceof List){
...
...
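A full conditional `script` block along these lines might look like this sketch, assuming the input tuple names its path element `reads` and using illustrative `fastp` options:

```Groovy
script:
if (file_id instanceof List){
  file_prefix = file_id[0]
} else {
  file_prefix = file_id
}
if (reads.size() == 2)
"""
fastp --thread ${task.cpus} ${params.fastp} \
  -i ${reads[0]} -I ${reads[1]} \
  -o ${file_prefix}_R1_trim.fastq.gz -O ${file_prefix}_R2_trim.fastq.gz
"""
else
"""
fastp --thread ${task.cpus} ${params.fastp} \
  -i ${reads[0]} \
  -o ${file_prefix}_trim.fastq.gz
"""
```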
With the following example, the user can simply include the `fastp` step without having to choose a specific implementation.
By specifying `params.fastp_protocol`, the `fastp` step will transparently switch between the different `fastp` `process`es.
Here `fastp_default` or `fastp_accel_1splus`; other protocols can be added later, and pipelines will be able to handle them by simply updating from the `upstream` repository, without changing their code.
```Groovy
params.fastp_protocol = ""
workflow fastp {
take:
...
...
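A complete version of such a dispatching `workflow` might look like the following sketch (the channel and `emit` names are assumptions):

```Groovy
params.fastp_protocol = ""
workflow fastp {
  take:
    fastq

  main:
    // dispatch on the protocol chosen by the user at runtime
    switch(params.fastp_protocol) {
      case "accel_1splus":
        fastp_accel_1splus(fastq)
        fastp_accel_1splus.out.fastq.set { res }
        break
      default:
        fastp_default(fastq)
        fastp_default.out.fastq.set { res }
        break
    }

  emit:
    fastq = res
}
```

With this pattern, pipelines always call `fastp(fastq_channel)` and the protocol choice stays a runtime parameter.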
This recipe should have:
The [`docker_init.sh`](./src/.docker_module/fastp/0.20.1/docker_init.sh) is a small shell script with the following content:
To easily do so, go to the [LBMC/nextflow](https://gitbio.ens-lyon.fr/LBMC/nextflow) repository.
In git, the [action of forking](https://git-scm.com/book/en/v2/GitHub-Contributing-to-a-Project) means that you are going to make your own private copy of a repository.
This repository will keep a link with the original [LBMC/nextflow](https://gitbio.ens-lyon.fr/LBMC/nextflow) project from which you will be able to
- [get updates](https://gitbio.ens-lyon.fr/LBMC/nextflow#getting-the-last-updates) from the `LBMC/nextflow` repository
- propose updates (see the [contributing guide](https://gitbio.ens-lyon.fr/LBMC/nextflow/-/blob/master/CONTRIBUTING.md#forking))
...
...
The [CONTRIBUTING.md](https://gitbio.ens-lyon.fr/LBMC/nextflow/blob/master/CONTRIBUTING.md) file describes the rules for contributing to the project.
The [data](https://gitbio.ens-lyon.fr/LBMC/nextflow/tree/master/data) folder will be the place where you store the raw data for your analysis.
The [results](https://gitbio.ens-lyon.fr/LBMC/nextflow/tree/master/results) folder will be the place where you store the results of your analysis.
> **The content of `data` and `results` folders should never be saved on git.**
The [doc](https://gitbio.ens-lyon.fr/LBMC/nextflow/tree/master/doc) folder contains the documentation and this guide.
...
...
Then add the `sample_fastq` process and commit it to your repository.
In Nextflow, `process` blocks are chained together within a `workflow` block.
For the time being, we only have one `process`, so `workflow` may look like an unnecessary complication, but keep in mind that we want to be able to write complex bioinformatics pipelines.
```Groovy
workflow {
sample_fasta(fasta_file)
}
...
...
Like `process` blocks, a `workflow` can take inputs (`fasta_files`)
and transmit them to `process`es:
```Groovy
sample_fasta(fasta_file)
```
The `main:` block is where we are going to call our `process`(es).
Add the definition of the `workflow` to the `src/fasta_sampler.nf` file and commit it to your repository.
## Channels
...
...
Add the definition of the `channel`, above the `workflow` block, to the `src/fasta_sampler.nf` file and commit it to your repository.
After writing this first pipeline, you may want to test it. To do that, first clone your repository.
After following the [Git basis, training course](https://gitbio.ens-lyon.fr/LBMC/hub/formations/git_basis), you should have an up-to-date `ssh` configuration to connect to the `gitbio.ens-lyon.fr` git server.
You can run the following commands to download your project on your computer:
We can run our pipeline with the following command:
## Getting your results
Our pipeline seems to work, but we don’t know where the `sample.fasta` file is. To get results out of a `process`, we need to tell nextflow to write it somewhere (we may not need every intermediate file in our results).
To do that, we need to add the following line before the `input:` section:
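In Nextflow, saving outputs is typically done with the `publishDir` directive; a minimal sketch (the `results/` subfolder name is illustrative):

```Groovy
publishDir "results/sampling/", mode: 'copy'
```

With `mode: 'copy'`, the published files are copied out of the `work/` directory instead of being symlinked.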