Compare revisions

392 files changed, +12314 −2408

+6 −0
# SPDX-FileCopyrightText: 2022 Laurent Modolo <laurent.modolo@ens-lyon.fr>
#
# SPDX-License-Identifier: AGPL-3.0-or-later

nextflow
.nextflow.log*
.nextflow/
work/
results
workspace.code-workspace
+7 −3
[submodule "src/sge_modules"]
	path = src/sge_modules
	url = gitlab_lbmc:PSMN/modules.git
# SPDX-FileCopyrightText: 2022 Laurent Modolo <laurent.modolo@ens-lyon.fr>
#
# SPDX-License-Identifier: AGPL-3.0-or-later

[submodule "src/.docker_modules/hicstuff/3.1.3/hicstuff"]
	path = src/.docker_modules/hicstuff/3.1.3/hicstuff
	url = git@github.com:koszullab/hicstuff.git

.reuse/dep5

0 → 100644
+10 −0
Format: https://www.debian.org/doc/packaging-manuals/copyright-format/1.0/
Upstream-Name: nextflow
Upstream-Contact: Laurent Modolo <laurent.modolo@ens-lyon.fr>
Source: https://gitbio.ens-lyon.fr/LBMC/nextflow

# Sample paragraph, commented out:
#
# Files: src/*
# Copyright: $YEAR $NAME <$CONTACT>
# License: ...

CHANGELOG

deleted 100644 → 0
+0 −0

CHANGELOG.md

0 → 100644
+123 −0
<!--
SPDX-FileCopyrightText: 2022 Laurent Modolo <laurent.modolo@ens-lyon.fr>

SPDX-License-Identifier: CC-BY-SA-4.0
-->

# Changelog
All notable changes to this project will be documented in this file.

The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

## [0.4.0] - 2019-11-18
### Added
- Add new tools (star,...)
- conda support at the PSMN

### Changed
- configuration simplification
- docker and singularity image download instead of local build
- hidden directories in `src` for project clarity (only `nf_modules` is visible)

### Removed
- conda support at in2p3 with `-profile in2p3_conda`

## [0.3.0] - 2019-05-23
### Added
- Add new tools (umi_tools, fastp,...)
- singularity support at in2p3 with `-profile in2p3`
- conda support at in2p3 with `-profile in2p3_conda`


## [0.2.9] - 2019-03-26
### Added
- Add new tools (fastq, macs2, umitools, ...)
- singularity support

### Changed
- every tool name is now in lowercase in each module section

## [0.2.7] - 2018-10-23
### Added
- Add new tools (BWA, GATK, sambamba, ...)

### Changed
- `sge` profile is now called `psmn` profile to prepare tests in the CCIN2P3
- every `psmn` config file has an updated configuration for the mono-CPU or 16-CPU queues
- update process naming to follow new nextflow format

## [0.2.6] - 2018-08-23
### Added
- Added `src/training_dataset.nf` to build a small training dataset from NGS data

### Changed
- the structure of `src/nf_modules`: the `tests` folder was removed

## [0.2.5] - 2018-08-22
### Added
- This fine changelog

### Changed
- the structure of `src/nf_modules`: the `tests` folder was removed


## [0.2.4] - 2018-08-02
### Changed
- add `paired_id` variable in the output of every single-end data process to match the paired-end output


## [0.2.3] - 2018-07-25
### Added
- List of tools available as nextflow, docker or sge module to the `README.md`


## [0.2.2] - 2018-07-23
### Added
- SRA module from cigogne/nextflow-master 52b510e48daa1fb7


## [0.2.1] - 2018-07-23
### Added
- List of tools available as nextflow, docker or sge module


## [0.2.0] - 2018-06-18
### Added
- `doc/TP_computational_biologists.md`
- Kallisto/0.44.0

### Changed
- add `paired_id` variable in the output of every paired-end data process
- BEDtools: fixes for fasta handling
- UrQt: fix git version in Docker


## [0.1.2] - 2018-06-18
### Added
- `doc/tp_experimental_biologist.md` and a Makefile to build the PDF
- test files for BEDtools

### Changed
- Kallisto: various fixes
- UrQt: improve output and various fixes

### Removed
- `src/nf_test.config` (each module now has its own `.config`)


## [0.1.0] - 2018-05-06
This is the first working version of the repository as a nextflow module repository
+269 −68
<!--
SPDX-FileCopyrightText: 2022 Laurent Modolo <laurent.modolo@ens-lyon.fr>

SPDX-License-Identifier: CC-BY-SA-4.0
-->

# Contributing

When contributing to this repository, please first discuss the change you wish to make via issues,
email, or on the [ENS-Bioinfo channel](https://matrix.to/#/#ens-bioinfo:matrix.org) before making a change. 

## Forking

On hosting platforms such as GitLab or GitHub, the [action of forking](https://git-scm.com/book/en/v2/GitHub-Contributing-to-a-Project) means making your own copy of a repository. You can then commit modifications to your copy and, if they are of interest to the source repository (here [LBMC/nextflow](https://gitbio.ens-lyon.fr/LBMC/nextflow)), open a merge request. Merge requests are sent to the source repository to ask its maintainers to integrate your modifications.

![merge request button](./doc/img/merge_request.png)

## Project organization

The `LBMC/nextflow` project is structured as follows:
- all the code is in the `src/` folder
- scripts downloading external tools should download them into the `bin/` folder
- all the documentation (including this file) can be found in the `doc/` folder
- the `data` and `results` folders contain the data and results of your pipelines and are ignored by `git`

## Code structure

The `src/` folder is where we want to save the pipeline (`.nf`) scripts. This folder also contains:
- the `src/install_nextflow.sh` script to install the nextflow executable at the root of the project
- some pipeline examples (like the one built during the nf_pratical)
- the `src/nextflow.config` global configuration file, which contains the `docker`, `singularity`, `psmn` and `ccin2p3` profiles
- the `src/nf_modules` folder, which contains per-tool `main.nf` modules with predefined processes that users can import into their projects with [DSL2](https://www.nextflow.io/docs/latest/dsl2.html)

It also contains some hidden folders that users don't need to see when building their pipelines:
- `src/.docker_modules` contains the recipes for the `docker` containers used in the `src/nf_modules/<tool_names>/main.nf` files
- `src/.singularity_in2p3` and `src/.singularity_psmn` are symbolic links to the shared folders where the singularity images are downloaded on the CCIN2P3 and the PSMN, respectively

# Proposing a new tool

Each tool named `<tool_name>` must have two dedicated folders:

- [`src/nf_modules/<tool_name>`](./src/nf_modules/fastp/) where users can find `.nf` files to include
- [`src/.docker_modules/<tool_name>/<version_number>`](./src/.docker_modules/fastp/0.20.1/) where we have the [`Dockerfile`](./src/.docker_modules/fastp/0.20.1/Dockerfile) to construct the container used in the `main.nf` file

## `src/nf_modules` guidelines

We are going to use the [`fastp` module](./src/nf_modules/fastp/) as an example.

The [`src/nf_modules/<tool_name>`](./src/nf_modules/fastp/) folder should contain a [`main.nf`](./src/nf_modules/fastp/main.nf) file that describes at least one process using `<tool_name>`.

### container information

The first two lines of [`main.nf`](./src/nf_modules/fastp/main.nf) should define two variables:
```Groovy
version = "0.20.1"
container_url = "lbmc/fastp:${version}"
```

We can then use the `container_url` variable in the `container` directive of each `process`.
In addition to the `container` directive, each `process` should have one of the following `label` directives (defined in the `src/nextflow.config` file):
- `big_mem_mono_cpus`
- `big_mem_multi_cpus`
- `small_mem_mono_cpus`
- `small_mem_multi_cpus`

```Groovy
process fastp {
  container "${container_url}"
  label "big_mem_multi_cpus"
  ...
}
```

### process options

Before each process, you should declare at least two `params.` variables:
- a `params.<process_name>` defaulting to `""` (empty string), to let users pass additional command-line options to your process without rewriting the process definition
- a `params.<process_name>_out` defaulting to `""` (empty string), which defines the `results/` subfolder where the process output should be copied if the user wants to save it

```Groovy
params.fastp = ""
params.fastp_out = ""
process fastp {
  container "${container_url}"
  label "big_mem_multi_cpus"
  if (params.fastp_out != "") {
    publishDir "results/${params.fastp_out}", mode: 'copy'
  }
  ...
  script:
"""
fastp --thread ${task.cpus} \
${params.fastp} \
...
"""
}
```

The user can then change the value of these variables:
- from the command line: `--fastp "--trim_front1=10"`
- with the `include` command within their pipeline: `include { fastp } from "./nf_modules/fastp/main" addParams(fastp_out: "QC/fastp/")`
- by defining the variable within their pipeline: `params.fastp_out = "QC/fastp/"`

### `input` and `output` format

You should always use `tuple`s for input and output channels, containing at least:
- a `val` containing variable(s) related to the item
- a `path` for the file(s) that you want to process

For example:

```Groovy
process fastp {
  container "${container_url}"
  label "big_mem_multi_cpus"
  tag "$file_id"
  if (params.fastp_out != "") {
    publishDir "results/${params.fastp_out}", mode: 'copy'
  }

  input:
  tuple val(file_id), path(reads)

  output:
  tuple val(file_id), path("*.fastq.gz"), emit: fastq
  tuple val(file_id), path("*.html"), emit: html
  tuple val(file_id), path("*.json"), emit: report
...
```

Here `file_id` can be anything from a simple identifier to a list of several variables, in which case the first item of the list should be usable as a file prefix.
Keep that in mind if you want to use `file_id` to define output file names (you can test for this case with `file_id instanceof List`).
In some cases, `file_id` may be a Map, to give cleaner access to its content through explicit keys.

If you want to use information within `file_id` to name outputs in your `script` section, you can use the following snippet:

```Groovy
  script:
    switch(file_id) {
    case {it instanceof List}:
      file_prefix = file_id[0]
    break
    case {it instanceof Map}:
      file_prefix = file_id.values()[0]
    break
    default:
      file_prefix = file_id
    break
  }
```

and use the `file_prefix` variable.
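To check this logic outside of a process, the same `switch` can be wrapped in a plain Groovy function. This is only a sketch for experimenting: the `filePrefixOf` name is hypothetical, and inside a `process` the inline snippet is what you would actually use.

```Groovy
// Hypothetical helper mirroring the switch used in the script section:
// returns the part of file_id that can be used as a file name prefix
def filePrefixOf(file_id) {
  def file_prefix
  switch (file_id) {
    case { it instanceof List }:
      // a List: its first item is the prefix
      file_prefix = file_id[0]
      break
    case { it instanceof Map }:
      // a Map: its first value (insertion order) is the prefix
      file_prefix = file_id.values()[0]
      break
    default:
      // anything else (e.g. a String) is used as-is
      file_prefix = file_id
      break
  }
  return file_prefix
}

println filePrefixOf("sample_A")                        // sample_A
println filePrefixOf(["sample_A", "rep1"])              // sample_A
println filePrefixOf([id: "sample_A", condition: "wt"]) // sample_A
```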

This also means that channels emitting `path` items should be transformed with at least the following `map` function:

```Groovy
.map { it -> [it.simpleName, it]}
```

For example:

```Groovy
channel
  .fromPath( params.fasta )
  .ifEmpty { error "Cannot find any fasta files matching: ${params.fasta}" }
  .map { it -> [it.simpleName, it]}
  .set { fasta_files }
```
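Nextflow's `simpleName` is the file name without any of its extensions, so the `map` above turns each path into a `[file_id, path]` pair. The plain-Groovy sketch below approximates that transformation with `java.nio.file` so it can run without Nextflow; `simpleNameOf` and `toFileTuple` are hypothetical helpers, not Nextflow API.

```Groovy
import java.nio.file.Path
import java.nio.file.Paths

// Approximation of Nextflow's `simpleName`: the file name with
// everything from the first dot onward removed
String simpleNameOf(Path file) {
  String name = file.fileName.toString()
  int dot = name.indexOf('.')
  return dot == -1 ? name : name.substring(0, dot)
}

// Same shape as `.map { it -> [it.simpleName, it] }`: a [file_id, path] pair
List toFileTuple(Path file) {
  return [simpleNameOf(file), file]
}

def tuple = toFileTuple(Paths.get("data", "genome_v2.fasta.gz"))
println tuple[0]            // genome_v2
println tuple[1].toString() // data/genome_v2.fasta.gz on Unix-like systems
```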


The rationale behind taking a `file_id` and emitting the same `file_id` is to facilitate complex channel operations in pipelines without having to rewrite the `process` blocks.
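For instance, Nextflow's `join` operator matches items from two channels on their first element by default, which is exactly where the `file_id` sits. The sketch below mimics that matching on plain Groovy lists; the `joinOnId` helper and the sample items are hypothetical and only stand in for two process outputs.

```Groovy
// Hypothetical outputs of two processes, both keeping file_id as first element
def fastqItems = [["sampleA", "sampleA.fastq.gz"], ["sampleB", "sampleB.fastq.gz"]]
def reportItems = [["sampleB", "sampleB.json"], ["sampleA", "sampleA.json"]]

// Plain-Groovy stand-in for joining two channels on their first element
List joinOnId(List left, List right) {
  // index the right-hand items by their file_id
  Map rightById = right.collectEntries { [(it[0]): it[1]] }
  // keep left items whose file_id also appears on the right, then merge them
  def matched = left.findAll { rightById.containsKey(it[0]) }
  return matched.collect { [it[0], it[1], rightById[it[0]]] }
}

joinOnId(fastqItems, reportItems).each { println it }
// [sampleA, sampleA.fastq.gz, sampleA.json]
// [sampleB, sampleB.fastq.gz, sampleB.json]
```

Because both processes keep the same `file_id`, the pairing does not depend on the order in which items arrive.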

### dealing with paired-end and single-end data

When opening fastq files with `channel.fromFilePairs( params.fastq )`, items in the channel have the following shape:

```Groovy
[file_id, [read_1_file, read_2_file]]
```
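`fromFilePairs` builds these items by grouping the files matched by its glob pattern on their shared name prefix. A rough plain-Groovy sketch of that grouping, assuming the common `<id>_R1.fastq.gz` / `<id>_R2.fastq.gz` naming (the `groupPairs` helper and its regular expression are hypothetical; Nextflow's own implementation handles arbitrary glob patterns):

```Groovy
// Hypothetical sketch: group read files named <id>_R1.fastq.gz / <id>_R2.fastq.gz
// into [file_id, [read_1_file, read_2_file]] items
List groupPairs(List<String> files) {
  def pattern = /(.+)_R[12]\.fastq\.gz/
  // keep only the files matching <id>_R1 / <id>_R2
  def matching = files.findAll { it ==~ pattern }
  // key each file by the part of its name before _R1/_R2
  def byId = matching.groupBy { (it =~ pattern)[0][1] }
  // sort each group so that R1 comes before R2
  return byId.collect { id, reads -> [id, reads.sort()] }
}

def items = groupPairs([
  "sampleA_R2.fastq.gz", "sampleA_R1.fastq.gz",
  "sampleB_R1.fastq.gz", "sampleB_R2.fastq.gz"
])
items.each { println it }
// [sampleA, [sampleA_R1.fastq.gz, sampleA_R2.fastq.gz]]
// [sampleB, [sampleB_R1.fastq.gz, sampleB_R2.fastq.gz]]
```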