Commit f945638e authored by your name

src/docker_modules/UrQt/d62c1f8/Dockerfile : solve conflict

parents 30e42975 75375e05
Showing changes with 251 additions and 30 deletions
# nextflow pipeline
This repository is a template and a library repository to help you build nextflow pipeline.
+You can fork this repository to build your own pipeline.
+To get the last commits from this repository into your fork use the following commands:
+```sh
+git remote add upstream https://gitlab.biologie.ens-lyon.fr/pipelines/nextflow.git
+git pull upstream master
+```
## Getting Started
@@ -47,7 +54,7 @@ find src/docker_modules/ -name "docker_init.sh" | awk '{system($0)}'
## Running the tests
-To run tests we first need to get a trainning set
+To run tests we first need to get a training set
```sh ```sh
cd data cd data
git clone -c http.sslVerify=false https://gitlab.biologie.ens-lyon.fr/LBMC/tiny_dataset.git git clone -c http.sslVerify=false https://gitlab.biologie.ens-lyon.fr/LBMC/tiny_dataset.git
......
-all: TP_experimental_biologists.pdf
+all: TP_experimental_biologists.pdf TP_computational_biologists.pdf
TP_experimental_biologists.pdf: TP_experimental_biologists.md
R -e 'require(rmarkdown); rmarkdown::render("TP_experimental_biologists.md")'
+TP_computational_biologists.pdf: TP_computational_biologists.md
+R -e 'require(rmarkdown); rmarkdown::render("TP_computational_biologists.md")'
---
title: "TP for computational biologists"
author: Laurent Modolo [laurent.modolo@ens-lyon.fr](mailto:laurent.modolo@ens-lyon.fr)
date: 20 Jun 2018
output:
  pdf_document:
    toc: true
    toc_depth: 3
    number_sections: true
    highlight: tango
    latex_engine: xelatex
---
The goal of this practical is to learn how to *wrap* tools in [Docker](https://www.docker.com/what-docker) or [Environment Module](http://www.ens-lyon.fr/PSMN/doku.php?id=documentation:tools:modules) to make them available to nextflow on a personal computer or at the [PSMN](http://www.ens-lyon.fr/PSMN/doku.php).
Here we assume that you have followed the [TP for experimental biologists](./TP_experimental_biologists.md), and that you know the basics of [Docker containers](https://www.docker.com/what-container) and [Environment Module](http://www.ens-lyon.fr/PSMN/doku.php?id=documentation:tools:modules). We also assume that you know how to build and use a nextflow pipeline from the template [pipelines/nextflow](https://gitlab.biologie.ens-lyon.fr/pipelines/nextflow).
For the practical, you can work either with the Gitlab WebIDE or locally, as described in the [git basics training](https://gitlab.biologie.ens-lyon.fr/formations/git_basis).
# Docker
To run a tool within a [Docker container](https://www.docker.com/what-container) you need to write a `Dockerfile`.
[`Dockerfile`s](./src/docker_modules/Kallisto/0.44.0/Dockerfile) are found in the [pipelines/nextflow](https://gitlab.biologie.ens-lyon.fr/pipelines/nextflow) project under `src/docker_modules/`. Each [`Dockerfile`](./src/docker_modules/Kallisto/0.44.0/Dockerfile) is paired with a [`docker_init.sh`](./src/docker_modules/Kallisto/0.44.0/docker_init.sh) file, as in the following example for `Kallisto` version `0.43.1`:
```sh
$ ls -l src/docker_modules/Kallisto/0.43.1/
total 16K
drwxr-xr-x 2 laurent users 4.0K Jun 5 19:06 ./
drwxr-xr-x 3 laurent users 4.0K Jun 6 09:49 ../
-rw-r--r-- 1 laurent users 587 Jun 5 19:06 Dockerfile
-rwxr-xr-x 1 laurent users 79 Jun 5 19:06 docker_init.sh*
```
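Since every `Dockerfile` must sit next to an executable `docker_init.sh`, a quick consistency check can be handy before committing. The following is only a minimal sketch, not a script from the repository: the function name `check_docker_modules` is hypothetical, and it assumes that the paths under `src/docker_modules/` contain no spaces.

```sh
# Hypothetical helper: print every module folder whose Dockerfile is not
# paired with an executable docker_init.sh, and return 1 if any is found.
# Assumes paths without spaces (the output of find is split on whitespace).
check_docker_modules() {
  root="${1:-src/docker_modules}"
  status=0
  for dockerfile in $(find "$root" -name Dockerfile); do
    dir=$(dirname "$dockerfile")
    if [ ! -x "$dir/docker_init.sh" ]; then
      echo "missing or not executable: $dir/docker_init.sh"
      status=1
    fi
  done
  return "$status"
}
```

Run from the repository root, `check_docker_modules src/docker_modules` prints nothing and succeeds when every `Dockerfile` has its executable `docker_init.sh`.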
## [`docker_init.sh`](./src/docker_modules/Kallisto/0.44.0/docker_init.sh)
The [`docker_init.sh`](./src/docker_modules/Kallisto/0.44.0/docker_init.sh) is a simple sh script with executable rights (`chmod +x`). By executing this script, the user builds the [Docker container](https://www.docker.com/what-container) image with the tool installed at a specific version. You can use the [`docker_init.sh`](./src/docker_modules/Kallisto/0.44.0/docker_init.sh) file of any already implemented tool as a template.
Remember that the name of the [container](https://www.docker.com/what-container) must be lowercase and in the format `<tool_name>:<version>`.
For tools without a version number you can use a commit hash instead.
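The naming rule above can be checked before calling `docker build -t`. This is a minimal sketch with a hypothetical function name; the accepted character set (lowercase letters, digits, `.`, `_` and `-` on each side of a single `:`) is an assumption, stricter than Docker's own rules.

```sh
# Hypothetical helper: succeed only if $1 looks like <tool_name>:<version>
# written in lower case (a version number or a commit hash after the colon).
# POSIX classes are used so that the check does not depend on the locale.
valid_image_name() {
  printf '%s\n' "$1" |
    grep -Eq '^[[:lower:][:digit:]][[:lower:][:digit:]._-]*:[[:lower:][:digit:]][[:lower:][:digit:]._-]*$'
}
```

For example, `valid_image_name kallisto:0.44.0` succeeds, while `valid_image_name Kallisto:0.44.0` and `valid_image_name kallisto` fail.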
## [`Dockerfile`](./src/docker_modules/Kallisto/0.44.0/Dockerfile)
The recipe to wrap your tool in a [Docker container](https://www.docker.com/what-container) is written in a [`Dockerfile`](./src/docker_modules/Kallisto/0.44.0/Dockerfile).
For `Kallisto` version `0.44.0`, the header of the `Dockerfile` is:
```Docker
FROM ubuntu:18.04
MAINTAINER Laurent Modolo
ENV KALLISTO_VERSION=0.44.0
```
The `FROM` instruction means that the [container](https://www.docker.com/what-container) is initialized from a bare installation of Ubuntu 18.04. You can check the available versions of Ubuntu [here](https://hub.docker.com/_/ubuntu/), or other operating systems like [Debian](https://hub.docker.com/_/debian/) or [worse](https://hub.docker.com/r/microsoft/windowsservercore/).
Then we declare the *maintainer* of the container, before declaring an environment variable named `KALLISTO_VERSION` that contains the version of the wrapped tool. This variable is set in the environment of the [container](https://www.docker.com/what-container), so it is available to the following commands, which run as root.
You should always declare a variable `TOOLSNAME_VERSION` that contains the version number or the commit hash of the tool you wrap. In simple cases, you just have to modify this line to create a new `Dockerfile` for another version of the tool.
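For that simple case, the version bump can be scripted. This is a minimal sketch, assuming the `src/docker_modules/<Tool>/<version>/` layout and that the old version string appears only where it must change (the `ENV` line of the `Dockerfile`, and the path and tag in `docker_init.sh`). The function name is hypothetical, and the new version must be one that actually exists upstream.

```sh
# Hypothetical helper: copy <tool_dir>/<old> to <tool_dir>/<new> and replace
# every literal occurrence of the old version in Dockerfile and docker_init.sh.
# Assumes $new contains no '/' or '&' (special characters for sed).
new_docker_version() {
  tool_dir="$1"   # e.g. src/docker_modules/Kallisto
  old="$2"        # e.g. 0.44.0
  new="$3"
  if [ -e "$tool_dir/$new" ]; then
    echo "$tool_dir/$new already exists" >&2
    return 1
  fi
  # cp keeps the executable bit of docker_init.sh
  cp -R "$tool_dir/$old" "$tool_dir/$new"
  # escape the dots so that sed matches the version literally
  old_re=$(printf '%s' "$old" | sed 's/\./\\./g')
  for f in Dockerfile docker_init.sh; do
    # writing into the existing copy keeps its permissions
    sed "s/$old_re/$new/g" "$tool_dir/$old/$f" > "$tool_dir/$new/$f"
  done
}
```

For instance, `new_docker_version src/docker_modules/Kallisto 0.44.0 <new_version>` would create `src/docker_modules/Kallisto/<new_version>/`. Always review the generated files: any other occurrence of the old version string is rewritten too.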
The following lines of the [`Dockerfile`](./src/docker_modules/Kallisto/0.44.0/Dockerfile) are a succession of shell commands (`RUN` instructions) executed as the **root** user within the container.
Docker executes each `RUN` block sequentially and caches its result. If a `RUN` block fails or is modified, only this block and the following `RUN` blocks are executed again at the next build; the previous ones are reused from the cache.
You can learn more about the building of Docker containers [here](https://docs.docker.com/engine/reference/builder/#usage).
While you write your [`Dockerfile`](./src/docker_modules/Kallisto/0.44.0/Dockerfile), instead of launching the [`docker_init.sh`](./src/docker_modules/Kallisto/0.44.0/docker_init.sh) script many times to test your [container](https://www.docker.com/what-container), you can start a base container in interactive mode and test your commands there:
```sh
docker run -it ubuntu:18.04 bash
# the following commands are typed inside the container
KALLISTO_VERSION=0.44.0
```
# SGE / [PSMN](http://www.ens-lyon.fr/PSMN/doku.php)
To easily run tools on the [PSMN](http://www.ens-lyon.fr/PSMN/doku.php), you need to build your own [Environment Module](http://www.ens-lyon.fr/PSMN/doku.php?id=documentation:tools:modules).
You can read the contributing guide of the [PSMN/modules](https://gitlab.biologie.ens-lyon.fr/PSMN/modules) project [here](https://gitlab.biologie.ens-lyon.fr/PSMN/modules/blob/master/CONTRIBUTING.md).
# Nextflow
The last step to wrap your tool is to make it available in nextflow. For this, you need to create at least four files, as shown below for Kallisto version `0.44.0`:
```sh
ls -lR src/nf_modules/Kallisto
src/nf_modules/Kallisto/:
total 12
-rw-r--r-- 1 laurent users 866 Jun 18 17:13 kallisto.config
-rw-r--r-- 1 laurent users 2711 Jun 18 17:13 kallisto.nf
drwxr-xr-x 2 laurent users 4096 Jun 18 17:14 tests/
src/nf_modules/Kallisto/tests:
total 16
-rw-r--r-- 1 laurent users 551 Jun 18 17:14 index.nf
-rw-r--r-- 1 laurent users 901 Jun 18 17:14 mapping_paired.nf
-rw-r--r-- 1 laurent users 1037 Jun 18 17:14 mapping_single.nf
-rwxr-xr-x 1 laurent users 627 Jun 18 17:14 tests.sh*
```
The [`kallisto.config`](./src/nf_modules/Kallisto/kallisto.config) file contains instructions for two profiles: `sge` and `docker`.
The [`kallisto.nf`](./src/nf_modules/Kallisto/kallisto.nf) file contains the nextflow processes that use `Kallisto`.
The [`tests/tests.sh`](./src/nf_modules/Kallisto/tests/tests.sh) script (with executable rights) contains a series of nextflow calls on the other `.nf` files of the [`tests/`](./src/nf_modules/Kallisto/tests/) folder. These tests run the processes of the [`kallisto.nf`](./src/nf_modules/Kallisto/kallisto.nf) file on the [LBMC/tiny_dataset](https://gitlab.biologie.ens-lyon.fr/LBMC/tiny_dataset) dataset with the `docker` profile. For more details, read the *Running the tests* section of the [README.md](https://gitlab.biologie.ens-lyon.fr/pipelines/nextflow/blob/master/README.md).
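A `tests.sh` of this kind can be sketched as a loop over the `.nf` files of `tests/`. This is an assumption-laden sketch, not the repository's actual script: the function name is hypothetical, it assumes it is launched from the repository root and that every test runs with its default parameters, and the `NEXTFLOW` variable lets you replace the `nextflow` command (for instance by `echo` to print the calls without running them).

```sh
# Hypothetical sketch of tests.sh: run every tests/*.nf file of a module with
# the module's .config file and the docker profile, stopping at the first failure.
run_module_tests() {
  module_dir="$1"           # e.g. src/nf_modules/Kallisto
  config="$module_dir/$2"   # e.g. kallisto.config
  for test_nf in "$module_dir"/tests/*.nf; do
    [ -e "$test_nf" ] || continue   # the glob matched no file
    ${NEXTFLOW:-nextflow} -c "$config" run "$test_nf" -profile docker || return 1
  done
}

run_module_tests src/nf_modules/Kallisto kallisto.config
```

Here `-c` is given before `run`, which is the form used in the nextflow documentation to add a configuration file.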
## [`kallisto.config`](./src/nf_modules/Kallisto/kallisto.config)
The `.config` file defines the configuration applied to your processes depending on the value of the `-profile` option. You must define a configuration for at least the `sge` and `docker` profiles.
```Groovy
profiles {
  docker {
    docker.temp = 'auto'
    docker.enabled = true
    process {
    }
  }
  sge {
    process{
    }
  }
}
```
### `docker` profile
The `docker` profile starts by enabling Docker for the whole pipeline. After that, you only have to define the container name for each process.
For example, for `Kallisto` version `0.44.0`, we have:
```Groovy
process {
  $index_fasta {
    container = "kallisto:0.44.0"
  }
  $mapping_fastq {
    container = "kallisto:0.44.0"
  }
}
```
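Because each process only needs a container name, you can list the images a module expects (and therefore which `docker_init.sh` scripts must have been run) directly from its `.config` file. This is a minimal sketch with a hypothetical function name, assuming one `container = "..."` assignment per line written with double quotes, as in the excerpt above.

```sh
# Hypothetical helper: print the distinct container names assigned with
# container = "..." in a nextflow .config file, one per line, sorted.
config_containers() {
  sed -n 's/^[[:space:]]*container[[:space:]]*=[[:space:]]*"\([^"]*\)".*/\1/p' "$1" | sort -u
}
```

On a `kallisto.config` matching the excerpt above, `config_containers src/nf_modules/Kallisto/kallisto.config` would print `kallisto:0.44.0` once, even though two processes use it.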
### `sge` profile
The `sge` profile defines, for each process, all the information needed to launch it on a given queue with SGE at the [PSMN](http://www.ens-lyon.fr/PSMN/doku.php).
For example, for `Kallisto`, we have:
```Groovy
process{
  $index_fasta {
    beforeScript = "module purge; module load Kallisto/0.44.0"
    executor = "sge"
    cpus = 1
    memory = "5GB"
    time = "6h"
    queueSize = 1000
    pollInterval = '60sec'
    queue = 'h6-E5-2667v4deb128'
    penv = 'openmp8'
  }
  $mapping_fastq {
    beforeScript = "module purge; module load Kallisto/0.44.0"
    executor = "sge"
    cpus = 4
    memory = "5GB"
    time = "6h"
    queueSize = 1000
    pollInterval = '60sec'
    queue = 'h6-E5-2667v4deb128'
    penv = 'openmp8'
  }
}
```
The `beforeScript` directive is executed before the main script of the corresponding process; here, it loads the `Kallisto` environment module.
## [`kallisto.nf`](./src/nf_modules/Kallisto/kallisto.nf)
The [`kallisto.nf`](./src/nf_modules/Kallisto/kallisto.nf) file contains examples of nextflow processes that execute Kallisto.
- Each example must be usable as-is when incorporated in a nextflow pipeline.
- You need to define default values for the parameters passed to the process.
- Input and output must be clearly defined.
- Your process should be usable as a starting process or as a process retrieving the output of another process.
For more information on processes and channels, you can check the [nextflow documentation](https://www.nextflow.io/docs/latest/index.html).
## Making your wrapper available to the LBMC
To make your module available to the LBMC, you must have a `tests.sh` script and one or more `docker_init.sh` scripts working without errors.
All the processes in your `.nf` file must be covered by the tests.
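A rough check of this rule can be scripted. This is a minimal sketch, not a real coverage tool: the function name is hypothetical, and it assumes that a process counts as covered when one of the `tests/*.nf` files declares a process with the same name (that is, that the test scripts re-declare the processes they exercise).

```sh
# Hypothetical helper: print the name of every process declared in the module
# .nf file that is not declared in any tests/*.nf file (name-based heuristic).
uncovered_processes() {
  nf_file="$1"     # e.g. src/nf_modules/Kallisto/kallisto.nf
  tests_dir="$2"   # e.g. src/nf_modules/Kallisto/tests
  sed -n 's/^[[:space:]]*process[[:space:]]\{1,\}\([[:alnum:]_]\{1,\}\).*/\1/p' "$nf_file" |
    while read -r name; do
      if ! grep -Eq "^[[:space:]]*process[[:space:]]+$name([^[:alnum:]_]|\$)" "$tests_dir"/*.nf 2>/dev/null; then
        echo "$name"
      fi
    done
}
```

An empty output of `uncovered_processes src/nf_modules/Kallisto/kallisto.nf src/nf_modules/Kallisto/tests` means every process name was found in the tests.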
After pushing your modifications to your forked repository, you can make a Merge Request to the **dev** branch of [pipelines/nextflow](https://gitlab.biologie.ens-lyon.fr/pipelines/nextflow), where it will be tested and integrated into the **master** branch.
You can read more on this process [here](https://guides.github.com/introduction/flow/).
FROM ubuntu:18.04
MAINTAINER Laurent Modolo
ENV KALLISTO_VERSION=0.44.0
ENV PACKAGES curl=7.58.0* \
ca-certificates=20180409
RUN apt-get update && \
apt-get install -y --no-install-recommends ${PACKAGES} && \
apt-get clean
RUN curl -k -L https://github.com/pachterlab/kallisto/releases/download/v${KALLISTO_VERSION}/kallisto_linux-v${KALLISTO_VERSION}.tar.gz -o kallisto_linux-v${KALLISTO_VERSION}.tar.gz && \
tar xzf kallisto_linux-v${KALLISTO_VERSION}.tar.gz && \
cp kallisto_linux-v${KALLISTO_VERSION}/kallisto /usr/bin && \
rm -Rf kallisto_linux-v${KALLISTO_VERSION}*
#!/bin/sh
docker build src/docker_modules/Kallisto/0.44.0 -t 'kallisto:0.44.0'
FROM ubuntu:18.04
MAINTAINER Laurent Modolo
-ENV PACKAGES git=1:2.17.1* \
+ENV PACKAGES git=1:2.17* \
build-essential=12.4* \
ca-certificates=20180409 \
zlib1g-dev=1:1.2.11*
......
/*
* bedtools :
-* Imputs : fastq files
-* Output : fastq files
+* Imputs : fasta files
+* Imputs : bed files
+* Output : fasta files
*/
/* fasta extraction */
-params.fastq = "$baseDir/data/fasta/*.fasta"
+params.fasta = "$baseDir/data/fasta/*.fasta"
params.bed = "$baseDir/data/annot/*.bed"
log.info "fasta file : ${params.fasta}"
......
@@ -66,7 +66,7 @@ process mapping_fastq {
file index from index_files.toList()
output:
-file "*.bam" into bam_files
+set pair_id, "*.bam" into bam_files
script:
"""
......
@@ -23,7 +23,7 @@ process mapping_fastq {
file index from index_files.toList()
output:
-file "*.bam" into bam_files
+set pair_id, "*.bam" into bam_files
script:
"""
@@ -37,3 +37,4 @@ if grep -q "Error" ${pair_id}_bowtie2_report.txt; then
fi
"""
}
@@ -4,17 +4,17 @@ profiles {
docker.enabled = true
process {
$index_fasta {
-container = "kallisto:0.43.1"
+container = "kallisto:0.44.0"
}
$mapping_fastq {
-container = "kallisto:0.43.1"
+container = "kallisto:0.44.0"
}
}
}
sge {
process{
$index_fasta {
-beforeScript = "module purge; module load Kallisto/0.43.1"
+beforeScript = "module purge; module load Kallisto/0.44.0"
executor = "sge"
cpus = 1
memory = "5GB"
@@ -25,7 +25,7 @@ profiles {
penv = 'openmp8'
}
$mapping_fastq {
-beforeScript = "module purge; module load Kallisto/0.43.1"
+beforeScript = "module purge; module load Kallisto/0.44.0"
executor = "sge"
cpus = 4
memory = "5GB"
......
@@ -58,7 +58,7 @@ process mapping_fastq {
publishDir "results/mapping/quantification/", mode: 'copy'
input:
-file reads from fastq_files
+set pair_id, file(reads) from fastq_files
file index from index_files.toList()
output:
@@ -68,8 +68,8 @@ process mapping_fastq {
"""
mkdir ${reads[0].baseName}
kallisto quant -i ${index} -t ${task.cpus} \
---bias --bootstrap-samples 100 -o ${reads[0].baseName} \
+--bias --bootstrap-samples 100 -o ${pair_id} \
-${reads[0]} ${reads[1]} &> ${reads[0].baseName}_kallisto_report.txt
+${reads[0]} ${reads[1]} &> ${pair_id}_kallisto_report.txt
"""
}
......
@@ -14,7 +14,7 @@ Channel
.set { index_files }
process mapping_fastq {
-tag "$pair_id"
+tag "$reads"
cpus 4
publishDir "results/mapping/quantification/", mode: 'copy'
@@ -27,7 +27,7 @@ process mapping_fastq {
script:
"""
-mkdir ${pair_id}
+mkdir ${reads[0].baseName}
kallisto quant -i ${index} -t ${task.cpus} \
--bias --bootstrap-samples 100 -o ${pair_id} \
${reads[0]} ${reads[1]} &> ${pair_id}_kallisto_report.txt
......
@@ -6,14 +6,15 @@ Channel
.set { fastq_files }
process trimming {
-tag "$pair_id"
+tag "${reads}"
cpus 4
+publishDir "results/fastq/trimming/", mode: 'copy'
input:
set pair_id, file(reads) from fastq_files
output:
-file "*_trim_R{1,2}.fastq.gz" into fastq_files_cut
+set pair_id, "*_trim_R{1,2}.fastq.gz" into fastq_files_trim
script:
"""
@@ -23,3 +24,4 @@ UrQt --t 20 --m ${task.cpus} --gz \
> ${pair_id}_trimming_report.txt
"""
}
@@ -24,17 +24,17 @@ process trimming {
publishDir "results/fastq/trimming/", mode: 'copy'
input:
-file reads from fastq_files
+set pair_id, file(reads) from fastq_files
output:
-file "*_trim_R{1,2}.fastq.gz" into fastq_files_trim
+set pair_id, "*_trim_R{1,2}.fastq.gz" into fastq_files_trim
script:
"""
UrQt --t 20 --m ${task.cpus} --gz \
--in ${reads[0]} --inpair ${reads[1]} \
---out ${reads[0].baseName}_trim_R1.fastq.gz --outpair ${reads[1].baseName}_trim_R2.fastq.gz \
+--out ${pair_id}_trim_R1.fastq.gz --outpair ${pair_id}_trim_R2.fastq.gz \
-> ${reads[0].baseName}_trimming_report.txt
+> ${pair_id}_trimming_report.txt
"""
}
......
@@ -27,7 +27,7 @@ process adaptor_removal {
set pair_id, file(reads) from fastq_files
output:
-file "*_cut_R{1,2}.fastq.gz" into fastq_files_cut
+set pair_id, "*_cut_R{1,2}.fastq.gz" into fastq_files_cut
script:
"""
@@ -91,7 +91,7 @@ process trimming {
set pair_id, file(reads) from fastq_files
output:
-file "*_trim_R{1,2}.fastq.gz" into fastq_files_trim
+set pair_id, "*_trim_R{1,2}.fastq.gz" into fastq_files_trim
script:
"""
......
@@ -7,12 +7,13 @@ Channel
process adaptor_removal {
tag "$pair_id"
+publishDir "results/fastq/adaptor_removal/", mode: 'copy'
input:
set pair_id, file(reads) from fastq_files
output:
-file "*_cut_R{1,2}.fastq.gz" into fastq_files_cut
+set pair_id, "*_cut_R{1,2}.fastq.gz" into fastq_files_cut
script:
"""
@@ -21,4 +22,3 @@ process adaptor_removal {
${reads[0]} ${reads[1]} > ${pair_id}_report.txt
"""
}
@@ -7,12 +7,13 @@ Channel
process trimming {
tag "$pair_id"
+publishDir "results/fastq/trimming/", mode: 'copy'
input:
set pair_id, file(reads) from fastq_files
output:
-file "*_trim_R{1,2}.fastq.gz" into fastq_files_cut
+set pair_id, "*_trim_R{1,2}.fastq.gz" into fastq_files_trim
script:
"""
@@ -21,4 +22,3 @@ process trimming {
${reads[0]} ${reads[1]} > ${pair_id}_report.txt
"""
}
sge_modules @ 03a80f96
-Subproject commit 94be868ea503b4810b110b35520d61f129035967
+Subproject commit 03a80f96cfe966f0ac855f0ac12a0b39b9ca2064