diff --git a/doc/TP_computational_biologists.md b/doc/TP_computational_biologists.md index 881c25c53532a75a213ff143dce47d638662f9c9..09b9574e5bb2171799e987225389ab4cb39f891b 100644 --- a/doc/TP_computational_biologists.md +++ b/doc/TP_computational_biologists.md @@ -13,15 +13,15 @@ highlight: tango The goal of this practical is to learn how to *wrap* tools in [Docker](https://www.docker.com/what-docker) or [Environment Module](http://www.ens-lyon.fr/PSMN/doku.php?id=documentation:tools:modules) to make them available to nextflow on a personal computer or at the [PSMN](http://www.ens-lyon.fr/PSMN/doku.php). -Here we assume that you followed the [TP for experimental biologists](./TP_experimental_biologists.md), and that you know the basics on [Docker containers](https://www.docker.com/what-container) and [Environment Module](http://www.ens-lyon.fr/PSMN/doku.php?id=documentation:tools:modules) usage. We are also going to assume that you know how to build and use a nextflow pipeline from the template [pipelines/nextflow](https://gitlab.biologie.ens-lyon.fr/pipelines/nextflow). +Here we assume that you followed the [TP for experimental biologists](./TP_experimental_biologists.md), and that you know the basics of [Docker containers](https://www.docker.com/what-container) and [Environment Module](http://www.ens-lyon.fr/PSMN/doku.php?id=documentation:tools:modules). We are also going to assume that you know how to build and use a nextflow pipeline from the template [pipelines/nextflow](https://gitlab.biologie.ens-lyon.fr/pipelines/nextflow). -For the practical you can either work with the WebIDE of Gitlab, or locally as described in [git : the basis formation](https://gitlab.biologie.ens-lyon.fr/formations/git_basis). +For the practical you can either work with the WebIDE of Gitlab, or locally as described in the [git: basis formation](https://gitlab.biologie.ens-lyon.fr/formations/git_basis). 
# Docker To run a tool within a [Docker container](https://www.docker.com/what-container) you need to write a `Dockerfile`. -[`Dockerfile`](./src/docker_modules/Kallisto/0.44.0/Dockerfile) are found in the [pipelines/nextflow](https://gitlab.biologie.ens-lyon.fr/pipelines/nextflow) project under `src/docker_modules/`. Each [`Dockerfile`](./src/docker_modules/Kallisto/0.44.0/Dockerfile) are paired with a [`docker_init.sh`](./src/docker_modules/Kallisto/0.44.0/docker_init.sh) file like following the example for `Kallisto` version `0.43.1`: +[`Dockerfile`](./src/docker_modules/Kallisto/0.44.0/Dockerfile) files are found in the [pipelines/nextflow](https://gitlab.biologie.ens-lyon.fr/pipelines/nextflow) project under `src/docker_modules/`. Each [`Dockerfile`](./src/docker_modules/Kallisto/0.44.0/Dockerfile) is paired with a [`docker_init.sh`](./src/docker_modules/Kallisto/0.44.0/docker_init.sh) file, as in the following example for `Kallisto` version `0.43.1`: ```sh $ ls -l src/docker_modules/Kallisto/0.43.1/ @@ -33,7 +33,10 @@ drwxr-xr-x 3 laurent users 4.0K Jun 6 09:49 ../ ``` ## [`docker_init.sh`](./src/docker_modules/Kallisto/0.44.0/docker_init.sh) -The [`docker_init.sh`](./src/docker_modules/Kallisto/0.44.0/docker_init.sh) is a simple sh script with the executable right (`chmod +x`).By executing this script, the user creates the [Docker container](https://www.docker.com/what-container) for the tools in a specific version. You can check the [`docker_init.sh`](./src/docker_modules/Kallisto/0.44.0/docker_init.sh) file of any implemented tools as a template. Remember that the name of the [container](https://www.docker.com/what-container) must be in lower case. +The [`docker_init.sh`](./src/docker_modules/Kallisto/0.44.0/docker_init.sh) is a simple sh script with executable rights (`chmod +x`). By executing this script, the user creates a [Docker container](https://www.docker.com/what-container) with the tool installed at a specific version. 
You can check the [`docker_init.sh`](./src/docker_modules/Kallisto/0.44.0/docker_init.sh) file of any implemented tool as a template. + +Remember that the name of the [container](https://www.docker.com/what-container) must be in lower case and in the format `<tool_name>:<version>`. +For tools without a version number you can use a commit hash instead. ## [`Dockerfile`](./src/docker_modules/Kallisto/0.44.0/Dockerfile) @@ -48,29 +51,29 @@ MAINTAINER Laurent Modolo ENV KALLISTO_VERSION=0.44.0 ``` -This means that we initialize the [container](https://www.docker.com/what-container) from a bare installation of Ubuntu 18.04. You can check the ubuntu available versions [here](https://hub.docker.com/_/ubuntu/) or others operating systems like [debian](https://hub.docker.com/_/debian/) or [worst](https://hub.docker.com/r/microsoft/windowsservercore/). +The `FROM` instruction means that the [container](https://www.docker.com/what-container) is initialized from a bare installation of Ubuntu 18.04. You can check the versions of Ubuntu available [here](https://hub.docker.com/_/ubuntu/) or other operating systems like [debian](https://hub.docker.com/_/debian/) or [worst](https://hub.docker.com/r/microsoft/windowsservercore/). + +Then we declare the *maintainer* of the container, before declaring an environment variable for the container named `KALLISTO_VERSION`, which contains the version of the wrapped tool. This bash variable will be declared for the root user within the [container](https://www.docker.com/what-container). + +You should always declare a variable `TOOLSNAME_VERSION` that contains the version number or commit hash of the tool you wrap. In simple cases you just have to modify this line to create a new `Dockerfile` for another version of the tool. -Then we declare the *maintainer* of the container. Before declaring a environment variable for the container named `KALLISTO_VERSION` which contains the version of the tools wrapped. 
This means that this bash variable will be declared within the [container](https://www.docker.com/what-container). +The following lines of the [`Dockerfile`](./src/docker_modules/Kallisto/0.44.0/Dockerfile) are a succession of `bash` commands executed as the **root** user within the container. +Each `RUN` block is run sequentially by `Docker`. If a `RUN` block fails or is modified, only this block and the following `RUN` blocks are executed again; the earlier blocks are reused from the build cache. -You should always declare a variable `TOOLSNAME_VERSION` that contains the version number of commit number of the tools you wrap. Therefore in simple case you just have to modify this line to create a new `Dockerfile` for another version of the tool. +You can learn more about the building of Docker containers [here](https://docs.docker.com/engine/reference/builder/#usage). -The following of the [`Dockerfile`](./src/docker_modules/Kallisto/0.44.0/Dockerfile) is a succession of `bash` commands executed as the **root** user within the container. -When you build your [`Dockerfile`](./src/docker_modules/Kallisto/0.44.0/Dockerfile), instead of launching many times the [`docker_init.sh`](./src/docker_modules/Kallisto/0.44.0/docker_init.sh) script you can connect to a base container in interactive mode to launch tests your commands. +When you build your [`Dockerfile`](./src/docker_modules/Kallisto/0.44.0/Dockerfile), instead of launching the [`docker_init.sh`](./src/docker_modules/Kallisto/0.44.0/docker_init.sh) script many times to test your [container](https://www.docker.com/what-container), you can connect to a base container in interactive mode to test your commands. ```sh docker run -it ubuntu:18.04 bash KALLISTO_VERSION=0.44.0 ``` -Each `RUN` block is run sequentially by `Docker`. If there is an error or modifications in a `RUN` block, only this block and the following `RUN` will be executed. 
+# SGE / [PSMN](http://www.ens-lyon.fr/PSMN/doku.php) -You can learn more about the building of Docker containers [here](https://docs.docker.com/engine/reference/builder/#usage). - -# SGE - -To run easily tools on the PSMN, you need to build your own [Environment Module](http://www.ens-lyon.fr/PSMN/doku.php?id=documentation:tools:modules). +To easily run tools on the [PSMN](http://www.ens-lyon.fr/PSMN/doku.php), you need to build your own [Environment Module](http://www.ens-lyon.fr/PSMN/doku.php?id=documentation:tools:modules). -You can read the Contributing guide of the [PMSN/modules](https://gitlab.biologie.ens-lyon.fr/PSMN/modules) [here](https://gitlab.biologie.ens-lyon.fr/PSMN/modules/blob/master/CONTRIBUTING.md) +You can read the Contributing guide for the [PSMN/modules](https://gitlab.biologie.ens-lyon.fr/PSMN/modules) project [here](https://gitlab.biologie.ens-lyon.fr/PSMN/modules/blob/master/CONTRIBUTING.md). # Nextflow @@ -95,7 +98,7 @@ total 16 The [`kallisto.config`](./src/nf_modules/Kallisto/kallisto.config) file contains instructions for two profiles : `sge` and `docker`. The [`kallisto.nf`](./src/nf_modules/Kallisto/kallisto.nf) file contains nextflow processes to use `Kallisto`. -The [`tests/tests.sh`](./src/nf_modules/Kallisto/tests/tests.sh) script, contains a series of nextflow calls on the other `.nf` files of the [`tests/`](./src/nf_modules/kallisto/tests/) folder. Those tests correspond to execution of the processes present in the [`kallisto.nf`](./src/nf_modules/Kallisto/kallisto.nf) file on the [LBMC/tiny_dataset](https://gitlab.biologie.ens-lyon.fr/LBMC/tiny_dataset) dataset with the `docker` profile. You can read the *Running the tests* section of the [README.md](https://gitlab.biologie.ens-lyon.fr/pipelines/nextflow/blob/master/README.md). 
+The [`tests/tests.sh`](./src/nf_modules/Kallisto/tests/tests.sh) script (with executable rights) contains a series of nextflow calls on the other `.nf` files of the [`tests/`](./src/nf_modules/Kallisto/tests/) folder. Those tests correspond to the execution of the processes present in the [`kallisto.nf`](./src/nf_modules/Kallisto/kallisto.nf) file on the [LBMC/tiny_dataset](https://gitlab.biologie.ens-lyon.fr/LBMC/tiny_dataset) dataset with the `docker` profile. You can read the *Running the tests* section of the [README.md](https://gitlab.biologie.ens-lyon.fr/pipelines/nextflow/blob/master/README.md). ## [`kallisto.config`](./src/nf_modules/Kallisto/kallisto.config) @@ -117,8 +120,8 @@ profiles { ### `docker` profile -The `docker` profile start by enabling docker for the whole pipeline. After that you only have to define the container name of each process: -For example, for `Kallisto`, we have: +The `docker` profile starts by enabling docker for the whole pipeline. After that you only have to define the container name for each process. +For example, for `Kallisto` with the version `0.44.0`, we have: ```Groovy process { @@ -133,7 +136,7 @@ process { ### `sge` profile -The `sge` profile define for each process all the information necessary to launch your process on a give queue at the PSMN. +The `sge` profile defines for each process all the information necessary to launch your process on a given queue with SGE at the [PSMN](http://www.ens-lyon.fr/PSMN/doku.php). For example, for `Kallisto`, we have: ```Groovy @@ -163,23 +166,25 @@ process{ } ``` -The `beforeScript` variable is executed before the main script of the corresponding process. +The `beforeScript` variable is executed before the main script for the corresponding process. ## [`kallisto.nf`](./src/nf_modules/Kallisto/kallisto.nf) The [`kallisto.nf`](./src/nf_modules/Kallisto/kallisto.nf) file contains examples of nextflow process that execute Kallisto. 
-- Each example must be usable as is to be incorporated in a nextflow pipeline. +- Each example must be usable as is when incorporated in a nextflow pipeline. - You need to define, default value for the parameters passed to the process. - Input and output must be clearly defined. -- Your process usable as a starting process or a process retrieving the output of another process. +- Your process should be usable as a starting process or a process retrieving the output of another process. For more informations on processes and channels you can check the [nextflow documentation](https://www.nextflow.io/docs/latest/index.html). ## Making your wrapper available to the LBMC -To make your module available to the LBMC you must have a `tests.sh` script and one or many `docker_init.sh` scripts working without errors . +To make your module available to the LBMC you must have a `tests.sh` script and one or many `docker_init.sh` scripts working without errors. +All the processes in your `.nf` file must be covered by the tests. + +After pushing your modifications on your forked repository, you can make a Merge Request to the [pipelines/nextflow](https://gitlab.biologie.ens-lyon.fr/pipelines/nextflow) **dev** branch, where it will be tested and integrated into the **master** branch. -Then after pushing your modification on your forked repository, you can make a Merge Request to the [PSMN/modules](https://gitlab.biologie.ens-lyon.fr/pipelines/nextflow) **dev** branch. Where it will be tested and -integrated to the **master** branch. 
+You can read more on this process [here](https://guides.github.com/introduction/flow/) diff --git a/src/Rnaseq.config b/src/Rnaseq.config new file mode 100644 index 0000000000000000000000000000000000000000..1170cb83eb825ee111311c118e77de57c1c55dd7 --- /dev/null +++ b/src/Rnaseq.config @@ -0,0 +1,121 @@ +profiles { + docker { + docker.temp = 'auto' + docker.enabled = true + process { + $adaptor_removal { + container = "cutadapt:1.14" + } + } + } + sge { + process{ + $adaptor_removal { + beforeScript = "module purge; module load cutadapt/1.14" + executor = "sge" + cpus = 1 + memory = "5GB" + time = "6h" + queueSize = 1000 + pollInterval = '60sec' + queue = 'h6-E5-2667v4deb128' + penv = 'openmp8' + } + } + } +} + +profiles { + docker { + docker.temp = 'auto' + docker.enabled = true + process { + $trimming { + container = "urqt:d62c1f8" + } + } + } + sge { + process{ + $trimming { + beforeScript = "module purge; module load UrQt/d62c1f8" + executor = "sge" + cpus = 4 + memory = "5GB" + time = "6h" + queueSize = 1000 + pollInterval = '60sec' + queue = 'h6-E5-2667v4deb128' + penv = 'openmp8' + } + } + } +} +profiles { + docker { + docker.temp = 'auto' + docker.enabled = true + process { + $fasta_from_bed { + container = "bedtools:2.25.0" + } + } + } + sge { + process{ + $fasta_from_bed { + beforeScript = "module purge; module load BEDtools/2.25.0" + executor = "sge" + cpus = 1 + memory = "5GB" + time = "6h" + queueSize = 1000 + pollInterval = '60sec' + queue = 'h6-E5-2667v4deb128' + penv = 'openmp8' + } + } + } +} + +profiles { + docker { + docker.temp = 'auto' + docker.enabled = true + process { + $index_fasta { + container = "kallisto:0.43.1" + } + $mapping_fastq { + container = "kallisto:0.43.1" + } + } + } + sge { + process{ + $index_fasta { + beforeScript = "module purge; module load Kallisto/0.43.1" + executor = "sge" + cpus = 1 + memory = "5GB" + time = "6h" + queueSize = 1000 + pollInterval = '60sec' + queue = 'h6-E5-2667v4deb128' + penv = 'openmp8' + } + 
$mapping_fastq { + beforeScript = "module purge; module load Kallisto/0.43.1" + executor = "sge" + cpus = 4 + memory = "5GB" + time = "6h" + queueSize = 1000 + pollInterval = '60sec' + queue = 'h6-E5-2667v4deb128' + penv = 'openmp8' + } + } + } +} + diff --git a/src/Rnaseq.nf b/src/Rnaseq.nf new file mode 100644 index 0000000000000000000000000000000000000000..1b59e883aa31c4925197b580791e19d739b9190d --- /dev/null +++ b/src/Rnaseq.nf @@ -0,0 +1,122 @@ +params.fastq = "$baseDir/data/fastq/*_{1,2}.fastq" +params.fasta = "$baseDir/data/fasta/*.fasta" +params.bed = "$baseDir/data/annot/*.bed" + +log.info "fasta file : ${params.fasta}" +log.info "bed file : ${params.bed}" +log.info "fastq files : ${params.fastq}" + +Channel + .fromPath( params.fasta ) + .ifEmpty { error "Cannot find any fasta files matching: ${params.fasta}" } + .set { fasta_files } + +Channel + .fromPath( params.bed ) + .ifEmpty { error "Cannot find any bed files matching: ${params.bed}" } + .set { bed_files } + +Channel + .fromFilePairs( params.fastq ) + .ifEmpty { error "Cannot find any fastq files matching: ${params.fastq}" } + .set { fastq_files } + +process adaptor_removal { + tag "$pair_id" + publishDir "results/fastq/adaptor_removal/", mode: 'copy' + + input: + set pair_id, file(reads) from fastq_files + + output: + file "*_cut_R{1,2}.fastq.gz" into fastq_files_cut + + script: + """ + cutadapt -a AGATCGGAAGAG -g CTCTTCCGATCT -A AGATCGGAAGAG -G CTCTTCCGATCT \ + -o ${pair_id}_cut_R1.fastq.gz -p ${pair_id}_cut_R2.fastq.gz \ + ${reads[0]} ${reads[1]} > ${pair_id}_report.txt + """ +} + +process trimming { + tag "${reads}" + cpus 4 + publishDir "results/fastq/trimming/", mode: 'copy' + + input: + file reads from fastq_files_cut + + output: + file "*_trim_R{1,2}.fastq.gz" into fastq_files_trim + + script: +""" +UrQt --t 20 --m ${task.cpus} --gz \ +--in ${reads[0]} --inpair ${reads[1]} \ +--out ${reads[0].baseName}_trim_R1.fastq.gz --outpair ${reads[1].baseName}_trim_R2.fastq.gz \ +> 
${reads[0].baseName}_trimming_report.txt +""" +} + +process fasta_from_bed { + tag "${bed.baseName}" + cpus 4 + publishDir "results/fasta/", mode: 'copy' + + input: + file fasta from fasta_files + file bed from bed_files + + output: + file "*_extracted.fasta" into fasta_files_extracted + + script: +""" +bedtools getfasta -name \ +-fi ${fasta} -bed ${bed} -fo ${bed.baseName}_extracted.fasta +""" +} + +process index_fasta { + tag "$fasta.baseName" + publishDir "results/mapping/index/", mode: 'copy' + + input: + file fasta from fasta_files_extracted + + output: + file "*.index*" into index_files + + script: +""" +kallisto index -k 31 --make-unique -i ${fasta.baseName}.index ${fasta} \ +> ${fasta.baseName}_kallisto_report.txt +""" +} + + +process mapping_fastq { + tag "$reads" + cpus 4 + publishDir "results/mapping/quantification/", mode: 'copy' + + input: + file reads from fastq_files_trim + file index from index_files + + output: + file "*" into counts_files + + script: +""" +mkdir ${reads[0].baseName} +kallisto quant -i ${index} -t ${task.cpus} \ +--bias --bootstrap-samples 100 -o ${reads[0].baseName} \ +${reads[0]} ${reads[1]} &> ${reads[0].baseName}_kallisto_report.txt +""" +} + + + + diff --git a/src/docker_modules/HISAT2/2.0.0/Dockerfile b/src/docker_modules/HISAT2/2.0.0/Dockerfile new file mode 100644 index 0000000000000000000000000000000000000000..8b253adc6eb8ef62b89563ab932c5bd89c8afe5d --- /dev/null +++ b/src/docker_modules/HISAT2/2.0.0/Dockerfile @@ -0,0 +1,23 @@ +FROM ubuntu:18.04 +MAINTAINER Nicolas Fontrodona + +ENV HISAT2_VERSION=2.0.0 +ENV PACKAGES unzip=6.0* \ + gcc=4:7.3.0* \ + g++=4:7.3.0* \ + make=4.1* \ + curl=7.58.0* \ + ca-certificates=20180409 + +RUN apt-get update && \ + apt-get install -y --no-install-recommends ${PACKAGES} && \ + apt-get clean + +RUN curl -k -L http://ccb.jhu.edu/software/hisat2/downloads/hisat2-${HISAT2_VERSION}-beta-source.zip -o hisat2_linux-v${HISAT2_VERSION}.zip && \ +unzip hisat2_linux-v${HISAT2_VERSION}.zip && \ +cd 
hisat2-${HISAT2_VERSION}-beta && \ +make && \ +cp hisat2 /usr/bin && \ +cp hisat2-* /usr/bin && cd .. && \ +rm -Rf hisat2-${HISAT2_VERSION}-beta + diff --git a/src/docker_modules/HISAT2/2.0.0/docker_init.sh b/src/docker_modules/HISAT2/2.0.0/docker_init.sh new file mode 100644 index 0000000000000000000000000000000000000000..f497a67af04b382913f516d2b0fa78c4a00a0d26 --- /dev/null +++ b/src/docker_modules/HISAT2/2.0.0/docker_init.sh @@ -0,0 +1,2 @@ +#!/bin/sh +docker build src/docker_modules/HISAT2/2.0.0 -t 'hisat2:2.0.0' diff --git a/src/fasta_sampler.nf b/src/fasta_sampler.nf new file mode 100644 index 0000000000000000000000000000000000000000..d1200ed496c77756cde525835f581b71b2528990 --- /dev/null +++ b/src/fasta_sampler.nf @@ -0,0 +1,18 @@ +Channel + .fromPath( "data/tiny_dataset/fasta/*.fasta" ) + .set { fasta_file } + +process sample_fasta { + publishDir "results/sampling/", mode: 'copy' + + input: +file fasta from fasta_file + + output: +file "*_sample.fasta" into fasta_sample + + script: +""" +head ${fasta} > ${fasta.baseName}_sample.fasta +""" +}
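
The `docker_init.sh` naming rule from the documentation changes above (lower-case name, `<tool_name>:<version>` format) can be sketched as a small sh helper. This is only an illustration under stated assumptions: the functions `image_tag` and `docker_build_cmd` are hypothetical and not part of the repository, and the helper prints the build command instead of running it, so it can be tried without Docker installed.

```sh
#!/bin/sh
# Hypothetical helper (not part of the repository): derive the image tag and
# the build command of a tool from its directory name and its version.

# Print the image tag "<tool_name>:<version>" with the tool name lower-cased.
image_tag() {
    tool_lower=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
    printf '%s:%s\n' "$tool_lower" "$2"
}

# Print (without running it) the docker build command for a tool and version,
# assuming the src/docker_modules/<Tool>/<version>/ layout used in this diff.
docker_build_cmd() {
    printf "docker build src/docker_modules/%s/%s -t '%s'\n" \
        "$1" "$2" "$(image_tag "$1" "$2")"
}

docker_build_cmd HISAT2 2.0.0
```

Called with `HISAT2` and `2.0.0`, the helper prints the same command as the `docker_init.sh` added for HISAT2 in this diff. For a tool without a version number, the commit hash is passed as the second argument.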