Verified commit 946942bd authored by Laurent Modolo

TP_computational_biologists.md

parent 53680ef9
@@ -13,15 +13,15 @@ highlight: tango
The goal of this practical is to learn how to *wrap* tools in [Docker](https://www.docker.com/what-docker) or [Environment Module](http://www.ens-lyon.fr/PSMN/doku.php?id=documentation:tools:modules) to make them available to nextflow on a personal computer or at the [PSMN](http://www.ens-lyon.fr/PSMN/doku.php).
Here we assume that you followed the [TP for experimental biologists](./TP_experimental_biologists.md), and that you know the basics of [Docker containers](https://www.docker.com/what-container) and [Environment Module](http://www.ens-lyon.fr/PSMN/doku.php?id=documentation:tools:modules). We are also going to assume that you know how to build and use a nextflow pipeline from the template [pipelines/nextflow](https://gitlab.biologie.ens-lyon.fr/pipelines/nextflow).
For the practical you can either work with the WebIDE of GitLab, or locally as described in the [git basics training](https://gitlab.biologie.ens-lyon.fr/formations/git_basis).
# Docker
To run a tool within a [Docker container](https://www.docker.com/what-container) you need to write a `Dockerfile`.
The [`Dockerfile`](./src/docker_modules/Kallisto/0.44.0/Dockerfile) files are found in the [pipelines/nextflow](https://gitlab.biologie.ens-lyon.fr/pipelines/nextflow) project under `src/docker_modules/`. Each [`Dockerfile`](./src/docker_modules/Kallisto/0.44.0/Dockerfile) is paired with a [`docker_init.sh`](./src/docker_modules/Kallisto/0.44.0/docker_init.sh) file, as in the following example for `Kallisto` version `0.43.1`:
```sh
$ ls -l src/docker_modules/Kallisto/0.43.1/
@@ -33,7 +33,10 @@ drwxr-xr-x 3 laurent users 4.0K Jun 6 09:49 ../
```
## [`docker_init.sh`](./src/docker_modules/Kallisto/0.44.0/docker_init.sh)
The [`docker_init.sh`](./src/docker_modules/Kallisto/0.44.0/docker_init.sh) is a simple sh script with executable rights (`chmod +x`). By executing this script, the user creates a [Docker container](https://www.docker.com/what-container) with the tool installed in a specific version. You can check the [`docker_init.sh`](./src/docker_modules/Kallisto/0.44.0/docker_init.sh) file of any implemented tool as a template.
Remember that the name of the [container](https://www.docker.com/what-container) must be in lower case and in the format `<tool_name>:<version>`.
For tools without a version number you can use a commit hash instead.
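
The naming rule above can be sketched in a few lines of sh. This is only an illustration: the `image_tag` helper is hypothetical (it is not part of the project), and the `docker build` command is printed rather than executed, assuming that a `docker_init.sh` essentially wraps a single `docker build` call.

```sh
#!/bin/sh
# Hypothetical helper: build the lower-case name "<tool_name>:<version>".
# $1: tool name, $2: version number (or commit hash for unversioned tools).
image_tag() {
  printf '%s:%s\n' "$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')" "$2"
}

# Print the build command that a docker_init.sh could run for Kallisto 0.44.0
# (assumption: the script only needs one "docker build" call).
echo "docker build -t '$(image_tag Kallisto 0.44.0)' src/docker_modules/Kallisto/0.44.0"
```

Forcing the tool name to lower case in one place avoids a rejected build when the folder name (`Kallisto`) is reused as the container name.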
## [`Dockerfile`](./src/docker_modules/Kallisto/0.44.0/Dockerfile)
@@ -48,29 +51,29 @@ MAINTAINER Laurent Modolo
ENV KALLISTO_VERSION=0.44.0
```
The `FROM` instruction means that the [container](https://www.docker.com/what-container) is initialized from a bare installation of Ubuntu 18.04. You can check the versions of Ubuntu available [here](https://hub.docker.com/_/ubuntu/) or other operating systems like [debian](https://hub.docker.com/_/debian/) or [worse](https://hub.docker.com/r/microsoft/windowsservercore/).
Then we declare the *maintainer* of the container, before declaring an environment variable for the container named `KALLISTO_VERSION`, which contains the version of the wrapped tool. This bash variable will be declared for the root user within the [container](https://www.docker.com/what-container).
You should always declare a variable `TOOLSNAME_VERSION` that contains the version number or commit hash of the tool you wrap. In simple cases you just have to modify this line to create a new `Dockerfile` for another version of the tool.
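
As a minimal sketch of that one-line change, the snippet below rewrites the `ENV KALLISTO_VERSION` line with `sed`. The `set_kallisto_version` function and the target version `0.45.0` are illustrative assumptions, and the demo works on a throwaway two-line `Dockerfile`, not on the real one from the project.

```sh
#!/bin/sh
# Print the Dockerfile $1 with its KALLISTO_VERSION line set to version $2.
set_kallisto_version() {
  sed "s/^ENV KALLISTO_VERSION=.*/ENV KALLISTO_VERSION=$2/" "$1"
}

# Demo on a temporary Dockerfile reduced to the two relevant lines.
workdir=$(mktemp -d)
printf 'FROM ubuntu:18.04\nENV KALLISTO_VERSION=0.44.0\n' > "$workdir/Dockerfile"
set_kallisto_version "$workdir/Dockerfile" 0.45.0
rm -r "$workdir"
```

In practice the output would be redirected to the `Dockerfile` of a new version folder, and the container name in the matching `docker_init.sh` updated accordingly.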
The following lines of the [`Dockerfile`](./src/docker_modules/Kallisto/0.44.0/Dockerfile) are a succession of `bash` commands executed as the **root** user within the container.
Each `RUN` block is run sequentially by `Docker`. If there is an error or a modification in a `RUN` block, only this block and the following `RUN` blocks will be executed again on the next build; the results of the preceding blocks are reused from the cache.
You can learn more about the building of Docker containers [here](https://docs.docker.com/engine/reference/builder/#usage).
When you write your [`Dockerfile`](./src/docker_modules/Kallisto/0.44.0/Dockerfile), instead of launching the [`docker_init.sh`](./src/docker_modules/Kallisto/0.44.0/docker_init.sh) script many times to test your [container](https://www.docker.com/what-container), you can connect to a base container in interactive mode to test your commands.
```sh
# start an interactive bash session in a bare Ubuntu 18.04 container
docker run -it ubuntu:18.04 bash
# inside the container: declare the variable set by the ENV line of the Dockerfile
KALLISTO_VERSION=0.44.0
```
# SGE / [PSMN](http://www.ens-lyon.fr/PSMN/doku.php)
To easily run tools on the [PSMN](http://www.ens-lyon.fr/PSMN/doku.php), you need to build your own [Environment Module](http://www.ens-lyon.fr/PSMN/doku.php?id=documentation:tools:modules).
You can read the Contributing guide for the [PSMN/modules](https://gitlab.biologie.ens-lyon.fr/PSMN/modules) project [here](https://gitlab.biologie.ens-lyon.fr/PSMN/modules/blob/master/CONTRIBUTING.md).
# Nextflow
@@ -95,7 +98,7 @@ total 16
The [`kallisto.config`](./src/nf_modules/Kallisto/kallisto.config) file contains instructions for two profiles: `sge` and `docker`.
The [`kallisto.nf`](./src/nf_modules/Kallisto/kallisto.nf) file contains nextflow processes to use `Kallisto`.
The [`tests/tests.sh`](./src/nf_modules/Kallisto/tests/tests.sh) script (with executable rights) contains a series of nextflow calls on the other `.nf` files of the [`tests/`](./src/nf_modules/kallisto/tests/) folder. Those tests correspond to the execution of the processes present in the [`kallisto.nf`](./src/nf_modules/Kallisto/kallisto.nf) file on the [LBMC/tiny_dataset](https://gitlab.biologie.ens-lyon.fr/LBMC/tiny_dataset) dataset with the `docker` profile. You can read the *Running the tests* section of the [README.md](https://gitlab.biologie.ens-lyon.fr/pipelines/nextflow/blob/master/README.md).
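
A `tests.sh` script of this kind could look roughly like the sketch below. It is a dry run that only prints the nextflow command lines; the `nf_test_cmd` helper, the `index.nf` file name and the `--fasta` path are hypothetical placeholders, not taken from the project.

```sh
#!/bin/sh
# Build the nextflow command line that runs one .nf file of the tests/ folder
# with the docker profile of the Kallisto module.
# $1: test .nf file; the remaining arguments are pipeline parameters.
nf_test_cmd() {
  test_file=$1
  shift
  printf 'nextflow run %s -c src/nf_modules/Kallisto/kallisto.config -profile docker %s\n' \
    "$test_file" "$*"
}

# Dry run: print the command instead of executing it.
# index.nf and the fasta path are hypothetical placeholders.
nf_test_cmd src/nf_modules/Kallisto/tests/index.nf --fasta "data/tiny_dataset/fasta/*.fasta"
```

To actually launch the tests, the printed command would be executed directly, with one call per `.nf` file of the `tests/` folder.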
## [`kallisto.config`](./src/nf_modules/Kallisto/kallisto.config)
@@ -117,8 +120,8 @@ profiles {
### `docker` profile
The `docker` profile starts by enabling docker for the whole pipeline. After that you only have to define the container name for each process.
For example, for `Kallisto` with the version `0.44.0`, we have:
```Groovy
process {
@@ -133,7 +136,7 @@ process {
### `sge` profile
The `sge` profile defines for each process all the information necessary to launch your process on a given queue with SGE at the [PSMN](http://www.ens-lyon.fr/PSMN/doku.php).
For example, for `Kallisto`, we have:
```Groovy
@@ -163,23 +166,23 @@ process{
}
```
The command in the `beforeScript` variable is executed before the main script of the corresponding process.
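
To see how the two profiles fit together, the sketch below prints a minimal config for a single process. Everything specific in it is an assumption made for illustration: the `profile_config` helper, the process name `index_fasta`, the `module load` command and the use of the `withName` selector. A real `sge` profile would also set the queue and resources, which are left out here.

```sh
#!/bin/sh
# Print a minimal two-profile nextflow config for one process.
# $1: process name, $2: tool name, $3: tool version (all illustrative).
profile_config() {
  image="$(printf '%s' "$2" | tr '[:upper:]' '[:lower:]'):$3"
  cat <<EOF
profiles {
  docker {
    docker.enabled = true
    process {
      withName: $1 {
        container = "$image"
      }
    }
  }
  sge {
    process {
      withName: $1 {
        executor = "sge"
        beforeScript = "module load $2/$3"
      }
    }
  }
}
EOF
}

# Hypothetical process name; the module name is assumed to be <tool>/<version>.
profile_config index_fasta Kallisto 0.44.0
```

The same process therefore runs in the `kallisto:0.44.0` container with `-profile docker`, and with the module loaded by `beforeScript` with `-profile sge`.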
## [`kallisto.nf`](./src/nf_modules/Kallisto/kallisto.nf)
The [`kallisto.nf`](./src/nf_modules/Kallisto/kallisto.nf) file contains examples of nextflow processes that execute Kallisto.
- Each example must be usable as is in a nextflow pipeline.
- You need to define default values for the parameters passed to the process.
- Input and output must be clearly defined.
- Your process should be usable as a starting process or as a process retrieving the output of another process.
For more information on processes and channels, you can check the [nextflow documentation](https://www.nextflow.io/docs/latest/index.html).
## Making your wrapper available to the LBMC
To make your module available to the LBMC you must have a `tests.sh` script and one or many `docker_init.sh` scripts working without errors.
All the processes in your `.nf` must be covered by the tests.
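
One rough way to check this coverage before opening a Merge Request is sketched below. It assumes that a process counts as covered when a `process <name>` declaration with the same name appears in one of the `.nf` files of the `tests/` folder; both functions are hypothetical helpers, and the demo runs on temporary files rather than on the real module.

```sh
#!/bin/sh
# List the process names declared in a .nf file (lines like "process name {").
list_processes() {
  sed -n 's/^[[:space:]]*process[[:space:]]\{1,\}\([A-Za-z0-9_]\{1,\}\).*/\1/p' "$1"
}

# Print every process of module file $1 that is declared in no .nf file of
# directory $2; the exit status is non-zero if at least one is missing.
untested_processes() {
  status=0
  for name in $(list_processes "$1"); do
    if ! grep -E -q "process[[:space:]]+${name}([^A-Za-z0-9_]|\$)" "$2"/*.nf 2>/dev/null; then
      echo "$name"
      status=1
    fi
  done
  return "$status"
}

# Demo on temporary files: two processes in the module, only one in tests/.
workdir=$(mktemp -d)
mkdir "$workdir/tests"
printf 'process index_fasta {\n}\nprocess mapping_fastq {\n}\n' > "$workdir/kallisto.nf"
printf 'process index_fasta {\n}\n' > "$workdir/tests/index.nf"
untested_processes "$workdir/kallisto.nf" "$workdir/tests" || echo "some processes are not covered"
rm -r "$workdir"
```

In the demo, `mapping_fastq` is reported because no file of the temporary `tests/` folder declares it.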
After pushing your modifications to your forked repository, you can make a Merge Request to the **dev** branch of [pipelines/nextflow](https://gitlab.biologie.ens-lyon.fr/pipelines/nextflow), where it will be tested and integrated into the **master** branch.