---
title: "TP for computational biologists"
author: Laurent Modolo [laurent.modolo@ens-lyon.fr](mailto:laurent.modolo@ens-lyon.fr)
date: 20 Jun 2018
output:
  pdf_document:
    toc: true
    toc_depth: 3
    number_sections: true
    highlight: tango
    latex_engine: xelatex
---

The goal of this practical is to learn how to wrap tools in Docker containers or Environment Modules to make them available to Nextflow on a personal computer or at the PSMN.

We assume that you have followed the TP for experimental biologists and that you know the basics of Docker containers and Environment Module usage. We also assume that you know how to build and use a Nextflow pipeline from the template pipelines/nextflow.

For this practical, you can either work with the GitLab Web IDE or locally, as described in the "git: the basis" training.

Docker

To run a tool within a Docker container, you need to write a Dockerfile.

Dockerfiles are found in the pipelines/nextflow project under src/docker_modules/. Each Dockerfile is paired with a docker_init.sh file, as in the following example for Kallisto version 0.43.1:

```
$ ls -l src/docker_modules/Kallisto/0.43.1/
total 16K
drwxr-xr-x 2 laurent users 4.0K Jun  5 19:06 ./
drwxr-xr-x 3 laurent users 4.0K Jun  6 09:49 ../
-rw-r--r-- 1 laurent users  587 Jun  5 19:06 Dockerfile
-rwxr-xr-x 1 laurent users   79 Jun  5 19:06 docker_init.sh*
```

docker_init.sh

The docker_init.sh file is a simple sh script with execute permission (chmod +x). By executing this script, the user builds the Docker image for a specific version of the tool. You can use the docker_init.sh file of any already implemented tool as a template. Remember that the image name must be in lower case: Docker rejects repository names that contain upper-case letters.
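Assuming that a docker_init.sh essentially wraps a `docker build` call on the tool's directory, here is a minimal, hypothetical sketch of a parameterised variant. The functions `image_name` and `build_image` are illustrative and not part of the project; the scripts in the repository hard-code the tool and version instead. The sketch lower-cases the tool name so that the resulting image name is valid.

```sh
#!/bin/sh
# Hypothetical, parameterised variant of a docker_init.sh script.
# Assumes it is run from the root of the pipelines/nextflow project,
# where src/docker_modules/<Tool>/<version>/Dockerfile exists.

# Print "<tool>:<version>"; the repository part of a Docker image
# name must be lower case, so the tool name is lower-cased.
image_name() {
  printf '%s:%s\n' "$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')" "$2"
}

# Build the image from the Dockerfile of the given tool and version.
build_image() {
  docker build "src/docker_modules/$1/$2" -t "$(image_name "$1" "$2")"
}

# Build only when called with a tool and a version,
# e.g. ./docker_init.sh Kallisto 0.44.0
if [ "$#" -eq 2 ]; then
  build_image "$1" "$2"
fi
```

With this variant, `./docker_init.sh Kallisto 0.44.0` would build an image named `kallisto:0.44.0` from src/docker_modules/Kallisto/0.44.0/.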

Dockerfile

The recipe to wrap your tool in a Docker container is written in a file named Dockerfile.

For Kallisto version 0.44.0, the header of the Dockerfile is:

```dockerfile
FROM ubuntu:18.04
MAINTAINER Laurent Modolo

ENV KALLISTO_VERSION=0.44.0
```

This means that we build the container from a bare installation of Ubuntu 18.04. You can check the available Ubuntu versions here, or other operating systems such as Debian, or worse.

Then we declare the maintainer of the container, and an environment variable named KALLISTO_VERSION that contains the version of the wrapped tool. This variable will be defined within the container.

You should always declare a variable TOOLSNAME_VERSION that contains the version number or commit identifier of the tool you wrap. In simple cases, you then only have to modify this line to create a new Dockerfile for another version of the tool.
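To see why a single version variable is usually enough, consider how it is reused in the commands that download the tool. The following is a minimal sketch in plain shell, the syntax that RUN lines and an interactive container session share. The download URL pattern is an assumption based on the naming of Kallisto's GitHub release archives, so check the release page before relying on it.

```sh
#!/bin/sh
# The only line to change when wrapping another release of the tool.
KALLISTO_VERSION=0.44.0

# Assumption: archive naming taken from the Kallisto GitHub releases page;
# verify it for the version you wrap.
KALLISTO_ARCHIVE="kallisto_linux-v${KALLISTO_VERSION}.tar.gz"
KALLISTO_URL="https://github.com/pachterlab/kallisto/releases/download/v${KALLISTO_VERSION}/${KALLISTO_ARCHIVE}"

# Every later command (download, unpack, install) derives from the variable.
echo "$KALLISTO_URL"
```

In the Dockerfile, the same `${KALLISTO_VERSION}` expansion would appear in the RUN line that downloads and unpacks the archive, so bumping the ENV line covers a straightforward version change.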

The rest of the Dockerfile is a succession of shell commands (RUN instructions) executed as the root user within the container. While writing your Dockerfile, instead of running the docker_init.sh script many times, you can start a container from the base image in interactive mode to test your commands:

```sh
docker run -it ubuntu:18.04 bash
# then, inside the container, reproduce the ENV line of the Dockerfile:
KALLISTO_VERSION=0.44.0
```

Docker executes the RUN blocks sequentially and caches the result of each one. If a RUN block fails or is modified, only this block and the following ones are executed again at the next build; the blocks before it are reused from the cache.

You can learn more about building Docker containers here.

SGE

To run tools easily on the PSMN, you need to build your own Environment Module.

You can read the Contributing guide of the PSMN/modules project here.

Nextflow

The last step to wrap your tool is to make it available in Nextflow. For this, you need to create at least four files, as in the following example for Kallisto version 0.44.0:

```
ls -lR src/nf_modules/Kallisto
src/nf_modules/Kallisto/:
total 12
-rw-r--r-- 1 laurent users  866 Jun 18 17:13 kallisto.config
-rw-r--r-- 1 laurent users 2711 Jun 18 17:13 kallisto.nf
drwxr-xr-x 2 laurent users 4096 Jun 18 17:14 tests/

src/nf_modules/Kallisto/tests:
total 16
-rw-r--r-- 1 laurent users  551 Jun 18 17:14 index.nf
-rw-r--r-- 1 laurent users  901 Jun 18 17:14 mapping_paired.nf
-rw-r--r-- 1 laurent users 1037 Jun 18 17:14 mapping_single.nf
-rwxr-xr-x 1 laurent users  627 Jun 18 17:14 tests.sh*
```
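The listing above suggests a minimal file set for each module, and a quick check can catch a forgotten file. The helper below is a hypothetical sketch (`check_nf_module` is not part of the project). It looks for `<tool>.config`, `<tool>.nf`, an executable `tests/tests.sh` and at least one test pipeline `tests/*.nf`. Treating exactly these files as required is an assumption drawn from this listing.

```sh
#!/bin/sh
# Hypothetical helper (not part of the project): checks that a Nextflow
# module directory contains the minimal file set shown in the listing.
# Usage: check_nf_module src/nf_modules/Kallisto kallisto
check_nf_module() {
  dir="$1"
  name="$2"
  status=0
  # Configuration, processes and test script must exist.
  for f in "${dir}/${name}.config" "${dir}/${name}.nf" "${dir}/tests/tests.sh"; do
    if [ ! -f "$f" ]; then
      echo "missing: $f" >&2
      status=1
    fi
  done
  # tests.sh must be executable (chmod +x), as in the listing.
  if [ -f "${dir}/tests/tests.sh" ] && [ ! -x "${dir}/tests/tests.sh" ]; then
    echo "not executable: ${dir}/tests/tests.sh" >&2
    status=1
  fi
  # At least one test pipeline; an unmatched glob stays literal.
  set -- "${dir}"/tests/*.nf
  if [ ! -e "$1" ]; then
    echo "no test pipeline in ${dir}/tests/" >&2
    status=1
  fi
  return "$status"
}
```

After sourcing this file, `check_nf_module src/nf_modules/Kallisto kallisto` prints nothing and returns 0 for the layout shown above. Otherwise, it prints one line per problem and returns 1.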

The kallisto.config file contains instructions for two profiles: sge and docker. The kallisto.nf file contains the Nextflow processes that use Kallisto.

The tests/tests.sh script contains a series of Nextflow calls on the other .nf files of the tests/ folder. These tests execute the processes defined in the kallisto.nf file on the LBMC/tiny_dataset dataset. For more details, read the Running the tests section of the README.md.
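A tests.sh in this spirit could look like the following minimal sketch. The function `run_module_test`, the `NEXTFLOW` variable and the `--fasta` parameter with its placeholder path are hypothetical. `run`, `-c` and `-profile` are standard Nextflow command-line options. The actual tests in the repository may pass different parameters to each test pipeline.

```sh
#!/bin/sh
# Hypothetical sketch of a tests/tests.sh-style runner. The layout
# src/nf_modules/<Tool>/<tool>.config and tests/<test>.nf follows the listing;
# names and arguments here are illustrative only.

# run_module_test TOOL CONFIG_NAME TEST_NAME PROFILE [EXTRA_PARAMS...]
run_module_test() {
  tool="$1"; config="$2"; test_name="$3"; profile="$4"
  shift 4
  # NEXTFLOW defaults to the ./nextflow launcher of the template project.
  "${NEXTFLOW:-./nextflow}" run "src/nf_modules/${tool}/tests/${test_name}.nf" \
    -c "src/nf_modules/${tool}/${config}.config" \
    -profile "$profile" \
    "$@"
}

# Example call, only run when invoked with a profile: ./tests.sh docker
# (the --fasta parameter and its path are placeholders)
if [ "$#" -eq 1 ]; then
  run_module_test Kallisto kallisto index "$1" \
    --fasta "path/to/reference.fasta"
fi
```

Running `./tests.sh docker` from the project root would launch the index test with the docker profile; passing `sge` would select the other profile defined in kallisto.config.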