bolero

bolero is a Nextflow pipeline for analysing Nanopore sequencing data from 5'RACE-amplified HBV RNAs.

Getting the latest updates

To get a copy of this repository with the latest commits, clone it with the following command:

git clone http://gitbio.ens-lyon.fr/xgrand/bolero.git

Getting Started

The pipeline src/bolero.nf works with a Nextflow configuration file, src/nextflow.config. The typical command for running the pipeline is as follows:

nextflow ./src/bolero.nf -c ./src/nextflow.config -profile singularity

The typical command to obtain help: nextflow ./src/bolero.nf --help

The arguments of this pipeline are described in the table below:

Arguments Description
-c configuration file. This parameter should always be src/nextflow.config
-profile The profile to use. This can be docker or singularity to run the pipeline in a Docker or Singularity container, respectively. This can also be psmn to launch the analysis on the PSMN.
--input [path] Path to the folder containing fastq files. If --skipBC is false, path to the folder containing fast5 files instead.
--adapt [str] Sequence of the 5'RACE adapter.
--genome [file] Path to the fasta file containing the genome. The HBV preCore reference sequence is available in the data folder.
--skipBC [boolean] Skip the basecalling step. If true, give the fastq folder as input. Default: true.
--flowcell [str] Nanopore flowcell. Default = FLO-MIN106.
--kit [str] Nanopore kit. Default = SQK-PBK004.
--gpu_mode [str] Guppy basecaller configuration. Default: false. The "gpu" mode is intended for NVIDIA CUDA-compatible systems, according to the Guppy specifications.
--min_qscore [float] Minimum quality score threshold, default = 7.0.
--gpu_runners_per_device [int] Number of runners per device, default = 32 (refer to guppy manual).
--num_callers [int] Number of callers, default = 16 (refer to guppy manual).
--chunks_per_runner [int] Number of chunks per runner, default = 512 (refer to guppy manual).
--chunks_size [int] Chunk size, default = 1900 (refer to guppy manual).
--help --h Display this help message.
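
As an illustration of how these arguments combine, the sketch below assembles a command line for a run that includes the basecalling step. The helper name build_bolero_cmd is hypothetical and the two paths are placeholders. The flag names and default values are taken from the table above. The helper only prints the command, so it can be reviewed before anything is launched.

```shell
# Hypothetical helper: print (but do not run) a bolero command line
# for a run with basecalling enabled. Flag names and defaults come
# from the argument table; paths containing spaces are not handled.
build_bolero_cmd() {
    input_dir=$1                 # folder containing fast5 files
    genome=$2                    # fasta file with the reference genome
    profile=${3:-singularity}    # docker, singularity or psmn

    echo "nextflow ./src/bolero.nf -c ./src/nextflow.config" \
         "-profile $profile --input $input_dir --genome $genome" \
         "--skipBC false --flowcell FLO-MIN106 --kit SQK-PBK004" \
         "--min_qscore 7.0"
}

# Placeholders in the style used elsewhere in this README.
build_bolero_cmd "<PATH_TO_FAST5>" "<PATH_TO_GENOME_FASTA>" docker
```

Copying the printed line into the terminal, with the placeholders replaced, starts the run. The flags that are left out fall back to the defaults listed in the table.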

Test Bolero

  1. Simulate 5'RACE sequenced reads. This requires the pbsim3 software: https://github.com/yukiteruono/pbsim3

To produce a complete transcriptome you can run:

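# Paths to the bolero repository and to the pbsim3 installation (adjust as needed)
path_to_bolero=./bolero
path_to_pbsim3=/opt/Programs/pbsim3
mkdir -p 01_basecalling
for i in $(seq 1 30)
do
    # The first column of the transcript file is used as the sample name
    extract=$(cut -f1 "${path_to_bolero}/data/simulation/expression.transcript_${i}")
    mkdir -p "01_basecalling/${extract}"
    # Simulate reads for this transcript with the ONT error model
    "${path_to_pbsim3}/src/pbsim" --strategy trans --transcript "${path_to_bolero}/data/simulation/expression.transcript_${i}" --id-prefix "${extract}" --method errhmm --errhmm "${path_to_pbsim3}/data/ERRHMM-ONT.model"
    # Keep the simulated reads (sd.fastq), discard sd.maf, then compress
    mv sd.fastq "01_basecalling/${extract}/${extract}.fastq"
    rm sd.maf
    gzip "01_basecalling/${extract}/${extract}.fastq"
done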
  2. Run Bolero:
cd <PATH_TO_Bolero>
nextflow ./src/bolero.nf -c ./src/nextflow.config -profile <PROFILE> --input <PATH_TO_01_basecalling>
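
Before launching the pipeline on the simulated data, it can be worth checking that every sample folder contains a non-empty read file. The sketch below is one way to do this. The helper name count_reads is hypothetical, and it assumes each FASTQ record spans exactly four lines.

```shell
# Hypothetical helper: count the reads in a gzipped FASTQ file.
# Assumes four lines per record (no wrapped sequence lines).
count_reads() {
    gzip -dc "$1" | awk 'END { print NR / 4 }'
}

# Report the read count of every simulated sample.
for fq in 01_basecalling/*/*.fastq.gz
do
    [ -e "$fq" ] || continue    # the glob matched nothing: skip
    printf '%s\t%s\n' "$fq" "$(count_reads "$fq")"
done
```

A count of 0 for a sample suggests that the simulation step failed for that transcript.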

Reference sequence

The HBV reference sequence, genotype D ayw, is available in the "data" folder.

Contributing

If you want to add more tools to this project, please read CONTRIBUTING.md.

Authors

  • Xavier Grand - Maintainer
  • Alia Rifki - Contributor

License

Info: This project is licensed under the CeCILL license; see the LICENSE file for details.

Warning: The optional basecalling and demultiplexing steps are not executed automatically; run them only if necessary. To run them, follow the guidelines provided with the Guppy software from Oxford Nanopore Technologies.

To Do:

  • Give the user the possibility to choose the basecalling configuration file