bolero
bolero is a Nextflow pipeline for analysing Nanopore sequencing data from 5'RACE-amplified HBV RNAs.
Getting the latest updates
To get the latest commits from this repository into your fork, use the following command:
git clone http://gitbio.ens-lyon.fr/xgrand/bolero.git
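Note that git clone creates a fresh copy. If you already work in a local clone of your fork, the latest commits can instead be pulled in through a second remote. The following is a minimal sketch, not part of bolero: the helper name sync_fork, the remote name upstream, and the default branch name master are assumptions, so check the branch name used by the repository before running it.

```shell
# Hypothetical helper: bring the latest upstream commits into the current clone.
# $1 = URL of the upstream repository
# $2 = branch to sync (assumed to be "master" when omitted)
sync_fork() {
    upstream_url=$1
    branch=${2:-master}
    # Register the upstream remote once; later calls reuse it.
    git remote get-url upstream >/dev/null 2>&1 || git remote add upstream "$upstream_url"
    # Download the new commits, then fast-forward the checked-out branch.
    git fetch upstream
    git merge --ff-only "upstream/$branch"
}

# Example, run from inside your clone with the branch to update checked out:
# sync_fork http://gitbio.ens-lyon.fr/xgrand/bolero.git
```

The --ff-only option makes the merge stop instead of creating a merge commit when your branch has diverged from upstream, so local work is never rewritten silently.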
Getting Started
The pipeline src/bolero.nf works with a Nextflow configuration file, src/nextflow.config.
The typical command for running the pipeline is as follows:
nextflow ./src/bolero.nf -c ./src/nextflow.config -profile singularity
The typical command to obtain help:
nextflow ./src/bolero.nf --help
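Before launching the pipeline, it can help to confirm that the required executables are on the PATH. The sketch below is an illustration only: the helper name missing_tools is hypothetical, and which tools to check depends on the profile you plan to use.

```shell
# Hypothetical pre-flight helper: print every command from the argument list
# that cannot be found on PATH, and return non-zero if any is missing.
missing_tools() {
    status=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "$tool"
            status=1
        fi
    done
    return $status
}

# Example for the singularity profile:
# missing_tools nextflow singularity || echo "install the tools listed above first"
```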
The arguments of this pipeline are described in the table below:
Arguments | Description |
---|---|
-c | Configuration file. This parameter should always be src/nextflow.config. |
-profile | The profile to use: docker or singularity to run the pipeline in a Docker or Singularity container, respectively, or psmn to launch the analysis on the PSMN. |
--input [path] | Path to the folder containing fastq files. If basecalling is not skipped (--skipBC false), path to the folder containing fast5 files. |
--adapt [str] | Sequence of 5'RACE adapter. |
--genome [file] | Path to the fasta file containing the genome. HBV reference sequence preCore available in data folder. |
--skipBC [boolean] | Skip the basecalling step. If true, give the fastq folder as input. Default: true. |
--flowcell [str] | Nanopore flowcell. Default = FLO-MIN106. |
--kit [str] | Nanopore kit. Default = SQK-PBK004. |
--gpu_mode [str] | Guppy basecaller configuration. Default: false. The "gpu" mode is dedicated to NVIDIA CUDA-compatible systems, according to Guppy specifications. |
--min_qscore [float] | Minimum quality score threshold, default = 7.0. |
--gpu_runners_per_device [int] | Number of runners per device, default = 32 (refer to guppy manual). |
--num_callers [int] | Number of callers, default = 16 (refer to guppy manual). |
--chunks_per_runner [int] | Number of chunks per runner, default = 512 (refer to guppy manual). |
--chunks_size [int] | Chunk size, default = 1900 (refer to guppy manual). |
--help --h | Display this help message. |
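Since --input points to fastq files when basecalling is skipped and to fast5 files otherwise, the value of --skipBC can be derived from the folder content. The sketch below is not part of bolero: the helper name choose_skip_bc is hypothetical, and it assumes that basecalled reads end in .fastq or .fastq.gz and raw signal files end in .fast5.

```shell
# Hypothetical helper: print the --skipBC value that matches an input folder.
# Prints "true" when fastq files are present, "false" when only fast5 files
# are present, and fails when neither kind of file is found.
choose_skip_bc() {
    input_dir=$1
    if find "$input_dir" -type f \( -name '*.fastq' -o -name '*.fastq.gz' \) | grep -q .; then
        echo true     # fastq found: basecalling can be skipped
    elif find "$input_dir" -type f -name '*.fast5' | grep -q .; then
        echo false    # only fast5 found: basecalling is needed
    else
        echo "no fastq or fast5 files under $input_dir" >&2
        return 1
    fi
}

# Example with a placeholder input folder ./reads:
# nextflow ./src/bolero.nf -c ./src/nextflow.config -profile singularity \
#     --input ./reads --skipBC "$(choose_skip_bc ./reads)"
```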
Test Bolero
- Simulate 5'RACE sequenced reads. This requires the pbsim3 software: https://github.com/yukiteruono/pbsim3
To produce a complete transcriptome, you can run:
path_to_bolero=./bolero
path_to_pbsim3=/opt/Programs/pbsim3
mkdir -p 01_basecalling
for i in $(seq 1 30)
do
    # The first column of the expression file gives the transcript name
    extract=$(cut -f1 "${path_to_bolero}/data/simulation/expression.transcript_${i}")
    mkdir -p "01_basecalling/${extract}"
    # Simulate reads for this transcript with the ONT error model
    "${path_to_pbsim3}/src/pbsim" --strategy trans --transcript "${path_to_bolero}/data/simulation/expression.transcript_${i}" --id-prefix "${extract}" --method errhmm --errhmm "${path_to_pbsim3}/data/ERRHMM-ONT.model"
    # Keep the simulated reads, drop the alignment file, then compress
    mv sd.fastq "01_basecalling/${extract}/${extract}.fastq"
    rm sd.maf
    gzip "01_basecalling/${extract}/${extract}.fastq"
done
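Once the loop above has finished, a quick read count per transcript helps confirm that every simulation produced output. This is a minimal sketch with hypothetical helper names (count_reads, report_reads); it assumes four lines per FASTQ record and the folder layout created above, 01_basecalling/<transcript>/<transcript>.fastq.gz.

```shell
# Hypothetical helper: print the number of reads in a gzipped FASTQ file,
# assuming the usual four lines per record.
count_reads() {
    gzip -dc "$1" | awk 'END { print NR / 4 }'
}

# Print "<file><TAB><read count>" for every <dir>/<transcript>/*.fastq.gz file.
report_reads() {
    for fq in "$1"/*/*.fastq.gz; do
        [ -e "$fq" ] || continue    # the glob matched nothing
        printf '%s\t%s\n' "$fq" "$(count_reads "$fq")"
    done
}

# Example:
# report_reads 01_basecalling
```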
- Run Bolero:
cd <PATH_TO_Bolero>
nextflow ./src/bolero.nf -c ./src/nextflow.config -profile <PROFILE> --input <PATH_TO_01_basecalling>
Reference sequence
The HBV reference sequence, genotype D ayw, is available in the "data" folder.
Contributing
If you want to add more tools to this project, please read CONTRIBUTING.md.
Authors
- Xavier Grand - Maintainer
- Alia Rifki - Contributor
License
Info{.btn .btn-info} This project is licensed under the CeCILL License; see the LICENSE file for details.
Warning{.btn .btn-warning} The optional basecalling and demultiplexing steps may be carried out if necessary but are not executed automatically. To execute these steps, it is essential to adhere to the guidelines provided with the Guppy software from Oxford Nanopore Technologies.
To Do:
- Let the user choose the basecalling configuration file