# bolero

bolero is a Nextflow pipeline for analysing Nanopore sequencing data from 5'RACE amplification of HBV RNAs.

## Getting the latest updates

To get a local copy of this repository with the latest commits, clone it:

```sh
git clone http://gitbio.ens-lyon.fr/xgrand/bolero.git
```

## Getting Started

The pipeline `src/bolero.nf` works with a Nextflow configuration file, `src/nextflow.config`.

The typical command for running the pipeline is as follows:

`nextflow ./src/bolero.nf -c ./src/nextflow.config -profile singularity`

The typical command to obtain help:

`nextflow ./src/bolero.nf --help`

The arguments of this pipeline are described in the table below:

| Arguments | Description |
|:---------------------------:|:-------------------------------------------------------------------:|
| -c | Configuration file. This parameter should always be `src/nextflow.config`. |
| -profile | The profile to use. This can be **docker** or **singularity** to run the pipeline in a Docker or Singularity container respectively. This can also be **psmn** to launch the analysis on the PSMN. |
| --input [path] | Path to the folder containing fastq files. If basecalling is not skipped (`--skipBC` set to false), path to the folder containing fast5 files. |
| --adapt [str] | Sequence of the 5'RACE adapter. |
| --genome [file] | Path to the fasta file containing the genome. The HBV preCore reference sequence is available in the `data` folder. |
| --skipBC [boolean] | Skip the basecalling step. If true, give the fastq folder as input. Default: true. |
| --flowcell [str] | Nanopore flowcell. Default: FLO-MIN106. |
| --kit [str] | Nanopore kit. Default: SQK-PBK004. |
| --gpu_mode [str] | Guppy basecaller configuration. Default: false. "gpu" mode is dedicated to NVIDIA CUDA-compatible systems, according to the Guppy specifications. |
| --min_qscore [float] | Minimum quality score threshold. Default: 7.0. |
| --gpu_runners_per_device [int] | Number of runners per device. Default: 32 (refer to the Guppy manual). |
| --num_callers [int] | Number of callers. Default: 16 (refer to the Guppy manual). |
| --chunks_per_runner [int] | Number of chunks per runner. Default: 512 (refer to the Guppy manual). |
| --chunks_size [int] | Chunk size. Default: 1900 (refer to the Guppy manual). |
| --help --h | Display this help message. |

## Test Bolero

1. Simulate 5'RACE sequenced reads.

   This step requires the pbsim3 software: https://github.com/yukiteruono/pbsim3

   To produce a complete transcriptome you can run:

   ```
   path_to_bolero=./bolero
   path_to_pbsim3=/opt/Programs/pbsim3
   mkdir -p 01_basecalling

   # One simulation per expression file (transcript_1 to transcript_30)
   for i in $(seq 1 30)
   do
       transcript="${path_to_bolero}/data/simulation/expression.transcript_${i}"
       # The first column of the expression file is used as the sample name
       extract=$(cut -f1 "${transcript}")
       mkdir -p "01_basecalling/${extract}"
       "${path_to_pbsim3}/src/pbsim" --strategy trans \
           --transcript "${transcript}" \
           --id-prefix "${extract}" \
           --method errhmm \
           --errhmm "${path_to_pbsim3}/data/ERRHMM-ONT.model"
       # Keep the simulated reads, discard the alignment file, then compress
       mv sd.fastq "01_basecalling/${extract}/${extract}.fastq"
       rm sd.maf
       gzip "01_basecalling/${extract}/${extract}.fastq"
   done
   ```

2. Run Bolero:

   ```
   cd <PATH_TO_Bolero>
   nextflow ./src/bolero.nf -c ./src/nextflow.config -profile <PROFILE> --input <PATH_TO_01_basecalling>
   ```

## Reference sequence

The HBV reference sequence, genotype D ayw, is available in the `data` folder.

## Contributing

If you want to add more tools to this project, please read the [CONTRIBUTING.md](CONTRIBUTING.md).

## Authors

* **Xavier Grand** - *Maintainer*
* **Alia Rifki** - *Contributor*

## License

[Info](#){.btn .btn-info} This project is licensed under the CeCILL License; see the [LICENSE](LICENSE) file for details.

[Warning](#){.btn .btn-warning} The optional basecalling and demultiplexing steps can be carried out if necessary, but they are not executed automatically. To execute these steps, follow the guidelines provided with the Guppy software from Oxford Nanopore Technologies.

## To Do:

* Short-term update: replace Guppy with Dorado.
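
## Checking the simulated reads

Before running Bolero on the simulated data from the "Test Bolero" section, it can be useful to confirm that each sample folder contains a non-empty compressed fastq file. The following is a minimal sketch, not part of the pipeline: the function name `count_reads` and the variable `input_dir` are hypothetical, and it assumes unwrapped fastq records (four lines per read) stored as `01_basecalling/<sample>/<sample>.fastq.gz`.

```shell
#!/bin/sh
# Hypothetical helper (not part of bolero): count reads in a gzipped fastq
# file, assuming each record spans exactly 4 lines.
count_reads() {
    n_lines=$(gzip -dc "$1" | wc -l)
    echo $((n_lines / 4))
}

# Folder produced by the simulation step; can be overridden by the first argument.
input_dir="${1:-01_basecalling}"

for fq in "${input_dir}"/*/*.fastq.gz
do
    # Skip the unexpanded pattern when no file matches
    [ -e "${fq}" ] || continue
    echo "$(basename "${fq}" .fastq.gz): $(count_reads "${fq}") reads"
done
```

A sample reporting 0 reads suggests that the corresponding simulation did not produce output and should be rerun before launching the pipeline.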