# bolero

bolero is a Nextflow pipeline dedicated to the analysis of Nanopore sequencing data coupled to 5'RACE amplification of HBV RNAs.

## Getting the latest updates

To get the latest version of this repository, clone it with the following command:

```sh
git clone http://gitbio.ens-lyon.fr/xgrand/bolero.git
```

## Getting Started

The pipeline `src/bolero.nf` works with a Nextflow configuration file, `src/nextflow.config`.
The typical command for running the pipeline is as follows:
`nextflow ./src/bolero.nf -c ./src/nextflow.config -profile singularity`

The typical command to display the help message is:
`nextflow ./src/bolero.nf --help`

The arguments of this pipeline are described in the table below:

|         Arguments           |                             Description                             | 
|:---------------------------:|:-------------------------------------------------------------------:| 
| -c | Configuration file. This parameter should always be `src/nextflow.config`. |
| -profile | The profile to use. This can be **docker** or **singularity** to run the pipeline in a Docker or Singularity container, respectively. This can also be **psmn** to launch the analysis on the PSMN. |
| --input [path] | Path to the folder containing the fastq files. If the skip basecalling option is disabled, path to the folder containing the fast5 files. |
| --adapt [str] | Sequence of 5'RACE adapter. |
| --genome [file] | Path to the fasta file containing the genome. HBV reference sequence preCore available in data folder. |
| --skipBC [boolean] | Skip the basecalling step. If true, give the fastq folder as input. Default: true. |
| --flowcell [str] | Nanopore flowcell. Default = FLO-MIN106. |
| --kit [str] | Nanopore kit. Default = SQK-PBK004. |
| --gpu_mode [str] | Guppy basecaller configuration. Default: false. The "gpu" mode is dedicated to NVIDIA CUDA-compatible systems, according to Guppy specifications. |
| --min_qscore [float] | Minimum quality score threshold, default = 7.0. |
| --gpu_runners_per_device [int] | Number of runners per device, default = 32 (refer to guppy manual). |
| --num_callers [int] | Number of callers, default = 16 (refer to guppy manual). |
| --chunks_per_runner [int] | Number of chunks per runner, default = 512 (refer to guppy manual). |
| --chunks_size [int] | Chunk size, default = 1900 (refer to guppy manual). |
| --help --h | Display this help message. |
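
As an illustration of how the arguments in the table combine, here is a minimal sketch of a run on already basecalled fastq files. The input folder, genome path and adapter sequence are placeholders chosen for this example, not values shipped with bolero. The command is first assembled and printed so that it can be checked before launching.

```sh
# Placeholder values (assumptions for this example): replace with your own
input_dir="./01_basecalling"        # folder containing the fastq files
genome="path/to/reference.fasta"    # fasta file of the reference genome
adapter="ADAPTER_SEQUENCE"          # sequence of the 5'RACE adapter

# Assemble the command from the arguments described in the table
set -- nextflow ./src/bolero.nf -c ./src/nextflow.config -profile singularity \
    --input "$input_dir" --genome "$genome" --adapt "$adapter" --skipBC true

# Print the command for inspection
printf '%s ' "$@"; printf '\n'

# Uncomment the next line to launch the pipeline once the command looks right
# "$@"
```

Replace `singularity` with `docker` or `psmn` depending on where the analysis runs.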

## Test Bolero

1. Simulate 5'RACE sequenced reads. This step requires the pbsim3 software: https://github.com/yukiteruono/pbsim3

To produce a complete transcriptome, you can run:
```sh
# Adjust both paths to your local installations
path_to_bolero=./bolero
path_to_pbsim3=/opt/Programs/pbsim3
mkdir -p 01_basecalling
for i in $(seq 1 30)
do
    # The first column of each expression file holds the transcript ID
    extract=$(cut -f1 "${path_to_bolero}/data/simulation/expression.transcript_${i}")
    mkdir -p "01_basecalling/${extract}"
    # Simulate reads with the ONT error model; pbsim writes sd.fastq and sd.maf
    "${path_to_pbsim3}/src/pbsim" --strategy trans --transcript "${path_to_bolero}/data/simulation/expression.transcript_${i}" --id-prefix "${extract}" --method errhmm --errhmm "${path_to_pbsim3}/data/ERRHMM-ONT.model"
    # Keep the reads, one compressed fastq per transcript folder
    mv sd.fastq "01_basecalling/${extract}/${extract}.fastq"
    rm sd.maf
    gzip "01_basecalling/${extract}/${extract}.fastq"
done
```
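
Before running the pipeline, the simulated data can be given a quick sanity check. The sketch below assumes the `01_basecalling/<transcript>/<transcript>.fastq.gz` layout produced by the simulation loop and standard 4-line FASTQ records. `count_reads` is a hypothetical helper written for this example; it is not part of bolero or pbsim3.

```sh
# count_reads: hypothetical helper, prints the number of records
# in a gzipped FASTQ file (assumes 4 lines per record)
count_reads() {
    gzip -dc "$1" | awk 'END { print NR / 4 }'
}

# List each simulated file with its read count
for fq in 01_basecalling/*/*.fastq.gz
do
    [ -e "$fq" ] || continue   # skip the unmatched pattern when no file exists
    printf '%s\t%s\n' "$fq" "$(count_reads "$fq")"
done
```

A folder with a missing file or a read count of 0 suggests that the corresponding simulation step failed.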

2. Run Bolero:
```sh
cd <PATH_TO_Bolero>
nextflow ./src/bolero.nf -c ./src/nextflow.config -profile <PROFILE> --input <PATH_TO_01_basecalling>
```

## Reference sequence

The HBV reference sequence, genotype D ayw, is available in the `data` folder.

## Contributing

If you want to add more tools to this project, please read [CONTRIBUTING.md](CONTRIBUTING.md).

## Authors

* **Xavier Grand** - *Maintainer*
* **Alia Rifki** - *Contributor*

## License
[Info](#){.btn .btn-info}
This project is licensed under the CeCILL License; see the [LICENSE](LICENSE) file for details.

[Warning](#){.btn .btn-warning}
The optional basecalling and demultiplexing steps may be carried out if necessary but are not executed automatically. 
To execute these steps, it is essential to adhere to the guidelines provided with the Guppy software from Oxford Nanopore Technologies.

## To Do:

* Short-term update: replace Guppy with Dorado.