diff --git a/CHANGELOG.md b/CHANGELOG.md index 0a2d45fe9c8d83de81d5859fd4836d3746170d73..fcf522d7d2038c7f1fd424fb82478a325f3db82e 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -3,134 +3,14 @@ The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/) and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html). -## v1.3.0 - 2021-22-05 +## v1.3.0 - [date] -* Change the `/tmp/` folder to `./tmp/` folder so that all tmp files are now in the work directory (#24) -* Add `--hicpro_maps` options to generate the raw and normalized HiC-Pro maps. The default is now to use cooler -* Add chromosome compartments calling with cooltools (#53) -* Add HiCExplorer distance decay quality control (#54) -* Add HiCExplorer TADs calling (#55) -* Add insulation score TADs calling (#55) -* Generate cooler/txt contact maps -* Normalize Hi-C data with cooler instead of iced -* New `--digestion` parameter to automatically set the restriction_site and ligation_site motifs -* New `--keep_multi` and `keep_dup` options. Default: false -* Template update for nf-core/tools -* Minor fix to summary log messages in pipeline header - -### `Fixed` - -* Fix bug in stats report which were not all correcly exported in the results folder -* Fix recurrent bug in input file extension (#86) -* Fix bug in `--bin_size` parameter (#85) -* `--min_mapq` is ignored if `--keep_multi` is used - -### `Deprecated` - -* `--rm_dup` and `--rm_multi` are replaced by `--keep_dups` and `--keep_multi` - -## v1.2.2 - 2020-09-02 +Initial release of nf-core/hic, created with the [nf-core](https://nf-co.re/) template. ### `Added` -* Template update for nf-core/tools v1.10.2 -* Add the `--fastq_chunks_size` to specify the number of reads per chunks if split_fastq is true - -### `Fixed` - -* Bug in `--split_fastq` option not recognized - -## v1.2.1 - 2020-07-06 - -### `Fixed` - -* Fix issue with `--fasta` option and `.fa` extension (#66) - -## v1.2.0 - 2020-06-18 - -### `Added` - -* Bump v1.2.0 -* Merge template nf-core 1.9 -* Move some options to camel_case -* Update python scripts for python3 -* Update conda environment file - * python base `2.7.15` > `3.7.6` - * pip `19.1` > `20.0.1` - * scipy `1.2.1` > `1.4.1` - * numpy `1.16.3` > `1.18.1` - * bx-python `0.8.2` > `0.8.8` - * pysam `0.15.2` > `0.15.4` - * cooler `0.8.5` > `0.8.6` - * multiqc `1.7` > `1.8` - * iced `0.5.1` > `0.5.6` - * *_New_* pymdown-extensions `7.1` - * *_New_* hicexplorer `3.4.3` - * *_New_* bioconductor-hitc `1.32.0` - * *_New_* r-optparse `1.6.6` - * *_New_* ucsc-bedgraphtobigwig `377` - * *_New_* cython `0.29.19` - * *_New_* cooltools `0.3.2` - * *_New_* fanc `0.8.30` - * *_Removed_* r-markdown - ### `Fixed` -* Fix error in doc for Arima kit usage -* Sort output of `get_valid_interaction` process as the input files of `remove_duplicates` -are expected to be sorted (sort -m) +### `Dependencies` ### `Deprecated` - -* Command line options converted to `camel_case`: - * `--skipMaps` > `--skip_maps` - * `--skipIce` > `--skip_ice` - * `--skipCool` > `--skip_cool` - * `--skipMultiQC` > `--skip_multiqc` - * `--saveReference` > `--save_reference` - * `--saveAlignedIntermediates` > `--save_aligned_intermediates` - * `--saveInteractionBAM` > `--save_interaction_bam` - -## v1.1.1 - 2020-04-02 - -### `Fixed` - -* Fix bug in tag. Remove '[' - -## v1.1.0 - 2019-10-15 - -### `Added` - -* Update hicpro2higlass with `-p` parameter -* Support 'N' base motif in restriction/ligation sites -* Support multiple restriction enzymes/ligattion sites (comma separated) ([#31](https://github.com/nf-core/hic/issues/31)) -* Add --saveInteractionBAM option -* Add DOI ([#29](https://github.com/nf-core/hic/issues/29)) -* Update manual ([#28](https://github.com/nf-core/hic/issues/28)) - -### `Fixed` - -* Fix bug for reads extension `_1`/`_2` ([#30](https://github.com/nf-core/hic/issues/30)) - -## v1.0 - [2019-05-06] - -Initial release of nf-core/hic, created with the [nf-core](http://nf-co.re/) template. - -### `Added` - -First version of nf-core Hi-C pipeline which is a Nextflow implementation of -the [HiC-Pro pipeline](https://github.com/nservant/HiC-Pro/). -Note that all HiC-Pro functionalities are not yet all implemented. -The current version supports most protocols including Hi-C, in situ Hi-C, -DNase Hi-C, Micro-C, capture-C or HiChip data. - -In summary, this version allows : - -* Automatic detection and generation of annotation files based on igenomes -if not provided. -* Two-steps alignment of raw sequencing reads -* Reads filtering and detection of valid interaction products -* Generation of raw contact matrices for a set of resolutions -* Normalization of the contact maps using the ICE algorithm -* Generation of cooler file for visualization on [higlass](https://higlass.io/) -* Quality report based on HiC-Pro MultiQC module diff --git a/README.md b/README.md index cb88454ceec117d40a3f43251d85a67078085b99..3dd2b4b3fe48f429a3918f596631749fec36dd95 100644 --- a/README.md +++ b/README.md @@ -1,99 +1,85 @@ #  -**Analysis of Chromosome Conformation Capture data (Hi-C)**. +[](https://github.com/nf-core/hic/actions?query=workflow%3A%22nf-core+CI%22) +[](https://github.com/nf-core/hic/actions?query=workflow%3A%22nf-core+linting%22) +[](https://nf-co.re/hic/results) +[](https://doi.org/10.5281/zenodo.XXXXXXX) -[](https://github.com/nf-core/hic/actions) -[](https://github.com/nf-core/hic/actions) -[](https://www.nextflow.io/) +[](https://www.nextflow.io/) +[](https://docs.conda.io/en/latest/) +[](https://www.docker.com/) +[](https://sylabs.io/docs/) -[](https://bioconda.github.io/) -[](https://hub.docker.com/r/nfcore/hic) - -[](https://doi.org/10.5281/zenodo.2669513) -[](https://nfcore.slack.com/channels/hic) +[](https://nfcore.slack.com/channels/hic) +[](https://twitter.com/nf_core) +[](https://www.youtube.com/c/nf-core) ## Introduction -This pipeline was originally set up from the -[HiC-Pro workflow](https://github.com/nservant/HiC-Pro). -It was designed to process Hi-C data from raw FastQ files (paired-end Illumina -data) to normalized contact maps. -The current version supports most protocols, including digestion protocols as -well as protocols that do not require restriction enzymes such as DNase Hi-C. -In practice, this workflow was successfully applied to many data-sets including -dilution Hi-C, in situ Hi-C, DNase Hi-C, Micro-C, capture-C, capture Hi-C or -HiChip data. - -Contact maps are generated in standard formats including HiC-Pro, and cooler for -downstream analysis and visualization. -Addition analysis steps such as compartments and TADs calling are also available. - -The pipeline is built using [Nextflow](https://www.nextflow.io), a workflow tool -to run tasks across multiple compute infrastructures in a very portable manner. -It comes with docker / singularity containers making installation trivial and -results highly reproducible. +<!-- TODO nf-core: Write a 1-2 sentence summary of what data the pipeline is for and what it does --> +**nf-core/hic** is a bioinformatics best-practice analysis pipeline for Analysis of Chromosome Conformation Capture data (Hi-C). + +The pipeline is built using [Nextflow](https://www.nextflow.io), a workflow tool to run tasks across multiple compute infrastructures in a very portable manner. It uses Docker/Singularity containers making installation trivial and results highly reproducible. The [Nextflow DSL2](https://www.nextflow.io/docs/latest/dsl2.html) implementation of this pipeline uses one container per process which makes it much easier to maintain and update software dependencies. Where possible, these processes have been submitted to and installed from [nf-core/modules](https://github.com/nf-core/modules) in order to make them available to all nf-core pipelines, and to everyone within the Nextflow community! + +<!-- TODO nf-core: Add full-sized test dataset and amend the paragraph below if applicable --> +On release, automated continuous integration tests run the pipeline on a full-sized dataset on the AWS cloud infrastructure. This ensures that the pipeline runs on AWS, has sensible resource allocation defaults set to run on real-world datasets, and permits the persistent storage of results to benchmark between pipeline releases and other analysis sources. The results obtained from the full-sized test can be viewed on the [nf-core website](https://nf-co.re/hic/results). ## Pipeline summary -1. HiC-Pro data processing ([`HiC-Pro`](https://github.com/nservant/HiC-Pro)) - 1. Mapping using a two steps strategy to rescue reads spanning the ligation - sites ([`bowtie2`](http://bowtie-bio.sourceforge.net/bowtie2/index.shtml)) - 2. Detection of valid interaction products - 3. Duplicates removal - 4. Generate raw and normalized contact maps ([`iced`](https://github.com/hiclib/iced)) -2. Create genome-wide contact maps at various resolutions ([`cooler`](https://github.com/open2c/cooler)) -3. Contact maps normalization using balancing algorithm ([`cooler`](https://github.com/open2c/cooler)) -4. Export to various contact maps formats ([`HiC-Pro`](https://github.com/nservant/HiC-Pro), [`cooler`](https://github.com/open2c/cooler)) -5. Quality controls ([`HiC-Pro`](https://github.com/nservant/HiC-Pro), [`HiCExplorer`](https://github.com/deeptools/HiCExplorer)) -6. Compartments calling ([`cooltools`](https://cooltools.readthedocs.io/en/latest/)) -7. TADs calling ([`HiCExplorer`](https://github.com/deeptools/HiCExplorer), [`cooltools`](https://cooltools.readthedocs.io/en/latest/)) -8. Quality control report ([`MultiQC`](https://multiqc.info/)) +<!-- TODO nf-core: Fill in short bullet-pointed list of the default steps in the pipeline --> + +1. Read QC ([`FastQC`](https://www.bioinformatics.babraham.ac.uk/projects/fastqc/)) +2. Present QC for raw reads ([`MultiQC`](http://multiqc.info/)) ## Quick Start -1. Install [`nextflow`](https://nf-co.re/usage/installation) (`>=20.04.0`) +1. Install [`Nextflow`](https://www.nextflow.io/docs/latest/getstarted.html#installation) (`>=21.04.0`) 2. Install any of [`Docker`](https://docs.docker.com/engine/installation/), [`Singularity`](https://www.sylabs.io/guides/3.0/user-guide/), [`Podman`](https://podman.io/), [`Shifter`](https://nersc.gitlab.io/development/shifter/how-to-use/) or [`Charliecloud`](https://hpc.github.io/charliecloud/) for full pipeline reproducibility _(please only use [`Conda`](https://conda.io/miniconda.html) as a last resort; see [docs](https://nf-co.re/usage/configuration#basic-configuration-profiles))_ -3. Download the pipeline and test it on a minimal dataset with a single command +3. Download the pipeline and test it on a minimal dataset with a single command: - ```bash + ```console nextflow run nf-core/hic -profile test,<docker/singularity/podman/shifter/charliecloud/conda/institute> ``` - > Please check [nf-core/configs](https://github.com/nf-core/configs#documentation) - to see if a custom config file to run nf-core pipelines already exists for your Institute. - If so, you can simply use `-profile <institute>` in your command. - This will enable either `docker` or `singularity` and set the appropriate execution - settings for your local compute environment. + > * Please check [nf-core/configs](https://github.com/nf-core/configs#documentation) to see if a custom config file to run nf-core pipelines already exists for your Institute. If so, you can simply use `-profile <institute>` in your command. This will enable either `docker` or `singularity` and set the appropriate execution settings for your local compute environment. + > * If you are using `singularity` then the pipeline will auto-detect this and attempt to download the Singularity images directly as opposed to performing a conversion from Docker images. If you are persistently observing issues downloading Singularity images directly due to timeout or network issues then please use the `--singularity_pull_docker_container` parameter to pull and convert the Docker image instead. Alternatively, it is highly recommended to use the [`nf-core download`](https://nf-co.re/tools/#downloading-pipelines-for-offline-use) command to pre-download all of the required containers before running the pipeline and to set the [`NXF_SINGULARITY_CACHEDIR` or `singularity.cacheDir`](https://www.nextflow.io/docs/latest/singularity.html?#singularity-docker-hub) Nextflow options to be able to store and re-use the images from a central location for future pipeline runs. + > * If you are using `conda`, it is highly recommended to use the [`NXF_CONDA_CACHEDIR` or `conda.cacheDir`](https://www.nextflow.io/docs/latest/conda.html) settings to store the environments in a central location for future pipeline runs. 4. Start running your own analysis! - ```bash - nextflow run nf-core/hic -profile <docker/singularity/podman/shifter/charliecloud/conda/institute> --input '*_R{1,2}.fastq.gz' --genome GRCh37 + <!-- TODO nf-core: Update the example "typical command" below used to run the pipeline --> + + ```console + nextflow run nf-core/hic -profile <docker/singularity/podman/shifter/charliecloud/conda/institute> --input samplesheet.csv --genome GRCh37 ``` ## Documentation -The nf-core/hic pipeline comes with documentation about the pipeline: [usage](https://nf-co.re/hic/usage) and [output](https://nf-co.re/hic/output). - -For further information or help, don't hesitate to get in touch on [Slack](https://nfcore.slack.com/channels/hic). -You can join with [this invite](https://nf-co.re/join/slack). +The nf-core/hic pipeline comes with documentation about the pipeline [usage](https://nf-co.re/hic/usage), [parameters](https://nf-co.re/hic/parameters) and [output](https://nf-co.re/hic/output). ## Credits nf-core/hic was originally written by Nicolas Servant. +We thank the following people for their extensive assistance in the development of this pipeline: + +<!-- TODO nf-core: If applicable, make list of people who have also contributed --> + ## Contributions and Support If you would like to contribute to this pipeline, please see the [contributing guidelines](.github/CONTRIBUTING.md). For further information or help, don't hesitate to get in touch on the [Slack `#hic` channel](https://nfcore.slack.com/channels/hic) (you can join with [this invite](https://nf-co.re/join/slack)). -## Citation +## Citations + +<!-- TODO nf-core: Add citation for pipeline after first release. Uncomment lines below and update Zenodo doi and badge at the top of this file. --> +<!-- If you use nf-core/hic for your analysis, please cite it using the following doi: [10.5281/zenodo.XXXXXX](https://doi.org/10.5281/zenodo.XXXXXX) --> -If you use nf-core/hic for your analysis, please cite it using the following -doi: [10.5281/zenodo.2669513](https://doi.org/10.5281/zenodo.2669513) +<!-- TODO nf-core: Add bibliography of tools and data used in your pipeline --> +An extensive list of references for the tools used by the pipeline can be found in the [`CITATIONS.md`](CITATIONS.md) file. You can cite the `nf-core` publication as follows: @@ -102,11 +88,3 @@ You can cite the `nf-core` publication as follows: > Philip Ewels, Alexander Peltzer, Sven Fillinger, Harshil Patel, Johannes Alneberg, Andreas Wilm, Maxime Ulysse Garcia, Paolo Di Tommaso & Sven Nahnsen. > > _Nat Biotechnol._ 2020 Feb 13. doi: [10.1038/s41587-020-0439-x](https://dx.doi.org/10.1038/s41587-020-0439-x). - -In addition, references of tools and data used in this pipeline are as follows: - -> **HiC-Pro: An optimized and flexible pipeline for Hi-C processing.** -> -> Nicolas Servant, Nelle Varoquaux, Bryan R. Lajoie, Eric Viara, Chongjian Chen, Jean-Philippe Vert, Job Dekker, Edith Heard, Emmanuel Barillot. -> -> Genome Biology 2015, 16:259 doi: [10.1186/s13059-015-0831-x](https://dx.doi.org/10.1186/s13059-015-0831-x) diff --git a/assets/samplesheet.csv b/assets/samplesheet.csv new file mode 100644 index 0000000000000000000000000000000000000000..5f653ab7bfc86c905b720d2bb8708646bb66366e --- /dev/null +++ b/assets/samplesheet.csv @@ -0,0 +1,3 @@ +sample,fastq_1,fastq_2 +SAMPLE_PAIRED_END,/path/to/fastq/files/AEG588A1_S1_L002_R1_001.fastq.gz,/path/to/fastq/files/AEG588A1_S1_L002_R2_001.fastq.gz +SAMPLE_SINGLE_END,/path/to/fastq/files/AEG588A4_S4_L003_R1_001.fastq.gz, diff --git a/assets/schema_input.json b/assets/schema_input.json new file mode 100644 index 0000000000000000000000000000000000000000..1c3f0f7be19d1d007f3a5ab083adf91cb5cf72dc --- /dev/null +++ b/assets/schema_input.json @@ -0,0 +1,39 @@ +{ + "$schema": "http://json-schema.org/draft-07/schema", + "$id": "https://raw.githubusercontent.com/nf-core/hic/master/assets/schema_input.json", + "title": "nf-core/hic pipeline - params.input schema", + "description": "Schema for the file provided with params.input", + "type": "array", + "items": { + "type": "object", + "properties": { + "sample": { + "type": "string", + "pattern": "^\\S+$", + "errorMessage": "Sample name must be provided and cannot contain spaces" + }, + "fastq_1": { + "type": "string", + "pattern": "^\\S+\\.f(ast)?q\\.gz$", + "errorMessage": "FastQ file for reads 1 must be provided, cannot contain spaces and must have extension '.fq.gz' or '.fastq.gz'" + }, + "fastq_2": { + "errorMessage": "FastQ file for reads 2 cannot contain spaces and must have extension '.fq.gz' or '.fastq.gz'", + "anyOf": [ + { + "type": "string", + "pattern": "^\\S+\\.f(ast)?q\\.gz$" + }, + { + "type": "string", + "maxLength": 0 + } + ] + } + }, + "required": [ + "sample", + "fastq_1" + ] + } +} diff --git a/assets/sendmail_template.txt b/assets/sendmail_template.txt index bdf905878111122e2d6b6983f72e6a04c78e97b5..6213286c8a3c027fb1fb4496c938fbe3ed7909d3 100644 --- a/assets/sendmail_template.txt +++ b/assets/sendmail_template.txt @@ -15,15 +15,15 @@ Content-ID: <nfcorepipelinelogo> Content-Disposition: inline; filename="nf-core-hic_logo.png" <% out << new File("$projectDir/assets/nf-core-hic_logo.png"). - bytes. - encodeBase64(). - toString(). - tokenize( '\n' )*. - toList()*. - collate( 76 )*. - collect { it.join() }. - flatten(). - join( '\n' ) %> + bytes. + encodeBase64(). + toString(). + tokenize( '\n' )*. + toList()*. + collate( 76 )*. + collect { it.join() }. + flatten(). + join( '\n' ) %> <% if (mqcFile){ @@ -37,15 +37,15 @@ Content-ID: <mqcreport> Content-Disposition: attachment; filename=\"${mqcFileObj.getName()}\" ${mqcFileObj. - bytes. - encodeBase64(). - toString(). - tokenize( '\n' )*. - toList()*. - collate( 76 )*. - collect { it.join() }. - flatten(). - join( '\n' )} + bytes. + encodeBase64(). + toString(). + tokenize( '\n' )*. + toList()*. + collate( 76 )*. + collect { it.join() }. + flatten(). + join( '\n' )} """ }} %> diff --git a/bin/check_samplesheet.py b/bin/check_samplesheet.py new file mode 100755 index 0000000000000000000000000000000000000000..9f6aaa35eeecaa04a4f33b9b8bdf0aa303936362 --- /dev/null +++ b/bin/check_samplesheet.py @@ -0,0 +1,146 @@ +#!/usr/bin/env python + +# TODO nf-core: Update the script to check the samplesheet +# This script is based on the example at: https://raw.githubusercontent.com/nf-core/test-datasets/viralrecon/samplesheet/samplesheet_test_illumina_amplicon.csv + +import os +import sys +import errno +import argparse + + +def parse_args(args=None): + Description = "Reformat nf-core/hic samplesheet file and check its contents." + Epilog = "Example usage: python check_samplesheet.py <FILE_IN> <FILE_OUT>" + + parser = argparse.ArgumentParser(description=Description, epilog=Epilog) + parser.add_argument("FILE_IN", help="Input samplesheet file.") + parser.add_argument("FILE_OUT", help="Output file.") + return parser.parse_args(args) + + +def make_dir(path): + if len(path) > 0: + try: + os.makedirs(path) + except OSError as exception: + if exception.errno != errno.EEXIST: + raise exception + + +def print_error(error, context="Line", context_str=""): + error_str = "ERROR: Please check samplesheet -> {}".format(error) + if context != "" and context_str != "": + error_str = "ERROR: Please check samplesheet -> {}\n{}: '{}'".format( + error, context.strip(), context_str.strip() + ) + print(error_str) + sys.exit(1) + + +# TODO nf-core: Update the check_samplesheet function +def check_samplesheet(file_in, file_out): + """ + This function checks that the samplesheet follows the following structure: + + sample,fastq_1,fastq_2 + SAMPLE_PE,SAMPLE_PE_RUN1_1.fastq.gz,SAMPLE_PE_RUN1_2.fastq.gz + SAMPLE_PE,SAMPLE_PE_RUN2_1.fastq.gz,SAMPLE_PE_RUN2_2.fastq.gz + SAMPLE_SE,SAMPLE_SE_RUN1_1.fastq.gz, + + For an example see: + https://raw.githubusercontent.com/nf-core/test-datasets/viralrecon/samplesheet/samplesheet_test_illumina_amplicon.csv + """ + + sample_mapping_dict = {} + with open(file_in, "r") as fin: + + ## Check header + MIN_COLS = 2 + # TODO nf-core: Update the column names for the input samplesheet + HEADER = ["sample", "fastq_1", "fastq_2"] + header = [x.strip('"') for x in fin.readline().strip().split(",")] + if header[: len(HEADER)] != HEADER: + print("ERROR: Please check samplesheet header -> {} != {}".format(",".join(header), ",".join(HEADER))) + sys.exit(1) + + ## Check sample entries + for line in fin: + lspl = [x.strip().strip('"') for x in line.strip().split(",")] + + # Check valid number of columns per row + if len(lspl) < len(HEADER): + print_error( + "Invalid number of columns (minimum = {})!".format(len(HEADER)), + "Line", + line, + ) + num_cols = len([x for x in lspl if x]) + if num_cols < MIN_COLS: + print_error( + "Invalid number of populated columns (minimum = {})!".format(MIN_COLS), + "Line", + line, + ) + + ## Check sample name entries + sample, fastq_1, fastq_2 = lspl[: len(HEADER)] + sample = sample.replace(" ", "_") + if not sample: + print_error("Sample entry has not been specified!", "Line", line) + + ## Check FastQ file extension + for fastq in [fastq_1, fastq_2]: + if fastq: + if fastq.find(" ") != -1: + print_error("FastQ file contains spaces!", "Line", line) + if not fastq.endswith(".fastq.gz") and not fastq.endswith(".fq.gz"): + print_error( + "FastQ file does not have extension '.fastq.gz' or '.fq.gz'!", + "Line", + line, + ) + + ## Auto-detect paired-end/single-end + sample_info = [] ## [single_end, fastq_1, fastq_2] + if sample and fastq_1 and fastq_2: ## Paired-end short reads + sample_info = ["0", fastq_1, fastq_2] + elif sample and fastq_1 and not fastq_2: ## Single-end short reads + sample_info = ["1", fastq_1, fastq_2] + else: + print_error("Invalid combination of columns provided!", "Line", line) + + ## Create sample mapping dictionary = { sample: [ single_end, fastq_1, fastq_2 ] } + if sample not in sample_mapping_dict: + sample_mapping_dict[sample] = [sample_info] + else: + if sample_info in sample_mapping_dict[sample]: + print_error("Samplesheet contains duplicate rows!", "Line", line) + else: + sample_mapping_dict[sample].append(sample_info) + + ## Write validated samplesheet with appropriate columns + if len(sample_mapping_dict) > 0: + out_dir = os.path.dirname(file_out) + make_dir(out_dir) + with open(file_out, "w") as fout: + fout.write(",".join(["sample", "single_end", "fastq_1", "fastq_2"]) + "\n") + for sample in sorted(sample_mapping_dict.keys()): + + ## Check that multiple runs of the same sample are of the same datatype + if not all(x[0] == sample_mapping_dict[sample][0][0] for x in sample_mapping_dict[sample]): + print_error("Multiple runs of a sample must be of the same datatype!", "Sample: {}".format(sample)) + + for idx, val in enumerate(sample_mapping_dict[sample]): + fout.write(",".join(["{}_T{}".format(sample, idx + 1)] + val) + "\n") + else: + print_error("No entries to process!", "Samplesheet: {}".format(file_in)) + + +def main(args=None): + args = parse_args(args) + check_samplesheet(args.FILE_IN, args.FILE_OUT) + + +if __name__ == "__main__": + sys.exit(main()) diff --git a/bin/scrape_software_versions.py b/bin/scrape_software_versions.py index 5ff3fcfe270923ed0aeeec220e82a348a529b3e4..322c97c6fc74194e352853b6f5c8f33827a1cbd9 100755 --- a/bin/scrape_software_versions.py +++ b/bin/scrape_software_versions.py @@ -1,40 +1,18 @@ #!/usr/bin/env python from __future__ import print_function -from collections import OrderedDict -import re +import os -# Add additional regexes for new tools in process get_software_versions -regexes = { - 'nf-core/hic': ['v_pipeline.txt', r"(\S+)"], - 'Nextflow': ['v_nextflow.txt', r"(\S+)"], - 'Bowtie2': ['v_bowtie2.txt', r"bowtie2-align-s version (\S+)"], - 'Python': ['v_python.txt', r"Python (\S+)"], - 'Samtools': ['v_samtools.txt', r"samtools (\S+)"], - 'MultiQC': ['v_multiqc.txt', r"multiqc, version (\S+)"], -} -results = OrderedDict() -results['nf-core/hic'] = '<span style="color:#999999;\">N/A</span>' -results['Nextflow'] = '<span style="color:#999999;\">N/A</span>' -results['Bowtie2'] = '<span style="color:#999999;\">N/A</span>' -results['Python'] = '<span style="color:#999999;\">N/A</span>' -results['Samtools'] = '<span style="color:#999999;\">N/A</span>' -results['MultiQC'] = '<span style="color:#999999;\">N/A</span>' +results = {} +version_files = [x for x in os.listdir(".") if x.endswith(".version.txt")] +for version_file in version_files: -# Search each file using its regex -for k, v in regexes.items(): - try: - with open(v[0]) as x: - versions = x.read() - match = re.search(v[1], versions) - if match: - results[k] = "v{}".format(match.group(1)) - except IOError: - results[k] = False + software = version_file.replace(".version.txt", "") + if software == "pipeline": + software = "nf-core/hic" -# Remove software set to false in results -for k in list(results): - if not results[k]: - del results[k] + with open(version_file) as fin: + version = fin.read().strip() + results[software] = version # Dump to YAML print( @@ -48,11 +26,11 @@ data: | <dl class="dl-horizontal"> """ ) -for k, v in results.items(): +for k, v in sorted(results.items()): print(" <dt>{}</dt><dd><samp>{}</samp></dd>".format(k, v)) print(" </dl>") -# Write out regexes as csv file: -with open("software_versions.csv", "w") as f: - for k, v in results.items(): +# Write out as tsv file: +with open("software_versions.tsv", "w") as f: + for k, v in sorted(results.items()): f.write("{}\t{}\n".format(k, v)) diff --git a/conf/base.config b/conf/base.config index ddec1a8507ded18a2d81923bc87daea40963c346..2e7db0c1db87efe04da28a0914de2e9130a1c112 100644 --- a/conf/base.config +++ b/conf/base.config @@ -1,46 +1,57 @@ /* - * ------------------------------------------------- - * nf-core/hic Nextflow base config file - * ------------------------------------------------- - * A 'blank slate' config file, appropriate for general - * use on most high performace compute environments. - * Assumes that all software is installed and available - * on the PATH. Runs in `local` mode - all jobs will be - * run on the logged in environment. - */ +======================================================================================== + nf-core/hic Nextflow base config file +======================================================================================== + A 'blank slate' config file, appropriate for general use on most high performance + compute environments. Assumes that all software is installed and available on + the PATH. Runs in `local` mode - all jobs will be run on the logged in environment. +---------------------------------------------------------------------------------------- +*/ process { - cpus = { check_max( 1 * task.attempt, 'cpus' ) } - memory = { check_max( 7.GB * task.attempt, 'memory' ) } - time = { check_max( 4.h * task.attempt, 'time' ) } - errorStrategy = { task.exitStatus in [143,137,104,134,139] ? 'retry' : 'finish' } - maxRetries = 1 - maxErrors = '-1' + // TODO nf-core: Check the defaults for all processes + cpus = { check_max( 1 * task.attempt, 'cpus' ) } + memory = { check_max( 6.GB * task.attempt, 'memory' ) } + time = { check_max( 4.h * task.attempt, 'time' ) } - withLabel:process_low { - cpus = { check_max( 1 * task.attempt, 'cpus' ) } - memory = { check_max( 4.GB * task.attempt, 'memory' ) } - time = { check_max( 6.h * task.attempt, 'time' ) } - } - withLabel:process_medium { - cpus = { check_max( 4 * task.attempt, 'cpus' ) } - memory = { check_max( 8.GB * task.attempt, 'memory' ) } - time = { check_max( 8.h * task.attempt, 'time' ) } - } - withLabel:process_high { - cpus = { check_max( 8 * task.attempt, 'cpus' ) } - memory = { check_max( 64.GB * task.attempt, 'memory' ) } - time = { check_max( 10.h * task.attempt, 'time' ) } - } - withLabel:process_long { - time = { check_max( 20.h * task.attempt, 'time' ) } - } - withLabel:process_highmem { - memory = { check_max( 12.GB * task.attempt, 'memory' ) } - } - withName:get_software_versions { - cache = false - } + errorStrategy = { task.exitStatus in [143,137,104,134,139] ? 'retry' : 'finish' } + maxRetries = 1 + maxErrors = '-1' + // Process-specific resource requirements + // NOTE - Please try and re-use the labels below as much as possible. + // These labels are used and recognised by default in DSL2 files hosted on nf-core/modules. + // If possible, it would be nice to keep the same label naming convention when + // adding in your local modules too. + // TODO nf-core: Customise requirements for specific processes. + // See https://www.nextflow.io/docs/latest/config.html#config-process-selectors + withLabel:process_low { + cpus = { check_max( 2 * task.attempt, 'cpus' ) } + memory = { check_max( 12.GB * task.attempt, 'memory' ) } + time = { check_max( 4.h * task.attempt, 'time' ) } + } + withLabel:process_medium { + cpus = { check_max( 6 * task.attempt, 'cpus' ) } + memory = { check_max( 36.GB * task.attempt, 'memory' ) } + time = { check_max( 8.h * task.attempt, 'time' ) } + } + withLabel:process_high { + cpus = { check_max( 12 * task.attempt, 'cpus' ) } + memory = { check_max( 72.GB * task.attempt, 'memory' ) } + time = { check_max( 16.h * task.attempt, 'time' ) } + } + withLabel:process_long { + time = { check_max( 20.h * task.attempt, 'time' ) } + } + withLabel:process_high_memory { + memory = { check_max( 200.GB * task.attempt, 'memory' ) } + } + withLabel:error_ignore { + errorStrategy = 'ignore' + } + withLabel:error_retry { + errorStrategy = 'retry' + maxRetries = 2 + } } diff --git a/conf/igenomes.config b/conf/igenomes.config index 1ba2588593f4e1940dc0bf3a3380f0114a71684e..855948def19c408aa81e133a0b297e3d2a1cc299 100644 --- a/conf/igenomes.config +++ b/conf/igenomes.config @@ -1,162 +1,432 @@ /* - * ------------------------------------------------- - * Nextflow config file for iGenomes paths - * ------------------------------------------------- - * Defines reference genomes, using iGenome paths - * Can be used by any config that customises the base - * path using $params.igenomes_base / --igenomes_base - */ +======================================================================================== + Nextflow config file for iGenomes paths +======================================================================================== + Defines reference genomes using iGenome paths. + Can be used by any config that customises the base path using: + $params.igenomes_base / --igenomes_base +---------------------------------------------------------------------------------------- +*/ params { - // illumina iGenomes reference file paths - genomes { - 'GRCh37' { - fasta = "${params.igenomes_base}/Homo_sapiens/Ensembl/GRCh37/Sequence/WholeGenomeFasta/genome.fa" - bowtie2 = "${params.igenomes_base}/Homo_sapiens/Ensembl/GRCh37/Sequence/Bowtie2Index/" + // illumina iGenomes reference file paths + genomes { + 'GRCh37' { + fasta = "${params.igenomes_base}/Homo_sapiens/Ensembl/GRCh37/Sequence/WholeGenomeFasta/genome.fa" + bwa = "${params.igenomes_base}/Homo_sapiens/Ensembl/GRCh37/Sequence/BWAIndex/genome.fa" + bowtie2 = "${params.igenomes_base}/Homo_sapiens/Ensembl/GRCh37/Sequence/Bowtie2Index/" + star = "${params.igenomes_base}/Homo_sapiens/Ensembl/GRCh37/Sequence/STARIndex/" + bismark = "${params.igenomes_base}/Homo_sapiens/Ensembl/GRCh37/Sequence/BismarkIndex/" + gtf = "${params.igenomes_base}/Homo_sapiens/Ensembl/GRCh37/Annotation/Genes/genes.gtf" + bed12 = "${params.igenomes_base}/Homo_sapiens/Ensembl/GRCh37/Annotation/Genes/genes.bed" + readme = "${params.igenomes_base}/Homo_sapiens/Ensembl/GRCh37/Annotation/README.txt" + mito_name = "MT" + macs_gsize = "2.7e9" + blacklist = "${projectDir}/assets/blacklists/GRCh37-blacklist.bed" + } + 'GRCh38' { + fasta = "${params.igenomes_base}/Homo_sapiens/NCBI/GRCh38/Sequence/WholeGenomeFasta/genome.fa" + bwa = "${params.igenomes_base}/Homo_sapiens/NCBI/GRCh38/Sequence/BWAIndex/genome.fa" + bowtie2 = "${params.igenomes_base}/Homo_sapiens/NCBI/GRCh38/Sequence/Bowtie2Index/" + star = "${params.igenomes_base}/Homo_sapiens/NCBI/GRCh38/Sequence/STARIndex/" + bismark = "${params.igenomes_base}/Homo_sapiens/NCBI/GRCh38/Sequence/BismarkIndex/" + gtf = "${params.igenomes_base}/Homo_sapiens/NCBI/GRCh38/Annotation/Genes/genes.gtf" + bed12 = "${params.igenomes_base}/Homo_sapiens/NCBI/GRCh38/Annotation/Genes/genes.bed" + mito_name = "chrM" + macs_gsize = "2.7e9" + blacklist = "${projectDir}/assets/blacklists/hg38-blacklist.bed" + } + 'GRCm38' { + fasta = "${params.igenomes_base}/Mus_musculus/Ensembl/GRCm38/Sequence/WholeGenomeFasta/genome.fa" + bwa = "${params.igenomes_base}/Mus_musculus/Ensembl/GRCm38/Sequence/BWAIndex/genome.fa" + bowtie2 = "${params.igenomes_base}/Mus_musculus/Ensembl/GRCm38/Sequence/Bowtie2Index/" + star = "${params.igenomes_base}/Mus_musculus/Ensembl/GRCm38/Sequence/STARIndex/" + bismark = "${params.igenomes_base}/Mus_musculus/Ensembl/GRCm38/Sequence/BismarkIndex/" + gtf = "${params.igenomes_base}/Mus_musculus/Ensembl/GRCm38/Annotation/Genes/genes.gtf" + bed12 = "${params.igenomes_base}/Mus_musculus/Ensembl/GRCm38/Annotation/Genes/genes.bed" + readme = "${params.igenomes_base}/Mus_musculus/Ensembl/GRCm38/Annotation/README.txt" + mito_name = "MT" + macs_gsize = "1.87e9" + blacklist = "${projectDir}/assets/blacklists/GRCm38-blacklist.bed" + } + 'TAIR10' { + fasta = "${params.igenomes_base}/Arabidopsis_thaliana/Ensembl/TAIR10/Sequence/WholeGenomeFasta/genome.fa" + bwa = "${params.igenomes_base}/Arabidopsis_thaliana/Ensembl/TAIR10/Sequence/BWAIndex/genome.fa" + bowtie2 = "${params.igenomes_base}/Arabidopsis_thaliana/Ensembl/TAIR10/Sequence/Bowtie2Index/" + star = "${params.igenomes_base}/Arabidopsis_thaliana/Ensembl/TAIR10/Sequence/STARIndex/" + bismark = "${params.igenomes_base}/Arabidopsis_thaliana/Ensembl/TAIR10/Sequence/BismarkIndex/" + gtf = "${params.igenomes_base}/Arabidopsis_thaliana/Ensembl/TAIR10/Annotation/Genes/genes.gtf" + bed12 = "${params.igenomes_base}/Arabidopsis_thaliana/Ensembl/TAIR10/Annotation/Genes/genes.bed" + readme = "${params.igenomes_base}/Arabidopsis_thaliana/Ensembl/TAIR10/Annotation/README.txt" + mito_name = "Mt" + } + 'EB2' { + fasta = "${params.igenomes_base}/Bacillus_subtilis_168/Ensembl/EB2/Sequence/WholeGenomeFasta/genome.fa" + bwa = "${params.igenomes_base}/Bacillus_subtilis_168/Ensembl/EB2/Sequence/BWAIndex/genome.fa" + bowtie2 = "${params.igenomes_base}/Bacillus_subtilis_168/Ensembl/EB2/Sequence/Bowtie2Index/" + star = "${params.igenomes_base}/Bacillus_subtilis_168/Ensembl/EB2/Sequence/STARIndex/" + bismark = "${params.igenomes_base}/Bacillus_subtilis_168/Ensembl/EB2/Sequence/BismarkIndex/" + gtf = "${params.igenomes_base}/Bacillus_subtilis_168/Ensembl/EB2/Annotation/Genes/genes.gtf" + bed12 = "${params.igenomes_base}/Bacillus_subtilis_168/Ensembl/EB2/Annotation/Genes/genes.bed" + readme = "${params.igenomes_base}/Bacillus_subtilis_168/Ensembl/EB2/Annotation/README.txt" + } + 'UMD3.1' { + fasta = "${params.igenomes_base}/Bos_taurus/Ensembl/UMD3.1/Sequence/WholeGenomeFasta/genome.fa" + bwa = "${params.igenomes_base}/Bos_taurus/Ensembl/UMD3.1/Sequence/BWAIndex/genome.fa" + bowtie2 = "${params.igenomes_base}/Bos_taurus/Ensembl/UMD3.1/Sequence/Bowtie2Index/" + star = "${params.igenomes_base}/Bos_taurus/Ensembl/UMD3.1/Sequence/STARIndex/" + bismark = "${params.igenomes_base}/Bos_taurus/Ensembl/UMD3.1/Sequence/BismarkIndex/" + gtf = "${params.igenomes_base}/Bos_taurus/Ensembl/UMD3.1/Annotation/Genes/genes.gtf" + bed12 = "${params.igenomes_base}/Bos_taurus/Ensembl/UMD3.1/Annotation/Genes/genes.bed" + readme = "${params.igenomes_base}/Bos_taurus/Ensembl/UMD3.1/Annotation/README.txt" + mito_name = "MT" + } + 'WBcel235' { + fasta = "${params.igenomes_base}/Caenorhabditis_elegans/Ensembl/WBcel235/Sequence/WholeGenomeFasta/genome.fa" + bwa = "${params.igenomes_base}/Caenorhabditis_elegans/Ensembl/WBcel235/Sequence/BWAIndex/genome.fa" + bowtie2 = "${params.igenomes_base}/Caenorhabditis_elegans/Ensembl/WBcel235/Sequence/Bowtie2Index/" + star = "${params.igenomes_base}/Caenorhabditis_elegans/Ensembl/WBcel235/Sequence/STARIndex/" + bismark = "${params.igenomes_base}/Caenorhabditis_elegans/Ensembl/WBcel235/Sequence/BismarkIndex/" + gtf = "${params.igenomes_base}/Caenorhabditis_elegans/Ensembl/WBcel235/Annotation/Genes/genes.gtf" + bed12 = "${params.igenomes_base}/Caenorhabditis_elegans/Ensembl/WBcel235/Annotation/Genes/genes.bed" + mito_name = "MtDNA" + macs_gsize = "9e7" + } + 'CanFam3.1' { + fasta = "${params.igenomes_base}/Canis_familiaris/Ensembl/CanFam3.1/Sequence/WholeGenomeFasta/genome.fa" + bwa = "${params.igenomes_base}/Canis_familiaris/Ensembl/CanFam3.1/Sequence/BWAIndex/genome.fa" + bowtie2 = "${params.igenomes_base}/Canis_familiaris/Ensembl/CanFam3.1/Sequence/Bowtie2Index/" + star = "${params.igenomes_base}/Canis_familiaris/Ensembl/CanFam3.1/Sequence/STARIndex/" + bismark = "${params.igenomes_base}/Canis_familiaris/Ensembl/CanFam3.1/Sequence/BismarkIndex/" + gtf = "${params.igenomes_base}/Canis_familiaris/Ensembl/CanFam3.1/Annotation/Genes/genes.gtf" + bed12 = "${params.igenomes_base}/Canis_familiaris/Ensembl/CanFam3.1/Annotation/Genes/genes.bed" + readme = "${params.igenomes_base}/Canis_familiaris/Ensembl/CanFam3.1/Annotation/README.txt" + mito_name = "MT" + } + 'GRCz10' { + fasta = "${params.igenomes_base}/Danio_rerio/Ensembl/GRCz10/Sequence/WholeGenomeFasta/genome.fa" + bwa = "${params.igenomes_base}/Danio_rerio/Ensembl/GRCz10/Sequence/BWAIndex/genome.fa" + bowtie2 = "${params.igenomes_base}/Danio_rerio/Ensembl/GRCz10/Sequence/Bowtie2Index/" + star = "${params.igenomes_base}/Danio_rerio/Ensembl/GRCz10/Sequence/STARIndex/" + bismark = "${params.igenomes_base}/Danio_rerio/Ensembl/GRCz10/Sequence/BismarkIndex/" + gtf = "${params.igenomes_base}/Danio_rerio/Ensembl/GRCz10/Annotation/Genes/genes.gtf" + bed12 = "${params.igenomes_base}/Danio_rerio/Ensembl/GRCz10/Annotation/Genes/genes.bed" + mito_name = "MT" + } + 'BDGP6' { + fasta = "${params.igenomes_base}/Drosophila_melanogaster/Ensembl/BDGP6/Sequence/WholeGenomeFasta/genome.fa" + bwa = "${params.igenomes_base}/Drosophila_melanogaster/Ensembl/BDGP6/Sequence/BWAIndex/genome.fa" + bowtie2 = "${params.igenomes_base}/Drosophila_melanogaster/Ensembl/BDGP6/Sequence/Bowtie2Index/" + star = "${params.igenomes_base}/Drosophila_melanogaster/Ensembl/BDGP6/Sequence/STARIndex/" + bismark = "${params.igenomes_base}/Drosophila_melanogaster/Ensembl/BDGP6/Sequence/BismarkIndex/" + gtf = "${params.igenomes_base}/Drosophila_melanogaster/Ensembl/BDGP6/Annotation/Genes/genes.gtf" + bed12 = "${params.igenomes_base}/Drosophila_melanogaster/Ensembl/BDGP6/Annotation/Genes/genes.bed" + mito_name = "M" + macs_gsize = "1.2e8" + } + 'EquCab2' { + fasta = "${params.igenomes_base}/Equus_caballus/Ensembl/EquCab2/Sequence/WholeGenomeFasta/genome.fa" + bwa = "${params.igenomes_base}/Equus_caballus/Ensembl/EquCab2/Sequence/BWAIndex/genome.fa" + bowtie2 = "${params.igenomes_base}/Equus_caballus/Ensembl/EquCab2/Sequence/Bowtie2Index/" + star = "${params.igenomes_base}/Equus_caballus/Ensembl/EquCab2/Sequence/STARIndex/" + bismark = "${params.igenomes_base}/Equus_caballus/Ensembl/EquCab2/Sequence/BismarkIndex/" + gtf = "${params.igenomes_base}/Equus_caballus/Ensembl/EquCab2/Annotation/Genes/genes.gtf" + bed12 = "${params.igenomes_base}/Equus_caballus/Ensembl/EquCab2/Annotation/Genes/genes.bed" + readme = "${params.igenomes_base}/Equus_caballus/Ensembl/EquCab2/Annotation/README.txt" + mito_name = "MT" + } + 'EB1' { + fasta = "${params.igenomes_base}/Escherichia_coli_K_12_DH10B/Ensembl/EB1/Sequence/WholeGenomeFasta/genome.fa" + bwa = "${params.igenomes_base}/Escherichia_coli_K_12_DH10B/Ensembl/EB1/Sequence/BWAIndex/genome.fa" + bowtie2 = "${params.igenomes_base}/Escherichia_coli_K_12_DH10B/Ensembl/EB1/Sequence/Bowtie2Index/" + star = "${params.igenomes_base}/Escherichia_coli_K_12_DH10B/Ensembl/EB1/Sequence/STARIndex/" + bismark = "${params.igenomes_base}/Escherichia_coli_K_12_DH10B/Ensembl/EB1/Sequence/BismarkIndex/" + gtf = "${params.igenomes_base}/Escherichia_coli_K_12_DH10B/Ensembl/EB1/Annotation/Genes/genes.gtf" + bed12 = "${params.igenomes_base}/Escherichia_coli_K_12_DH10B/Ensembl/EB1/Annotation/Genes/genes.bed" + readme = "${params.igenomes_base}/Escherichia_coli_K_12_DH10B/Ensembl/EB1/Annotation/README.txt" + } + 'Galgal4' { + fasta = "${params.igenomes_base}/Gallus_gallus/Ensembl/Galgal4/Sequence/WholeGenomeFasta/genome.fa" + bwa = "${params.igenomes_base}/Gallus_gallus/Ensembl/Galgal4/Sequence/BWAIndex/genome.fa" + bowtie2 = "${params.igenomes_base}/Gallus_gallus/Ensembl/Galgal4/Sequence/Bowtie2Index/" + star = "${params.igenomes_base}/Gallus_gallus/Ensembl/Galgal4/Sequence/STARIndex/" + bismark = "${params.igenomes_base}/Gallus_gallus/Ensembl/Galgal4/Sequence/BismarkIndex/" + gtf = "${params.igenomes_base}/Gallus_gallus/Ensembl/Galgal4/Annotation/Genes/genes.gtf" + bed12 = "${params.igenomes_base}/Gallus_gallus/Ensembl/Galgal4/Annotation/Genes/genes.bed" + mito_name = "MT" + } + 'Gm01' { + fasta = "${params.igenomes_base}/Glycine_max/Ensembl/Gm01/Sequence/WholeGenomeFasta/genome.fa" + bwa = "${params.igenomes_base}/Glycine_max/Ensembl/Gm01/Sequence/BWAIndex/genome.fa" + bowtie2 = "${params.igenomes_base}/Glycine_max/Ensembl/Gm01/Sequence/Bowtie2Index/" + star = "${params.igenomes_base}/Glycine_max/Ensembl/Gm01/Sequence/STARIndex/" + bismark = "${params.igenomes_base}/Glycine_max/Ensembl/Gm01/Sequence/BismarkIndex/" + gtf = "${params.igenomes_base}/Glycine_max/Ensembl/Gm01/Annotation/Genes/genes.gtf" + bed12 = "${params.igenomes_base}/Glycine_max/Ensembl/Gm01/Annotation/Genes/genes.bed" + readme = "${params.igenomes_base}/Glycine_max/Ensembl/Gm01/Annotation/README.txt" + } + 'Mmul_1' { + fasta = "${params.igenomes_base}/Macaca_mulatta/Ensembl/Mmul_1/Sequence/WholeGenomeFasta/genome.fa" + bwa = "${params.igenomes_base}/Macaca_mulatta/Ensembl/Mmul_1/Sequence/BWAIndex/genome.fa" + bowtie2 = "${params.igenomes_base}/Macaca_mulatta/Ensembl/Mmul_1/Sequence/Bowtie2Index/" + star = "${params.igenomes_base}/Macaca_mulatta/Ensembl/Mmul_1/Sequence/STARIndex/" + bismark = "${params.igenomes_base}/Macaca_mulatta/Ensembl/Mmul_1/Sequence/BismarkIndex/" + gtf = "${params.igenomes_base}/Macaca_mulatta/Ensembl/Mmul_1/Annotation/Genes/genes.gtf" + bed12 = "${params.igenomes_base}/Macaca_mulatta/Ensembl/Mmul_1/Annotation/Genes/genes.bed" + readme = "${params.igenomes_base}/Macaca_mulatta/Ensembl/Mmul_1/Annotation/README.txt" + mito_name = "MT" + } + 'IRGSP-1.0' { + fasta = "${params.igenomes_base}/Oryza_sativa_japonica/Ensembl/IRGSP-1.0/Sequence/WholeGenomeFasta/genome.fa" + bwa = "${params.igenomes_base}/Oryza_sativa_japonica/Ensembl/IRGSP-1.0/Sequence/BWAIndex/genome.fa" + bowtie2 = "${params.igenomes_base}/Oryza_sativa_japonica/Ensembl/IRGSP-1.0/Sequence/Bowtie2Index/" + star = "${params.igenomes_base}/Oryza_sativa_japonica/Ensembl/IRGSP-1.0/Sequence/STARIndex/" + bismark = "${params.igenomes_base}/Oryza_sativa_japonica/Ensembl/IRGSP-1.0/Sequence/BismarkIndex/" + gtf = "${params.igenomes_base}/Oryza_sativa_japonica/Ensembl/IRGSP-1.0/Annotation/Genes/genes.gtf" + bed12 = "${params.igenomes_base}/Oryza_sativa_japonica/Ensembl/IRGSP-1.0/Annotation/Genes/genes.bed" + mito_name = "Mt" + } + 'CHIMP2.1.4' { + fasta = "${params.igenomes_base}/Pan_troglodytes/Ensembl/CHIMP2.1.4/Sequence/WholeGenomeFasta/genome.fa" + bwa = "${params.igenomes_base}/Pan_troglodytes/Ensembl/CHIMP2.1.4/Sequence/BWAIndex/genome.fa" + bowtie2 = "${params.igenomes_base}/Pan_troglodytes/Ensembl/CHIMP2.1.4/Sequence/Bowtie2Index/" + star = "${params.igenomes_base}/Pan_troglodytes/Ensembl/CHIMP2.1.4/Sequence/STARIndex/" + bismark = "${params.igenomes_base}/Pan_troglodytes/Ensembl/CHIMP2.1.4/Sequence/BismarkIndex/" + gtf = "${params.igenomes_base}/Pan_troglodytes/Ensembl/CHIMP2.1.4/Annotation/Genes/genes.gtf" + bed12 = "${params.igenomes_base}/Pan_troglodytes/Ensembl/CHIMP2.1.4/Annotation/Genes/genes.bed" + readme = "${params.igenomes_base}/Pan_troglodytes/Ensembl/CHIMP2.1.4/Annotation/README.txt" + mito_name = "MT" + } + 'Rnor_5.0' { + fasta = "${params.igenomes_base}/Rattus_norvegicus/Ensembl/Rnor_5.0/Sequence/WholeGenomeFasta/genome.fa" + bwa = "${params.igenomes_base}/Rattus_norvegicus/Ensembl/Rnor_5.0/Sequence/BWAIndex/genome.fa" + bowtie2 = "${params.igenomes_base}/Rattus_norvegicus/Ensembl/Rnor_5.0/Sequence/Bowtie2Index/" + star = "${params.igenomes_base}/Rattus_norvegicus/Ensembl/Rnor_5.0/Sequence/STARIndex/" + bismark = "${params.igenomes_base}/Rattus_norvegicus/Ensembl/Rnor_5.0/Sequence/BismarkIndex/" + gtf = "${params.igenomes_base}/Rattus_norvegicus/Ensembl/Rnor_5.0/Annotation/Genes/genes.gtf" + bed12 = "${params.igenomes_base}/Rattus_norvegicus/Ensembl/Rnor_5.0/Annotation/Genes/genes.bed" + mito_name = "MT" + } + 'Rnor_6.0' { + fasta = "${params.igenomes_base}/Rattus_norvegicus/Ensembl/Rnor_6.0/Sequence/WholeGenomeFasta/genome.fa" + bwa = "${params.igenomes_base}/Rattus_norvegicus/Ensembl/Rnor_6.0/Sequence/BWAIndex/genome.fa" + bowtie2 = "${params.igenomes_base}/Rattus_norvegicus/Ensembl/Rnor_6.0/Sequence/Bowtie2Index/" + star = "${params.igenomes_base}/Rattus_norvegicus/Ensembl/Rnor_6.0/Sequence/STARIndex/" + bismark = "${params.igenomes_base}/Rattus_norvegicus/Ensembl/Rnor_6.0/Sequence/BismarkIndex/" + gtf = "${params.igenomes_base}/Rattus_norvegicus/Ensembl/Rnor_6.0/Annotation/Genes/genes.gtf" + bed12 = "${params.igenomes_base}/Rattus_norvegicus/Ensembl/Rnor_6.0/Annotation/Genes/genes.bed" + mito_name = "MT" + } + 'R64-1-1' { + fasta = "${params.igenomes_base}/Saccharomyces_cerevisiae/Ensembl/R64-1-1/Sequence/WholeGenomeFasta/genome.fa" + bwa = "${params.igenomes_base}/Saccharomyces_cerevisiae/Ensembl/R64-1-1/Sequence/BWAIndex/genome.fa" + bowtie2 = "${params.igenomes_base}/Saccharomyces_cerevisiae/Ensembl/R64-1-1/Sequence/Bowtie2Index/" + star = "${params.igenomes_base}/Saccharomyces_cerevisiae/Ensembl/R64-1-1/Sequence/STARIndex/" + bismark = "${params.igenomes_base}/Saccharomyces_cerevisiae/Ensembl/R64-1-1/Sequence/BismarkIndex/" + gtf = "${params.igenomes_base}/Saccharomyces_cerevisiae/Ensembl/R64-1-1/Annotation/Genes/genes.gtf" + bed12 = "${params.igenomes_base}/Saccharomyces_cerevisiae/Ensembl/R64-1-1/Annotation/Genes/genes.bed" + mito_name = "MT" + macs_gsize = "1.2e7" + } + 'EF2' { + fasta = "${params.igenomes_base}/Schizosaccharomyces_pombe/Ensembl/EF2/Sequence/WholeGenomeFasta/genome.fa" + bwa = "${params.igenomes_base}/Schizosaccharomyces_pombe/Ensembl/EF2/Sequence/BWAIndex/genome.fa" + bowtie2 = "${params.igenomes_base}/Schizosaccharomyces_pombe/Ensembl/EF2/Sequence/Bowtie2Index/" + star = "${params.igenomes_base}/Schizosaccharomyces_pombe/Ensembl/EF2/Sequence/STARIndex/" + bismark = "${params.igenomes_base}/Schizosaccharomyces_pombe/Ensembl/EF2/Sequence/BismarkIndex/" + gtf = "${params.igenomes_base}/Schizosaccharomyces_pombe/Ensembl/EF2/Annotation/Genes/genes.gtf" + bed12 = "${params.igenomes_base}/Schizosaccharomyces_pombe/Ensembl/EF2/Annotation/Genes/genes.bed" + readme = "${params.igenomes_base}/Schizosaccharomyces_pombe/Ensembl/EF2/Annotation/README.txt" + mito_name = "MT" + macs_gsize = "1.21e7" + } + 'Sbi1' { + fasta = "${params.igenomes_base}/Sorghum_bicolor/Ensembl/Sbi1/Sequence/WholeGenomeFasta/genome.fa" + bwa = "${params.igenomes_base}/Sorghum_bicolor/Ensembl/Sbi1/Sequence/BWAIndex/genome.fa" + bowtie2 = "${params.igenomes_base}/Sorghum_bicolor/Ensembl/Sbi1/Sequence/Bowtie2Index/" + star = "${params.igenomes_base}/Sorghum_bicolor/Ensembl/Sbi1/Sequence/STARIndex/" + bismark = "${params.igenomes_base}/Sorghum_bicolor/Ensembl/Sbi1/Sequence/BismarkIndex/" + gtf = "${params.igenomes_base}/Sorghum_bicolor/Ensembl/Sbi1/Annotation/Genes/genes.gtf" + bed12 = "${params.igenomes_base}/Sorghum_bicolor/Ensembl/Sbi1/Annotation/Genes/genes.bed" + readme = "${params.igenomes_base}/Sorghum_bicolor/Ensembl/Sbi1/Annotation/README.txt" + } + 'Sscrofa10.2' { + fasta = "${params.igenomes_base}/Sus_scrofa/Ensembl/Sscrofa10.2/Sequence/WholeGenomeFasta/genome.fa" + bwa = "${params.igenomes_base}/Sus_scrofa/Ensembl/Sscrofa10.2/Sequence/BWAIndex/genome.fa" + bowtie2 = "${params.igenomes_base}/Sus_scrofa/Ensembl/Sscrofa10.2/Sequence/Bowtie2Index/" + star = "${params.igenomes_base}/Sus_scrofa/Ensembl/Sscrofa10.2/Sequence/STARIndex/" + bismark = "${params.igenomes_base}/Sus_scrofa/Ensembl/Sscrofa10.2/Sequence/BismarkIndex/" + gtf = "${params.igenomes_base}/Sus_scrofa/Ensembl/Sscrofa10.2/Annotation/Genes/genes.gtf" + bed12 = "${params.igenomes_base}/Sus_scrofa/Ensembl/Sscrofa10.2/Annotation/Genes/genes.bed" + readme = "${params.igenomes_base}/Sus_scrofa/Ensembl/Sscrofa10.2/Annotation/README.txt" + mito_name = "MT" + } + 'AGPv3' { + fasta = "${params.igenomes_base}/Zea_mays/Ensembl/AGPv3/Sequence/WholeGenomeFasta/genome.fa" + bwa = "${params.igenomes_base}/Zea_mays/Ensembl/AGPv3/Sequence/BWAIndex/genome.fa" + bowtie2 = "${params.igenomes_base}/Zea_mays/Ensembl/AGPv3/Sequence/Bowtie2Index/" + star = "${params.igenomes_base}/Zea_mays/Ensembl/AGPv3/Sequence/STARIndex/" + bismark = "${params.igenomes_base}/Zea_mays/Ensembl/AGPv3/Sequence/BismarkIndex/" + gtf = "${params.igenomes_base}/Zea_mays/Ensembl/AGPv3/Annotation/Genes/genes.gtf" + bed12 = "${params.igenomes_base}/Zea_mays/Ensembl/AGPv3/Annotation/Genes/genes.bed" + mito_name = "Mt" + } + 'hg38' { + fasta = "${params.igenomes_base}/Homo_sapiens/UCSC/hg38/Sequence/WholeGenomeFasta/genome.fa" + bwa = "${params.igenomes_base}/Homo_sapiens/UCSC/hg38/Sequence/BWAIndex/genome.fa" + bowtie2 = "${params.igenomes_base}/Homo_sapiens/UCSC/hg38/Sequence/Bowtie2Index/" + star = "${params.igenomes_base}/Homo_sapiens/UCSC/hg38/Sequence/STARIndex/" + bismark = "${params.igenomes_base}/Homo_sapiens/UCSC/hg38/Sequence/BismarkIndex/" + gtf = "${params.igenomes_base}/Homo_sapiens/UCSC/hg38/Annotation/Genes/genes.gtf" + bed12 = "${params.igenomes_base}/Homo_sapiens/UCSC/hg38/Annotation/Genes/genes.bed" + mito_name = "chrM" + macs_gsize = "2.7e9" + blacklist = "${projectDir}/assets/blacklists/hg38-blacklist.bed" + } + 'hg19' { + fasta = "${params.igenomes_base}/Homo_sapiens/UCSC/hg19/Sequence/WholeGenomeFasta/genome.fa" + bwa = "${params.igenomes_base}/Homo_sapiens/UCSC/hg19/Sequence/BWAIndex/genome.fa" + bowtie2 = "${params.igenomes_base}/Homo_sapiens/UCSC/hg19/Sequence/Bowtie2Index/" + star = "${params.igenomes_base}/Homo_sapiens/UCSC/hg19/Sequence/STARIndex/" + bismark = "${params.igenomes_base}/Homo_sapiens/UCSC/hg19/Sequence/BismarkIndex/" + gtf = "${params.igenomes_base}/Homo_sapiens/UCSC/hg19/Annotation/Genes/genes.gtf" + bed12 = "${params.igenomes_base}/Homo_sapiens/UCSC/hg19/Annotation/Genes/genes.bed" + readme = "${params.igenomes_base}/Homo_sapiens/UCSC/hg19/Annotation/README.txt" + mito_name = "chrM" + macs_gsize = "2.7e9" + blacklist = "${projectDir}/assets/blacklists/hg19-blacklist.bed" + } + 'mm10' { + fasta = "${params.igenomes_base}/Mus_musculus/UCSC/mm10/Sequence/WholeGenomeFasta/genome.fa" + bwa = "${params.igenomes_base}/Mus_musculus/UCSC/mm10/Sequence/BWAIndex/genome.fa" + bowtie2 = "${params.igenomes_base}/Mus_musculus/UCSC/mm10/Sequence/Bowtie2Index/" + star = "${params.igenomes_base}/Mus_musculus/UCSC/mm10/Sequence/STARIndex/" + bismark = "${params.igenomes_base}/Mus_musculus/UCSC/mm10/Sequence/BismarkIndex/" + gtf = "${params.igenomes_base}/Mus_musculus/UCSC/mm10/Annotation/Genes/genes.gtf" + bed12 = "${params.igenomes_base}/Mus_musculus/UCSC/mm10/Annotation/Genes/genes.bed" + readme = "${params.igenomes_base}/Mus_musculus/UCSC/mm10/Annotation/README.txt" + mito_name = "chrM" + macs_gsize = "1.87e9" + blacklist = "${projectDir}/assets/blacklists/mm10-blacklist.bed" + } + 'bosTau8' { + fasta = "${params.igenomes_base}/Bos_taurus/UCSC/bosTau8/Sequence/WholeGenomeFasta/genome.fa" + bwa = "${params.igenomes_base}/Bos_taurus/UCSC/bosTau8/Sequence/BWAIndex/genome.fa" + bowtie2 = "${params.igenomes_base}/Bos_taurus/UCSC/bosTau8/Sequence/Bowtie2Index/" + star = "${params.igenomes_base}/Bos_taurus/UCSC/bosTau8/Sequence/STARIndex/" + bismark = "${params.igenomes_base}/Bos_taurus/UCSC/bosTau8/Sequence/BismarkIndex/" + gtf = "${params.igenomes_base}/Bos_taurus/UCSC/bosTau8/Annotation/Genes/genes.gtf" + bed12 = "${params.igenomes_base}/Bos_taurus/UCSC/bosTau8/Annotation/Genes/genes.bed" + mito_name = "chrM" + } + 'ce10' { + fasta = "${params.igenomes_base}/Caenorhabditis_elegans/UCSC/ce10/Sequence/WholeGenomeFasta/genome.fa" + bwa = "${params.igenomes_base}/Caenorhabditis_elegans/UCSC/ce10/Sequence/BWAIndex/genome.fa" + bowtie2 = "${params.igenomes_base}/Caenorhabditis_elegans/UCSC/ce10/Sequence/Bowtie2Index/" + star = "${params.igenomes_base}/Caenorhabditis_elegans/UCSC/ce10/Sequence/STARIndex/" + bismark = "${params.igenomes_base}/Caenorhabditis_elegans/UCSC/ce10/Sequence/BismarkIndex/" + gtf = "${params.igenomes_base}/Caenorhabditis_elegans/UCSC/ce10/Annotation/Genes/genes.gtf" + bed12 = "${params.igenomes_base}/Caenorhabditis_elegans/UCSC/ce10/Annotation/Genes/genes.bed" + readme = "${params.igenomes_base}/Caenorhabditis_elegans/UCSC/ce10/Annotation/README.txt" + mito_name = "chrM" + macs_gsize = "9e7" + } + 'canFam3' { + fasta = "${params.igenomes_base}/Canis_familiaris/UCSC/canFam3/Sequence/WholeGenomeFasta/genome.fa" + bwa = "${params.igenomes_base}/Canis_familiaris/UCSC/canFam3/Sequence/BWAIndex/genome.fa" + bowtie2 = "${params.igenomes_base}/Canis_familiaris/UCSC/canFam3/Sequence/Bowtie2Index/" + star = "${params.igenomes_base}/Canis_familiaris/UCSC/canFam3/Sequence/STARIndex/" + bismark = "${params.igenomes_base}/Canis_familiaris/UCSC/canFam3/Sequence/BismarkIndex/" + gtf = "${params.igenomes_base}/Canis_familiaris/UCSC/canFam3/Annotation/Genes/genes.gtf" + bed12 = "${params.igenomes_base}/Canis_familiaris/UCSC/canFam3/Annotation/Genes/genes.bed" + readme = "${params.igenomes_base}/Canis_familiaris/UCSC/canFam3/Annotation/README.txt" + mito_name = "chrM" + } + 'danRer10' { + fasta = "${params.igenomes_base}/Danio_rerio/UCSC/danRer10/Sequence/WholeGenomeFasta/genome.fa" + bwa = "${params.igenomes_base}/Danio_rerio/UCSC/danRer10/Sequence/BWAIndex/genome.fa" + bowtie2 = "${params.igenomes_base}/Danio_rerio/UCSC/danRer10/Sequence/Bowtie2Index/" + star = "${params.igenomes_base}/Danio_rerio/UCSC/danRer10/Sequence/STARIndex/" + bismark = "${params.igenomes_base}/Danio_rerio/UCSC/danRer10/Sequence/BismarkIndex/" + gtf = "${params.igenomes_base}/Danio_rerio/UCSC/danRer10/Annotation/Genes/genes.gtf" + bed12 = "${params.igenomes_base}/Danio_rerio/UCSC/danRer10/Annotation/Genes/genes.bed" + mito_name = "chrM" + macs_gsize = "1.37e9" + } + 'dm6' { + fasta = "${params.igenomes_base}/Drosophila_melanogaster/UCSC/dm6/Sequence/WholeGenomeFasta/genome.fa" + bwa = "${params.igenomes_base}/Drosophila_melanogaster/UCSC/dm6/Sequence/BWAIndex/genome.fa" + bowtie2 = "${params.igenomes_base}/Drosophila_melanogaster/UCSC/dm6/Sequence/Bowtie2Index/" + star = "${params.igenomes_base}/Drosophila_melanogaster/UCSC/dm6/Sequence/STARIndex/" + bismark = "${params.igenomes_base}/Drosophila_melanogaster/UCSC/dm6/Sequence/BismarkIndex/" + gtf = "${params.igenomes_base}/Drosophila_melanogaster/UCSC/dm6/Annotation/Genes/genes.gtf" + bed12 = "${params.igenomes_base}/Drosophila_melanogaster/UCSC/dm6/Annotation/Genes/genes.bed" + mito_name = "chrM" + macs_gsize = "1.2e8" + } + 'equCab2' { + fasta = "${params.igenomes_base}/Equus_caballus/UCSC/equCab2/Sequence/WholeGenomeFasta/genome.fa" + bwa = "${params.igenomes_base}/Equus_caballus/UCSC/equCab2/Sequence/BWAIndex/genome.fa" + bowtie2 = "${params.igenomes_base}/Equus_caballus/UCSC/equCab2/Sequence/Bowtie2Index/" + star = "${params.igenomes_base}/Equus_caballus/UCSC/equCab2/Sequence/STARIndex/" + bismark = "${params.igenomes_base}/Equus_caballus/UCSC/equCab2/Sequence/BismarkIndex/" + gtf = "${params.igenomes_base}/Equus_caballus/UCSC/equCab2/Annotation/Genes/genes.gtf" + bed12 = "${params.igenomes_base}/Equus_caballus/UCSC/equCab2/Annotation/Genes/genes.bed" + readme = "${params.igenomes_base}/Equus_caballus/UCSC/equCab2/Annotation/README.txt" + mito_name = "chrM" + } + 'galGal4' { + fasta = "${params.igenomes_base}/Gallus_gallus/UCSC/galGal4/Sequence/WholeGenomeFasta/genome.fa" + bwa = "${params.igenomes_base}/Gallus_gallus/UCSC/galGal4/Sequence/BWAIndex/genome.fa" + bowtie2 = "${params.igenomes_base}/Gallus_gallus/UCSC/galGal4/Sequence/Bowtie2Index/" + star = "${params.igenomes_base}/Gallus_gallus/UCSC/galGal4/Sequence/STARIndex/" + bismark = "${params.igenomes_base}/Gallus_gallus/UCSC/galGal4/Sequence/BismarkIndex/" + gtf = "${params.igenomes_base}/Gallus_gallus/UCSC/galGal4/Annotation/Genes/genes.gtf" + bed12 = "${params.igenomes_base}/Gallus_gallus/UCSC/galGal4/Annotation/Genes/genes.bed" + readme = "${params.igenomes_base}/Gallus_gallus/UCSC/galGal4/Annotation/README.txt" + mito_name = "chrM" + } + 'panTro4' { + fasta = "${params.igenomes_base}/Pan_troglodytes/UCSC/panTro4/Sequence/WholeGenomeFasta/genome.fa" + bwa = "${params.igenomes_base}/Pan_troglodytes/UCSC/panTro4/Sequence/BWAIndex/genome.fa" + bowtie2 = "${params.igenomes_base}/Pan_troglodytes/UCSC/panTro4/Sequence/Bowtie2Index/" + star = "${params.igenomes_base}/Pan_troglodytes/UCSC/panTro4/Sequence/STARIndex/" + bismark = "${params.igenomes_base}/Pan_troglodytes/UCSC/panTro4/Sequence/BismarkIndex/" + gtf = "${params.igenomes_base}/Pan_troglodytes/UCSC/panTro4/Annotation/Genes/genes.gtf" + bed12 = "${params.igenomes_base}/Pan_troglodytes/UCSC/panTro4/Annotation/Genes/genes.bed" + readme = "${params.igenomes_base}/Pan_troglodytes/UCSC/panTro4/Annotation/README.txt" + mito_name = "chrM" + } + 'rn6' { + fasta = "${params.igenomes_base}/Rattus_norvegicus/UCSC/rn6/Sequence/WholeGenomeFasta/genome.fa" + bwa = "${params.igenomes_base}/Rattus_norvegicus/UCSC/rn6/Sequence/BWAIndex/genome.fa" + bowtie2 = "${params.igenomes_base}/Rattus_norvegicus/UCSC/rn6/Sequence/Bowtie2Index/" + star = "${params.igenomes_base}/Rattus_norvegicus/UCSC/rn6/Sequence/STARIndex/" + bismark = "${params.igenomes_base}/Rattus_norvegicus/UCSC/rn6/Sequence/BismarkIndex/" + gtf = "${params.igenomes_base}/Rattus_norvegicus/UCSC/rn6/Annotation/Genes/genes.gtf" + bed12 = "${params.igenomes_base}/Rattus_norvegicus/UCSC/rn6/Annotation/Genes/genes.bed" + mito_name = "chrM" + } + 'sacCer3' { + fasta = "${params.igenomes_base}/Saccharomyces_cerevisiae/UCSC/sacCer3/Sequence/WholeGenomeFasta/genome.fa" + bwa = "${params.igenomes_base}/Saccharomyces_cerevisiae/UCSC/sacCer3/Sequence/BWAIndex/genome.fa" + bowtie2 = "${params.igenomes_base}/Saccharomyces_cerevisiae/UCSC/sacCer3/Sequence/Bowtie2Index/" + star = "${params.igenomes_base}/Saccharomyces_cerevisiae/UCSC/sacCer3/Sequence/STARIndex/" + bismark = "${params.igenomes_base}/Saccharomyces_cerevisiae/UCSC/sacCer3/Sequence/BismarkIndex/" + readme = "${params.igenomes_base}/Saccharomyces_cerevisiae/UCSC/sacCer3/Annotation/README.txt" + mito_name = "chrM" + macs_gsize = "1.2e7" + } + 'susScr3' { + fasta = "${params.igenomes_base}/Sus_scrofa/UCSC/susScr3/Sequence/WholeGenomeFasta/genome.fa" + bwa = "${params.igenomes_base}/Sus_scrofa/UCSC/susScr3/Sequence/BWAIndex/genome.fa" + bowtie2 = "${params.igenomes_base}/Sus_scrofa/UCSC/susScr3/Sequence/Bowtie2Index/" + star = "${params.igenomes_base}/Sus_scrofa/UCSC/susScr3/Sequence/STARIndex/" + bismark = "${params.igenomes_base}/Sus_scrofa/UCSC/susScr3/Sequence/BismarkIndex/" + gtf = "${params.igenomes_base}/Sus_scrofa/UCSC/susScr3/Annotation/Genes/genes.gtf" + bed12 = "${params.igenomes_base}/Sus_scrofa/UCSC/susScr3/Annotation/Genes/genes.bed" + readme = "${params.igenomes_base}/Sus_scrofa/UCSC/susScr3/Annotation/README.txt" + mito_name = "chrM" + } } - 'GRCh38' { - fasta = "${params.igenomes_base}/Homo_sapiens/NCBI/GRCh38/Sequence/WholeGenomeFasta/genome.fa" - bowtie2 = "${params.igenomes_base}/Homo_sapiens/NCBI/GRCh38/Sequence/Bowtie2Index/" - } - 'GRCm38' { - fasta = "${params.igenomes_base}/Mus_musculus/Ensembl/GRCm38/Sequence/WholeGenomeFasta/genome.fa" - bowtie2 = "${params.igenomes_base}/Mus_musculus/Ensembl/GRCm38/Sequence/Bowtie2Index/" - } - 'TAIR10' { - fasta = "${params.igenomes_base}/Arabidopsis_thaliana/Ensembl/TAIR10/Sequence/WholeGenomeFasta/genome.fa" - bowtie2 = "${params.igenomes_base}/Arabidopsis_thaliana/Ensembl/TAIR10/Sequence/Bowtie2Index/" - } - 'EB2' { - fasta = "${params.igenomes_base}/Bacillus_subtilis_168/Ensembl/EB2/Sequence/WholeGenomeFasta/genome.fa" - bowtie2 = "${params.igenomes_base}/Bacillus_subtilis_168/Ensembl/EB2/Sequence/Bowtie2Index/" - } - 'UMD3.1' { - fasta = "${params.igenomes_base}/Bos_taurus/Ensembl/UMD3.1/Sequence/WholeGenomeFasta/genome.fa" - bowtie2 = "${params.igenomes_base}/Bos_taurus/Ensembl/UMD3.1/Sequence/Bowtie2Index/" - } - 'WBcel235' { - fasta = "${params.igenomes_base}/Caenorhabditis_elegans/Ensembl/WBcel235/Sequence/WholeGenomeFasta/genome.fa" - bowtie2 = "${params.igenomes_base}/Caenorhabditis_elegans/Ensembl/WBcel235/Sequence/Bowtie2Index/" - } - 'CanFam3.1' { - fasta = "${params.igenomes_base}/Canis_familiaris/Ensembl/CanFam3.1/Sequence/WholeGenomeFasta/genome.fa" - bowtie2 = "${params.igenomes_base}/Canis_familiaris/Ensembl/CanFam3.1/Sequence/Bowtie2Index/" - } - 'GRCz10' { - fasta = "${params.igenomes_base}/Danio_rerio/Ensembl/GRCz10/Sequence/WholeGenomeFasta/genome.fa" - bowtie2 = "${params.igenomes_base}/Danio_rerio/Ensembl/GRCz10/Sequence/Bowtie2Index/" - } - 'BDGP6' { - fasta = "${params.igenomes_base}/Drosophila_melanogaster/Ensembl/BDGP6/Sequence/WholeGenomeFasta/genome.fa" - bowtie2 = "${params.igenomes_base}/Drosophila_melanogaster/Ensembl/BDGP6/Sequence/Bowtie2Index/" - } - 'EquCab2' { - fasta = "${params.igenomes_base}/Equus_caballus/Ensembl/EquCab2/Sequence/WholeGenomeFasta/genome.fa" - bowtie2 = "${params.igenomes_base}/Equus_caballus/Ensembl/EquCab2/Sequence/Bowtie2Index/" - } - 'EB1' { - fasta = "${params.igenomes_base}/Escherichia_coli_K_12_DH10B/Ensembl/EB1/Sequence/WholeGenomeFasta/genome.fa" - bowtie2 = "${params.igenomes_base}/Escherichia_coli_K_12_DH10B/Ensembl/EB1/Sequence/Bowtie2Index/" - } - 'Galgal4' { - fasta = "${params.igenomes_base}/Gallus_gallus/Ensembl/Galgal4/Sequence/WholeGenomeFasta/genome.fa" - bowtie2 = "${params.igenomes_base}/Gallus_gallus/Ensembl/Galgal4/Sequence/Bowtie2Index/" - } - 'Gm01' { - fasta = "${params.igenomes_base}/Glycine_max/Ensembl/Gm01/Sequence/WholeGenomeFasta/genome.fa" - bowtie2 = "${params.igenomes_base}/Glycine_max/Ensembl/Gm01/Sequence/Bowtie2Index/" - } - 'Mmul_1' { - fasta = "${params.igenomes_base}/Macaca_mulatta/Ensembl/Mmul_1/Sequence/WholeGenomeFasta/genome.fa" - bowtie2 = "${params.igenomes_base}/Macaca_mulatta/Ensembl/Mmul_1/Sequence/Bowtie2Index/" - } - 'IRGSP-1.0' { - fasta = "${params.igenomes_base}/Oryza_sativa_japonica/Ensembl/IRGSP-1.0/Sequence/WholeGenomeFasta/genome.fa" - bowtie2 = "${params.igenomes_base}/Oryza_sativa_japonica/Ensembl/IRGSP-1.0/Sequence/Bowtie2Index/" - } - 'CHIMP2.1.4' { - fasta = "${params.igenomes_base}/Pan_troglodytes/Ensembl/CHIMP2.1.4/Sequence/WholeGenomeFasta/genome.fa" - bowtie2 = "${params.igenomes_base}/Pan_troglodytes/Ensembl/CHIMP2.1.4/Sequence/Bowtie2Index/" - } - 'Rnor_6.0' { - fasta = "${params.igenomes_base}/Rattus_norvegicus/Ensembl/Rnor_6.0/Sequence/WholeGenomeFasta/genome.fa" - bowtie2 = "${params.igenomes_base}/Rattus_norvegicus/Ensembl/Rnor_6.0/Sequence/Bowtie2Index/" - } - 'R64-1-1' { - fasta = "${params.igenomes_base}/Saccharomyces_cerevisiae/Ensembl/R64-1-1/Sequence/WholeGenomeFasta/genome.fa" - bowtie2 = "${params.igenomes_base}/Saccharomyces_cerevisiae/Ensembl/R64-1-1/Sequence/Bowtie2Index/" - } - 'EF2' { - fasta = "${params.igenomes_base}/Schizosaccharomyces_pombe/Ensembl/EF2/Sequence/WholeGenomeFasta/genome.fa" - bowtie2 = "${params.igenomes_base}/Schizosaccharomyces_pombe/Ensembl/EF2/Sequence/Bowtie2Index/" - } - 'Sbi1' { - fasta = "${params.igenomes_base}/Sorghum_bicolor/Ensembl/Sbi1/Sequence/WholeGenomeFasta/genome.fa" - bowtie2 = "${params.igenomes_base}/Sorghum_bicolor/Ensembl/Sbi1/Sequence/Bowtie2Index/" - } - 'Sscrofa10.2' { - fasta = "${params.igenomes_base}/Sus_scrofa/Ensembl/Sscrofa10.2/Sequence/WholeGenomeFasta/genome.fa" - bowtie2 = "${params.igenomes_base}/Sus_scrofa/Ensembl/Sscrofa10.2/Sequence/Bowtie2Index/" - } - 'AGPv3' { - fasta = "${params.igenomes_base}/Zea_mays/Ensembl/AGPv3/Sequence/WholeGenomeFasta/genome.fa" - bowtie2 = "${params.igenomes_base}/Zea_mays/Ensembl/AGPv3/Sequence/Bowtie2Index/" - } - 'hg38' { - fasta = "${params.igenomes_base}/Homo_sapiens/UCSC/hg38/Sequence/WholeGenomeFasta/genome.fa" - bowtie2 = "${params.igenomes_base}/Homo_sapiens/UCSC/hg38/Sequence/Bowtie2Index/" - } - 'hg19' { - fasta = "${params.igenomes_base}/Homo_sapiens/UCSC/hg19/Sequence/WholeGenomeFasta/genome.fa" - bowtie2 = "${params.igenomes_base}/Homo_sapiens/UCSC/hg19/Sequence/Bowtie2Index/" - } - 'mm10' { - fasta = "${params.igenomes_base}/Mus_musculus/UCSC/mm10/Sequence/WholeGenomeFasta/genome.fa" - bowtie2 = "${params.igenomes_base}/Mus_musculus/UCSC/mm10/Sequence/Bowtie2Index/" - } - 'bosTau8' { - fasta = "${params.igenomes_base}/Bos_taurus/UCSC/bosTau8/Sequence/WholeGenomeFasta/genome.fa" - bowtie2 = "${params.igenomes_base}/Bos_taurus/UCSC/bosTau8/Sequence/Bowtie2Index/" - } - 'ce10' { - fasta = "${params.igenomes_base}/Caenorhabditis_elegans/UCSC/ce10/Sequence/WholeGenomeFasta/genome.fa" - bowtie2 = "${params.igenomes_base}/Caenorhabditis_elegans/UCSC/ce10/Sequence/Bowtie2Index/" - } - 'canFam3' { - fasta = "${params.igenomes_base}/Canis_familiaris/UCSC/canFam3/Sequence/WholeGenomeFasta/genome.fa" - bowtie2 = "${params.igenomes_base}/Canis_familiaris/UCSC/canFam3/Sequence/Bowtie2Index/" - } - 'danRer10' { - fasta = "${params.igenomes_base}/Danio_rerio/UCSC/danRer10/Sequence/WholeGenomeFasta/genome.fa" - bowtie2 = "${params.igenomes_base}/Danio_rerio/UCSC/danRer10/Sequence/Bowtie2Index/" - } - 'dm6' { - fasta = "${params.igenomes_base}/Drosophila_melanogaster/UCSC/dm6/Sequence/WholeGenomeFasta/genome.fa" - bowtie2 = "${params.igenomes_base}/Drosophila_melanogaster/UCSC/dm6/Sequence/Bowtie2Index/" - } - 'equCab2' { - fasta = "${params.igenomes_base}/Equus_caballus/UCSC/equCab2/Sequence/WholeGenomeFasta/genome.fa" - bowtie2 = "${params.igenomes_base}/Equus_caballus/UCSC/equCab2/Sequence/Bowtie2Index/" - } - 'galGal4' { - fasta = "${params.igenomes_base}/Gallus_gallus/UCSC/galGal4/Sequence/WholeGenomeFasta/genome.fa" - bowtie2 = "${params.igenomes_base}/Gallus_gallus/UCSC/galGal4/Sequence/Bowtie2Index/" - } - 'panTro4' { - fasta = "${params.igenomes_base}/Pan_troglodytes/UCSC/panTro4/Sequence/WholeGenomeFasta/genome.fa" - bowtie2 = "${params.igenomes_base}/Pan_troglodytes/UCSC/panTro4/Sequence/Bowtie2Index/" - } - 'rn6' { - fasta = "${params.igenomes_base}/Rattus_norvegicus/UCSC/rn6/Sequence/WholeGenomeFasta/genome.fa" - bowtie2 = "${params.igenomes_base}/Rattus_norvegicus/UCSC/rn6/Sequence/Bowtie2Index/" - } - 'sacCer3' { - fasta = "${params.igenomes_base}/Saccharomyces_cerevisiae/UCSC/sacCer3/Sequence/WholeGenomeFasta/genome.fa" - bowtie2 = "${params.igenomes_base}/Saccharomyces_cerevisiae/UCSC/sacCer3/Sequence/Bowtie2Index/" - } - 'susScr3' { - fasta = "${params.igenomes_base}/Sus_scrofa/UCSC/susScr3/Sequence/WholeGenomeFasta/genome.fa" - bowtie2 = "${params.igenomes_base}/Sus_scrofa/UCSC/susScr3/Sequence/Bowtie2Index/" - } - } } diff --git a/conf/test.config b/conf/test.config index 5c5fc84c35989f039418aeba4bc5b5b1c10da1a6..d117e413a738899ff73b5ed900a52ced8c5986f9 100644 --- a/conf/test.config +++ b/conf/test.config @@ -1,41 +1,29 @@ /* - * ------------------------------------------------- - * Nextflow config file for running tests - * ------------------------------------------------- - * Defines bundled input files and everything required - * to run a fast and simple test. Use as follows: - * nextflow run nf-core/hic -profile test,<docker/singularity> - */ +======================================================================================== + Nextflow config file for running minimal tests +======================================================================================== + Defines input files and everything required to run a fast and simple pipeline test. -params { - config_profile_name = 'Hi-C test data from Schalbetter et al. (2017)' - config_profile_description = 'Minimal test dataset to check pipeline function' + Use as follows: + nextflow run nf-core/hic -profile test,<docker/singularity> + +---------------------------------------------------------------------------------------- +*/ - // Limit resources so that this can run on Travis - max_cpus = 2 - max_memory = 4.GB - max_time = 1.h +params { + config_profile_name = 'Test profile' + config_profile_description = 'Minimal test dataset to check pipeline function' - // Input data - input_paths = [ - ['SRR4292758_00', ['https://github.com/nf-core/test-datasets/raw/hic/data/SRR4292758_00_R1.fastq.gz', 'https://github.com/nf-core/test-datasets/raw/hic/data/SRR4292758_00_R2.fastq.gz']] - ] + // Limit resources so that this can run on GitHub Actions + max_cpus = 2 + max_memory = 6.GB + max_time = 6.h - // Annotations - fasta = 'https://github.com/nf-core/test-datasets/raw/hic/reference/W303_SGD_2015_JRIU00000000.fsa' - digestion = 'hindiii' - min_mapq = 10 - min_restriction_fragment_size = 100 - max_restriction_fragment_size = 100000 - min_insert_size = 100 - max_insert_size = 600 + // Input data + // TODO nf-core: Specify the paths to your test data on nf-core/test-datasets + // TODO nf-core: Give any required params for the test so that command line flags are not needed + input = 'https://raw.githubusercontent.com/nf-core/test-datasets/viralrecon/samplesheet/samplesheet_test_illumina_amplicon.csv' - bin_size = '1000' - res_dist_decay = '1000' - res_tads = '1000' - tads_caller = 'insulation,hicexplorer' - res_compartments = '1000' - - // Ignore `--input` as otherwise the parameter validation will throw an error - schema_ignore_params = 'genomes,digest,input_paths,input' + // Genome references + genome = 'R64-1-1' } diff --git a/conf/test_full.config b/conf/test_full.config index 1e793cc57628bdbed6bbe322e558bffc0e15a3d1..749878333105099f6ec556853fa2b404e1a28cbb 100644 --- a/conf/test_full.config +++ b/conf/test_full.config @@ -1,36 +1,24 @@ /* - * ------------------------------------------------- - * Nextflow config file for running full-size tests - * ------------------------------------------------- - * Defines bundled input files and everything required - * to run a full size pipeline test. Use as follows: - * nextflow run nf-core/hic -profile test_full,<docker/singularity> - */ +======================================================================================== + Nextflow config file for running full-size tests +======================================================================================== + Defines input files and everything required to run a full size pipeline test. -params { - config_profile_name = 'Full test profile' - config_profile_description = 'Full test dataset to check pipeline function' + Use as follows: + nextflow run nf-core/hic -profile test_full,<docker/singularity> + +---------------------------------------------------------------------------------------- +*/ - // Input data for full size test - input_paths = [ - ['SRR4292758_00', ['https://github.com/nf-core/test-datasets/raw/hic/data/SRR4292758_00_R1.fastq.gz', 'https://github.com/nf-core/test-datasets/raw/hic/data/SRR4292758_00_R2.fastq.gz']] - ] +params { + config_profile_name = 'Full test profile' + config_profile_description = 'Full test dataset to check pipeline function' - // Annotations - fasta = 'https://github.com/nf-core/test-datasets/raw/hic/reference/W303_SGD_2015_JRIU00000000.fsa' - digestion = 'hindiii' - min_mapq = 10 - min_restriction_fragment_size = 100 - max_restriction_fragment_size = 100000 - min_insert_size = 100 - max_insert_size = 600 + // Input data for full size test + // TODO nf-core: Specify the paths to your full test data ( on nf-core/test-datasets or directly in repositories, e.g. SRA) + // TODO nf-core: Give any required params for the test so that command line flags are not needed + input = 'https://raw.githubusercontent.com/nf-core/test-datasets/viralrecon/samplesheet/samplesheet_full_illumina_amplicon.csv' - bin_size = '1000' - res_dist_decay = '1000' - res_tads = '1000' - tads_caller = 'insulation,hicexplorer' - res_compartments = '1000' - - // Ignore `--input` as otherwise the parameter validation will throw an error - schema_ignore_params = 'genomes,digest,input_paths,input' + // Genome references + genome = 'R64-1-1' } diff --git a/docs/README.md b/docs/README.md index a6889549c7f27bda0aed81947685713781fe2d1b..112811ae596b875a989ea77d0671e97ee1125a7c 100644 --- a/docs/README.md +++ b/docs/README.md @@ -3,8 +3,8 @@ The nf-core/hic documentation is split into the following pages: * [Usage](usage.md) - * An overview of how the pipeline works, how to run it and a description of all of the different command-line flags. + * An overview of how the pipeline works, how to run it and a description of all of the different command-line flags. * [Output](output.md) - * An overview of the different results produced by the pipeline and how to interpret them. + * An overview of the different results produced by the pipeline and how to interpret them. You can find a lot more documentation about installing, configuring and running nf-core pipelines on the website: [https://nf-co.re](https://nf-co.re) diff --git a/docs/images/mqc_fastqc_adapter.png b/docs/images/mqc_fastqc_adapter.png new file mode 100755 index 0000000000000000000000000000000000000000..361d0e47acfb424dea1f326590d1eb2f6dfa26b5 Binary files /dev/null and b/docs/images/mqc_fastqc_adapter.png differ diff --git a/docs/images/mqc_fastqc_counts.png b/docs/images/mqc_fastqc_counts.png new file mode 100755 index 0000000000000000000000000000000000000000..cb39ebb80a71dc4cdeee076c107e30a6c944441b Binary files /dev/null and b/docs/images/mqc_fastqc_counts.png differ diff --git a/docs/images/mqc_fastqc_quality.png b/docs/images/mqc_fastqc_quality.png new file mode 100755 index 0000000000000000000000000000000000000000..a4b89bf56ab2ba88cab87841916eb680a816deae Binary files /dev/null and b/docs/images/mqc_fastqc_quality.png differ diff --git a/docs/output.md b/docs/output.md index 8b3fd0a40579b5ee19f107acdf6f531a8d98702f..e2d35a1b9c9b5bd3b6c2f7948a3cf5bb469e0c62 100644 --- a/docs/output.md +++ b/docs/output.md @@ -3,275 +3,66 @@ ## Introduction This document describes the output produced by the pipeline. Most of the plots are taken from the MultiQC report, which summarises results at the end of the pipeline. -The directories listed below will be created in the results directory after the pipeline has finished. All paths are relative to the top-level results directory. - -## Pipeline overview - -The pipeline is built using [Nextflow](https://www.nextflow.io/) -and processes data using the following steps: - -* [HiC-Pro](#hicpro) - * [Reads alignment](#reads-alignment) - * [Valid pairs detection](#valid-pairs-detection) - * [Duplicates removal](#duplicates-removal) - * [Contact maps](#hicpro-contact-maps) -* [Hi-C contact maps](#hic-contact-maps) -* [Downstream analysis](#downstream-analysis) - * [Distance decay](#distance-decay) - * [Compartments calling](#compartments-calling) - * [TADs calling](#tads-calling) -* [MultiQC](#multiqc) - aggregate report and quality controls, describing -results of the whole pipeline -* [Export](#exprot) - additionnal export for compatibility with downstream -analysis tool and visualization - -## HiC-Pro - -The current version is mainly based on the -[HiC-Pro](https://github.com/nservant/HiC-Pro) pipeline. -For details about the workflow, see -[Servant et al. 2015](https://genomebiology.biomedcentral.com/articles/10.1186/s13059-015-0831-x) - -### Reads alignment - -Using Hi-C data, each reads mate has to be independantly aligned on the -reference genome. -The current workflow implements a two steps mapping strategy. First, the reads -are aligned using an end-to-end aligner. -Second, reads spanning the ligation junction are trimmmed from their 3' end, -and aligned back on the genome. -Aligned reads for both fragment mates are then paired in a single paired-end -BAM file. -Singletons are discarded, and multi-hits are filtered according to the -configuration parameters (`--rm-multi`). -Note that if the `--dnase` mode is activated, HiC-Pro will skip the second -mapping step. - -**Output directory: `results/hicpro/mapping`** - -* `*bwt2pairs.bam` - final BAM file with aligned paired data -* `*.pairstat` - mapping statistics - -if `--saveAlignedIntermediates` is specified, additional mapping file results -are available ; - -* `*.bam` - Aligned reads (R1 and R2) from end-to-end alignment -* `*_unmap.fastq` - Unmapped reads after end-to-end alignment -* `*_trimmed.fastq` - Trimmed reads after end-to-end alignment -* `*_trimmed.bam` - Alignment of trimmed reads -* `*bwt2merged.bam` - merged BAM file after the two-steps alignment -* `*.mapstat` - mapping statistics per read mate - -Usually, a high fraction of reads is expected to be aligned on the genome -(80-90%). Among them, we usually observed a few percent (around 10%) of step 2 -aligned reads. Those reads are chimeric fragments for which we detect a -ligation junction. An abnormal level of chimeric reads can reflect a ligation -issue during the library preparation. -The fraction of singleton or multi-hits depends on the genome complexity and -the fraction of unmapped reads. The fraction of singleton is usually close to -the sum of unmapped R1 and R2 reads, as it is unlikely that both mates from the -same pair were unmapped. - -### Valid pairs detection with HiC-Pro - -Each aligned reads can be assigned to one restriction fragment according to the -reference genome and the digestion protocol. - -Invalid pairs are classified as follow: - -* Dangling end, i.e. unligated fragments (both reads mapped on the same -restriction fragment) -* Self circles, i.e. fragments ligated on themselves (both reads mapped on the -same restriction fragment in inverted orientation) -* Religation, i.e. ligation of juxtaposed fragments -* Filtered pairs, i.e. any pairs that do not match the filtering criteria on -inserts size, restriction fragments size -* Dumped pairs, i.e. any pairs for which we were not able to reconstruct the -ligation product. - -Only valid pairs involving two different restriction fragments are used to -build the contact maps. -Duplicated valid pairs associated to PCR artefacts are discarded -(see `--rm_dup`). - -In case of Hi-C protocols that do not require a restriction enzyme such as -DNase Hi-C or micro Hi-C, the assignment to a restriction is not possible -(see `--dnase`). -Short range interactions that are likely to be spurious ligation products -can thus be discarded using the `--min_cis_dist` parameter. - -**Output directory: `results/hicpro/valid_pairs`** - -* `*.validPairs` - List of valid ligation products -* `*.DEpairs` - List of dangling-end products -* `*.SCPairs` - List of self-circle products -* `*.REPairs` - List of religation products -* `*.FiltPairs` - List of filtered pairs -* `*RSstat` - Statitics of number of read pairs falling in each category - -The validPairs are stored using a simple tab-delimited text format ; - -```bash -read name / chr_reads1 / pos_reads1 / strand_reads1 / chr_reads2 / pos_reads2 / -strand_reads2 / fragment_size / res frag name R1 / res frag R2 / mapping qual R1 -/ mapping qual R2 [/ allele_specific_tag] -``` - -The ligation efficiency can be assessed using the filtering of valid and -invalid pairs. As the ligation is a random process, 25% of each valid ligation -class is expected. In the same way, a high level of dangling-end or self-circle -read pairs is associated with a low quality experiment, and reveals a problem -during the digestion, fill-in or ligation steps. - -In the context of Hi-C protocol without restriction enzyme, this analysis step -is skipped. The aligned pairs are therefore directly used to generate the -contact maps. A filter of the short range contact (typically <1kb) is -recommanded as this pairs are likely to be self ligation products. -### Duplicates removal - -Note that validPairs file are generated per reads chunck. -These files are then merged in the allValidPairs file, and duplicates are -removed if the `--rm_dup` parameter is used. - -**Output directory: `results/hicpro/valid_pairs`** - -* `*allValidPairs` - combined valid pairs from all read chunks -* `*mergestat` - statistics about duplicates removal and valid pairs information - -Additional quality controls such as fragment size distribution can be extracted -from the list of valid interaction products. -We usually expect to see a distribution centered around 300 pb which correspond -to the paired-end insert size commonly used. -The fraction of dplicates is also presented. A high level of duplication -indicates a poor molecular complexity and a potential PCR bias. -Finaly, an important metric is to look at the fraction of intra and -inter-chromosomal interactions, as well as long range (>20kb) versus short -range (<20kb) intra-chromosomal interactions. - -### Contact maps - -Intra et inter-chromosomal contact maps are build for all specified resolutions. -The genome is splitted into bins of equal size. Each valid interaction is -associated with the genomic bins to generate the raw maps. -In addition, Hi-C data can contain several sources of biases which has to be -corrected. -The HiC-Pro workflow uses the [ìced](https://github.com/hiclib/iced) and -[Varoquaux and Servant, 2018](http://joss.theoj.org/papers/10.21105/joss.01286) -python package which proposes a fast implementation of the original ICE -normalization algorithm (Imakaev et al. 2012), making the assumption of equal -visibility of each fragment. - -Importantly, the HiC-Pro maps are generated only if the `--hicpro_maps` option -is specified on the command line. - -**Output directory: `results/hicpro/matrix`** - -* `*.matrix` - genome-wide contact maps -* `*_iced.matrix` - genome-wide iced contact maps - -The contact maps are generated for all specified resolutions -(see `--bin_size` argument). -A contact map is defined by : - -* A list of genomic intervals related to the specified resolution (BED format). -* A matrix, stored as standard triplet sparse format (i.e. list format). - -Based on the observation that a contact map is symmetric and usually sparse, -only non-zero values are stored for half of the matrix. The user can specified -if the 'upper', 'lower' or 'complete' matrix has to be stored. The 'asis' -option allows to store the contacts as they are observed from the valid pairs -files. - -```bash - A B 10 - A C 23 - B C 24 - (...) -``` - -This format is memory efficient, and is compatible with several software for -downstream analysis. - -## Hi-C contact maps - -Contact maps are usually stored as simple txt (`HiC-Pro`), .hic (`Juicer/Juicebox`) and .(m)cool (`cooler/Higlass`) formats. -Note that .cool and .hic format are compressed and usually much more efficient that the txt format. -In the current workflow, we propose to use the `cooler` format as a standard to build the raw and normalized maps -after valid pairs detection as it is used by several downstream analysis and visualization tools. - -Raw contact maps are therefore in **`results/contact_maps/raw`** which contains the different maps in `txt` and `cool` formats, at various resolutions. -Normalized contact maps are stored in **`results/contact_maps/norm`** which contains the different maps in `txt`, `cool`, and `mcool` format. +The directories listed below will be created in the results directory after the pipeline has finished. All paths are relative to the top-level results directory. -Note that `txt` contact maps generated with `cooler` are identical to those generated by `HiC-Pro`. -However, differences can be observed on the normalized contact maps as the balancing algorithm is not the same. +<!-- TODO nf-core: Write this documentation describing your workflow's output --> -## Downstream analysis +## Pipeline overview -Downstream analysis are performed from `cool` files at specified resolution. +The pipeline is built using [Nextflow](https://www.nextflow.io/) and processes data using the following steps: -### Distance decay +* [FastQC](#fastqc) - Raw read QC +* [MultiQC](#multiqc) - Aggregate report describing results and QC from the whole pipeline +* [Pipeline information](#pipeline-information) - Report metrics generated during the workflow execution -The distance decay plot shows the relationship between contact frequencies and genomic distance. It gives a good indication of the compaction of the genome. -According to the organism, the slope of the curve should fit the expectation of polymer physics models. +### FastQC -The results generated with the `HiCExplorer hicPlotDistVsCounts` tool (plot and table) are available in the **`results/dist_decay/`** folder. +<details markdown="1"> +<summary>Output files</summary> -### Compartments calling +* `fastqc/` + * `*_fastqc.html`: FastQC report containing quality metrics. + * `*_fastqc.zip`: Zip archive containing the FastQC report, tab-delimited data file and plot images. -Compartments calling is one of the most common analysis which aims at detecting A (open, active) / B (close, inactive) compartments. -In the first studies on the subject, the compartments were called at high/medium resolution (1000000 to 250000) which is enough to call A/B comparments. -Analysis at higher resolution has shown that these two main types of compartments can be further divided into compartments subtypes. +</details> -Although different methods have been proposed for compartment calling, the standard remains the eigen vector decomposition from the normalized correlation maps. -Here, we use the implementation available in the [`cooltools`](https://cooltools.readthedocs.io/en/lates) package. +[FastQC](http://www.bioinformatics.babraham.ac.uk/projects/fastqc/) gives general quality metrics about your sequenced reads. It provides information about the quality score distribution across your reads, per base sequence content (%A/T/G/C), adapter contamination and overrepresented sequences. For further reading and documentation see the [FastQC help pages](http://www.bioinformatics.babraham.ac.uk/projects/fastqc/Help/). -Results are available in **`results/compartments/`** folder and includes : + -* `*cis.vecs.tsv`: eigenvectors decomposition along the genome -* `*cis.lam.txt`: eigenvalues associated with the eigenvectors + -### TADs calling + -TADs has been described as functional units of the genome. -While contacts between genes and regulatority elements can occur within a single TADs, contacts between TADs are much less frequent, mainly due to the presence of insulation protein (such as CTCF) at their boundaries. Looking at Hi-C maps, TADs look like triangles around the diagonal. According to the contact map resolutions, TADs appear as hierarchical structures with a median size around 1Mb (in mammals), as well as smaller structures usually called sub-TADs of smaller size. +> **NB:** The FastQC plots displayed in the MultiQC report shows _untrimmed_ reads. They may contain adapter sequence and potentially regions with low quality. -TADs calling remains a challenging task, and even if many methods have been proposed in the last decade, little overlap have been found between their results. +### MultiQC -Currently, the pipeline proposes two approaches : +<details markdown="1"> +<summary>Output files</summary> -* Insulation score using the [`cooltools`](https://cooltools.readthedocs.io/en/latest/cli.html#cooltools-diamond-insulation) package. Results are availabe in **`results/tads/insulation`**. -* [`HiCExplorer TADs calling`](https://hicexplorer.readthedocs.io/en/latest/content/tools/hicFindTADs.html). Results are available at **`results/tads/hicexplorer`**. +* `multiqc/` + * `multiqc_report.html`: a standalone HTML file that can be viewed in your web browser. + * `multiqc_data/`: directory containing parsed statistics from the different tools used in the pipeline. + * `multiqc_plots/`: directory containing static images from the report in various formats. -Usually, TADs results are presented as simple BED files, or bigWig files, with the position of boundaries along the genome. +</details> -## MultiQC +[MultiQC](http://multiqc.info) is a visualization tool that generates a single HTML report summarising all samples in your project. Most of the pipeline QC results are visualised in the report and further statistics are available in the report data directory. -[MultiQC](http://multiqc.info) is a visualisation tool that generates a single -HTML report summarising all samples in your project. Most of the pipeline QC -results are visualised in the report and further statistics are available in -within the report data directory. +Results generated by MultiQC collate pipeline QC from supported tools e.g. FastQC. The pipeline has special steps which also allow the software versions to be reported in the MultiQC output for future traceability. For more information about how to use MultiQC reports, see <http://multiqc.info>. -The pipeline has special steps which allow the software versions used to be -reported in the MultiQC output for future traceability. +### Pipeline information -**Output files:** +<details markdown="1"> +<summary>Output files</summary> -* `multiqc/` - * `multiqc_report.html`: a standalone HTML file that can be viewed in your web browser. - * `multiqc_data/`: directory containing parsed statistics from the different tools used in the pipeline. - * `multiqc_plots/`: directory containing static images from the report in various formats. +* `pipeline_info/` + * Reports generated by Nextflow: `execution_report.html`, `execution_timeline.html`, `execution_trace.txt` and `pipeline_dag.dot`/`pipeline_dag.svg`. + * Reports generated by the pipeline: `pipeline_report.html`, `pipeline_report.txt` and `software_versions.tsv`. + * Reformatted samplesheet files used as input to the pipeline: `samplesheet.valid.csv`. -## Pipeline information +</details> [Nextflow](https://www.nextflow.io/docs/latest/tracing.html) provides excellent functionality for generating various reports relevant to the running and execution of the pipeline. This will allow you to troubleshoot errors with the running of the pipeline, and also provide you with other information such as launch commands, run times and resource usage. - -**Output files:** - -* `pipeline_info/` - * Reports generated by Nextflow: `execution_report.html`, `execution_timeline.html`, - `execution_trace.txt` and `pipeline_dag.dot`/`pipeline_dag.svg`. - * Reports generated by the pipeline: `pipeline_report.html`, - `pipeline_report.txt` and `software_versions.csv`. - * Documentation for interpretation of results in HTML format: - `results_description.html`. diff --git a/docs/usage.md b/docs/usage.md index 800d44713563554482d79b8e165d06514ad921e3..79b33ec2f74790ac27c2c883fe932dfa222f63e6 100644 --- a/docs/usage.md +++ b/docs/usage.md @@ -6,20 +6,65 @@ ## Introduction +<!-- TODO nf-core: Add documentation about anything specific to running your pipeline. For general topics, please point to (and add to) the main nf-core website. --> + +## Samplesheet input + +You will need to create a samplesheet with information about the samples you would like to analyse before running the pipeline. Use this parameter to specify its location. It has to be a comma-separated file with 3 columns, and a header row as shown in the examples below. + +```console +--input '[path to samplesheet file]' +``` + +### Multiple runs of the same sample + +The `sample` identifiers have to be the same when you have re-sequenced the same sample more than once e.g. to increase sequencing depth. The pipeline will concatenate the raw reads before performing any downstream analysis. Below is an example for the same sample sequenced across 3 lanes: + +```console +sample,fastq_1,fastq_2 +CONTROL_REP1,AEG588A1_S1_L002_R1_001.fastq.gz,AEG588A1_S1_L002_R2_001.fastq.gz +CONTROL_REP1,AEG588A1_S1_L003_R1_001.fastq.gz,AEG588A1_S1_L003_R2_001.fastq.gz +CONTROL_REP1,AEG588A1_S1_L004_R1_001.fastq.gz,AEG588A1_S1_L004_R2_001.fastq.gz +``` + +### Full samplesheet + +The pipeline will auto-detect whether a sample is single- or paired-end using the information provided in the samplesheet. The samplesheet can have as many columns as you desire, however, there is a strict requirement for the first 3 columns to match those defined in the table below. + +A final samplesheet file consisting of both single- and paired-end data may look something like the one below. This is for 6 samples, where `TREATMENT_REP3` has been sequenced twice. + +```console +sample,fastq_1,fastq_2 +CONTROL_REP1,AEG588A1_S1_L002_R1_001.fastq.gz,AEG588A1_S1_L002_R2_001.fastq.gz +CONTROL_REP2,AEG588A2_S2_L002_R1_001.fastq.gz,AEG588A2_S2_L002_R2_001.fastq.gz +CONTROL_REP3,AEG588A3_S3_L002_R1_001.fastq.gz,AEG588A3_S3_L002_R2_001.fastq.gz +TREATMENT_REP1,AEG588A4_S4_L003_R1_001.fastq.gz, +TREATMENT_REP2,AEG588A5_S5_L003_R1_001.fastq.gz, +TREATMENT_REP3,AEG588A6_S6_L003_R1_001.fastq.gz, +TREATMENT_REP3,AEG588A6_S6_L004_R1_001.fastq.gz, +``` + +| Column | Description | +|----------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| +| `sample` | Custom sample name. This entry will be identical for multiple sequencing libraries/runs from the same sample. Spaces in sample names are automatically converted to underscores (`_`). | +| `fastq_1` | Full path to FastQ file for Illumina short reads 1. File has to be gzipped and have the extension ".fastq.gz" or ".fq.gz". | +| `fastq_2` | Full path to FastQ file for Illumina short reads 2. File has to be gzipped and have the extension ".fastq.gz" or ".fq.gz". | + +An [example samplesheet](../assets/samplesheet.csv) has been provided with the pipeline. + ## Running the pipeline The typical command for running the pipeline is as follows: -```bash -nextflow run nf-core/hic --input '*_R{1,2}.fastq.gz' -profile docker +```console +nextflow run nf-core/hic --input samplesheet.csv --genome GRCh37 -profile docker ``` -This will launch the pipeline with the `docker` configuration profile. -See below for more information about profiles. +This will launch the pipeline with the `docker` configuration profile. See below for more information about profiles. Note that the pipeline will create the following files in your working directory: -```bash +```console work # Directory containing the nextflow working files results # Finished results (configurable, see below) .nextflow_log # Log file from Nextflow @@ -30,673 +75,210 @@ results # Finished results (configurable, see below) When you run the above command, Nextflow automatically pulls the pipeline code from GitHub and stores it as a cached version. When running the pipeline after this, it will always use the cached version if available - even if the pipeline has been updated since. To make sure that you're running the latest version of the pipeline, make sure that you regularly update the cached version of the pipeline: -```bash +```console nextflow pull nf-core/hic ``` ### Reproducibility -It's a good idea to specify a pipeline version when running the pipeline on your data. This ensures that a specific version of the pipeline code and software are used when you run your pipeline. If you keep using the same tag, you'll be running the same version of the pipeline, even if there have been changes to the code since. +It is a good idea to specify a pipeline version when running the pipeline on your data. This ensures that a specific version of the pipeline code and software are used when you run your pipeline. If you keep using the same tag, you'll be running the same version of the pipeline, even if there have been changes to the code since. -First, go to the -[nf-core/hic releases page](https://github.com/nf-core/hic/releases) and find -the latest version number - numeric only (eg. `1.3.1`). -Then specify this when running the pipeline with `-r` (one hyphen) -eg. `-r 1.3.1`. +First, go to the [nf-core/hic releases page](https://github.com/nf-core/hic/releases) and find the latest version number - numeric only (eg. `1.3.1`). Then specify this when running the pipeline with `-r` (one hyphen) - eg. `-r 1.3.1`. -This version number will be logged in reports when you run the pipeline, so -that you'll know what you used when you look back in the future. - -### Automatic resubmission - -Each step in the pipeline has a default set of requirements for number of CPUs, -memory and time. For most of the steps in the pipeline, if the job exits with -an error code of `143` (exceeded requested resources) it will automatically -resubmit with higher requests (2 x original, then 3 x original). If it still -fails after three times then the pipeline is stopped. +This version number will be logged in reports when you run the pipeline, so that you'll know what you used when you look back in the future. ## Core Nextflow arguments -> **NB:** These options are part of Nextflow and use a _single_ hyphen -(pipeline parameters use a double-hyphen). +> **NB:** These options are part of Nextflow and use a _single_ hyphen (pipeline parameters use a double-hyphen). ### `-profile` -Use this parameter to choose a configuration profile. Profiles can give -configuration presets for different compute environments. +Use this parameter to choose a configuration profile. Profiles can give configuration presets for different compute environments. -Several generic profiles are bundled with the pipeline which instruct the pipeline to use software packaged using different methods (Docker, Singularity, Podman, Shifter, Charliecloud, Conda) - see below. +Several generic profiles are bundled with the pipeline which instruct the pipeline to use software packaged using different methods (Docker, Singularity, Podman, Shifter, Charliecloud, Conda) - see below. When using Biocontainers, most of these software packaging methods pull Docker containers from quay.io e.g [FastQC](https://quay.io/repository/biocontainers/fastqc) except for Singularity which directly downloads Singularity images via https hosted by the [Galaxy project](https://depot.galaxyproject.org/singularity/) and Conda which downloads and installs software locally from [Bioconda](https://bioconda.github.io/). -> We highly recommend the use of Docker or Singularity containers for full -pipeline reproducibility, however when this is not possible, Conda is also supported. +> We highly recommend the use of Docker or Singularity containers for full pipeline reproducibility, however when this is not possible, Conda is also supported. -The pipeline also dynamically loads configurations from -[https://github.com/nf-core/configs](https://github.com/nf-core/configs) -when it runs, making multiple config profiles for various institutional -clusters available at run time. -For more information and to see if your system is available in these -configs please see -the [nf-core/configs documentation](https://github.com/nf-core/configs#documentation). +The pipeline also dynamically loads configurations from [https://github.com/nf-core/configs](https://github.com/nf-core/configs) when it runs, making multiple config profiles for various institutional clusters available at run time. For more information and to see if your system is available in these configs please see the [nf-core/configs documentation](https://github.com/nf-core/configs#documentation). -Note that multiple profiles can be loaded, for example: `-profile test,docker` - -the order of arguments is important! -They are loaded in sequence, so later profiles can overwrite -earlier profiles. +Note that multiple profiles can be loaded, for example: `-profile test,docker` - the order of arguments is important! +They are loaded in sequence, so later profiles can overwrite earlier profiles. -If `-profile` is not specified, the pipeline will run locally and -expect all software to be -installed and available on the `PATH`. This is _not_ recommended. +If `-profile` is not specified, the pipeline will run locally and expect all software to be installed and available on the `PATH`. This is _not_ recommended. * `docker` - * A generic configuration profile to be used with [Docker](https://docker.com/) - * Pulls software from Docker Hub: [`nfcore/hic`](https://hub.docker.com/r/nfcore/hic/) + * A generic configuration profile to be used with [Docker](https://docker.com/) * `singularity` - * A generic configuration profile to be used with [Singularity](https://sylabs.io/docs/) - * Pulls software from Docker Hub: [`nfcore/hic`](https://hub.docker.com/r/nfcore/hic/) + * A generic configuration profile to be used with [Singularity](https://sylabs.io/docs/) * `podman` - * A generic configuration profile to be used with [Podman](https://podman.io/) - * Pulls software from Docker Hub: [`nfcore/hic`](https://hub.docker.com/r/nfcore/hic/) + * A generic configuration profile to be used with [Podman](https://podman.io/) * `shifter` - * A generic configuration profile to be used with [Shifter](https://nersc.gitlab.io/development/shifter/how-to-use/) - * Pulls software from Docker Hub: [`nfcore/hic`](https://hub.docker.com/r/nfcore/hic/) + * A generic configuration profile to be used with [Shifter](https://nersc.gitlab.io/development/shifter/how-to-use/) * `charliecloud` - * A generic configuration profile to be used with [Charliecloud](https://hpc.github.io/charliecloud/) - * Pulls software from Docker Hub: [`nfcore/hic`](https://hub.docker.com/r/nfcore/hic/) + * A generic configuration profile to be used with [Charliecloud](https://hpc.github.io/charliecloud/) * `conda` - * Please only use Conda as a last resort i.e. when it's not possible to run the pipeline with Docker, Singularity, Podman, Shifter or Charliecloud. - * A generic configuration profile to be used with [Conda](https://conda.io/docs/) - * Pulls most software from [Bioconda](https://bioconda.github.io/) + * A generic configuration profile to be used with [Conda](https://conda.io/docs/). Please only use Conda as a last resort i.e. when it's not possible to run the pipeline with Docker, Singularity, Podman, Shifter or Charliecloud. * `test` - * A profile with a complete configuration for automated testing - * Includes links to test data so needs no other parameters + * A profile with a complete configuration for automated testing + * Includes links to test data so needs no other parameters ### `-resume` -Specify this when restarting a pipeline. Nextflow will used cached results from -any pipeline steps where the inputs are the same, continuing from where it got -to previously. -You can also supply a run name to resume a specific run: `-resume [run-name]`. -Use the `nextflow log` command to show previous run names. - -### `-c` - -Specify the path to a specific config file (this is a core Nextflow command). -See the [nf-core website documentation](https://nf-co.re/usage/configuration) -for more information. - -#### Custom resource requests - -Each step in the pipeline has a default set of requirements for number of CPUs, -memory and time. For most of the steps in the pipeline, if the job exits with -an error code of `143` (exceeded requested resources) it will automatically resubmit -with higher requests (2 x original, then 3 x original). If it still fails after three -times then the pipeline is stopped. - -Whilst these default requirements will hopefully work for most people with most data, -you may find that you want to customise the compute resources that the pipeline requests. -You can do this by creating a custom config file. For example, to give the workflow -process `star` 32GB of memory, you could use the following config: - -```nextflow -process { - withName: star { - memory = 32.GB - } -} -``` - -To find the exact name of a process you wish to modify the compute resources, check the live-status of a nextflow run displayed on your terminal or check the nextflow error for a line like so: `Error executing process > 'bowtie2_end_to_end'`. In this case the name to specify in the custom config file is `bowtie2_end_to_end`. - -See the main [Nextflow documentation](https://www.nextflow.io/docs/latest/config.html) for more information. - -If you are likely to be running `nf-core` pipelines regularly it may be a good idea to request that your custom config file is uploaded to the `nf-core/configs` git repository. Before you do this please can you test that the config file works with your pipeline of choice using the `-c` parameter (see definition above). You can then create a pull request to the `nf-core/configs` repository with the addition of your config file, associated documentation file (see examples in [`nf-core/configs/docs`](https://github.com/nf-core/configs/tree/master/docs)), and amending [`nfcore_custom.config`](https://github.com/nf-core/configs/blob/master/nfcore_custom.config) to include your custom profile. - -If you have any questions or issues please send us a message on -[Slack](https://nf-co.re/join/slack) on the -[`#configs` channel](https://nfcore.slack.com/channels/configs). - -### Running in the background - -Nextflow handles job submissions and supervises the running jobs. -The Nextflow process must run until the pipeline is finished. - -The Nextflow `-bg` flag launches Nextflow in the background, detached from your terminal -so that the workflow does not stop if you log out of your session. The logs are -saved to a file. - -Alternatively, you can use `screen` / `tmux` or similar tool to create a detached -session which you can log back into at a later time. -Some HPC setups also allow you to run nextflow within a cluster job submitted -your job scheduler (from where it submits more jobs). - -#### Nextflow memory requirements - -In some cases, the Nextflow Java virtual machines can start to request a -large amount of memory. -We recommend adding the following line to your environment to limit this -(typically in `~/.bashrc` or `~./bash_profile`): - -```bash -NXF_OPTS='-Xms1g -Xmx4g' -``` - -## Use case - -### Hi-C digestion protocol - -Here is an command line example for standard DpnII digestion protocols. -Alignment will be performed on the `mm10` genome with default parameters. -Multi-hits will not be considered and duplicates will be removed. -Note that by default, no filters are applied on DNA and restriction fragment sizes. - -```bash -nextflow run main.nf --input './*_R{1,2}.fastq.gz' --genome 'mm10' --digestion 'dnpii' -``` - -### DNase Hi-C protocol - -Here is an command line example for DNase protocol. -Alignment will be performed on the `mm10` genome with default paramters. -Multi-hits will not be considered and duplicates will be removed. -Contacts involving fragments separated by less than 1000bp will be discarded. - -```bash -nextflow run main.nf --input './*_R{1,2}.fastq.gz' --genome 'mm10' --dnase --min_cis 1000 -``` - -## Inputs - -### `--input` - -Use this to specify the location of your input FastQ files. For example: - -```bash ---input 'path/to/data/sample_*_{1,2}.fastq' -``` - -Please note the following requirements: - -1. The path must be enclosed in quotes -2. The path must have at least one `*` wildcard character -3. When using the pipeline with paired end data, the path must use `{1,2}` -notation to specify read pairs. - -If left unspecified, a default pattern is used: `data/*{1,2}.fastq.gz` - -Note that the Hi-C data analysis requires paired-end data. - -## Reference genomes - -The pipeline config files come bundled with paths to the illumina iGenomes reference -index files. If running with docker or AWS, the configuration is set up to use the -[AWS-iGenomes](https://ewels.github.io/AWS-iGenomes/) resource. - -### `--genome` (using iGenomes) - -There are many different species supported in the iGenomes references. To run -the pipeline, you must specify which to use with the `--genome` flag. - -You can find the keys to specify the genomes in the -[iGenomes config file](../conf/igenomes.config). - -### `--fasta` - -If you prefer, you can specify the full path to your reference genome when you -run the pipeline: - -```bash ---fasta '[path to Fasta reference]' -``` - -### `--bwt2_index` - -The bowtie2 indexes are required to align the data with the HiC-Pro workflow. If the -`--bwt2_index` is not specified, the pipeline will either use the igenome -bowtie2 indexes (see `--genome` option) or build the indexes on-the-fly -(see `--fasta` option) - -```bash ---bwt2_index '[path to bowtie2 index]' -``` - -### `--chromosome_size` - -The Hi-C pipeline will also requires a two-columns text file with the -chromosome name and its size (tab separated). -If not specified, this file will be automatically created by the pipeline. -In the latter case, the `--fasta` reference genome has to be specified. - -```bash - chr1 249250621 - chr2 243199373 - chr3 198022430 - chr4 191154276 - chr5 180915260 - chr6 171115067 - chr7 159138663 - chr8 146364022 - chr9 141213431 - chr10 135534747 - (...) -``` - -```bash ---chromosome_size '[path to chromosome size file]' -``` - -### `--restriction_fragments` - -Finally, Hi-C experiments based on restriction enzyme digestion requires a BED -file with coordinates of restriction fragments. - -```bash - chr1 0 16007 HIC_chr1_1 0 + - chr1 16007 24571 HIC_chr1_2 0 + - chr1 24571 27981 HIC_chr1_3 0 + - chr1 27981 30429 HIC_chr1_4 0 + - chr1 30429 32153 HIC_chr1_5 0 + - chr1 32153 32774 HIC_chr1_6 0 + - chr1 32774 37752 HIC_chr1_7 0 + - chr1 37752 38369 HIC_chr1_8 0 + - chr1 38369 38791 HIC_chr1_9 0 + - chr1 38791 39255 HIC_chr1_10 0 + - (...) -``` - -If not specified, this file will be automatically created by the pipline. -In this case, the `--fasta` reference genome will be used. -Note that the `digestion` or `--restriction_site` parameter is mandatory to create this file. - -## Hi-C specific options - -The following options are defined in the `nextflow.config` file, and can be -updated either using a custom configuration file (see `-c` option) or using -command line parameter. - -### HiC-pro mapping - -The reads mapping is currently based on the two-steps strategy implemented in -the HiC-pro pipeline. The idea is to first align reads from end-to-end. -Reads that do not aligned are then trimmed at the ligation site, and their 5' -end is re-aligned to the reference genome. -Note that the default option are quite stringent, and can be updated according -to the reads quality or the reference genome. - -#### `--bwt2_opts_end2end` - -Bowtie2 alignment option for end-to-end mapping. -Default: '--very-sensitive -L 30 --score-min L,-0.6,-0.2 --end-to-end ---reorder' - -```bash ---bwt2_opts_end2end '[Options for bowtie2 step1 mapping on full reads]' -``` - -#### `--bwt2_opts_trimmed` - -Bowtie2 alignment option for trimmed reads mapping (step 2). -Default: '--very-sensitive -L 20 --score-min L,-0.6,-0.2 --end-to-end ---reorder' - -```bash ---bwt2_opts_trimmed '[Options for bowtie2 step2 mapping on trimmed reads]' -``` - -#### `--min_mapq` - -Minimum mapping quality. Reads with lower quality are discarded. Default: 10 - -```bash ---min_mapq '[Minimum quality value]' -``` - -### Digestion Hi-C - -#### `--digestion` - -This parameter allows to automatically set the `--restriction_site` and -`--ligation_site` parameter according to the restriction enzyme you used. -Available keywords are 'hindiii', 'dpnii', 'mboi', 'arima'. +Specify this when restarting a pipeline. Nextflow will used cached results from any pipeline steps where the inputs are the same, continuing from where it got to previously. -```bash ---digestion 'hindiii' -``` - -#### `--restriction_site` - -If the restriction enzyme is not available through the `--digestion` -parameter, you can also defined manually the restriction motif(s) for -Hi-C digestion protocol. -The restriction motif(s) is(are) used to generate the list of restriction fragments. -The precise cutting site of the restriction enzyme has to be specified using -the '^' character. Default: 'A^AGCTT' -Here are a few examples: - -* MboI: ^GATC -* DpnII: ^GATC -* HindIII: A^AGCTT -* ARIMA kit: ^GATC,G^ANTC - -Note that multiples restriction motifs can be provided (comma-separated) and -that 'N' base are supported. - -```bash ---restriction_size '[Cutting motif]' -``` - -#### `--ligation_site` - -Ligation motif after reads ligation. This motif is used for reads trimming and -depends on the fill in strategy. -Note that multiple ligation sites can be specified (comma separated) and that -'N' base is interpreted and replaced by 'A','C','G','T'. -Default: 'AAGCTAGCTT' - -```bash ---ligation_site '[Ligation motif]' -``` - -Exemple of the ARIMA kit: GATCGATC,GANTGATC,GANTANTC,GATCANTC - -### DNAse Hi-C - -#### `--dnase` - -In DNAse Hi-C mode, all options related to digestion Hi-C -(see previous section) are ignored. -In this case, it is highly recommanded to use the `--min_cis_dist` parameter -to remove spurious ligation products. - -```bash ---dnase' -``` - -### HiC-pro processing - -#### `--min_restriction_fragment_size` - -Minimum size of restriction fragments to consider for the Hi-C processing. -Default: '0' - no filter - -```bash ---min_restriction_fragment_size '[numeric]' -``` - -#### `--max_restriction_fragment_size` - -Maximum size of restriction fragments to consider for the Hi-C processing. -Default: '0' - no filter - -```bash ---max_restriction_fragment_size '[numeric]' -``` - -#### `--min_insert_size` - -Minimum reads insert size. Shorter 3C products are discarded. -Default: '0' - no filter - -```bash ---min_insert_size '[numeric]' -``` - -#### `--max_insert_size` - -Maximum reads insert size. Longer 3C products are discarded. -Default: '0' - no filter - -```bash ---max_insert_size '[numeric]' -``` - -#### `--min_cis_dist` - -Filter short range contact below the specified distance. -Mainly useful for DNase Hi-C. Default: '0' - -```bash ---min_cis_dist '[numeric]' -``` - -#### `--keep_dups` - -If specified, duplicates reads are not discarded before building contact maps. - -```bash ---keep_dups -``` - -#### `--keep_multi` - -If specified, reads that aligned multiple times on the genome are not discarded. -Note the default mapping options are based on random hit assignment, meaning -that only one position is kept per read. -Note that in this case the `--min_mapq` parameter is ignored. - -```bash ---keep_multi -``` - -## Genome-wide contact maps - -Once the list of valid pairs is available, the standard is now to move on the `cooler` -framework to build the raw and balanced contact maps in txt and (m)cool formats. - -### `--bin_size` - -Resolution of contact maps to generate (comma separated). -Default:'1000000,500000' +You can also supply a run name to resume a specific run: `-resume [run-name]`. Use the `nextflow log` command to show previous run names. -```bash ---bins_size '[string]' -``` - -### `--res_zoomify` - -Define the maximum resolution to reach when zoomify the cool contact maps. -Default:'5000' - -```bash ---res_zoomify '[string]' -``` - -### HiC-Pro contact maps - -By default, the contact maps are now generated with the `cooler` framework. -However, for backward compatibility, the raw and normalized maps can still be generated -by HiC-pro if the `--hicpro_maps` parameter is set. - -#### `--hicpro_maps` - -If specified, the raw and ICE normalized contact maps will be generated by HiC-Pro. - -```bash ---hicpro_maps -``` - -#### `--ice_max_iter` - -Maximum number of iteration for ICE normalization. -Default: 100 - -```bash ---ice_max_iter '[numeric]' -``` - -#### `--ice_filer_low_count_perc` - -Define which pourcentage of bins with low counts should be force to zero. -Default: 0.02 - -```bash ---ice_filter_low_count_perc '[numeric]' -``` - -#### `--ice_filer_high_count_perc` - -Define which pourcentage of bins with low counts should be discarded before -normalization. Default: 0 - -```bash ---ice_filter_high_count_perc '[numeric]' -``` - -#### `--ice_eps` - -The relative increment in the results before declaring convergence for ICE -normalization. Default: 0.1 - -```bash ---ice_eps '[numeric]' -``` - -## Downstream analysis - -### Additional quality controls +### `-c` -#### `--res_dist_decay` +Specify the path to a specific config file (this is a core Nextflow command). See the [nf-core website documentation](https://nf-co.re/usage/configuration) for more information. -Generates distance vs Hi-C counts plots at a given resolution using `HiCExplorer`. -Several resolution can be specified (comma separeted). Default: '250000' +## Custom configuration -```bash ---res_dist_decay '[string]' -``` +### Resource requests -### Compartment calling +Whilst the default requirements set within the pipeline will hopefully work for most people and with most input data, you may find that you want to customise the compute resources that the pipeline requests. Each step in the pipeline has a default set of requirements for number of CPUs, memory and time. For most of the steps in the pipeline, if the job exits with any of the error codes specified [here](https://github.com/nf-core/rnaseq/blob/4c27ef5610c87db00c3c5a3eed10b1d161abf575/conf/base.config#L18) it will automatically be resubmitted with higher requests (2 x original, then 3 x original). If it still fails after the third attempt then the pipeline execution is stopped. -Call open/close compartments for each chromosome, using the `cooltools` command. +For example, if the nf-core/rnaseq pipeline is failing after multiple re-submissions of the `STAR_ALIGN` process due to an exit code of `137` this would indicate that there is an out of memory issue: -#### `--res_compartments` +```console +[62/149eb0] NOTE: Process `RNASEQ:ALIGN_STAR:STAR_ALIGN (WT_REP1)` terminated with an error exit status (137) -- Execution is retried (1) +Error executing process > 'RNASEQ:ALIGN_STAR:STAR_ALIGN (WT_REP1)' -Resolution to call the chromosome compartments (comma separated). -Default: '250000' +Caused by: + Process `RNASEQ:ALIGN_STAR:STAR_ALIGN (WT_REP1)` terminated with an error exit status (137) -```bash ---res_compartments '[string]' -``` +Command executed: + STAR \ + --genomeDir star \ + --readFilesIn WT_REP1_trimmed.fq.gz \ + --runThreadN 2 \ + --outFileNamePrefix WT_REP1. \ + <TRUNCATED> -### TADs calling +Command exit status: + 137 -#### `--tads_caller` +Command output: + (empty) -TADs calling can be performed using different approaches. -Currently available options are `insulation` and `hicexplorer`. -Note that all options can be specified (comma separated). -Default: 'insulation' +Command error: + .command.sh: line 9: 30 Killed STAR --genomeDir star --readFilesIn WT_REP1_trimmed.fq.gz --runThreadN 2 --outFileNamePrefix WT_REP1. <TRUNCATED> +Work dir: + /home/pipelinetest/work/9d/172ca5881234073e8d76f2a19c88fb -```bash ---tads_caller '[string]' +Tip: you can replicate the issue by changing to the process work dir and entering the command `bash .command.run` ``` -#### `--res_tads` - -Resolution to run the TADs calling analysis (comma separated). -Default: '40000,20000' +To bypass this error you would need to find exactly which resources are set by the `STAR_ALIGN` process. The quickest way is to search for `process STAR_ALIGN` in the [nf-core/rnaseq Github repo](https://github.com/nf-core/rnaseq/search?q=process+STAR_ALIGN). We have standardised the structure of Nextflow DSL2 pipelines such that all module files will be present in the `modules/` directory and so based on the search results the file we want is `modules/nf-core/software/star/align/main.nf`. If you click on the link to that file you will notice that there is a `label` directive at the top of the module that is set to [`label process_high`](https://github.com/nf-core/rnaseq/blob/4c27ef5610c87db00c3c5a3eed10b1d161abf575/modules/nf-core/software/star/align/main.nf#L9). The [Nextflow `label`](https://www.nextflow.io/docs/latest/process.html#label) directive allows us to organise workflow processes in separate groups which can be referenced in a configuration file to select and configure subset of processes having similar computing requirements. The default values for the `process_high` label are set in the pipeline's [`base.config`](https://github.com/nf-core/rnaseq/blob/4c27ef5610c87db00c3c5a3eed10b1d161abf575/conf/base.config#L33-L37) which in this case is defined as 72GB. Providing you haven't set any other standard nf-core parameters to __cap__ the [maximum resources](https://nf-co.re/usage/configuration#max-resources) used by the pipeline then we can try and bypass the `STAR_ALIGN` process failure by creating a custom config file that sets at least 72GB of memory, in this case increased to 100GB. The custom config below can then be provided to the pipeline via the [`-c`](#-c) parameter as highlighted in previous sections. -```bash ---res_tads '[string]' +```nextflow +process { + withName: STAR_ALIGN { + memory = 100.GB + } +} ``` -## Inputs/Outputs +> **NB:** We specify just the process name i.e. `STAR_ALIGN` in the config file and not the full task name string that is printed to screen in the error message or on the terminal whilst the pipeline is running i.e. `RNASEQ:ALIGN_STAR:STAR_ALIGN`. You may get a warning suggesting that the process selector isn't recognised but you can ignore that if the process name has been specified correctly. This is something that needs to be fixed upstream in core Nextflow. -### `--split_fastq` +### Tool-specific options -By default, the nf-core Hi-C pipeline expects one read pairs per sample. -However, for large Hi-C data processing single fastq files can be very -time consuming. -The `--split_fastq` option allows to automatically split input read pairs -into chunks of reads of size `--fastq_chunks_size` (Default : 20000000). In this case, all chunks will be processed in parallel -and merged before generating the contact maps, thus leading to a significant -increase of processing performance. +For the ultimate flexibility, we have implemented and are using Nextflow DSL2 modules in a way where it is possible for both developers and users to change tool-specific command-line arguments (e.g. providing an additional command-line argument to the `STAR_ALIGN` process) as well as publishing options (e.g. saving files produced by the `STAR_ALIGN` process that aren't saved by default by the pipeline). In the majority of instances, as a user you won't have to change the default options set by the pipeline developer(s), however, there may be edge cases where creating a simple custom config file can improve the behaviour of the pipeline if for example it is failing due to a weird error that requires setting a tool-specific parameter to deal with smaller / larger genomes. -```bash ---split_fastq --fastq_chunks_size '[numeric]' -``` +The command-line arguments passed to STAR in the `STAR_ALIGN` module are a combination of: -### `--save_reference` +* Mandatory arguments or those that need to be evaluated within the scope of the module, as supplied in the [`script`](https://github.com/nf-core/rnaseq/blob/4c27ef5610c87db00c3c5a3eed10b1d161abf575/modules/nf-core/software/star/align/main.nf#L49-L55) section of the module file. -If specified, annotation files automatically generated from the `--fasta` file -are exported in the results folder. Default: false +* An [`options.args`](https://github.com/nf-core/rnaseq/blob/4c27ef5610c87db00c3c5a3eed10b1d161abf575/modules/nf-core/software/star/align/main.nf#L56) string of non-mandatory parameters that is set to be empty by default in the module but can be overwritten when including the module in the sub-workflow / workflow context via the `addParams` Nextflow option. -```bash ---save_reference -``` +The nf-core/rnaseq pipeline has a sub-workflow (see [terminology](https://github.com/nf-core/modules#terminology)) specifically to align reads with STAR and to sort, index and generate some basic stats on the resulting BAM files using SAMtools. At the top of this file we import the `STAR_ALIGN` module via the Nextflow [`include`](https://github.com/nf-core/rnaseq/blob/4c27ef5610c87db00c3c5a3eed10b1d161abf575/subworkflows/nf-core/align_star.nf#L10) keyword and by default the options passed to the module via the `addParams` option are set as an empty Groovy map [here](https://github.com/nf-core/rnaseq/blob/4c27ef5610c87db00c3c5a3eed10b1d161abf575/subworkflows/nf-core/align_star.nf#L5); this in turn means `options.args` will be set to empty by default in the module file too. This is an intentional design choice and allows us to implement well-written sub-workflows composed of a chain of tools that by default run with the bare minimum parameter set for any given tool in order to make it much easier to share across pipelines and to provide the flexibility for users and developers to customise any non-mandatory arguments. -### `--save_aligned_intermediates` +When including the sub-workflow above in the main pipeline workflow we use the same `include` statement, however, we now have the ability to overwrite options for each of the tools in the sub-workflow including the [`align_options`](https://github.com/nf-core/rnaseq/blob/4c27ef5610c87db00c3c5a3eed10b1d161abf575/workflows/rnaseq.nf#L225) variable that will be used specifically to overwrite the optional arguments passed to the `STAR_ALIGN` module. In this case, the options to be provided to `STAR_ALIGN` have been assigned sensible defaults by the developer(s) in the pipeline's [`modules.config`](https://github.com/nf-core/rnaseq/blob/4c27ef5610c87db00c3c5a3eed10b1d161abf575/conf/modules.config#L70-L74) and can be accessed and customised in the [workflow context](https://github.com/nf-core/rnaseq/blob/4c27ef5610c87db00c3c5a3eed10b1d161abf575/workflows/rnaseq.nf#L201-L204) too before eventually passing them to the sub-workflow as a Groovy map called `star_align_options`. These options will then be propagated from `workflow -> sub-workflow -> module`. -If specified, all intermediate mapping files are saved and exported in the -results folder. Default: false +As mentioned at the beginning of this section it may also be necessary for users to overwrite the options passed to modules to be able to customise specific aspects of the way in which a particular tool is executed by the pipeline. Given that all of the default module options are stored in the pipeline's `modules.config` as a [`params` variable](https://github.com/nf-core/rnaseq/blob/4c27ef5610c87db00c3c5a3eed10b1d161abf575/conf/modules.config#L24-L25) it is also possible to overwrite any of these options via a custom config file. -```bash ---save_aligned_inermediates -``` +Say for example we want to append an additional, non-mandatory parameter (i.e. `--outFilterMismatchNmax 16`) to the arguments passed to the `STAR_ALIGN` module. Firstly, we need to copy across the default `args` specified in the [`modules.config`](https://github.com/nf-core/rnaseq/blob/4c27ef5610c87db00c3c5a3eed10b1d161abf575/conf/modules.config#L71) and create a custom config file that is a composite of the default `args` as well as the additional options you would like to provide. This is very important because Nextflow will overwrite the default value of `args` that you provide via the custom config. -### `--save_interaction_bam` +As you will see in the example below, we have: -If specified, write a BAM file with all classified reads (valid paires, -dangling end, self-circle, etc.) and its tags. +* appended `--outFilterMismatchNmax 16` to the default `args` used by the module. +* changed the default `publish_dir` value to where the files will eventually be published in the main results directory. +* appended `'bam':''` to the default value of `publish_files` so that the BAM files generated by the process will also be saved in the top-level results directory for the module. Note: `'out':'log'` means any file/directory ending in `out` will now be saved in a separate directory called `my_star_directory/log/`. -```bash ---save_interaction_bam +```nextflow +params { + modules { + 'star_align' { + args = "--quantMode TranscriptomeSAM --twopassMode Basic --outSAMtype BAM Unsorted --readFilesCommand zcat --runRNGseed 0 --outFilterMultimapNmax 20 --alignSJDBoverhangMin 1 --outSAMattributes NH HI AS NM MD --quantTranscriptomeBan Singleend --outFilterMismatchNmax 16" + publish_dir = "my_star_directory" + publish_files = ['out':'log', 'tab':'log', 'bam':''] + } + } +} ``` -## Skip options - -### `--skip_maps` +### Updating containers -If defined, the workflow stops with the list of valid interactions, and the -genome-wide maps are not built. Usefult for capture-C analysis. Default: false +The [Nextflow DSL2](https://www.nextflow.io/docs/latest/dsl2.html) implementation of this pipeline uses one container per process which makes it much easier to maintain and update software dependencies. If for some reason you need to use a different version of a particular tool with the pipeline then you just need to identify the `process` name and override the Nextflow `container` definition for that process using the `withName` declaration. For example, in the [nf-core/viralrecon](https://nf-co.re/viralrecon) pipeline a tool called [Pangolin](https://github.com/cov-lineages/pangolin) has been used during the COVID-19 pandemic to assign lineages to SARS-CoV-2 genome sequenced samples. Given that the lineage assignments change quite frequently it doesn't make sense to re-release the nf-core/viralrecon everytime a new version of Pangolin has been released. However, you can override the default container used by the pipeline by creating a custom config file and passing it as a command-line argument via `-c custom.config`. -```bash ---skip_maps -``` +1. Check the default version used by the pipeline in the module file for [Pangolin](https://github.com/nf-core/viralrecon/blob/a85d5969f9025409e3618d6c280ef15ce417df65/modules/nf-core/software/pangolin/main.nf#L14-L19) +2. Find the latest version of the Biocontainer available on [Quay.io](https://quay.io/repository/biocontainers/pangolin?tag=latest&tab=tags) +3. Create the custom config accordingly: -### `--skip_balancing` + * For Docker: -If defined, the contact maps normalization is not run on the raw contact maps. -Default: false + ```nextflow + process { + withName: PANGOLIN { + container = 'quay.io/biocontainers/pangolin:3.0.5--pyhdfd78af_0' + } + } + ``` -```bash ---skip_balancing -``` + * For Singularity: -### `--skip_cool` + ```nextflow + process { + withName: PANGOLIN { + container = 'https://depot.galaxyproject.org/singularity/pangolin:3.0.5--pyhdfd78af_0' + } + } + ``` -If defined, cooler files are not generated. Default: false + * For Conda: -```bash ---skip_cool -``` + ```nextflow + process { + withName: PANGOLIN { + conda = 'bioconda::pangolin=3.0.5' + } + } + ``` -### `skip_dist_decay` +> **NB:** If you wish to periodically update individual tool-specific results (e.g. Pangolin) generated by the pipeline then you must ensure to keep the `work/` directory otherwise the `-resume` ability of the pipeline will be compromised and it will restart from scratch. -Do not run distance decay plots. Default: false +### nf-core/configs -```bash ---skip_dist_decay -``` +In most cases, you will only need to create a custom config as a one-off but if you and others within your organisation are likely to be running nf-core pipelines regularly and need to use the same settings regularly it may be a good idea to request that your custom config file is uploaded to the `nf-core/configs` git repository. Before you do this please can you test that the config file works with your pipeline of choice using the `-c` parameter. You can then create a pull request to the `nf-core/configs` repository with the addition of your config file, associated documentation file (see examples in [`nf-core/configs/docs`](https://github.com/nf-core/configs/tree/master/docs)), and amending [`nfcore_custom.config`](https://github.com/nf-core/configs/blob/master/nfcore_custom.config) to include your custom profile. -### `skip_compartments` +See the main [Nextflow documentation](https://www.nextflow.io/docs/latest/config.html) for more information about creating your own configuration files. -Do not call compartments. Default: false +If you have any questions or issues please send us a message on [Slack](https://nf-co.re/join/slack) on the [`#configs` channel](https://nfcore.slack.com/channels/configs). -```bash ---skip_compartments -``` +## Running in the background -### `skip_tads` +Nextflow handles job submissions and supervises the running jobs. The Nextflow process must run until the pipeline is finished. -Do not call TADs. Default: false +The Nextflow `-bg` flag launches Nextflow in the background, detached from your terminal so that the workflow does not stop if you log out of your session. The logs are saved to a file. -```bash ---skip_tads -``` +Alternatively, you can use `screen` / `tmux` or similar tool to create a detached session which you can log back into at a later time. +Some HPC setups also allow you to run nextflow within a cluster job submitted your job scheduler (from where it submits more jobs). -### `--skip_multiQC` +## Nextflow memory requirements -If defined, the MultiQC report is not generated. Default: false +In some cases, the Nextflow Java virtual machines can start to request a large amount of memory. +We recommend adding the following line to your environment to limit this (typically in `~/.bashrc` or `~./bash_profile`): -```bash ---skip_multiQC +```console +NXF_OPTS='-Xms1g -Xmx4g' ``` diff --git a/environment.yml b/environment.yml index 9d357598b764a0dbf9adbeccd3fe6767828fa844..e3cd7576274730a4a3e7c76e8431993726bcc47d 100644 --- a/environment.yml +++ b/environment.yml @@ -16,6 +16,7 @@ dependencies: - bioconda::pysam=0.15.4 - conda-forge::pymdown-extensions=7.1 - bioconda::cooler=0.8.6 + - bioconda::cooltools=0.4.0 - bioconda::bowtie2=2.3.5 - bioconda::samtools=1.9 - bioconda::multiqc=1.8 @@ -27,5 +28,4 @@ dependencies: - bioconda::ucsc-bedgraphtobigwig=357 - conda-forge::cython=0.29.19 - pip: - - cooltools==0.4.0 - fanc==0.8.30 \ No newline at end of file diff --git a/modules/local/samplesheet_check.nf b/modules/local/samplesheet_check.nf index 96ff67f747ec8144d48fee8bcafe412c5bf5ee95..7bbf09f11446e7a4f8e279f19cf399cf41a11ddd 100644 --- a/modules/local/samplesheet_check.nf +++ b/modules/local/samplesheet_check.nf @@ -1,6 +1,5 @@ // Import generic module functions include { initOptions; saveFiles; getSoftwareName } from './functions' - params.options = [:] options = initOptions(params.options) diff --git a/nextflow.config b/nextflow.config index 7296cc2af450343641614486f3cc232b25c71243..8c5d7ac6beffe20a5979f989d71d6737a5c87c55 100644 --- a/nextflow.config +++ b/nextflow.config @@ -1,200 +1,136 @@ /* - * ------------------------------------------------- - * nf-core/hic Nextflow config file - * ------------------------------------------------- - * Default config options for all environments. - */ +======================================================================================== + nf-core/hic Nextflow config file +======================================================================================== + Default config options for all compute environments +---------------------------------------------------------------------------------------- +*/ // Global default params, used in configs params { - // Inputs / outputs - genome = false - input = null - input_paths = null - outdir = './results' - genome = false - input_paths = false - chromosome_size = false - restriction_fragments = false - save_reference = false - - // Mapping - split_fastq = false - fastq_chunks_size = 20000000 - save_interaction_bam = false - save_aligned_intermediates = false - bwt2_opts_end2end = '--very-sensitive -L 30 --score-min L,-0.6,-0.2 --end-to-end --reorder' - bwt2_opts_trimmed = '--very-sensitive -L 20 --score-min L,-0.6,-0.2 --end-to-end --reorder' - keep_dups = false - keep_multi = false - min_mapq = 10 + // TODO nf-core: Specify your pipeline's command line flags + // Input options + input = null + + // References + genome = null + igenomes_base = 's3://ngi-igenomes/igenomes' + igenomes_ignore = false + + // MultiQC options + multiqc_config = null + multiqc_title = null + max_multiqc_email_size = '25.MB' + + // Boilerplate options + outdir = './results' + tracedir = "${params.outdir}/pipeline_info" + publish_dir_mode = 'copy' + email = null + email_on_fail = null + plaintext_email = false + monochrome_logs = false + help = false + validate_params = true + show_hidden_params = false + schema_ignore_params = 'genomes,modules' + enable_conda = false + singularity_pull_docker_container = false + + // Config options + custom_config_version = 'master' + custom_config_base = "https://raw.githubusercontent.com/nf-core/configs/${params.custom_config_version}" + hostnames = [:] + config_profile_description = null + config_profile_contact = null + config_profile_url = null + config_profile_name = null + + // Max resource options + // Defaults only, expecting to be overwritten + max_memory = '128.GB' + max_cpus = 16 + max_time = '240.h' - // Digestion Hi-C - digestion = false - digest { - 'hindiii'{ - restriction_site='A^AGCTT' - ligation_site='AAGCTAGCTT' - } - 'mboi' { - restriction_site='^GATC' - ligation_site='GATCGATC' - } - 'dpnii' { - restriction_site='^GATC' - ligation_site='GATCGATC' - } - 'arima' { - restriction_site='^GATC,G^ANT' - ligation_site='GATCGATC,GATCGANT,GANTGATC,GANTGANT' - } - } - min_restriction_fragment_size = 0 - max_restriction_fragment_size = 0 - min_insert_size = 0 - max_insert_size = 0 - save_nonvalid_pairs = false - - // Dnase Hi-C - dnase = false - min_cis_dist = 0 - - // Contact maps - bin_size = '1000000' - res_zoomify = '5000' - hicpro_maps = false - ice_max_iter = 100 - ice_filter_low_count_perc = 0.02 - ice_filter_high_count_perc = 0 - ice_eps = 0.1 - - // Downstream Analysis - res_dist_decay = '250000' - tads_caller = 'insulation' - res_tads = '40000' - res_compartments = '250000' - - // Workflow - skip_maps = false - skip_balancing = false - skip_mcool = false - skip_dist_decay = false - skip_compartments = false - skip_tads = false - skip_multiqc = false - - // Boilerplate options - publish_dir_mode = 'copy' - multiqc_config = false - email = false - email_on_fail = false - max_multiqc_email_size = 25.MB - plaintext_email = false - monochrome_logs = false - help = false - igenomes_base = 's3://ngi-igenomes/igenomes' - tracedir = "${params.outdir}/pipeline_info" - igenomes_ignore = false - - //Config - custom_config_version = 'master' - custom_config_base = "https://raw.githubusercontent.com/nf-core/configs/${params.custom_config_version}" - hostnames = false - config_profile_name = null - config_profile_description = false - config_profile_contact = false - config_profile_url = false - validate_params = true - show_hidden_params = false - schema_ignore_params = 'genomes,digest,input_paths' - - // Defaults only, expecting to be overwritten - max_memory = 24.GB - max_cpus = 8 - max_time = 240.h } -// Container slug. Stable releases should specify release tag! -// Developmental code should specify :dev -process.container = 'nfcore/hic:1.3.0' - // Load base.config by default for all pipelines includeConfig 'conf/base.config' +// Load modules.config for DSL2 module specific options +includeConfig 'conf/modules.config' + // Load nf-core custom profiles from different Institutions try { - includeConfig "${params.custom_config_base}/nfcore_custom.config" + includeConfig "${params.custom_config_base}/nfcore_custom.config" } catch (Exception e) { - System.err.println("WARNING: Could not load nf-core/config profiles: ${params.custom_config_base}/nfcore_custom.config") -} - -// Create profiles -profiles { - conda { - docker.enabled = false - singularity.enabled = false - podman.enabled = false - shifter.enabled = false - charliecloud.enabled = false - process.conda = "$projectDir/environment.yml" - } - debug { process.beforeScript = 'echo $HOSTNAME' } - docker { - docker.enabled = true - singularity.enabled = false - podman.enabled = false - shifter.enabled = false - charliecloud.enabled = false - // Avoid this error: - // WARNING: Your kernel does not support swap limit capabilities or the cgroup is not mounted. Memory limited without swap. - // Testing this in nf-core after discussion here https://github.com/nf-core/tools/pull/351 - // once this is established and works well, nextflow might implement this behavior as new default. - docker.runOptions = '-u \$(id -u):\$(id -g)' - } - singularity { - docker.enabled = false - singularity.enabled = true - podman.enabled = false - shifter.enabled = false - charliecloud.enabled = false - singularity.autoMounts = true - } - podman { - singularity.enabled = false - docker.enabled = false - podman.enabled = true - shifter.enabled = false - charliecloud.enabled = false - } - shifter { - singularity.enabled = false - docker.enabled = false - podman.enabled = false - shifter.enabled = true - charliecloud.enabled = false - } - charliecloud { - singularity.enabled = false - docker.enabled = false - podman.enabled = false - shifter.enabled = false - charliecloud.enabled = true - } - test { includeConfig 'conf/test.config' } - test_full { includeConfig 'conf/test_full.config' } + System.err.println("WARNING: Could not load nf-core/config profiles: ${params.custom_config_base}/nfcore_custom.config") } // Load igenomes.config if required if (!params.igenomes_ignore) { - includeConfig 'conf/igenomes.config' + includeConfig 'conf/igenomes.config' +} else { + params.genomes = [:] +} + +profiles { + debug { process.beforeScript = 'echo $HOSTNAME' } + conda { + params.enable_conda = true + docker.enabled = false + singularity.enabled = false + podman.enabled = false + shifter.enabled = false + charliecloud.enabled = false + } + docker { + docker.enabled = true + docker.userEmulation = true + singularity.enabled = false + podman.enabled = false + shifter.enabled = false + charliecloud.enabled = false + } + singularity { + singularity.enabled = true + singularity.autoMounts = true + docker.enabled = false + podman.enabled = false + shifter.enabled = false + charliecloud.enabled = false + } + podman { + podman.enabled = true + docker.enabled = false + singularity.enabled = false + shifter.enabled = false + charliecloud.enabled = false + } + shifter { + shifter.enabled = true + docker.enabled = false + singularity.enabled = false + podman.enabled = false + charliecloud.enabled = false + } + charliecloud { + charliecloud.enabled = true + docker.enabled = false + singularity.enabled = false + podman.enabled = false + shifter.enabled = false + } + test { includeConfig 'conf/test.config' } + test_full { includeConfig 'conf/test_full.config' } } // Export these variables to prevent local Python/R libraries from conflicting with those in the container env { - PYTHONNOUSERSITE = 1 - R_PROFILE_USER = "/.Rprofile" - R_ENVIRON_USER = "/.Renviron" + PYTHONNOUSERSITE = 1 + R_PROFILE_USER = "/.Rprofile" + R_ENVIRON_USER = "/.Renviron" } // Capture exit codes from upstream processes when piping @@ -202,61 +138,61 @@ process.shell = ['/bin/bash', '-euo', 'pipefail'] def trace_timestamp = new java.util.Date().format( 'yyyy-MM-dd_HH-mm-ss') timeline { - enabled = true - file = "${params.tracedir}/execution_timeline_${trace_timestamp}.html" + enabled = true + file = "${params.tracedir}/execution_timeline_${trace_timestamp}.html" } report { - enabled = true - file = "${params.tracedir}/execution_report_${trace_timestamp}.html" + enabled = true + file = "${params.tracedir}/execution_report_${trace_timestamp}.html" } trace { - enabled = true - file = "${params.tracedir}/execution_trace_${trace_timestamp}.txt" + enabled = true + file = "${params.tracedir}/execution_trace_${trace_timestamp}.txt" } dag { - enabled = true - file = "${params.tracedir}/pipeline_dag_${trace_timestamp}.svg" + enabled = true + file = "${params.tracedir}/pipeline_dag_${trace_timestamp}.svg" } manifest { - name = 'nf-core/hic' - author = 'Nicolas Servant' - homePage = 'https://github.com/nf-core/hic' - description = 'Analysis of Chromosome Conformation Capture data (Hi-C)' - mainScript = 'main.nf' - nextflowVersion = '>=20.04.0' - version = '1.3.0' + name = 'nf-core/hic' + author = 'Nicolas Servant' + homePage = 'https://github.com/nf-core/hic' + description = 'Analysis of Chromosome Conformation Capture data (Hi-C)' + mainScript = 'main.nf' + nextflowVersion = '!>=21.04.0' + version = '1.3.0' } // Function to ensure that resource requirements don't go beyond // a maximum limit def check_max(obj, type) { - if (type == 'memory') { - try { - if (obj.compareTo(params.max_memory as nextflow.util.MemoryUnit) == 1) - return params.max_memory as nextflow.util.MemoryUnit - else - return obj - } catch (all) { - println " ### ERROR ### Max memory '${params.max_memory}' is not valid! Using default value: $obj" - return obj + if (type == 'memory') { + try { + if (obj.compareTo(params.max_memory as nextflow.util.MemoryUnit) == 1) + return params.max_memory as nextflow.util.MemoryUnit + else + return obj + } catch (all) { + println " ### ERROR ### Max memory '${params.max_memory}' is not valid! Using default value: $obj" + return obj + } + } else if (type == 'time') { + try { + if (obj.compareTo(params.max_time as nextflow.util.Duration) == 1) + return params.max_time as nextflow.util.Duration + else + return obj + } catch (all) { + println " ### ERROR ### Max time '${params.max_time}' is not valid! Using default value: $obj" + return obj + } + } else if (type == 'cpus') { + try { + return Math.min( obj, params.max_cpus as int ) + } catch (all) { + println " ### ERROR ### Max cpus '${params.max_cpus}' is not valid! Using default value: $obj" + return obj + } } - } else if (type == 'time') { - try { - if (obj.compareTo(params.max_time as nextflow.util.Duration) == 1) - return params.max_time as nextflow.util.Duration - else - return obj - } catch (all) { - println " ### ERROR ### Max time '${params.max_time}' is not valid! Using default value: $obj" - return obj - } - } else if (type == 'cpus') { - try { - return Math.min( obj, params.max_cpus as int ) - } catch (all) { - println " ### ERROR ### Max cpus '${params.max_cpus}' is not valid! Using default value: $obj" - return obj - } - } -} \ No newline at end of file +} diff --git a/nextflow_schema.json b/nextflow_schema.json index 7fe34b7c68bcc906fbaa437aab40a53ae0d41bfc..f8154bdd36c9f9269222395f1cbd70b83c140cff 100644 --- a/nextflow_schema.json +++ b/nextflow_schema.json @@ -16,19 +16,17 @@ "properties": { "input": { "type": "string", - "fa_icon": "fas fa-dna", - "description": "Input FastQ files.", - "help_text": "Use this to specify the location of your input FastQ files. For example:\n\n```bash\n--input 'path/to/data/sample_*_{1,2}.fastq'\n```\n\nPlease note the following requirements:\n\n1. The path must be enclosed in quotes\n2. The path must have at least one `*` wildcard character\n3. When using the pipeline with paired end data, the path must use `{1,2}` notation to specify read pairs.\n\nIf left unspecified, a default pattern is used: `data/*{1,2}.fastq.gz`" - }, - "input_paths": { - "type": "string", - "hidden": true, - "description": "Input FastQ files for test only", - "default": "undefined" + "format": "file-path", + "mimetype": "text/csv", + "pattern": "^\\S+\\.csv$", + "schema": "assets/schema_input.json", + "description": "Path to comma-separated file containing information about the samples in the experiment.", + "help_text": "You will need to create a design file with information about the samples in your experiment before running the pipeline. Use this parameter to specify its location. It has to be a comma-separated file with 3 columns, and a header row. See [usage docs](https://nf-co.re/hic/usage#samplesheet-input).", + "fa_icon": "fas fa-file-csv" }, "outdir": { "type": "string", - "description": "The output directory where the results will be saved.", + "description": "Path to the output directory where the results will be saved.", "default": "./results", "fa_icon": "fas fa-folder-open" }, @@ -38,6 +36,11 @@ "fa_icon": "fas fa-envelope", "help_text": "Set this parameter to your e-mail address to get a summary e-mail with details of the run sent to you when the workflow exits. If set in your user config file (`~/.nextflow/config`) then you don't need to specify this on the command line for every run.", "pattern": "^([a-zA-Z0-9_\\-\\.]+)@([a-zA-Z0-9_\\-\\.]+)\\.([a-zA-Z]{2,5})$" + }, + "multiqc_title": { + "type": "string", + "description": "MultiQC report title. Printed as page header, used for filename if not otherwise specified.", + "fa_icon": "fas fa-file-signature" } } }, @@ -45,22 +48,26 @@ "title": "Reference genome options", "type": "object", "fa_icon": "fas fa-dna", - "description": "Options for the reference genome indices used to align reads.", + "description": "Reference genome related files and options required for the workflow.", "properties": { "genome": { "type": "string", "description": "Name of iGenomes reference.", "fa_icon": "fas fa-book", - "help_text": "If using a reference genome configured in the pipeline using iGenomes, use this parameter to give the ID for the reference. This is then used to build the full paths for all required reference genome files e.g. `--genome GRCh38`.\n\nSee the [nf-core website docs](https://nf-co.re/usage/reference_genomes) for more details." + "help_text": "If using a reference genome configured in the pipeline using iGenomes, use this parameter to give the ID for the reference. This is then used to build the full paths for all required reference genome files e.g. `--genome GRCh38`. \n\nSee the [nf-core website docs](https://nf-co.re/usage/reference_genomes) for more details." }, "fasta": { "type": "string", - "fa_icon": "fas fa-font", + "format": "file-path", + "mimetype": "text/plain", + "pattern": "^\\S+\\.fn?a(sta)?(\\.gz)?$", "description": "Path to FASTA genome file.", - "help_text": "If you have no genome reference available, the pipeline can build one using a FASTA file. This requires additional time and resources, so it's better to use a pre-build index if possible." + "help_text": "This parameter is *mandatory* if `--genome` is not specified. If you don't have a BWA index available this will be generated for you automatically. Combine with `--save_reference` to save BWA index for future runs.", + "fa_icon": "far fa-file-code" }, "igenomes_base": { "type": "string", + "format": "directory-path", "description": "Directory / URL base for iGenomes references.", "default": "s3://ngi-igenomes/igenomes", "fa_icon": "fas fa-cloud-download-alt", @@ -72,260 +79,95 @@ "fa_icon": "fas fa-ban", "hidden": true, "help_text": "Do not load `igenomes.config` when running the pipeline. You may choose this option if you observe clashes between custom parameters and those supplied in `igenomes.config`." - }, - "bwt2_index": { - "type": "string", - "description": "Full path to directory containing Bowtie index including base name. i.e. `/path/to/index/base`.", - "fa_icon": "far fa-file-alt" } } }, - "digestion_hi_c": { - "title": "Digestion Hi-C", + "institutional_config_options": { + "title": "Institutional config options", "type": "object", - "description": "Parameters for protocols based on restriction enzyme", - "default": "", + "fa_icon": "fas fa-university", + "description": "Parameters used to describe centralised config profiles. These should not be edited.", + "help_text": "The centralised nf-core configuration profiles use a handful of pipeline parameters to describe themselves. This information is then printed to the Nextflow log when you run a pipeline. You should not need to change these values when you run a pipeline.", "properties": { - "digestion": { + "custom_config_version": { "type": "string", - "default": "hindiii", - "description": "Name of restriction enzyme to automatically set the restriction_site and ligation_site options" + "description": "Git commit id for Institutional configs.", + "default": "master", + "hidden": true, + "fa_icon": "fas fa-users-cog" }, - "restriction_site": { + "custom_config_base": { "type": "string", - "default": "'A^AGCTT'", - "description": "Restriction motifs used during digestion. Several motifs (comma separated) can be provided." + "description": "Base directory for Institutional configs.", + "default": "https://raw.githubusercontent.com/nf-core/configs/master", + "hidden": true, + "help_text": "If you're running offline, Nextflow will not be able to fetch the institutional config files from the internet. If you don't need them, then this is not a problem. If you do need them, you should download the files from the repo and tell Nextflow where to find them with this parameter.", + "fa_icon": "fas fa-users-cog" }, - "ligation_site": { + "hostnames": { "type": "string", - "default": "'AAGCTAGCTT", - "description": "Expected motif after DNA ligation. Several motifs (comma separated) can be provided." + "description": "Institutional configs hostname.", + "hidden": true, + "fa_icon": "fas fa-users-cog" }, - "chromosome_size": { + "config_profile_name": { "type": "string", - "description": "Full path to file specifying chromosome sizes (tab separated with chromosome name and size)`.", - "fa_icon": "far fa-file-alt", - "help_text": "If not specified, the pipeline will build this file from the reference genome file" + "description": "Institutional config name.", + "hidden": true, + "fa_icon": "fas fa-users-cog" }, - "restriction_fragments": { + "config_profile_description": { "type": "string", - "description": "Full path to restriction fragment (bed) file.", - "fa_icon": "far fa-file-alt", - "help_text": "This file depends on the Hi-C protocols and digestion strategy. If not provided, the pipeline will build it using the --restriction_site option" - }, - "save_reference": { - "type": "boolean", - "description": "If generated by the pipeline save the annotation and indexes in the results directory.", - "help_text": "Use this parameter to save all annotations to your results folder. These can then be used for future pipeline runs, reducing processing times.", - "fa_icon": "fas fa-save" - }, - "save_nonvalid_pairs": { - "type": "boolean", - "description": "Save the non valid pairs detected by HiC-Pro.", - "help_text": "Use this parameter to save non valid pairs detected by HiC-Pro (dangling-end, self-circle, re-ligation, filtered).", - "fa_icon": "fas fa-save" - } - } - }, - "dnase_hi_c": { - "title": "DNAse Hi-C", - "type": "object", - "description": "Parameters for protocols based on DNAse digestion", - "default": "", - "properties": { - "dnase": { - "type": "boolean", - "description": "For Hi-C protocols which are not based on enzyme digestion such as DNase Hi-C" - }, - "min_cis_dist": { - "type": "integer", - "description": "Minimum distance between loci to consider. Useful for --dnase mode to remove spurious ligation products. Only values > 0 are considered" - } - } - }, - "alignments": { - "title": "Alignments", - "type": "object", - "description": "Parameters for reads aligments", - "default": "", - "fa_icon": "fas fa-bahai", - "properties": { - "split_fastq": { - "type": "boolean", - "description": "Split the reads into chunks before running the pipelne", - "fa_icon": "fas fa-dna" - }, - "fastq_chunks_size": { - "type": "integer", - "description": "Read number per chunks if split_fastq is used", - "default": 20000000 - }, - "min_mapq": { - "type": "integer", - "default": 10, - "description": "Keep aligned reads with a minimum quality value" + "description": "Institutional config description.", + "hidden": true, + "fa_icon": "fas fa-users-cog" }, - "bwt2_opts_end2end": { + "config_profile_contact": { "type": "string", - "default": "'--very-sensitive -L 30 --score-min L,-0.6,-0.2 --end-to-end --reorder'", - "description": "Option for HiC-Pro end-to-end bowtie mapping" + "description": "Institutional config contact information.", + "hidden": true, + "fa_icon": "fas fa-users-cog" }, - "bwt2_opts_trimmed": { + "config_profile_url": { "type": "string", - "default": "'--very-sensitive -L 20 --score-min L,-0.6,-0.2 --end-to-end --reorder'", - "description": "Option for HiC-Pro trimmed reads mapping" - }, - "save_aligned_intermediates": { - "type": "boolean", - "description": "Save all BAM files during two-steps mapping" - } - } - }, - "valid_pairs_detection": { - "title": "Valid Pairs Detection", - "type": "object", - "description": "Options to call significant interactions", - "default": "", - "fa_icon": "fas fa-signature", - "properties": { - "keep_dups": { - "type": "boolean", - "description": "Keep duplicated reads" - }, - "keep_multi": { - "type": "boolean", - "description": "Keep multi-aligned reads" - }, - "max_insert_size": { - "type": "integer", - "description": "Maximum fragment size to consider. Only values > 0 are considered" - }, - "min_insert_size": { - "type": "integer", - "description": "Minimum fragment size to consider. Only values > 0 are considered" - }, - "max_restriction_fragment_size": { - "type": "integer", - "description": "Maximum restriction fragment size to consider. Only values > 0 are considered" - }, - "min_restriction_fragment_size": { - "type": "integer", - "description": "Minimum restriction fragment size to consider. Only values > 0 are considered" - }, - "save_interaction_bam": { - "type": "boolean", - "description": "Save a BAM file where all reads are flagged by their interaction classes" + "description": "Institutional config URL link.", + "hidden": true, + "fa_icon": "fas fa-users-cog" } } }, - "contact_maps": { - "title": "Contact maps", + "max_job_request_options": { + "title": "Max job request options", "type": "object", - "description": "Options to build Hi-C contact maps", - "default": "", - "fa_icon": "fas fa-chess-board", + "fa_icon": "fab fa-acquisitions-incorporated", + "description": "Set the top limit for requested resources for any single job.", + "help_text": "If you are running on a smaller system, a pipeline step requesting more resources than are available may cause the Nextflow to stop the run with an error. These options allow you to cap the maximum resources requested by any single job so that the pipeline will run on your system.\n\nNote that you can not _increase_ the resources requested by any job using these options. For that you will need your own configuration file. See [the nf-core website](https://nf-co.re/usage/configuration) for details.", "properties": { - "bin_size": { - "type": "string", - "pattern": "^(\\d+)(,\\d+)*$", - "default": "1000000,500000", - "description": "Resolution to build the maps (comma separated)" - }, - "hicpro_maps": { - "type": "boolean", - "description": "Generate raw and normalized contact maps with HiC-Pro" - }, - "ice_filter_low_count_perc": { - "type": "number", - "default": 0.02, - "description": "Filter low counts rows before HiC-Pro normalization" - }, - "ice_filter_high_count_perc": { - "type": "integer", - "description": "Filter high counts rows before HiC-Pro normalization" - }, - "ice_eps": { - "type": "number", - "default": 0.1, - "description": "Threshold for HiC-Pro ICE convergence" - }, - "ice_max_iter": { + "max_cpus": { "type": "integer", - "default": 100, - "description": "Maximum number of iteraction for HiC-Pro ICE normalization" - }, - "res_zoomify": { - "type": "string", - "default": "5000", - "description": "Maximum resolution to build mcool file" - } - } - }, - "downstream_analysis": { - "title": "Downstream Analysis", - "type": "object", - "description": "Set up downstream analysis from contact maps", - "default": "", - "properties": { - "res_dist_decay": { - "type": "string", - "pattern": "^(\\d+)(,\\d+)*$", - "default": "1000000", - "description": "Resolution to build count/distance plot" - }, - "tads_caller": { - "type": "string", - "default": "hicexplorer,insulation", - "description": "Define methods for TADs calling" - }, - "res_tads": { - "type": "string", - "pattern": "^(\\d+)(,\\d+)*$", - "default": "40000,20000", - "description": "Resolution to run TADs callers (comma separated)" + "description": "Maximum number of CPUs that can be requested for any single job.", + "default": 16, + "fa_icon": "fas fa-microchip", + "hidden": true, + "help_text": "Use to set an upper-limit for the CPU requirement for each process. Should be an integer e.g. `--max_cpus 1`" }, - "res_compartments": { + "max_memory": { "type": "string", - "pattern": "^(\\d+)(,\\d+)*$", - "default": "250000", - "description": "Resolution for compartments calling" - } - } - }, - "skip_options": { - "title": "Skip options", - "type": "object", - "description": "Skip some steps of the pipeline", - "default": "", - "fa_icon": "fas fa-random", - "properties": { - "skip_maps": { - "type": "boolean", - "description": "Do not build contact maps" - }, - "skip_dist_decay": { - "type": "boolean", - "description": "Do not run distance/decay plot" - }, - "skip_tads": { - "type": "boolean", - "description": "Do not run TADs calling" + "description": "Maximum amount of memory that can be requested for any single job.", + "default": "128.GB", + "fa_icon": "fas fa-memory", + "pattern": "^\\d+(\\.\\d+)?\\.?\\s*(K|M|G|T)?B$", + "hidden": true, + "help_text": "Use to set an upper-limit for the memory requirement for each process. Should be a string in the format integer-unit e.g. `--max_memory '8.GB'`" }, - "skip_compartments": { + "max_time": { "type": "string", - "description": "Do not run compartments calling" - }, - "skip_balancing": { - "type": "boolean", - "description": "Do not run cooler balancing normalization" - }, - "skip_mcool": { - "type": "boolean", - "description": "Do not generate mcool file for Higlass visualization" - }, - "skip_multiqc": { - "type": "boolean", - "description": "Do not generate MultiQC report" + "description": "Maximum amount of time that can be requested for any single job.", + "default": "240.h", + "fa_icon": "far fa-clock", + "pattern": "^(\\d+\\.?\\s*(s|m|h|day)\\s*)+$", + "hidden": true, + "help_text": "Use to set an upper-limit for the time requirement for each process. Should be a string in the format integer-unit e.g. `--max_time '2.h'`" } } }, @@ -339,13 +181,12 @@ "help": { "type": "boolean", "description": "Display help text.", - "hidden": true, - "fa_icon": "fas fa-question-circle" + "fa_icon": "fas fa-question-circle", + "hidden": true }, "publish_dir_mode": { "type": "string", "default": "copy", - "hidden": true, "description": "Method used to save pipeline results to output directory.", "help_text": "The Nextflow `publishDir` option specifies which intermediate files should be saved to the output directory. This option tells the pipeline what method should be used to move these files. See [Nextflow docs](https://www.nextflow.io/docs/latest/process.html#publishdir) for details.", "fa_icon": "fas fa-copy", @@ -356,13 +197,7 @@ "copy", "copyNoFollow", "move" - ] - }, - "validate_params": { - "type": "boolean", - "description": "Boolean whether to validate parameters against the schema at runtime", - "default": true, - "fa_icon": "fas fa-check-square", + ], "hidden": true }, "email_on_fail": { @@ -370,30 +205,28 @@ "description": "Email address for completion summary, only when pipeline fails.", "fa_icon": "fas fa-exclamation-triangle", "pattern": "^([a-zA-Z0-9_\\-\\.]+)@([a-zA-Z0-9_\\-\\.]+)\\.([a-zA-Z]{2,5})$", - "hidden": true, - "help_text": "This works exactly as with `--email`, except emails are only sent if the workflow is not successful." + "help_text": "An email address to send a summary email to when the pipeline is completed - ONLY sent if the pipeline does not exit successfully.", + "hidden": true }, "plaintext_email": { "type": "boolean", "description": "Send plain-text email instead of HTML.", "fa_icon": "fas fa-remove-format", - "hidden": true, - "help_text": "Set to receive plain-text e-mails instead of HTML formatted." + "hidden": true }, "max_multiqc_email_size": { "type": "string", "description": "File size limit when attaching MultiQC reports to summary emails.", + "pattern": "^\\d+(\\.\\d+)?\\.?\\s*(K|M|G|T)?B$", "default": "25.MB", "fa_icon": "fas fa-file-upload", - "hidden": true, - "help_text": "If file generated by pipeline exceeds the threshold, it will not be attached." + "hidden": true }, "monochrome_logs": { "type": "boolean", "description": "Do not use coloured log outputs.", "fa_icon": "fas fa-palette", - "hidden": true, - "help_text": "Set to disable colourful command line output and live life in monochrome." + "hidden": true }, "multiqc_config": { "type": "string", @@ -408,101 +241,32 @@ "fa_icon": "fas fa-cogs", "hidden": true }, + "validate_params": { + "type": "boolean", + "description": "Boolean whether to validate parameters against the schema at runtime", + "default": true, + "fa_icon": "fas fa-check-square", + "hidden": true + }, "show_hidden_params": { "type": "boolean", "fa_icon": "far fa-eye-slash", "description": "Show all params when using `--help`", "hidden": true, "help_text": "By default, parameters set as _hidden_ in the schema are not shown on the command line when a user runs with `--help`. Specifying this option will tell the pipeline to show all parameters." - } - } - }, - "max_job_request_options": { - "title": "Max job request options", - "type": "object", - "fa_icon": "fab fa-acquisitions-incorporated", - "description": "Set the top limit for requested resources for any single job.", - "help_text": "If you are running on a smaller system, a pipeline step requesting more resources than are available may cause the Nextflow to stop the run with an error. These options allow you to cap the maximum resources requested by any single job so that the pipeline will run on your system.\n\nNote that you can not _increase_ the resources requested by any job using these options. For that you will need your own configuration file. See [the nf-core website](https://nf-co.re/usage/configuration) for details.", - "properties": { - "max_cpus": { - "type": "integer", - "description": "Maximum number of CPUs that can be requested for any single job.", - "default": 16, - "fa_icon": "fas fa-microchip", - "hidden": true, - "help_text": "Use to set an upper-limit for the CPU requirement for each process. Should be an integer e.g. `--max_cpus 1`" - }, - "max_memory": { - "type": "string", - "description": "Maximum amount of memory that can be requested for any single job.", - "default": "128.GB", - "fa_icon": "fas fa-memory", - "pattern": "^\\d+(\\.\\d+)?\\.?\\s*(K|M|G|T)?B$", - "hidden": true, - "help_text": "Use to set an upper-limit for the memory requirement for each process. Should be a string in the format integer-unit e.g. `--max_memory '8.GB'`" - }, - "max_time": { - "type": "string", - "description": "Maximum amount of time that can be requested for any single job.", - "default": "240.h", - "fa_icon": "far fa-clock", - "pattern": "^(\\d+\\.?\\s*(s|m|h|day)\\s*)+$", - "hidden": true, - "help_text": "Use to set an upper-limit for the time requirement for each process. Should be a string in the format integer-unit e.g. `--max_time '2.h'`" - } - } - }, - "institutional_config_options": { - "title": "Institutional config options", - "type": "object", - "fa_icon": "fas fa-university", - "description": "Parameters used to describe centralised config profiles. These should not be edited.", - "help_text": "The centralised nf-core configuration profiles use a handful of pipeline parameters to describe themselves. This information is then printed to the Nextflow log when you run a pipeline. You should not need to change these values when you run a pipeline.", - "properties": { - "custom_config_version": { - "type": "string", - "description": "Git commit id for Institutional configs.", - "default": "master", - "hidden": true, - "fa_icon": "fas fa-users-cog", - "help_text": "Provide git commit id for custom Institutional configs hosted at `nf-core/configs`. This was implemented for reproducibility purposes. Default: `master`.\n\n```bash\n## Download and use config file with following git commit id\n--custom_config_version d52db660777c4bf36546ddb188ec530c3ada1b96\n```" - }, - "custom_config_base": { - "type": "string", - "description": "Base directory for Institutional configs.", - "default": "https://raw.githubusercontent.com/nf-core/configs/master", - "hidden": true, - "help_text": "If you're running offline, nextflow will not be able to fetch the institutional config files from the internet. If you don't need them, then this is not a problem. If you do need them, you should download the files from the repo and tell nextflow where to find them with the `custom_config_base` option. For example:\n\n```bash\n## Download and unzip the config files\ncd /path/to/my/configs\nwget https://github.com/nf-core/configs/archive/master.zip\nunzip master.zip\n\n## Run the pipeline\ncd /path/to/my/data\nnextflow run /path/to/pipeline/ --custom_config_base /path/to/my/configs/configs-master/\n```\n\n> Note that the nf-core/tools helper package has a `download` command to download all required pipeline files + singularity containers + institutional configs in one go for you, to make this process easier.", - "fa_icon": "fas fa-users-cog" }, - "hostnames": { - "type": "string", - "description": "Institutional configs hostname.", - "hidden": true, - "fa_icon": "fas fa-users-cog" - }, - "config_profile_name": { - "type": "string", - "description": "Institutional config name", - "hidden": true - }, - "config_profile_description": { - "type": "string", - "description": "Institutional config description.", + "enable_conda": { + "type": "boolean", + "description": "Run this workflow with Conda. You can also use '-profile conda' instead of providing this parameter.", "hidden": true, - "fa_icon": "fas fa-users-cog" + "fa_icon": "fas fa-bacon" }, - "config_profile_contact": { - "type": "string", - "description": "Institutional config contact information.", - "hidden": true, - "fa_icon": "fas fa-users-cog" - }, - "config_profile_url": { - "type": "string", - "description": "Institutional config URL link.", + "singularity_pull_docker_container": { + "type": "boolean", + "description": "Instead of directly downloading Singularity images for use with Singularity, force the workflow to pull and convert Docker containers instead.", "hidden": true, - "fa_icon": "fas fa-users-cog" + "fa_icon": "fas fa-toolbox", + "help_text": "This may be useful for example if you are unable to directly pull Singularity containers to run the pipeline due to http/https proxy issues." } } } @@ -515,34 +279,13 @@ "$ref": "#/definitions/reference_genome_options" }, { - "$ref": "#/definitions/digestion_hi_c" - }, - { - "$ref": "#/definitions/dnase_hi_c" - }, - { - "$ref": "#/definitions/alignments" - }, - { - "$ref": "#/definitions/valid_pairs_detection" - }, - { - "$ref": "#/definitions/contact_maps" - }, - { - "$ref": "#/definitions/downstream_analysis" - }, - { - "$ref": "#/definitions/skip_options" - }, - { - "$ref": "#/definitions/generic_options" + "$ref": "#/definitions/institutional_config_options" }, { "$ref": "#/definitions/max_job_request_options" }, { - "$ref": "#/definitions/institutional_config_options" + "$ref": "#/definitions/generic_options" } ] } diff --git a/workflows/hic.nf b/workflows/hic.nf index 991eb0a052d039cdeb198206cd1ab37090857112..e0e809641395645a6756a893058d5b00e7aff203 100644 --- a/workflows/hic.nf +++ b/workflows/hic.nf @@ -49,6 +49,7 @@ include { HIC_PRO } from '../subworkflows/local/hicpro' addParams( options: [:] include { COOLER } from '../subworkflows/local/cooler' addParams( options: [:] ) include { COMPARTMENTS } from '../subworkflows/local/compartments' addParams( options: [:] ) include { TADS } from '../subworkflows/local/tads' addParams( options: [:] ) + /* ======================================================================================== IMPORT NF-CORE MODULES/SUBWORKFLOWS