diff --git a/.github/CONTRIBUTING.md b/.github/CONTRIBUTING.md
index 62ad6c259efad78784730845c6e83ac9e7b158ab..25ef2ed3c3c87f3ab115e92de6d830fdb6718b4d 100644
--- a/.github/CONTRIBUTING.md
+++ b/.github/CONTRIBUTING.md
@@ -9,9 +9,7 @@ Please use the pre-filled template to save time.
 However, don't be put off by this template - other more general issues and suggestions are welcome! Contributions to the code are even more welcome ;)
-> If you need help using or modifying nf-core/hic then the best place to ask is on the nf-core
-Slack [#hic](https://nfcore.slack.com/channels/hic) channel ([join our Slack here](https://nf-co.re/join/slack)).
-
+> If you need help using or modifying nf-core/hic then the best place to ask is on the nf-core Slack [#hic](https://nfcore.slack.com/channels/hic) channel ([join our Slack here](https://nf-co.re/join/slack)).
 ## Contribution workflow
@@ -20,8 +18,9 @@ If you'd like to write some code for nf-core/hic, the standard workflow is as fo
 1. Check that there isn't already an issue about your idea in the [nf-core/hic issues](https://github.com/nf-core/hic/issues) to avoid duplicating work
    * If there isn't one already, please create one so that others know you're working on this
 2. [Fork](https://help.github.com/en/github/getting-started-with-github/fork-a-repo) the [nf-core/hic repository](https://github.com/nf-core/hic) to your GitHub account
-3. Make the necessary changes / additions within your forked repository
-4. Submit a Pull Request against the `dev` branch and wait for the code to be reviewed and merged
+3. Make the necessary changes / additions within your forked repository following [Pipeline conventions](#pipeline-contribution-conventions)
+4. Use `nf-core schema build .` and add any new parameters to the pipeline JSON schema (requires [nf-core tools](https://github.com/nf-core/tools) >= 1.10).
+5. Submit a Pull Request against the `dev` branch and wait for the code to be reviewed and merged
 If you're not used to this workflow with git, you can start with some [docs from GitHub](https://help.github.com/en/github/collaborating-with-issues-and-pull-requests) or even their [excellent `git` resources](https://try.github.io/).
@@ -32,14 +31,14 @@ Typically, pull-requests are only fully reviewed when these tests are passing, t
 There are typically two types of tests that run:
-### Lint Tests
+### Lint tests
 `nf-core` has a [set of guidelines](https://nf-co.re/developers/guidelines) which all pipelines must adhere to. To enforce these and ensure that all pipelines stay in sync, we have developed a helper tool which runs checks on the pipeline code. This is in the [nf-core/tools repository](https://github.com/nf-core/tools) and once installed can be run locally with the `nf-core lint <pipeline-directory>` command.
 If any failures or warnings are encountered, please follow the listed URL for more documentation.
-### Pipeline Tests
+### Pipeline tests
 Each `nf-core` pipeline should be set up with a minimal set of test-data. `GitHub Actions` then runs the pipeline on this data to ensure that it exits successfully.
@@ -55,8 +54,75 @@ These tests are run both with the latest available version of `Nextflow` and als
 * A PR should be made on `master` from `patch` to directly address this particular bug.
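+Before opening a pull request, you can reproduce most of these checks locally. A minimal sketch, assuming you have [nf-core tools](https://github.com/nf-core/tools) >= 1.10 and Docker available (the `pip` line is just one way to install the tools):
+
+```bash
+pip install 'nf-core>=1.10'          # nf-core helper tools (one install option)
+nf-core lint .                       # run the nf-core lint tests on the pipeline
+nf-core schema build .               # update nextflow_schema.json for any new parameters
+nextflow run . -profile test,docker  # run the pipeline on the minimal test dataset
+```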
 ## Getting help
-For further information/help, please consult the [nf-core/hic documentation](https://nf-co.re/nf-core/hic/docs) and
-don't hesitate to get in touch on the nf-core Slack [#hic](https://nfcore.slack.com/channels/hic) channel
-([join our Slack here](https://nf-co.re/join/slack)).
+For further information/help, please consult the [nf-core/hic documentation](https://nf-co.re/hic/usage) and don't hesitate to get in touch on the nf-core Slack [#hic](https://nfcore.slack.com/channels/hic) channel ([join our Slack here](https://nf-co.re/join/slack)).
+
+## Pipeline contribution conventions
+
+To make the nf-core/hic code and processing logic more understandable for new contributors and to ensure quality, we semi-standardise the way the code and other contributions are written.
+
+### Adding a new step
+
+If you wish to contribute a new step, please use the following coding standards:
+
+1. Define the corresponding input channel into your new process from the expected previous process channel.
+2. Write the process block (see below).
+3. Define the output channel if needed (see below).
+4. Add any new flags/options to `nextflow.config` with a default (see below).
+5. Add any new flags/options to `nextflow_schema.json` with help text (with `nf-core schema build .`).
+6. Add any new flags/options to the help message (for integer/text parameters, print the corresponding `nextflow.config` default in the help text).
+7. Add sanity checks for all relevant parameters.
+8. Add any new software to the `scrape_software_versions.py` script in `bin/` and the version command to the `get_software_versions` process in `main.nf`.
+9. Test locally that the new code works properly and as expected.
+10. Add a new test command in `.github/workflows/ci.yml`.
+11. If applicable, add a [MultiQC](https://multiqc.info/) module.
+12. Update the MultiQC config `assets/multiqc_config.yaml` so that relevant suffixes, sample name clean-up patterns, the General Statistics Table column order, and module figures are set up correctly.
+13. Optional: Add any descriptions of MultiQC report sections and output files to `docs/output.md`.
+
+### Default values
+
+Parameters should be initialised / defined with default values in `nextflow.config` under the `params` scope.
+
+Once there, use `nf-core schema build .` to add to `nextflow_schema.json`.
+
+### Default processes resource requirements
+
+Sensible defaults for process resource requirements (CPUs / memory / time) for a process should be defined in `conf/base.config`. These should generally be specified generically with `withLabel:` selectors so they can be shared across multiple processes/steps of the pipeline. An nf-core standard set of labels that should be followed where possible can be seen in the [nf-core pipeline template](https://github.com/nf-core/tools/blob/master/nf_core/pipeline-template/%7B%7Bcookiecutter.name_noslash%7D%7D/conf/base.config), which defines the default process as a single-core process, and then different levels of multi-core configurations for increasingly large memory requirements, defined with standardised labels.
+
+The process resources can be passed on to the tool dynamically within the process with the `${task.cpus}` and `${task.memory}` variables in the `script:` block.
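+For example, a process `script:` block can forward these values straight to the tool's own options. A small sketch, not a real pipeline step - `bowtie2` merely stands in for any multi-threaded tool, and `${index}`/`${reads}` are placeholder variables:
+
+```bash
+## ${task.cpus} expands to the CPUs granted to this task by the configuration;
+## the tool name, input variables and output file are illustrative only.
+bowtie2 --threads ${task.cpus} -x ${index} -U ${reads} > mapped.sam
+```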
+### Naming schemes
+
+Please use the following naming schemes, to make it easy to understand what is going where.
+
+* initial process channel: `ch_output_from_<process>`
+* intermediate and terminal channels: `ch_<previousprocess>_for_<nextprocess>`
+
+### Nextflow version bumping
+
+If you are using a new feature from core Nextflow, you may bump the minimum required version of Nextflow in the pipeline with: `nf-core bump-version --nextflow . [min-nf-version]`
+
+### Software version reporting
+
+If you add a new tool to the pipeline, please ensure you add the tool's version information to the `get_software_versions` process.
+
+Add something like the following to the script block of the process:
+
+```bash
+<YOUR_TOOL> --version > v_<YOUR_TOOL>.txt 2>&1 || true
+```
+
+or
+
+```bash
+<YOUR_TOOL> --help | head -n 1 > v_<YOUR_TOOL>.txt 2>&1 || true
+```
+
+You then need to edit the script `bin/scrape_software_versions.py` to:
+
+1. Add a Python regex for your tool's `--version` output (as stored in the `v_<YOUR_TOOL>.txt` file), to ensure the version is reported as a `v` plus the version number, e.g. `v2.1.1`
+2. Add an HTML entry to the `OrderedDict` for formatting in MultiQC.
+
+### Images and figures
+
+For overview images and other documents we follow the nf-core [style guidelines and examples](https://nf-co.re/developers/design_guidelines).
diff --git a/.github/ISSUE_TEMPLATE/bug_report.md b/.github/ISSUE_TEMPLATE/bug_report.md
index 78c45d55dcf3cc4514fc8facc7a7a21be8f5d220..5ac2f7f52734df2cde27f756dcf7de5e4525463c 100644
--- a/.github/ISSUE_TEMPLATE/bug_report.md
+++ b/.github/ISSUE_TEMPLATE/bug_report.md
@@ -13,6 +13,13 @@ Thanks for telling us about a problem with the pipeline.
 Please delete this text and anything that's not relevant from the template below:
 -->
+## Check Documentation
+
+I have checked the following places for my error:
+
+- [ ] [nf-core website: troubleshooting](https://nf-co.re/usage/troubleshooting)
+- [ ] [nf-core/hic pipeline documentation](https://nf-co.re/hic/usage)
+
 ## Description of the bug
 <!-- A clear and concise description of what the bug is. -->
@@ -28,6 +35,13 @@ Steps to reproduce the behaviour:
 <!-- A clear and concise description of what you expected to happen. -->
+## Log files
+
+Have you provided the following extra information/files:
+
+- [ ] The command used to run the pipeline
+- [ ] The `.nextflow.log` file <!-- this is a hidden file in the directory where you launched the pipeline -->
+
 ## System
 - Hardware: <!-- [e.g. HPC, Desktop, Cloud...] -->
diff --git a/.github/ISSUE_TEMPLATE/feature_request.md b/.github/ISSUE_TEMPLATE/feature_request.md
index 1a92849358ce849acd440d8a6806aeeb8a8e92fc..2e01a5fe11f6ed4f3e5bfb4bcaff8c8b7bdc56d5 100644
--- a/.github/ISSUE_TEMPLATE/feature_request.md
+++ b/.github/ISSUE_TEMPLATE/feature_request.md
@@ -11,7 +11,6 @@ Hi there!
 Thanks for suggesting a new feature for the pipeline!
 Please delete this text and anything that's not relevant from the template below:
-
 -->
 ## Is your feature request related to a problem? Please describe
diff --git a/.github/PULL_REQUEST_TEMPLATE.md b/.github/PULL_REQUEST_TEMPLATE.md
index 0bdd57579b660e0b42eaa248a2b5e1a163ed95e3..fe95321696a9a5b3a88b419e21e2e593e375d493 100644
--- a/.github/PULL_REQUEST_TEMPLATE.md
+++ b/.github/PULL_REQUEST_TEMPLATE.md
@@ -13,9 +13,15 @@ Learn more about contributing: [CONTRIBUTING.md](https://github.com/nf-core/hic/
 ## PR checklist
-- [ ] This comment contains a description of changes (with reason)
-- [ ] `CHANGELOG.md` is updated
+- [ ] This comment contains a description of changes (with reason).
 - [ ] If you've fixed a bug or added code that should be tested, add tests!
-- [ ] Documentation in `docs` is updated
-- [ ] If necessary, also make a PR on the [nf-core/hic branch on the nf-core/test-datasets repo](https://github.com/nf-core/test-datasets/pull/new/nf-core/hic)
+  - [ ] If you've added a new tool - add it to the `get_software_versions` process and a regex to `scrape_software_versions.py`
+  - [ ] If you've added a new tool - make sure you have followed the pipeline conventions in the [contribution docs](https://github.com/nf-core/hic/tree/master/.github/CONTRIBUTING.md)
+  - [ ] If necessary, also make a PR on the nf-core/hic _branch_ on the [nf-core/test-datasets](https://github.com/nf-core/test-datasets) repository.
+- [ ] Make sure your code lints (`nf-core lint .`).
+- [ ] Ensure the test suite passes (`nextflow run . -profile test,docker`).
+- [ ] Usage Documentation in `docs/usage.md` is updated.
+- [ ] Output Documentation in `docs/output.md` is updated.
+- [ ] `CHANGELOG.md` is updated.
+- [ ] `README.md` is updated (including new tool citations and authors/contributors).
diff --git a/.github/markdownlint.yml b/.github/markdownlint.yml
index 96b12a70398f6870ef306f4d8a5afcebc8f96ba8..8d7eb53b07463c24bd981a479a7d0591fabf7463 100644
--- a/.github/markdownlint.yml
+++ b/.github/markdownlint.yml
@@ -1,5 +1,12 @@
 # Markdownlint configuration file
-default: true,
+default: true
 line-length: false
 no-duplicate-header:
   siblings_only: true
+no-inline-html:
+  allowed_elements:
+    - img
+    - p
+    - kbd
+    - details
+    - summary
diff --git a/.github/workflows/linting.yml b/.github/workflows/linting.yml
index 6367daba8be05f3cb37649750dad86f8dc88a3c6..6f2be6b08786ec7559b42df36d16c10790e60172 100644
--- a/.github/workflows/linting.yml
+++ b/.github/workflows/linting.yml
@@ -33,8 +33,10 @@ jobs:
   nf-core:
     runs-on: ubuntu-latest
     steps:
+
       - name: Check out pipeline code
         uses: actions/checkout@v2
+
       - name: Install Nextflow
         env:
           CAPSULE_LOG: none
@@ -72,5 +74,3 @@
           lint_log.txt
           lint_results.md
           PR_number.txt
-
-
diff --git a/CHANGELOG.md b/CHANGELOG.md
index 5ea324166000ae82243653f594f1438ee217745f..27746ee7adb1b77eef6c06d9b43ad51904213875 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -3,7 +3,7 @@
 The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/)
 and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
-## v1.3.0dev
+## v1.3.0dev
 * Add HiCExplorer distance decay quality control
 * Add HiCExplorer TADs calling
@@ -21,7 +21,7 @@
 * Fix bug in `--bin_size` parameter (#85)
 * `--min_mapq` is ignored if `--keep_multi` is used
-### Deprecated
+### `Deprecated`
 * `--rm_dup` and `--rm_multi` are replaced by `--keep_dup` and `--keep_multi`
diff --git a/CODE_OF_CONDUCT.md b/CODE_OF_CONDUCT.md
index 9d68eed2ae8c493a162c2294cdb7e5f229df6283..daea9ea82d791ff54b2b19755a09371e0ae330cc 100644
--- a/CODE_OF_CONDUCT.md
+++ b/CODE_OF_CONDUCT.md
@@ -54,13 +54,7 @@ project may be further defined and clarified by project maintainers.
 ## Enforcement
-Instances of abusive, harassing, or otherwise unacceptable behavior may be
-reported by contacting the project team on
-[Slack](https://nf-co.re/join/slack). The project team will review
-and investigate all complaints, and will respond in a way that it deems
-appropriate to the circumstances. The project team is obligated to maintain
-confidentiality with regard to the reporter of an incident.
Further details -of specific enforcement policies may be posted separately. +Instances of abusive, harassing, or otherwise unacceptable behavior may be reported by contacting the project team on [Slack](https://nf-co.re/join/slack). The project team will review and investigate all complaints, and will respond in a way that it deems appropriate to the circumstances. The project team is obligated to maintain confidentiality with regard to the reporter of an incident. Further details of specific enforcement policies may be posted separately. Project maintainers who do not follow or enforce the Code of Conduct in good faith may face temporary or permanent repercussions as determined by other @@ -68,9 +62,7 @@ members of the project's leadership. ## Attribution -This Code of Conduct is adapted from the [Contributor Covenant][homepage], -version 1.4, available at -[https://www.contributor-covenant.org/version/1/4/code-of-conduct/][version] +This Code of Conduct is adapted from the [Contributor Covenant][homepage], version 1.4, available at [https://www.contributor-covenant.org/version/1/4/code-of-conduct/][version] [homepage]: https://contributor-covenant.org [version]: https://www.contributor-covenant.org/version/1/4/code-of-conduct/ diff --git a/Dockerfile b/Dockerfile index 422e2e16938a79677dc89d9e0dce16d24d1e9a4c..35ffbe997bc475e7d4ed69da9508c1c5a0a6e426 100644 --- a/Dockerfile +++ b/Dockerfile @@ -1,10 +1,11 @@ -FROM nfcore/base:1.12 +FROM nfcore/base:1.12.1 LABEL authors="Nicolas Servant" \ description="Docker image containing all software requirements for the nf-core/hic pipeline" ## Install gcc for pip iced install RUN apt-get update && apt-get install -y gcc g++ && apt-get clean -y +# Install the conda environment COPY environment.yml / RUN conda env create --quiet -f /environment.yml && conda clean -a diff --git a/README.md b/README.md index 280a06a8d35806051c44260ff7646dd627ce8a16..8fd90f44a6ff1119c92085f8d464a8ac3d5fe72c 100644 --- a/README.md +++ b/README.md @@ -1,4 +1,4 @@ -#  +#  **Analysis of Chromosome Conformation Capture data (Hi-C)**. @@ -70,8 +70,6 @@ sites ([`bowtie2`](http://bowtie-bio.sourceforge.net/bowtie2/index.shtml)) nextflow run nf-core/hic -profile <docker/singularity/podman/conda/institute> --input '*_R{1,2}.fastq.gz' --genome GRCh37 ``` -See [usage docs](https://nf-co.re/hic/usage) for all of the available options when running the pipeline. - ## Documentation The nf-core/hic pipeline comes with documentation about the pipeline: [usage](https://nf-co.re/hic/usage) and [output](https://nf-co.re/hic/output). @@ -87,9 +85,7 @@ nf-core/hic was originally written by Nicolas Servant. If you would like to contribute to this pipeline, please see the [contributing guidelines](.github/CONTRIBUTING.md). -For further information or help, don't hesitate to get in touch on the -[Slack `#hic` channel](https://nfcore.slack.com/channels/hic) -(you can join with [this invite](https://nf-co.re/join/slack)). +For further information or help, don't hesitate to get in touch on the [Slack `#hic` channel](https://nfcore.slack.com/channels/hic) (you can join with [this invite](https://nf-co.re/join/slack)). ## Citation @@ -100,8 +96,15 @@ You can cite the `nf-core` publication as follows: > **The nf-core framework for community-curated bioinformatics pipelines.** > -> Philip Ewels, Alexander Peltzer, Sven Fillinger, Harshil Patel, Johannes Alneberg, -Andreas Wilm, Maxime Ulysse Garcia, Paolo Di Tommaso & Sven Nahnsen. 
+> Philip Ewels, Alexander Peltzer, Sven Fillinger, Harshil Patel, Johannes Alneberg, Andreas Wilm, Maxime Ulysse Garcia, Paolo Di Tommaso & Sven Nahnsen. > > _Nat Biotechnol._ 2020 Feb 13. doi: [10.1038/s41587-020-0439-x](https://dx.doi.org/10.1038/s41587-020-0439-x). > ReadCube: [Full Access Link](https://rdcu.be/b1GjZ) + +In addition, references of tools and data used in this pipeline are as follows: + +> **HiC-Pro: An optimized and flexible pipeline for Hi-C processing.** +> +> Nicolas Servant, Nelle Varoquaux, Bryan R. Lajoie, Eric Viara, Chongjian Chen, Jean-Philippe Vert, Job Dekker, Edith Heard, Emmanuel Barillot. +> +> Genome Biology 2015, 16:259 doi: [10.1186/s13059-015-0831-x](https://dx.doi.org/10.1186/s13059-015-0831-x) diff --git a/bin/mergeSAM.py b/bin/mergeSAM.py index 12917b16277a0a768269f611cd13422bccbe98a1..a907fd77438307ffc808ce7d5ac0d7684c22f5f8 100755 --- a/bin/mergeSAM.py +++ b/bin/mergeSAM.py @@ -52,16 +52,16 @@ def get_args(): def is_unique_bowtie2(read): - ret = False - if not read.is_unmapped and read.has_tag('AS'): - if read.has_tag('XS'): - primary = read.get_tag('AS') - secondary = read.get_tag('XS') - if (primary > secondary): - ret = True - else: - ret = True - return ret + ret = False + if not read.is_unmapped and read.has_tag('AS'): + if read.has_tag('XS'): + primary = read.get_tag('AS') + secondary = read.get_tag('XS') + if (primary > secondary): + ret = True + else: + ret = True + return ret ## Remove everything after "/" or " " in read's name def get_read_name(read): @@ -71,249 +71,239 @@ def get_read_name(read): def sam_flag(read1, read2, hr1, hr2): - f1 = read1.flag - f2 = read2.flag - - if r1.is_unmapped == False: - r1_chrom = hr1.get_reference_name(r1.reference_id) - else: - r1_chrom = "*" - if r2.is_unmapped == False: - r2_chrom = hr2.get_reference_name(r2.reference_id) - else: - r2_chrom="*" - - - ##Relevant bitwise flags (flag in an 11-bit binary number) - ##1 The read is one of a pair - ##2 The alignment is one end of a proper paired-end alignment - ##4 The read has no reported alignments - ##8 The read is one of a pair and has no reported alignments - ##16 The alignment is to the reverse reference strand - ##32 The other mate in the paired-end alignment is aligned to the reverse reference strand - ##64 The read is the first (#1) mate in a pair - ##128 The read is the second (#2) mate in a pair + f1 = read1.flag + f2 = read2.flag + + if r1.is_unmapped == False: + r1_chrom = hr1.get_reference_name(r1.reference_id) + else: + r1_chrom = "*" + if r2.is_unmapped == False: + r2_chrom = hr2.get_reference_name(r2.reference_id) + else: + r2_chrom="*" + + ##Relevant bitwise flags (flag in an 11-bit binary number) + ##1 The read is one of a pair + ##2 The alignment is one end of a proper paired-end alignment + ##4 The read has no reported alignments + ##8 The read is one of a pair and has no reported alignments + ##16 The alignment is to the reverse reference strand + ##32 The other mate in the paired-end alignment is aligned to the reverse reference strand + ##64 The read is the first (#1) mate in a pair + ##128 The read is the second (#2) mate in a pair - ##The reads were mapped as single-end data, so should expect flags of - ##0 (map to the '+' strand) or 16 (map to the '-' strand) - ##Output example: a paired-end read that aligns to the reverse strand - ##and is the first mate in the pair will have flag 83 (= 64 + 16 + 2 + 1) + ##The reads were mapped as single-end data, so should expect flags of + ##0 (map to the '+' strand) or 16 (map to the '-' 
strand) + ##Output example: a paired-end read that aligns to the reverse strand + ##and is the first mate in the pair will have flag 83 (= 64 + 16 + 2 + 1) - if f1 & 0x4: - f1 = f1 | 0x8 + if f1 & 0x4: + f1 = f1 | 0x8 - if f2 & 0x4: - f2 = f2 | 0x8 + if f2 & 0x4: + f2 = f2 | 0x8 - if (not (f1 & 0x4) and not (f2 & 0x4)): - ##The flag should now indicate this is paired-end data - f1 = f1 | 0x1 - f1 = f1 | 0x2 - f2 = f2 | 0x1 - f2 = f2 | 0x2 - + if (not (f1 & 0x4) and not (f2 & 0x4)): + ##The flag should now indicate this is paired-end data + f1 = f1 | 0x1 + f1 = f1 | 0x2 + f2 = f2 | 0x1 + f2 = f2 | 0x2 - ##Indicate if the pair is on the reverse strand - if f1 & 0x10: - f2 = f2 | 0x20 + ##Indicate if the pair is on the reverse strand + if f1 & 0x10: + f2 = f2 | 0x20 - if f2 & 0x10: - f1 = f1 | 0x20 + if f2 & 0x10: + f1 = f1 | 0x20 - ##Is this first or the second pair? - f1 = f1 | 0x40 - f2 = f2 | 0x80 + ##Is this first or the second pair? + f1 = f1 | 0x40 + f2 = f2 | 0x80 ##Insert the modified bitwise flags into the reads - read1.flag = f1 - read2.flag = f2 + read1.flag = f1 + read2.flag = f2 - ##Determine the RNEXT and PNEXT values (i.e. the positional values of a read's pair) - #RNEXT - if r1_chrom == r2_chrom: - read1.next_reference_id = r1.reference_id - read2.next_reference_id = r1.reference_id - else: - read1.next_reference_id = r2.reference_id - read2.next_reference_id = r1.reference_id - #PNEXT - read1.next_reference_start = read2.reference_start - read2.next_reference_start = read1.reference_start + ##Determine the RNEXT and PNEXT values (i.e. the positional values of a read's pair) + #RNEXT + if r1_chrom == r2_chrom: + read1.next_reference_id = r1.reference_id + read2.next_reference_id = r1.reference_id + else: + read1.next_reference_id = r2.reference_id + read2.next_reference_id = r1.reference_id + #PNEXT + read1.next_reference_start = read2.reference_start + read2.next_reference_start = read1.reference_start - return(read1, read2) + return(read1, read2) if __name__ == "__main__": ## Read command line arguments - opts = get_args() - inputFile = None - outputFile = None - mapq = None - report_single = False - report_multi = False - verbose = False - stat = False - output = "-" - - if len(opts) == 0: - usage() - sys.exit() - - for opt, arg in opts: - if opt in ("-h", "--help"): - usage() - sys.exit() - elif opt in ("-f", "--forward"): - R1file = arg - elif opt in ("-r", "--reverse"): - R2file = arg - elif opt in ("-o", "--output"): - output = arg - elif opt in ("-q", "--qual"): - mapq = arg - elif opt in ("-s", "--single"): - report_single = True - elif opt in ("-m", "--multi"): - report_multi = True - elif opt in ("-t", "--stat"): - stat = True - elif opt in ("-v", "--verbose"): - verbose = True - else: - assert False, "unhandled option" + opts = get_args() + inputFile = None + outputFile = None + mapq = None + report_single = False + report_multi = False + verbose = False + stat = False + output = "-" + + if len(opts) == 0: + usage() + sys.exit() + + for opt, arg in opts: + if opt in ("-h", "--help"): + usage() + sys.exit() + elif opt in ("-f", "--forward"): + R1file = arg + elif opt in ("-r", "--reverse"): + R2file = arg + elif opt in ("-o", "--output"): + output = arg + elif opt in ("-q", "--qual"): + mapq = arg + elif opt in ("-s", "--single"): + report_single = True + elif opt in ("-m", "--multi"): + report_multi = True + elif opt in ("-t", "--stat"): + stat = True + elif opt in ("-v", "--verbose"): + verbose = True + else: + assert False, "unhandled option" ## Verbose mode - 
if verbose: - print("## mergeBAM.py") - print("## forward=", R1file) - print("## reverse=", R2file) - print("## output=", output) - print("## min mapq=", mapq) - print("## report_single=", report_single) - print("## report_multi=", report_multi) - print("## verbose=", verbose) + if verbose: + print("## mergeBAM.py") + print("## forward=", R1file) + print("## reverse=", R2file) + print("## output=", output) + print("## min mapq=", mapq) + print("## report_single=", report_single) + print("## report_multi=", report_multi) + print("## verbose=", verbose) ## Initialize variables - tot_pairs_counter = 0 - multi_pairs_counter = 0 - uniq_pairs_counter = 0 - unmapped_pairs_counter = 0 - lowq_pairs_counter = 0 - multi_singles_counter = 0 - uniq_singles_counter = 0 - lowq_singles_counter = 0 + tot_pairs_counter = 0 + multi_pairs_counter = 0 + uniq_pairs_counter = 0 + unmapped_pairs_counter = 0 + lowq_pairs_counter = 0 + multi_singles_counter = 0 + uniq_singles_counter = 0 + lowq_singles_counter = 0 #local_counter = 0 - paired_reads_counter = 0 - singleton_counter = 0 - reads_counter = 0 - r1 = None - r2 = None + paired_reads_counter = 0 + singleton_counter = 0 + reads_counter = 0 + r1 = None + r2 = None ## Reads are 0-based too (for both SAM and BAM format) ## Loop on all reads - if verbose: - print("## Merging forward and reverse tags ...") - with pysam.Samfile(R1file, "rb") as hr1, pysam.Samfile(R2file, "rb") as hr2: - if output == "-": - outfile = pysam.AlignmentFile(output, "w", template=hr1) - else: - outfile = pysam.AlignmentFile(output, "wb", template=hr1) - for r1, r2 in zip(hr1.fetch(until_eof=True), hr2.fetch(until_eof=True)): - reads_counter +=1 - - #print r1 - #print r2 - #print hr1.getrname(r1.tid) - #print hr2.getrname(r2.tid) - - if (reads_counter % 1000000 == 0 and verbose): - print("##", reads_counter) + if verbose: + print("## Merging forward and reverse tags ...") + + with pysam.Samfile(R1file, "rb") as hr1, pysam.Samfile(R2file, "rb") as hr2: + if output == "-": + outfile = pysam.AlignmentFile(output, "w", template=hr1) + else: + outfile = pysam.AlignmentFile(output, "wb", template=hr1) + + for r1, r2 in zip(hr1.fetch(until_eof=True), hr2.fetch(until_eof=True)): + reads_counter +=1 + if (reads_counter % 1000000 == 0 and verbose): + print("##", reads_counter) - if get_read_name(r1) == get_read_name(r2): + if get_read_name(r1) == get_read_name(r2): + ## both unmapped + if r1.is_unmapped == True and r2.is_unmapped == True: + unmapped_pairs_counter += 1 + continue - ## both unmapped - if r1.is_unmapped == True and r2.is_unmapped == True: - unmapped_pairs_counter += 1 - continue - ## both mapped - elif r1.is_unmapped == False and r2.is_unmapped == False: - ## quality - if mapq != None and (r1.mapping_quality < int(mapq) or r2.mapping_quality < int(mapq)): - lowq_pairs_counter += 1 - continue + elif r1.is_unmapped == False and r2.is_unmapped == False: + ## quality + if mapq != None and (r1.mapping_quality < int(mapq) or r2.mapping_quality < int(mapq)): + lowq_pairs_counter += 1 + continue - ## Unique mapping - if is_unique_bowtie2(r1) == True and is_unique_bowtie2(r2) == True: - uniq_pairs_counter += 1 - else: - multi_pairs_counter += 1 - if report_multi == False: - continue - # one end mapped, other is not - else: - singleton_counter += 1 - if report_single == False: - continue - if r1.is_unmapped == False: ## first end is mapped, second is not - ## quality - if mapq != None and (r1.mapping_quality < int(mapq)): - lowq_singles_counter += 1 - continue - ## Unique mapping - if 
is_unique_bowtie2(r1) == True:
-                    uniq_singles_counter += 1
-                else:
-                    multi_singles_counter += 1
-                    if report_multi == False:
-                        continue
-            else: ## second end is mapped, first is not
-                ## quality
-                if mapq != None and (r2.mapping_quality < int(mapq)):
-                    lowq_singles_counter += 1
-                    continue
-                ## Unique mapping
-                if is_unique_bowtie2(r2) == True:
-                    uniq_singles_counter += 1
-                else:
-                    multi_singles_counter += 1
-                    if report_multi == False:
-                        continue
+                ## Unique mapping
+                if is_unique_bowtie2(r1) == True and is_unique_bowtie2(r2) == True:
+                    uniq_pairs_counter += 1
+                else:
+                    multi_pairs_counter += 1
+                    if report_multi == False:
+                        continue
+
+            ## One mate mapped
+            else:
+                singleton_counter += 1
+                if report_single == False:
+                    continue
+                if r1.is_unmapped == False: ## first end is mapped, second is not
+                    ## quality
+                    if mapq != None and (r1.mapping_quality < int(mapq)):
+                        lowq_singles_counter += 1
+                        continue
+                    ## Unique mapping
+                    if is_unique_bowtie2(r1) == True:
+                        uniq_singles_counter += 1
+                    else:
+                        multi_singles_counter += 1
+                        if report_multi == False:
+                            continue
+                else: ## second end is mapped, first is not
+                    ## quality
+                    if mapq != None and (r2.mapping_quality < int(mapq)):
+                        lowq_singles_counter += 1
+                        continue
+                    ## Unique mapping
+                    if is_unique_bowtie2(r2) == True:
+                        uniq_singles_counter += 1
+                    else:
+                        multi_singles_counter += 1
+                        if report_multi == False:
+                            continue
+
+            tot_pairs_counter += 1
+            (r1, r2) = sam_flag(r1,r2, hr1, hr2)
-            tot_pairs_counter += 1
-            (r1, r2) = sam_flag(r1,r2, hr1, hr2)
-
-            #print hr1.getrname(r1.tid)
-            #print hr2.getrname(r2.tid)
-            #print r1
-            #print r2
             ## Write output
-            outfile.write(r1)
-            outfile.write(r2)
-
-        else:
-            print("Forward and reverse reads not paired. Check that BAM files have the same read names and are sorted.")
-            sys.exit(1)
-
-    if stat:
-        if output == '-':
-            statfile = "pairing.stat"
-        else:
-            statfile = re.sub('\.bam$', '.pairstat', output)
-        with open(statfile, 'w') as handle_stat:
-            handle_stat.write("Total_pairs_processed\t" + str(reads_counter) + "\t" + str(round(float(reads_counter)/float(reads_counter)*100,3)) + "\n")
-            handle_stat.write("Unmapped_pairs\t" + str(unmapped_pairs_counter) + "\t" + str(round(float(unmapped_pairs_counter)/float(reads_counter)*100,3)) + "\n")
-            handle_stat.write("Low_qual_pairs\t" + str(lowq_pairs_counter) + "\t" + str(round(float(lowq_pairs_counter)/float(reads_counter)*100,3)) + "\n")
-            handle_stat.write("Unique_paired_alignments\t" + str(uniq_pairs_counter) + "\t" + str(round(float(uniq_pairs_counter)/float(reads_counter)*100,3)) + "\n")
-            handle_stat.write("Multiple_pairs_alignments\t" + str(multi_pairs_counter) + "\t" + str(round(float(multi_pairs_counter)/float(reads_counter)*100,3)) + "\n")
-            handle_stat.write("Pairs_with_singleton\t" + str(singleton_counter) + "\t" + str(round(float(singleton_counter)/float(reads_counter)*100,3)) + "\n")
-            handle_stat.write("Low_qual_singleton\t" + str(lowq_singles_counter) + "\t" + str(round(float(lowq_singles_counter)/float(reads_counter)*100,3)) + "\n")
-            handle_stat.write("Unique_singleton_alignments\t" + str(uniq_singles_counter) + "\t" + str(round(float(uniq_singles_counter)/float(reads_counter)*100,3)) + "\n")
-            handle_stat.write("Multiple_singleton_alignments\t" + str(multi_singles_counter) + "\t" + str(round(float(multi_singles_counter)/float(reads_counter)*100,3)) + "\n")
-            handle_stat.write("Reported_pairs\t" + str(tot_pairs_counter) + "\t" + str(round(float(tot_pairs_counter)/float(reads_counter)*100,3)) + "\n")
-    hr1.close()
-    hr2.close()
-    outfile.close()
+            
outfile.write(r1) + outfile.write(r2) + + else: + print("Forward and reverse reads not paired. Check that BAM files have the same read names and are sorted.") + sys.exit(1) + + if stat: + if output == '-': + statfile = "pairing.stat" + else: + statfile = re.sub('\.bam$', '.pairstat', output) + with open(statfile, 'w') as handle_stat: + handle_stat.write("Total_pairs_processed\t" + str(reads_counter) + "\t" + str(round(float(reads_counter)/float(reads_counter)*100,3)) + "\n") + handle_stat.write("Unmapped_pairs\t" + str(unmapped_pairs_counter) + "\t" + str(round(float(unmapped_pairs_counter)/float(reads_counter)*100,3)) + "\n") + handle_stat.write("Low_qual_pairs\t" + str(lowq_pairs_counter) + "\t" + str(round(float(lowq_pairs_counter)/float(reads_counter)*100,3)) + "\n") + handle_stat.write("Unique_paired_alignments\t" + str(uniq_pairs_counter) + "\t" + str(round(float(uniq_pairs_counter)/float(reads_counter)*100,3)) + "\n") + handle_stat.write("Multiple_pairs_alignments\t" + str(multi_pairs_counter) + "\t" + str(round(float(multi_pairs_counter)/float(reads_counter)*100,3)) + "\n") + handle_stat.write("Pairs_with_singleton\t" + str(singleton_counter) + "\t" + str(round(float(singleton_counter)/float(reads_counter)*100,3)) + "\n") + handle_stat.write("Low_qual_singleton\t" + str(lowq_singles_counter) + "\t" + str(round(float(lowq_singles_counter)/float(reads_counter)*100,3)) + "\n") + handle_stat.write("Unique_singleton_alignments\t" + str(uniq_singles_counter) + "\t" + str(round(float(uniq_singles_counter)/float(reads_counter)*100,3)) + "\n") + handle_stat.write("Multiple_singleton_alignments\t" + str(multi_singles_counter) + "\t" + str(round(float(multi_singles_counter)/float(reads_counter)*100,3)) + "\n") + handle_stat.write("Reported_pairs\t" + str(tot_pairs_counter) + "\t" + str(round(float(tot_pairs_counter)/float(reads_counter)*100,3)) + "\n") + hr1.close() + hr2.close() + outfile.close() diff --git a/bin/scrape_software_versions.py b/bin/scrape_software_versions.py index f882f23325768249c6f6f0122f0d238bd9a7df37..5ff3fcfe270923ed0aeeec220e82a348a529b3e4 100755 --- a/bin/scrape_software_versions.py +++ b/bin/scrape_software_versions.py @@ -36,11 +36,6 @@ for k in list(results): if not results[k]: del results[k] -# Remove software set to false in results -for k in results: - if not results[k]: - del(results[k]) - # Dump to YAML print( """ @@ -61,4 +56,3 @@ print(" </dl>") with open("software_versions.csv", "w") as f: for k, v in results.items(): f.write("{}\t{}\n".format(k, v)) - diff --git a/conf/base.config b/conf/base.config index 157dd9548a110b9f2f710d3072850608fa9c2de5..c301031e67fecd8f4899b5bbc53f2c3adae9dcd3 100644 --- a/conf/base.config +++ b/conf/base.config @@ -10,7 +10,6 @@ */ process { - // nf-core: Check the defaults for all processes cpus = { check_max( 1 * task.attempt, 'cpus' ) } memory = { check_max( 7.GB * task.attempt, 'memory' ) } time = { check_max( 4.h * task.attempt, 'time' ) } diff --git a/conf/hicpro.config b/conf/hicpro.config deleted file mode 100644 index cd0cf0b5a54f860312f49ac193802d53964ce686..0000000000000000000000000000000000000000 --- a/conf/hicpro.config +++ /dev/null @@ -1,38 +0,0 @@ -/* - * ------------------------------------------------- - * Nextflow config file for Genomes paths - * ------------------------------------------------- - * Defines reference genomes - * Can be used by any config that customises the base - * path using $params.genomes_base / --genomes_base - */ - -params { - - // Alignment options - bwt2_opts_end2end = 
'--very-sensitive -L 30 --score-min L,-0.6,-0.2 --end-to-end --reorder'
-  bwt2_opts_trimmed = '--very-sensitive -L 20 --score-min L,-0.6,-0.2 --end-to-end --reorder'
-  min_mapq = 10
-
-  // Digestion Hi-C
-  restriction_site = 'A^AGCTT'
-  ligation_site = 'AAGCTAGCTT'
-  min_restriction_fragment_size =
-  max_restriction_fragment_size =
-  min_insert_size =
-  max_insert_size =
-
-  // Hi-C Processing
-  min_cis_dist =
-  rm_singleton = true
-  rm_multi = true
-  rm_dup = true
-
-  bin_size = '1000000,500000'
-
-  ice_max_iter = 100
-  ice_filer_low_count_perc = 0.02
-  ice_filer_high_count_perc = 0
-  ice_eps = 0.1
-}
-
diff --git a/conf/test.config b/conf/test.config
index 5b3ac0e1395c89032ebfe04e8688e62d1abb660f..5988a32428569c24f4f4d321ee727f617d3da202 100644
--- a/conf/test.config
+++ b/conf/test.config
@@ -8,8 +8,7 @@
 */
 params {
-
-config_profile_name = 'Hi-C test data from Schalbetter et al. (2017)'
+  config_profile_name = 'Hi-C test data from Schalbetter et al. (2017)'
   config_profile_description = 'Minimal test dataset to check pipeline function'
   // Limit resources so that this can run on Travis
@@ -25,12 +24,12 @@ config_profile_name = 'Hi-C test data from Schalbetter et al. (2017)'
   // Annotations
   fasta = 'https://github.com/nf-core/test-datasets/raw/hic/reference/W303_SGD_2015_JRIU00000000.fsa'
   digestion = 'hindiii'
-  min_mapq = 20
+  min_mapq = 10
   min_restriction_fragment_size = 100
   max_restriction_fragment_size = 100000
   min_insert_size = 100
   max_insert_size = 600
-
-  // Options
-  skip_cool = true
+
+  // HiCExplorer distance decay does not run on the test data
+  skip_dist_decay = true
 }
diff --git a/conf/test_full.config b/conf/test_full.config
index 47d31760585c66025666f112dcd03a23faeac543..65dcbf8f5ddbce6c5e46c3160461e87a3ee56e98 100644
--- a/conf/test_full.config
+++ b/conf/test_full.config
@@ -11,6 +11,8 @@ params {
   config_profile_name = 'Full test profile'
   config_profile_description = 'Full test dataset to check pipeline function'
+  // TODO nf-core: Specify the paths to your full test data ( on nf-core/test-datasets or directly in repositories, e.g. SRA)
+  // TODO nf-core: Give any required params for the test so that command line flags are not needed
   // Input data for full size test
   input_paths = [
     ['SRR4292758_00', ['https://github.com/nf-core/test-datasets/raw/hic/data/SRR4292758_00_R1.fastq.gz', 'https://github.com/nf-core/test-datasets/raw/hic/data/SRR4292758_00_R2.fastq.gz']]
diff --git a/docs/README.md b/docs/README.md
index bdbc92abc939ff716f3fcaba1b5069be471c9049..a6889549c7f27bda0aed81947685713781fe2d1b 100644
--- a/docs/README.md
+++ b/docs/README.md
@@ -3,11 +3,8 @@
 The nf-core/hic documentation is split into the following pages:
 * [Usage](usage.md)
-  * An overview of how the pipeline works, how to run it and a
-description of all of the different command-line flags.
+  * An overview of how the pipeline works, how to run it and a description of all of the different command-line flags.
 * [Output](output.md)
-  * An overview of the different results produced by the pipeline
-and how to interpret them.
+  * An overview of the different results produced by the pipeline and how to interpret them.
-You can find a lot more documentation about installing, configuring -and running nf-core pipelines on the website: [https://nf-co.re](https://nf-co.re) +You can find a lot more documentation about installing, configuring and running nf-core pipelines on the website: [https://nf-co.re](https://nf-co.re) diff --git a/docs/output.md b/docs/output.md index 895a4f2a16d75e7ca0dfb21c13d0cccc7ea3a322..d4092a050ff5954e5782bd0a8b12ffa0bdfe37a4 100644 --- a/docs/output.md +++ b/docs/output.md @@ -7,10 +7,7 @@ ## Introduction This document describes the output produced by the pipeline. Most of the plots are taken from the MultiQC report, which summarises results at the end of the pipeline. - -The directories listed below will be created in the results directory -after the pipeline has finished. All paths are relative to the top-level -results directory. +The directories listed below will be created in the results directory after the pipeline has finished. All paths are relative to the top-level results directory. ## Pipeline overview diff --git a/docs/usage.md b/docs/usage.md index be630a4298976e67bec49659f6dc6bc38ac86cc6..4e7946caeb3bdd2e854060472b37c7ac66c310f5 100644 --- a/docs/usage.md +++ b/docs/usage.md @@ -28,12 +28,7 @@ results # Finished results (configurable, see below) ### Updating the pipeline -When you run the above command, Nextflow automatically pulls the pipeline code -from GitHub and stores it as a cached version. When running the pipeline after -this, it will always use the cached version if available - even if the pipeline -has been updated since. To make sure that you're running the latest version of -the pipeline, make sure that you regularly update the cached version of the -pipeline: +When you run the above command, Nextflow automatically pulls the pipeline code from GitHub and stores it as a cached version. When running the pipeline after this, it will always use the cached version if available - even if the pipeline has been updated since. To make sure that you're running the latest version of the pipeline, make sure that you regularly update the cached version of the pipeline: ```bash nextflow pull nf-core/hic @@ -41,17 +36,7 @@ nextflow pull nf-core/hic ### Reproducibility -It's a good idea to specify a pipeline version when running the pipeline on -your data. This ensures that a specific version of the pipeline code and -software are used when you run your pipeline. If you keep using the same tag, -you'll be running the same version of the pipeline, even if there have been -changes to the code since. - -It's a good idea to specify a pipeline version when running the pipeline on -your data. This ensures that a specific version of the pipeline code and -software are used when you run your pipeline. If you keep using the same tag, -you'll be running the same version of the pipeline, even if there have been -changes to the code since. +It's a good idea to specify a pipeline version when running the pipeline on your data. This ensures that a specific version of the pipeline code and software are used when you run your pipeline. If you keep using the same tag, you'll be running the same version of the pipeline, even if there have been changes to the code since. 
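+Once you have picked a release tag (see below), you can pin it with Nextflow's `-r` option. A sketch - the release tag shown is illustrative, the other flags mirror the quick-start command in the README:
+
+```bash
+## Run a fixed pipeline release so code and software versions stay reproducible
+nextflow run nf-core/hic -r 1.2.0 -profile docker --input '*_R{1,2}.fastq.gz' --genome GRCh37
+```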
First, go to the [nf-core/hic releases page](https://github.com/nf-core/hic/releases) and find diff --git a/environment.yml b/environment.yml index 6ee111e3beddb0e3c014d31ceec9d4a64bff25e6..90d9eea12420ed6e6066608aadc55d16120e8dfa 100644 --- a/environment.yml +++ b/environment.yml @@ -27,4 +27,4 @@ dependencies: - conda-forge::cython=0.29.19 - pip: - cooltools==0.3.2 - - fanc==0.8.30 \ No newline at end of file + - fanc==0.8.30 diff --git a/main.nf b/main.nf index 91c3cf7612358bed8dc813ad7513d84813ba3515..b6a66d306941907e8d14654560e0195773e44909 100644 --- a/main.nf +++ b/main.nf @@ -10,7 +10,6 @@ */ def helpMessage() { - // Add to this help message with new command line parameters log.info nfcoreHeader() log.info""" @@ -107,6 +106,7 @@ if (params.genomes && params.genome && !params.genomes.containsKey(params.genome if (params.digest && params.digestion && !params.digest.containsKey(params.digestion)) { exit 1, "Unknown digestion protocol. Currently, the available digestion options are ${params.digest.keySet().join(", ")}. Please set manually the '--restriction_site' and '--ligation_site' parameters." } + params.restriction_site = params.digestion ? params.digest[ params.digestion ].restriction_site ?: false : false params.ligation_site = params.digestion ? params.digest[ params.digestion ].ligation_site ?: false : false @@ -143,13 +143,10 @@ ch_multiqc_custom_config = params.multiqc_config ? Channel.fromPath(params.multi ch_output_docs = file("$projectDir/docs/output.md", checkIfExists: true) ch_output_docs_images = file("$projectDir/docs/images/", checkIfExists: true) -/********************************************************** - * SET UP CHANNELS - */ - /* * input read files */ + if (params.input_paths){ raw_reads = Channel.create() @@ -382,26 +379,6 @@ process get_software_versions { """ } -def create_workflow_summary(summary) { - - def yaml_file = workDir.resolve('workflow_summary_mqc.yaml') - yaml_file.text = """ - id: 'nf-core-hic-summary' - description: " - this information is collected when the pipeline is started." 
- section_name: 'nf-core/hic Workflow Summary' - section_href: 'https://github.com/nf-core/hic' - plot_type: 'html' - data: | - <dl class=\"dl-horizontal\"> -${summary.collect { k,v -> " <dt>$k</dt><dd><samp>${v ?: '<span style=\"color:#999999;\">N/A</a>'}</samp></dd>" }.join("\n")} - </dl> - """.stripIndent() - - return yaml_file -} - - - /**************************************************** * PRE-PROCESSING */ @@ -674,7 +651,6 @@ process combine_mates{ """ } - /* * HiC-Pro - detect valid interaction from aligned data */ @@ -747,7 +723,6 @@ else{ } } - /* * Remove duplicates */ @@ -877,7 +852,7 @@ process run_ice{ * Cooler */ -process cooler_build { +process convert_to_pairs { tag "$sample" label 'process_medium' @@ -889,18 +864,17 @@ process cooler_build { file chrsize from chrsize_build.collect() output: - set val(sample), file("contacts.sorted.txt.gz"), file("contacts.sorted.txt.gz.px2") into cool_build, cool_build_zoom + set val(sample), file("*.txt.gz") into cool_build, cool_build_zoom script: """ - awk '{OFS="\t";print \$2,\$3,\$4,\$5,\$6,\$7,1}' $vpairs | sed -e 's/+/1/g' -e 's/-/16/g' > contacts.txt - cooler csort --nproc ${task.cpus} -c1 1 -p1 2 -s1 3 -c2 4 -p2 5 -s2 6 \ - contacts.txt \ - -o contacts.sorted.txt.gz \ - ${chrsize} + ## chr/pos/strand/chr/pos/strand + awk '{OFS="\t";print \$2,\$3,\$4,\$5,\$6,\$7}' $vpairs | sed -e 's/+/1/g' -e 's/-/16/g' > contacts.txt + gzip contacts.txt """ } + process cooler_raw { tag "$sample - ${res}" label 'process_medium' @@ -909,7 +883,7 @@ process cooler_raw { saveAs: {filename -> filename.indexOf(".cool") > 0 ? "raw/cool/$filename" : "raw/txt/$filename"} input: - set val(sample), file(contacts), file(index), val(res) from cool_build.combine(map_res_cool) + set val(sample), file(contacts), val(res) from cool_build.combine(map_res_cool) file chrsize from chrsize_raw.collect() output: @@ -919,7 +893,7 @@ process cooler_raw { script: """ cooler makebins ${chrsize} ${res} > ${sample}_${res}.bed - cooler cload pairix --nproc ${task.cpus} ${sample}_${res}.bed ${contacts} ${sample}_${res}.cool + cooler cload pairs -c1 1 -p1 2 -c2 4 -p2 5 ${sample}_${res}.bed ${contacts} ${sample}_${res}.cool cooler dump ${sample}_${res}.cool | awk '{OFS="\t"; print \$1+1,\$2+1,\$3}' > ${sample}_${res}.txt """ } @@ -959,7 +933,7 @@ process cooler_zoomify { !params.skip_mcool input: - set val(sample), file(contacts), file(index) from cool_build_zoom + set val(sample), file(contacts) from cool_build_zoom file chrsize from chrsize_zoom.collect() output: @@ -968,7 +942,7 @@ process cooler_zoomify { script: """ cooler makebins ${chrsize} ${params.res_zoomify} > bins.bed - cooler cload pairix --nproc ${task.cpus} bins.bed contacts.sorted.txt.gz ${sample}.cool + cooler cload pairs -c1 1 -p1 2 -c2 4 -p2 5 bins.bed ${contacts} ${sample}.cool cooler zoomify --nproc ${task.cpus} --balance ${sample}.cool """ } @@ -992,7 +966,7 @@ process convert_to_h5 { script: """ hicConvertFormat --matrices ${maps} \ - --outFileName ${sample}.h5 \ + --outFileName ${maps.baseName}.h5 \ --resolution ${res} \ --inputFormat cool \ --outputFormat h5 \ @@ -1027,11 +1001,10 @@ process dist_decay { script: - prefix = h5mat.toString() - ~/(\.h5)?$/ """ hicPlotDistVsCounts --matrices ${h5mat} \ - --plotFile ${prefix}_distcount.png \ - --outFileData ${prefix}_distcount.txt + --plotFile ${h5mat.baseName}_distcount.png \ + --outFileData ${h5mat.baseName}_distcount.txt """ } @@ -1132,7 +1105,7 @@ process multiqc { file (mqc_custom_config) from ch_multiqc_custom_config.collect().ifEmpty([]) file 
('input_*/*') from all_mstats.concat(all_mergestat).collect() file ('software_versions/*') from software_versions_yaml - file workflow_summary from create_workflow_summary(summary) + file workflow_summary from ch_workflow_summary.collect() output: file "*multiqc_report.html" into multiqc_report @@ -1146,8 +1119,9 @@ process multiqc { """ } + /* - * STEP 7 - Output Description HTML + * Output Description HTML */ process output_documentation { publishDir "${params.outdir}/pipeline_info", mode: params.publish_dir_mode @@ -1156,13 +1130,13 @@ process output_documentation { file output_docs from ch_output_docs file images from ch_output_docs_images - output: - file "results_description.html" + output: + file "results_description.html" - script: - """ - markdown_to_html.py $output_docs -o results_description.html - """ + script: + """ + markdown_to_html.py $output_docs -o results_description.html + """ } /* @@ -1199,7 +1173,6 @@ workflow.onComplete { email_fields['summary']['Nextflow Build'] = workflow.nextflow.build email_fields['summary']['Nextflow Compile Timestamp'] = workflow.nextflow.timestamp - // If not using MultiQC, strip out this code (including params.maxMultiqcEmailFileSize) // On success try attach the multiqc report def mqc_report = null try { @@ -1282,7 +1255,6 @@ workflow.onComplete { checkHostname() log.info "-${c_purple}[nf-core/hic]${c_red} Pipeline completed with errors${c_reset}-" } - } diff --git a/nextflow.config b/nextflow.config index 57b95c5761e675f4a89fa043662449b17b5b688a..8cca55407e0507080912a66b28d2ff63541858b2 100644 --- a/nextflow.config +++ b/nextflow.config @@ -7,7 +7,6 @@ // Global default params, used in configs params { - // Inputs / outputs genome = false input = "data/*{1,2}.fastq.gz" @@ -18,6 +17,18 @@ params { restriction_fragments = false save_reference = false + // Mapping + split_fastq = false + fastq_chunks_size = 20000000 + save_interaction_bam = false + save_aligned_intermediates = false + bwt2_opts_end2end = '--very-sensitive -L 30 --score-min L,-0.6,-0.2 --end-to-end --reorder' + bwt2_opts_trimmed = '--very-sensitive -L 20 --score-min L,-0.6,-0.2 --end-to-end --reorder' + keep_dups = false + keep_multi = false + min_mapq = 10 + + // Digestion Hi-C digestion = false digest { @@ -47,17 +58,6 @@ params { dnase = false min_cis_dist = 0 - // Mapping - split_fastq = false - fastq_chunks_size = 20000000 - save_interaction_bam = false - save_aligned_intermediates = false - bwt2_opts_end2end = '--very-sensitive -L 30 --score-min L,-0.6,-0.2 --end-to-end --reorder' - bwt2_opts_trimmed = '--very-sensitive -L 20 --score-min L,-0.6,-0.2 --end-to-end --reorder' - keep_dups = false - keep_multi = false - min_mapq = 10 - // Contact maps bin_size = '1000000,500000' ice_max_iter = 100 @@ -67,8 +67,8 @@ params { // Downstream Analysis res_dist_decay = '1000000' - tads_caller = "hicexplorer,insulation" - res_tads = '40000,20000' + tads_caller = 'insulation' + res_tads = '40000' res_zoomify = '5000' // Workflow @@ -94,6 +94,7 @@ params { tracedir = "${params.outdir}/pipeline_info" igenomes_ignore = false + //Config custom_config_version = 'master' custom_config_base = "https://raw.githubusercontent.com/nf-core/configs/${params.custom_config_version}" hostnames = false diff --git a/nextflow_schema.json b/nextflow_schema.json index c0733c3263055245da9996ae775173c488c5f03e..f888dbb16b5bd5f6e055d804a8d78e76be96079a 100644 --- a/nextflow_schema.json +++ b/nextflow_schema.json @@ -5,11 +5,11 @@ "description": "Analysis of Chromosome Conformation Capture data (Hi-C)", 
"type": "object", "definitions": { - "mandatory_arguments": { - "title": "Mandatory arguments", + "input_output_options": { + "title": "Input/output options", "type": "object", "fa_icon": "fas fa-terminal", - "description": "Mandatory arguments to run the pipeline", + "description": "Define where the pipeline should find input data and save output data.", "required": [ "input" ], @@ -26,12 +26,6 @@ "description": "Input FastQ files for test only", "default": "undefined" }, - "genome": { - "type": "string", - "description": "Name of iGenomes reference.", - "fa_icon": "fas fa-book", - "help_text": "If using a reference genome configured in the pipeline using iGenomes, use this parameter to give the ID for the reference. This is then used to build the full paths for all required reference genome files e.g. `--genome GRCh38`.\n\nSee the [nf-core website docs](https://nf-co.re/usage/reference_genomes) for more details." - }, "outdir": { "type": "string", "description": "The output directory where the results will be saved.", @@ -53,6 +47,12 @@ "fa_icon": "fas fa-dna", "description": "Options for the reference genome indices used to align reads.", "properties": { + "genome": { + "type": "string", + "description": "Name of iGenomes reference.", + "fa_icon": "fas fa-book", + "help_text": "If using a reference genome configured in the pipeline using iGenomes, use this parameter to give the ID for the reference. This is then used to build the full paths for all required reference genome files e.g. `--genome GRCh38`.\n\nSee the [nf-core website docs](https://nf-co.re/usage/reference_genomes) for more details." + }, "fasta": { "type": "string", "fa_icon": "fas fa-font", @@ -251,6 +251,34 @@ "type": "integer", "default": "100", "description": "Maximum number of iteraction for ICE normalization" + }, + "res_zoomify": { + "type": "integer", + "default": 5000, + "description": "Maximum resolution to build mcool file" + } + } + }, + "downstream_analysis": { + "title": "Downstream Analysis", + "type": "object", + "description": "Set up downstream analysis from contact maps", + "default": "", + "properties": { + "res_dist_decay": { + "type": "integer", + "default": 1000000, + "description": "Resolution to build count/distance plot" + }, + "tads_caller": { + "type": "string", + "default": "hicexplorer,insulation", + "description": "Define methods for TADs calling" + }, + "res_tads": { + "type": "string", + "default": "40000,20000", + "description": "Resolution to run TADs callers (comma separated)" } } }, @@ -266,12 +294,29 @@ "description": "Do not build contact maps" }, "skip_ice": { - "type": "boolean", - "description": "Do not normalize contact maps" + "type": "string", + "description": "Do not run ICE normalization", + "default": "False" }, - "skip_cool": { - "type": "boolean", - "description": "Do not generate cooler file" + "skip_dist_decay": { + "type": "string", + "description": "Do not run distance/decay plot", + "default": "False" + }, + "skip_tads": { + "type": "string", + "description": "Do not run TADs calling", + "default": "False" + }, + "skip_balancing": { + "type": "string", + "description": "Do not run cooler balancing normalization", + "default": "False" + }, + "skip_mcool": { + "type": "string", + "description": "Do not generate mcool file for Higlass visualization", + "default": "False" }, "skip_multiqc": { "type": "boolean", @@ -445,7 +490,7 @@ }, "allOf": [ { - "$ref": "#/definitions/mandatory_arguments" + "$ref": "#/definitions/input_output_options" }, { "$ref": 
"#/definitions/reference_genome_options" @@ -465,6 +510,9 @@ { "$ref": "#/definitions/contact_maps_options" }, + { + "$ref": "#/definitions/downstream_analysis" + }, { "$ref": "#/definitions/skip_options" },