diff --git a/.github/.dockstore.yml b/.github/.dockstore.yml new file mode 100644 index 0000000000000000000000000000000000000000..030138a0ca97a91f378c4cd4d55e79ac4de1dc55 --- /dev/null +++ b/.github/.dockstore.yml @@ -0,0 +1,5 @@ +# Dockstore config version, not pipeline version +version: 1.2 +workflows: + - subclass: nfl + primaryDescriptorPath: /nextflow.config diff --git a/.github/CONTRIBUTING.md b/.github/CONTRIBUTING.md index 929f493c0b09f80151a8816b167840ea8eca9f30..538f355cedd9809e4bd3ac8067cb00832e5385b1 100644 --- a/.github/CONTRIBUTING.md +++ b/.github/CONTRIBUTING.md @@ -48,7 +48,7 @@ These tests are run both with the latest available version of `Nextflow` and als ## Patch -: warning: Only in the unlikely and regretful event of a release happening with a bug. +:warning: Only in the unlikely and regretful event of a release happening with a bug. * On your own fork, make a new branch `patch` based on `upstream/master`. * Fix the bug, and bump version (X.Y.Z+1). @@ -59,3 +59,4 @@ For further information/help, please consult the [nf-core/hic documentation](htt don't hesitate to get in touch on the nf-core Slack [#hic](https://nfcore.slack.com/channels/hic) channel ([join our Slack here](https://nf-co.re/join/slack)). +For further information/help, please consult the [nf-core/hic documentation](https://nf-co.re/hic/docs) and don't hesitate to get in touch on the nf-core Slack [#hic](https://nfcore.slack.com/channels/hic) channel ([join our Slack here](https://nf-co.re/join/slack)). diff --git a/.github/ISSUE_TEMPLATE/bug_report.md b/.github/ISSUE_TEMPLATE/bug_report.md index 2b9203377a6365822d0f13f6a59f2496ae717fb1..dea18053f8ab41016fc4e427dbbc48cf56709781 100644 --- a/.github/ISSUE_TEMPLATE/bug_report.md +++ b/.github/ISSUE_TEMPLATE/bug_report.md @@ -1,24 +1,26 @@ +<!-- # nf-core/hic bug report Hi there! Thanks for telling us about a problem with the pipeline. Please delete this text and anything that's not relevant from the template below: +--> -## Describe the bug +## Description of the bug -A clear and concise description of what the bug is. +<!-- A clear and concise description of what the bug is. --> ## Steps to reproduce Steps to reproduce the behaviour: -1. Command line: `nextflow run ...` -2. See error: _Please provide your error message_ +1. Command line: <!-- [e.g. `nextflow run ...`] --> +2. See error: <!-- [Please provide your error message] --> ## Expected behaviour -A clear and concise description of what you expected to happen. +<!-- A clear and concise description of what you expected to happen. --> ## System @@ -39,4 +41,4 @@ A clear and concise description of what you expected to happen. ## Additional context -Add any other context about the problem here. +<!-- Add any other context about the problem here. --> diff --git a/.github/ISSUE_TEMPLATE/feature_request.md b/.github/ISSUE_TEMPLATE/feature_request.md index 57fa7f7f41368f73726974ef548162c957b7fd7d..ab6ff7c45b9fd67b10c3370448778eb912e5c23f 100644 --- a/.github/ISSUE_TEMPLATE/feature_request.md +++ b/.github/ISSUE_TEMPLATE/feature_request.md @@ -1,3 +1,4 @@ +<!-- # nf-core/hic feature request Hi there! @@ -5,20 +6,23 @@ Hi there! Thanks for suggesting a new feature for the pipeline! Please delete this text and anything that's not relevant from the template below: +--> + ## Is your feature request related to a problem? Please describe -A clear and concise description of what the problem is. +<!-- A clear and concise description of what the problem is. --> -Ex. I'm always frustrated when [...] +<!-- e.g. 
[I'm always frustrated when ...] --> ## Describe the solution you'd like -A clear and concise description of what you want to happen. +<!-- A clear and concise description of what you want to happen. --> ## Describe alternatives you've considered -A clear and concise description of any alternative solutions or features you've considered. +<!-- A clear and concise description of any alternative solutions or features you've considered. --> ## Additional context -Add any other context about the feature request here. +<!-- Add any other context about the feature request here. --> + diff --git a/.github/PULL_REQUEST_TEMPLATE.md b/.github/PULL_REQUEST_TEMPLATE.md index 50d7959aa9d49a9bc51a14c172917c904d2bafb9..0bdd57579b660e0b42eaa248a2b5e1a163ed95e3 100644 --- a/.github/PULL_REQUEST_TEMPLATE.md +++ b/.github/PULL_REQUEST_TEMPLATE.md @@ -1,3 +1,4 @@ +<!-- # nf-core/hic pull request Many thanks for contributing to nf-core/hic! @@ -5,15 +6,16 @@ Many thanks for contributing to nf-core/hic! Please fill in the appropriate checklist below (delete whatever is not relevant). These are the most common things requested on pull requests (PRs). +Remember that PRs should be made against the dev branch, unless you're preparing a pipeline release. + +Learn more about contributing: [CONTRIBUTING.md](https://github.com/nf-core/hic/tree/master/.github/CONTRIBUTING.md) +--> + ## PR checklist - [ ] This comment contains a description of changes (with reason) +- [ ] `CHANGELOG.md` is updated - [ ] If you've fixed a bug or added code that should be tested, add tests! -- [ ] If necessary, also make a PR on the [nf-core/hic branch on the nf-core/test-datasets repo](https://github.com/nf-core/test-datasets/pull/new/nf-core/hic) -- [ ] Ensure the test suite passes (`nextflow run . -profile test,docker`). -- [ ] Make sure your code lints (`nf-core lint .`). - [ ] Documentation in `docs` is updated -- [ ] `CHANGELOG.md` is updated -- [ ] `README.md` is updated +- [ ] If necessary, also make a PR on the [nf-core/hic branch on the nf-core/test-datasets repo](https://github.com/nf-core/test-datasets/pull/new/nf-core/hic) -**Learn more about contributing:** [CONTRIBUTING.md](https://github.com/nf-core/hic/tree/master/.github/CONTRIBUTING.md) \ No newline at end of file diff --git a/.github/workflows/awsfulltest.yml b/.github/workflows/awsfulltest.yml new file mode 100644 index 0000000000000000000000000000000000000000..a43253198aa92a5832082dcd687516047b7af65b --- /dev/null +++ b/.github/workflows/awsfulltest.yml @@ -0,0 +1,39 @@ +name: nf-core AWS full size tests +# This workflow is triggered on a published release.
+# It runs the -profile 'test_full' on AWS batch + +on: + release: + types: [published] + +jobs: + run-awstest: + name: Run AWS full tests + if: github.repository == 'nf-core/hic' + runs-on: ubuntu-latest + steps: + - name: Setup Miniconda + uses: goanpeca/setup-miniconda@v1.0.2 + with: + auto-update-conda: true + python-version: 3.7 + - name: Install awscli + run: conda install -c conda-forge awscli + - name: Start AWS batch job + # Add full size test data (but still relatively small datasets for few samples) + # to `test_full.config`; the test runs with only one set of parameters + # Then specify `-profile test_full` instead of `-profile test` on the AWS batch command + env: + AWS_ACCESS_KEY_ID: ${{ secrets.AWS_ACCESS_KEY_ID }} + AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_SECRET_ACCESS_KEY }} + TOWER_ACCESS_TOKEN: ${{ secrets.AWS_TOWER_TOKEN }} + AWS_JOB_DEFINITION: ${{ secrets.AWS_JOB_DEFINITION }} + AWS_JOB_QUEUE: ${{ secrets.AWS_JOB_QUEUE }} + AWS_S3_BUCKET: ${{ secrets.AWS_S3_BUCKET }} + run: | + aws batch submit-job \ + --region eu-west-1 \ + --job-name nf-core-hic \ + --job-queue $AWS_JOB_QUEUE \ + --job-definition $AWS_JOB_DEFINITION \ + --container-overrides '{"command": ["nf-core/hic", "-r '"${GITHUB_SHA}"' -profile test --outdir s3://'"${AWS_S3_BUCKET}"'/hic/results-'"${GITHUB_SHA}"' -w s3://'"${AWS_S3_BUCKET}"'/hic/work-'"${GITHUB_SHA}"' -with-tower"], "environment": [{"name": "TOWER_ACCESS_TOKEN", "value": "'"$TOWER_ACCESS_TOKEN"'"}]}' diff --git a/.github/workflows/awstest.yml b/.github/workflows/awstest.yml new file mode 100644 index 0000000000000000000000000000000000000000..94bca240004c28e2608c4e1a8c2a82bb054ce428 --- /dev/null +++ b/.github/workflows/awstest.yml @@ -0,0 +1,39 @@ +name: nf-core AWS test +# This workflow is triggered on push to the master branch.
+# It runs the -profile 'test' on AWS batch + +on: + push: + branches: + - master + +jobs: + run-awstest: + name: Run AWS tests + if: github.repository == 'nf-core/hic' + runs-on: ubuntu-latest + steps: + - name: Setup Miniconda + uses: goanpeca/setup-miniconda@v1.0.2 + with: + auto-update-conda: true + python-version: 3.7 + - name: Install awscli + run: conda install -c conda-forge awscli + - name: Start AWS batch job + # For example: adding multiple test runs with different parameters + # Remember that you can parallelise this by using strategy.matrix + env: + AWS_ACCESS_KEY_ID: ${{ secrets.AWS_ACCESS_KEY_ID }} + AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_SECRET_ACCESS_KEY }} + TOWER_ACCESS_TOKEN: ${{ secrets.AWS_TOWER_TOKEN }} + AWS_JOB_DEFINITION: ${{ secrets.AWS_JOB_DEFINITION }} + AWS_JOB_QUEUE: ${{ secrets.AWS_JOB_QUEUE }} + AWS_S3_BUCKET: ${{ secrets.AWS_S3_BUCKET }} + run: | + aws batch submit-job \ + --region eu-west-1 \ + --job-name nf-core-hic \ + --job-queue $AWS_JOB_QUEUE \ + --job-definition $AWS_JOB_DEFINITION \ + --container-overrides '{"command": ["nf-core/hic", "-r '"${GITHUB_SHA}"' -profile test --outdir s3://'"${AWS_S3_BUCKET}"'/hic/results-'"${GITHUB_SHA}"' -w s3://'"${AWS_S3_BUCKET}"'/hic/work-'"${GITHUB_SHA}"' -with-tower"], "environment": [{"name": "TOWER_ACCESS_TOKEN", "value": "'"$TOWER_ACCESS_TOKEN"'"}]}' diff --git a/.github/workflows/branch.yml b/.github/workflows/branch.yml index e95804c7cb51f306a7b2bf2028149c64358af705..04dbb3d4b77e32d8ff4e3fa7f0ba17046111d1a3 100644 --- a/.github/workflows/branch.yml +++ b/.github/workflows/branch.yml @@ -3,14 +3,35 @@ name: nf-core branch protection # It fails when someone tries to make a PR against the nf-core `master` branch instead of `dev` on: pull_request: - branches: - - master + branches: [master] jobs: test: - runs-on: ubuntu-18.04 + runs-on: ubuntu-latest steps: - # PRs are only ok if coming from an nf-core `dev` branch or a fork `patch` branch + # PRs to the nf-core repo master branch are only ok if coming from the nf-core repo `dev` or any `patch` branches - name: Check PRs + if: github.repository == 'nf-core/hic' run: | - { [[ $(git remote get-url origin) == *nf-core/hic ]] && [[ ${GITHUB_HEAD_REF} = "dev" ]]; } || [[ ${GITHUB_HEAD_REF} == "patch" ]] + { [[ ${{github.event.pull_request.head.repo.full_name}} == nf-core/hic ]] && [[ $GITHUB_HEAD_REF = "dev" ]]; } || [[ $GITHUB_HEAD_REF == "patch" ]] + + + # If the above check failed, post a comment on the PR explaining the failure + # NOTE - this doesn't currently work if the PR is coming from a fork, due to limitations in GitHub actions secrets + - name: Post PR comment + if: failure() + uses: mshick/add-pr-comment@v1 + with: + message: | + Hi @${{ github.event.pull_request.user.login }}, + + It looks like this pull-request has been made against the ${{github.event.pull_request.head.repo.full_name}} `master` branch. + The `master` branch on nf-core repositories should always contain code from the latest release. + Because of this, PRs to `master` are only allowed if they come from the ${{github.event.pull_request.head.repo.full_name}} `dev` branch. + + You do not need to close this PR, you can change the target branch to `dev` by clicking the _"Edit"_ button at the top of this page. + + Thanks again for your contribution!
+ repo-token: ${{ secrets.GITHUB_TOKEN }} + allow-repeats: false + diff --git a/.github/workflows/ci.yml b/.github/workflows/ci.yml index da1d4b729ad80b956f78514e46f3a7d11c374521..a8a8ba508df126c0af56f37daa5fe7e94a976875 100644 --- a/.github/workflows/ci.yml +++ b/.github/workflows/ci.yml @@ -1,30 +1,52 @@ name: nf-core CI -# This workflow is triggered on pushes and PRs to the repository. -# It runs the pipeline with the minimal test dataset to check that it completes without any syntax errors -on: [push, pull_request] +# This workflow runs the pipeline with the minimal test dataset to check that it completes without any syntax errors +on: + push: + branches: + - dev + pull_request: + release: + types: [published] jobs: test: + name: Run workflow tests + # Only run on push if this is the nf-core dev branch (merged PRs) + if: ${{ github.event_name != 'push' || (github.event_name == 'push' && github.repository == 'nf-core/hic') }} + runs-on: ubuntu-latest env: NXF_VER: ${{ matrix.nxf_ver }} NXF_ANSI_LOG: false - runs-on: ubuntu-latest strategy: matrix: # Nextflow versions: check pipeline minimum and current latest nxf_ver: ['19.10.0', ''] steps: - - uses: actions/checkout@v2 + - name: Check out pipeline code + uses: actions/checkout@v2 + + - name: Check if Dockerfile or Conda environment changed + uses: technote-space/get-diff-action@v1 + with: + PREFIX_FILTER: | + Dockerfile + environment.yml + + - name: Build new docker image + if: env.GIT_DIFF + run: docker build --no-cache . -t nfcore/hic:1.2.2 + + - name: Pull docker image + if: ${{ !env.GIT_DIFF }} + run: | + docker pull nfcore/hic:dev + docker tag nfcore/hic:dev nfcore/hic:1.2.2 + - name: Install Nextflow run: | wget -qO- get.nextflow.io | bash sudo mv nextflow /usr/local/bin/ - - name: Pull docker image - run: | - docker pull nfcore/hic:dev - docker tag nfcore/hic:dev nfcore/hic:1.2.1 + - name: Run pipeline with test data run: | - # nf-core: You can customise CI pipeline run tests as required - # (eg. 
adding multiple test runs with different parameters) nextflow run ${GITHUB_WORKSPACE} -profile test,docker diff --git a/.github/workflows/linting.yml b/.github/workflows/linting.yml index 1e0827a800dcd520582e8f89d2325cbce15a6b12..7a41b0e744fdc86c94c3a83df5cb0cc59c07ed0f 100644 --- a/.github/workflows/linting.yml +++ b/.github/workflows/linting.yml @@ -33,18 +33,33 @@ jobs: nf-core: runs-on: ubuntu-latest steps: - - uses: actions/checkout@v2 + - name: Check out pipeline code + uses: actions/checkout@v2 - name: Install Nextflow run: | wget -qO- get.nextflow.io | bash sudo mv nextflow /usr/local/bin/ + - uses: actions/setup-python@v1 with: python-version: '3.6' architecture: 'x64' + - name: Install dependencies run: | python -m pip install --upgrade pip pip install nf-core + - name: Run nf-core lint - run: nf-core lint ${GITHUB_WORKSPACE} + env: + GITHUB_COMMENTS_URL: ${{ github.event.pull_request.comments_url }} + GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} + GITHUB_PR_COMMIT: ${{ github.event.pull_request.head.sha }} + run: nf-core -l lint_log.txt lint ${GITHUB_WORKSPACE} + + - name: Upload linting log file artifact + if: ${{ always() }} + uses: actions/upload-artifact@v2 + with: + name: linting-log-file + path: lint_log.txt diff --git a/.github/workflows/push_dockerhub.yml b/.github/workflows/push_dockerhub.yml new file mode 100644 index 0000000000000000000000000000000000000000..280f8ba5b9fa9581c0954275cf77629492aa5c7f --- /dev/null +++ b/.github/workflows/push_dockerhub.yml @@ -0,0 +1,40 @@ +name: nf-core Docker push +# This builds the docker image and pushes it to DockerHub +# Runs on nf-core repo releases and push event to 'dev' branch (PR merges) +on: + push: + branches: + - dev + release: + types: [published] + +jobs: + push_dockerhub: + name: Push new Docker image to Docker Hub + runs-on: ubuntu-latest + # Only run for the nf-core repo, for releases and merged PRs + if: ${{ github.repository == 'nf-core/hic' }} + env: + DOCKERHUB_USERNAME: ${{ secrets.DOCKERHUB_USERNAME }} + DOCKERHUB_PASS: ${{ secrets.DOCKERHUB_PASS }} + steps: + - name: Check out pipeline code + uses: actions/checkout@v2 + + - name: Build new docker image + run: docker build --no-cache . -t nfcore/hic:latest + + - name: Push Docker image to DockerHub (dev) + if: ${{ github.event_name == 'push' }} + run: | + echo "$DOCKERHUB_PASS" | docker login -u "$DOCKERHUB_USERNAME" --password-stdin + docker tag nfcore/hic:latest nfcore/hic:dev + docker push nfcore/hic:dev + + - name: Push Docker image to DockerHub (release) + if: ${{ github.event_name == 'release' }} + run: | + echo "$DOCKERHUB_PASS" | docker login -u "$DOCKERHUB_USERNAME" --password-stdin + docker push nfcore/hic:latest + docker tag nfcore/hic:latest nfcore/hic:${{ github.event.release.tag_name }} + docker push nfcore/hic:${{ github.event.release.tag_name }} diff --git a/.gitignore b/.gitignore index 6354f3708fa7c35477f398801673e469c12726ea..aa4bb5b375a9021f754dbd91d2321d16d1c0afc7 100644 --- a/.gitignore +++ b/.gitignore @@ -5,4 +5,5 @@ results/ .DS_Store tests/ testing/ +testing* *.pyc diff --git a/CHANGELOG.md b/CHANGELOG.md index 783efce8be31a8e0ad7a105a5cde2d1e83655af4..58caad6f546a0eddf70f88e8723447f571b9fd24 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -1,7 +1,17 @@ # nf-core/hic: Changelog -The format is based on [Keep a Changelog](http://keepachangelog.com/en/1.0.0/) -and this project adheres to [Semantic Versioning](http://semver.org/spec/v2.0.0.html). 
+The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/) +and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html). + +## v1.2.2 - 2020-08-07 + +### `Added` + +* Template update for nf-core/tools v1.10.2 + +### `Fixed` + +* Bug where the `--split_fastq` option was not recognized ## v1.2.1 - 2020-07-06 diff --git a/CODE_OF_CONDUCT.md b/CODE_OF_CONDUCT.md index 496ad3b59f0bc2e34e2a69f8d3b4cc760be51616..9d68eed2ae8c493a162c2294cdb7e5f229df6283 100644 --- a/CODE_OF_CONDUCT.md +++ b/CODE_OF_CONDUCT.md @@ -70,7 +70,7 @@ members of the project's leadership. This Code of Conduct is adapted from the [Contributor Covenant][homepage], version 1.4, available at -[http://contributor-covenant.org/version/1/4][version] +[https://www.contributor-covenant.org/version/1/4/code-of-conduct/][version] -[homepage]: http://contributor-covenant.org -[version]: http://contributor-covenant.org/version/1/4/ +[homepage]: https://contributor-covenant.org +[version]: https://www.contributor-covenant.org/version/1/4/code-of-conduct/ diff --git a/Dockerfile b/Dockerfile index 7d67f4985fb28856beec496fe12c33aa64953818..3c8d019dc4fb85111fe16279257c734d8aeafd43 100644 --- a/Dockerfile +++ b/Dockerfile @@ -1,4 +1,5 @@ -FROM nfcore/base:1.9 +FROM nfcore/base:1.10.2 + LABEL authors="Nicolas Servant" \ description="Docker image containing all software requirements for the nf-core/hic pipeline" @@ -6,9 +7,14 @@ LABEL authors="Nicolas Servant" \ RUN apt-get update && apt-get install -y gcc g++ && apt-get clean -y COPY environment.yml / -RUN conda env create -f /environment.yml && conda clean -a -ENV PATH /opt/conda/envs/nf-core-hic-1.2.1/bin:$PATH +RUN conda env create --quiet -f /environment.yml && conda clean -a + +# Add conda installation dir to PATH (instead of doing 'conda activate') +ENV PATH /opt/conda/envs/nf-core-hic-1.2.2/bin:$PATH # Dump the details of the installed packages to a file for posterity -RUN conda env export --name nf-core-hic-1.2.1 > nf-core-hic-1.2.1.yml +RUN conda env export --name nf-core-hic-1.2.2 > nf-core-hic-1.2.2.yml +# Instruct R processes to use these empty files instead of clashing with a local version +RUN touch .Rprofile +RUN touch .Renviron diff --git a/README.md b/README.md index be3889dd90f80bc31850ce697468ffb6208be3d3..cc493d72b34aef3bc6d45e74b42174bc3ea5e9a0 100644 --- a/README.md +++ b/README.md @@ -6,10 +6,11 @@ [](https://github.com/nf-core/hic/actions) [](https://www.nextflow.io/) -[](http://bioconda.github.io/) +[](https://bioconda.github.io/) [](https://hub.docker.com/r/nfcore/hic) [](https://doi.org/10.5281/zenodo.2669513) +[](https://nfcore.slack.com/channels/hic) ## Introduction @@ -66,7 +67,8 @@ iv. Start running your own analysis! nextflow run nf-core/hic -profile <docker/singularity/conda/institute> --reads '*_R{1,2}.fastq.gz' --genome GRCh37 ``` -See [usage docs](docs/usage.md) for all of the available options when running the pipeline. +See [usage docs](docs/usage.md) for all of the available options when running +the pipeline. ## Documentation @@ -82,6 +84,10 @@ found in the `docs/` directory: 4. [Output and how to interpret the results](docs/output.md) 5. [Troubleshooting](https://nf-co.re/usage/troubleshooting) +The nf-core/hic pipeline comes with documentation about the pipeline which +you can read at [https://nf-co.re/hic/docs](https://nf-co.re/hic/docs) or +find in the [`docs/` directory](docs). + For further information or help, don't hesitate to get in touch on [Slack](https://nfcore.slack.com/channels/hic).
You can join with [this invite](https://nf-co.re/join/slack). @@ -94,9 +100,9 @@ nf-core/hic was originally written by Nicolas Servant. If you would like to contribute to this pipeline, please see the [contributing guidelines](.github/CONTRIBUTING.md). -For further information or help, don't hesitate to get in touch on -[Slack](https://nfcore.slack.com/channels/hic) (you can join with -[this invite](https://nf-co.re/join/slack)). +For further information or help, don't hesitate to get in touch on the +[Slack `#hic` channel](https://nfcore.slack.com/channels/hic) +(you can join with [this invite](https://nf-co.re/join/slack)). ## Citation @@ -110,6 +116,5 @@ You can cite the `nf-core` publication as follows: > Philip Ewels, Alexander Peltzer, Sven Fillinger, Harshil Patel, Johannes Alneberg, Andreas Wilm, Maxime Ulysse Garcia, Paolo Di Tommaso & Sven Nahnsen. > -> _Nat Biotechnol._ 2020 Feb 13. -doi:[10.1038/s41587-020-0439-x](https://dx.doi.org/10.1038/s41587-020-0439-x). +> _Nat Biotechnol._ 2020 Feb 13. doi: [10.1038/s41587-020-0439-x](https://dx.doi.org/10.1038/s41587-020-0439-x). > ReadCube: [Full Access Link](https://rdcu.be/b1GjZ) diff --git a/bin/markdown_to_html.py b/bin/markdown_to_html.py index 57cc4263fe4182373949388b5aa88e20d60a3c70..a26d1ff5e6de3c09385760e76cc40f11a512b3a4 100755 --- a/bin/markdown_to_html.py +++ b/bin/markdown_to_html.py @@ -4,33 +4,23 @@ import argparse import markdown import os import sys +import io + def convert_markdown(in_fn): - input_md = open(in_fn, mode="r", encoding="utf-8").read() + input_md = io.open(in_fn, mode="r", encoding="utf-8").read() html = markdown.markdown( "[TOC]\n" + input_md, - extensions = [ - 'pymdownx.extra', - 'pymdownx.b64', - 'pymdownx.highlight', - 'pymdownx.emoji', - 'pymdownx.tilde', - 'toc' - ], - extension_configs = { - 'pymdownx.b64': { - 'base_path': os.path.dirname(in_fn) - }, - 'pymdownx.highlight': { - 'noclasses': True - }, - 'toc': { - 'title': 'Table of Contents' - } - } + extensions=["pymdownx.extra", "pymdownx.b64", "pymdownx.highlight", "pymdownx.emoji", "pymdownx.tilde", "toc"], + extension_configs={ + "pymdownx.b64": {"base_path": os.path.dirname(in_fn)}, + "pymdownx.highlight": {"noclasses": True}, + "toc": {"title": "Table of Contents"}, + }, ) return html + def wrap_html(contents): header = """<!DOCTYPE html><html> <head> @@ -83,18 +73,19 @@ def wrap_html(contents): def parse_args(args=None): parser = argparse.ArgumentParser() - parser.add_argument('mdfile', type=argparse.FileType('r'), nargs='?', - help='File to convert. Defaults to stdin.') - parser.add_argument('-o', '--out', type=argparse.FileType('w'), - default=sys.stdout, - help='Output file name. Defaults to stdout.') + parser.add_argument("mdfile", type=argparse.FileType("r"), nargs="?", help="File to convert. Defaults to stdin.") + parser.add_argument( + "-o", "--out", type=argparse.FileType("w"), default=sys.stdout, help="Output file name. Defaults to stdout." 
+ ) return parser.parse_args(args) + def main(args=None): args = parse_args(args) converted_md = convert_markdown(args.mdfile.name) html = wrap_html(converted_md) args.out.write(html) -if __name__ == '__main__': + +if __name__ == "__main__": sys.exit(main()) diff --git a/bin/scrape_software_versions.py b/bin/scrape_software_versions.py index d5f4c5c0095a2006ce4e7a876dc1ac849c020c3a..9f5650db54daa74c59c3967500a399ff55534f29 100755 --- a/bin/scrape_software_versions.py +++ b/bin/scrape_software_versions.py @@ -34,7 +34,7 @@ for k, v in regexes.items(): # Remove software set to false in results for k in list(results): if not results[k]: - del(results[k]) + del results[k] # Remove software set to false in results for k in results: @@ -42,7 +42,8 @@ for k in results: del(results[k]) # Dump to YAML -print (''' +print( + """ id: 'software_versions' section_name: 'nf-core/hic Software Versions' section_href: 'https://github.com/nf-core/hic' @@ -50,12 +51,14 @@ plot_type: 'html' description: 'are collected at run time from the software output.' data: | <dl class="dl-horizontal"> -''') -for k,v in results.items(): - print(" <dt>{}</dt><dd><samp>{}</samp></dd>".format(k,v)) -print (" </dl>") +""" +) +for k, v in results.items(): + print(" <dt>{}</dt><dd><samp>{}</samp></dd>".format(k, v)) +print(" </dl>") # Write out regexes as csv file: -with open('software_versions.csv', 'w') as f: - for k,v in results.items(): - f.write("{}\t{}\n".format(k,v)) +with open("software_versions.csv", "w") as f: + for k, v in results.items(): + f.write("{}\t{}\n".format(k, v)) + diff --git a/conf/hicpro.config b/conf/hicpro.config index 01b755a955c5aee521a6cf43b00847cfbc8d0cd3..cd0cf0b5a54f860312f49ac193802d53964ce686 100644 --- a/conf/hicpro.config +++ b/conf/hicpro.config @@ -10,7 +10,6 @@ params { // Alignment options - splitFastq = false bwt2_opts_end2end = '--very-sensitive -L 30 --score-min L,-0.6,-0.2 --end-to-end --reorder' bwt2_opts_trimmed = '--very-sensitive -L 20 --score-min L,-0.6,-0.2 --end-to-end --reorder' min_mapq = 10 @@ -35,9 +34,5 @@ params { ice_filer_low_count_perc = 0.02 ice_filer_high_count_perc = 0 ice_eps = 0.1 - - saveReference = false - saveAlignedIntermediates = false - saveInteractionBAM = false } diff --git a/conf/test.config b/conf/test.config index 39a2bba88d6da893f0f3ba97f397e77488556873..2ab8e57eda3d8fddb90c6d7ed3ddf2c0fd0672ca 100644 --- a/conf/test.config +++ b/conf/test.config @@ -18,7 +18,7 @@ config_profile_name = 'Hi-C test data from Schalbetter et al. (2017)' max_time = 1.h // Input data - readPaths = [ + input_paths = [ ['SRR4292758_00', ['https://github.com/nf-core/test-datasets/raw/hic/data/SRR4292758_00_R1.fastq.gz', 'https://github.com/nf-core/test-datasets/raw/hic/data/SRR4292758_00_R2.fastq.gz']] ] diff --git a/conf/test_full.config b/conf/test_full.config new file mode 100644 index 0000000000000000000000000000000000000000..47d31760585c66025666f112dcd03a23faeac543 --- /dev/null +++ b/conf/test_full.config @@ -0,0 +1,36 @@ +/* + * ------------------------------------------------- + * Nextflow config file for running full-size tests + * ------------------------------------------------- + * Defines bundled input files and everything required + * to run a full size pipeline test. 
Use as follows: + * nextflow run nf-core/hic -profile test_full,<docker/singularity> + */ + +params { + config_profile_name = 'Full test profile' + config_profile_description = 'Full test dataset to check pipeline function' + + // Input data for full size test + input_paths = [ + ['SRR4292758_00', ['https://github.com/nf-core/test-datasets/raw/hic/data/SRR4292758_00_R1.fastq.gz', 'https://github.com/nf-core/test-datasets/raw/hic/data/SRR4292758_00_R2.fastq.gz']] + ] + + // Annotations + fasta = 'https://github.com/nf-core/test-datasets/raw/hic/reference/W303_SGD_2015_JRIU00000000.fsa' + restriction_site = 'A^AGCTT' + ligation_site = 'AAGCTAGCTT' + + min_mapq = 2 + rm_dup = true + rm_singleton = true + rm_multi = true + + min_restriction_fragment_size = 100 + max_restriction_fragment_size = 100000 + min_insert_size = 100 + max_insert_size = 600 + + // Options + skip_cool = true +} diff --git a/docs/README.md b/docs/README.md index e160867d029e09c793168dd764f8a0ea01dcbd59..bdbc92abc939ff716f3fcaba1b5069be471c9049 100644 --- a/docs/README.md +++ b/docs/README.md @@ -1,12 +1,13 @@ # nf-core/hic: Documentation -The nf-core/hic documentation is split into the following files: +The nf-core/hic documentation is split into the following pages: -1. [Installation](https://nf-co.re/usage/installation) -2. Pipeline configuration - * [Local installation](https://nf-co.re/usage/local_installation) - * [Adding your own system config](https://nf-co.re/usage/adding_own_config) - * [Reference genomes](https://nf-co.re/usage/reference_genomes) -3. [Running the pipeline](usage.md) -4. [Output and how to interpret the results](output.md) -5. [Troubleshooting](https://nf-co.re/usage/troubleshooting) +* [Usage](usage.md) + * An overview of how the pipeline works, how to run it and a + description of all of the different command-line flags. +* [Output](output.md) + * An overview of the different results produced by the pipeline + and how to interpret them. + +You can find a lot more documentation about installing, configuring +and running nf-core pipelines on the website: [https://nf-co.re](https://nf-co.re) diff --git a/docs/output.md b/docs/output.md index a83d0dae9b5a742b799f055163dd7dde2da77712..95aca423a2f44853e03c8ed443f6eb8ac43b9019 100644 --- a/docs/output.md +++ b/docs/output.md @@ -1,8 +1,12 @@ # nf-core/hic: Output -This document describes the output produced by the pipeline. Most of the plots -are taken from the MultiQC report, which summarises results at the end of the -pipeline. +This document describes the output produced by the pipeline. +Most of the plots are taken from the MultiQC report, which +summarises results at the end of the pipeline. + +The directories listed below will be created in the results directory +after the pipeline has finished. All paths are relative to the top-level +results directory. ## Pipeline overview @@ -185,7 +189,7 @@ within the report data directory. The pipeline has special steps which allow the software versions used to be reported in the MultiQC output for future traceability. 
-**Output directory: `results/multiqc`** +**Output files:** * `Project_multiqc_report.html` * MultiQC report - a standalone HTML file that can be viewed in your @@ -194,5 +198,10 @@ web browser * Directory containing parsed statistics from the different tools used in the pipeline -For more information about how to use MultiQC reports, see -[http://multiqc.info](http://multiqc.info) +* `pipeline_info/` + * Reports generated by Nextflow: `execution_report.html`, `execution_timeline.html`, + `execution_trace.txt` and `pipeline_dag.dot`/`pipeline_dag.svg`. + * Reports generated by the pipeline: `pipeline_report.html`, + `pipeline_report.txt` and `software_versions.csv`. + * Documentation for interpretation of results in HTML format: + `results_description.html`. diff --git a/docs/usage.md b/docs/usage.md index cef7bf3b60752e918d11dbfb9023aed8ca2d9242..4a057e74ebd2dd6e3464421aeaa92fc56bc0f813 100644 --- a/docs/usage.md +++ b/docs/usage.md @@ -1,109 +1,11 @@ # nf-core/hic: Usage -## Table of contents - -* [Table of contents](#table-of-contents) -* [Introduction](#introduction) -* [Running the pipeline](#running-the-pipeline) - * [Updating the pipeline](#updating-the-pipeline) - * [Reproducibility](#reproducibility) -* [Main arguments](#main-arguments) - * [`-profile`](#-profile-single-dash) - * [`awsbatch`](#awsbatch) - * [`conda`](#conda) - * [`docker`](#docker) - * [`singularity`](#singularity) - * [`test`](#test) - * [`--reads`](#--reads) - * [`--singleEnd`](#--singleend) -* [Reference genomes](#reference-genomes) - * [`--genome`](#--genome) - * [`--fasta`](#--fasta) - * [`--igenomesIgnore`](#--igenomesignore) - * [`--bwt2_index`](#--bwt2_index) - * [`--chromosome_size`](#--chromosome_size) - * [`--restriction_fragments`](#--restriction_fragments) -* [Hi-C specific options](#hi-c-specific-options) - * [Reads mapping](#reads-mapping) - * [`--bwt2_opts_end2end`](#--bwt2_opts_end2end) - * [`--bwt2_opts_trimmed`](#--bwt2_opts_trimmed) - * [`--min_mapq`](#--min_mapq) - * [Digestion Hi-C](#digestion-hi-c) - * [`--restriction_site`](#--restriction_site) - * [`--ligation_site`](#--ligation_site) - * [`--min_restriction_fragment_size`](#--min_restriction_fragment_size) - * [`--max_restriction_fragment_size`](#--max_restriction_fragment_size) - * [`--min_insert_size`](#--min_insert_size) - * [`--max_insert_size`](#--max_insert_size) - * [DNase Hi-C](#dnase-hi-c) - * [`--dnase`](#--dnase) - * [Hi-C Processing](#hi-c-processing) - * [`--min_cis_dist`](#--min_cis_dist) - * [`--rm_singleton`](#--rm_singleton) - * [`--rm_dup`](#--rm_dup) - * [`--rm_multi`](#--rm_multi) - * [Genome-wide contact maps](#genome-wide-contact-maps) - * [`--bins_size`](#--bins_size) - * [`--ice_max_iter`](#--ice_max_iter) - * [`--ice_filer_low_count_perc`](#--ice_filer_low_count_perc) - * [`--ice_filer_high_count_perc`](#--ice_filer_high_count_perc) - * [`--ice_eps`](#--ice_eps) - * [Inputs/Outputs](#inputs-outputs) - * [`--splitFastq`](#--splitFastq) - * [`--saveReference`](#--saveReference) - * [`--saveAlignedIntermediates`](#--saveAlignedIntermediates) - * [`--saveInteractionBAM`](#--saveInteractionBAM) -* [Skip options](#skip-options) - * [--skipMaps](#--skipMaps) - * [--skipIce](#--skipIce) - * [--skipCool](#--skipCool) - * [--skipMultiQC](#--skipMultiQC) -* [Job resources](#job-resources) -* [Automatic resubmission](#automatic-resubmission) -* [Custom resource requests](#custom-resource-requests) -* [AWS batch specific parameters](#aws-batch-specific-parameters) - * [`-awsbatch`](#-awsbatch) - * [`--awsqueue`](#--awsqueue) - 
* [`--awsregion`](#--awsregion) -* [Other command line parameters](#other-command-line-parameters) - * [`--outdir`](#--outdir) - * [`--email`](#--email) - * [`--email_on_fail`](#--email_on_fail) - * [`--max_multiqc_email_size`](#--max_multiqc_email_size) - * [`-name`](#-name-single-dash) - * [`-resume`](#-resume-single-dash) - * [`-c`](#-c-single-dash) - * [`--custom_config_version`](#--custom_config_version) - * [`--custom_config_base`](#--custom_config_base) - * [`--max_memory`](#--max_memory) - * [`--max_time`](#--max_time) - * [`--max_cpus`](#--max_cpus) - * [`--plaintext_email`](#--plaintext_email) - * [`--monochrome_logs`](#--monochrome_logs) - * [`--multiqc_config`](#--multiqc_config) - -## Introduction - -Nextflow handles job submissions on SLURM or other environments, and supervises -running the jobs. Thus the Nextflow process must run until the pipeline is -finished. We recommend that you put the process running in the background -through `screen` / `tmux` or similar tool. Alternatively you can run nextflow -within a cluster job submitted your job scheduler. - -It is recommended to limit the Nextflow Java virtual machines memory. -We recommend adding the following line to your environment (typically -in `~/.bashrc` or `~./bash_profile`): - -```bash -NXF_OPTS='-Xms1g -Xmx4g' -``` - ## Running the pipeline The typical command for running the pipeline is as follows: ```bash -nextflow run nf-core/hic --reads '*_R{1,2}.fastq.gz' --genome GRCh37 -profile docker +nextflow run nf-core/hic --input '*_R{1,2}.fastq.gz' -profile docker ``` This will launch the pipeline with the `docker` configuration profile. @@ -154,7 +56,18 @@ eg. `-r 1.3.1`. This version number will be logged in reports when you run the pipeline, so that you'll know what you used when you look back in the future. -## Main arguments +### Automatic resubmission + +Each step in the pipeline has a default set of requirements for number of CPUs, +memory and time. For most of the steps in the pipeline, if the job exits with +an error code of `143` (exceeded requested resources) it will automatically +resubmit with higher requests (2 x original, then 3 x original). If it still +fails after three times then the pipeline is stopped. + +## Core Nextflow arguments + +> **NB:** These options are part of Nextflow and use a _single_ hyphen +(pipeline parameters use a double-hyphen). ### `-profile` @@ -169,38 +82,122 @@ the pipeline to use software packaged using different methods pipeline reproducibility, however when this is not possible, Conda is also supported. The pipeline also dynamically loads configurations from -[https://github.com/nf-core/configs](https://github.com/nf-core/configs) when it runs, -making multiple config profiles for various institutional clusters available at run time. -For more information and to see if your system is available in these configs please see +[https://github.com/nf-core/configs](https://github.com/nf-core/configs) +when it runs, making multiple config profiles for various institutional +clusters available at run time. +For more information and to see if your system is available in these +configs please see the [nf-core/configs documentation](https://github.com/nf-core/configs#documentation). -Note that multiple profiles can be loaded, for example: `-profile test,docker` - the order -of arguments is important! -They are loaded in sequence, so later profiles can overwrite earlier profiles. 
+Note that multiple profiles can be loaded, for example: `-profile test,docker` - +the order of arguments is important! +They are loaded in sequence, so later profiles can overwrite +earlier profiles. -If `-profile` is not specified, the pipeline will run locally and expect all software to be +If `-profile` is not specified, the pipeline will run locally and +expect all software to be installed and available on the `PATH`. This is _not_ recommended. * `docker` - * A generic configuration profile to be used with [Docker](http://docker.com/) - * Pulls software from dockerhub: [`nfcore/hic`](http://hub.docker.com/r/nfcore/hic/) + * A generic configuration profile to be used with [Docker](https://docker.com/) + * Pulls software from Docker Hub: [`nfcore/hic`](https://hub.docker.com/r/nfcore/hic/) * `singularity` - * A generic configuration profile to be used with [Singularity](http://singularity.lbl.gov/) - * Pulls software from DockerHub: [`nfcore/hic`](http://hub.docker.com/r/nfcore/hic/) + * A generic configuration profile to be used with [Singularity](https://sylabs.io/docs/) + * Pulls software from Docker Hub: [`nfcore/hic`](https://hub.docker.com/r/nfcore/hic/) * `conda` - * Please only use Conda as a last resort i.e. when it's not possible to run the pipeline with Docker or Singularity. + * Please only use Conda as a last resort i.e. when it's not possible to run the + pipeline with Docker or Singularity. * A generic configuration profile to be used with [Conda](https://conda.io/docs/) * Pulls most software from [Bioconda](https://bioconda.github.io/) * `test` * A profile with a complete configuration for automated testing * Includes links to test data so needs no other parameters -### `--reads` +### `-resume` + +Specify this when restarting a pipeline. Nextflow will use cached results from +any pipeline steps where the inputs are the same, continuing from where it got +to previously. +You can also supply a run name to resume a specific run: `-resume [run-name]`. +Use the `nextflow log` command to show previous run names. + +### `-c` + +Specify the path to a specific config file (this is a core Nextflow command). +See the [nf-core website documentation](https://nf-co.re/usage/configuration) +for more information. + +#### Custom resource requests + +Each step in the pipeline has a default set of requirements for number of CPUs, +memory and time. For most of the steps in the pipeline, if the job exits with +an error code of `143` (exceeded requested resources) it will automatically resubmit +with higher requests (2 x original, then 3 x original). If it still fails after three +times then the pipeline is stopped. + +Whilst these default requirements will hopefully work for most people with most data, +you may find that you want to customise the compute resources that the pipeline requests. +You can do this by creating a custom config file. For example, to give the workflow +process `star` 32GB of memory, you could use the following config: + +```nextflow +process { + withName: star { + memory = 32.GB + } +} +``` + +See the main [Nextflow documentation](https://www.nextflow.io/docs/latest/config.html) +for more information. + +If you are likely to be running `nf-core` pipelines regularly it may be a +good idea to request that your custom config file is uploaded to the +`nf-core/configs` git repository. Before you do this please test +that the config file works with your pipeline of choice using the `-c` +parameter (see definition above).
You can then create a pull request to the +`nf-core/configs` repository with the addition of your config file, associated +documentation file (see examples in [`nf-core/configs/docs`](https://github.com/nf-core/configs/tree/master/docs)), +and amending [`nfcore_custom.config`](https://github.com/nf-core/configs/blob/master/nfcore_custom.config) +to include your custom profile. + +If you have any questions or issues please send us a message on +[Slack](https://nf-co.re/join/slack) on the +[`#configs` channel](https://nfcore.slack.com/channels/configs). + +### Running in the background + +Nextflow handles job submissions and supervises the running jobs. +The Nextflow process must run until the pipeline is finished. + +The Nextflow `-bg` flag launches Nextflow in the background, detached from your terminal +so that the workflow does not stop if you log out of your session. The logs are +saved to a file. + +Alternatively, you can use `screen` / `tmux` or similar tool to create a detached +session which you can log back into at a later time. +Some HPC setups also allow you to run nextflow within a cluster job submitted +by your job scheduler (from where it submits more jobs). + +#### Nextflow memory requirements + +In some cases, the Nextflow Java virtual machines can start to request a +large amount of memory. +We recommend adding the following line to your environment to limit this +(typically in `~/.bashrc` or `~/.bash_profile`): + +```bash +NXF_OPTS='-Xms1g -Xmx4g' +``` + +## Inputs + +### `--input` Use this to specify the location of your input FastQ files. For example: ```bash ---reads 'path/to/data/sample_*_{1,2}.fastq' +--input 'path/to/data/sample_*_{1,2}.fastq' ``` Please note the following requirements: @@ -214,7 +211,10 @@ If left unspecified, a default pattern is used: `data/*{1,2}.fastq.gz` ### `--single_end` -By default, the pipeline expects paired-end data. If you have single-end data, you need to specify `--single_end` on the command line when you launch the pipeline. A normal glob pattern, enclosed in quotation marks, can then be used for `--reads`. For example: +By default, the pipeline expects paired-end data. If you have single-end data, +you need to specify `--single_end` on the command line when you launch the pipeline. +A normal glob pattern, enclosed in quotation marks, can then be used for `--input`. +For example: ```bash --single_end --input '*.fastq' ``` It is not possible to run a mixture of single-end and paired-end files in one ru ## Reference genomes -The pipeline config files come bundled with paths to the illumina iGenomes reference index files. If running with docker or AWS, the configuration is set up to use the [AWS-iGenomes](https://ewels.github.io/AWS-iGenomes/) resource. +The pipeline config files come bundled with paths to the illumina iGenomes reference +index files. If running with docker or AWS, the configuration is set up to use the +[AWS-iGenomes](https://ewels.github.io/AWS-iGenomes/) resource. ### `--genome` (using iGenomes) -There are 31 different species supported in the iGenomes references. To run the pipeline, you must specify which to use with the `--genome` flag. - -There are 31 different species supported in the iGenomes references. To run +There are many different species supported in the iGenomes references. To run the pipeline, you must specify which to use with the `--genome` flag. You can find the keys to specify the genomes in the [iGenomes config file](../conf/igenomes.config).
-Common genomes that are supported are: - -* Human - * `--genome GRCh37` -* Mouse - * `--genome GRCm38` -* _Drosophila_ - * `--genome BDGP6` -* _S. cerevisiae_ - * `--genome 'R64-1-1'` - -> There are numerous others - check the config file for more. - -Note that you can use the same configuration setup to save sets of reference -files for your own use, even if they are not part of the iGenomes resource. -See the [Nextflow documentation](https://www.nextflow.io/docs/latest/config.html) -for instructions on where to save such a file. - -The syntax for this reference configuration is as follows: - -```nextflow -params { - genomes { - 'GRCh37' { - fasta = '<path to the genome fasta file>' // Used if no annotations are given - bowtie2 = '<path to bowtie2 index files>' - } - // Any number of additional genomes, key is used with --genome - } -} -``` ### `--fasta` @@ -276,12 +245,6 @@ run the pipeline: --fasta '[path to Fasta reference]' ``` -### `--igenomesIgnore` - -Do not load `igenomes.config` when running the pipeline. You may choose this -option if you observe clashes between custom parameters and those supplied -in `igenomes.config`. - ### `--bwt2_index` The bowtie2 indexes are required to run the Hi-C pipeline. If the @@ -628,157 +591,3 @@ If defined, the MultiQC report is not generated. Default: false ```bash --skipMultiQC ``` - -## Job resources - -### Automatic resubmission - -Each step in the pipeline has a default set of requirements for number of CPUs, -memory and time. For most of the steps in the pipeline, if the job exits with -an error code of `143` (exceeded requested resources) it will automatically -resubmit with higher requests (2 x original, then 3 x original). If it still -fails after three times then the pipeline is stopped. - -### Custom resource requests - -Wherever process-specific requirements are set in the pipeline, the default value -can be changed by creating a custom config file. See the files hosted -at [`nf-core/configs`](https://github.com/nf-core/configs/tree/master/conf) for examples. - -If you have any questions or issues please send us a message on [Slack](https://nf-co.re/join/slack). - -## AWS Batch specific parameters - -Running the pipeline on AWS Batch requires a couple of specific parameters to be -set according to your AWS Batch configuration. Please use -[`-profile awsbatch`](https://github.com/nf-core/configs/blob/master/conf/awsbatch.config) -and then specify all of the following parameters. - -### `--awsqueue` - -The JobQueue that you intend to use on AWS Batch. - -### `--awsregion` - -The AWS region in which to run your job. Default is set to `eu-west-1` but can be adjusted to your needs. - -### `--awscli` - -The [AWS CLI](https://www.nextflow.io/docs/latest/awscloud.html#aws-cli-installation) -path in your custom AMI. Default: `/home/ec2-user/miniconda/bin/aws`. - -The AWS region to run your job in. Default is set to `eu-west-1` but can be -adjusted to your needs. - -Please make sure to also set the `-w/--work-dir` and `--outdir` parameters to -a S3 storage bucket of your choice - you'll get an error message notifying you -if you didn't. - -## Other command line parameters - -### `--outdir` - -The output directory where the results will be saved. - -### `--email` - -Set this parameter to your e-mail address to get a summary e-mail with details -of the run sent to you when the workflow exits. If set in your user config file -(`~/.nextflow/config`) then you don't need to specify this on the command line for every run. 
- -### `--email_on_fail` - -This works exactly as with `--email`, except emails are only sent if the workflow is not successful. - -### `--max_multiqc_email_size` - -Threshold size for MultiQC report to be attached in notification email. If file generated by pipeline exceeds the threshold, it will not be attached (Default: 25MB). - -### `-name` - -Name for the pipeline run. If not specified, Nextflow will automatically generate a random mnemonic. - -Name for the pipeline run. If not specified, Nextflow will automatically generate -a random mnemonic. - -This is used in the MultiQC report (if not default) and in the summary HTML / -e-mail (always). - -**NB:** Single hyphen (core Nextflow option) - -### `-resume` - -Specify this when restarting a pipeline. Nextflow will used cached results from -any pipeline steps where the inputs are the same, continuing from where it got -to previously. - -You can also supply a run name to resume a specific run: `-resume [run-name]`. -Use the `nextflow log` command to show previous run names. - -**NB:** Single hyphen (core Nextflow option) - -### `-c` - -Specify the path to a specific config file (this is a core NextFlow command). - -**NB:** Single hyphen (core Nextflow option) - -Note - you can use this to override pipeline defaults. - -### `--custom_config_version` - -Provide git commit id for custom Institutional configs hosted at `nf-core/configs`. -This was implemented for reproducibility purposes. Default: `master`. - -```bash -## Download and use config file with following git commid id ---custom_config_version d52db660777c4bf36546ddb188ec530c3ada1b96 -``` - -### `--custom_config_base` - -If you're running offline, nextflow will not be able to fetch the institutional config files -from the internet. If you don't need them, then this is not a problem. If you do need them, -you should download the files from the repo and tell nextflow where to find them with the -`custom_config_base` option. For example: - -```bash -## Download and unzip the config files -cd /path/to/my/configs -wget https://github.com/nf-core/configs/archive/master.zip -unzip master.zip - -## Run the pipeline -cd /path/to/my/data -nextflow run /path/to/pipeline/ --custom_config_base /path/to/my/configs/configs-master/ -``` - -> Note that the nf-core/tools helper package has a `download` command to download all required pipeline -> files + singularity containers + institutional configs in one go for you, to make this process easier. - -### `--max_memory` - -Use to set a top-limit for the default memory requirement for each process. -Should be a string in the format integer-unit. eg. `--max_memory '8.GB'` - -### `--max_time` - -Use to set a top-limit for the default time requirement for each process. -Should be a string in the format integer-unit. eg. `--max_time '2.h'` - -### `--max_cpus` - -Use to set a top-limit for the default CPU requirement for each process. -Should be a string in the format integer-unit. eg. `--max_cpus 1` - -### `--plaintext_email` - -Set to receive plain-text e-mails instead of HTML formatted. - -### `--monochrome_logs` - -Set to disable colourful command line output and live life in monochrome. - -### `--multiqc_config` - -Specify a path to a custom MultiQC configuration file. 
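To make the custom-configuration advice in the usage documentation above concrete, here is a minimal sketch of a file that could be passed with `-c`. This is a hypothetical example rather than a file shipped with the pipeline: the name `custom.config` and all resource values are arbitrary, while `bowtie2_end_to_end` is one of the process names defined in this pipeline's `main.nf`.

```nextflow
// custom.config -- hypothetical example, used as:
//   nextflow run nf-core/hic -c custom.config --input '*_R{1,2}.fastq.gz' -profile docker

// Cap the defaults that every process resource request is checked against
params {
    max_memory = 64.GB
    max_cpus   = 16
    max_time   = 48.h
}

// Override the memory for one named process (the 32 GB figure is illustrative)
process {
    withName: bowtie2_end_to_end {
        memory = 32.GB
    }
}
```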
diff --git a/environment.yml b/environment.yml index e8944c64299636c7796f9ff22ff64cc48a7fb22f..ccca9c3d12e94380287c1be0f9f34ca9813890ae 100644 --- a/environment.yml +++ b/environment.yml @@ -1,6 +1,6 @@ # You can use this file to create a conda environment for this pipeline: # conda env create -f environment.yml -name: nf-core-hic-1.2.1 +name: nf-core-hic-1.2.2 channels: - conda-forge - bioconda diff --git a/main.nf b/main.nf index 565dbf19bcdc80620010617520070d8001884bdc..5f7e943327ce8163c0c6c1bc373e97c02b864907 100644 --- a/main.nf +++ b/main.nf @@ -18,10 +18,10 @@ def helpMessage() { The typical command for running the pipeline is as follows: - nextflow run nf-core/hic --reads '*_R{1,2}.fastq.gz' -profile conda + nextflow run nf-core/hic --input '*_R{1,2}.fastq.gz' -profile docker Mandatory arguments: - --reads [file] Path to input data (must be surrounded with quotes) + --input [file] Path to input data (must be surrounded with quotes) -profile [str] Configuration profile to use. Can use multiple (comma separated) Available: conda, docker, singularity, awsbatch, test and more. @@ -32,9 +32,10 @@ def helpMessage() { --chromosome_size [file] Path to chromosome size file --restriction_fragments [file] Path to restriction fragment file (bed) --save_reference [bool] Save reference genome to output folder. Default: False - --save_aligned_intermediates [bool] Save intermediates alignment files. Default: False Alignments + --split_fastq [bool] Size of read chunks to use to speed up the workflow. Default: None + --save_aligned_intermediates [bool] Save intermediate alignment files. Default: False --bwt2_opts_end2end [str] Options for bowtie2 end-to-end mapping (first mapping step). See hic.config for default. --bwt2_opts_trimmed [str] Options for bowtie2 mapping after ligation site trimming. See hic.config for default. --min_mapq [int] Minimum mapping quality values to consider. Default: 10 @@ -68,17 +69,18 @@ def helpMessage() { --skip_cool [bool] Skip generation of cool files. Default: False --skip_multiqc [bool] Skip MultiQC. Default: False - Other - --split_fastq [bool] Size of read chuncks to use to speed up the workflow. Default: None - --outdir [file] The output directory where the results will be saved. Default: './results' + Other options: + --outdir [file] The output directory where the results will be saved + --publish_dir_mode [str] Mode for publishing results in the output directory. Available: symlink, rellink, link, copy, copyNoFollow, move (Default: copy) --email [email] Set this parameter to your e-mail address to get a summary e-mail with details of the run sent to you when the workflow exits. Default: None --email_on_fail [email] Same as --email, except only send mail if the workflow is not successful --max_multiqc_email_size [str] Threshold size for MultiQC report to be attached in notification email. If file generated by pipeline exceeds the threshold, it will not be attached (Default: 25MB) -name [str] Name for the pipeline run. If not specified, Nextflow will automatically generate a random mnemonic. Default: None - AWSBatch + AWSBatch options: --awsqueue [str] The AWSBatch JobQueue that needs to be set when running on AWSBatch --awsregion [str] The AWS Region for your AWS Batch job to run on + --awscli [str] Path to the AWS CLI tool """.stripIndent() } @@ -107,12 +109,13 @@ params.bwt2_index = params.genome ? params.genomes[ params.genome ].bowtie2 ?: false : false params.fasta = params.genome ?
params.genomes[ params.genome ].fasta ?: false : false // Has the run name been specified by the user? -// this has the bonus effect of catching both -name and --name +// this has the bonus effect of catching both -name and --name custom_runName = params.name if (!(workflow.runName ==~ /[a-z]+_[a-z]+/)) { custom_runName = workflow.runName } +// Check AWS batch settings if (workflow.profile.contains('awsbatch')) { // AWSBatch sanity checking if (!params.awsqueue || !params.awsregion) exit 1, "Specify correct --awsqueue and --awsregion parameters on AWSBatch!" @@ -127,6 +130,7 @@ if (workflow.profile.contains('awsbatch')) { ch_multiqc_config = file("$baseDir/assets/multiqc_config.yaml", checkIfExists: true) ch_multiqc_custom_config = params.multiqc_config ? Channel.fromPath(params.multiqc_config, checkIfExists: true) : Channel.empty() ch_output_docs = file("$baseDir/docs/output.md", checkIfExists: true) +ch_output_docs_images = file("$baseDir/docs/images/", checkIfExists: true) /********************************************************** * SET UP CHANNELS */ /* * input read files */ -if (params.readPaths){ +if (params.input_paths){ raw_reads = Channel.create() raw_reads_2 = Channel.create() Channel - .from( params.readPaths ) + .from( params.input_paths ) .map { row -> [ row[0], [file(row[1][0]), file(row[1][1])]] } .separate( raw_reads, raw_reads_2 ) { a -> [tuple(a[0], a[1][0]), tuple(a[0], a[1][1])] } }else{ @@ -150,21 +154,21 @@ if (params.readPaths){ raw_reads_2 = Channel.create() Channel - .fromFilePairs( params.input ) + .fromFilePairs( params.input ) .separate( raw_reads, raw_reads_2 ) { a -> [tuple(a[0], a[1][0]), tuple(a[0], a[1][1])] } } +// Split fastq files +// https://www.nextflow.io/docs/latest/operator.html#splitfastq + if ( params.split_fastq ){ raw_reads_full = raw_reads.concat( raw_reads_2 ) - raw_reads = raw_reads_full.splitFastq( by: params.splitFastq , file: true) + raw_reads = raw_reads_full.splitFastq( by: params.split_fastq , file: true) }else{ raw_reads = raw_reads.concat( raw_reads_2 ).dump(tag: "data") } -// SPlit fastq files -// https://www.nextflow.io/docs/latest/operator.html#splitfastq - /* * Other input channels */ @@ -233,13 +237,14 @@ log.info nfcoreHeader() def summary = [:] if(workflow.revision) summary['Pipeline Release'] = workflow.revision summary['Run Name'] = custom_runName ?: workflow.runName -summary['Reads'] = params.reads +summary['Input'] = params.input summary['splitFastq'] = params.split_fastq summary['Fasta Ref'] = params.fasta summary['Restriction Motif']= params.restriction_site summary['Ligation Motif'] = params.ligation_site summary['DNase Mode'] = params.dnase summary['Remove Dup'] = params.rm_dup +summary['Remove MultiHits'] = params.rm_multi summary['Min MAPQ'] = params.min_mapq summary['Min Fragment Size']= params.min_restriction_fragment_size summary['Max Fragment Size']= params.max_restriction_fragment_size @@ -267,12 +272,14 @@ if(workflow.profile == 'awsbatch'){ summary['AWS Queue'] = params.awsqueue } summary['Config Profile'] = workflow.profile -if(params.config_profile_description) summary['Config Description'] = params.config_profile_description -if(params.config_profile_contact) summary['Config Contact'] = params.config_profile_contact -if(params.config_profile_url) summary['Config URL'] = params.config_profile_url -if(params.email) { - summary['E-mail Address'] = params.email - summary['MultiQC maxsize'] = params.maxMultiqcEmailFileSize
+if (params.config_profile_description) summary['Config Profile Description'] = params.config_profile_description +if (params.config_profile_contact) summary['Config Profile Contact'] = params.config_profile_contact +if (params.config_profile_url) summary['Config Profile URL'] = params.config_profile_url +summary['Config Files'] = workflow.configFiles.join(', ') +if (params.email || params.email_on_fail) { + summary['E-mail Address'] = params.email + summary['E-mail on failure'] = params.email_on_fail + summary['MultiQC maxsize'] = params.max_multiqc_email_size } log.info summary.collect { k,v -> "${k.padRight(18)}: $v" }.join("\n") log.info "-\033[2m--------------------------------------------------\033[0m-" @@ -301,11 +308,11 @@ Channel.from(summary.collect{ [it.key, it.value] }) */ process get_software_versions { - publishDir "${params.outdir}/pipeline_info", mode: 'copy', - saveAs: {filename -> - if (filename.indexOf(".csv") > 0) filename - else null - } + publishDir "${params.outdir}/pipeline_info", mode: params.publish_dir_mode, + saveAs: { filename -> + if (filename.indexOf(".csv") > 0) filename + else null + } output: file 'software_versions_mqc.yaml' into software_versions_yaml @@ -352,7 +359,7 @@ if(!params.bwt2_index && params.fasta){ tag "$bwt2_base" label 'process_highmem' publishDir path: { params.save_reference ? "${params.outdir}/reference_genome" : params.outdir }, - saveAs: { params.save_reference ? it : null }, mode: 'copy' + saveAs: { params.save_reference ? it : null }, mode: params.publish_dir_mode input: file fasta from fasta_for_index @@ -375,7 +382,7 @@ if(!params.chromosome_size && params.fasta){ tag "$fasta" label 'process_low' publishDir path: { params.save_reference ? "${params.outdir}/reference_genome" : params.outdir }, - saveAs: { params.save_reference ? it : null }, mode: 'copy' + saveAs: { params.save_reference ? it : null }, mode: params.publish_dir_mode input: file fasta from fasta_for_chromsize @@ -396,7 +403,7 @@ if(!params.restriction_fragments && params.fasta && !params.dnase){ tag "$fasta ${params.restriction_site}" label 'process_low' publishDir path: { params.save_reference ? "${params.outdir}/reference_genome" : params.outdir }, - saveAs: { params.save_reference ? it : null }, mode: 'copy' + saveAs: { params.save_reference ? it : null }, mode: params.publish_dir_mode input: file fasta from fasta_for_resfrag @@ -423,7 +430,7 @@ process bowtie2_end_to_end { tag "$prefix" label 'process_medium' publishDir path: { params.save_aligned_intermediates ? "${params.outdir}/mapping" : params.outdir }, - saveAs: { params.save_aligned_intermediates ? it : null }, mode: 'copy' + saveAs: { params.save_aligned_intermediates ? it : null }, mode: params.publish_dir_mode input: set val(sample), file(reads) from raw_reads @@ -462,7 +469,7 @@ process trim_reads { tag "$prefix" label 'process_low' publishDir path: { params.save_aligned_intermediates ? "${params.outdir}/mapping" : params.outdir }, - saveAs: { params.save_aligned_intermediates ? it : null }, mode: 'copy' + saveAs: { params.save_aligned_intermediates ? it : null }, mode: params.publish_dir_mode when: !params.dnase @@ -485,7 +492,7 @@ process bowtie2_on_trimmed_reads { tag "$prefix" label 'process_medium' publishDir path: { params.save_aligned_intermediates ? "${params.outdir}/mapping" : params.outdir }, - saveAs: { params.save_aligned_intermediates ? it : null }, mode: 'copy' + saveAs: { params.save_aligned_intermediates ? 
it : null }, mode: params.publish_dir_mode when: !params.dnase @@ -513,7 +520,7 @@ if (!params.dnase){ tag "$sample = $bam1 + $bam2" label 'process_medium' publishDir path: { params.save_aligned_intermediates ? "${params.outdir}/mapping" : params.outdir }, - saveAs: { params.save_aligned_intermediates ? it : null }, mode: 'copy' + saveAs: { params.save_aligned_intermediates ? it : null }, mode: params.publish_dir_mode input: set val(prefix), file(bam1), file(bam2) from end_to_end_bam.join( trimmed_bam ) @@ -554,7 +561,7 @@ if (!params.dnase){ tag "$sample = $bam1" label 'process_medium' publishDir path: { params.save_aligned_intermediates ? "${params.outdir}/mapping" : params.outdir }, - saveAs: { params.save_aligned_intermediates ? it : null }, mode: 'copy' + saveAs: { params.save_aligned_intermediates ? it : null }, mode: params.publish_dir_mode input: set val(prefix), file(bam1) from end_to_end_bam @@ -583,7 +590,7 @@ if (!params.dnase){ process combine_mapped_files{ tag "$sample = $r1_prefix + $r2_prefix" label 'process_low' - publishDir "${params.outdir}/mapping", mode: 'copy', + publishDir "${params.outdir}/mapping", mode: params.publish_dir_mode, saveAs: {filename -> filename.indexOf(".pairstat") > 0 ? "stats/$filename" : "$filename"} input: @@ -618,7 +625,7 @@ if (!params.dnase){ process get_valid_interaction{ tag "$sample" label 'process_low' - publishDir "${params.outdir}/hic_results/data", mode: 'copy', + publishDir "${params.outdir}/hic_results/data", mode: params.publish_dir_mode, saveAs: {filename -> filename.indexOf("*stat") > 0 ? "stats/$filename" : "$filename"} input: @@ -657,7 +664,7 @@ else{ process get_valid_interaction_dnase{ tag "$sample" label 'process_low' - publishDir "${params.outdir}/hic_results/data", mode: 'copy', + publishDir "${params.outdir}/hic_results/data", mode: params.publish_dir_mode, saveAs: {filename -> filename.indexOf("*stat") > 0 ? "stats/$filename" : "$filename"} input: @@ -691,7 +698,7 @@ else{ process remove_duplicates { tag "$sample" label 'process_highmem' - publishDir "${params.outdir}/hic_results/data", mode: 'copy', + publishDir "${params.outdir}/hic_results/data", mode: params.publish_dir_mode, saveAs: {filename -> filename.indexOf("*stat") > 0 ? 
"stats/$sample/$filename" : "$filename"} input: @@ -738,7 +745,7 @@ process remove_duplicates { process merge_sample { tag "$ext" label 'process_low' - publishDir "${params.outdir}/hic_results/stats/${sample}", mode: 'copy' + publishDir "${params.outdir}/hic_results/stats/${sample}", mode: params.publish_dir_mode input: set val(prefix), file(fstat) from all_mapstat.groupTuple().concat(all_pairstat.groupTuple(), all_rsstat.groupTuple()) @@ -760,7 +767,7 @@ process merge_sample { process build_contact_maps{ tag "$sample - $mres" label 'process_highmem' - publishDir "${params.outdir}/hic_results/matrix/raw", mode: 'copy' + publishDir "${params.outdir}/hic_results/matrix/raw", mode: params.publish_dir_mode when: !params.skip_maps @@ -786,7 +793,7 @@ process build_contact_maps{ process run_ice{ tag "$rmaps" label 'process_highmem' - publishDir "${params.outdir}/hic_results/matrix/iced", mode: 'copy' + publishDir "${params.outdir}/hic_results/matrix/iced", mode: params.publish_dir_mode when: !params.skip_maps && !params.skip_ice @@ -815,7 +822,7 @@ process run_ice{ process generate_cool{ tag "$sample" label 'process_medium' - publishDir "${params.outdir}/export/cool", mode: 'copy' + publishDir "${params.outdir}/export/cool", mode: params.publish_dir_mode when: !params.skip_cool @@ -839,7 +846,7 @@ process generate_cool{ */ process multiqc { label 'process_low' - publishDir "${params.outdir}/MultiQC", mode: 'copy' + publishDir "${params.outdir}/MultiQC", mode: params.publish_dir_mode when: !params.skip_multiqc @@ -867,10 +874,11 @@ process multiqc { * STEP 7 - Output Description HTML */ process output_documentation { - publishDir "${params.outdir}/pipeline_info", mode: 'copy' + publishDir "${params.outdir}/pipeline_info", mode: params.publish_dir_mode - input: - file output_docs from ch_output_docs + input: + file output_docs from ch_output_docs + file images from ch_output_docs_images output: file "results_description.html" @@ -962,7 +970,11 @@ workflow.onComplete { log.info "[nf-core/hic] Sent summary e-mail to $email_address (sendmail)" } catch (all) { // Catch failures and try with plaintext - [ 'mail', '-s', subject, email_address ].execute() << email_txt + def mail_cmd = [ 'mail', '-s', subject, '--content-type=text/html', email_address ] + if ( mqc_report.size() <= params.max_multiqc_email_size.toBytes() ) { + mail_cmd += [ '-A', mqc_report ] + } + mail_cmd.execute() << email_html log.info "[nf-core/hic] Sent summary e-mail to $email_address (mail)" } } diff --git a/nextflow.config b/nextflow.config index 391610d38352d7e17b2d1437dbe6c66fab760804..edb8038b95bffe88773574849a2ce4df07e9aec4 100644 --- a/nextflow.config +++ b/nextflow.config @@ -10,12 +10,13 @@ params { // Workflow flags genome = false - reads = "data/*{1,2}.fastq.gz" + input = "data/*{1,2}.fastq.gz" single_end = false outdir = './results' genome = false - readPaths = false + input_paths = false + split_fastq = false chromosome_size = false restriction_fragments = false skip_maps = false @@ -25,16 +26,30 @@ params { save_reference = false save_interaction_bam = false save_aligned_intermediates = false - - dnase = false - rm_dup = false - rm_singleton = false - rm_multi = false + + bwt2_opts_end2end = '--very-sensitive -L 30 --score-min L,-0.6,-0.2 --end-to-end --reorder' + bwt2_opts_trimmed = '--very-sensitive -L 20 --score-min L,-0.6,-0.2 --end-to-end --reorder' + min_mapq = 10 + + // Digestion Hi-C + restriction_site = 'A^AGCTT' + ligation_site = 'AAGCTAGCTT' min_restriction_fragment_size = false max_restriction_fragment_size 
diff --git a/nextflow.config b/nextflow.config index 391610d38352d7e17b2d1437dbe6c66fab760804..edb8038b95bffe88773574849a2ce4df07e9aec4 100644 --- a/nextflow.config +++ b/nextflow.config @@ -10,12 +10,13 @@ params { // Workflow flags genome = false - reads = "data/*{1,2}.fastq.gz" + input = "data/*{1,2}.fastq.gz" single_end = false outdir = './results' genome = false - readPaths = false + input_paths = false + split_fastq = false chromosome_size = false restriction_fragments = false skip_maps = false @@ -25,16 +26,30 @@ params { save_reference = false save_interaction_bam = false save_aligned_intermediates = false - - dnase = false - rm_dup = false - rm_singleton = false - rm_multi = false + + bwt2_opts_end2end = '--very-sensitive -L 30 --score-min L,-0.6,-0.2 --end-to-end --reorder' + bwt2_opts_trimmed = '--very-sensitive -L 20 --score-min L,-0.6,-0.2 --end-to-end --reorder' + min_mapq = 10 + + // Digestion Hi-C + restriction_site = 'A^AGCTT' + ligation_site = 'AAGCTAGCTT' min_restriction_fragment_size = false max_restriction_fragment_size = false min_insert_size = false max_insert_size = false + dnase = false min_cis_dist = false + rm_dup = true + rm_singleton = true + rm_multi = true + bin_size = '1000000,500000' + ice_max_iter = 100 + ice_filer_low_count_perc = 0.02 + ice_filer_high_count_perc = 0 + ice_eps = 0.1 + + publish_dir_mode = 'copy' // Boilerplate options multiqc_config = false @@ -64,7 +79,7 @@ params { // Container slug. Stable releases should specify release tag! // Developmental code should specify :dev -process.container = 'nfcore/hic:1.2.1' +process.container = 'nfcore/hic:1.2.2' // Load base.config by default for all pipelines includeConfig 'conf/base.config' @@ -76,9 +91,6 @@ try { System.err.println("WARNING: Could not load nf-core/config profiles: ${params.custom_config_base}/nfcore_custom.config") } -// Load hic config file -includeConfig 'conf/hicpro.config' - // Create profiles profiles { conda { process.conda = "$baseDir/environment.yml" } @@ -103,9 +115,11 @@ if (!params.igenomes_ignore) { includeConfig 'conf/igenomes.config' } -// Export this variable to prevent local Python libraries from conflicting with those in the container +// Export these variables to prevent local Python/R libraries from conflicting with those in the container env { PYTHONNOUSERSITE = 1 + R_PROFILE_USER = "/.Rprofile" + R_ENVIRON_USER = "/.Renviron" } // Capture exit codes from upstream processes when piping @@ -135,7 +149,7 @@ manifest { description = 'Analysis of Chromosome Conformation Capture data (Hi-C)' mainScript = 'main.nf' nextflowVersion = '>=19.10.0' - version = '1.2.1' + version = '1.2.2' } // Function to ensure that resource requirements don't go beyond
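With conf/hicpro.config no longer loaded, the tool settings above now live directly in nextflow.config as first-class parameters, with new defaults (duplicate, singleton and multi-hit removal on by default, explicit bowtie2 options, ICE settings). Any of them can be overridden from a user config file passed with `-c`; a hypothetical example, with values chosen purely for illustration:

```nextflow
// custom.config -- hypothetical overrides for the new nextflow.config defaults.
// Pass with: nextflow run nf-core/hic -profile docker -c custom.config ...
params {
    rm_multi         = false                    // keep multi-mapped reads
    bin_size         = '1000000,500000,250000'  // add a finer map resolution
    ice_max_iter     = 200                      // allow more ICE iterations
    publish_dir_mode = 'symlink'                // publish results as symlinks
}
```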
diff --git a/nextflow_schema.json b/nextflow_schema.json new file mode 100644 index 0000000000000000000000000000000000000000..ed2f701f6d62f11b45df12a035594bec31401d5e --- /dev/null +++ b/nextflow_schema.json @@ -0,0 +1,459 @@ +{ + "$schema": "https://json-schema.org/draft-07/schema", + "$id": "https://raw.githubusercontent.com/nf-core/hic/master/nextflow_schema.json", + "title": "nf-core/hic pipeline parameters", + "description": "Analysis of Chromosome Conformation Capture data (Hi-C)", + "type": "object", + "definitions": { + "input_output_options": { + "title": "Input/output options", + "type": "object", + "fa_icon": "fas fa-terminal", + "description": "Define where the pipeline should find input data and save output data.", + "required": [ + "input" + ], + "properties": { + "input": { + "type": "string", + "fa_icon": "fas fa-dna", + "description": "Input FastQ files.", + "help_text": "Use this to specify the location of your input FastQ files. For example:\n\n```bash\n--input 'path/to/data/sample_*_{1,2}.fastq'\n```\n\nPlease note the following requirements:\n\n1. The path must be enclosed in quotes\n2. The path must have at least one `*` wildcard character\n3. When using the pipeline with paired end data, the path must use `{1,2}` notation to specify read pairs.\n\nIf left unspecified, a default pattern is used: `data/*{1,2}.fastq.gz`" + }, + "input_paths": { + "type": "string", + "hidden": true, + "description": "Input FastQ files for test only", + "default": "undefined" + }, + "split_fastq": { + "type": "number", + "description": "Split the reads into chunks before running. Specify the number of reads per chunk, e.g. --split_fastq 20000000.", + "fa_icon": "fas fa-dna" + }, + "single_end": { + "type": "boolean", + "description": "Specifies that the input is single-end reads.", + "fa_icon": "fas fa-align-center", + "help_text": "By default, the pipeline expects paired-end data. If you have single-end data, you need to specify `--single_end` on the command line when you launch the pipeline. A normal glob pattern, enclosed in quotation marks, can then be used for `--input`. For example:\n\n```bash\n--single_end --input '*.fastq'\n```\n\nIt is not possible to run a mixture of single-end and paired-end files in one run." + }, + "outdir": { + "type": "string", + "description": "The output directory where the results will be saved.", + "default": "./results", + "fa_icon": "fas fa-folder-open" + }, + "email": { + "type": "string", + "description": "Email address for completion summary.", + "fa_icon": "fas fa-envelope", + "help_text": "Set this parameter to your e-mail address to get a summary e-mail with details of the run sent to you when the workflow exits. If set in your user config file (`~/.nextflow/config`) then you don't need to specify this on the command line for every run.", + "pattern": "^([a-zA-Z0-9_\\-\\.]+)@([a-zA-Z0-9_\\-\\.]+)\\.([a-zA-Z]{2,5})$" + } + } + }, + "reference_genome_options": { + "title": "Reference genome options", + "type": "object", + "fa_icon": "fas fa-dna", + "description": "Options for the reference genome indices used to align reads.", + "properties": { + "genome": { + "type": "string", + "description": "Name of iGenomes reference.", + "fa_icon": "fas fa-book", + "help_text": "If using a reference genome configured in the pipeline using iGenomes, use this parameter to give the ID for the reference. This is then used to build the full paths for all required reference genome files e.g. `--genome GRCh38`.\n\nSee the [nf-core website docs](https://nf-co.re/usage/reference_genomes) for more details." + }, + "fasta": { + "type": "string", + "fa_icon": "fas fa-font", + "description": "Path to FASTA genome file.", + "help_text": "If you have no genome reference available, the pipeline can build one using a FASTA file. This requires additional time and resources, so it's better to use a pre-built index if possible." + }, + "igenomes_base": { + "type": "string", + "description": "Directory / URL base for iGenomes references.", + "default": "s3://ngi-igenomes/igenomes/", + "fa_icon": "fas fa-cloud-download-alt", + "hidden": true + }, + "igenomes_ignore": { + "type": "boolean", + "description": "Do not load the iGenomes reference config.", + "fa_icon": "fas fa-ban", + "hidden": true, + "help_text": "Do not load `igenomes.config` when running the pipeline. You may choose this option if you observe clashes between custom parameters and those supplied in `igenomes.config`." + }, + "bwt2_index": { + "type": "string", + "description": "Full path to the directory containing the Bowtie2 index, including the base name, e.g.
`/path/to/index/base`.", + "fa_icon": "far fa-file-alt" + }, + "chromosome_size": { + "type": "string", + "description": "Full path to file specifying chromosome sizes (tab separated, with chromosome name and size).", + "fa_icon": "far fa-file-alt", + "help_text": "If not specified, the pipeline will build this file from the reference genome file" + }, + "restriction_fragments": { + "type": "string", + "description": "Full path to restriction fragment (bed) file.", + "fa_icon": "far fa-file-alt", + "help_text": "This file depends on the Hi-C protocol and digestion strategy. If not provided, the pipeline will build it using the --restriction_site option" + }, + "save_reference": { + "type": "boolean", + "description": "If generated by the pipeline, save the annotation and indexes in the results directory.", + "help_text": "Use this parameter to save all annotations to your results folder. These can then be used for future pipeline runs, reducing processing times.", + "fa_icon": "fas fa-save" + } + } + }, + "data_processing_options": { + "title": "Data processing", + "type": "object", + "description": "Parameters for Hi-C data processing", + "default": "", + "fa_icon": "fas fa-bahai", + "properties": { + "dnase": { + "type": "boolean", + "description": "For Hi-C protocols which are not based on enzyme digestion, such as DNase Hi-C" + }, + "restriction_site": { + "type": "string", + "default": "'A^AGCTT'", + "description": "Restriction motifs used during digestion. Several motifs (comma separated) can be provided." + }, + "ligation_site": { + "type": "string", + "default": "'AAGCTAGCTT'", + "description": "Expected motif after DNA ligation. Several motifs (comma separated) can be provided." + }, + "rm_dup": { + "type": "boolean", + "description": "Remove duplicates", + "default": true + }, + "rm_multi": { + "type": "boolean", + "description": "Remove multi-mapped reads", + "default": true + }, + "rm_singleton": { + "type": "boolean", + "description": "Remove singletons", + "default": true + }, + "min_mapq": { + "type": "integer", + "default": 10, + "description": "Keep aligned reads with a minimum quality value" + }, + "bwt2_opts_end2end": { + "type": "string", + "default": "'--very-sensitive -L 30 --score-min L,-0.6,-0.2 --end-to-end --reorder'", + "description": "Options for end-to-end bowtie2 mapping" + }, + "bwt2_opts_trimmed": { + "type": "string", + "default": "'--very-sensitive -L 20 --score-min L,-0.6,-0.2 --end-to-end --reorder'", + "description": "Options for trimmed reads mapping" + }, + "save_interaction_bam": { + "type": "boolean", + "description": "Save a BAM file where all reads are flagged by their interaction classes" + }, + "save_aligned_intermediates": { + "type": "boolean", + "description": "Save all BAM files during two-step mapping" + } + } + }, + "contacts_calling_options": { + "title": "Contacts calling", + "type": "object", + "description": "Options to call significant interactions", + "default": "", + "fa_icon": "fas fa-signature", + "properties": { + "min_cis_dist": { + "type": "string", + "default": "undefined", + "description": "Minimum distance between loci to consider.
Useful for --dnase mode to remove spurious ligation products" + }, + "max_insert_size": { + "type": "string", + "default": "undefined", + "description": "Maximum fragment size to consider" + }, + "min_insert_size": { + "type": "string", + "default": "undefined", + "description": "Minimum fragment size to consider" + }, + "max_restriction_fragment_size": { + "type": "string", + "default": "undefined", + "description": "Maximum restriction fragment size to consider" + }, + "min_restriction_fragment_size": { + "type": "string", + "default": "undefined", + "description": "Minimum restriction fragment size to consider" + } + } + }, + "contact_maps_options": { + "title": "Contact maps", + "type": "object", + "description": "Options to build Hi-C contact maps", + "default": "", + "fa_icon": "fas fa-chess-board", + "properties": { + "bin_size": { + "type": "string", + "default": "'1000000,500000'", + "description": "Resolution to build the maps (comma separated)" + }, + "ice_filer_low_count_perc": { + "type": "number", + "default": 0.02, + "description": "Filter low count rows before normalization" + }, + "ice_filer_high_count_perc": { + "type": "integer", + "default": 0, + "description": "Filter high count rows before normalization" + }, + "ice_eps": { + "type": "number", + "default": 0.1, + "description": "Threshold for ICE convergence" + }, + "ice_max_iter": { + "type": "integer", + "default": 100, + "description": "Maximum number of iterations for ICE normalization" + } + } + }, + "skip_options": { + "title": "Skip options", + "type": "object", + "description": "Skip some steps of the pipeline", + "default": "", + "fa_icon": "fas fa-random", + "properties": { + "skip_maps": { + "type": "boolean", + "description": "Do not build contact maps" + }, + "skip_ice": { + "type": "boolean", + "description": "Do not normalize contact maps" + }, + "skip_cool": { + "type": "boolean", + "description": "Do not generate cooler files" + }, + "skip_multiqc": { + "type": "boolean", + "description": "Do not generate MultiQC report" + } + } + }, + "generic_options": { + "title": "Generic options", + "type": "object", + "fa_icon": "fas fa-file-import", + "description": "Less common options for the pipeline, typically set in a config file.", + "help_text": "These options are common to all nf-core pipelines and allow you to customise some of the core preferences for how the pipeline runs.\n\nTypically these options would be set in a Nextflow config file loaded for all pipeline runs, such as `~/.nextflow/config`.", + "properties": { + "help": { + "type": "boolean", + "description": "Display help text.", + "hidden": true, + "fa_icon": "fas fa-question-circle" + }, + "publish_dir_mode": { + "type": "string", + "default": "copy", + "hidden": true, + "description": "Method used to save pipeline results to output directory.", + "help_text": "The Nextflow `publishDir` option specifies which intermediate files should be saved to the output directory. This option tells the pipeline what method should be used to move these files. See [Nextflow docs](https://www.nextflow.io/docs/latest/process.html#publishdir) for details.", + "fa_icon": "fas fa-copy", + "enum": [ + "symlink", + "rellink", + "link", + "copy", + "copyNoFollow", + "move" + ] + }, + "name": { + "type": "string", + "description": "Workflow name.", + "fa_icon": "fas fa-fingerprint", + "hidden": true, + "help_text": "A custom name for the pipeline run.
Unlike the core Nextflow `-name` option with one hyphen, this parameter can be reused multiple times, for example if using `-resume`. Passed through to steps such as MultiQC and used for things like report filenames and titles." + }, + "email_on_fail": { + "type": "string", + "description": "Email address for completion summary, only when pipeline fails.", + "fa_icon": "fas fa-exclamation-triangle", + "pattern": "^([a-zA-Z0-9_\\-\\.]+)@([a-zA-Z0-9_\\-\\.]+)\\.([a-zA-Z]{2,5})$", + "hidden": true, + "help_text": "This works exactly as with `--email`, except emails are only sent if the workflow is not successful." + }, + "plaintext_email": { + "type": "boolean", + "description": "Send plain-text email instead of HTML.", + "fa_icon": "fas fa-remove-format", + "hidden": true, + "help_text": "Set to receive plain-text e-mails instead of HTML formatted." + }, + "max_multiqc_email_size": { + "type": "string", + "description": "File size limit when attaching MultiQC reports to summary emails.", + "default": "25.MB", + "fa_icon": "fas fa-file-upload", + "hidden": true, + "help_text": "If the file generated by the pipeline exceeds the threshold, it will not be attached." + }, + "monochrome_logs": { + "type": "boolean", + "description": "Do not use coloured log outputs.", + "fa_icon": "fas fa-palette", + "hidden": true, + "help_text": "Set to disable colourful command line output and live life in monochrome." + }, + "multiqc_config": { + "type": "string", + "description": "Custom config file to supply to MultiQC.", + "fa_icon": "fas fa-cog", + "hidden": true + }, + "tracedir": { + "type": "string", + "description": "Directory to keep pipeline Nextflow logs and reports.", + "default": "${params.outdir}/pipeline_info", + "fa_icon": "fas fa-cogs", + "hidden": true + } + } + }, + "max_job_request_options": { + "title": "Max job request options", + "type": "object", + "fa_icon": "fab fa-acquisitions-incorporated", + "description": "Set the top limit for requested resources for any single job.", + "help_text": "If you are running on a smaller system, a pipeline step requesting more resources than are available may cause Nextflow to stop the run with an error. These options allow you to cap the maximum resources requested by any single job so that the pipeline will run on your system.\n\nNote that you cannot _increase_ the resources requested by any job using these options. For that you will need your own configuration file. See [the nf-core website](https://nf-co.re/usage/configuration) for details.", + "properties": { + "max_cpus": { + "type": "integer", + "description": "Maximum number of CPUs that can be requested for any single job.", + "default": 16, + "fa_icon": "fas fa-microchip", + "hidden": true, + "help_text": "Use to set an upper-limit for the CPU requirement for each process. Should be an integer e.g. `--max_cpus 1`" + }, + "max_memory": { + "type": "string", + "description": "Maximum amount of memory that can be requested for any single job.", + "default": "128.GB", + "fa_icon": "fas fa-memory", + "hidden": true, + "help_text": "Use to set an upper-limit for the memory requirement for each process. Should be a string in the format integer-unit e.g. `--max_memory '8.GB'`" + }, + "max_time": { + "type": "string", + "description": "Maximum amount of time that can be requested for any single job.", + "default": "240.h", + "fa_icon": "far fa-clock", + "hidden": true, + "help_text": "Use to set an upper-limit for the time requirement for each process. Should be a string in the format integer-unit e.g.
`--max_time '2.h'`" + } + } + }, + "institutional_config_options": { + "title": "Institutional config options", + "type": "object", + "fa_icon": "fas fa-university", + "description": "Parameters used to describe centralised config profiles. These should not be edited.", + "help_text": "The centralised nf-core configuration profiles use a handful of pipeline parameters to describe themselves. This information is then printed to the Nextflow log when you run a pipeline. You should not need to change these values when you run a pipeline.", + "properties": { + "custom_config_version": { + "type": "string", + "description": "Git commit id for Institutional configs.", + "default": "master", + "hidden": true, + "fa_icon": "fas fa-users-cog", + "help_text": "Provide git commit id for custom Institutional configs hosted at `nf-core/configs`. This was implemented for reproducibility purposes. Default: `master`.\n\n```bash\n## Download and use config file with following git commit id\n--custom_config_version d52db660777c4bf36546ddb188ec530c3ada1b96\n```" + }, + "custom_config_base": { + "type": "string", + "description": "Base directory for Institutional configs.", + "default": "https://raw.githubusercontent.com/nf-core/configs/master", + "hidden": true, + "help_text": "If you're running offline, nextflow will not be able to fetch the institutional config files from the internet. If you don't need them, then this is not a problem. If you do need them, you should download the files from the repo and tell nextflow where to find them with the `custom_config_base` option. For example:\n\n```bash\n## Download and unzip the config files\ncd /path/to/my/configs\nwget https://github.com/nf-core/configs/archive/master.zip\nunzip master.zip\n\n## Run the pipeline\ncd /path/to/my/data\nnextflow run /path/to/pipeline/ --custom_config_base /path/to/my/configs/configs-master/\n```\n\n> Note that the nf-core/tools helper package has a `download` command to download all required pipeline files + singularity containers + institutional configs in one go for you, to make this process easier.", + "fa_icon": "fas fa-users-cog" + }, + "hostnames": { + "type": "string", + "description": "Institutional configs hostname.", + "hidden": true, + "fa_icon": "fas fa-users-cog" + }, + "config_profile_description": { + "type": "string", + "description": "Institutional config description.", + "hidden": true, + "fa_icon": "fas fa-users-cog" + }, + "config_profile_contact": { + "type": "string", + "description": "Institutional config contact information.", + "hidden": true, + "fa_icon": "fas fa-users-cog" + }, + "config_profile_url": { + "type": "string", + "description": "Institutional config URL link.", + "hidden": true, + "fa_icon": "fas fa-users-cog" + } + } + } + }, + "allOf": [ + { + "$ref": "#/definitions/input_output_options" + }, + { + "$ref": "#/definitions/reference_genome_options" + }, + { + "$ref": "#/definitions/data_processing_options" + }, + { + "$ref": "#/definitions/contacts_calling_options" + }, + { + "$ref": "#/definitions/contact_maps_options" + }, + { + "$ref": "#/definitions/skip_options" + }, + { + "$ref": "#/definitions/generic_options" + }, + { + "$ref": "#/definitions/max_job_request_options" + }, + { + "$ref": "#/definitions/institutional_config_options" + } + ] +}
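Finally, the `max_cpus`, `max_memory` and `max_time` options declared in the schema above cap what any single job may request; they only ever lower requests, never raise them. A hypothetical snippet for a small workstation, again supplied via `-c`; the values are illustrative only:

```nextflow
// Hypothetical resource caps for a small workstation: no single process will
// request more than this, whatever its label asks for in conf/base.config.
params {
    max_cpus   = 4
    max_memory = 16.GB
    max_time   = 12.h
}
```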