Unverified Commit 31cfa722 authored by Nicolas Servant, committed by GitHub

Merge pull request #23 from nservant/template_merge

Template merge
parents e2f846bb 9f24674d
Showing 355 additions and 342 deletions
......@@ -9,9 +9,7 @@ Please use the pre-filled template to save time.
However, don't be put off by this template - other more general issues and suggestions are welcome!
Contributions to the code are even more welcome ;)
> If you need help using or modifying nf-core/hic then the best place to ask is on the nf-core Slack [#hic](https://nfcore.slack.com/channels/hic) channel ([join our Slack here](https://nf-co.re/join/slack)).
## Contribution workflow
......@@ -20,8 +18,9 @@ If you'd like to write some code for nf-core/hic, the standard workflow is as fo
1. Check that there isn't already an issue about your idea in the [nf-core/hic issues](https://github.com/nf-core/hic/issues) to avoid duplicating work
* If there isn't one already, please create one so that others know you're working on this
2. [Fork](https://help.github.com/en/github/getting-started-with-github/fork-a-repo) the [nf-core/hic repository](https://github.com/nf-core/hic) to your GitHub account
3. Make the necessary changes / additions within your forked repository following [Pipeline conventions](#pipeline-contribution-conventions)
4. Use `nf-core schema build .` and add any new parameters to the pipeline JSON schema (requires [nf-core tools](https://github.com/nf-core/tools) >= 1.10).
5. Submit a Pull Request against the `dev` branch and wait for the code to be reviewed and merged
If you're not used to this workflow with git, you can start with some [docs from GitHub](https://help.github.com/en/github/collaborating-with-issues-and-pull-requests) or even their [excellent `git` resources](https://try.github.io/).
......@@ -32,14 +31,14 @@ Typically, pull-requests are only fully reviewed when these tests are passing, t
There are typically two types of tests that run:
### Lint tests
`nf-core` has a [set of guidelines](https://nf-co.re/developers/guidelines) which all pipelines must adhere to.
To enforce these and ensure that all pipelines stay in sync, we have developed a helper tool which runs checks on the pipeline code. This is in the [nf-core/tools repository](https://github.com/nf-core/tools) and once installed can be run locally with the `nf-core lint <pipeline-directory>` command.
If any failures or warnings are encountered, please follow the listed URL for more documentation.
### Pipeline tests
Each `nf-core` pipeline should be set up with a minimal set of test-data.
`GitHub Actions` then runs the pipeline on this data to ensure that it exits successfully.
......@@ -55,8 +54,75 @@ These tests are run both with the latest available version of `Nextflow` and als
* A PR should be made on `master` from `patch` to directly address this particular bug.
## Getting help
For further information/help, please consult the [nf-core/hic documentation](https://nf-co.re/hic/usage) and don't hesitate to get in touch on the nf-core Slack [#hic](https://nfcore.slack.com/channels/hic) channel ([join our Slack here](https://nf-co.re/join/slack)).
## Pipeline contribution conventions
To make the nf-core/hic code and processing logic more understandable for new contributors and to ensure quality, we semi-standardise the way the code and other contributions are written.
### Adding a new step
If you wish to contribute a new step, please use the following coding standards:
1. Define the corresponding input channel into your new process from the expected previous process channel
2. Write the process block (see below).
3. Define the output channel if needed (see below).
4. Add any new flags/options to `nextflow.config` with a default (see below).
5. Add any new flags/options to `nextflow_schema.json` with help text (with `nf-core schema build .`)
6. Add any new flags/options to the help message (for integer/text parameters, print the corresponding default value from `nextflow.config`).
7. Add sanity checks for all relevant parameters.
8. Add any new software to the `scrape_software_versions.py` script in `bin/` and the version command to the `get_software_versions` process in `main.nf`.
9. Do local tests that the new code works properly and as expected.
10. Add a new test command in `.github/workflows/ci.yml`.
11. If applicable, add a [MultiQC](https://multiqc.info/) module.
12. Update the MultiQC config `assets/multiqc_config.yaml` so that relevant suffixes, name clean-up, General Statistics Table column order, and module figures are in the right order.
13. Optional: Add any descriptions of MultiQC report sections and output files to `docs/output.md`.
### Default values
Parameters should be initialised / defined with default values in `nextflow.config` under the `params` scope.
Once there, use `nf-core schema build .` to add to `nextflow_schema.json`.
### Default processes resource requirements
Sensible defaults for process resource requirements (CPUs / memory / time) for a process should be defined in `conf/base.config`. These should generally be specified generically with `withLabel:` selectors so they can be shared across multiple processes/steps of the pipeline. An nf-core standard set of labels that should be followed where possible can be seen in the [nf-core pipeline template](https://github.com/nf-core/tools/blob/master/nf_core/pipeline-template/%7B%7Bcookiecutter.name_noslash%7D%7D/conf/base.config), which has the default process as a single-core process, and then different levels of multi-core configurations for increasingly large memory requirements defined with standardised labels.
The process resources can be passed on to the tool dynamically within the process with the `${task.cpus}` and `${task.memory}` variables in the `script:` block.
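To illustrate how the `check_max( 1 * task.attempt, 'cpus' )` pattern in `conf/base.config` behaves, here is a minimal Python sketch (not Nextflow code): each retry multiplies the base request by the attempt number, and the result is capped at a configured maximum. `MAX_CPUS`, `MAX_MEMORY_GB` and `resources_for_attempt` are hypothetical stand-ins for `params.max_cpus`, `params.max_memory` and Nextflow's own evaluation of the closures.

```python
# Sketch of the nf-core check_max() idea: scale by retry attempt, then cap.
# MAX_CPUS and MAX_MEMORY_GB are hypothetical stand-ins for
# params.max_cpus and params.max_memory.
MAX_CPUS = 16
MAX_MEMORY_GB = 128


def check_max(requested, maximum):
    """Return the requested amount, capped at the configured maximum."""
    return min(requested, maximum)


def resources_for_attempt(attempt, base_cpus=1, base_memory_gb=7):
    """Resources for a 1-based retry attempt, like `1 * task.attempt`."""
    return {
        "cpus": check_max(base_cpus * attempt, MAX_CPUS),
        "memory_gb": check_max(base_memory_gb * attempt, MAX_MEMORY_GB),
    }


if __name__ == "__main__":
    for attempt in (1, 2, 3):
        print(attempt, resources_for_attempt(attempt))
```

Inside the `script:` block, it is this capped value that reaches the tool, for example as a thread count through `${task.cpus}`.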
### Naming schemes
Please use the following naming schemes, to make it easy to understand what is going where.
* initial process channel: `ch_output_from_<process>`
* intermediate and terminal channels: `ch_<previousprocess>_for_<nextprocess>`
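A small Python sketch of the naming scheme, with a regex that could flag non-conforming channel names during review. The helper names and the process names in the usage line are hypothetical.

```python
import re

# Matches the two documented forms: ch_output_from_<process> and
# ch_<previousprocess>_for_<nextprocess>.
CHANNEL_PATTERN = re.compile(r"^ch_(output_from_\w+|\w+_for_\w+)$")


def channel_name(previous_process, next_process=None):
    """Build a channel name that follows the naming scheme."""
    if next_process is None:
        return "ch_output_from_{}".format(previous_process)
    return "ch_{}_for_{}".format(previous_process, next_process)


def follows_convention(name):
    """True if `name` matches one of the two documented forms."""
    return CHANNEL_PATTERN.match(name) is not None


print(channel_name("bowtie2_end_to_end", "trim_reads"))
```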
### Nextflow version bumping
If you are using a new feature from core Nextflow, you may bump the minimum required version of Nextflow in the pipeline with: `nf-core bump-version --nextflow . [min-nf-version]`
### Software version reporting
If you add a new tool to the pipeline, please ensure you add the tool's version information to the `get_software_versions` process.
Add something like the following to the script block of that process:
```bash
<YOUR_TOOL> --version > v_<YOUR_TOOL>.txt 2>&1 || true
```
or
```bash
<YOUR_TOOL> --help 2>&1 | head -n 1 > v_<YOUR_TOOL>.txt || true
```
You then need to edit the script `bin/scrape_software_versions.py` to:
1. Add a Python regex for your tool's `--version` output (as stored in the `v_<YOUR_TOOL>.txt` file), to ensure the version is reported as a `v` followed by the version number, e.g. `v2.1.1`
2. Add an HTML entry to the `OrderedDict` for formatting in MultiQC.
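A minimal sketch of the two steps above, loosely modelled on the structure of `bin/scrape_software_versions.py`. The tool `mytool`, its version file `v_mytool.txt` and its regex are hypothetical; to keep the example self-contained, file contents are passed in as a dict rather than read from disk.

```python
import re
from collections import OrderedDict

# Tool name -> (version file written by the process, regex capturing the version).
# "mytool" and its pattern are hypothetical examples of a new entry.
regexes = OrderedDict([
    ("nf-core/hic", ("v_pipeline.txt", r"(\S+)")),
    ("mytool", ("v_mytool.txt", r"mytool version v?(\S+)")),
])


def scrape_versions(file_contents):
    """Return {tool: 'v<version>'}, or False where no version was found.

    `file_contents` maps file names to their text (a stand-in for reading files).
    """
    results = OrderedDict((name, False) for name in regexes)
    for name, (filename, pattern) in regexes.items():
        match = re.search(pattern, file_contents.get(filename, ""))
        if match:
            results[name] = "v{}".format(match.group(1))
    return results


def to_html(results):
    """Render found versions as a definition list for the MultiQC report."""
    rows = ["<dl class=\"dl-horizontal\">"]
    for name, version in results.items():
        if version:
            rows.append("    <dt>{}</dt><dd><samp>{}</samp></dd>".format(name, version))
    rows.append("</dl>")
    return "\n".join(rows)
```

The optional `v?` in the hypothetical pattern keeps the reported string as a single `v` plus the number, whether or not the tool prints its own `v` prefix.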
### Images and figures
For overview images and other documents we follow the nf-core [style guidelines and examples](https://nf-co.re/developers/design_guidelines).
......@@ -13,6 +13,13 @@ Thanks for telling us about a problem with the pipeline.
Please delete this text and anything that's not relevant from the template below:
-->
## Check Documentation
I have checked the following places for your error:
- [ ] [nf-core website: troubleshooting](https://nf-co.re/usage/troubleshooting)
- [ ] [nf-core/hic pipeline documentation](https://nf-co.re/hic/usage)
## Description of the bug
<!-- A clear and concise description of what the bug is. -->
......@@ -28,6 +35,13 @@ Steps to reproduce the behaviour:
<!-- A clear and concise description of what you expected to happen. -->
## Log files
Have you provided the following extra information/files:
- [ ] The command used to run the pipeline
- [ ] The `.nextflow.log` file <!-- this is a hidden file in the directory where you launched the pipeline -->
## System
- Hardware: <!-- [e.g. HPC, Desktop, Cloud...] -->
......
......@@ -11,7 +11,6 @@ Hi there!
Thanks for suggesting a new feature for the pipeline!
Please delete this text and anything that's not relevant from the template below:
-->
## Is your feature request related to a problem? Please describe
......
......@@ -13,9 +13,15 @@ Learn more about contributing: [CONTRIBUTING.md](https://github.com/nf-core/hic/
## PR checklist
- [ ] This comment contains a description of changes (with reason).
- [ ] If you've fixed a bug or added code that should be tested, add tests!
- [ ] If you've added a new tool - add to the software_versions process and a regex to `scrape_software_versions.py`
- [ ] If you've added a new tool - have you followed the pipeline conventions in the [contribution docs](https://github.com/nf-core/hic/tree/master/.github/CONTRIBUTING.md)
- [ ] If necessary, also make a PR on the nf-core/hic _branch_ on the [nf-core/test-datasets](https://github.com/nf-core/test-datasets) repository.
- [ ] Make sure your code lints (`nf-core lint .`).
- [ ] Ensure the test suite passes (`nextflow run . -profile test,docker`).
- [ ] Usage Documentation in `docs/usage.md` is updated.
- [ ] Output Documentation in `docs/output.md` is updated.
- [ ] `CHANGELOG.md` is updated.
- [ ] `README.md` is updated (including new tool citations and authors/contributors).
# Markdownlint configuration file
default: true
line-length: false
no-duplicate-header:
    siblings_only: true
no-inline-html:
    allowed_elements:
        - img
        - p
        - kbd
        - details
        - summary
......@@ -33,8 +33,10 @@ jobs:
nf-core:
runs-on: ubuntu-latest
steps:
- name: Check out pipeline code
uses: actions/checkout@v2
- name: Install Nextflow
env:
CAPSULE_LOG: none
......@@ -72,5 +74,3 @@ jobs:
lint_log.txt
lint_results.md
PR_number.txt
......@@ -21,7 +21,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
* Fix bug in `--bin_size` parameter (#85)
* `--min_mapq` is ignored if `--keep_multi` is used
### `Deprecated`
* `--rm_dup` and `--rm_multi` are replaced by `--keep_dup` and `--keep_multi`
......
......@@ -54,13 +54,7 @@ project may be further defined and clarified by project maintainers.
## Enforcement
Instances of abusive, harassing, or otherwise unacceptable behavior may be reported by contacting the project team on [Slack](https://nf-co.re/join/slack). The project team will review and investigate all complaints, and will respond in a way that it deems appropriate to the circumstances. The project team is obligated to maintain confidentiality with regard to the reporter of an incident. Further details of specific enforcement policies may be posted separately.
Project maintainers who do not follow or enforce the Code of Conduct in good
faith may face temporary or permanent repercussions as determined by other
......@@ -68,9 +62,7 @@ members of the project's leadership.
## Attribution
This Code of Conduct is adapted from the [Contributor Covenant][homepage], version 1.4, available at [https://www.contributor-covenant.org/version/1/4/code-of-conduct/][version]
[homepage]: https://contributor-covenant.org
[version]: https://www.contributor-covenant.org/version/1/4/code-of-conduct/
FROM nfcore/base:1.12.1
LABEL authors="Nicolas Servant" \
description="Docker image containing all software requirements for the nf-core/hic pipeline"
## Install gcc for pip iced install
RUN apt-get update && apt-get install -y gcc g++ && apt-get clean -y
# Install the conda environment
COPY environment.yml /
RUN conda env create --quiet -f /environment.yml && conda clean -a
......
# ![nf-core/hic](docs/images/nf-core-hic_logo.png)
**Analysis of Chromosome Conformation Capture data (Hi-C)**.
......@@ -70,8 +70,6 @@ sites ([`bowtie2`](http://bowtie-bio.sourceforge.net/bowtie2/index.shtml))
nextflow run nf-core/hic -profile <docker/singularity/podman/conda/institute> --input '*_R{1,2}.fastq.gz' --genome GRCh37
```
See [usage docs](https://nf-co.re/hic/usage) for all of the available options when running the pipeline.
## Documentation
The nf-core/hic pipeline comes with documentation about the pipeline: [usage](https://nf-co.re/hic/usage) and [output](https://nf-co.re/hic/output).
......@@ -87,9 +85,7 @@ nf-core/hic was originally written by Nicolas Servant.
If you would like to contribute to this pipeline, please see the [contributing guidelines](.github/CONTRIBUTING.md).
For further information or help, don't hesitate to get in touch on the [Slack `#hic` channel](https://nfcore.slack.com/channels/hic) (you can join with [this invite](https://nf-co.re/join/slack)).
## Citation
......@@ -100,8 +96,15 @@ You can cite the `nf-core` publication as follows:
> **The nf-core framework for community-curated bioinformatics pipelines.**
>
> Philip Ewels, Alexander Peltzer, Sven Fillinger, Harshil Patel, Johannes Alneberg, Andreas Wilm, Maxime Ulysse Garcia, Paolo Di Tommaso & Sven Nahnsen.
>
> _Nat Biotechnol._ 2020 Feb 13. doi: [10.1038/s41587-020-0439-x](https://dx.doi.org/10.1038/s41587-020-0439-x).
> ReadCube: [Full Access Link](https://rdcu.be/b1GjZ)
In addition, references of tools and data used in this pipeline are as follows:
> **HiC-Pro: An optimized and flexible pipeline for Hi-C processing.**
>
> Nicolas Servant, Nelle Varoquaux, Bryan R. Lajoie, Eric Viara, Chongjian Chen, Jean-Philippe Vert, Job Dekker, Edith Heard, Emmanuel Barillot.
>
> Genome Biology 2015, 16:259 doi: [10.1186/s13059-015-0831-x](https://dx.doi.org/10.1186/s13059-015-0831-x)
......@@ -83,7 +83,6 @@ def sam_flag(read1, read2, hr1, hr2):
else:
r2_chrom="*"
##Relevant bitwise flags (flag in an 11-bit binary number)
##1 The read is one of a pair
##2 The alignment is one end of a proper paired-end alignment
......@@ -112,7 +111,6 @@ def sam_flag(read1, read2, hr1, hr2):
f2 = f2 | 0x1
f2 = f2 | 0x2
##Indicate if the pair is on the reverse strand
if f1 & 0x10:
f2 = f2 | 0x20
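For reference, the bit values used in `sam_flag()` come from the SAM format specification. The sketch below combines two single-end flags into paired-end flags; it assumes both mates are mapped and treats every pair as proper, and it illustrates the flag arithmetic only, not the script's full logic (mate-unmapped handling, for example, is omitted).

```python
# Standard SAM FLAG bits (SAM/BAM format specification).
PAIRED = 0x1          # the read is one of a pair
PROPER_PAIR = 0x2     # each segment properly aligned
REVERSE = 0x10        # SEQ is reverse complemented
MATE_REVERSE = 0x20   # SEQ of the mate is reverse complemented
FIRST_IN_PAIR = 0x40
SECOND_IN_PAIR = 0x80


def pair_flags(flag1, flag2):
    """Combine two single-end flags into paired-end flags.

    Assumes both mates are mapped and the pair is reported as proper.
    """
    f1 = flag1 | PAIRED | PROPER_PAIR | FIRST_IN_PAIR
    f2 = flag2 | PAIRED | PROPER_PAIR | SECOND_IN_PAIR
    # Each read records its mate's strand in bit 0x20.
    if flag2 & REVERSE:
        f1 |= MATE_REVERSE
    if flag1 & REVERSE:
        f2 |= MATE_REVERSE
    return f1, f2


print(pair_flags(0x0, 0x10))  # (99, 147)
```

With these rules a forward/reverse pair gets the familiar flags 99 and 147, and a reverse/forward pair gets 83 and 163.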
......@@ -215,24 +213,19 @@ if __name__ == "__main__":
## Loop on all reads
if verbose:
print("## Merging forward and reverse tags ...")
with pysam.Samfile(R1file, "rb") as hr1, pysam.Samfile(R2file, "rb") as hr2:
if output == "-":
outfile = pysam.AlignmentFile(output, "w", template=hr1)
else:
outfile = pysam.AlignmentFile(output, "wb", template=hr1)
for r1, r2 in zip(hr1.fetch(until_eof=True), hr2.fetch(until_eof=True)):
reads_counter +=1
if (reads_counter % 1000000 == 0 and verbose):
print("##", reads_counter)
if get_read_name(r1) == get_read_name(r2):
## both unmapped
if r1.is_unmapped == True and r2.is_unmapped == True:
unmapped_pairs_counter += 1
......@@ -252,7 +245,8 @@ if __name__ == "__main__":
multi_pairs_counter += 1
if report_multi == False:
continue
## One mate mapped, the other is not
else:
singleton_counter += 1
if report_single == False:
......@@ -285,10 +279,6 @@ if __name__ == "__main__":
tot_pairs_counter += 1
(r1, r2) = sam_flag(r1,r2, hr1, hr2)
## Write output
outfile.write(r1)
outfile.write(r2)
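The merging loop above classifies each mate pair before writing it. Below is a simplified Python sketch of that classification, using a hypothetical `Mate` record instead of `pysam` alignments. The `is_multi` field is an assumption standing in for however the script detects multi-mapped reads, this sketch always drops pairs where both mates are unmapped, and raising on out-of-sync names is a choice of the sketch, not necessarily of the script.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Mate:
    """Hypothetical stand-in for one pysam alignment."""
    name: str
    is_unmapped: bool
    is_multi: bool = False  # assumption: multi-mapping already detected upstream


def classify(r1, r2):
    """Label a mate pair the way the counters above do."""
    if r1.name != r2.name:
        raise ValueError("mates out of sync: {} vs {}".format(r1.name, r2.name))
    if r1.is_unmapped and r2.is_unmapped:
        return "unmapped"
    if r1.is_unmapped or r2.is_unmapped:
        return "singleton"
    if r1.is_multi or r2.is_multi:
        return "multi"
    return "unique"


def merge_pairs(mates1, mates2, report_single=False, report_multi=False):
    """Return the pairs to write and a Counter of categories."""
    counts = Counter()
    kept = []
    for r1, r2 in zip(mates1, mates2):
        category = classify(r1, r2)
        counts[category] += 1
        if category == "unmapped":
            continue
        if category == "singleton" and not report_single:
            continue
        if category == "multi" and not report_multi:
            continue
        kept.append((r1, r2))
    return kept, counts


def output_mode(path):
    """pysam write mode: SAM text for stdout ("-"), BAM otherwise."""
    return "w" if path == "-" else "wb"
```

`output_mode` mirrors the `"w"`/`"wb"` choice made when opening `outfile`: SAM text when writing to stdout, BAM otherwise.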
......
......@@ -36,11 +36,6 @@ for k in list(results):
if not results[k]:
del results[k]
# Dump to YAML
print(
"""
......@@ -61,4 +56,3 @@ print(" </dl>")
with open("software_versions.csv", "w") as f:
for k, v in results.items():
f.write("{}\t{}\n".format(k, v))
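Iterating over `list(results)` rather than over `results` itself matters here: in Python 3, deleting keys from a dict while iterating over it directly raises `RuntimeError`. A minimal sketch of the filter and of the tab-separated dump written to `software_versions.csv` (the function names are hypothetical):

```python
from collections import OrderedDict


def drop_missing(results):
    """Remove tools whose version was not found (falsy value), in place."""
    for k in list(results):  # iterate over a snapshot of the keys
        if not results[k]:
            del results[k]
    return results


def to_tsv(results):
    """Tab-separated 'tool<TAB>version' lines, as written to software_versions.csv."""
    return "".join("{}\t{}\n".format(k, v) for k, v in results.items())


def unsafe_drop_missing(results):
    """Same filter without list(): returns the RuntimeError message if raised."""
    try:
        for k in results:
            if not results[k]:
                del results[k]
    except RuntimeError as err:
        return str(err)
    return None


print(unsafe_drop_missing({"mytool": False, "nf-core/hic": "v1.3.0"}))
# dictionary changed size during iteration
```

Note that despite its `.csv` extension, the file written above is tab-separated.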
......@@ -10,7 +10,6 @@
*/
process {
// nf-core: Check the defaults for all processes
cpus = { check_max( 1 * task.attempt, 'cpus' ) }
memory = { check_max( 7.GB * task.attempt, 'memory' ) }
time = { check_max( 4.h * task.attempt, 'time' ) }
......
/*
* -------------------------------------------------
* Nextflow config file for Genomes paths
* -------------------------------------------------
* Defines reference genomes
* Can be used by any config that customises the base
* path using $params.genomes_base / --genomes_base
*/
params {
// Alignment options
bwt2_opts_end2end = '--very-sensitive -L 30 --score-min L,-0.6,-0.2 --end-to-end --reorder'
bwt2_opts_trimmed = '--very-sensitive -L 20 --score-min L,-0.6,-0.2 --end-to-end --reorder'
min_mapq = 10
// Digestion Hi-C
restriction_site = 'A^AGCTT'
ligation_site = 'AAGCTAGCTT'
min_restriction_fragment_size =
max_restriction_fragment_size =
min_insert_size =
max_insert_size =
// Hi-C Processing
min_cis_dist =
rm_singleton = true
rm_multi = true
rm_dup = true
bin_size = '1000000,500000'
ice_max_iter = 100
ice_filer_low_count_perc = 0.02
ice_filer_high_count_perc = 0
ice_eps = 0.1
}
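The `ligation_site` value follows from `restriction_site`: in Hi-C the 5' overhang left by the enzyme is filled in and the blunt ends are ligated, so the junction is the filled-in upstream end followed by the downstream end. A sketch under those assumptions (palindromic site, top-strand cut marked with `^`, 5' overhang or blunt cut); the function is hypothetical and not part of the pipeline.

```python
def ligation_site(restriction_site):
    """Hi-C ligation junction for a site written with '^' at the top-strand cut.

    Assumes a palindromic site whose 5' overhang is filled in before blunt-end
    ligation; 3' overhang enzymes are rejected.
    """
    cut = restriction_site.index("^")
    seq = restriction_site.replace("^", "")
    n = len(seq)
    if cut > n - cut:
        raise ValueError("3' overhang sites are not handled by this sketch")
    # Filled-in end of the upstream fragment, then start of the downstream one.
    return seq[: n - cut] + seq[cut:]


print(ligation_site("A^AGCTT"))  # AAGCTAGCTT
```

For `A^AGCTT` this gives `AAGCTAGCTT`, matching the default `ligation_site` above; for `^GATC` it gives `GATCGATC`.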
......@@ -8,7 +8,6 @@
*/
params {
config_profile_name = 'Hi-C test data from Schalbetter et al. (2017)'
config_profile_description = 'Minimal test dataset to check pipeline function'
......@@ -25,12 +24,12 @@ config_profile_name = 'Hi-C test data from Schalbetter et al. (2017)'
// Annotations
fasta = 'https://github.com/nf-core/test-datasets/raw/hic/reference/W303_SGD_2015_JRIU00000000.fsa'
digestion = 'hindiii'
min_mapq = 10
min_restriction_fragment_size = 100
max_restriction_fragment_size = 100000
min_insert_size = 100
max_insert_size = 600
// Options
skip_cool = true
//hicexplorer does not run
skip_dist_decay = true
}
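The size options in this profile act as bounds on restriction fragment and insert sizes. The hypothetical Python helpers below show how such bounds could be applied. They assume both bounds are inclusive and that an unset value means "no bound", which this page does not confirm.

```python
def within_range(value, minimum=None, maximum=None):
    """Inclusive range check; None means 'no bound' (assumption)."""
    if minimum is not None and value < minimum:
        return False
    if maximum is not None and value > maximum:
        return False
    return True


def filter_sizes(sizes, minimum=None, maximum=None):
    """Split sizes into (kept, discarded) lists."""
    kept = [s for s in sizes if within_range(s, minimum, maximum)]
    discarded = [s for s in sizes if not within_range(s, minimum, maximum)]
    return kept, discarded


# Insert-size bounds taken from the test profile above.
MIN_INSERT_SIZE, MAX_INSERT_SIZE = 100, 600
kept, discarded = filter_sizes([80, 150, 600, 750], MIN_INSERT_SIZE, MAX_INSERT_SIZE)
print(kept, discarded)  # [150, 600] [80, 750]
```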
......@@ -11,6 +11,8 @@ params {
config_profile_name = 'Full test profile'
config_profile_description = 'Full test dataset to check pipeline function'
// TODO nf-core: Specify the paths to your full test data ( on nf-core/test-datasets or directly in repositories, e.g. SRA)
// TODO nf-core: Give any required params for the test so that command line flags are not needed
// Input data for full size test
input_paths = [
['SRR4292758_00', ['https://github.com/nf-core/test-datasets/raw/hic/data/SRR4292758_00_R1.fastq.gz', 'https://github.com/nf-core/test-datasets/raw/hic/data/SRR4292758_00_R2.fastq.gz']]
......
......@@ -3,11 +3,8 @@
The nf-core/hic documentation is split into the following pages:
* [Usage](usage.md)
* An overview of how the pipeline works, how to run it and a description of all of the different command-line flags.
* [Output](output.md)
* An overview of the different results produced by the pipeline and how to interpret them.
You can find a lot more documentation about installing, configuring and running nf-core pipelines on the website: [https://nf-co.re](https://nf-co.re)
......@@ -7,10 +7,7 @@
## Introduction
This document describes the output produced by the pipeline. Most of the plots are taken from the MultiQC report, which summarises results at the end of the pipeline.
The directories listed below will be created in the results directory after the pipeline has finished. All paths are relative to the top-level results directory.
## Pipeline overview
......
......@@ -28,12 +28,7 @@ results # Finished results (configurable, see below)
### Updating the pipeline
When you run the above command, Nextflow automatically pulls the pipeline code from GitHub and stores it as a cached version. When running the pipeline after this, it will always use the cached version if available - even if the pipeline has been updated since. To make sure that you're running the latest version of the pipeline, make sure that you regularly update the cached version of the pipeline:
```bash
nextflow pull nf-core/hic
......@@ -41,17 +36,7 @@ nextflow pull nf-core/hic
### Reproducibility
It's a good idea to specify a pipeline version when running the pipeline on your data. This ensures that a specific version of the pipeline code and software are used when you run your pipeline. If you keep using the same tag, you'll be running the same version of the pipeline, even if there have been changes to the code since.
First, go to the
[nf-core/hic releases page](https://github.com/nf-core/hic/releases) and find
......