Unverified commit 66ca62fb authored by Nicolas Servant, committed by GitHub

Merge pull request #22 from nservant/dev

mv python env to 2.15
parents 73dca880 cbec39f3
Showing with 122 additions and 63 deletions
@@ -6,7 +6,9 @@ We try to manage the required tasks for nf-core/hic using GitHub issues, you pro
However, don't be put off by this template - other more general issues and suggestions are welcome! Contributions to the code are even more welcome ;)
> If you need help using or modifying nf-core/hic then the best place to go is the Gitter chatroom where you can ask us questions directly: https://gitter.im/nf-core/Lobby
> If you need help using or modifying nf-core/hic then the best place to ask is on the pipeline channel on [Slack](https://nf-core-invite.herokuapp.com/).
## Contribution workflow
If you'd like to write some code for nf-core/hic, the standard workflow
@@ -42,4 +44,4 @@ If there are any failures then the automated tests fail.
These tests are run with both the latest available version of Nextflow and the minimum required version stated in the pipeline code.
## Getting help
For further information/help, please consult the [nf-core/hic documentation](https://github.com/nf-core/hic#documentation) and don't hesitate to get in touch on [Gitter](https://gitter.im/nf-core/Lobby)
For further information/help, please consult the [nf-core/hic documentation](https://github.com/nf-core/hic#documentation) and don't hesitate to get in touch on the pipeline channel on [Slack](https://nf-core-invite.herokuapp.com/).
# Markdownlint configuration file
default: true
line-length: false
no-multiple-blanks: 0
blanks-around-headers: false
blanks-around-lists: false
header-increment: false
no-duplicate-header:
siblings_only: true
@@ -4,3 +4,4 @@ data/
results/
.DS_Store
tests/test_data
*.pyc
@@ -13,7 +13,8 @@ before_install:
# Pull the docker image first so the test doesn't wait for this
- docker pull nfcore/hic:dev
# Fake the tag locally so that the pipeline runs properly
- docker tag nfcore/hic:dev nfcore/hic:dev
# Looks weird when this is :dev to :dev, but makes sense when testing code for a release (:dev to :1.0.1)
- docker tag nfcore/hic:dev nfcore/hic:1.0.0
install:
# Install Nextflow
@@ -25,12 +26,17 @@ install:
- pip install nf-core
# Reset
- mkdir ${TRAVIS_BUILD_DIR}/tests && cd ${TRAVIS_BUILD_DIR}/tests
# Install markdownlint-cli
- sudo apt-get install npm && npm install -g markdownlint-cli
env:
- NXF_VER='0.32.0' # Specify a minimum NF version that should be tested and work
- NXF_VER='' # Plus: get the latest NF version and check that it works
script:
# Lint the pipeline code
- nf-core lint ${TRAVIS_BUILD_DIR}
# Lint the documentation
- markdownlint ${TRAVIS_BUILD_DIR} -c ${TRAVIS_BUILD_DIR}/.github/markdownlint.yml
# Run the pipeline with the test profile
- nextflow run ${TRAVIS_BUILD_DIR} -profile test,docker
@@ -2,14 +2,15 @@
## v1.0dev - 2019-04-09
First version of the nf-core-hic pipeline, a Nextflow implementation of the HiC-Pro pipeline [https://github.com/nservant/HiC-Pro].
Note that not all HiC-Pro functionalities are implemented yet. The current version is designed for protocols based on restriction enzyme digestion.
In summary, this version allows:
* Automatic detection and generation of annotation files based on iGenomes if not provided.
* Two-step alignment of raw sequencing reads
* Read filtering and detection of valid interaction products
* Generation of raw contact matrices for a set of resolutions
* Normalization of the contact maps using the ICE algorithm
* Generation of cooler files for visualization on higlass [https://higlass.io/]
* Quality report based on the HiC-Pro MultiQC module
First version of the nf-core-hic pipeline, a Nextflow implementation of the [HiC-Pro pipeline](https://github.com/nservant/HiC-Pro/).
Note that not all HiC-Pro functionalities are implemented yet. The current version is designed for protocols based on restriction enzyme digestion.
In summary, this version allows:
* Automatic detection and generation of annotation files based on iGenomes if not provided.
* Two-step alignment of raw sequencing reads
* Read filtering and detection of valid interaction products
* Generation of raw contact matrices for a set of resolutions
* Normalization of the contact maps using the ICE algorithm
* Generation of cooler files for visualization on [HiGlass](https://higlass.io/)
* Quality report based on the HiC-Pro MultiQC module
@@ -34,7 +34,7 @@ This Code of Conduct applies both within project spaces and in public spaces whe
## Enforcement
Instances of abusive, harassing, or otherwise unacceptable behavior may be reported by contacting the project team on the [Gitter channel](https://gitter.im/nf-core/Lobby). The project team will review and investigate all complaints, and will respond in a way that it deems appropriate to the circumstances. The project team is obligated to maintain confidentiality with regard to the reporter of an incident. Further details of specific enforcement policies may be posted separately.
Instances of abusive, harassing, or otherwise unacceptable behavior may be reported by contacting the project team on [Slack](https://nf-core-invite.herokuapp.com/). The project team will review and investigate all complaints, and will respond in a way that it deems appropriate to the circumstances. The project team is obligated to maintain confidentiality with regard to the reporter of an incident. Further details of specific enforcement policies may be posted separately.
Project maintainers who do not follow or enforce the Code of Conduct in good faith may face temporary or permanent repercussions as determined by other members of the project's leadership.
......
@@ -7,4 +7,4 @@ RUN apt-get update && apt-get install -y gcc g++ && apt-get clean -y
COPY environment.yml /
RUN conda env create -f /environment.yml && conda clean -a
ENV PATH /opt/conda/envs/nf-core-hic-1.0dev/bin:$PATH
ENV PATH /opt/conda/envs/nf-core-hic-1.0.0/bin:$PATH
# ![nf-core/hic](docs/images/nfcore-hic_logo.png)
# nf-core/hic
**Analysis of Chromosome Conformation Capture data (Hi-C)**
**Analysis of Chromosome Conformation Capture data (Hi-C)**.
[![Build Status](https://travis-ci.org/nf-core/hic.svg?branch=master)](https://travis-ci.org/nf-core/hic)
[![Build Status](https://travis-ci.com/nf-core/hic.svg?branch=master)](https://travis-ci.com/nf-core/hic)
[![Nextflow](https://img.shields.io/badge/nextflow-%E2%89%A50.32.0-brightgreen.svg)](https://www.nextflow.io/)
[![install with bioconda](https://img.shields.io/badge/install%20with-bioconda-brightgreen.svg)](http://bioconda.github.io/)
......
@@ -17,23 +17,6 @@ ${errorReport}
} %>
<% if (!success){
out << """####################################################
## nf-core/hic execution completed unsuccessfully! ##
####################################################
The exit status of the task that caused the workflow execution to fail was: $exitStatus.
The full error message was:
${errorReport}
"""
} else {
out << "## nf-core/hic execution completed successfully! ##"
}
%>
The workflow was completed at $dateComplete (duration: $duration)
The command used to launch the workflow was as follows:
......
report_comment: >
This report has been generated by the <a href="https://github.com/nf-core/hic" target="_blank">nf-core/hic</a>
analysis pipeline. For information about how to interpret these results, please see the
<a href="https://github.com/nf-core/hic" target="_blank">documentation</a>.
report_section_order:
nf-core/hic-software-versions:
order: -1000
export_plots: true
To: $email
Subject: $subject
Mime-Version: 1.0
Content-Type: multipart/related;boundary="nfmimeboundary"
Content-Type: multipart/related;boundary="nfcoremimeboundary"
--nfmimeboundary
--nfcoremimeboundary
Content-Type: text/html; charset=utf-8
$email_html
--nfmimeboundary--
<%
if (mqcFile){
def mqcFileObj = new File("$mqcFile")
if (mqcFileObj.length() < mqcMaxSize){
out << """
--nfcoremimeboundary
Content-Type: text/html; name=\"multiqc_report\"
Content-Transfer-Encoding: base64
Content-ID: <mqcreport>
Content-Disposition: attachment; filename=\"${mqcFileObj.getName()}\"
${mqcFileObj.
bytes.
encodeBase64().
toString().
tokenize( '\n' )*.
toList()*.
collate( 76 )*.
collect { it.join() }.
flatten().
join( '\n' )}
"""
}}
%>
--nfcoremimeboundary--
@@ -3,14 +3,22 @@ from __future__ import print_function
from collections import OrderedDict
import re
# TODO nf-core: Add additional regexes for new tools in process get_software_versions
# Add additional regexes for new tools in process get_software_versions
regexes = {
    'nf-core/hic': ['v_pipeline.txt', r"(\S+)"],
    'Nextflow': ['v_nextflow.txt', r"(\S+)"],
    'Bowtie2': ['v_bowtie2.txt', r"Bowtie2 v(\S+)"],
    'Python': ['v_python.txt', r"Python v(\S+)"],
    'Samtools': ['v_samtools.txt', r"Samtools v(\S+)"],
    'MultiQC': ['v_multiqc.txt', r"multiqc, version (\S+)"],
}
results = OrderedDict()
results['nf-core/hic'] = '<span style="color:#999999;">N/A</span>'
results['Nextflow'] = '<span style="color:#999999;">N/A</span>'
results['Bowtie2'] = '<span style="color:#999999;">N/A</span>'
results['Python'] = '<span style="color:#999999;">N/A</span>'
results['Samtools'] = '<span style="color:#999999;">N/A</span>'
results['MultiQC'] = '<span style="color:#999999;">N/A</span>'
# Search each file using its regex
for k, v in regexes.items():
@@ -20,9 +28,14 @@ for k, v in regexes.items():
        if match:
            results[k] = "v{}".format(match.group(1))
# Remove software set to false in results
for k in list(results):  # iterate over a copy so entries can be deleted safely
    if not results[k]:
        del(results[k])
# Dump to YAML
print ('''
id: 'nf-core/hic-software-versions'
id: 'software_versions'
section_name: 'nf-core/hic Software Versions'
section_href: 'https://github.com/nf-core/hic'
plot_type: 'html'
@@ -31,5 +44,10 @@ data: |
<dl class="dl-horizontal">
''')
for k,v in results.items():
    print(" <dt>{}</dt><dd>{}</dd>".format(k,v))
    print(" <dt>{}</dt><dd><samp>{}</samp></dd>".format(k,v))
print (" </dl>")
# Write out the results as a tab-separated file:
with open('software_versions.csv', 'w') as f:
    for k,v in results.items():
        f.write("{}\t{}\n".format(k,v))
/*
* -------------------------------------------------
* Nextflow config file for AWS Batch
* Nextflow config file for running on AWS batch
* -------------------------------------------------
* Imported under the 'awsbatch' Nextflow profile in nextflow.config
* Uses docker for software dependencies automagically, so they are not specified here.
* Base config needed for running with -profile awsbatch
*/
params {
config_profile_name = 'AWSBATCH'
config_profile_description = 'AWSBATCH Cloud Profile'
config_profile_contact = 'Alexander Peltzer (@apeltzer)'
config_profile_url = 'https://aws.amazon.com/de/batch/'
}
aws.region = params.awsregion
process.executor = 'awsbatch'
......
/*
* -------------------------------------------------
* Nextflow base config file
* nf-core/hic Nextflow base config file
* -------------------------------------------------
* A 'blank slate' config file, appropriate for general
* use on most high-performance compute environments.
@@ -11,13 +11,12 @@
process {
container = process.container
cpus = { check_max( 2, 'cpus' ) }
// Check the defaults for all processes
cpus = { check_max( 1 * task.attempt, 'cpus' ) }
memory = { check_max( 8.GB * task.attempt, 'memory' ) }
time = { check_max( 2.h * task.attempt, 'time' ) }
errorStrategy = { task.exitStatus in [1,143,137,104,134,139] ? 'retry' : 'terminate' }
errorStrategy = { task.exitStatus in [143,137,104,134,139] ? 'retry' : 'finish' }
maxRetries = 1
maxErrors = '-1'
@@ -25,7 +24,7 @@ process {
withName:makeBowtie2Index {
cpus = { check_max( 1, 'cpus' ) }
memory = { check_max( 10.GB * task.attempt, 'memory' ) }
time = { check_max( 12.h * task.attempt, 'time' ) }
}
withName:bowtie2_end_to_end {
cpus = { check_max( 4, 'cpus' ) }
......
@@ -60,7 +60,7 @@ params {
}
'Gm01' {
fasta = "${params.igenomes_base}/Glycine_max/Ensembl/Gm01/Sequence/WholeGenomeFasta/genome.fa"
bowtie2 = "${params.igenomes_base}/Glycine_max/Ensembl/Gm01/Sequence/Bowtie2Index/genome"
bowtie2 = "${params.igenomes_base}/Glycine_max/Ensembl/Gm01/Sequence/Bowtie2Index/genome"
}
'Mmul_1' {
fasta = "${params.igenomes_base}/Macaca_mulatta/Ensembl/Mmul_1/Sequence/WholeGenomeFasta/genome.fa"
@@ -96,7 +96,7 @@ params {
}
'AGPv3' {
fasta = "${params.igenomes_base}/Zea_mays/Ensembl/AGPv3/Sequence/WholeGenomeFasta/genome.fa"
bowtie2 = "${params.igenomes_base}/Zea_mays/Ensembl/AGPv3/Sequence/Bowtie2Index/genome"
bowtie2 = "${params.igenomes_base}/Zea_mays/Ensembl/AGPv3/Sequence/Bowtie2Index/genome"
}
}
}
@@ -16,7 +16,7 @@ params {
max_cpus = 2
max_memory = 4.GB
max_time = 1.h
// Input data
readPaths = [
['SRR4292758_00', ['https://github.com/nf-core/test-datasets/raw/hic/data/SRR4292758_00_R1.fastq.gz', 'https://github.com/nf-core/test-datasets/raw/hic/data/SRR4292758_00_R2.fastq.gz']]
@@ -31,4 +31,3 @@ params {
// Options
skip_cool = true
}
@@ -10,6 +10,7 @@ Nextflow has [excellent integration](https://www.nextflow.io/docs/latest/docker.
First, install docker on your system: [Docker Installation Instructions](https://docs.docker.com/engine/installation/)
Then, simply run the analysis pipeline:
```bash
nextflow run nf-core/hic -profile docker --genome '<genome ID>'
```
......
@@ -39,11 +39,12 @@ Multiple reference index types are held together with consistent structure for m
We have put a copy of iGenomes up onto AWS S3 hosting and this pipeline is configured to use this by default.
The hosting fees for AWS iGenomes are currently kindly funded by a grant from Amazon.
The pipeline will automatically download the required reference files when you run the pipeline.
For more information about the AWS iGenomes, see https://ewels.github.io/AWS-iGenomes/
For more information about the AWS iGenomes, see [AWS-iGenomes](https://ewels.github.io/AWS-iGenomes/)
Downloading the files takes time and bandwidth, so we recommend making a local copy of the iGenomes resource.
Once downloaded, you can customise the variable `params.igenomes_base` in your custom configuration file to point to the reference location.
For example:
```nextflow
params.igenomes_base = '/path/to/data/igenomes/'
```
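The customised configuration file can then be supplied at runtime with Nextflow's `-c` option. A minimal sketch, assuming a Docker run; the config path and `GRCh37` genome ID are placeholder values:

```bash
# Hypothetical example: run the pipeline against a local iGenomes copy
# via a custom config file (path and genome ID are placeholders)
nextflow run nf-core/hic -profile docker --genome 'GRCh37' -c /path/to/custom.config
```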
@@ -74,7 +74,7 @@ Be warned of two important points about this default configuration:
#### 3.1) Software deps: Docker
First, install docker on your system: [Docker Installation Instructions](https://docs.docker.com/engine/installation/)
Then, running the pipeline with the option `-profile docker` tells Nextflow to enable Docker for this run. An image containing all of the software requirements will be automatically fetched and used from dockerhub (https://hub.docker.com/r/nfcore/hic).
Then, running the pipeline with the option `-profile docker` tells Nextflow to enable Docker for this run. An image containing all of the software requirements will be automatically fetched and used from [dockerhub](https://hub.docker.com/r/nfcore/hic).
#### 3.2) Software deps: Singularity
If you're not able to use Docker then [Singularity](http://singularity.lbl.gov/) is a great alternative.
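As a minimal sketch, assuming the standard nf-core `singularity` profile (the genome ID below is a placeholder), a Singularity run looks much like the Docker one:

```bash
# Hypothetical example: enable Singularity instead of Docker for this run;
# Nextflow typically fetches the dockerhub image and converts it for Singularity.
nextflow run nf-core/hic -profile singularity --genome 'GRCh37'
```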
......
@@ -26,7 +26,7 @@ Singletons are discarded, and multi-hits are filtered according to the configura
Note that if the `--dnase` mode is activated, HiC-Pro will skip the second mapping step.
**Output directory: `results/mapping`**
* `*bwt2pairs.bam` - final BAM file with aligned paired data
* `*.pairstat` - mapping statistics
@@ -50,7 +50,7 @@ Invalid pairs are classified as follows:
* Dangling end, i.e. unligated fragments (both reads mapped on the same restriction fragment)
* Self circles, i.e. fragments ligated on themselves (both reads mapped on the same restriction fragment in inverted orientation)
* Religation, i.e. ligation of juxtaposed fragments
* Filtered pairs, i.e. any pairs that do not match the filtering criteria on insert size or restriction fragment size
* Dumped pairs, i.e. any pairs for which we were not able to reconstruct the ligation product.
Only valid pairs involving two different restriction fragments are used to build the contact maps.
@@ -59,12 +59,12 @@ Duplicated valid pairs associated with PCR artefacts are discarded (see `--rm_dup`
In case of Hi-C protocols that do not require a restriction enzyme, such as DNase Hi-C or micro Hi-C, the assignment to a restriction fragment is not possible (see `--dnase`).
Short-range interactions that are likely to be spurious ligation products can thus be discarded using the `--min_cis_dist` parameter.
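As an illustration only, the filtering options described above might be combined as follows; the `--dnase`, `--rm_dup` and `--min_cis_dist` flags come from the text, while the 1000 bp threshold is an assumed example value, not a recommendation:

```bash
# Hypothetical DNase Hi-C invocation: skip restriction fragment assignment,
# remove PCR duplicates, and discard cis pairs closer than 1 kb as likely
# spurious ligation products (threshold value is illustrative)
nextflow run nf-core/hic -profile docker --dnase --rm_dup --min_cis_dist 1000
```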
* `*.validPairs` - List of valid ligation products
* `*RSstat` - Statistics on the number of read pairs falling in each category
The validPairs are stored using a simple tab-delimited text format:
```
```bash
read name / chr_reads1 / pos_reads1 / strand_reads1 / chr_reads2 / pos_reads2 / strand_reads2 / fragment_size / res frag name R1 / res frag R2 / mapping qual R1 / mapping qual R2 [/ allele_specific_tag]
```
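For illustration, a single record following this layout might look as below; every value is hypothetical:

```bash
# Hypothetical validPairs record (tab-separated, optional allele-specific tag omitted)
SRR4292758.12345	chr1	154012	+	chr1	243987	-	312	HIC_chr1_15	HIC_chr1_24	42	42
```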
@@ -102,7 +102,7 @@ A contact map is defined by :
Based on the observation that a contact map is symmetric and usually sparse, only non-zero values are stored for half of the matrix. The user can specify whether the 'upper', 'lower' or 'complete' matrix should be stored. The 'asis' option stores the contacts as they are observed in the valid pairs files.
```
```bash
A B 10
A C 23
B C 24
@@ -124,4 +124,4 @@ The pipeline has special steps which allow the software versions used to be repo
* `Project_multiqc_data/`
* Directory containing parsed statistics from the different tools used in the pipeline
For more information about how to use MultiQC reports, see http://multiqc.info
For more information about how to use MultiQC reports, see [http://multiqc.info](http://multiqc.info)