Commit 3d578833 authored by nservant
parent 4a75b6b6

update for markdown lint

[![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.2669513.svg)](https://doi.org/10.5281/zenodo.2669513)

## Introduction

This pipeline is based on the [HiC-Pro workflow](https://github.com/nservant/HiC-Pro).
It was designed to process Hi-C data from raw fastq files (paired-end Illumina data) to normalized contact maps.
The current version supports most protocols, including digestion protocols as well as protocols that do not require restriction enzymes, such as DNase Hi-C.
In practice, this workflow has been successfully applied to many datasets, including dilution Hi-C, in situ Hi-C, DNase Hi-C, Micro-C, Capture-C, Capture Hi-C and HiChIP data.

The pipeline is built using [Nextflow](https://www.nextflow.io), a workflow tool to run tasks across multiple compute infrastructures in a very portable manner. It comes with docker / singularity containers, making installation trivial and results highly reproducible.

## Pipeline summary

1. Mapping using a two-step strategy to rescue reads spanning the ligation sites (bowtie2)
2. Detection of valid interaction products
3. Duplicates removal
4. Create genome-wide contact maps at various resolutions
5. Contact maps normalization using the ICE algorithm (iced)
6. Quality controls and report (MultiQC)
7. Additional export for visualisation and downstream analysis (cooler)
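
A typical launch command looks like the sketch below. The `--reads` and `--genome` parameters follow the usual nf-core conventions but are assumptions here; see [Running the pipeline](docs/usage.md) for the authoritative options.

```bash
# Run with Docker on paired-end fastq files matched by a glob pattern
nextflow run nf-core/hic -profile docker \
    --reads 'data/*_R{1,2}.fastq.gz' \
    --genome GRCh37
```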

## Documentation

The nf-core/hic pipeline comes with documentation about the pipeline, found in the `docs/` directory:

1. [Installation](docs/installation.md)
2. Pipeline configuration
    * [Local installation](docs/configuration/local.md)
    * [Adding your own system](docs/configuration/adding_your_own.md)
    * [Reference genomes](docs/configuration/reference_genomes.md)
3. [Running the pipeline](docs/usage.md)
4. [Output and how to interpret the results](docs/output.md)
5. [Troubleshooting](docs/troubleshooting.md)

## Credits

nf-core/hic was originally written by Nicolas Servant.

If you use nf-core/hic for your analysis, please cite it using the following doi: [10.5281/zenodo.2669513](https://doi.org/10.5281/zenodo.2669513)

You can cite the `nf-core` pre-print as follows:

Ewels PA, Peltzer A, Fillinger S, Alneberg JA, Patel H, Wilm A, Garcia MU, Di Tommaso P, Nahnsen S. **nf-core: Community curated bioinformatics pipelines**. *bioRxiv*. 2019. p. 610741. [doi: 10.1101/610741](https://www.biorxiv.org/content/10.1101/610741v1).

To start using the nf-core/hic pipeline, follow the steps below:

1. [Install NextFlow](#1-install-nextflow)
2. [Install the pipeline](#2-install-the-pipeline)
3. [Pipeline configuration](#3-pipeline-configuration)
4. [Reference genomes](#4-reference-genomes)

## 1) Install NextFlow

Nextflow runs on most POSIX systems (Linux, Mac OSX, etc.). It can be installed by running the following commands:

```bash
# Make sure that Java v8+ is installed:
java -version

# Install Nextflow
curl -fsSL get.nextflow.io | bash

# Add Nextflow binary to your user's PATH:
mv nextflow ~/bin/
# OR system-wide installation:
# sudo mv nextflow /usr/local/bin
```

See [nextflow.io](https://www.nextflow.io/) for further instructions on how to install and configure Nextflow.

## 2) Install the pipeline

### 2.1) Automatic

This pipeline itself needs no installation - NextFlow will automatically fetch it from GitHub if `nf-core/hic` is specified as the pipeline name.
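
For example (a sketch; `--help` is the standard nf-core way to list the available parameters):

```bash
# Nextflow fetches nf-core/hic from GitHub on first use
nextflow run nf-core/hic --help
```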

### 2.2) Offline

The above method requires an internet connection so that Nextflow can download the pipeline files. If you're running on a system that has no internet connection, you'll need to download and transfer the pipeline files manually:

```bash
wget https://github.com/nf-core/hic/archive/master.zip
mkdir -p ~/my-pipelines/nf-core/
unzip master.zip -d ~/my-pipelines/nf-core/
cd ~/my_data/
nextflow run ~/my-pipelines/nf-core/hic-master
```

To stop nextflow from looking for updates online, you can tell it to run in offline mode by specifying the following environment variable in your `~/.bashrc` file:

```bash
export NXF_OFFLINE='TRUE'
```

### 2.3) Development

If you would like to make changes to the pipeline, it's best to make a fork on GitHub and then clone the files. Once cloned, you can run the pipeline directly as above.
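
A minimal sketch of that workflow (`<username>` and the paths are placeholders):

```bash
# Clone your fork and run the local copy directly
git clone https://github.com/<username>/hic.git ~/my-pipelines/nf-core/hic
cd ~/my_data/
nextflow run ~/my-pipelines/nf-core/hic
```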

## 3) Pipeline configuration

By default, the pipeline loads a basic server configuration [`conf/base.config`](../conf/base.config).
This uses a number of sensible defaults for process requirements and is suitable for running on a simple (if powerful!) local server.

Be warned of two important points about this default configuration:

1. The default profile uses the `local` executor
    * All jobs are run in the login session. If you're using a simple server, this may be fine. If you're using a compute cluster, this is bad as all jobs will run on the head node.
    * See the [nextflow docs](https://www.nextflow.io/docs/latest/executor.html) for information about running with other hardware backends; most job scheduler systems are natively supported (a sketch follows this list).
2. Nextflow will expect all software to be installed and available on the `PATH`
    * It's expected to use an additional config profile for docker, singularity or conda support. See below.
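
As an illustration only (the executor and queue names below are assumptions; adapt them to your scheduler), a custom config can redirect jobs to a cluster instead of the login session:

```bash
# Write a minimal custom config and pass it to Nextflow with -c
cat > my_cluster.config <<'EOF'
process {
  executor = 'slurm'    // hypothetical: substitute your scheduler
  queue    = 'standard' // hypothetical queue name
}
EOF

nextflow run nf-core/hic -profile docker -c my_cluster.config
```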

### 3.1) Software deps: Docker

First, install docker on your system: [Docker Installation Instructions](https://docs.docker.com/engine/installation/)

Then, running the pipeline with the option `-profile docker` tells Nextflow to enable Docker for this run. An image containing all of the software requirements will be automatically fetched and used from [dockerhub](https://hub.docker.com/r/nfcore/hic).

### 3.2) Software deps: Singularity

If you're not able to use Docker then [Singularity](http://singularity.lbl.gov/) is a great alternative.
The process is very similar: running the pipeline with the option `-profile singularity` tells Nextflow to enable singularity for this run. An image containing all of the software requirements will be automatically fetched and used from singularity hub.

If running offline with Singularity, you'll need to download and transfer the Singularity image first:

```bash
singularity pull --name nf-core-hic.simg shub://nf-core/hic
```

Once transferred, use `-with-singularity` and specify the path to the image file:

```bash
nextflow run /path/to/nf-core-hic -with-singularity nf-core-hic.simg
```

Remember to pull updated versions of the singularity image if you update the pipeline.

### 3.3) Software deps: conda

If you're not able to use Docker _or_ Singularity, you can instead use conda to manage the software requirements.
This is slower and less reproducible than the above, but is still better than having to install all requirements yourself!
The pipeline ships with a conda environment file and nextflow has built-in support for this.
To use it, first ensure that you have conda installed (we recommend [miniconda](https://conda.io/miniconda.html)), then follow the same pattern as above and use the flag `-profile conda`.

### 3.4) Configuration profiles

See [`docs/configuration/adding_your_own.md`](configuration/adding_your_own.md)

# nf-core/hic: Output

This document describes the output produced by the pipeline. Most of the plots are taken from the MultiQC report, which summarises results at the end of the pipeline.

## Pipeline overview

The pipeline is built using [Nextflow](https://www.nextflow.io/) and processes data using the following steps:

* [Reads alignment](#reads-alignment)
* [Valid pairs detection](#valid-pairs-detection)
* [Duplicates removal](#duplicates-removal)
* [Contact maps](#contact-maps)
* [MultiQC](#multiqc) - aggregate report and quality controls, describing results of the whole pipeline
* [Export](#export) - additional export for compatibility with downstream analysis tools and visualization

The current version is mainly based on the [HiC-Pro](https://github.com/nservant/HiC-Pro) pipeline.
For details about the workflow, see [Servant et al. 2015](https://genomebiology.biomedcentral.com/articles/10.1186/s13059-015-0831-x).

## Reads alignment

Using Hi-C data, each read mate has to be independently aligned on the reference genome.
The current workflow implements a two-step mapping strategy. First, the reads are aligned using an end-to-end aligner.
Second, reads spanning the ligation junction are trimmed from their 3' end, and aligned back on the genome.
Aligned reads for both fragment mates are then paired in a single paired-end BAM file.
Singletons are discarded, and multi-hits are filtered according to the configuration parameters (`--rm-multi`).
Note that if the `--dnase` mode is activated, HiC-Pro will skip the second mapping step.

**Output directory: `results/mapping`**

* `*bwt2pairs.bam` - final BAM file with aligned paired data
* `*.pairstat` - mapping statistics

If `--saveAlignedIntermediates` is specified, additional mapping file results are available:

* `*.bam` - Aligned reads (R1 and R2) from end-to-end alignment
* `*_unmap.fastq` - Unmapped reads after end-to-end alignment
* `*bwt2merged.bam` - merged BAM file after the two-step alignment
* `*.mapstat` - mapping statistics per read mate

Usually, a high fraction of reads is expected to be aligned on the genome (80-90%). Among them, we usually observe a few percent (around 10%) of step-two aligned reads. Those reads are chimeric fragments for which we detect a ligation junction. An abnormal level of chimeric reads can reflect a ligation issue during the library preparation.
The fraction of singletons or multi-hits depends on the genome complexity and the fraction of unmapped reads. The fraction of singletons is usually close to the sum of unmapped R1 and R2 reads, as it is unlikely that both mates from the same pair were unmapped.

## Valid pairs detection

Each aligned read can be assigned to one restriction fragment according to the reference genome and the digestion protocol.

Invalid pairs are classified as follows:

* Dangling end, i.e. unligated fragments (both reads mapped on the same restriction fragment)
* Self circles, i.e. fragments ligated on themselves (both reads mapped on the same restriction fragment in inverted orientation)
* Religation, i.e. ligation of juxtaposed fragments
* Filtered pairs, i.e. any pairs that do not match the filtering criteria on insert size or restriction fragment size
* Dumped pairs, i.e. any pairs for which we were not able to reconstruct the ligation product

Only valid pairs involving two different restriction fragments are used to build the contact maps.
Duplicated valid pairs associated with PCR artefacts are discarded (see `--rm_dup`).

In the case of Hi-C protocols that do not require a restriction enzyme, such as DNase Hi-C or Micro-C, the assignment to a restriction fragment is not possible (see `--dnase`).
Short-range interactions that are likely to be spurious ligation products can thus be discarded using the `--min_cis_dist` parameter.
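
For such protocols, the two options mentioned above combine naturally (a sketch; other required parameters are omitted for brevity):

```bash
# DNase Hi-C style run: skip restriction fragment assignment and
# drop contacts closer than 1kb, which are likely self-ligation products
nextflow run nf-core/hic -profile docker --dnase --min_cis_dist 1000
```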

* `*.validPairs` - List of valid ligation products
* `*.DEpairs` - List of dangling-end products
* `*.SCPairs` - List of self-circle products
* `*.REPairs` - List of religation products
* `*.FiltPairs` - List of filtered pairs
* `*RSstat` - Statistics of the number of read pairs falling in each category

The validPairs files are stored using a simple tab-delimited text format:

```bash
read name / chr_reads1 / pos_reads1 / strand_reads1 / chr_reads2 / pos_reads2 /
strand_reads2 / fragment_size / res frag name R1 / res frag R2 / mapping qual R1
/ mapping qual R2 [/ allele_specific_tag]
```

The ligation efficiency can be assessed using the filtering of valid and invalid pairs. As the ligation is a random process, 25% of each valid ligation class is expected. In the same way, a high level of dangling-end or self-circle read pairs is associated with a low-quality experiment, and reveals a problem during the digestion, fill-in or ligation steps.

In the context of a Hi-C protocol without a restriction enzyme, this analysis step is skipped. The aligned pairs are therefore directly used to generate the contact maps. A filter on short-range contacts (typically <1kb) is recommended, as these pairs are likely to be self-ligation products.

## Duplicates removal

Note that validPairs files are generated per read chunk.
These files are then merged into the allValidPairs file, and duplicates are removed if the `--rm_dup` parameter is used.

* `*allValidPairs` - combined valid pairs from all read chunks
* `*mergestat` - statistics about duplicates removal and valid pairs information

Additional quality controls such as fragment size distribution can be extracted from the list of valid interaction products.
We usually expect to see a distribution centered around 300 bp, which corresponds to the paired-end insert size commonly used.
The fraction of duplicates is also presented. A high level of duplication indicates a poor molecular complexity and a potential PCR bias.
Finally, an important metric is to look at the fraction of intra- and inter-chromosomal interactions, as well as long-range (>20kb) versus short-range (<20kb) intra-chromosomal interactions.
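
Since the validPairs format given above stores both mates' chromosome and position, such metrics can be derived directly from the merged file. A rough sketch (field numbers follow the format described in the previous section; the file name is a placeholder):

```bash
# Fraction of intra-chromosomal pairs spanning more than 20kb
awk '$2 == $5 { intra++; d = ($6 > $3 ? $6 - $3 : $3 - $6);
                if (d > 20000) lr++ }
     END { if (intra) printf "long-range fraction: %.2f\n", lr / intra }' \
    sample.allValidPairs
```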

## Contact maps

Intra- and inter-chromosomal contact maps are built for all specified resolutions.
The genome is split into bins of equal size. Each valid interaction is associated with the genomic bins to generate the raw maps.
In addition, Hi-C data can contain several sources of bias which have to be corrected.
The current workflow uses the [iced](https://github.com/hiclib/iced) python package ([Varoquaux and Servant, 2018](http://joss.theoj.org/papers/10.21105/joss.01286)), which proposes a fast implementation of the original ICE normalization algorithm (Imakaev et al. 2012), making the assumption of equal visibility of each fragment.

* `*.matrix` - genome-wide contact maps
* `*_iced.matrix` - genome-wide iced contact maps

The contact maps are generated for all specified resolutions (see `--bin_size` argument).

A contact map is defined by:

* A list of genomic intervals related to the specified resolution (BED format).
* A matrix, stored as standard triplet sparse format (i.e. list format).

Based on the observation that a contact map is symmetric and usually sparse, only non-zero values are stored for half of the matrix. The user can specify whether the 'upper', 'lower' or 'complete' matrix has to be stored. The 'asis' option allows storing the contacts as they are observed from the valid pairs files.

```bash
A B 10
(...)
```

This format is memory efficient, and is compatible with several software packages for downstream analysis.
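
For example, a triplet matrix plus its BED file of intervals can in principle be ingested by [cooler](https://github.com/open2c/cooler) for downstream visualisation. This is a hedged sketch: the file names are illustrative, and the exact flags (in particular `--one-based`) should be checked against `cooler load --help` for your version.

```bash
# Load a HiC-Pro style triplet sparse matrix into a .cool file
cooler load -f coo --one-based sample_1000000_abs.bed sample_1000000.matrix sample.cool
```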

## MultiQC

[MultiQC](http://multiqc.info) is a visualisation tool that generates a single HTML report summarising all samples in your project. Most of the pipeline QC results are visualised in the report and further statistics are available within the report data directory.

The pipeline has special steps which allow the software versions used to be reported in the MultiQC output for future traceability.

**Output directory: `results/multiqc`**

* `Project_multiqc_report.html`
  * MultiQC report - a standalone HTML file that can be viewed in your web browser
* `Project_multiqc_data/`
  * Directory containing parsed statistics from the different tools used in the pipeline

For more information about how to use MultiQC reports, see [http://multiqc.info](http://multiqc.info)

## Input files not found

If no files, only one input file, or only read one and not read two is picked up, then something is wrong with your input file declaration:

1. The path must be enclosed in quotes (`'` or `"`)
2. The path must have at least one `*` wildcard character. This is true even if you are only running one paired-end sample.
3. When using the pipeline with paired-end data, the path must use `{1,2}` or `{R1,R2}` notation to specify read pairs.
4. If you are running single-end data, make sure to specify `--singleEnd`

If the pipeline can't find your files then you will get the following error:

```bash
ERROR ~ Cannot find any reads matching: *{1,2}.fastq.gz
```

Note that if your sample name is "messy" then you have to be very particular with your glob specification. A file name like `L1-1-D-2h_S1_L002_R1_001.fastq.gz` can be difficult enough for a human to read. Specifying `*{1,2}*.gz` won't work, whilst `*{R1,R2}*.gz` will.
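
To see why, test the glob against the file name above (the directory layout is hypothetical):

```bash
# {1,2} can match the '1' in 'L1', 'S1' or 'L002' rather than the read number;
# anchoring on R1/R2 removes the ambiguity
ls data/*{R1,R2}*.gz
```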

## Data organization

The pipeline can't take a list of multiple input files - it takes a glob expression. If your input files are scattered in different paths then we recommend that you generate a directory with symlinked files. If running in paired-end mode please make sure that your files are sensibly named so that they can be properly paired. See the previous point.
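
A sketch of the symlink approach (all paths are placeholders):

```bash
# Collect fastq files scattered across runs into a single directory
mkdir -p ~/my_data/reads
ln -s /path/to/runA/sampleA_R1.fastq.gz /path/to/runA/sampleA_R2.fastq.gz ~/my_data/reads/
ln -s /path/to/runB/sampleB_R1.fastq.gz /path/to/runB/sampleB_R2.fastq.gz ~/my_data/reads/
```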

## Extra resources and getting help

If you still have an issue with running the pipeline then feel free to contact us.
Have a look at the [pipeline website](https://github.com/nf-core/hic) to find out how.

If you have problems that are related to Nextflow and not our pipeline then check out the [Nextflow gitter channel](https://gitter.im/nextflow-io/nextflow) or the [google group](https://groups.google.com/forum/#!forum/nextflow).