Commit 51c4f9da authored by nservant

[lint] fix lint error

parent 5402ab5f
......@@ -24,4 +24,3 @@ Learn more about contributing: [CONTRIBUTING.md](https://github.com/nf-core/hic/
- [ ] Output Documentation in `docs/output.md` is updated.
- [ ] `CHANGELOG.md` is updated.
- [ ] `README.md` is updated (including new tool citations and authors/contributors).
......@@ -129,3 +129,4 @@ jobs:
lint_log.txt
lint_results.md
PR_number.txt
......@@ -35,13 +35,13 @@ results highly reproducible.
## Pipeline summary
1. HiC-Pro data processing [`HiC-Pro`](https://github.com/nservant/HiC-Pro)
1. HiC-Pro data processing ([`HiC-Pro`](https://github.com/nservant/HiC-Pro))
1. Mapping using a two steps strategy to rescue reads spanning the ligation
sites ([`bowtie2`](http://bowtie-bio.sourceforge.net/bowtie2/index.shtml))
2. Detection of valid interaction products
3. Duplicates removal
4. Generate raw and normalized contact maps ([`iced`](https://github.com/hiclib/iced))
2. Create genome-wide contact maps at various resolution ([`cooler`](https://github.com/open2c/cooler))
2. Create genome-wide contact maps at various resolutions ([`cooler`](https://github.com/open2c/cooler))
3. Contact maps normalization using balancing algorithm ([`cooler`](https://github.com/open2c/cooler))
4. Export to various contact maps formats ([`HiC-Pro`](https://github.com/nservant/HiC-Pro), [`cooler`](https://github.com/open2c/cooler))
5. Quality controls ([`HiC-Pro`](https://github.com/nservant/HiC-Pro), [`HiCExplorer`](https://github.com/deeptools/HiCExplorer))
......
assets/nf-core-hic_logo.png (binary image changed: 10 KiB -> 13.7 KiB)
docs/images/nf-core-hic_logo.png (binary image changed: 17.4 KiB -> 27.1 KiB)
......@@ -25,7 +25,7 @@ results of the whole pipeline
* [Export](#exprot) - additional export for compatibility with downstream
analysis tools and visualization
## HiC-Pro outputs
## HiC-Pro
The current version is mainly based on the
[HiC-Pro](https://github.com/nservant/HiC-Pro) pipeline.
......@@ -162,13 +162,16 @@ python package which proposes a fast implementation of the original ICE
normalization algorithm (Imakaev et al. 2012), making the assumption of equal
visibility of each fragment.
Importantly, the HiC-Pro maps are generated only if the `--hicpro_maps` option
is specified on the command line.
**Output directory: `results/hicpro/matrix`**
* `*.matrix` - genome-wide contact maps
* `*_iced.matrix` - genome-wide iced contact maps
The contact maps are generated for all specified resolution
(see `--bin_size` argument)
The contact maps are generated for all specified resolutions
(see `--bin_size` argument).
A contact map is defined by :
* A list of genomic intervals related to the specified resolution (BED format).
......@@ -192,37 +195,38 @@ downstream analysis.
## Contact maps
Contact maps are usually stored as simple txt (`HiC-Pro` based), .hic (`Juicer/Juicebox` based) and .(m)cool (`cooler/Higlass` based) formats.
Contact maps are usually stored as simple txt (`HiC-Pro`), .hic (`Juicer/Juicebox`) and .(m)cool (`cooler/Higlass`) formats.
Note that the .cool and .hic formats are compressed and usually much more efficient than the txt format.
In the current workflow, we propose to use the `cooler` format as a standard after valid pairs detection as it is the input of several downstream analysis tools.
In the current workflow, we propose to use the `cooler` format as a standard to build the raw and normalized maps
after valid pairs detection as it is used by several downstream analysis and visualization tools.
Raw contact maps are therefore stored in *results/contact_maps/raw* which contains the different maps in txt and cool format, at various resolutions.
Normalized contact maps are stored in *results/contact_maps/norm* which contains the different maps in txt, cool, and mcool format.
Raw contact maps are therefore in **`results/contact_maps/raw`** which contains the different maps in `txt` and `cool` formats, at various resolutions.
Normalized contact maps are stored in **`results/contact_maps/norm`** which contains the different maps in `txt`, `cool`, and `mcool` format.
Note that txt contact maps generated with `cooler` are identical to those generated by `HiC-Pro`.
Note that `txt` contact maps generated with `cooler` are identical to those generated by `HiC-Pro`.
However, differences can be observed on the normalized contact maps as the balancing algorithm is not the same.
## Downstream analysis
Downstream analysis are performed from cool files at specified resolution.
Downstream analysis are performed from `cool` files at specified resolution.
### Distance decay
The distance decay plot shows the relationship between contact frequencies and genomic distance. It gives a good indication of the compaction of the genome.
According to the organism, the slope of the curve should fit the expection of polymer physics models.
According to the organism, the slope of the curve should fit the expectation of polymer physics models.
The results generated with the `HiCExplorer hicPlotDistVsCounts` tool are available in the *results/dist_decay/* folder.
The results generated with the `HiCExplorer hicPlotDistVsCounts` tool (plot and table) are available in the **`results/dist_decay/`** folder.
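
As a rough illustration of what the distance decay curve measures, the sketch below averages the contact counts found at each genomic distance of a small dense matrix. It is not how `HiCExplorer hicPlotDistVsCounts` is implemented; the function name `distance_decay` and the toy matrix are hypothetical and only serve the example.

```python
def distance_decay(matrix, bin_size):
    """Return {genomic_distance: mean contact count} for a square contact matrix.

    Hypothetical helper for illustration only: each diagonal of the matrix
    holds the contacts between bins separated by the same genomic distance.
    """
    n = len(matrix)
    decay = {}
    for offset in range(n):
        # Contacts between bin i and bin i + offset, for every valid i.
        diagonal = [matrix[i][i + offset] for i in range(n - offset)]
        decay[offset * bin_size] = sum(diagonal) / len(diagonal)
    return decay


# Toy symmetric contact map with 4 bins (made-up counts).
toy_map = [
    [10, 4, 2, 1],
    [4, 10, 4, 2],
    [2, 4, 10, 4],
    [1, 2, 4, 10],
]

decay = distance_decay(toy_map, bin_size=250000)
for distance, mean_count in decay.items():
    print(distance, mean_count)
```

Plotting these values on a log-log scale gives the kind of curve whose slope is compared with polymer physics models.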
### Compartments calling
Compartments calling is one of the most common analysis using Hi-C data which allow to detect A (open, active) / B (close, inactive) compartments.
Compartments calling is one of the most common analysis which aims at detecting A (open, active) / B (close, inactive) compartments.
In the first studies on the subject, the compartments were called at high/medium resolution (1000000 to 250000) which is enough to call A/B compartments.
Analysis at higher resolution have shown that these two main types of compartments can be further divided in more precise compartments subtypes.
Analysis at higher resolution has shown that these two main types of compartments can be further divided into compartments subtypes.
Although different methods have been proposed for compartment calling, the standard remains the one based on eigen vector decomposition generation from the normalized correlation maps.
Although different methods have been proposed for compartment calling, the standard remains the eigen vector decomposition from the normalized correlation maps.
Here, we use the implementation available in the [`cooltools`](https://cooltools.readthedocs.io/en/lates) package.
Results are available in *results/compartments/* folder and includes :
Results are available in **`results/compartments/`** folder and includes :
* `*cis.vecs.tsv`: eigenvectors decomposition along the genome
* `*cis.lam.txt`: eigenvalues associated with the eigenvectors
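
A minimal sketch of how the eigenvector table could be turned into A/B labels, assuming the `*cis.vecs.tsv` file is tab-separated with `chrom`, `start`, `end` and `E1` columns (the column names and the sample rows below are assumptions, not taken from the pipeline output). It also assumes that positive `E1` values correspond to the A compartment, which only holds when the eigenvector has been oriented with a phasing track such as GC content.

```python
import csv
import io
import math

# Made-up sample standing in for a `*cis.vecs.tsv` file (assumed layout).
SAMPLE = (
    "chrom\tstart\tend\tE1\n"
    "chr1\t0\t250000\t0.8\n"
    "chr1\t250000\t500000\t-0.5\n"
    "chr1\t500000\t750000\tnan\n"
)


def label_compartments(handle, column="E1"):
    """Label each bin 'A' (positive value), 'B' (zero or negative) or None (missing)."""
    labels = []
    for row in csv.DictReader(handle, delimiter="\t"):
        raw = row[column].strip()
        value = float(raw) if raw else math.nan
        if math.isnan(value):
            label = None  # bin filtered out, no eigenvector value
        else:
            label = "A" if value > 0 else "B"
        labels.append((row["chrom"], int(row["start"]), int(row["end"]), label))
    return labels


labels = label_compartments(io.StringIO(SAMPLE))
for chrom, start, end, label in labels:
    print(chrom, start, end, label)
```

For a real file, the `io.StringIO(SAMPLE)` handle would be replaced by an opened file from the compartments folder.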
......@@ -233,8 +237,8 @@ While contacts between genes and regulatority elements can occur within a single
TADs calling remains a challenging task, and even if many methods have been proposed in the last decade, little overlap has been found between their results.
Currently, the pipeline proposes two approaches :
- Insulation score using the [`cooltools`](https://cooltools.readthedocs.io/en/latest/cli.html#cooltools-diamond-insulation) package. Results are availabe in *results/tads/insulation*.
- [`HiCExplorer TADs calling`](https://hicexplorer.readthedocs.io/en/latest/content/tools/hicFindTADs.html). Results are available at *results/tads/hicexplorer*.
- Insulation score using the [`cooltools`](https://cooltools.readthedocs.io/en/latest/cli.html#cooltools-diamond-insulation) package. Results are availabe in **`results/tads/insulation`**.
- [`HiCExplorer TADs calling`](https://hicexplorer.readthedocs.io/en/latest/content/tools/hicFindTADs.html). Results are available at **`results/tads/hicexplorer`**.
Usually, TADs results are presented as simple BED files, or bigWig files, with the position of boundaries along the genome.
......
......@@ -503,11 +503,11 @@ Default:'5000'
### HiC-Pro contact maps
Note that by default, the contact maps are now generated with the `cooler` framework.
By default, the contact maps are now generated with the `cooler` framework.
However, for backward compatibility, the raw and normalized maps can still be generated
by HiC-pro if the `--hicpro_maps` parameter is set.
#### `--hicpro_maps
#### `--hicpro_maps`
If specified, the raw and ICE normalized contact maps will be generated by HiC-Pro.
......@@ -557,7 +557,7 @@ normalization. Default: 0.1
#### `--res_dist_decay`
Generates distance vs Hi-C counts plots at a given resolution using HiCExplorer
Generates distance vs Hi-C counts plots at a given resolution using `HiCExplorer`.
Several resolutions can be specified (comma separated). Default: '250000'
```bash
......@@ -582,7 +582,7 @@ Default: '250000'
#### `--tads_caller`
TADs calling can be performed using different approaches.
Currently available options are 'insulation' and 'hicexplorer'.
Currently available options are `insulation` and `hicexplorer`.
Note that all options can be specified (comma separated).
Default: 'insulation'
......