diff --git a/.github/PULL_REQUEST_TEMPLATE.md b/.github/PULL_REQUEST_TEMPLATE.md
index 74ffc13db32b7299abb2d73f5a1829e6d01799fd..0fc5c61df27df619c3fc7a9e7e920622568ef2c1 100644
--- a/.github/PULL_REQUEST_TEMPLATE.md
+++ b/.github/PULL_REQUEST_TEMPLATE.md
@@ -24,4 +24,3 @@ Learn more about contributing: [CONTRIBUTING.md](https://github.com/nf-core/hic/
 - [ ] Output Documentation in `docs/output.md` is updated.
 - [ ] `CHANGELOG.md` is updated.
 - [ ] `README.md` is updated (including new tool citations and authors/contributors).
-
diff --git a/.github/workflows/linting.yml b/.github/workflows/linting.yml
index 8b53665cc055f8e383f39f1973cbc97c5d7f617c..fcde400cedbc1566f84e8a811e0b45a1c113df60 100644
--- a/.github/workflows/linting.yml
+++ b/.github/workflows/linting.yml
@@ -129,3 +129,4 @@ jobs:
 lint_log.txt
 lint_results.md
 PR_number.txt
+
diff --git a/README.md b/README.md
index b211b5de851ab16ce9a07f376c33b10c344ad077..9ba1078d093723191c746198725674ee1384ab8e 100644
--- a/README.md
+++ b/README.md
@@ -35,13 +35,13 @@ results highly reproducible.
 ## Pipeline summary
-1. HiC-Pro data processing [`HiC-Pro`](https://github.com/nservant/HiC-Pro)
+1. HiC-Pro data processing ([`HiC-Pro`](https://github.com/nservant/HiC-Pro))
    1. Mapping using a two steps strategy to rescue reads spanning the ligation sites ([`bowtie2`](http://bowtie-bio.sourceforge.net/bowtie2/index.shtml))
    2. Detection of valid interaction products
    3. Duplicates removal
    4. Generate raw and normalized contact maps ([`iced`](https://github.com/hiclib/iced))
-2. Create genome-wide contact maps at various resolution ([`cooler`](https://github.com/open2c/cooler))
+2. Create genome-wide contact maps at various resolutions ([`cooler`](https://github.com/open2c/cooler))
 3. Contact maps normalization using balancing algorithm ([`cooler`](https://github.com/open2c/cooler))
 4. Export to various contact maps formats ([`HiC-Pro`](https://github.com/nservant/HiC-Pro), [`cooler`](https://github.com/open2c/cooler))
 5. Quality controls ([`HiC-Pro`](https://github.com/nservant/HiC-Pro), [`HiCExplorer`](https://github.com/deeptools/HiCExplorer))
diff --git a/assets/nf-core-hic_logo.png b/assets/nf-core-hic_logo.png
index 6b364161664e70224fac3a83fb9f02ed0acbd9f8..37461d9a32ae1f73d9090a3a2387cf8997c9a0ed 100644
Binary files a/assets/nf-core-hic_logo.png and b/assets/nf-core-hic_logo.png differ
diff --git a/docs/images/nf-core-hic_logo.png b/docs/images/nf-core-hic_logo.png
index e5fead372861ff430d7f1428e15dad9b045523e8..274eb3dc3f3db879c7f3cbc3fd8f49a705a9a3fb 100644
Binary files a/docs/images/nf-core-hic_logo.png and b/docs/images/nf-core-hic_logo.png differ
diff --git a/docs/output.md b/docs/output.md
index b84a49f19a77e009a32bae63947f81af2741c247..342ce3a704e7d0cdf6ba5fed8f28c00a0d4d8f1f 100644
--- a/docs/output.md
+++ b/docs/output.md
@@ -25,7 +25,7 @@ results of the whole pipeline
 * [Export](#exprot) - additionnal export for compatibility with downstream analysis tool and visualization
-## HiC-Pro outputs
+## HiC-Pro
 The current version is mainly based on the [HiC-Pro](https://github.com/nservant/HiC-Pro) pipeline.
@@ -162,13 +162,16 @@ python package which proposes a fast implementation of the original ICE
 normalization algorithm (Imakaev et al. 2012), making the assumption of equal visibility of each fragment.
+Importantly, the HiC-Pro maps are generated only if the `--hicpro_maps` option
+is specified on the command line.
+
 **Output directory: `results/hicpro/matrix`**
 * `*.matrix` - genome-wide contact maps
 * `*_iced.matrix` - genome-wide iced contact maps
-The contact maps are generated for all specified resolution
-(see `--bin_size` argument)
+The contact maps are generated for all specified resolutions
+(see `--bin_size` argument).
 A contact map is defined by :
 * A list of genomic intervals related to the specified resolution (BED format).
@@ -192,37 +195,38 @@ downstream analysis.
 ## Contact maps
-Contact maps are usually stored as simple txt (`HiC-Pro` based), .hic (`Juicer/Juicebox` based) and .(m)cool (`cooler/Higlass` based) formats.
+Contact maps are usually stored as simple txt (`HiC-Pro`), .hic (`Juicer/Juicebox`) and .(m)cool (`cooler/Higlass`) formats.
 Note that .cool and .hic format are compressed and usually much more efficient that the txt format.
-In the current workflow, we propose to use the `cooler` format as a standard after valid pairs detection as it is the input of several downstream analysis tools.
+In the current workflow, we propose to use the `cooler` format as a standard to build the raw and normalized maps
+after valid pairs detection as it is used by several downstream analysis and visualization tools.

-Raw contact maps are therefore stored in *results/contact_maps/raw* which contains the different maps in txt and cool format, at various resolutions.
-Normalized contact maps are stored in *results/contact_maps/norm* which contains the different maps in txt, cool, and mcool format.
+Raw contact maps are therefore in **`results/contact_maps/raw`** which contains the different maps in `txt` and `cool` formats, at various resolutions.
+Normalized contact maps are stored in **`results/contact_maps/norm`** which contains the different maps in `txt`, `cool`, and `mcool` format.

-Note that txt contact maps generated with `cooler` are identical to those generated by `HiC-Pro`.
+Note that `txt` contact maps generated with `cooler` are identical to those generated by `HiC-Pro`.
 However, differences can be observed on the normalized contact maps as the balancing algorithm is not the same.
 ## Downstream analysis
-Downstream analysis are performed from cool files at specified resolution.
+Downstream analysis are performed from `cool` files at specified resolution.
 ### Distance decay
 The distance decay plot shows the relationship between contact frequencies and genomic distance.
 It gives a good indication of the compaction of the genome.
-According to the organism, the slope of the curve should fit the expection of polymer physics models.
+According to the organism, the slope of the curve should fit the expectation of polymer physics models.

-The results generated with the `HiCExplorer hicPlotDistVsCounts` tool are available in the *results/dist_decay/* folder.
+The results generated with the `HiCExplorer hicPlotDistVsCounts` tool (plot and table) are available in the **`results/dist_decay/`** folder.
 ### Compartments calling
-Compartments calling is one of the most common analysis using Hi-C data which allow to detect A (open, active) / B (close, inactive) compartments.
+Compartments calling is one of the most common analysis which aims at detecting A (open, active) / B (close, inactive) compartments.
 In the first studies on the subject, the compartments were called at high/medium resolution (1000000 to 250000) which is enough to call A/B comparments.
-Analysis at higher resolution have shown that these two main types of compartments can be further divided in more precise compartments subtypes.
+Analysis at higher resolution has shown that these two main types of compartments can be further divided into compartments subtypes.

-Although different methods have been proposed for compartment calling, the standard remains the one based on eigen vector decomposition generation from the normalized correlation maps.
+Although different methods have been proposed for compartment calling, the standard remains the eigen vector decomposition from the normalized correlation maps.
 Here, we use the implementation available in the [`cooltools`](https://cooltools.readthedocs.io/en/lates) package.
-Results are available in *results/compartments/* folder and includes :
+Results are available in **`results/compartments/`** folder and includes :
 * `*cis.vecs.tsv`: eigenvectors decomposition along the genome
 * `*cis.lam.txt`: eigenvalues associated with the eigenvectors
@@ -233,8 +237,8 @@ While contacts between genes and regulatority elements can occur within a single
 TADs calling remains a challenging task, and even if many methods have been proposed in the last decade, little overlap have been found between their results.
 Currently, the pipeline proposes two approaches :
-- Insulation score using the [`cooltools`](https://cooltools.readthedocs.io/en/latest/cli.html#cooltools-diamond-insulation) package. Results are availabe in *results/tads/insulation*.
-- [`HiCExplorer TADs calling`](https://hicexplorer.readthedocs.io/en/latest/content/tools/hicFindTADs.html). Results are available at *results/tads/hicexplorer*.
+- Insulation score using the [`cooltools`](https://cooltools.readthedocs.io/en/latest/cli.html#cooltools-diamond-insulation) package. Results are availabe in **`results/tads/insulation`**.
+- [`HiCExplorer TADs calling`](https://hicexplorer.readthedocs.io/en/latest/content/tools/hicFindTADs.html). Results are available at **`results/tads/hicexplorer`**.
 Usually, TADs results are presented as simple BED files, or bigWig files, with the position of boundaries along the genome.
diff --git a/docs/usage.md b/docs/usage.md
index 82a79b05c3822c50804753cb11e889f1405e3096..3c39e290986aa5f0efc59c628bac3c2984c8c186 100644
--- a/docs/usage.md
+++ b/docs/usage.md
@@ -503,11 +503,11 @@ Default:'5000'
 ### HiC-Pro contact maps
-Note that by default, the contact maps are now generated with the `cooler` framework.
+By default, the contact maps are now generated with the `cooler` framework.
 However, for backward compatibility, the raw and normalized maps can still be generated by HiC-pro if the `--hicpro_maps` parameter is set.
-#### `--hicpro_maps
+#### `--hicpro_maps`
 If specified, the raw and ICE normalized contact maps will be generated by HiC-Pro.
@@ -557,7 +557,7 @@ normalization. Default: 0.1
 #### `--res_dist_decay`
-Generates distance vs Hi-C counts plots at a given resolution using HiCExplorer
+Generates distance vs Hi-C counts plots at a given resolution using `HiCExplorer`.
 Several resolution can be specified (comma separeted). Default: '250000'
 ```bash
@@ -582,7 +582,7 @@ Default: '250000'
 #### `--tads_caller`
 TADs calling can be performed using different approaches.
-Currently available options are 'insulation' and 'hicexplorer'.
+Currently available options are `insulation` and `hicexplorer`.
 Note that all options can be specified (comma separated).
 Default: 'insulation'