@@ -146,7 +146,7 @@ We usually expect to see a distribution centered around 300 bp which corresponds
to the paired-end insert size commonly used.
The fraction of duplicates is also presented. A high level of duplication
indicates poor molecular complexity and a potential PCR bias.
Finally, an important metric is the fraction of intra- and
inter-chromosomal interactions, as well as long-range (>20 kb) versus
short-range (<20 kb) intra-chromosomal interactions.
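
As an illustration of the metric described above, here is a minimal sketch that classifies read pairs as inter-chromosomal, long-range intra-chromosomal, or short-range intra-chromosomal, then reports the fraction of each. The function names and the tuple layout `(chrom1, pos1, chrom2, pos2)` are assumptions made for this example and are not the pipeline's own code; pairs at exactly 20 kb are arbitrarily counted as short range.

```python
from collections import Counter

# Distance cutoff (bp) between short- and long-range intra-chromosomal
# contacts, taken from the 20 kb threshold mentioned in the text.
LONG_RANGE_CUTOFF = 20_000


def classify_pair(chrom1, pos1, chrom2, pos2, cutoff=LONG_RANGE_CUTOFF):
    """Return 'trans', 'cis_long' or 'cis_short' for one read pair."""
    if chrom1 != chrom2:
        return "trans"
    # Assumption: a distance exactly equal to the cutoff counts as short range.
    if abs(pos2 - pos1) > cutoff:
        return "cis_long"
    return "cis_short"


def interaction_fractions(pairs):
    """Compute the fraction of each interaction class over all pairs."""
    counts = Counter(classify_pair(*pair) for pair in pairs)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {key: counts[key] / total for key in ("trans", "cis_long", "cis_short")}


# Toy pairs, hypothetical data for illustration only.
example_pairs = [
    ("chr1", 1_000, "chr1", 5_000),    # intra, 4 kb apart   -> cis_short
    ("chr1", 1_000, "chr1", 150_000),  # intra, 149 kb apart -> cis_long
    ("chr1", 2_000, "chr2", 2_500),    # different chromosomes -> trans
    ("chr2", 10_000, "chr2", 90_000),  # intra, 80 kb apart  -> cis_long
]
fractions = interaction_fractions(example_pairs)
print(fractions)
```

A low proportion of long-range intra-chromosomal contacts relative to short-range ones is the kind of pattern this summary is meant to expose.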
...
@@ -221,16 +221,16 @@ downstream analysis.
## Hi-C contact maps
Contact maps are usually stored in plain txt (`HiC-Pro`), .hic (`Juicer/Juicebox`), or .(m)cool (`cooler/HiGlass`) format.
The .cool and .hic formats are compressed and indexed, and are usually much more efficient than the txt format.
In the current workflow, we propose to use the `cooler` format as a standard to build the raw and normalized maps
after valid pairs detection as it is used by several downstream analysis and visualization tools.
Raw contact maps are therefore stored in **`results/contact_maps/raw`**, which contains the different maps in `txt` and `cool` formats at various resolutions.
Normalized contact maps are stored in **`results/contact_maps/norm`**, which contains the different maps in `txt`, `cool`, and `mcool` formats.
The bin coordinates used for all resolutions are available in **`results/contact_maps/bins`**.
Note that `txt` contact maps generated with `cooler` are identical to those generated by `HiC-Pro`.
However, differences can be observed in the normalized contact maps, as the balancing algorithm is not exactly the same.
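
To give an idea of how a `txt` contact map can be used together with its bin coordinates, the sketch below reads a sparse matrix and expands it into a dense symmetric one. It assumes the `HiC-Pro`-style layout of three whitespace-separated columns (`bin_i`, `bin_j`, `count`) with 1-based bin indices and only one triangle stored; check your own files before relying on this. The function names and the toy data are hypothetical.

```python
import io


def read_sparse_triplets(handle):
    """Parse 'bin_i bin_j count' lines into a {(i, j): count} dictionary."""
    contacts = {}
    for line in handle:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        i, j, value = line.split()[:3]
        contacts[(int(i), int(j))] = float(value)
    return contacts


def to_dense(contacts, n_bins):
    """Expand sparse contacts into a dense symmetric n_bins x n_bins matrix.

    Assumption: bin indices are 1-based, so they are shifted by one here.
    """
    dense = [[0.0] * n_bins for _ in range(n_bins)]
    for (i, j), value in contacts.items():
        dense[i - 1][j - 1] = value
        dense[j - 1][i - 1] = value  # mirror to the other triangle
    return dense


# Toy sparse map with three bins, hypothetical data for illustration only.
toy_map = io.StringIO("1\t1\t10\n1\t2\t4\n2\t3\t7\n")
dense = to_dense(read_sparse_triplets(toy_map), n_bins=3)
for row in dense:
    print(row)
```

For real data sets, the `cool` files are generally the better choice, since the dense representation grows quadratically with the number of bins.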