@@ -146,7 +146,7 @@ We usually expect to see a distribution centered around 300 pb which correspond
to the paired-end insert size commonly used.
The fraction of duplicates is also presented. A high level of duplication
indicates poor molecular complexity and a potential PCR bias.
Finaly, an important metric is to look at the fraction of intra and
Finally, an important metric is the fraction of intra- and
inter-chromosomal interactions, as well as long range (>20kb) versus short
range (<20kb) intra-chromosomal interactions.
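
The classification described above can be sketched in a few lines of Python. This is only an illustration on toy data, not the code used by the workflow: the function names, the tuple layout `(chrom1, pos1, chrom2, pos2)`, and the choice to count a distance of exactly 20 kb as short range are all assumptions made for the example.

```python
from collections import Counter

LONG_RANGE_THRESHOLD = 20_000  # 20 kb, the cutoff quoted in the text


def classify_pair(chrom1, pos1, chrom2, pos2, threshold=LONG_RANGE_THRESHOLD):
    """Label one valid pair as 'trans', 'cis_long' or 'cis_short'."""
    if chrom1 != chrom2:
        return "trans"  # inter-chromosomal interaction
    # Assumption: a distance exactly equal to the threshold is short range.
    return "cis_long" if abs(pos2 - pos1) > threshold else "cis_short"


def interaction_fractions(pairs, threshold=LONG_RANGE_THRESHOLD):
    """Return the fraction of pairs in each class (empty dict if no pairs)."""
    counts = Counter(classify_pair(*p, threshold=threshold) for p in pairs)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {k: counts[k] / total for k in ("trans", "cis_long", "cis_short")}


# Hypothetical toy data: (chrom1, pos1, chrom2, pos2)
pairs = [
    ("chr1", 1_000, "chr1", 5_000),     # intra-chromosomal, 4 kb apart
    ("chr1", 1_000, "chr1", 150_000),   # intra-chromosomal, 149 kb apart
    ("chr1", 1_000, "chr2", 2_000),     # inter-chromosomal
    ("chr2", 50_000, "chr2", 10_000),   # intra-chromosomal, 40 kb apart
]
fractions = interaction_fractions(pairs)
print(fractions)
```

A library dominated by `cis_short` or `trans` pairs would deserve a closer look, but what counts as an acceptable ratio depends on the protocol and organism.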
...
...
@@ -221,16 +221,16 @@ downstream analysis.
## Hi-C contact maps
Contact maps are usually stored as simple txt (`HiC-Pro`), .hic (`Juicer/Juicebox`) and .(m)cool (`cooler/Higlass`) formats.
Note that .cool and .hic format are compressed and usually much more efficient that the txt format.
The .cool and .hic formats are compressed and indexed, and usually much more efficient than the txt format.
In the current workflow, we propose to use the `cooler` format as a standard to build the raw and normalized maps
after valid pairs detection as it is used by several downstream analysis and visualization tools.
Raw contact maps are therefore in **`results/contact_maps/raw`** which contains the different maps in `txt` and `cool` formats, at various resolutions.
Normalized contact maps are stored in **`results/contact_maps/norm`** which contains the different maps in `txt`, `cool`, and `mcool` formats.
The bin coordinates used for the various resolution are available in **`results/contact_maps/bins`**.
The bin coordinates used for all resolutions are available in **`results/contact_maps/bins`**.
Note that `txt` contact maps generated with `cooler` are identical to those generated by `HiC-Pro`.
However, differences can be observed on the normalized contact maps as the balancing algorithm is not the same.
However, differences can be observed on the normalized contact maps as the balancing algorithm is not exactly the same.
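
As a quick way to inspect a `txt` contact map, the sketch below reads it into a dictionary. It assumes the three-column sparse layout used by `HiC-Pro` (bin index, bin index, count, separated by whitespace); the function name and the in-memory example are hypothetical, and a real file would be opened with `open(...)` instead.

```python
import io


def read_sparse_matrix(handle):
    """Read 'bin_i bin_j count' lines into a {(bin_i, bin_j): count} dict."""
    contacts = {}
    for line in handle:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        bin_i, bin_j, count = line.split()[:3]
        # Counts are parsed as floats so that normalized maps also load.
        contacts[(int(bin_i), int(bin_j))] = float(count)
    return contacts


# Hypothetical three-line map standing in for a real txt file.
example = io.StringIO("1\t1\t12\n1\t2\t3\n2\t5\t1.5\n")
contacts = read_sparse_matrix(example)
total = sum(contacts.values())
print(len(contacts), total)
```

Loading a raw map and its normalized counterpart this way makes it easy to compare individual bins between `cooler` and `HiC-Pro` outputs.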