From 273cba0e755a02ef0ed9a3b857ee5fedd2a24187 Mon Sep 17 00:00:00 2001
From: nservant <nicolas.servant@curie.fr>
Date: Wed, 4 Jan 2023 17:13:25 +0100
Subject: [PATCH] [MODIF] update docs

---
 docs/output.md | 16 ++++++++--------
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/docs/output.md b/docs/output.md
index 4595abf..2953d32 100644
--- a/docs/output.md
+++ b/docs/output.md
@@ -109,8 +109,8 @@ can thus be discarded using the `--min_cis_dist` parameter.
 - `*.FiltPairs` - List of filtered pairs
 - `*RSstat` - Statitics of number of read pairs falling in each category
 
-Of note, these results are saved only if `--save_pairs_intermediates` is used.
-The validPairs are stored using a simple tab-delimited text format ;
+Of note, these results are saved only if `--save_pairs_intermediates` is used.
+The `validPairs` are stored using a simple tab-delimited text format ;
 
 ```bash
 read name / chr_reads1 / pos_reads1 / strand_reads1 / chr_reads2 / pos_reads2 /
@@ -131,9 +131,9 @@ recommanded as this pairs are likely to be self ligation products.
 
 #### Duplicates removal
 
-Note that validPairs file are generated per reads chunck (and saved only if
+Note that `validPairs` file are generated per reads chunck (and saved only if
 `--save_pairs_intermediates` is specified).
-These files are then merged in the allValidPairs file, and duplicates are
+These files are then merged in the `allValidPairs` file, and duplicates are
 removed (see `--keep_dups` to disable duplicates filtering).
 
 **Output directory: `results/hicpro/valid_pairs`**
@@ -146,7 +146,7 @@ We usually expect to see a distribution centered around 300 pb which correspond
 to the paired-end insert size commonly used.
 The fraction of dplicates is also presented. A high level of duplication
 indicates a poor molecular complexity and a potential PCR bias.
 
-Finaly, an important metric is to look at the fraction of intra and
+Finally, an important metric is to look at the fraction of intra and
 inter-chromosomal interactions, as well as long range (>20kb) versus short
 range (<20kb) intra-chromosomal interactions.
@@ -221,16 +221,16 @@ downstream analysis.
 
 ## Hi-C contact maps
 
 Contact maps are usually stored as simple txt (`HiC-Pro`), .hic (`Juicer/Juicebox`) and .(m)cool (`cooler/Higlass`) formats.
-Note that .cool and .hic format are compressed and usually much more efficient that the txt format.
+The .cool and .hic format are compressed and indexed and usually much more efficient that the txt format.
 In the current workflow, we propose to use the `cooler` format as a standard to build the raw and normalized maps after valid pairs detection as it is used by several downstream analysis and visualization tools.
 
 Raw contact maps are therefore in **`results/contact_maps/raw`** which contains the different maps in `txt` and `cool` formats, at various resolutions.
 Normalized contact maps are stored in **`results/contact_maps/norm`** which contains the different maps in `txt`, `cool`, and `mcool` format.
-The bin coordinates used for the various resolution are available in **`results/contact_maps/bins`**.
+The bin coordinates used for all resolutions are available in **`results/contact_maps/bins`**.
 
 Note that `txt` contact maps generated with `cooler` are identical to those generated by `HiC-Pro`.
-However, differences can be observed on the normalized contact maps as the balancing algorithm is not the same.
+However, differences can be observed on the normalized contact maps as the balancing algorithm is not exactly the same.
 
 ## Downstream analysis
-- 
GitLab