Commit 273cba0e authored by nservant

[MODIF] update docs

parent f1633719
@@ -109,8 +109,8 @@ can thus be discarded using the `--min_cis_dist` parameter.
- `*.FiltPairs` - List of filtered pairs
- `*RSstat` - Statistics of the number of read pairs falling in each category
-Of note, these results are saved only if `--save_pairs_intermediates` is used.
-The validPairs are stored using a simple tab-delimited text format ;
+Of note, these results are saved only if `--save_pairs_intermediates` is used.
+The `validPairs` are stored using a simple tab-delimited text format ;
```bash
read name / chr_reads1 / pos_reads1 / strand_reads1 / chr_reads2 / pos_reads2 /
@@ -131,9 +131,9 @@ recommanded as this pairs are likely to be self ligation products.
#### Duplicates removal
-Note that validPairs files are generated per read chunk (and saved only if
+Note that `validPairs` files are generated per read chunk (and saved only if
`--save_pairs_intermediates` is specified).
-These files are then merged in the allValidPairs file, and duplicates are
+These files are then merged in the `allValidPairs` file, and duplicates are
removed (see `--keep_dups` to disable duplicates filtering).
**Output directory: `results/hicpro/valid_pairs`**
@@ -146,7 +146,7 @@ We usually expect to see a distribution centered around 300 pb which correspond
to the paired-end insert size commonly used.
The fraction of duplicates is also presented. A high level of duplication
indicates a poor molecular complexity and a potential PCR bias.
-Finaly, an important metric is to look at the fraction of intra and
+Finally, an important metric is to look at the fraction of intra and
inter-chromosomal interactions, as well as long range (>20kb) versus short
range (<20kb) intra-chromosomal interactions.
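
As an illustration of the metric described above, here is a minimal Python sketch that classifies read pairs from a `validPairs`-style file into inter-chromosomal, long-range intra-chromosomal and short-range intra-chromosomal contacts. It relies only on the first six tab-delimited columns listed earlier (read name, chr_reads1, pos_reads1, strand_reads1, chr_reads2, pos_reads2). The function names are hypothetical, and treating a distance of exactly 20 kb as short range is an assumption; this is not how the pipeline itself computes its statistics.

```python
from collections import Counter

# Threshold quoted in the text: long range is > 20 kb, short range is < 20 kb.
# Assumption: a distance of exactly 20 kb is counted as short range.
LONG_RANGE_BP = 20_000


def classify_pair(line):
    """Return 'inter', 'intra_long' or 'intra_short' for one validPairs line."""
    fields = line.rstrip("\n").split("\t")
    # Columns used: chr_reads1 (1), pos_reads1 (2), chr_reads2 (4), pos_reads2 (5)
    chrom1, pos1 = fields[1], int(fields[2])
    chrom2, pos2 = fields[4], int(fields[5])
    if chrom1 != chrom2:
        return "inter"
    if abs(pos2 - pos1) > LONG_RANGE_BP:
        return "intra_long"
    return "intra_short"


def contact_fractions(lines):
    """Compute the fraction of pairs in each category; empty input gives {}."""
    counts = Counter(classify_pair(line) for line in lines if line.strip())
    total = sum(counts.values())
    if total == 0:
        return {}
    return {key: counts[key] / total
            for key in ("inter", "intra_long", "intra_short")}
```

The function accepts any iterable of lines, so an open file handle on an `allValidPairs` file can be passed directly, for example `contact_fractions(open(path))`, where `path` is a placeholder for your own file.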
@@ -221,16 +221,16 @@ downstream analysis.
## Hi-C contact maps
Contact maps are usually stored as simple txt (`HiC-Pro`), .hic (`Juicer/Juicebox`) and .(m)cool (`cooler/Higlass`) formats.
-Note that .cool and .hic formats are compressed and usually much more efficient than the txt format.
+The .cool and .hic formats are compressed and indexed and usually much more efficient than the txt format.
In the current workflow, we propose to use the `cooler` format as a standard to build the raw and normalized maps
after valid pairs detection as it is used by several downstream analysis and visualization tools.
Raw contact maps are therefore in **`results/contact_maps/raw`** which contains the different maps in `txt` and `cool` formats, at various resolutions.
Normalized contact maps are stored in **`results/contact_maps/norm`** which contains the different maps in `txt`, `cool`, and `mcool` format.
-The bin coordinates used for the various resolution are available in **`results/contact_maps/bins`**.
+The bin coordinates used for all resolutions are available in **`results/contact_maps/bins`**.
Note that `txt` contact maps generated with `cooler` are identical to those generated by `HiC-Pro`.
-However, differences can be observed on the normalized contact maps as the balancing algorithm is not the same.
+However, differences can be observed on the normalized contact maps as the balancing algorithm is not exactly the same.
## Downstream analysis