Commit 273cba0e authored by nservant

[MODIF] update docs

parent f1633719

    @@ -109,8 +109,8 @@ can thus be discarded using the `--min_cis_dist` parameter.
     - `*.FiltPairs` - List of filtered pairs
     - `*RSstat` - Statitics of number of read pairs falling in each category
     Of note, these results are saved only if `--save_pairs_intermediates` is used.
    -The validPairs are stored using a simple tab-delimited text format ;
    +The `validPairs` are stored using a simple tab-delimited text format ;
     ```bash
     read name / chr_reads1 / pos_reads1 / strand_reads1 / chr_reads2 / pos_reads2 /

    @@ -131,9 +131,9 @@ recommanded as this pairs are likely to be self ligation products.
     #### Duplicates removal
    -Note that validPairs file are generated per reads chunck (and saved only if
    +Note that `validPairs` file are generated per reads chunck (and saved only if
     `--save_pairs_intermediates` is specified).
    -These files are then merged in the allValidPairs file, and duplicates are
    +These files are then merged in the `allValidPairs` file, and duplicates are
     removed (see `--keep_dups` to disable duplicates filtering).
     **Output directory: `results/hicpro/valid_pairs`**
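
Reading note on the hunk above: the merge-then-deduplicate step it documents can be sketched in a few lines of Python. This is only an illustration, not the pipeline's implementation. It assumes that two pairs count as duplicates when the chromosome and position of both mates are identical (columns 2, 3, 5 and 6 of the tab-delimited layout quoted in the first hunk). The function name `merge_and_dedup` and the sample records are hypothetical.

```python
import io


def merge_and_dedup(chunks):
    """Merge per-chunk validPairs streams and drop duplicated pairs.

    Assumption (not confirmed by the docs in this diff): a duplicate is a
    pair whose (chr_reads1, pos_reads1, chr_reads2, pos_reads2) was already seen.
    """
    seen = set()
    merged = []
    for chunk in chunks:
        for line in chunk:
            line = line.rstrip("\n")
            if not line:
                continue  # ignore blank lines
            fields = line.split("\t")
            key = (fields[1], fields[2], fields[4], fields[5])
            if key in seen:
                continue  # same coordinates as an earlier pair: skip it
            seen.add(key)
            merged.append(line)
    return merged


# Two hypothetical chunks; r1 and r2 share the same coordinates.
chunk_a = io.StringIO(
    "r1\tchr1\t100\t+\tchr1\t5000\n"
    "r2\tchr1\t100\t+\tchr1\t5000\n"
)
chunk_b = io.StringIO("r3\tchr1\t100\t+\tchr2\t700\n")

all_valid_pairs = merge_and_dedup([chunk_a, chunk_b])
print(len(all_valid_pairs))
```

Keeping every key in a set is simple but memory-hungry; for real data sizes, sorting the merged file by coordinates and comparing neighbouring lines would be the more usual design.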

    @@ -146,7 +146,7 @@ We usually expect to see a distribution centered around 300 pb which correspond
     to the paired-end insert size commonly used.
     The fraction of dplicates is also presented. A high level of duplication
     indicates a poor molecular complexity and a potential PCR bias.
    -Finaly, an important metric is to look at the fraction of intra and
    +Finally, an important metric is to look at the fraction of intra and
     inter-chromosomal interactions, as well as long range (>20kb) versus short
     range (<20kb) intra-chromosomal interactions.
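
Reading note on the hunk above: the intra/inter and long/short-range metric can be illustrated with a short Python sketch. It is a hedged example rather than the pipeline's own code. It reads only the six columns quoted in the first hunk (read name, chr_reads1, pos_reads1, strand_reads1, chr_reads2, pos_reads2), and the names `classify_pair` and `contact_fractions` are hypothetical. The text does not say how a distance of exactly 20 kb is classified; the sketch counts it as short range.

```python
from collections import Counter

LONG_RANGE_BP = 20_000  # 20 kb threshold quoted in the documentation

CATEGORIES = ("inter", "intra_long", "intra_short")


def classify_pair(line):
    """Return the contact category of one tab-delimited valid pair."""
    fields = line.rstrip("\n").split("\t")
    chrom1, pos1 = fields[1], int(fields[2])
    chrom2, pos2 = fields[4], int(fields[5])
    if chrom1 != chrom2:
        return "inter"
    # Assumption: a distance of exactly 20 kb is treated as short range.
    if abs(pos2 - pos1) > LONG_RANGE_BP:
        return "intra_long"
    return "intra_short"


def contact_fractions(lines):
    """Fraction of pairs in each category; empty dict if there are no pairs."""
    counts = Counter(classify_pair(line) for line in lines if line.strip())
    total = sum(counts.values())
    if total == 0:
        return {}
    return {name: counts[name] / total for name in CATEGORIES}


# Hypothetical records, for illustration only.
pairs = [
    "r1\tchr1\t100\t+\tchr1\t50000",
    "r2\tchr1\t100\t+\tchr1\t900",
    "r3\tchr1\t100\t+\tchr2\t900",
    "r4\tchr2\t10\t+\tchr2\t400",
]
fractions = contact_fractions(pairs)
print(fractions)
```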

    @@ -221,16 +221,16 @@ downstream analysis.
     ## Hi-C contact maps
     Contact maps are usually stored as simple txt (`HiC-Pro`), .hic (`Juicer/Juicebox`) and .(m)cool (`cooler/Higlass`) formats.
    -Note that .cool and .hic format are compressed and usually much more efficient that the txt format.
    +The .cool and .hic format are compressed and indexed and usually much more efficient that the txt format.
     In the current workflow, we propose to use the `cooler` format as a standard to build the raw and normalized maps
     after valid pairs detection as it is used by several downstream analysis and visualization tools.
     Raw contact maps are therefore in **`results/contact_maps/raw`** which contains the different maps in `txt` and `cool` formats, at various resolutions.
     Normalized contact maps are stored in **`results/contact_maps/norm`** which contains the different maps in `txt`, `cool`, and `mcool` format.
    -The bin coordinates used for the various resolution are available in **`results/contact_maps/bins`**.
    +The bin coordinates used for all resolutions are available in **`results/contact_maps/bins`**.
     Note that `txt` contact maps generated with `cooler` are identical to those generated by `HiC-Pro`.
    -However, differences can be observed on the normalized contact maps as the balancing algorithm is not the same.
    +However, differences can be observed on the normalized contact maps as the balancing algorithm is not exactly the same.
     ## Downstream analysis
......