@@ -14,7 +14,7 @@ By visualizing the differences between matrices, we aim to detect patterns that
Here we compare HiC matrices (output of the pipeline) obtained with different parameters to see the effect of those parameters on the matrices and on the HiC contacts. When we checked the number of contacts depending on the alignment and filtering parameters of the pipeline, we observed some differences. We want to see more precisely which parameters impact the number of contacts and whether we can observe any particular pattern of contacts.
We are comparing HiC matrices (output of the pipeline) with different parameters and options to evaluate their effects on the matrices.
We are comparing HiC matrices (output of the pipeline) with different parameters and options to evaluate their effects on the matrices.
We have four alignment options :
- "Normal" is the basic alignment, with `bowtie2`,
...
...
@@ -28,9 +28,9 @@ And four filtering options :
- "filter_pcr" is for the hicstuff `--filter_pcr` option, which removes duplicated reads based on their start position,
- "filter_filterpcr" is when both of those filters are applied in a run.
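
To make the duplicate filtering more concrete, here is a minimal sketch of position-based deduplication: pairs whose two reads have exactly the same coordinates are treated as duplicates and only the first one is kept. This is not the hicstuff implementation; the function name `drop_exact_duplicates` and the tuple layout `(chrom1, pos1, strand1, chrom2, pos2, strand2)` are assumptions made for illustration only.

```python
def drop_exact_duplicates(pairs):
    """Keep only the first occurrence of each pair of read coordinates.

    Each pair is a tuple (chrom1, pos1, strand1, chrom2, pos2, strand2).
    This layout is hypothetical, chosen for the example.
    """
    seen = set()
    kept = []
    for pair in pairs:
        # Two pairs are duplicates only if every coordinate is identical.
        if pair not in seen:
            seen.add(pair)
            kept.append(pair)
    return kept


pairs = [
    ("chr3", 1000, "+", "chr3", 5000, "-"),
    ("chr3", 1000, "+", "chr3", 5000, "-"),  # exact duplicate of the first pair
    ("chr3", 1000, "+", "chr3", 5200, "-"),  # same first read, different second read
]
deduplicated = drop_exact_duplicates(pairs)
print(len(pairs), "->", len(deduplicated))
```

Because the criterion is exact coordinate identity, the removed pairs can fall anywhere in the matrix, which is consistent with not expecting a specific spatial pattern from this filter.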
Depending on the alignment and filtering parameters of the pipeline, we observed some differences on the total number of contacts kept by the pipeline (Figure 1). Overall the 'parasplit' option leads to more contacts for all filtering options.
Depending on the alignment and filtering parameters of the pipeline, we observed some differences in the total number of contacts kept by the pipeline (Figure 1). Overall, the `parasplit` option leads to more contacts for all filtering options.


Figure 1 : legend ?
## Comparison of alignment options
...
...
@@ -60,7 +60,7 @@ When we look in detail the difference between parasplit and cutsite, we observed
For the alignment, we seem to get more contacts with parasplit.
### Comparison of filtering options
## Comparison of filtering options
#### All chromosomes (4kb, normalized to 1):
...
...
@@ -76,13 +76,13 @@ No filter vs filter | No filter vs filter pcr | No filter vs bot
We can see a red line around the diagonal for the filtering, and seemingly random gains and losses of contacts for the duplicate filtering. The combination of both does not look different at first glance.
If we look at these matrices without normalization:
#### Chromosome 3 (1kb, **not** normalized)
#### Chromosome 3 (1kb, **NOT** normalized)
No filter vs filter | No filter vs filter pcr | No filter vs both
Without the normalization, we can see that the filter has an impact only on the diagonal, which makes sense because its targeting the loops.
The duplicate filtering removes contacts everywhere, we cannot see a pattern and give an explanation. It's probably due to the selection of duplicate, which is based on the start position of reads (see usage : *PCR duplicates will be filtered based on genomic positions pairs where both reads have exactly the same coordinates are considered duplicates and only one of those will be conserved.*)
Without the normalization, we can see that the filter has an impact only on the diagonal, which makes sense because it is targeting the loops.
The duplicate filtering removes contacts everywhere, and we cannot see a pattern. No obvious bias in the filtering is observed.
We can check that the filtering options have the same effect regardless of the mapping situation :
...
...
@@ -103,5 +103,5 @@ No filter vs filter (cutsite) | No filter vs filter pcr (cutsite) |
We still have the line on the diagonal for the filtering option, same as with the normal alignment. The color differs because it depends on which condition is placed as the first or second matrix in the formula, and the color intensity is higher for parasplit and cutsite because of their higher number of contacts.
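
The dependence of the color on matrix order can be illustrated with a small sketch. The comparison formula actually used for these figures is not shown in this section, so the code below assumes a simple element-wise difference of two matrices that were each normalized to sum to 1. The helper names and the toy 2x2 matrices are hypothetical.

```python
def normalize_to_one(matrix):
    """Scale a contact matrix so that all its entries sum to 1."""
    total = sum(sum(row) for row in matrix)
    return [[value / total for value in row] for row in matrix]


def difference(first, second):
    """Element-wise difference of two normalized matrices (assumed formula)."""
    a = normalize_to_one(first)
    b = normalize_to_one(second)
    return [
        [x - y for x, y in zip(row_a, row_b)]
        for row_a, row_b in zip(a, b)
    ]


# Toy matrices: the "filtered" one loses contacts on the diagonal only.
no_filter = [[8, 2], [2, 8]]
filtered = [[4, 2], [2, 4]]

forward = difference(no_filter, filtered)
backward = difference(filtered, no_filter)

# Swapping the order of the matrices flips the sign of every entry,
# which flips the color in a diverging colormap.
print(forward[0][0] > 0, backward[0][0] < 0)
```

Under this assumption, the pattern (here, the diagonal) stays the same whichever matrix comes first, and only the sign, and therefore the color, changes.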