Verified Commit d7e9372f authored by Mia Croiset

Add benchmark README + figures

parent cb6f4067
Showing with 89 additions and 0 deletions
# Benchmark
## What is compared
Here we compare HiC matrices (the output of the pipeline) generated with different parameters, to see the effect of those parameters on the matrices and on the HiC contacts.
When we checked the number of contacts as a function of the alignment and filtering parameters of the pipeline, we observed some differences (see plot below). We want to see more precisely which parameters impact the number of contacts, and whether we can observe any particular contact pattern.
![](images/contacts_points.png)
## Comparisons
We compare only the hicstuff workflow. Each matrix figure is the difference between 2 conditions, computed as **log<sub>2</sub>(matrix1/matrix2)**.
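
As an illustration of the formula above, here is a minimal sketch of an element-wise log<sub>2</sub> ratio between two dense matrices with NumPy. The function name `log2_ratio` and the toy matrices are hypothetical, and how empty bins were handled for the actual figures is not stated here: masking them as `NaN` is an assumption.

```python
import numpy as np

def log2_ratio(matrix1, matrix2):
    """Element-wise log2(matrix1 / matrix2).

    Bins that are empty in either matrix are returned as NaN
    (an assumption, to avoid dividing by zero or taking log2(0)).
    """
    m1 = np.asarray(matrix1, dtype=float)
    m2 = np.asarray(matrix2, dtype=float)
    ratio = np.full(m1.shape, np.nan)
    valid = (m1 > 0) & (m2 > 0)
    ratio[valid] = np.log2(m1[valid] / m2[valid])
    return ratio

# Toy 2x2 matrices, not real data.
a = np.array([[0.0, 4.0], [4.0, 2.0]])
b = np.array([[0.0, 2.0], [2.0, 2.0]])
diff = log2_ratio(a, b)
```

A positive value means more contacts in `matrix1`, a negative value more contacts in `matrix2`, and 0 means no difference.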
### Alignment
We have different alignment options in the pipeline:
- "Normal" is the basic alignment, with `bowtie2`,
- "Iterative" is the iterative alignment from hicstuff with option `--iterative`,
- "Cutsite" is the normal alignment with read preprocessing from hicstuff, with option `--cutsite`,
- "Parasplit" is also with preprocessing but with the new module we made, with option `--parasplit`.
Here the matrices have a **4kb** resolution, are **normalized to 1** (divided by the total number of reads) and have their **diagonal set to 0**.
Normal vs iterative | Normal vs parasplit | Normal vs cutsite
:-------------------------:|:-------------------------:|:-------------------------:
![](images/17.png) | ![](images/24.png) | ![](images/28.png)
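
The preparation described above (normalization to 1, diagonal set to 0) could be sketched as follows. This is not the code used for the figures: the function name `prepare_matrix` is hypothetical, the input is a toy dense count matrix, and applying the normalization before zeroing the diagonal is an assumption.

```python
import numpy as np

def prepare_matrix(counts):
    """Divide a contact matrix by its total, then set its diagonal to 0."""
    m = np.array(counts, dtype=float)  # copy, so the input is left untouched
    total = m.sum()
    if total > 0:
        m /= total              # the whole matrix now sums to 1
    np.fill_diagonal(m, 0.0)    # hide the (usually dominant) diagonal
    return m

# Toy counts, not real data.
counts = [[10, 2], [2, 6]]
prepared = prepare_matrix(counts)
```
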
We can zoom in on one chromosome (chr3), at a **resolution of 1kb**, to better see the details:
Normal vs iterative | Normal vs parasplit | Normal vs cutsite
:-------------------------:|:-------------------------:|:-------------------------:
![](images/chr3_17.png) | ![](images/chr3_24.png) | ![](images/chr3_28.png)
The iterative alignment gives more reads in general but fewer around the diagonal, in contrast to cutsite and parasplit, which give more contacts, especially around the diagonal.
Parasplit vs cutsite | Chr3 | Chr3 not normalized
:-------------------------:|:------------------------:|:------------------------:
![](images/104.png) | ![](images/chr3_104.png) | ![](images/chr3_104_noNorm.png)
When we look in detail at the difference between parasplit and cutsite, we observe that parasplit gives more contacts, which is obvious in the diff of the non-normalized matrices (right).
For the alignment, we seem to get more contacts with parasplit.
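
To see why a global gain in contacts shows up mainly in the non-normalized diff, here is a small sketch with made-up numbers (not taken from the benchmark): when one condition has twice as many contacts everywhere, the raw log<sub>2</sub> ratio is 1, but it becomes 0 once each matrix is divided by its own total.

```python
import numpy as np

# Toy matrices: raw_a has exactly twice the contacts of raw_b in every bin.
raw_a = np.array([[0.0, 8.0], [8.0, 0.0]])
raw_b = raw_a / 2

# Ratio of one off-diagonal bin, without normalization.
raw_diff = np.log2(raw_a[0, 1] / raw_b[0, 1])

# Same bin after dividing each matrix by its own total.
norm_a = raw_a / raw_a.sum()
norm_b = raw_b / raw_b.sum()
norm_diff = np.log2(norm_a[0, 1] / norm_b[0, 1])
```

Here `raw_diff` is 1 while `norm_diff` is 0: normalization removes a uniform change in depth and only keeps changes in how the contacts are distributed.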
### Filtering options
Several combinations of filtering options are possible:
- "noFilter" is for when no filtering options are applied,
- "filter" is for the hicstuff `--filter_event` option, which filters "weird, loop and uncut reads",
- "filter_pcr" is for the hicstuff `--filter_pcr` option, which removes duplicated reads based on their start position,
- "filter_filterpcr" is when both of those filters are applied in a run.
We look at the matrix diffs for the normal alignment, comparing no filtering against each of the other possibilities:
No filter vs filter | No filter vs filter pcr | No filter vs both
:-------------------------:|:-------------------------:|:-------------------------:
![](images/21.png) | ![](images/30.png) | ![](images/18.png)
No filter vs filter (chr3) | No filter vs filter pcr (chr3) | No filter vs both (chr3)
:-------------------------:|:-------------------------:|:-------------------------:
![](images/chr3_21.png) | ![](images/chr3_30.png) | ![](images/chr3_18.png)
We can see a red line around the diagonal for the event filtering, and a seemingly random gain and loss of contacts for the duplicate filtering. The combination of both does not look different at first glance.
If we look at these matrices without normalization:
No filter vs filter not normalized | No filter vs filter pcr not normalized | No filter vs both not normalized
:-------------------------:|:-------------------------:|:-------------------------:
![](images/chr3_21_noNorm.png) | ![](images/chr3_30_noNorm.png) | ![](images/chr3_18_noNorm.png)
Without normalization, we can see that the filter has an impact only on the diagonal, which makes sense because it targets the loops.
The duplicate filtering removes contacts everywhere; we cannot see a pattern or give an explanation. It is probably due to the way duplicates are selected, which is based on the start position of reads (see usage: *PCR duplicates will be filtered based on genomic positions. Pairs where both reads have exactly the same coordinates are considered duplicates and only one of those will be conserved.*)
We can check that the filtering options have the same effect regardless of the alignment mode:
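
The rule quoted above can be illustrated with a minimal sketch. This is not hicstuff's implementation: the function name `drop_duplicate_pairs` and the tuple layout `(chrom1, pos1, chrom2, pos2)` are hypothetical, and whether strand is also taken into account is not covered here.

```python
def drop_duplicate_pairs(pairs):
    """Keep only the first pair seen for each set of coordinates.

    Each pair is a tuple (chrom1, pos1, chrom2, pos2); two pairs are
    treated as duplicates when all four values are identical.
    """
    seen = set()
    kept = []
    for pair in pairs:
        if pair not in seen:
            seen.add(pair)
            kept.append(pair)
    return kept

# Toy pairs, not real data: the first two are exact duplicates.
pairs = [
    ("chr3", 1000, "chr3", 5000),
    ("chr3", 1000, "chr3", 5000),
    ("chr3", 1001, "chr3", 5000),
]
kept = drop_duplicate_pairs(pairs)
```

Because the test is exact coordinate identity, a pair shifted by a single base (third entry) is kept, and which contacts are removed does not depend on their distance from the diagonal.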
No filter vs filter (iterative) | No filter vs filter pcr (iterative) | No filter vs both (iterative)
:-------------------------:|:-------------------------:|:-------------------------:
![](images/chr3_42.png) | ![](images/chr3_36.png) | ![](images/chr3_40.png)
No filter vs filter (parasplit) | No filter vs filter pcr (parasplit) | No filter vs both (parasplit)
:-------------------------:|:-------------------------:|:-------------------------:
![](images/chr3_102.png) | ![](images/chr3_87.png) | ![](images/chr3_70.png)
No filter vs filter (cutsite) | No filter vs filter pcr (cutsite) | No filter vs both (cutsite)
:-------------------------:|:-------------------------:|:-------------------------:
![](images/chr3_109.png) | ![](images/chr3_14.png) | ![](images/chr3_64.png)
We still see the line on the diagonal for the filtering option, as with the normal alignment. Two things vary: the color, which depends on which condition is placed first or second in the formula, and the color intensity, which is higher for parasplit and cutsite because of their higher number of contacts.
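
The color inversion follows directly from the formula: swapping the two conditions flips the sign of every bin, since log<sub>2</sub>(A/B) = -log<sub>2</sub>(B/A). A short check on toy matrices (the values are made up):

```python
import numpy as np

# Toy matrices, not real data.
cond_a = np.array([[1.0, 8.0], [8.0, 1.0]])
cond_b = np.array([[1.0, 2.0], [2.0, 1.0]])

forward = np.log2(cond_a / cond_b)   # condition A first
backward = np.log2(cond_b / cond_a)  # condition B first

# Same magnitude in every bin, opposite sign.
same_up_to_sign = np.allclose(forward, -backward)
```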