Commit d3444619 authored by fmortreu's avatar fmortreu
franck's comments

parent ea82861f
# Benchmark
## Aim
Hi-C experiments generate contact matrices that show how different regions of the genome interact in 3D space. However, the way we process Hi-C data, including how we align the reads and apply filtering, can significantly impact the results.
## What is compared
Here we compare Hi-C matrices (the output of the pipeline) obtained with different parameters, to see the effect of those parameters on the matrices and on the Hi-C contacts.
When we checked the number of contacts depending on the alignment and filtering parameters of the pipeline, we observed some differences (see plot below). We want to see more precisely which parameters impact the number of contacts and whether a particular contact pattern emerges.
This analysis compares Hi-C matrices obtained with different alignment and filtering parameters to:
- Understand how these parameters influence the number and distribution of Hi-C contacts.
- Identify potential biases introduced by different processing methods.
- Find the best settings to maximize meaningful biological information.
![](images/contacts_points.png)
By visualizing the differences between matrices, we aim to detect patterns that might help optimize Hi-C workflows.
## Comparisons
## What is compared
We compare only the hicstuff workflow. Each matrix figure shows the difference between two conditions, computed as **log<sub>2</sub>(matrix1/matrix2)**.
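As an illustration of the log<sub>2</sub>(matrix1/matrix2) formula, here is a minimal sketch using NumPy on dense arrays. It is not the code used to produce the figures: the function name `log2_ratio` and the choice to mark bins that are empty in either matrix as NaN are assumptions made for this example.

```python
import numpy as np

def log2_ratio(matrix1, matrix2):
    """Element-wise log2(matrix1 / matrix2) for two same-shape contact matrices.

    Assumption: bins with no contact in either matrix are set to NaN,
    since the ratio is undefined (or infinite) there.
    """
    m1 = np.asarray(matrix1, dtype=float)
    m2 = np.asarray(matrix2, dtype=float)
    if m1.shape != m2.shape:
        raise ValueError("both matrices must have the same shape")
    ratio = np.full(m1.shape, np.nan)
    valid = (m1 > 0) & (m2 > 0)
    ratio[valid] = np.log2(m1[valid] / m2[valid])
    return ratio

# Toy 2x2 example (made-up values, not real Hi-C data)
a = np.array([[2.0, 4.0], [0.0, 1.0]])
b = np.array([[1.0, 1.0], [1.0, 0.0]])
ratio = log2_ratio(a, b)
```

Positive values mean more contacts in matrix1, negative values more contacts in matrix2, so swapping the two conditions only flips the sign (and therefore the color) of the plot.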
Here we compare Hi-C matrices (the output of the pipeline) obtained with different parameters, to see the effect of those parameters on the matrices and on the Hi-C contacts. When we checked the number of contacts depending on the alignment and filtering parameters of the pipeline, we observed some differences. We want to see more precisely which parameters impact the number of contacts and whether a particular contact pattern emerges.
### Alignment
We are comparing HiC matrices (output of the pipeline) with different parameters and options to evaluate their effects on the matrices.
We have different alignment options in the pipeline :
We have four alignment options :
- "Normal" is the basic alignment, with `bowtie2`,
- "Iterative" is the iterative alignment from hicstuff with option `--iterative`,
- "Cutsite" is the normal alignment with read preprocessing from hicstuff, with option `--cutsite`,
- "Parasplit" also uses preprocessing, but with the new module we made, with option `--parasplit`.
And four filtering options :
- "noFilter" is for when no filtering options are applied,
- "filter" is for the hicstuff `--filter_event` option, which filters "weird, loop and uncut reads",
- "filter_pcr" is for the hicstuff `--filter_pcr` option, which removes duplicated reads based on their start position,
- "filter_filterpcr" is when both of those filters are applied in a run.
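To make the idea behind PCR duplicate filtering concrete, here is a minimal sketch of removing read pairs that share the same start positions. It is not hicstuff's implementation: the function name `drop_duplicate_pairs` and the tuple layout `(chrom1, start1, strand1, chrom2, start2, strand2)` are assumptions made for this example.

```python
def drop_duplicate_pairs(pairs):
    """Keep the first occurrence of each read pair, dropping exact repeats.

    Assumption: each pair is a tuple
    (chrom1, start1, strand1, chrom2, start2, strand2),
    and two pairs are duplicates when all six fields match.
    """
    seen = set()
    kept = []
    for pair in pairs:
        key = tuple(pair)
        if key not in seen:
            seen.add(key)
            kept.append(pair)
    return kept

# Toy example (made-up coordinates)
pairs = [
    ("chr1", 100, "+", "chr1", 5000, "-"),
    ("chr1", 100, "+", "chr1", 5000, "-"),  # same starts and strands: duplicate
    ("chr1", 100, "+", "chr2", 5000, "-"),  # different mate chromosome: kept
]
unique_pairs = drop_duplicate_pairs(pairs)
```

Because duplicates are collapsed to a single pair, this kind of filter can only lower the total number of contacts.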
Depending on the alignment and filtering parameters of the pipeline, we observed some differences in the total number of contacts kept by the pipeline (Figure 1). Overall, the 'parasplit' option leads to more contacts for all filtering options.
![](images/contacts_points.png)
Figure 1 : legend ?
## Comparison of alignment options
Here the matrices are at **4 kb** resolution, **normalized to 1** (by the total number of reads), and with the **diagonal set to 0**.
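The preprocessing described here can be sketched as follows with NumPy. This is only an illustration under stated assumptions: the function name `normalize_and_mask_diagonal` is hypothetical, the matrix is assumed to be a dense array already binned at the chosen resolution, and the normalization is assumed to happen before the diagonal is set to 0.

```python
import numpy as np

def normalize_and_mask_diagonal(matrix):
    """Scale a contact matrix so its entries sum to 1, then zero the diagonal.

    Assumption: normalization is applied first, so the diagonal contacts
    still count toward the total used for scaling.
    """
    m = np.asarray(matrix, dtype=float).copy()  # leave the input untouched
    total = m.sum()
    if total > 0:
        m /= total
    np.fill_diagonal(m, 0.0)
    return m

# Toy 2x2 example (made-up counts)
counts = np.array([[2, 1], [1, 4]])
prepared = normalize_and_mask_diagonal(counts)
```

Normalizing both matrices to the same total keeps the log<sub>2</sub> ratio from being dominated by differences in sequencing depth or in the overall number of contacts kept.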
#### All chromosomes (4kb, normalized to 1):
......@@ -45,15 +60,8 @@ When we look in detail the difference between parasplit and cutsite, we observed
For the alignment, we seem to get more contacts with parasplit.
### Filtering options
### Comparison of filtering options
We have different combinations possible for filtering options :
- "noFilter" is for when no filtering options are applied,
- "filter" is for the hicstuff `--filter_event` option, which filters "weird, loop and uncut reads",
- "filter_pcr" is for the hicstuff `--filter_pcr` option, which removes duplicated reads based on their start position,
- "filter_filterpcr" is when both of those filters are applied in a run.
We look at the matrix differences for normal alignment, comparing no filtering against each of the other possibilities:
#### All chromosomes (4kb, normalized to 1):
No filter vs filter | No filter vs filter pcr | No filter vs both
......@@ -94,3 +102,6 @@ No filter vs filter (cutsite) | No filter vs filter pcr (cutsite) |
![](images/chr3_109.png) | ![](images/chr3_14.png) | ![](images/chr3_64.png)
We still see the line on the diagonal for the filtering options, as with normal alignment. The color only depends on which condition is placed first or second in the formula, and the color intensity is higher for parasplit and cutsite because of their higher number of contacts.
### Conclusion
balblaba
\ No newline at end of file