# Benchmark

## What is compared

Here we compare HiC matrices (the output of the pipeline) produced with different parameters, to see the effect of those parameters on the matrices, i.e. on the HiC contacts.

When we checked the number of contacts obtained with different alignment and filtering parameters of the pipeline, we observed some differences (see plot below). We want to see more precisely which parameters impact the number of contacts, and whether any particular contact pattern appears.

## Comparisons

We only compare runs of the hicstuff workflow. Each matrix figure is the difference between two conditions, computed as **log<sub>2</sub>(matrix1 / matrix2)**.

### Alignment

The pipeline offers several alignment options:
- "Normal" is the basic alignment, with `bowtie2`,
- "Iterative" is the iterative alignment from hicstuff, with option `--iterative`,
- "Cutsite" is the normal alignment with read preprocessing from hicstuff, with option `--cutsite`,
- "Parasplit" also preprocesses the reads, but with the new module we made, with option `--parasplit`.

Here the matrices have a **4kb** resolution, are **normalized to 1** (each matrix is divided by its total number of reads) and have their **diagonal set to 0**.
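The processing described above (normalization to 1, diagonal set to 0, then log<sub>2</sub>(matrix1 / matrix2)) can be sketched in plain Python on toy nested lists. This is a minimal illustration, not the code used to produce the figures: the function names (`normalize`, `zero_diagonal`, `log2_ratio`, `benchmark_diff`) are hypothetical, and the order of the steps (normalizing the full matrix, then zeroing the diagonal) is an assumption. Bins that are empty in either matrix, including the zeroed diagonal, are returned as NaN because their ratio is undefined.

```python
import math

Matrix = list[list[float]]


def normalize(matrix: Matrix) -> Matrix:
    """Divide every bin by the matrix total so that all bins sum to 1."""
    total = sum(sum(row) for row in matrix)
    return [[value / total for value in row] for row in matrix]


def zero_diagonal(matrix: Matrix) -> Matrix:
    """Return a copy of the matrix with the main diagonal set to 0."""
    return [
        [0.0 if i == j else value for j, value in enumerate(row)]
        for i, row in enumerate(matrix)
    ]


def log2_ratio(matrix1: Matrix, matrix2: Matrix) -> Matrix:
    """Bin-wise log2(matrix1 / matrix2); bins that are 0 in either matrix become NaN."""
    return [
        [math.log2(a / b) if a > 0 and b > 0 else math.nan for a, b in zip(row1, row2)]
        for row1, row2 in zip(matrix1, matrix2)
    ]


def benchmark_diff(matrix1: Matrix, matrix2: Matrix) -> Matrix:
    """Hypothetical helper: normalize both matrices to 1, zero their diagonals, take the log2 ratio."""
    return log2_ratio(
        zero_diagonal(normalize(matrix1)),
        zero_diagonal(normalize(matrix2)),
    )


# Toy 2x2 "contact matrices" (real ones are binned at 4kb or 1kb).
condition_a = [[2.0, 1.0], [1.0, 4.0]]
condition_b = [[1.0, 1.0], [1.0, 1.0]]
diff = benchmark_diff(condition_a, condition_b)
```

A negative value means that the bin holds a smaller fraction of the contacts in the first condition than in the second.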

Normal vs iterative | Normal vs parasplit | Normal vs cutsite
:-------------------------:|:-------------------------:|:-------------------------:
 |  | 

Zooming in on a single chromosome at a **resolution of 1kb** shows the details better:

Normal vs iterative | Normal vs parasplit | Normal vs cutsite
:-------------------------:|:-------------------------:|:-------------------------:
 |  | 

The iterative alignment gives more reads overall but fewer around the diagonal, whereas cutsite and parasplit give more contacts, especially around the diagonal.

Parasplit vs cutsite | Chr3 | Chr3 not normalized
:-------------------------:|:------------------------:|:------------------------:
 |  | 

Looking in detail at the difference between parasplit and cutsite, we observe that parasplit gives more contacts, which is obvious in the diff of the non-normalized matrices (right).

For the alignment, parasplit seems to give more contacts.

### Filtering options

The filtering options can be combined in several ways:
- "noFilter": no filtering option is applied,
- "filter": hicstuff `--filter_event` option, which filters out "weird", "loop" and "uncut" reads,
- "filter_pcr": hicstuff `--filter_pcr` option, which removes duplicated reads based on their start position,
- "filter_filterpcr": both filters are applied in the same run.
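The `filter_pcr` rule (two read pairs are duplicates when both reads have exactly the same coordinates, and only one copy is kept) can be illustrated with a minimal sketch. `ReadPair` and `filter_pcr_duplicates` are hypothetical names, not hicstuff's API, and the set-based lookup is simply the most direct way to express the rule.

```python
from typing import Iterable, Iterator, NamedTuple


class ReadPair(NamedTuple):
    """Hypothetical minimal record: the mapped position of each read of a pair."""
    chrom1: str
    pos1: int
    chrom2: str
    pos2: int


def filter_pcr_duplicates(pairs: Iterable[ReadPair]) -> Iterator[ReadPair]:
    """Yield each read pair once; later pairs with identical coordinates are dropped."""
    seen: set[tuple[str, int, str, int]] = set()
    for pair in pairs:
        key = (pair.chrom1, pair.pos1, pair.chrom2, pair.pos2)
        if key in seen:
            continue  # both reads at the same coordinates: treated as a PCR duplicate
        seen.add(key)
        yield pair


pairs = [
    ReadPair("chr3", 1200, "chr3", 5400),
    ReadPair("chr3", 1200, "chr3", 5400),  # exact duplicate, removed
    ReadPair("chr3", 1201, "chr3", 5400),  # shifted by 1 bp, kept
    ReadPair("chr1", 800, "chr5", 90000),
]
kept = list(filter_pcr_duplicates(pairs))
```

Because only exact matches count, a pair shifted by a single base pair is kept, and duplicates can be removed anywhere in the matrix rather than in one region. A tool processing large files would more likely compare consecutive records of a coordinate-sorted pairs file than keep every key in memory.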

We now compare the normal alignment without filtering against each of the other filtering combinations:

No filter vs filter | No filter vs filter pcr | No filter vs both
:-------------------------:|:-------------------------:|:-------------------------:
 |  | 

No filter vs filter | No filter vs filter pcr | No filter vs both
:-------------------------:|:-------------------------:|:-------------------------:
 |  | 

The event filter produces a red line around the diagonal, while duplicate filtering adds and removes contacts seemingly at random. The combination of both shows no obvious difference at first glance.
Looking at the same comparisons without normalization:

No filter vs filter not normalized | No filter vs filter pcr not normalized | No filter vs both not normalized
:-------------------------:|:-------------------------:|:-------------------------:
 |  | 

Without normalization, the event filter only affects the diagonal, which makes sense because it targets loops.
Duplicate filtering removes contacts everywhere, without any visible pattern.
This is probably due to how duplicates are selected, based on the start positions of the reads (see usage: *PCR duplicates will be filtered based on genomic positions pairs where both reads have exactly the same coordinates are considered duplicates and only one of those will be conserved.*)

We can check that the filtering options have the same effect regardless of the alignment mode:

No filter vs filter (iterative) | No filter vs filter pcr (iterative) | No filter vs both (iterative)
:-------------------------:|:-------------------------:|:-------------------------:
 |  | 

No filter vs filter (parasplit) | No filter vs filter pcr (parasplit) | No filter vs both (parasplit)
:-------------------------:|:-------------------------:|:-------------------------:
 |  | 

No filter vs filter (cutsite) | No filter vs filter pcr (cutsite) | No filter vs both (cutsite)
:-------------------------:|:-------------------------:|:-------------------------:
 |  | 

The line on the diagonal is still visible for the event filter, as with the normal alignment. Its color only depends on which condition is placed as the first or the second matrix in the formula, and its intensity is higher for parasplit and cutsite because of their higher number of contacts.
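Both effects follow from the formula itself. Swapping the two conditions flips the sign, since log<sub>2</sub>(a/b) = -log<sub>2</sub>(b/a), which changes the color but not the magnitude. Normalizing to 1 subtracts the constant log<sub>2</sub>(total<sub>1</sub> / total<sub>2</sub>) from every bin of the raw diff, so a global difference in the number of contacts shows up in the non-normalized diff and cancels out after normalization. The toy sketch below checks both points; the matrices and function names are hypothetical, and the uniform 2x difference is chosen only to make the arithmetic exact.

```python
import math

Matrix = list[list[float]]


def normalize(matrix: Matrix) -> Matrix:
    """Divide every bin by the matrix total so that all bins sum to 1."""
    total = sum(sum(row) for row in matrix)
    return [[value / total for value in row] for row in matrix]


def log2_ratio(matrix1: Matrix, matrix2: Matrix) -> Matrix:
    """Bin-wise log2(matrix1 / matrix2) for matrices without empty bins."""
    return [
        [math.log2(a / b) for a, b in zip(row1, row2)]
        for row1, row2 in zip(matrix1, matrix2)
    ]


# Hypothetical runs: run A has exactly twice the contacts of run B in every bin.
run_b = [[4.0, 2.0], [2.0, 8.0]]
run_a = [[2 * value for value in row] for row in run_b]

raw = log2_ratio(run_a, run_b)          # constant log2(2) = 1 in every bin
swapped = log2_ratio(run_b, run_a)      # same magnitude, opposite sign
normalized = log2_ratio(normalize(run_a), normalize(run_b))  # global difference cancels
```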