From 59e7c332f8effd0f5baecb24ed4bdef029eea2d8 Mon Sep 17 00:00:00 2001
From: GD <gd.dev@libertymail.net>
Date: Fri, 28 Oct 2022 11:29:50 +0200
Subject: [PATCH] highlight questions to be treated in the report

---
 Practical_c.Rmd | 20 ++++++++++----------
 1 file changed, 10 insertions(+), 10 deletions(-)

diff --git a/Practical_c.Rmd b/Practical_c.Rmd
index c1116eb..1871056 100644
--- a/Practical_c.Rmd
+++ b/Practical_c.Rmd
@@ -796,7 +796,7 @@ ggplot(yeast_av_data, aes(factor(YDL200C_427), A101)) + geom_boxplot() +
 </p>
 </details>
 
-<div class="pencadre">
+<div class="red_pencadre">
 How to account for the cell cycle in the previous representations? Is it important?
 </div>
 
@@ -1095,7 +1095,7 @@ anova(lm(A101 ~ factor(YDL200C_427), data = yeast_av_subdata))
 
 </details>
 
-<div class="pencadre">
+<div class="red_pencadre">
 What should we check before interpreting the results of the ANOVA?
 </div>
 
@@ -1214,14 +1214,14 @@ Then $\mu$ and $B$ are estimated by a **least square linear regression** (see [h
 
 </details>
 
-<div class="pencadre">
+<div class="red_pencadre">
 **Interpretation of the ANOVA results:** do you find a significant effect of the respective SNPs on the considered morphological trait? Comment the results in relation with your intuition from the analysis of the descriptive statistics [above](#data-description)?
 </div>
 
 <details><summary>Solution</summary>
 <p>
 
-After verifying the normality and homoskedasticity of the residuals (**if either is not verified, we cannot use the results from the ANOVA significance test because it assumes a Gaussian model**), we find a significant effect of SNP `YAL069W_1` and a non-significant effect of SNP `YDL200C_427` onto the morphological trait `A101` (when focusing on the `C` cell cycle phase), which confirms our intuition from the graphical representation.
+After verifying the normality and homoskedasticity of the residuals (**if it is not verified, we cannot use the results from the ANOVA significance test because it assumes a Gaussian model**), we find a significant effect of SNP `YAL069W_1` and a non-significant effect of SNP `YDL200C_427` onto the morphological trait `A101` (when focusing on the `C` cell cycle phase), which confirms our intuition from the graphical representation.
 
 ---
 
@@ -1593,7 +1593,7 @@ The exploration of the SNP data by linear dimension reduction approaches did not
 
 To do so, we are going to run an ANOVA for each SNP.
 
-<div class="pencadre">
+<div class="red_pencadre">
 Given the number of SNPs in the data (i.e. `r nrow(gt_data)`), what could be the risk when running such an analysis?
 </div>
 
@@ -1602,7 +1602,7 @@ Given the number of SNPs in the data (i.e. `r nrow(gt_data)`), what could be the
 
 We are going to do thousands of tests, computing and using thousands of p-values to assess the potential significant effect of each SNP on the considered morphological trait.
 
-We have a non-negligible risk to wrong reject the null hypothesis for many of the SNPs, and conclude to a non-existing significant effect.
+We have a non-negligible risk to wrongly reject the null hypothesis for many of the SNPs, and conclude to a non-existing significant effect.
 
 Thus, we have to use p-values correction (or adjustment) procedure adapted to the case of multiple testing.
 
@@ -1660,7 +1660,7 @@ ggplot(test_result) + geom_point(aes(x=SNP_index, y=p_values)) +
 
 
 
-<div class="pencadre">
+<div class="red_pencadre">
 What can you say about these results?
 </div>
 
@@ -1720,7 +1720,7 @@ ggplot(
 
 ```
 
-<div class="pencadre">
+<div class="red_pencadre">
 What can you say about the different corrections?
 </div>
 
@@ -1791,7 +1791,7 @@ test_result %>%
 
 </details>
 
-<div class="pencadre">
+<div class="red_pencadre">
 What can we do with these results?
 
 
@@ -1816,7 +1816,7 @@ What can we do with these results?
 
 ## Full data analysis
 
-<div class="pencadre">
+<div class="red_pencadre">
 Open (and optional) question: run the previous analysis to find SNPs with significant effect on the morphological trait `A101` with the full dataset `yeast_data`, i.e. without the average by strain for the morphological traits. In this case, you will have to account for the `strain_id` factor in the ANOVA.
 
 
-- 
GitLab