How to account for the cell cycle in the previous representations? Is it important?
How to account for the cell cycle in the previous representations? Is it important?
</div>
</div>
...
@@ -1095,7 +1095,7 @@ anova(lm(A101 ~ factor(YDL200C_427), data = yeast_av_subdata))
...
@@ -1095,7 +1095,7 @@ anova(lm(A101 ~ factor(YDL200C_427), data = yeast_av_subdata))
</details>
</details>
<div class="pencadre">
<div class="red_pencadre">
What should we check before interpreting the results of the ANOVA?
What should we check before interpreting the results of the ANOVA?
</div>
</div>
...
@@ -1214,14 +1214,14 @@ Then $\mu$ and $B$ are estimated by a **least square linear regression** (see [h
...
@@ -1214,14 +1214,14 @@ Then $\mu$ and $B$ are estimated by a **least square linear regression** (see [h
</details>
</details>
<div class="pencadre">
<div class="red_pencadre">
**Interpretation of the ANOVA results:** do you find a significant effect of the respective SNPs on the considered morphological trait? Comment on the results in relation to your intuition from the analysis of the descriptive statistics [above](#data-description).
**Interpretation of the ANOVA results:** do you find a significant effect of the respective SNPs on the considered morphological trait? Comment on the results in relation to your intuition from the analysis of the descriptive statistics [above](#data-description).
</div>
</div>
<details><summary>Solution</summary>
<details><summary>Solution</summary>
<p>
<p>
After verifying the normality and homoskedasticity of the residuals (**if either is not verified, we cannot use the results from the ANOVA significance test because it assumes a Gaussian model**), we find a significant effect of SNP `YAL069W_1` and a non-significant effect of SNP `YDL200C_427` onto the morphological trait `A101` (when focusing on the `C` cell cycle phase), which confirms our intuition from the graphical representation.
After verifying the normality and homoskedasticity of the residuals (**if it is not verified, we cannot use the results from the ANOVA significance test because it assumes a Gaussian model**), we find a significant effect of SNP `YAL069W_1` and a non-significant effect of SNP `YDL200C_427` onto the morphological trait `A101` (when focusing on the `C` cell cycle phase), which confirms our intuition from the graphical representation.
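
The two checks mentioned above can be sketched as follows. This is a minimal illustration on simulated data (the `toy` data frame and its columns are hypothetical, not the yeast dataset), and it assumes the Shapiro-Wilk and Bartlett tests as diagnostics; the tutorial itself may rely on other tools such as QQ-plots.

```r
# Simulated stand-in for the tutorial data (hypothetical values, not the yeast dataset)
set.seed(1)
toy <- data.frame(
  genotype = factor(rep(c("0", "1"), each = 30)),
  trait = c(rnorm(30, mean = 10), rnorm(30, mean = 11))
)

# Same kind of one-factor linear model as the one passed to anova()
fit <- lm(trait ~ genotype, data = toy)
res <- residuals(fit)

# Normality of the residuals (null hypothesis: the residuals are Gaussian)
normality <- shapiro.test(res)

# Homoskedasticity (null hypothesis: equal residual variance across genotype groups)
homosked <- bartlett.test(x = res, g = toy$genotype)

# Small p-values would cast doubt on the corresponding assumption
normality$p.value
homosked$p.value
```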
---
---
...
@@ -1593,7 +1593,7 @@ The exploration of the SNP data by linear dimension reduction approaches did not
...
@@ -1593,7 +1593,7 @@ The exploration of the SNP data by linear dimension reduction approaches did not
To do so, we are going to run an ANOVA for each SNP.
To do so, we are going to run an ANOVA for each SNP.
<div class="pencadre">
<div class="red_pencadre">
Given the number of SNPs in the data (i.e. `r nrow(gt_data)`), what could be the risk when running such an analysis?
Given the number of SNPs in the data (i.e. `r nrow(gt_data)`), what could be the risk when running such an analysis?
</div>
</div>
...
@@ -1602,7 +1602,7 @@ Given the number of SNPs in the data (i.e. `r nrow(gt_data)`), what could be the
...
@@ -1602,7 +1602,7 @@ Given the number of SNPs in the data (i.e. `r nrow(gt_data)`), what could be the
We are going to do thousands of tests, computing and using thousands of p-values to assess the potential significant effect of each SNP on the considered morphological trait.
We are going to do thousands of tests, computing and using thousands of p-values to assess the potential significant effect of each SNP on the considered morphological trait.
We have a non-negligible risk to wrong reject the null hypothesis for many of the SNPs, and conclude to a non-existing significant effect.
We run a non-negligible risk of wrongly rejecting the null hypothesis for many of the SNPs, and of concluding that there is a significant effect where none exists.
Thus, we have to use a p-value correction (or adjustment) procedure adapted to multiple testing.
Thus, we have to use a p-value correction (or adjustment) procedure adapted to multiple testing.
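
To make the risk concrete, here is a small sketch on simulated p-values (not the SNP data). It assumes the Bonferroni and Benjamini-Hochberg corrections available in base R's `p.adjust()`; which correction the tutorial actually applies is not shown here, and the number of tests (`n_tests`) is an arbitrary illustrative value.

```r
set.seed(42)
n_tests <- 5000  # hypothetical number of tests, for illustration only

# Under the null hypothesis (no effect for any SNP), p-values are uniform on [0, 1]
p_null <- runif(n_tests)

# Without correction, roughly 5% of the tests are false positives at the 5% level
n_raw <- sum(p_null < 0.05)

# Bonferroni controls the family-wise error rate,
# Benjamini-Hochberg ("BH") controls the false discovery rate
p_bonf <- p.adjust(p_null, method = "bonferroni")
p_bh <- p.adjust(p_null, method = "BH")

n_bonf <- sum(p_bonf < 0.05)
n_bh <- sum(p_bh < 0.05)

# Number of rejections before and after correction
c(raw = n_raw, bonferroni = n_bonf, BH = n_bh)
```

Since every null hypothesis is true in this simulation, any rejection is a false positive, which is why the corrected counts should be far smaller than the raw one.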
@@ -1816,7 +1816,7 @@ What can we do with these results?
...
@@ -1816,7 +1816,7 @@ What can we do with these results?
## Full data analysis
## Full data analysis
<div class="pencadre">
<div class="red_pencadre">
Open (and optional) question: run the previous analysis to find SNPs with significant effect on the morphological trait `A101` with the full dataset `yeast_data`, i.e. without the average by strain for the morphological traits.
Open (and optional) question: run the previous analysis to find SNPs with significant effect on the morphological trait `A101` with the full dataset `yeast_data`, i.e. without the average by strain for the morphological traits.
In this case, you will have to account for the `strain_id` factor in the ANOVA.
In this case, you will have to account for the `strain_id` factor in the ANOVA.