Commit 59e7c332 authored by GD

highlight questions to be treated in the report

parent 23cbf9bb
@@ -796,7 +796,7 @@ ggplot(yeast_av_data, aes(factor(YDL200C_427), A101)) + geom_boxplot() +
 </p>
 </details>
-<div class="pencadre">
+<div class="red_pencadre">
 How to account for the cell cycle in the previous representations? Is it important?
 </div>
@@ -1095,7 +1095,7 @@ anova(lm(A101 ~ factor(YDL200C_427), data = yeast_av_subdata))
 </details>
-<div class="pencadre">
+<div class="red_pencadre">
 What should we check before interpreting the results of the ANOVA?
 </div>
@@ -1214,14 +1214,14 @@ Then $\mu$ and $B$ are estimated by a **least square linear regression** (see [h
 </details>
-<div class="pencadre">
+<div class="red_pencadre">
 **Interpretation of the ANOVA results:** do you find a significant effect of the respective SNPs on the considered morphological trait? Comment the results in relation with your intuition from the analysis of the descriptive statistics [above](#data-description)?
 </div>
 <details><summary>Solution</summary>
 <p>
-After verifying the normality and homoskedasticity of the residuals (**if either is not verified, we cannot use the results from the ANOVA significance test because it assumes a Gaussian model**), we find a significant effect of SNP `YAL069W_1` and a non-significant effect of SNP `YDL200C_427` onto the morphological trait `A101` (when focusing on the `C` cell cycle phase), which confirms our intuition from the graphical representation.
+After verifying the normality and homoskedasticity of the residuals (**if it is not verified, we cannot use the results from the ANOVA significance test because it assumes a Gaussian model**), we find a significant effect of SNP `YAL069W_1` and a non-significant effect of SNP `YDL200C_427` onto the morphological trait `A101` (when focusing on the `C` cell cycle phase), which confirms our intuition from the graphical representation.
 ---
@@ -1593,7 +1593,7 @@ The exploration of the SNP data by linear dimension reduction approaches did not
 To do so, we are going to run an ANOVA for each SNP.
-<div class="pencadre">
+<div class="red_pencadre">
 Given the number of SNPs in the data (i.e. `r nrow(gt_data)`), what could be the risk when running such an analysis?
 </div>
@@ -1602,7 +1602,7 @@ Given the number of SNPs in the data (i.e. `r nrow(gt_data)`), what could be the
 We are going to do thousands of tests, computing and using thousands of p-values to assess the potential significant effect of each SNP on the considered morphological trait.
-We have a non-negligible risk to wrong reject the null hypothesis for many of the SNPs, and conclude to a non-existing significant effect.
+We have a non-negligible risk to wrongly reject the null hypothesis for many of the SNPs, and conclude to a non-existing significant effect.
 Thus, we have to use p-values correction (or adjustment) procedure adapted to the case of multiple testing.
@@ -1660,7 +1660,7 @@ ggplot(test_result) + geom_point(aes(x=SNP_index, y=p_values)) +
-<div class="pencadre">
+<div class="red_pencadre">
 What can you say about these results?
 </div>
@@ -1720,7 +1720,7 @@ ggplot(
 ```
-<div class="pencadre">
+<div class="red_pencadre">
 What can you say about the different corrections?
 </div>
@@ -1791,7 +1791,7 @@ test_result %>%
 </details>
-<div class="pencadre">
+<div class="red_pencadre">
 What can we do with these results?
@@ -1816,7 +1816,7 @@ What can we do with these results?
 ## Full data analysis
-<div class="pencadre">
+<div class="red_pencadre">
 Open (and optional) question: run the previous analysis to find SNPs with significant effect on the morphological trait `A101` with the full dataset `yeast_data`, i.e. without the average by strain for the morphological traits.
 In this case, you will have to account for the `strain_id` factor in the ANOVA.
......