What can you say about the covariation structure of the data?
</div>
To explore the PCA algorithm, we are first going to focus on two continuous variables in this data set: the bill length and depth (`bill_length_mm`, `bill_depth_mm`) of the female penguins (`sex`).
...
...
@@ -200,7 +204,7 @@ The package `factoextra` provides us with functions to manipulate and plot the o
You can do this with the `fviz_pca_ind` function.
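
As a minimal sketch of the call, assuming the PCA was computed with `prcomp()`: the built-in `iris` data set is used here only as a stand-in, because the PCA object of this practical is not shown in this excerpt, and `toy_pca` and `toy_plot` are hypothetical names.

```r
library(factoextra)

# Stand-in data: the four numeric columns of the built-in iris data set
toy_pca <- prcomp(iris[, 1:4], center = TRUE, scale. = TRUE)

# Plot the individuals on the first two principal components
toy_plot <- fviz_pca_ind(toy_pca, geom = "point")
toy_plot
```

The function returns a `ggplot` object, so the usual `ggplot2` layers and themes can be added to it.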
<div class="pencadre">
<div class="red_pencadre">
Compare the `fviz_pca_ind` output with the `bill_length_mm` and `bill_depth_mm` scatter plot
</div>
...
...
@@ -265,8 +269,8 @@ diy_data_f <- data_f %>%
</p>
</details>
<div class="pencadre">
<div class="red_pencadre">
Explain the importance of the centering and scaling steps of the data
Explain the discrepancy between these results and $k=9$
</div>
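
To get a feeling for the effect of scaling, here is a small sketch on simulated data (not the penguins data; `toy`, `small` and `large` are hypothetical names). It compares a PCA on centered data with a PCA on centered and scaled data when the two variables are measured on very different scales.

```r
# Simulated data: two independent variables with very different variances
set.seed(1)
toy <- data.frame(
  small = rnorm(100, mean = 5, sd = 1),
  large = rnorm(100, mean = 500, sd = 100)
)

# Centered but not scaled: the PCA works on the covariance matrix
pca_raw <- prcomp(toy, center = TRUE, scale. = FALSE)

# Centered and scaled: the PCA works on the correlation matrix
pca_scaled <- prcomp(toy, center = TRUE, scale. = TRUE)

# Absolute contribution of each variable to the first principal component
abs(pca_raw$rotation[, "PC1"])
abs(pca_scaled$rotation[, "PC1"])
```

Without scaling, the first component is driven almost entirely by `large`, simply because its variance is much bigger. After scaling, both variables weigh the same in the first component.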
## Graph-based clustering
We are going to use the `cluster_louvain()` function to perform a graph-based clustering.
...
...
@@ -505,6 +511,12 @@ str(data_knn_F)
</p>
</details>
<div class="red_pencadre">
Why do we need a k-nearest neighbors (knn) graph?
</div>
The `cluster_louvain()` function implements the multi-level modularity optimization algorithm for finding community structure in a graph. Use this function on `data_knn` to create a `data_louvain` variable.
You can check the clustering results with `membership(data_louvain)`.
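
The two calls can be tried first on a toy graph. This is only a sketch, assuming the `igraph` package is loaded; `toy_graph` and `toy_louvain` are hypothetical names and the graph is made up for illustration (two triangles joined by a single edge), it is not `data_knn`.

```r
library(igraph)

# Toy undirected graph: two triangles (a, b, c) and (d, e, f) linked by the edge c-d
toy_graph <- graph_from_literal(a-b, b-c, a-c, d-e, e-f, d-f, c-d)

# Multi-level modularity optimization (Louvain algorithm)
toy_louvain <- cluster_louvain(toy_graph)

# Community assigned to each vertex
membership(toy_louvain)

# Modularity of the partition that was found
modularity(toy_louvain)
```

The same pattern applies to a knn graph: the vertices are the observations and the communities returned by `membership()` are the clusters.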
...
...
@@ -559,7 +571,7 @@ data_umap$layout %>%
geom_point(aes(x = V1, y = V2, color = cell_type))