Verified Commit 2ba458d3 authored by Laurent Modolo's avatar Laurent Modolo

Fixe solution rendering and add img

parent fc0a5a16
......@@ -85,7 +85,7 @@ We are going to work on the famous Palmer penguins dataset. This dataset is an i
The `palmerpenguins` data contains size measurements for three penguin species observed on three islands in the Palmer Archipelago, Antarctica.
![<https://allisonhorst.github.io/palmerpenguins/reference/figures/lter_penguins.png>](https://allisonhorst.github.io/palmerpenguins/reference/figures/lter_penguins.png)
![img/lter_penguins.png](img/lter_penguins.png)
The `palmerpenguins` library loads the `penguins` dataset into your R environment. If you are not familiar with `tibble`, you just need to know that tibbles are equivalent to `data.frame`.
......@@ -129,10 +129,11 @@ dim(data)
If you are not familiar with the `%>%` operator or pipe in R: it takes the output of the function on the left and passes it as the first argument of the function on the right.
</p>
</details>
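As a small illustration of the pipe described above, the sketch below compares a nested call with its piped equivalent. It assumes the `magrittr` package is installed (it is loaded along with the tidyverse); the vector `x` is a made-up example, not part of the practical's data.

```r
library(magrittr) # provides %>% (also re-exported by dplyr)

x <- c(1, 4, 9, 16) # toy vector, for illustration only

# Nested call: has to be read from the inside out
nested <- round(mean(sqrt(x)), 1)

# Piped call: reads from left to right, each output feeding the next function
piped <- x %>% sqrt() %>% mean() %>% round(1)

identical(nested, piped)
```

Both forms compute the same value; the piped form only changes the reading order.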
For the sake of this practical, we are going to focus on the continuous variables in the data: `bill_length_mm`, `bill_depth_mm`, `flipper_length_mm` and `body_mass_g`.
![<https://allisonhorst.github.io/palmerpenguins/reference/figures/culmen_depth.png>](https://allisonhorst.github.io/palmerpenguins/reference/figures/culmen_depth.png)
![img/culmen_depth.png](img/culmen_depth.png)
The function `pairs` renders scatter plots of each possible pair of variables
......@@ -159,6 +160,7 @@ data_f <- data %>%
select(c(bill_length_mm, bill_depth_mm)) # we select the two columns
```
</p>
</details>
```{r}
data %>%
......@@ -203,6 +205,7 @@ map(data_f, sd)
```
`map` applies a function to each element of a list or vector.
</p>
</details>
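To make the behaviour of `map` concrete, here is a minimal sketch on a toy list (the list `toy` is invented for this example). It assumes the `purrr` package is installed; `map_dbl` is the `purrr` variant that returns a numeric vector instead of a list.

```r
library(purrr)

# Toy list with two numeric vectors, for illustration only
toy <- list(a = c(1, 2, 3), b = c(10, 20, 30))

means_list <- map(toy, mean)     # a list with one element per input element
means_vec  <- map_dbl(toy, mean) # same computation, returned as a named numeric vector

means_list
means_vec
```

Since a `tibble` or `data.frame` is a list of columns, `map(data_f, sd)` applies `sd` column by column in the same way.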
The package `factoextra` provides us with functions to manipulate and plot the output of the `prcomp` function. The most common usage of the PCA results is to display the individuals on the first factorial plane.
......@@ -222,6 +225,7 @@ fviz_pca_ind(data_f_pca,
)
```
</p>
</details>
<div class="pencadre">
What are the percentages on the Dim1 and Dim2 axes?
......@@ -269,6 +273,7 @@ diy_pca <- data_f %>%
)
```
</p>
</details>
### Data projection
......@@ -343,6 +348,7 @@ first_pc_projection_code <- function(line_slope, x, y){
}
```
</p>
</details>
### Evaluation of the projection
......@@ -396,6 +402,7 @@ S_dist = projection_x^2 + projection_y^2,
Residuals = sqrt((projection_x - bill_length_mm)^2 + (projection_y - bill_depth_mm)^2)
```
</p>
</details>
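The two quantities used to evaluate a projection are linked by Pythagoras' theorem: for centered data, the squared distance of each point to the origin equals its projected squared distance plus its squared residual. The sketch below checks this on simulated data. The points `x`, `y` and the value of `line_slope` are assumptions chosen for illustration, and the projection is written directly rather than through the practical's own helper functions.

```r
set.seed(1)

# Simulated, centered 2D points (illustration only)
x <- rnorm(20)
y <- 0.5 * x + rnorm(20, sd = 0.3)
x <- x - mean(x)
y <- y - mean(y)

line_slope <- 0.5 # hypothetical candidate slope for the projection line

# Unit vector along the line passing through the origin
u <- c(1, line_slope) / sqrt(1 + line_slope^2)

# Orthogonal projection of each point on the line
score <- x * u[1] + y * u[2]
projection_x <- score * u[1]
projection_y <- score * u[2]

S_dist <- projection_x^2 + projection_y^2
Residuals <- sqrt((projection_x - x)^2 + (projection_y - y)^2)

# Distance to origin^2 = projected distance^2 + residual^2
all.equal(x^2 + y^2, S_dist + Residuals^2)
```

Because the left-hand side does not depend on the slope, maximizing the sum of `S_dist` is equivalent to minimizing the sum of squared `Residuals`.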
### Optimal PCA projection
......@@ -412,6 +419,7 @@ diy_cov <- diy_pca %>% as.matrix() %>% cov()
diy_cov
```
</p>
</details>
<div class="pencadre">
You can use the `eigen()` function to get the eigenvalues and vectors of the covariance matrix.
......@@ -423,6 +431,7 @@ You can use the `eigen()` function to get the eigenvalues and vectors of the cov
eigen(diy_cov)
```
</p>
</details>
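As a quick check of what `eigen()` returns, the sketch below verifies the defining property of an eigenvector on a simulated covariance matrix. The matrix `toy` and the object names are assumptions made for this example, not objects from the practical.

```r
set.seed(42)

# Simulated two-column dataset with correlated variables (illustration only)
toy <- cbind(x = rnorm(50), y = rnorm(50))
toy[, "y"] <- toy[, "y"] + 0.8 * toy[, "x"]

toy_cov <- cov(toy)
toy_eigen <- eigen(toy_cov)

v1 <- toy_eigen$vectors[, 1] # first eigenvector (direction of PC1)
lambda1 <- toy_eigen$values[1] # first eigenvalue (variance along PC1)

# Defining property: covariance matrix %*% v = lambda * v
all.equal(as.vector(toy_cov %*% v1), lambda1 * v1)

# The eigenvalues sum to the total variance (trace of the covariance matrix)
all.equal(sum(toy_eigen$values), sum(diag(toy_cov)))
```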
<div class="pencadre">
......@@ -528,6 +537,7 @@ For the slope value:
geom_abline(slope = eigen(diy_cov)$vectors[2, 1] / eigen(diy_cov)$vectors[1, 1], color = "red") +
```
</p>
</details>
<div class="pencadre">
Do you have the same results as your neighbors?
......@@ -637,6 +647,7 @@ For the slope value:
geom_abline(slope = eigen(diy_cov)$vectors[2, 2] / eigen(diy_cov)$vectors[1, 2], color = "blue") +
```
</p>
</details>
For the PCA construction, we want a PC2 orthogonal to PC1.
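The orthogonality of the two axes can be checked numerically. The sketch below uses a small made-up symmetric matrix `toy_cov` in place of the practical's covariance matrix: the eigenvectors have a dot product of zero, and the corresponding slopes multiply to -1.

```r
# Hypothetical 2 x 2 covariance matrix (illustration only)
toy_cov <- matrix(c(2, 0.8, 0.8, 1), nrow = 2)
vec <- eigen(toy_cov)$vectors

pc1_slope <- vec[2, 1] / vec[1, 1]
pc2_slope <- vec[2, 2] / vec[1, 2]

# Orthogonal vectors have a dot product of zero (up to rounding error)
dot_product <- sum(vec[, 1] * vec[, 2])
dot_product

# Perpendicular lines have slopes whose product is -1
pc1_slope * pc2_slope
```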
......@@ -777,7 +788,8 @@ point_projection <- function(diy_cov, x, y, PC){
a %*% scaled_b
}
```
<\p>
</p>
</details>
### Comparison with `prcomp`
......@@ -820,6 +832,7 @@ data_f_pca <- data %>%
prcomp(scale = TRUE)
```
</p>
</details>
```{r}
species_f <- data %>% filter(sex == "female") %>% pull(species) # we get the species variable
......@@ -874,6 +887,7 @@ data %>%
pc_var / sum(pc_var)
```
</p>
</details>
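The ratio `pc_var / sum(pc_var)` can be reproduced from any `prcomp` result, since the variance of each PC is the square of its standard deviation. The sketch below uses the built-in `iris` dataset instead of the penguins data so that it runs without extra packages; this substitution and the object names are assumptions made for the example.

```r
# PCA on the four numeric columns of the built-in iris dataset (illustration only)
toy_pca <- prcomp(iris[, 1:4], scale. = TRUE)

pc_var <- toy_pca$sdev^2 # variance carried by each PC
prop_var <- pc_var / sum(pc_var) # proportion of the total variance

prop_var
sum(prop_var) # the proportions sum to 1

# With scaled variables, the total variance equals the number of variables
sum(pc_var)
```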
### Scree plot
......@@ -900,6 +914,7 @@ tibble(
```
</p>
</details>
### Variable contribution
......@@ -913,6 +928,7 @@ What does it mean to find a slope of `0.5` for PC1 in the `bill_length_mm`, `bil
<p>
It means that if `bill_depth_mm` contributes 1 to PC1, `bill_length_mm` contributes 0.5
</p>
</details>
As the number of variables increases, so does the complexity of the linear combinations for each PC.
We can represent the variable axes in the new PCA axes; this representation is called the correlation circle.
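One way to read the correlation circle is that each variable is placed at its correlation with the PCs. The sketch below computes these coordinates by hand for a scaled PCA, again on the built-in `iris` dataset rather than the penguins data (an assumption made to keep the example self-contained). It is an illustration of the idea, not a description of how `factoextra` computes its output internally.

```r
toy_data <- iris[, 1:4] # illustration only
toy_pca <- prcomp(toy_data, scale. = TRUE)

# Coordinates of the variables: correlation between each variable and each PC
var_coord <- cor(toy_data, toy_pca$x)

# For a scaled PCA, this equals the loadings multiplied by the PC standard deviations
loadings_scaled <- sweep(toy_pca$rotation, 2, toy_pca$sdev, `*`)
all.equal(var_coord, loadings_scaled, check.attributes = FALSE)

# Over all PCs, the squared coordinates of a variable sum to 1,
# so on the first two PCs each variable lies inside the unit circle
rowSums(var_coord^2)
rowSums(var_coord[, 1:2]^2)
```

A variable whose arrow almost reaches the circle is well represented by the first two PCs.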
......@@ -964,6 +980,7 @@ res_var$contrib # contribution to the axes
res_var$cos2 # quality of the representation
```
</p>
</details>
<div class="pencadre">
......
img/culmen_depth.png

778 KiB

img/iter_penguins.png

1.2 MiB
