@@ -71,13 +71,13 @@ library(palmerpenguins) # we load the data
## loading the data
We are going to work on the famous Palmer penguins dataset. This dataset is an integrative study of the breeding ecology and population structure of Pygoscelis penguins along the western Antarctic Peninsula. These data were collected from 2007 - 2009 by Dr. Kristen Gorman with the Palmer Station Long Term Ecological Research Program, part of the US Long Term Ecological Research Network.
We are going to work on the famous Palmer penguins dataset. This dataset comes from an integrative study of the breeding ecology and population structure of Pygoscelis penguins along the western Antarctic Peninsula. These data were collected from 2007 to 2009 by Dr. Kristen Gorman with the Palmer Station Long Term Ecological Research Program (part of the US Long Term Ecological Research Network).
The `palmerpenguins` data contains size measurements for three penguin species observed on three islands in the Palmer Archipelago, Antarctica.

The `palmerpenguins` library load the `penguins` dataset into your R environment. If you are not familiar with `tibble`, you just have to know that they are equivalent to `data.frame`.
The `palmerpenguins` library loads the `penguins` dataset into your R environment. If you are not familiar with `tibble`, you just need to know that a `tibble` behaves like a `data.frame` (but is easier to work with).
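As a quick check of the relationship between the two classes, here is a minimal sketch (not part of the original tutorial) that inspects the class of `penguins` and converts it to a plain `data.frame` with base R's `as.data.frame`. The name `penguins_df` is only illustrative.

```{r}
library(palmerpenguins)

# a tibble carries extra classes on top of "data.frame"
class(penguins)

# so functions that expect a data.frame accept it directly
is.data.frame(penguins)

# if a plain data.frame is ever needed, the conversion is explicit
penguins_df <- as.data.frame(penguins)
class(penguins_df)

# the conversion keeps the same rows and columns
identical(dim(penguins_df), dim(penguins))
```

In practice you rarely need the conversion: most base R and tidyverse functions work on either class.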
```{r}
penguins
...
...
@@ -95,7 +95,7 @@ The data is tidy:
- Each observation has its own row.
- Each value must have its own cell.
Meeting these 3 criteria for your data will simplify most of your data processing and analysis.
Meeting these 3 criteria for your data will simplify your data processing and analysis, as most algorithms expect this format.
```{r}
summary(penguins)
...
...
@@ -138,6 +138,12 @@ What can you say about the covariation structure of the data ?
To explore the PCA algorithm, we are first going to focus on two continuous variables in this data set: the bill length and depth (`bill_length_mm`, `bill_depth_mm`) for the female penguins (`sex`).
```{r}
data %>%
filter(sex == "female") %>%
ggplot() +
geom_point(aes(x = bill_length_mm, y = bill_depth_mm, color = species))
```
<div class="pencadre">
Using the `filter` and `select` functions, create a `data_f` data set that meets the
...
...
@@ -155,13 +161,6 @@ data_f <- data %>%
</p>
</details>
```{r}
data %>%
filter(sex == "female") %>%
ggplot() +
geom_point(aes(x = bill_length_mm, y = bill_depth_mm, color = species))
```
## Performing your first PCA
The `prcomp` and `princomp` functions are implementations of the PCA methods