diff --git a/Practical_b.Rmd b/Practical_b.Rmd
index 76ac751bd2c3d29359671bd8588725a85c5db92d..ac577a641b3d728868025616392fd861a84cd80b 100644
--- a/Practical_b.Rmd
+++ b/Practical_b.Rmd
@@ -584,120 +584,44 @@ What can you say about the axes of this plot ?
 
 ## Implementing your own $k$-means clustering algorithm
 
+You are going to implement your own $k$-means algorithm in this section.
 The $k$-means algorithm follow the following steps:
 
 - Assign point to the cluster with the nearest centroid
 - Compute the new cluster centroids
 
+Justify each of your functions.
+
 <div class="pencadre">
 Think about the starting state of your algorithm and the stopping condition
 </div>
 
-<details><summary>Solution</summary>
-<p>
-We have no prior information about the centroid, we can randomly draw them.
-We are going to iterate over the two-step of the algorithm until the centroids stay the same.
-</p>
-</details>
-
 <div class="pencadre">
-Start by implementing a `kmeans_initiation(x, k)` function for your algorithm, returning $k$ centroids
+Start by implementing a `kmeans_initiation(x, k)` function, returning $k$ centroids as a starting point.
 </div>
 
-<details><summary>Solution</summary>
-<p>
-```{r}
-kmeans_initiation <- function(x, k) {
-  centroid <- matrix(0, k, ncol(x))
-  for (i in 1:ncol(x)) {
-    centroid[, i] <- runif(k, min = min(x[, i]), max = max(x[, i]))
-  }
-  return(centroid)
-}
-```
-</p>
-</details>
-
 <div class="pencadre">
-Implement a `compute_distance(x, centroid)` function for your algorithm, the distance of each point (row of x) to each centroid, based on the squared Euclidian distance.
+Implement a `compute_distance(x, centroid)` function that computes the distance of each point (row of x) to each centroid.
 </div>
 
-<details><summary>Solution</summary>
-<p>
-```{r}
-compute_distance <- function(x, centroid) {
-  distance <- matrix(0, nrow(x), nrow(centroid))
-  for (i in 1:ncol(distance)) {
-    distance[, i] <- rowSums((x - centroid[i, ])^2)
-  }
-  return(distance)
-}
-```
-</p>
-</details>
-
 <div class="pencadre">
-Implement a `cluster_assignment(distance)` function for your algorithm, returning the assignment of each point (row of x), based on the squared Euclidian distance.
+Implement a `cluster_assignment(distance)` function returning the assignment of each point (row of x), based on the results of your `compute_distance(x, centroid)` function.
 </div>
 
-<details><summary>Solution</summary>
-<p>
-```{r}
-cluster_assignment <- function(distance) {
-  cluster <- c()
-  for (i in 1:nrow(distance)) {
-    cluster[i] <- which(distance[i, ] == min(distance[i, ]))[1]
-  }
-  return(cluster)
-}
-```
-</p>
-</details>
-
-
 <div class="pencadre">
-Implement a `centroid_update(x, cluster, k)` function for your algorithm, returning the updated centroid for your clusters.
+Implement a `centroid_update(x, cluster, k)` function returning the updated centroid for your clusters.
 </div>
 
-<details><summary>Solution</summary>
-<p>
-```{r}
-centroid_update <- function(x, cluster, k) {
-  centroid <- matrix(0, k, ncol(x))
-  for (i in 1:k) {
-    centroid[i, ] <- mean(x[cluster[i], ])
-  }
-  return(centroid)
-}
-```
-</p>
-</details>
+<div class="pencadre">
+Implement a `metric_example(x, k)` function to compute a criterion of the goodness of your clustering.
+</div>
 
 <div class="pencadre">
-Implement a `kmeans_example(x, k)` function for your algorithm, wrapping everything and test it.
+Implement a `kmeans_example(x, k)` function wrapping everything, and test it.
 </div>
 
-<details><summary>Solution</summary>
-<p>
-```{r}
-kmeans_example <- function(x, k) {
-  centroid <- kmeans_initiation(x, k)
-  stop_condition <- T
-  while (stop_condition) {
-    old_centroid <- centroid
-    cluster <- cluster_assignment(compute_distance(x, centroid))
-    centroid <- centroid_update(x, cluster, k)
-    if (max(centroid - old_centroid) < 5) {
-      stop_condition <- F
-    }
-  }
-  return(cluster)
-}
-```
-</p>
-</details>
-
-```{r, echo = F}
+```{r, eval = F}
 data_pca %>%
   fviz_pca_ind(
     geom = "point",
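The diff above strips the worked solutions from the practical, leaving only the prompts. As a reviewer's reference (not the course's official solution), here is one possible sketch of the requested functions. It assumes `x` is a numeric matrix with one observation per row; the `max_iter` and `tol` arguments, the empty-cluster guard, and the `total_withinss(x, cluster, centroid)` helper (standing in for the prompt's `metric_example(x, k)`) are choices made here, not part of the original prompts.

```r
# One possible implementation of the functions requested in the practical.
# Assumes x is a numeric matrix with one observation per row.

kmeans_initiation <- function(x, k) {
  # Draw k centroids uniformly within the observed range of each variable
  centroid <- matrix(0, k, ncol(x))
  for (j in seq_len(ncol(x))) {
    centroid[, j] <- runif(k, min = min(x[, j]), max = max(x[, j]))
  }
  centroid
}

compute_distance <- function(x, centroid) {
  # Squared Euclidean distance of each point (row) to each centroid (column);
  # sweep() subtracts the centroid from every row of x
  distance <- matrix(0, nrow(x), nrow(centroid))
  for (i in seq_len(nrow(centroid))) {
    distance[, i] <- rowSums(sweep(x, 2, centroid[i, ])^2)
  }
  distance
}

cluster_assignment <- function(distance) {
  # Index of the nearest centroid for every point
  apply(distance, 1, which.min)
}

centroid_update <- function(x, cluster, k) {
  # Mean of the points assigned to each cluster; NA rows flag empty clusters
  centroid <- matrix(NA_real_, k, ncol(x))
  for (i in seq_len(k)) {
    members <- cluster == i
    if (any(members)) centroid[i, ] <- colMeans(x[members, , drop = FALSE])
  }
  centroid
}

total_withinss <- function(x, cluster, centroid) {
  # Total within-cluster sum of squares: smaller means tighter clusters
  sum((x - centroid[cluster, , drop = FALSE])^2)
}

kmeans_example <- function(x, k, max_iter = 100, tol = 1e-8) {
  centroid <- kmeans_initiation(x, k)
  for (iter in seq_len(max_iter)) {
    cluster <- cluster_assignment(compute_distance(x, centroid))
    new_centroid <- centroid_update(x, cluster, k)
    empty <- is.na(new_centroid[, 1])
    new_centroid[empty, ] <- centroid[empty, ]  # empty clusters keep old position
    converged <- max(abs(new_centroid - centroid)) < tol
    centroid <- new_centroid
    if (converged) break  # stop when the centroids no longer move
  }
  list(cluster = cluster, centroid = centroid,
       withinss = total_withinss(x, cluster, centroid))
}
```

For example, `kmeans_example(as.matrix(iris[, 1:4]), 3)$cluster` should broadly recover the three species, up to label permutation and the usual sensitivity of $k$-means to its random start.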