Commit 0405fcc8, authored by Laurent Modolo (parent 2c98dea0): Practical_b.Rmd, reformat "implementing your own kmeans".
## Implementing your own $k$-means clustering algorithm
You are going to implement your own $k$-means algorithm in this section.
The $k$-means algorithm alternates between the following two steps:

- Assign each point to the cluster with the nearest centroid
- Compute the new cluster centroids

Justify each of your functions.
<div class="pencadre">
Think about the starting state of your algorithm and the stopping condition
</div>
<details><summary>Solution</summary>
<p>
We have no prior information about the centroids, so we can draw them at random.
We then iterate over the two steps of the algorithm until the centroids stay the same.
</p>
</details>
<div class="pencadre">
Start by implementing a `kmeans_initiation(x, k)` function, returning $k$ centroids as a starting point.
</div>
<details><summary>Solution</summary>
<p>
```{r}
kmeans_initiation <- function(x, k) {
  centroid <- matrix(0, k, ncol(x))
  for (i in 1:ncol(x)) {
    # draw each centroid coordinate uniformly within the range of the data
    centroid[, i] <- runif(k, min = min(x[, i]), max = max(x[, i]))
  }
  return(centroid)
}
```
</p>
</details>
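A common alternative, not required by the practical, is to initiate the centroids by sampling $k$ distinct rows of `x` (a simplification of the classic Forgy method). The function name `kmeans_initiation_sample` below is illustrative, not part of the expected solution:

```r
# alternative initiation: pick k distinct data points as starting centroids
# (illustrative sketch; the practical's solution draws uniform coordinates)
kmeans_initiation_sample <- function(x, k) {
  x[sample(nrow(x), k), , drop = FALSE]
}

set.seed(1)
x <- matrix(rnorm(20), ncol = 2)   # 10 points in 2 dimensions
centroid <- kmeans_initiation_sample(x, 3)
dim(centroid)  # 3 rows, 2 columns
```

Sampling actual data points guarantees that every initial centroid lies in a populated region of the space, which can reduce the risk of empty clusters.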
<div class="pencadre">
Implement a `compute_distance(x, centroid)` function that computes the distance of each point (row of `x`) to each centroid, based on the squared Euclidean distance.
</div>
<details><summary>Solution</summary>
<p>
```{r}
compute_distance <- function(x, centroid) {
  distance <- matrix(0, nrow(x), nrow(centroid))
  for (i in 1:ncol(distance)) {
    # sweep() subtracts centroid[i, ] from every row of x
    # (a plain x - centroid[i, ] would recycle the vector column-wise)
    distance[, i] <- rowSums(sweep(x, 2, centroid[i, ])^2)
  }
  return(distance)
}
```
</p>
</details>
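A distance function is easy to validate on points whose distances you can compute by hand. The standalone sketch below uses the same squared-Euclidean computation on a 3-4-5 triangle (the example points are made up):

```r
# standalone squared-Euclidean distance: one row per point, one column per centroid
compute_distance <- function(x, centroid) {
  distance <- matrix(0, nrow(x), nrow(centroid))
  for (i in seq_len(nrow(centroid))) {
    # subtract centroid i from every row of x, then sum squared coordinates
    distance[, i] <- rowSums(sweep(x, 2, centroid[i, ])^2)
  }
  distance
}

x <- rbind(c(0, 0), c(3, 4))
centroid <- rbind(c(0, 0))
compute_distance(x, centroid)  # squared distances: 0 and 25
```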
<div class="pencadre">
Implement a `cluster_assignment(distance)` function returning the assignment of each point (row of `x`), based on the results of your `compute_distance(x, centroid)` function.
</div>
<details><summary>Solution</summary>
<p>
```{r}
cluster_assignment <- function(distance) {
  cluster <- c()
  for (i in 1:nrow(distance)) {
    # index of the column (centroid) with the smallest distance
    cluster[i] <- which.min(distance[i, ])
  }
  return(cluster)
}
```
</p>
</details>
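On a hand-built distance matrix, the assignment should simply pick the column with the smallest value in each row. A compact standalone variant using `apply()` (the example matrix is made up):

```r
# row-wise argmin over the distance matrix: nearest centroid per point
cluster_assignment <- function(distance) {
  apply(distance, 1, which.min)
}

distance <- rbind(
  c(1, 9),   # point 1 is closest to centroid 1
  c(8, 2)    # point 2 is closest to centroid 2
)
cluster_assignment(distance)  # 1 2
```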
<div class="pencadre">
Implement a `centroid_update(x, cluster, k)` function returning the updated centroids for your clusters.
</div>
<details><summary>Solution</summary>
<p>
```{r}
centroid_update <- function(x, cluster, k) {
  centroid <- matrix(0, k, ncol(x))
  for (i in 1:k) {
    # column-wise mean over the points assigned to cluster i
    centroid[i, ] <- colMeans(x[cluster == i, , drop = FALSE])
  }
  return(centroid)
}
```
</p>
</details>
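The update step is just a per-cluster, per-coordinate mean, which you can check on a tiny example where the means are obvious (the points and assignments below are made up):

```r
# standalone centroid update: each new centroid is the column-wise mean
# of the points currently assigned to that cluster
centroid_update <- function(x, cluster, k) {
  centroid <- matrix(0, k, ncol(x))
  for (i in seq_len(k)) {
    centroid[i, ] <- colMeans(x[cluster == i, , drop = FALSE])
  }
  centroid
}

x <- rbind(c(0, 0), c(2, 2), c(10, 10))
centroid_update(x, cluster = c(1, 1, 2), k = 2)  # rows: (1, 1) and (10, 10)
```

Note the `drop = FALSE`: without it, a cluster containing a single point would be silently converted to a vector and `colMeans()` would fail.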
<div class="pencadre">
Implement a `metric_example(x, k)` function to compute a criterion of the goodness of your clustering.
</div>
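A common criterion is the total within-cluster sum of squares (lower is better for a fixed $k$). The sketch below computes it from precomputed assignments and centroids rather than from `(x, k)` directly, so the helper name `within_ss` and its signature are illustrative, not the expected solution:

```r
# total within-cluster sum of squares, given assignments and centroids
# (illustrative helper; the practical asks for a metric_example(x, k) wrapper)
within_ss <- function(x, cluster, centroid) {
  total <- 0
  for (i in seq_len(nrow(centroid))) {
    members <- x[cluster == i, , drop = FALSE]
    if (nrow(members) > 0) {
      # squared distances of cluster i's points to their centroid
      total <- total + sum(sweep(members, 2, centroid[i, ])^2)
    }
  }
  total
}

x <- rbind(c(0, 0), c(2, 0), c(10, 10))
within_ss(x, cluster = c(1, 1, 2), centroid = rbind(c(1, 0), c(10, 10)))  # 1 + 1 + 0 = 2
```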
<div class="pencadre">
Implement a `kmeans_example(x, k)` function wrapping everything together, and test it.
</div>
<details><summary>Solution</summary>
<p>
```{r}
kmeans_example <- function(x, k) {
  centroid <- kmeans_initiation(x, k)
  stop_condition <- TRUE
  while (stop_condition) {
    old_centroid <- centroid
    cluster <- cluster_assignment(compute_distance(x, centroid))
    centroid <- centroid_update(x, cluster, k)
    # stop when the centroids no longer move (up to numerical precision)
    if (max(abs(centroid - old_centroid)) < 1e-8) {
      stop_condition <- FALSE
    }
  }
  return(cluster)
}
```
</p>
</details>
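To convince yourself your implementation behaves sensibly, you can compare it against R's built-in `stats::kmeans()` on simulated, well-separated data; a sketch (the seed, sample sizes, and cluster separation are arbitrary choices):

```r
set.seed(42)
# two well-separated 2-D clusters of 50 points each
x <- rbind(
  matrix(rnorm(100, mean = 0), ncol = 2),
  matrix(rnorm(100, mean = 5), ncol = 2)
)
# reference clustering with the built-in implementation
ref <- stats::kmeans(x, centers = 2, nstart = 10)
table(ref$cluster)  # should split the 100 points into two groups of 50
```

Your own `kmeans_example(x, 2)` should recover the same partition (possibly with the cluster labels swapped, since label order depends on the random initiation).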
```{r, eval = F}
data_pca %>%
fviz_pca_ind(
geom = "point",
......