Explain the discrepancy between these results and $k=9$
</div>
### Implementing your own $k$-means clustering algorithm
The $k$-means algorithm iterates over the following two steps:
- Assign each point to the cluster with the nearest centroid
- Compute the new cluster centroids
<div class="pencadre">
Think about the starting state of your algorithm and the stopping condition
</div>
<details><summary>Solution</summary>
<p>
We have no prior information about the centroids, so we can draw them at random.
We then iterate over the two steps of the algorithm until the centroids no longer change.
</p>
</details>
<div class="pencadre">
Start by implementing a `kmeans_initiation(x, k)` function for your algorithm, returning $k$ random centroids
</div>
<details><summary>Solution</summary>
<p>
```{r}
kmeans_initiation <- function(x, k) {
  # Draw each centroid coordinate uniformly within the range of the
  # corresponding column of x
  centroid <- matrix(0, k, ncol(x))
  for (i in 1:ncol(x)) {
    centroid[, i] <- runif(k, min = min(x[, i]), max = max(x[, i]))
  }
  return(centroid)
}
```
</p>
</details>
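As an aside, another common initialization strategy (sometimes called the Forgy method) is to sample $k$ of the data points themselves as starting centroids, which guarantees that every centroid starts in a populated region of the space. A minimal sketch (the function name `kmeans_initiation_forgy` and the toy data are illustrative, not part of the exercise):

```{r}
# Alternative initialization: sample k distinct data points as centroids
# (Forgy method); the centroids necessarily lie within the data range
kmeans_initiation_forgy <- function(x, k) {
  x[sample(nrow(x), k), , drop = FALSE]
}

set.seed(42)  # for reproducibility
x <- matrix(rnorm(100 * 2), 100, 2)  # 100 toy points in 2 dimensions
centroid <- kmeans_initiation_forgy(x, k = 3)
dim(centroid)  # one row per centroid, one column per dimension
```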
<div class="pencadre">
Implement a `compute_distance(x, centroid)` function for your algorithm, computing the squared Euclidean distance of each point (row of `x`) to each centroid
</div>
<details><summary>Solution</summary>
<p>
```{r}
compute_distance <- function(x, centroid) {
  # distance[j, i]: squared Euclidean distance from point j to centroid i
  distance <- matrix(0, nrow(x), nrow(centroid))
  for (i in 1:ncol(distance)) {
    # sweep() subtracts centroid i from every row of x; a plain
    # `x - centroid[i, ]` would recycle the centroid down the columns
    distance[, i] <- rowSums(sweep(x, 2, centroid[i, ])^2)
  }
  return(distance)
}
```
</p>
</details>
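To convince yourself the vectorised distance computation is correct, you can compare it against a naive double loop on toy data. This is a quick sanity check, not part of the exercise; the data are made up:

```{r}
set.seed(1)
x <- matrix(rnorm(10 * 2), 10, 2)         # 10 toy points in 2 dimensions
centroid <- matrix(rnorm(3 * 2), 3, 2)    # 3 toy centroids

# Vectorised version: same sweep()/rowSums() idea as compute_distance()
d1 <- sapply(1:nrow(centroid), function(i) rowSums(sweep(x, 2, centroid[i, ])^2))

# Naive double loop over points and centroids for comparison
d2 <- matrix(0, nrow(x), nrow(centroid))
for (i in 1:nrow(x)) {
  for (j in 1:nrow(centroid)) {
    d2[i, j] <- sum((x[i, ] - centroid[j, ])^2)
  }
}

all.equal(d1, d2)  # should be TRUE
```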
<div class="pencadre">
Implement a `cluster_assignment(distance)` function for your algorithm, returning for each point (row of `x`) the index of the nearest centroid, based on the squared Euclidean distance