Verified commit 7892871b authored by Laurent Modolo
Practical b: add kmeans implementation

Explain the discrepancy between these results and $k=9$
</div>
### Implementing your own $k$-means clustering algorithm
The $k$-means algorithm iterates over the following two steps:
- Assign each point to the cluster with the nearest centroid
- Compute the new cluster centroids
<div class="pencadre">
Think about the starting state of your algorithm and the stopping condition
</div>
<details><summary>Solution</summary>
<p>
We have no prior information about the centroids, so we can draw them at random.
We then iterate over the two steps of the algorithm until the centroids stay the same.
</p>
</details>
<div class="pencadre">
Start by implementing a `kmeans_initiation(x, k)` function for your algorithm, returning $k$ random centroids
</div>
<details><summary>Solution</summary>
<p>
```{r}
kmeans_initiation <- function(x, k) {
  # draw each centroid coordinate uniformly within the range of its column
  centroid <- matrix(0, k, ncol(x))
  for (i in 1:ncol(x)) {
    centroid[, i] <- runif(k, min = min(x[, i]), max = max(x[, i]))
  }
  return(centroid)
}
```
</p>
</details>
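The drawing step can be checked on toy data (a self-contained sketch with made-up values, independent of the practical's data set): coordinates sampled with `runif()` between the column minima and maxima necessarily fall inside the observed range.

```{r}
set.seed(42) # for a reproducible random draw (hypothetical toy data)
x_toy <- matrix(rnorm(20), nrow = 10, ncol = 2)
# draw 3 centroids, one column of coordinates per dimension of x_toy
centroid_toy <- sapply(1:2, function(i) {
  runif(3, min(x_toy[, i]), max(x_toy[, i]))
})
# every centroid coordinate lies within the range of its column
all(centroid_toy[, 1] >= min(x_toy[, 1]), centroid_toy[, 1] <= max(x_toy[, 1]),
    centroid_toy[, 2] >= min(x_toy[, 2]), centroid_toy[, 2] <= max(x_toy[, 2]))
```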
<div class="pencadre">
Implement a `compute_distance(x, centroid)` function for your algorithm, returning the distance of each point (row of `x`) to each centroid, based on the squared Euclidean distance
</div>
<details><summary>Solution</summary>
<p>
```{r}
compute_distance <- function(x, centroid) {
  distance <- matrix(0, nrow(x), nrow(centroid))
  for (i in 1:ncol(distance)) {
    # subtract centroid i from each row of x (sweep along the columns),
    # then sum the squared differences over the dimensions
    distance[, i] <- rowSums(sweep(x, 2, centroid[i, ])^2)
  }
  return(distance)
}
```

Note that `x - centroid[i, ]` would be wrong here: R recycles the vector in column-major order, not row by row, hence the `sweep()`.
```
</p>
</details>
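The distance itself can be checked by hand on a single pair of points (a self-contained sketch with made-up values): for the point $(1, 2)$ and the centroid $(4, 6)$, the squared Euclidean distance is $(1-4)^2 + (2-6)^2 = 9 + 16 = 25$.

```{r}
# worked example: squared Euclidean distance between one point and one centroid
p <- c(1, 2)
cen <- c(4, 6)
sum((p - cen)^2) # 25
```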
<div class="pencadre">
Implement a `cluster_assignment(distance)` function for your algorithm, returning the cluster assignment of each point (row of the distance matrix), i.e. the index of its nearest centroid
</div>
<details><summary>Solution</summary>
<p>
```{r}
cluster_assignment <- function(distance) {
  # for each point, keep the index of the nearest centroid
  cluster <- rep(0, nrow(distance))
  for (i in 1:nrow(distance)) {
    cluster[i] <- which.min(distance[i, ])
  }
  return(cluster)
}
```
</p>
</details>
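On a tiny distance matrix the assignment can be checked by hand (a self-contained sketch, values made up for the example): the first point is closest to centroid 2 and the second to centroid 1.

```{r}
# two points (rows) against two centroids (columns); row minima give the assignments
d_toy <- rbind(c(9, 1), c(4, 16))
apply(d_toy, 1, which.min) # 2 1
```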
<div class="pencadre">
Implement a `centroid_update(x, cluster, k)` function for your algorithm, returning the updated centroid of each cluster
</div>
<details><summary>Solution</summary>
<p>
```{r}
centroid_update <- function(x, cluster, k) {
  centroid <- matrix(0, k, ncol(x))
  for (i in 1:k) {
    # the new centroid is the column-wise mean of the points assigned to cluster i
    centroid[i, ] <- colMeans(x[cluster == i, , drop = FALSE])
  }
  return(centroid)
}
```
</p>
</details>
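The update rule is easy to verify on two points (a self-contained sketch with made-up values): the centroid of $(0, 0)$ and $(2, 4)$ is their column-wise mean $(1, 2)$.

```{r}
# the centroid of a cluster is the column-wise mean of its points
pts <- rbind(c(0, 0), c(2, 4))
colMeans(pts) # 1 2
```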
<div class="pencadre">
Implement a `kmeans_example(x, k)` function for your algorithm, wrapping everything together, and test it
</div>
<details><summary>Solution</summary>
<p>
```{r}
kmeans_example <- function(x, k) {
  centroid <- kmeans_initiation(x, k)
  stop_condition <- TRUE
  while (stop_condition) {
    old_centroid <- centroid
    cluster <- cluster_assignment(compute_distance(x, centroid))
    centroid <- centroid_update(x, cluster, k)
    # stop when the centroids no longer move
    if (max(abs(centroid - old_centroid)) < 1e-8) {
      stop_condition <- FALSE
    }
  }
  return(cluster)
}
```
</p>
</details>
```{r, echo = FALSE}
data_pca %>%
fviz_pca_ind(
geom = "point",
col.ind = as.factor(kmeans_example(data_pca$x[,1:2], k = 9))
)
```
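A good way to validate a hand-rolled implementation is to compare its behaviour with R's built-in `stats::kmeans()`. As a self-contained illustration on synthetic data (two well separated blobs, values made up for the example), the built-in version recovers the two groups of 25 points each:

```{r}
set.seed(1) # reproducible toy data (not the practical's data set)
blobs <- rbind(
  matrix(rnorm(50, mean = 0), ncol = 2),
  matrix(rnorm(50, mean = 10), ncol = 2)
)
fit <- stats::kmeans(blobs, centers = 2, nstart = 10)
table(fit$cluster) # 25 points in each cluster
```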
## Graph-based clustering
We are going to use the `cluster_louvain()` function to perform a graph-based clustering.
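`cluster_louvain()` comes from the `igraph` package. As a minimal self-contained sketch (a toy graph, not the practical's data set), two triangles joined by a single edge are split into two communities:

```{r}
library(igraph)
# two triangles (a, b, c) and (d, e, f) joined by the single edge c - d
g <- graph_from_literal(a - b, a - c, b - c, d - e, d - f, e - f, c - d)
communities <- cluster_louvain(g)
membership(communities)
```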