How to resolve the algorithm K-means++ clustering step by step in the Kotlin programming language
How to resolve the algorithm K-means++ clustering step by step in the Kotlin programming language
Table of Contents
Problem Statement
The task is to implement the K-means++ algorithm. Produce a function which takes two arguments: the number of clusters K, and the dataset to classify. K is a positive integer and the dataset is a list of points in the Cartesian plane. The output is a list of clusters (related sets of points, according to the algorithm). For extra credit (in order): Extra credit is only awarded if the examples given demonstrate the feature in question. To earn credit for 1. and 2., visualize 6 clusters of 30,000 points in ℝ2. It is not necessary to provide visualization for spaces higher than ℝ2 but the examples should demonstrate features 3. and 4. if the solution has them.
Let's start with the solution:
Step by Step solution about How to resolve the algorithm K-means++ clustering step by step in the Kotlin programming language
This code implements the K-means++ algorithm, which is a clustering algorithm that aims to find K clusters in a set of data points. The code is written in Kotlin programming language.
The algorithm starts by selecting K initial cluster centers. These centers are chosen randomly, but with a probability that is proportional to the distance between the center and the other data points. This ensures that the initial centers are spread out across the data set.
Once the initial centers have been chosen, the algorithm assigns each data point to the closest center. The data points are then used to update the cluster centers by computing the average position of all the data points in each cluster.
This process is repeated until the cluster centers no longer change, or until a maximum number of iterations has been reached.
The code uses a variety of helper functions to perform these tasks. The dist2
function computes the squared distance between two points. The nearest
function finds the closest cluster center to a given data point. The kpp
function selects the initial cluster centers using the K-means++ algorithm. The lloyd
function implements the Lloyd's algorithm, which is a variant of the K-means algorithm that uses a centroid-based approach to update the cluster centers.
The printEps
function prints the results of the clustering algorithm in the PostScript format. This allows the results to be visualized using a PostScript viewer.
The main function generates a set of data points and then runs the K-means++ algorithm on the data set. The results are printed to the standard output in the PostScript format.
Here is a more detailed explanation of the code:
-
The
Point
data class represents a point in a 2-dimensional space. It has three properties:x
,y
, andgroup
. Thex
andy
properties store the coordinates of the point, and thegroup
property stores the cluster that the point belongs to. -
The
genXY
function generates a list of points in a 2-dimensional space. The points are generated randomly, but with a probability that is proportional to the distance between the point and the center of the space. This ensures that the points are spread out across the space. -
The
dist2
function computes the squared distance between two points. -
The
nearest
function finds the closest cluster center to a given data point. It does this by computing the squared distance between the data point and each cluster center, and then selecting the cluster center with the smallest squared distance. -
The
kpp
function selects the initial cluster centers using the K-means++ algorithm. It does this by selecting the first cluster center randomly, and then selecting the subsequent cluster centers with a probability that is proportional to the distance between the cluster center and the other data points. -
The
lloyd
function implements the Lloyd's algorithm, which is a variant of the K-means algorithm that uses a centroid-based approach to update the cluster centers. It does this by computing the average position of all the data points in each cluster, and then setting the cluster center to the average position. -
The
printEps
function prints the results of the clustering algorithm in the PostScript format. This allows the results to be visualized using a PostScript viewer. -
The
main
function generates a set of data points and then runs the K-means++ algorithm on the data set. The results are printed to the standard output in the PostScript format.
Source code in the kotlin programming language
// version 1.2.21
import java.util.Random
import kotlin.math.*
data class Point(var x: Double, var y: Double, var group: Int)
typealias LPoint = List<Point>
typealias MLPoint = MutableList<Point>
val origin get() = Point(0.0, 0.0, 0)
val r = Random()
val hugeVal = Double.POSITIVE_INFINITY
const val RAND_MAX = Int.MAX_VALUE
const val PTS = 100_000
const val K = 11
const val W = 400
const val H = 400
fun rand() = r.nextInt(RAND_MAX)
fun randf(m: Double) = m * rand() / (RAND_MAX - 1)
fun genXY(count: Int, radius: Double): LPoint {
val pts = List(count) { origin }
/* note: this is not a uniform 2-d distribution */
for (i in 0 until count) {
val ang = randf(2.0 * PI)
val r = randf(radius)
pts[i].x = r * cos(ang)
pts[i].y = r * sin(ang)
}
return pts
}
fun dist2(a: Point, b: Point): Double {
val x = a.x - b.x
val y = a.y - b.y
return x * x + y * y
}
fun nearest(pt: Point, cent: LPoint, nCluster: Int): Pair<Int, Double> {
var minD = hugeVal
var minI = pt.group
for (i in 0 until nCluster) {
val d = dist2(cent[i], pt)
if (minD > d) {
minD = d
minI = i
}
}
return minI to minD
}
fun kpp(pts: LPoint, len: Int, cent: MLPoint) {
val nCent = cent.size
val d = DoubleArray(len)
cent[0] = pts[rand() % len].copy()
for (nCluster in 1 until nCent) {
var sum = 0.0
for (j in 0 until len) {
d[j] = nearest(pts[j], cent, nCluster).second
sum += d[j]
}
sum = randf(sum)
for (j in 0 until len) {
sum -= d[j]
if (sum > 0.0) continue
cent[nCluster] = pts[j].copy()
break
}
}
for (j in 0 until len) pts[j].group = nearest(pts[j], cent, nCent).first
}
fun lloyd(pts: LPoint, len: Int, nCluster: Int): LPoint {
val cent = MutableList(nCluster) { origin }
kpp(pts, len, cent)
do {
/* group element for centroids are used as counters */
for (i in 0 until nCluster) {
with (cent[i]) { x = 0.0; y = 0.0; group = 0 }
}
for (j in 0 until len) {
val p = pts[j]
val c = cent[p.group]
with (c) { group++; x += p.x; y += p.y }
}
for (i in 0 until nCluster) {
val c = cent[i]
c.x /= c.group
c.y /= c.group
}
var changed = 0
/* find closest centroid of each point */
for (j in 0 until len) {
val p = pts[j]
val minI = nearest(p, cent, nCluster).first
if (minI != p.group) {
changed++
p.group = minI
}
}
}
while (changed > (len shr 10)) /* stop when 99.9% of points are good */
for (i in 0 until nCluster) cent[i].group = i
return cent
}
fun printEps(pts: LPoint, len: Int, cent: LPoint, nCluster: Int) {
val colors = DoubleArray(nCluster * 3)
for (i in 0 until nCluster) {
colors[3 * i + 0] = (3 * (i + 1) % 11) / 11.0
colors[3 * i + 1] = (7 * i % 11) / 11.0
colors[3 * i + 2] = (9 * i % 11) / 11.0
}
var minX = hugeVal
var minY = hugeVal
var maxX = -hugeVal
var maxY = -hugeVal
for (j in 0 until len) {
val p = pts[j]
if (maxX < p.x) maxX = p.x
if (minX > p.x) minX = p.x
if (maxY < p.y) maxY = p.y
if (minY > p.y) minY = p.y
}
val scale = minOf(W / (maxX - minX), H / (maxY - minY))
val cx = (maxX + minX) / 2.0
val cy = (maxY + minY) / 2.0
print("%%!PS-Adobe-3.0\n%%%%BoundingBox: -5 -5 %${W + 10} ${H + 10}\n")
print("/l {rlineto} def /m {rmoveto} def\n")
print("/c { .25 sub exch .25 sub exch .5 0 360 arc fill } def\n")
print("/s { moveto -2 0 m 2 2 l 2 -2 l -2 -2 l closepath ")
print(" gsave 1 setgray fill grestore gsave 3 setlinewidth")
print(" 1 setgray stroke grestore 0 setgray stroke }def\n")
val f1 = "%g %g %g setrgbcolor"
val f2 = "%.3f %.3f c"
val f3 = "\n0 setgray %g %g s"
for (i in 0 until nCluster) {
val c = cent[i]
println(f1.format(colors[3 * i], colors[3 * i + 1], colors[3 * i + 2]))
for (j in 0 until len) {
val p = pts[j]
if (p.group != i) continue
println(f2.format((p.x - cx) * scale + W / 2, (p.y - cy) * scale + H / 2))
}
println(f3.format((c.x - cx) * scale + W / 2, (c.y - cy) * scale + H / 2))
}
print("\n%%%%EOF")
}
fun main(args: Array<String>) {
val v = genXY(PTS, 10.0)
val c = lloyd(v, PTS, K)
printEps(v, PTS, c, K)
}
You may also check:How to resolve the algorithm Discordian date step by step in the F# programming language
You may also check:How to resolve the algorithm Base64 decode data step by step in the SenseTalk programming language
You may also check:How to resolve the algorithm Search a list step by step in the Smalltalk programming language
You may also check:How to resolve the algorithm String length step by step in the Modula-3 programming language
You may also check:How to resolve the algorithm Middle three digits step by step in the Oforth programming language