How to resolve the algorithm K-means++ clustering step by step in the Java programming language

Published on 12 May 2024 09:40 PM

How to resolve the algorithm K-means++ clustering step by step in the Java programming language

Table of Contents

Problem Statement

The task is to implement the K-means++ algorithm. Produce a function which takes two arguments: the number of clusters K, and the dataset to classify. K is a positive integer and the dataset is a list of points in the Cartesian plane. The output is a list of clusters (related sets of points, according to the algorithm). For extra credit (in order): Extra credit is only awarded if the examples given demonstrate the feature in question. To earn credit for 1. and 2., visualize 6 clusters of 30,000 points in ℝ2. It is not necessary to provide visualization for spaces higher than ℝ2 but the examples should demonstrate features 3. and 4. if the solution has them.

Let's start with the solution:

Step by Step solution about How to resolve the algorithm K-means++ clustering step by step in the Java programming language

K-Means with K++ Initialization

Overview: This Java code implements the K-Means clustering algorithm with the K++ initialization method. K-Means is an unsupervised machine learning algorithm used to partition a set of data points into K distinct clusters based on their similarity. K++ is a variant of the standard K-Means initialization that aims to select initial centroids that are well-separated, leading to better final clustering results.

Implementation Details:

1. Problem Setup:

  • The KMeansWithKpp class is defined to encapsulate the K-Means algorithm with K++ initialization.
  • The constructor sets up the input data points (points), the number of clusters (k), and creates an array for the centroids (centroids).

2. Initialization Functions:

  • distance() calculates the Euclidean distance between two points.
  • nearest() finds the nearest centroid to a given point.
  • nearestDistance() finds the distance to the nearest centroid.

3. K++ Initialization (kpp()):

  • The K++ initialization method is implemented in the kpp() function.
  • The first centroid is selected randomly.
  • Subsequent centroids are selected with a probability proportional to the square of their distance to the already-selected centroids.
  • This process ensures that the initial centroids are well-separated, improving the clustering quality.

4. K-Means Algorithm (kMeans()):

  • The kMeans() function performs the K-Means clustering algorithm.
  • It iteratively assigns each point to the nearest centroid, recomputes the centroid locations, and updates the point assignments.
  • The process continues until a certain maximum number of iterations or a convergence criterion is met.

5. Data Point Class (Point):

  • The Point class represents a single data point in the plane, with X and Y coordinates (x and y).
  • It also has a group field that tracks which cluster the point belongs to.
  • The class includes helper methods to generate random data points in 2D planes or polar coordinates.

How it Works:

1. K++ Initialization:

  • The algorithm starts by selecting the first centroid randomly.
  • It then iterates through the remaining data points and selects each subsequent centroid from the points that are farthest from the previously selected centroids.

2. K-Means Clustering:

  • After the initial centroids are selected, the algorithm assigns each data point to the nearest centroid.
  • It then recomputes the centroids as the average location of the points in each cluster.
  • The data points are assigned again, and this process repeats until the centroids no longer change significantly.

Usage: This code can be used to perform K-Means clustering on a given set of data points. You can create an instance of the KMeansWithKpp class, provide the data points, and specify the number of clusters. The kMeans() function can then be called to perform the clustering. The Point objects will have their group field set to indicate which cluster they belong to.

Source code in the java programming language

import java.util.Random;

public class KMeansWithKpp{
		// Variables Needed
		public Point[] points;
		public Point[] centroids;
		Random rand;
		public int n;
		public int k;

		// hide default constructor
		private KMeansWithKpp(){
		}

		KMeansWithKpp(Point[] p, int clusters){
				points = p;
				n = p.length;
				k = Math.max(1, clusters);
				centroids = new Point[k];
				rand = new Random();
		}


		private static double distance(Point a, Point b){
				return (a.x - b.x) * (a.x - b.x) + (a.y - b.y) * (a.y - b.y);
		}

		private static int nearest(Point pt, Point[] others, int len){
				double minD = Double.MAX_VALUE;
				int index = pt.group;
				len = Math.min(others.length, len);
				double dist;
				for (int i = 0; i < len; i++) {
						if (minD > (dist = distance(pt, others[i]))) {
								minD = dist;
								index = i;
						}
				}
				return index;
		}

		private static double nearestDistance(Point pt, Point[] others, int len){
				double minD = Double.MAX_VALUE;
				len = Math.min(others.length, len);
				double dist;
				for (int i = 0; i < len; i++) {
						if (minD > (dist = distance(pt, others[i]))) {
								minD = dist;
						}
				}
				return minD;
		}

		private void kpp(){
				centroids[0] = points[rand.nextInt(n)];
				double[] dist = new double[n];
				double sum = 0;
				for (int i = 1; i < k; i++) {
						for (int j = 0; j < n; j++) {
								dist[j] = nearestDistance(points[j], centroids, i);
								sum += dist[j];
						}
						sum = (sum * rand.nextInt(Integer.MAX_VALUE)) / Integer.MAX_VALUE;
						for (int j = 0; j < n; j++) {
								if ((sum -= dist[j]) > 0)
										continue;
								centroids[i].x = points[j].x;
								centroids[i].y = points[j].y;
						}
				}
				for (int i = 0; i < n; i++) {
						points[i].group = nearest(points[i], centroids, k);
				}
		}

		public void kMeans(int maxTimes){
				if (k == 1 || n <= 0) {
						return;
				}
				if(k >= n){
						for(int i =0; i < n; i++){
								points[i].group = i;
						}
						return;
				}
				maxTimes = Math.max(1, maxTimes);
				int changed;
				int bestPercent = n/1000;
				int minIndex;
				kpp();
				do {
						for (Point c : centroids) {
								c.x = 0.0;
								c.y = 0.0;
								c.group = 0;
						}
						for (Point pt : points) {
								if(pt.group < 0 || pt.group > centroids.length){
										pt.group = rand.nextInt(centroids.length);
								}
								centroids[pt.group].x += pt.x;
								centroids[pt.group].y = pt.y;
								centroids[pt.group].group++;
						}
						for (Point c : centroids) {
								c.x /= c.group;
								c.y /= c.group;
						}
						changed = 0;
						for (Point pt : points) {
								minIndex = nearest(pt, centroids, k);
								if (k != pt.group) {
										changed++;
										pt.group = minIndex;
								}
						}
						maxTimes--;
				} while (changed > bestPercent && maxTimes > 0);
		}
}


// A class for point(x,y) in plane

class Point{
		public double x;
		public double y;
		public int group;

		Point(){
				x = y = 0.0;
				group = 0;
		}

		/*
			Generates a random points on 2D Plane within given X-axis and Y-axis
		 */
		public Point[] getRandomPlaneData(double minX, double maxX, double minY, double maxY, int size){
				if (size <= 0)
						return null;
				double xdiff, ydiff;
				xdiff = maxX - minX;
				ydiff = maxY - minY;
				if (minX > maxX) {
						xdiff = minX - maxX;
						minX = maxX;
				}
				if (maxY < minY) {
						ydiff = minY - maxY;
						minY = maxY;
				}
				Point[] data = new Point[size];
				Random rand = new Random();
				for (int i = 0; i < size; i++) {
						data[i].x = minX + (xdiff * rand.nextInt(Integer.MAX_VALUE)) / Integer.MAX_VALUE;
						data[i].y = minY + (ydiff * rand.nextInt(Integer.MAX_VALUE)) / Integer.MAX_VALUE;
				}
				return data;
		}

		/*
	             Generate Random Polar Coordinates within given radius
		 */
		public Point[] getRandomPolarData(double radius, int size){
				if (size <= 0) {
						return null;
				}
				Point[] data = new Point[size];
				double radi, arg;
				Random rand = new Random();
				for (int i = 0; i < size; i++) {
						radi = (radius * rand.nextInt(Integer.MAX_VALUE)) / Integer.MAX_VALUE;
						arg = (2 * Math.PI * rand.nextInt(Integer.MAX_VALUE)) / Integer.MAX_VALUE;
						data[i].x = radi * Math.cos(arg);
						data[i].y = radi * Math.sin(arg);
				}
				return data;
		}
		
}


  

You may also check:How to resolve the algorithm Evolutionary algorithm step by step in the COBOL programming language
You may also check:How to resolve the algorithm Classes step by step in the Forth programming language
You may also check:How to resolve the algorithm Flipping bits game step by step in the MiniScript programming language
You may also check:How to resolve the algorithm Program name step by step in the Rust programming language
You may also check:How to resolve the algorithm URL decoding step by step in the Racket programming language