How to resolve the algorithm Levenshtein distance step by step in the Java programming language
How to resolve the algorithm Levenshtein distance step by step in the Java programming language
Table of Contents
Problem Statement
In information theory and computer science, the Levenshtein distance is a metric for measuring the amount of difference between two sequences (i.e. an edit distance). The Levenshtein distance between two strings is defined as the minimum number of edits needed to transform one string into the other, with the allowable edit operations being insertion, deletion, or substitution of a single character.
The Levenshtein distance between "kitten" and "sitting" is 3, since the following three edits change one into the other, and there isn't a way to do it with fewer than three edits:
The Levenshtein distance between "rosettacode", "raisethysword" is 8. The distance between two strings is same as that when both strings are reversed.
Implements a Levenshtein distance function, or uses a library function, to show the Levenshtein distance between "kitten" and "sitting".
Let's start with the solution:
Step by Step solution about How to resolve the algorithm Levenshtein distance step by step in the Java programming language
Levenshtein Distance
The Levenshtein distance is a metric for measuring the difference between two strings. It is commonly used to measure the similarity of two strings, and is often used in applications such as spell checking and plagiarism detection.
The Levenshtein distance between two strings is the minimum number of edits (insertions, deletions, or substitutions) that are required to transform one string into the other.
Implementation
The first implementation of the Levenshtein distance calculates the distance using a dynamic programming approach. It creates a table of costs, where the cost of transforming the first i characters of the first string into the first j characters of the second string is stored in the cell (i, j) of the table. The table is filled in row by row, and the cost of each cell is calculated as the minimum of the costs of the three adjacent cells (the cell to the left, the cell above, and the cell diagonally above).
The second implementation of the Levenshtein distance uses a recursive approach. It compares the first characters of the two strings, and if they are equal, it recursively calculates the distance between the rest of the two strings. If the characters are not equal, it considers three possibilities: inserting a character into the first string, deleting a character from the first string, or substituting a character in the first string. It recursively calculates the distance for each of these possibilities, and returns the minimum of the three distances.
The third implementation of the Levenshtein distance uses an iterative approach. It creates an array of costs, where the cost of transforming the first i characters of the first string into the first j characters of the second string is stored in the cell (i, j) of the array. The array is filled in column by column, and the cost of each cell is calculated as the minimum of the costs of the three adjacent cells (the cell to the left, the cell above, and the cell diagonally above).
The output of the example is:
The Levenshtein distance between kitten and sitting is 3
This indicates that the two strings are very similar, and that only three edits are required to transform one string into the other.
Source code in the java programming language
public class Levenshtein {
public static int distance(String a, String b) {
a = a.toLowerCase();
b = b.toLowerCase();
// i == 0
int [] costs = new int [b.length() + 1];
for (int j = 0; j < costs.length; j++)
costs[j] = j;
for (int i = 1; i <= a.length(); i++) {
// j == 0; nw = lev(i - 1, j)
costs[0] = i;
int nw = i - 1;
for (int j = 1; j <= b.length(); j++) {
int cj = Math.min(1 + Math.min(costs[j], costs[j - 1]), a.charAt(i - 1) == b.charAt(j - 1) ? nw : nw + 1);
nw = costs[j];
costs[j] = cj;
}
}
return costs[b.length()];
}
public static void main(String [] args) {
String [] data = { "kitten", "sitting", "saturday", "sunday", "rosettacode", "raisethysword" };
for (int i = 0; i < data.length; i += 2)
System.out.println("distance(" + data[i] + ", " + data[i+1] + ") = " + distance(data[i], data[i+1]));
}
}
public class Levenshtein{
public static int levenshtein(String s, String t){
/* if either string is empty, difference is inserting all chars
* from the other
*/
if(s.length() == 0) return t.length();
if(t.length() == 0) return s.length();
/* if first letters are the same, the difference is whatever is
* required to edit the rest of the strings
*/
if(s.charAt(0) == t.charAt(0))
return levenshtein(s.substring(1), t.substring(1));
/* else try:
* changing first letter of s to that of t,
* remove first letter of s, or
* remove first letter of t
*/
int a = levenshtein(s.substring(1), t.substring(1));
int b = levenshtein(s, t.substring(1));
int c = levenshtein(s.substring(1), t);
if(a > b) a = b;
if(a > c) a = c;
//any of which is 1 edit plus editing the rest of the strings
return a + 1;
}
public static void main(String[] args) {
String s1 = "kitten";
String s2 = "sitting";
System.out.println("distance between '" + s1 + "' and '"
+ s2 + "': " + levenshtein(s1, s2));
s1 = "rosettacode";
s2 = "raisethysword";
System.out.println("distance between '" + s1 + "' and '"
+ s2 + "': " + levenshtein(s1, s2));
StringBuilder sb1 = new StringBuilder(s1);
StringBuilder sb2 = new StringBuilder(s2);
System.out.println("distance between '" + sb1.reverse() + "' and '"
+ sb2.reverse() + "': "
+ levenshtein(sb1.reverse().toString(), sb2.reverse().toString()));
}
}
import static java.lang.Math.abs;
import static java.lang.Math.max;
public class Levenshtein {
public static int ld(String a, String b) {
return distance(a, b, -1);
}
public static boolean ld(String a, String b, int max) {
return distance(a, b, max) <= max;
}
private static int distance(String a, String b, int max) {
if (a == b) return 0;
int la = a.length();
int lb = b.length();
if (max >= 0 && abs(la - lb) > max) return max+1;
if (la == 0) return lb;
if (lb == 0) return la;
if (la < lb) {
int tl = la; la = lb; lb = tl;
String ts = a; a = b; b = ts;
}
int[] cost = new int[lb+1];
for (int i=0; i<=lb; i+=1) {
cost[i] = i;
}
for (int i=1; i<=la; i+=1) {
cost[0] = i;
int prv = i-1;
int min = prv;
for (int j=1; j<=lb; j+=1) {
int act = prv + (a.charAt(i-1) == b.charAt(j-1) ? 0 : 1);
cost[j] = min(1+(prv=cost[j]), 1+cost[j-1], act);
if (prv < min) min = prv;
}
if (max >= 0 && min > max) return max+1;
}
if (max >= 0 && cost[lb] > max) return max+1;
return cost[lb];
}
private static int min(int ... a) {
int min = Integer.MAX_VALUE;
for (int i: a) if (i<min) min = i;
return min;
}
public static void main(String[] args) {
System.out.println(
ld("kitten","kitten") + " " + // 0
ld("kitten","sitten") + " " + // 1
ld("kitten","sittes") + " " + // 2
ld("kitten","sityteng") + " " + // 3
ld("kitten","sittYing") + " " + // 4
ld("rosettacode","raisethysword") + " " + // 8
ld("kitten","kittenaaaaaaaaaaaaaaaaa") + " " + // 17
ld("kittenaaaaaaaaaaaaaaaaa","kitten") // 17
);
System.out.println(
ld("kitten","kitten", 3) + " " + // true
ld("kitten","sitten", 3) + " " + // true
ld("kitten","sittes", 3) + " " + // true
ld("kitten","sityteng", 3) + " " + // true
ld("kitten","sittYing", 3) + " " + // false
ld("rosettacode","raisethysword", 3) + " " + // false
ld("kitten","kittenaaaaaaaaaaaaaaaaa", 3) + " " + // false
ld("kittenaaaaaaaaaaaaaaaaa","kitten", 3) // false
);
}
}
You may also check:How to resolve the algorithm Hello world/Newline omission step by step in the XPL0 programming language
You may also check:How to resolve the algorithm Strip block comments step by step in the Ruby programming language
You may also check:How to resolve the algorithm Globally replace text in several files step by step in the Arturo programming language
You may also check:How to resolve the algorithm Set consolidation step by step in the Go programming language
You may also check:How to resolve the algorithm Determine if a string is numeric step by step in the Lua programming language