How to compute the Levenshtein distance step by step in the AWK programming language

Published on 12 May 2024 09:40 PM

Problem Statement

In information theory and computer science, the Levenshtein distance is a metric for measuring the amount of difference between two sequences (i.e. an edit distance). The Levenshtein distance between two strings is defined as the minimum number of edits needed to transform one string into the other, with the allowable edit operations being insertion, deletion, or substitution of a single character.

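The Levenshtein distance between "kitten" and "sitting" is 3, since the following three edits change one into the other, and there is no way to do it with fewer than three edits:

1. kitten -> sitten (substitution of "s" for "k")
2. sitten -> sittin (substitution of "i" for "e")
3. sittin -> sitting (insertion of "g" at the end)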

The Levenshtein distance between "rosettacode" and "raisethysword" is 8. The distance between two strings is the same as the distance between the two strings reversed.
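As a quick check of the two claims above, the following sketch computes the distance for "rosettacode" and "raisethysword", then again for both strings reversed. The helper names `lev` and `reverse` are hypothetical, chosen only for this illustration, and the full-table dynamic programming used here is one common way to compute the distance, not necessarily the approach taken by the solutions in this article.

```awk
#!/usr/bin/awk -f

# Reverse a string one character at a time.
function reverse(s,    i, r) {
    r = ""
    for (i = length(s); i >= 1; i--) r = r substr(s, i, 1)
    return r
}

# Full-table dynamic programming: d[i, j] is the distance between
# the first i characters of s and the first j characters of t.
function lev(s, t,    m, n, i, j, d, cost, best) {
    m = length(s)
    n = length(t)
    for (i = 0; i <= m; i++) d[i, 0] = i
    for (j = 0; j <= n; j++) d[0, j] = j
    for (i = 1; i <= m; i++) {
        for (j = 1; j <= n; j++) {
            # cost is 0 when the characters match, 1 otherwise.
            cost = (substr(s, i, 1) != substr(t, j, 1))
            best = d[i-1, j] + 1                                      # deletion
            if (d[i, j-1] + 1 < best) best = d[i, j-1] + 1            # insertion
            if (d[i-1, j-1] + cost < best) best = d[i-1, j-1] + cost  # substitution or match
            d[i, j] = best
        }
    }
    return d[m, n]
}

BEGIN {
    a = "rosettacode"
    b = "raisethysword"
    printf("%s / %s: %d\n", a, b, lev(a, b))
    printf("%s / %s: %d\n", reverse(a), reverse(b), lev(reverse(a), reverse(b)))
}
```

Both lines should report a distance of 8.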

Implement a Levenshtein distance function, or use a library function, to show the Levenshtein distance between "kitten" and "sitting".

Let's start with the solution:

Step-by-step solution: Levenshtein distance in the AWK programming language

Source code in the AWK programming language

#!/usr/bin/awk -f

BEGIN {
    a = "kitten";
    b = "sitting";
    d = levenshteinDistance(a, b);
    p = d == 1 ? "" : "s";
    printf("%s -> %s after %d edit%s\n", a, b, d, p);
    exit;
}

function levenshteinDistance(s1, s2,
    s1First, s2First, s1Rest, s2Rest,
    distA, distB, distC, minDist) {

    # If either string is empty,
    # then distance is insertion of the other's characters.
    if (length(s1) == 0) return length(s2);
    if (length(s2) == 0) return length(s1);

    # Rest of process uses first characters 
    # and remainder of each string.
    s1First = substr(s1, 1, 1);
    s2First = substr(s2, 1, 1);
    s1Rest = substr(s1, 2, length(s1));
    s2Rest = substr(s2, 2, length(s2));

    # If leading characters are the same, 
    # then distance is that between the rest of the strings.
    if (s1First == s2First) {
        return levenshteinDistance(s1Rest, s2Rest);
    }

    # Find the distances between sub strings.
    distA = levenshteinDistance(s1Rest, s2);
    distB = levenshteinDistance(s1, s2Rest);
    distC = levenshteinDistance(s1Rest, s2Rest);

    # Return the minimum distance between substrings.    
    minDist = distA;
    if (distB < minDist) minDist = distB;
    if (distC < minDist) minDist = distC;
    return minDist + 1; # Include change for the first character.
}
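The recursive function above can revisit the same pair of remaining strings many times, which becomes slow for longer inputs. A minimal sketch of one possible remedy stores each result the first time it is computed. The function name `levMemo` and the global cache array `memo` are hypothetical names introduced for this illustration, and the sketch is a variation on the code above, not part of the original solution.

```awk
#!/usr/bin/awk -f

# Same recursion as the version above, with results cached in the
# global array memo (hypothetical name), keyed by the pair of strings.
function levMemo(s1, s2,    key, s1Rest, s2Rest, distA, distB, distC, minDist) {
    # If either string is empty, the distance is the length of the other.
    if (length(s1) == 0) return length(s2)
    if (length(s2) == 0) return length(s1)

    # Reuse a result computed earlier for this exact pair.
    key = s1 SUBSEP s2
    if (key in memo) return memo[key]

    s1Rest = substr(s1, 2)
    s2Rest = substr(s2, 2)

    if (substr(s1, 1, 1) == substr(s2, 1, 1)) {
        # Equal leading characters cost nothing.
        minDist = levMemo(s1Rest, s2Rest)
    } else {
        distA = levMemo(s1Rest, s2)
        distB = levMemo(s1, s2Rest)
        distC = levMemo(s1Rest, s2Rest)
        minDist = distA
        if (distB < minDist) minDist = distB
        if (distC < minDist) minDist = distC
        minDist++    # Count the edit for the first character.
    }

    memo[key] = minDist
    return minDist
}

BEGIN {
    printf("%d\n", levMemo("rosettacode", "raisethysword"))
}
```

The cache is keyed by the string contents, so it stays valid across calls with different inputs; the trade-off is the extra memory used by `memo`.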


#!/usr/bin/awk -f

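# Iterative version: trims any common prefix and suffix, then fills the
# distance table keeping only two rows. Names after str2 are local variables.
function levdist(str1, str2,	l1, l2, tog, arr, i, j, a, b, c) {
	if (str1 == str2) {
		return 0
	} else if (str1 == "" || str2 == "") {
		# One string is empty: the distance is the length of the other.
		return length(str1 str2)
	} else if (substr(str1, 1, 1) == substr(str2, 1, 1)) {
		# Skip the common prefix; it contributes no edits.
		a = 2
		while (substr(str1, a, 1) == substr(str2, a, 1)) a++
		return levdist(substr(str1, a), substr(str2, a))
	} else if (substr(str1, l1=length(str1), 1) == substr(str2, l2=length(str2), 1)) {
		# Skip the common suffix (l1 and l2 are set in the test above).
		b = 1
		while (substr(str1, l1-b, 1) == substr(str2, l2-b, 1)) b++
		return levdist(substr(str1, 1, l1-b), substr(str2, 1, l2-b))
	}
	# Row 0: distance from the empty prefix of str1 to each prefix of str2.
	for (i = 0; i <= l2; i++) arr[0, i] = i
	for (i = 1; i <= l1; i++) {
		# tog flips between 1 and 0, so only two rows are ever stored.
		arr[tog = ! tog, 0] = i
		for (j = 1; j <= l2; j++) {
			a = arr[! tog, j  ] + 1	# deletion
			b = arr[  tog, j-1] + 1	# insertion
			c = arr[! tog, j-1] + (substr(str1, i, 1) != substr(str2, j, 1))	# substitution or match
			arr[tog, j] = (((a<=b)&&(a<=c)) ? a : ((b<=a)&&(b<=c)) ? b : c)
		}
	}
	return arr[tog, j-1]
}

# Small driver so the script prints a result when run.
BEGIN {
	print levdist("kitten", "sitting")
}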
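To make the row-toggling idea easier to follow, here is a minimal sketch of the same two-row dynamic programming written with separate `prev` and `cur` arrays and without the prefix and suffix trimming. The names `levRows`, `prev`, and `cur` are assumptions made for this illustration; the sketch trades a little extra copying for readability.

```awk
#!/usr/bin/awk -f

# Two-row dynamic programming: prev holds row i-1 of the distance
# table and cur holds row i. All names here are hypothetical.
function levRows(s, t,    m, n, i, j, prev, cur, cost, best) {
    m = length(s)
    n = length(t)
    # Row 0: distance from the empty string to each prefix of t.
    for (j = 0; j <= n; j++) prev[j] = j
    for (i = 1; i <= m; i++) {
        cur[0] = i    # distance from the first i characters of s to ""
        for (j = 1; j <= n; j++) {
            cost = (substr(s, i, 1) != substr(t, j, 1))
            best = prev[j] + 1                                    # deletion
            if (cur[j-1] + 1 < best) best = cur[j-1] + 1          # insertion
            if (prev[j-1] + cost < best) best = prev[j-1] + cost  # substitution or match
            cur[j] = best
        }
        # Row i becomes the previous row for the next iteration.
        for (j = 0; j <= n; j++) prev[j] = cur[j]
    }
    return prev[n]
}

BEGIN {
    a = "kitten"
    b = "sitting"
    d = levRows(a, b)
    p = (d == 1) ? "" : "s"
    printf("%s -> %s after %d edit%s\n", a, b, d, p)
}
```

Run as a script, this prints `kitten -> sitting after 3 edits`.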


  
