How to resolve the algorithm Levenshtein distance step by step in the C++ programming language
How to resolve the algorithm Levenshtein distance step by step in the C++ programming language
Table of Contents
Problem Statement
In information theory and computer science, the Levenshtein distance is a metric for measuring the amount of difference between two sequences (i.e. an edit distance). The Levenshtein distance between two strings is defined as the minimum number of edits needed to transform one string into the other, with the allowable edit operations being insertion, deletion, or substitution of a single character.
The Levenshtein distance between "kitten" and "sitting" is 3, since the following three edits change one into the other, and there isn't a way to do it with fewer than three edits:
The Levenshtein distance between "rosettacode", "raisethysword" is 8. The distance between two strings is same as that when both strings are reversed.
Implements a Levenshtein distance function, or uses a library function, to show the Levenshtein distance between "kitten" and "sitting".
Let's start with the solution:
Step by Step solution about How to resolve the algorithm Levenshtein distance step by step in the C++ programming language
This code is a C++ implementation of the Levenshtein distance algorithm, which calculates the minimum number of single-character edits (insertions, deletions, or substitutions) required to transform one string into another. The algorithm is named after Vladimir Levenshtein, who first published it in 1965.
The first code block is a naive implementation of the algorithm, which uses a dynamic programming approach to calculate the minimum number of edits required to transform one string into another. The algorithm starts by creating a matrix of size (m+1) x (n+1)
, where m
and n
are the lengths of the two input strings. The matrix is initialized with the number of insertions or deletions required to transform an empty string into the first or second string, respectively.
The algorithm then iterates over the two input strings, and for each character in the first string, it iterates over the characters in the second string. For each pair of characters, the algorithm calculates the minimum number of edits required to transform the substring of the first string up to the current character into the substring of the second string up to the current character. The minimum number of edits is calculated by taking the minimum of the following three values:
- The minimum number of edits required to transform the substring of the first string up to the previous character into the substring of the second string up to the current character, plus one (for an insertion)
- The minimum number of edits required to transform the substring of the first string up to the current character into the substring of the second string up to the previous character, plus one (for a deletion)
- The minimum number of edits required to transform the substring of the first string up to the previous character into the substring of the second string up to the previous character, plus one (for a substitution)
The algorithm stores the minimum number of edits required to transform each substring of the first string into each substring of the second string in the matrix. The final value in the matrix is the minimum number of edits required to transform the entire first string into the entire second string.
The second code block is a more efficient implementation of the Levenshtein distance algorithm, which uses a rolling array to calculate the minimum number of edits required to transform one string into another. The rolling array is a one-dimensional array of size n+1
, where n
is the length of the second string. The array stores the minimum number of edits required to transform the substring of the first string up to the current character into each substring of the second string.
The algorithm starts by initializing the rolling array with the number of insertions or deletions required to transform an empty string into the first or second string, respectively. The algorithm then iterates over the two input strings, and for each character in the first string, it updates the rolling array by calculating the minimum number of edits required to transform the substring of the first string up to the current character into each substring of the second string. The minimum number of edits is calculated in the same way as in the naive implementation.
The final value in the rolling array is the minimum number of edits required to transform the entire first string into the entire second string.
The Levenshtein distance algorithm is a versatile tool that can be used for a variety of tasks, including spell checking, text differencing, and machine translation.
Source code in the cpp programming language
#include <string>
#include <iostream>
using namespace std;
// Compute Levenshtein Distance
// Martin Ettl, 2012-10-05
size_t uiLevenshteinDistance(const string &s1, const string &s2)
{
const size_t
m(s1.size()),
n(s2.size());
if( m==0 ) return n;
if( n==0 ) return m;
// allocation below is not ISO-compliant,
// it won't work with -pedantic-errors.
size_t costs[n + 1];
for( size_t k=0; k<=n; k++ ) costs[k] = k;
size_t i { 0 };
for (char const &c1 : s1)
{
costs[0] = i+1;
size_t corner { i },
j { 0 };
for (char const &c2 : s2)
{
size_t upper { costs[j+1] };
if( c1 == c2 ) costs[j+1] = corner;
else {
size_t t(upper<corner? upper: corner);
costs[j+1] = (costs[j]<t?costs[j]:t)+1;
}
corner = upper;
j++;
}
i++;
}
return costs[n];
}
int main()
{
string s0 { "rosettacode" },
s1 { "raisethysword" };
cout << "distance between " << s0 << " and " << s1 << " : "
<< uiLevenshteinDistance(s0,s1) << endl;
return 0;
}
#include <algorithm>
#include <iostream>
#include <numeric>
#include <string>
#include <vector>
template <typename StringType>
size_t levenshtein_distance(const StringType& s1, const StringType& s2) {
const size_t m = s1.size();
const size_t n = s2.size();
if (m == 0)
return n;
if (n == 0)
return m;
std::vector<size_t> costs(n + 1);
std::iota(costs.begin(), costs.end(), 0);
size_t i = 0;
for (auto c1 : s1) {
costs[0] = i + 1;
size_t corner = i;
size_t j = 0;
for (auto c2 : s2) {
size_t upper = costs[j + 1];
costs[j + 1] = (c1 == c2) ? corner
: 1 + std::min(std::min(upper, corner), costs[j]);
corner = upper;
++j;
}
++i;
}
return costs[n];
}
int main() {
std::wstring s0 = L"rosettacode";
std::wstring s1 = L"raisethysword";
std::wcout << L"distance between " << s0 << L" and " << s1 << L": "
<< levenshtein_distance(s0, s1) << std::endl;
return 0;
}
You may also check:How to resolve the algorithm Loops/While step by step in the XLISP programming language
You may also check:How to resolve the algorithm Compile-time calculation step by step in the Ada programming language
You may also check:How to resolve the algorithm Average loop length step by step in the Liberty BASIC programming language
You may also check:How to resolve the algorithm Topological sort step by step in the M2000 Interpreter programming language
You may also check:How to resolve the algorithm Zeckendorf arithmetic step by step in the Perl programming language