How to resolve the algorithm Jaro similarity step by step in the CLU programming language
How to resolve the algorithm Jaro similarity step by step in the CLU programming language
Table of Contents
Problem Statement
The Jaro distance is a measure of edit distance between two strings; its inverse, called the Jaro similarity, is a measure of two strings' similarity: the higher the value, the more similar the strings are. The score is normalized such that 0 equates to no similarities and 1 is an exact match.
The Jaro similarity
d
j
{\displaystyle d_{j}}
of two given strings
s
1
{\displaystyle s_{1}}
and
s
2
{\displaystyle s_{2}}
is Where:
Two characters from
s
1
{\displaystyle s_{1}}
and
s
2
{\displaystyle s_{2}}
respectively, are considered matching only if they are the same and not farther apart than
⌊
max (
|
s
1
|
,
|
s
2
|
)
2
⌋
− 1
{\displaystyle \left\lfloor {\frac {\max(|s_{1}|,|s_{2}|)}{2}}\right\rfloor -1}
characters. Each character of
s
1
{\displaystyle s_{1}}
is compared with all its matching characters in
s
2
{\displaystyle s_{2}}
. Each difference in position is half a transposition; that is, the number of transpositions is half the number of characters which are common to the two strings but occupy different positions in each one.
Given the strings
s
1
{\displaystyle s_{1}}
DWAYNE and
s
2
{\displaystyle s_{2}}
DUANE we find:
We find a Jaro score of:
Implement the Jaro algorithm and show the similarity scores for each of the following pairs:
Let's start with the solution:
Step by Step solution about How to resolve the algorithm Jaro similarity step by step in the CLU programming language
Source code in the clu programming language
max = proc [T: type] (a, b: T) returns (T)
where T has lt: proctype (T,T) returns (bool)
if a
end max
min = proc [T: type] (a, b: T) returns (T)
where T has lt: proctype (T,T) returns (bool)
if a
end min
jaro = proc (s1, s2: string) returns (real)
s1_len: int := string$size(s1)
s2_len: int := string$size(s2)
if s1_len = 0 & s2_len = 0 then return(1.0)
elseif s1_len = 0 | s2_len = 0 then return(0.0)
end
dist: int := max[int](s1_len, s2_len)/2 - 1
s1_match: array[bool] := array[bool]$fill(1,s1_len,false)
s2_match: array[bool] := array[bool]$fill(1,s2_len,false)
matches: real := 0.0
transpositions: real := 0.0
for i: int in int$from_to(1, s1_len) do
start: int := max[int](1, i-dist)
end_: int := min[int](i+dist, s2_len)
for k: int in int$from_to(start, end_) do
if s2_match[k] then continue end
if s1[i] ~= s2[k] then continue end
s1_match[i] := true
s2_match[k] := true
matches := matches + 1.0
break
end
end
if matches=0.0 then return(0.0) end
k: int := 1
for i: int in int$from_to(1, s1_len) do
if ~s1_match[i] then continue end
while ~s2_match[k] do k := k + 1 end
if s1[i] ~= s2[k] then
transpositions := transpositions + 1.0
end
k := k+1
end
transpositions := transpositions / 2.0
return( ((matches / real$i2r(s1_len)) +
(matches / real$i2r(s2_len)) +
((matches - transpositions) / matches)) / 3.0)
end jaro
start_up = proc ()
po: stream := stream$primary_output()
stream$putl(po, f_form(jaro("MARTHA", "MARHTA"), 1, 6))
stream$putl(po, f_form(jaro("DIXON", "DICKSONX"), 1, 6))
stream$putl(po, f_form(jaro("JELLYFISH", "SMELLYFISH"), 1, 6))
end start_up
You may also check:How to resolve the algorithm Bulls and cows step by step in the Seed7 programming language
You may also check:How to resolve the algorithm Tree traversal step by step in the UNIX Shell programming language
You may also check:How to resolve the algorithm Fractal tree step by step in the Rust programming language
You may also check:How to resolve the algorithm Dynamic variable names step by step in the Z80 Assembly programming language
You may also check:How to resolve the algorithm Comments step by step in the Slate programming language