How to resolve the algorithm Jaro similarity step by step in the jq programming language
Problem Statement
The Jaro distance is a measure of edit distance between two strings; its complement, the Jaro similarity (one minus the distance), is a measure of two strings' similarity: the higher the value, the more similar the strings are. The score is normalized such that 0 means no similarity and 1 is an exact match.
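The Jaro similarity $d_j$ of two given strings $s_1$ and $s_2$ is
$$d_j = \begin{cases} 0 & \text{if } m = 0 \\ \frac{1}{3}\left(\frac{m}{|s_1|} + \frac{m}{|s_2|} + \frac{m - t}{m}\right) & \text{otherwise} \end{cases}$$
Where:
$m$ is the number of matching characters;
$t$ is half the number of transpositions.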
Two characters from $s_1$ and $s_2$ respectively, are considered matching only if they are the same and not farther apart than $\left\lfloor \frac{\max(|s_1|,|s_2|)}{2} \right\rfloor - 1$ characters. Each character of $s_1$ is compared with all its matching characters in $s_2$. Each difference in position is half a transposition; that is, the number of transpositions is half the number of characters which are common to the two strings but occupy different positions in each one.
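The matching window described above can be sketched in jq as follows. The helper name `match_window` is hypothetical and not part of the solution further down; it only isolates the floor(max/2) - 1 computation so it can be tried on its own with `jq -n`.

```jq
# Hypothetical helper (illustration only): the largest index offset at which
# two equal characters are still considered matching.
def match_window($s1; $s2):
  (([$s1, $s2 | length] | max) / 2 | floor) - 1;

# The longer string has 6 characters, so the window is floor(6/2) - 1.
match_window("DWAYNE"; "DUANE")
```

For DWAYNE and DUANE this yields 2, so a character at index i in one string may only be paired with an equal character at indices i-2 through i+2 in the other.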
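Given the strings $s_1$ DWAYNE and $s_2$ DUANE we find:
$m = 4$ (matching characters: D, A, N, E)
$|s_1| = 6$
$|s_2| = 5$
$t = 0$ (no matched characters are out of order)
We find a Jaro score of:
$$d_j = \frac{1}{3}\left(\frac{4}{6} + \frac{4}{5} + \frac{4 - 0}{4}\right) = 0.822$$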
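The arithmetic for this example can be checked with a small jq sketch. The helper `jaro_from_counts` and its parameter names are assumptions made for illustration. The counts for DWAYNE and DUANE (4 matching characters D, A, N, E, all in the same order, hence 0 transpositions) were worked out by hand with a matching window of 2.

```jq
# Hypothetical helper (illustration only): Jaro similarity from precomputed counts.
# $m = number of matching characters
# $t = half the number of matched characters that are out of order
def jaro_from_counts($m; $t; $len1; $len2):
  if $m == 0 then 0
  else ($m / $len1 + $m / $len2 + ($m - $t) / $m) / 3
  end;

# DWAYNE (length 6) vs DUANE (length 5): m = 4, t = 0.
# Truncate to three decimal places, as the solution below does.
jaro_from_counts(4; 0; 6; 5) * 1000 | floor / 1000
```

Run with `jq -n`, this should print 0.822, since (4/6 + 4/5 + 1) / 3 is approximately 0.8222.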
Implement the Jaro algorithm and show the similarity scores for each of the following pairs:
("MARTHA", "MARHTA")
("DIXON", "DICKSONX")
("JELLYFISH", "SMELLYFISH")
Let's start with the solution:
Step-by-step solution: Jaro similarity in the jq programming language
Source code in the jq programming language
# Jaro similarity of two strings: 0 means no similarity, 1 an exact match.
def jaro($s1; $s2):
  ($s1|length) as $le1
  | ($s2|length) as $le2
  | if $le1 == 0 and $le2 == 0 then 1
    elif $le1 == 0 or $le2 == 0 then 0
    else
      # Largest index offset at which equal characters still count as matching
      ((((if $le2 > $le1 then $le2 else $le1 end) / 2) | floor) - 1) as $dist
      | {matches: 0, matches1: [], matches2: [], transpos: 0 }
      # Pass 1: pair each character of $s1 with the first unused equal
      # character of $s2 inside its window
      | reduce range(0; $le1) as $i (.;
          (($i - $dist) | if . < 0 then 0 else . end) as $start
          | (($i + $dist + 1) | if . > $le2 then $le2 else . end) as $stop
          | .k = $start
          | until(.k >= $stop;
              if (.matches2[.k] or $s1[$i:$i+1] != $s2[.k:.k+1])|not
              then .matches1[$i] = true
                | .matches2[.k] = true
                | .matches += 1
                | .k = $stop        # leave the loop
              else .k += 1
              end) )
      | if .matches == 0 then 0
        else .k = 0
          # Pass 2: walk the matched characters of both strings in order
          # and count the positions where they differ
          | reduce range(0; $le1) as $i (.;
              if .matches1[$i]
              then until(.k >= $le2 or .matches2[.k]; .k += 1)
                | if .k < $le2 and ($s1[$i:$i+1] != $s2[.k:.k+1]) then .transpos += 1 else . end
                | .k += 1
              else .
              end )
          | .transpos /= 2
          | (.matches/$le1 + .matches/$le2 + ((.matches - .transpos)/.matches)) / 3
        end
    end ;

def task:
  [["MARTHA","MARHTA"],
   ["DIXON", "DICKSONX"],
   ["JELLYFISH","SMELLYFISH"],
   ["ABC","DEF"]][]
  # Truncate each score to three decimal places
  | (jaro(.[0]; .[1]) * 1000 | floor / 1000) as $d
  | "jaro(\(.[0]); \(.[1])) => \($d)";

task
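The solution compares single characters with the slice expression `$s[$i:$i+1]`, which in jq returns a one-character string when applied to a string. A minimal sketch of that idiom, using a hypothetical helper `char_at` that does not appear in the solution:

```jq
# Hypothetical helper (illustration only): the character at index $i of the
# input string, returned as a one-character string.
def char_at($i): .[$i:$i+1];

# Indices 3 and 4 of MARTHA are the pair that is swapped in MARHTA.
"MARTHA" | [char_at(3), char_at(4)]
```

Run with `jq -n -c`, this should print `["T","H"]`. Slicing past the end of a string yields an empty string rather than an error, which is why the solution can compare slices without extra bounds checks.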