How to resolve the algorithm Jaro-Winkler distance step by step in the Elm programming language
How to resolve the algorithm Jaro-Winkler distance step by step in the Elm programming language
Table of Contents
Problem Statement
The Jaro-Winkler distance is a metric for measuring the edit distance between words. It is similar to the more basic Levenstein distance but the Jaro distance also accounts for transpositions between letters in the words. With the Winkler modification to the Jaro metric, the Jaro-Winkler distance also adds an increase in similarity for words which start with the same letters (prefix). The Jaro-Winkler distance is a modification of the Jaro similarity metric, which measures the similarity between two strings. The Jaro similarity is 1.0 when strings are identical and 0 when strings have no letters in common. Distance measures such as the Jaro distance or Jaro-Winkler distance, on the other hand, are 0 when strings are identical and 1 when they have no letters in common. The Jaro similarity between two strings s1 and s2, simj, is defined as Where:
The Winkler modification to Jaro is to check for identical prefixes of the strings. If we define the number of initial (prefix) characters in common as: l = the length of a common prefix between strings, up to 4 characters and, additionally, select a multiplier (Winkler suggested 0.1) for the relative importance of the prefix for the word similarity: p = 0.1 The Jaro-Winkler similarity can then be defined as simw = simj + lp(1 - simj) Where: Winkler suggested this be 0.1.
The Jaro-Winkler distance between strings, which is 0.0 for identical strings, is then defined as dw = 1 - simw String metrics such as Jaro-Winkler distance are useful in applications such as spelling checkers, because letter transpositions are common typing errors and humans tend to misspell the middle portions of words more often than their beginnings. This may help a spelling checker program to generate better alternatives for misspelled word replacement. Using a dictionary of your choice and the following list of 9 commonly misspelled words: "accomodate", "definately", "goverment", "occured", "publically", "recieve", "seperate", "untill", "wich"
Let's start with the solution:
Step by Step solution about How to resolve the algorithm Jaro-Winkler distance step by step in the Elm programming language
Source code in the elm programming language
module JaroWinkler exposing (similarity)
commonPrefixLength : List a -> List a -> Int -> Int
commonPrefixLength xs ys counter =
case ( xs, ys ) of
( x :: xs_, y :: ys_ ) ->
if x == y then
commonPrefixLength xs_ ys_ (counter + 1)
else
counter
_ ->
counter
similarity : String -> String -> Float
similarity s1 s2 =
let
chars1 =
String.toList s1
chars2 =
String.toList s2
jaroScore =
jaro chars1 chars2
l =
toFloat <| min (commonPrefixLength chars1 chars2 0) 4
p =
0.1
in
jaroScore + (l * p * (1.0 - jaroScore))
containtsInNextN : Int -> a -> List a -> Bool
containtsInNextN i a items =
case ( i, items ) of
( 0, _ ) ->
False
( _, [] ) ->
False
( _, item :: rest ) ->
if item == a then
True
else
containtsInNextN (i - 1) a rest
exists : Int -> Int -> List a -> a -> Bool
exists startAt endAt items i =
if endAt < startAt then
False
else if startAt == 0 then
case items of
first :: rest ->
if i == first then
True
else
exists 0 (endAt - 1) rest i
[] ->
False
else
exists 0 (endAt - startAt) (List.drop startAt items) i
existsInWindow : a -> List a -> Int -> Int -> Bool
existsInWindow item items offset radius =
let
startAt =
max 0 (offset - radius)
endAt =
min (offset + radius) (List.length items - 1)
in
exists startAt endAt items item
transpositions : List a -> List a -> Int -> Int
transpositions xs ys counter =
case ( xs, ys ) of
( [], _ ) ->
counter
( _, [] ) ->
counter
( x :: xs_, y :: ys_ ) ->
if x /= y then
transpositions xs_ ys_ (counter + 1)
else
transpositions xs_ ys_ counter
commonItems : List a -> List a -> Int -> List a
commonItems items1 items2 radius =
items1
|> List.indexedMap
(\index item ->
if existsInWindow item items2 index radius then
[ item ]
else
[]
)
|> List.concat
jaro : List Char -> List Char -> Float
jaro chars1 chars2 =
let
minLenth =
min (List.length chars1) (List.length chars2)
matchRadius =
minLenth // 2 + (minLenth |> modBy 2)
c1 =
commonItems chars1 chars2 matchRadius
c2 =
commonItems chars2 chars1 matchRadius
c1length =
toFloat (List.length c1)
c2length =
toFloat (List.length c2)
mismatches =
transpositions c1 c2 0
transpositionScore =
(toFloat mismatches + abs (c1length - c2length)) / 2.0
s1length =
toFloat (List.length chars1)
s2length =
toFloat (List.length chars2)
tLength =
max c1length c2length
result =
(c1length / s1length + c2length / s2length + (tLength - transpositionScore) / tLength) / 3.0
in
if isNaN result then
0.0
else
result
You may also check:How to resolve the algorithm Colour bars/Display step by step in the Raku programming language
You may also check:How to resolve the algorithm Generate random chess position step by step in the REXX programming language
You may also check:How to resolve the algorithm Palindrome detection step by step in the Mathematica/Wolfram Language programming language
You may also check:How to resolve the algorithm Elementary cellular automaton/Random number generator step by step in the Pascal programming language
You may also check:How to resolve the algorithm Forest fire step by step in the C programming language