How to resolve the algorithm Jaro-Winkler distance step by step in the Elm programming language

Published on 12 May 2024 09:40 PM

How to resolve the algorithm Jaro-Winkler distance step by step in the Elm programming language

Table of Contents

Problem Statement

The Jaro-Winkler distance is a metric for measuring the edit distance between words. It is similar to the more basic Levenstein distance but the Jaro distance also accounts for transpositions between letters in the words. With the Winkler modification to the Jaro metric, the Jaro-Winkler distance also adds an increase in similarity for words which start with the same letters (prefix). The Jaro-Winkler distance is a modification of the Jaro similarity metric, which measures the similarity between two strings. The Jaro similarity is 1.0 when strings are identical and 0 when strings have no letters in common. Distance measures such as the Jaro distance or Jaro-Winkler distance, on the other hand, are 0 when strings are identical and 1 when they have no letters in common. The Jaro similarity between two strings s1 and s2, simj, is defined as Where:

The Winkler modification to Jaro is to check for identical prefixes of the strings. If we define the number of initial (prefix) characters in common as: l = the length of a common prefix between strings, up to 4 characters and, additionally, select a multiplier (Winkler suggested 0.1) for the relative importance of the prefix for the word similarity: p   =   0.1 The Jaro-Winkler similarity can then be defined as simw = simj + lp(1 - simj) Where: Winkler suggested this be 0.1.

The Jaro-Winkler distance between strings, which is 0.0 for identical strings, is then defined as dw = 1 - simw String metrics such as Jaro-Winkler distance are useful in applications such as spelling checkers, because letter transpositions are common typing errors and humans tend to misspell the middle portions of words more often than their beginnings. This may help a spelling checker program to generate better alternatives for misspelled word replacement. Using a dictionary of your choice and the following list of 9 commonly misspelled words: "accomodate", "definately", "goverment​", "occured", "publically", "recieve​", "seperate", "untill", "wich​"

Let's start with the solution:

Step by Step solution about How to resolve the algorithm Jaro-Winkler distance step by step in the Elm programming language

Source code in the elm programming language

module JaroWinkler exposing (similarity)


commonPrefixLength : List a -> List a -> Int -> Int
commonPrefixLength xs ys counter =
    case ( xs, ys ) of
        ( x :: xs_, y :: ys_ ) ->
            if x == y then
                commonPrefixLength xs_ ys_ (counter + 1)

            else
                counter

        _ ->
            counter

similarity : String -> String -> Float
similarity s1 s2 =
    let
        chars1 =
            String.toList s1

        chars2 =
            String.toList s2

        jaroScore =
            jaro chars1 chars2

        l =
            toFloat <| min (commonPrefixLength chars1 chars2 0) 4

        p =
            0.1
    in
    jaroScore + (l * p * (1.0 - jaroScore))


containtsInNextN : Int -> a -> List a -> Bool
containtsInNextN i a items =
    case ( i, items ) of
        ( 0, _ ) ->
            False

        ( _, [] ) ->
            False

        ( _, item :: rest ) ->
            if item == a then
                True

            else
                containtsInNextN (i - 1) a rest


exists : Int -> Int -> List a -> a -> Bool
exists startAt endAt items i =
    if endAt < startAt then
        False

    else if startAt == 0 then
        case items of
            first :: rest ->
                if i == first then
                    True

                else
                    exists 0 (endAt - 1) rest i

            [] ->
                False

    else
        exists 0 (endAt - startAt) (List.drop startAt items) i


existsInWindow : a -> List a -> Int -> Int -> Bool
existsInWindow item items offset radius =
    let
        startAt =
            max 0 (offset - radius)

        endAt =
            min (offset + radius) (List.length items - 1)
    in
    exists startAt endAt items item


transpositions : List a -> List a -> Int -> Int
transpositions xs ys counter =
    case ( xs, ys ) of
        ( [], _ ) ->
            counter

        ( _, [] ) ->
            counter

        ( x :: xs_, y :: ys_ ) ->
            if x /= y then
                transpositions xs_ ys_ (counter + 1)

            else
                transpositions xs_ ys_ counter


commonItems : List a -> List a -> Int -> List a
commonItems items1 items2 radius =
    items1
        |> List.indexedMap
            (\index item ->
                if existsInWindow item items2 index radius then
                    [ item ]

                else
                    []
            )
        |> List.concat


jaro : List Char -> List Char -> Float
jaro chars1 chars2 =
    let
        minLenth =
            min (List.length chars1) (List.length chars2)

        matchRadius =
            minLenth // 2 + (minLenth |> modBy 2)

        c1 =
            commonItems chars1 chars2 matchRadius

        c2 =
            commonItems chars2 chars1 matchRadius

        c1length =
            toFloat (List.length c1)

        c2length =
            toFloat (List.length c2)

        mismatches =
            transpositions c1 c2 0

        transpositionScore =
            (toFloat mismatches + abs (c1length - c2length)) / 2.0

        s1length =
            toFloat (List.length chars1)

        s2length =
            toFloat (List.length chars2)

        tLength =
            max c1length c2length

        result =
            (c1length / s1length + c2length / s2length + (tLength - transpositionScore) / tLength) / 3.0
    in
    if isNaN result then
        0.0

    else
        result


  

You may also check:How to resolve the algorithm Colour bars/Display step by step in the Raku programming language
You may also check:How to resolve the algorithm Generate random chess position step by step in the REXX programming language
You may also check:How to resolve the algorithm Palindrome detection step by step in the Mathematica/Wolfram Language programming language
You may also check:How to resolve the algorithm Elementary cellular automaton/Random number generator step by step in the Pascal programming language
You may also check:How to resolve the algorithm Forest fire step by step in the C programming language