How to implement the Jaro similarity algorithm step by step in the jq programming language

Published on 12 May 2024 09:40 PM
#Jq

Problem Statement

The Jaro distance is a measure of edit distance between two strings; its complement, the Jaro similarity (1 minus the distance), measures how similar two strings are: the higher the value, the more similar the strings. The score is normalized so that 0 means no similarity and 1 means an exact match.

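The Jaro similarity $d_j$ of two given strings $s_1$ and $s_2$ is

$$d_j = \begin{cases} 0 & \text{if } m = 0 \\ \dfrac{1}{3}\left(\dfrac{m}{|s_1|} + \dfrac{m}{|s_2|} + \dfrac{m - t}{m}\right) & \text{otherwise} \end{cases}$$

Where:

$m$ is the number of matching characters;
$t$ is the number of transpositions.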

Two characters from $s_1$ and $s_2$ respectively are considered matching only if they are the same and not farther apart than $\left\lfloor \frac{\max(|s_1|, |s_2|)}{2} \right\rfloor - 1$ characters. Each character of $s_1$ is compared with all its matching characters in $s_2$. Each difference in position is half a transposition; that is, the number of transpositions is half the number of characters which are common to the two strings but occupy different positions in each one.
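As a worked illustration of the window rule, take DIXON and DICKSONX, one of the pairs used in the solution's test cases. The indices below are zero-based, which is an assumption of this sketch rather than something the definition prescribes:

$$\left\lfloor \frac{\max(5, 8)}{2} \right\rfloor - 1 = 3$$

The X sits at index 2 in DIXON and at index 7 in DICKSONX. Since $|2 - 7| = 5 > 3$, the two X's are too far apart to count as a match, so only D, I, O and N match and $m = 4$.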

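Given the strings $s_1$ = DWAYNE and $s_2$ = DUANE we find:

$m = 4$ (D, A, N and E match)
$|s_1| = 6$
$|s_2| = 5$
$t = 0$

We find a Jaro score of:

$$d_j = \frac{1}{3}\left(\frac{4}{6} + \frac{4}{5} + \frac{4 - 0}{4}\right) \approx 0.822$$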

Implement the Jaro algorithm and show the similarity scores for each of the following pairs:

("MARTHA", "MARHTA")
("DIXON", "DICKSONX")
("JELLYFISH", "SMELLYFISH")

Let's start with the solution:

Step-by-step solution: Jaro similarity in the jq programming language

Source code in the jq programming language

# Jaro similarity of two strings: 1 for an exact match, 0 for no matches.
def jaro($s1; $s2):
    ($s1|length) as $le1
    | ($s2|length) as $le2
    | if $le1 == 0 and $le2 == 0 then 1
      elif $le1 == 0 or $le2 == 0 then 0
      else
        # Match window: floor(max(|s1|, |s2|) / 2) - 1
        ((((if $le2 > $le1 then $le2 else $le1 end) / 2) | floor) - 1) as $dist
      | {matches: 0, matches1: [], matches2: [], transpos: 0 }
      # Pass 1: pair each character of $s1 with the first unmatched, equal
      # character of $s2 that lies inside the window.
      | reduce range(0; $le1) as $i (.;
            (($i - $dist)     | if . < 0    then 0    else . end) as $start
          | (($i + $dist + 1) | if . > $le2 then $le2 else . end) as $stop
          | .k = $start
          | until(.k >= $stop;
              if (.matches2[.k] or $s1[$i:$i+1] != $s2[.k:.k+1]) | not
              then .matches1[$i] = true
              | .matches2[.k] = true
              | .matches += 1
              | .k = $stop
              else .k += 1
              end) )
      | if .matches == 0 then 0
        else .k = 0
        # Pass 2: walk the matched characters of both strings in order and
        # count the positions at which they differ.
        | reduce range(0; $le1) as $i (.;
            if .matches1[$i]
            then until(.k >= $le2 or .matches2[.k]; .k += 1)
            | if .k < $le2 and ($s1[$i:$i+1] != $s2[.k:.k+1]) then .transpos += 1 else . end
            | .k += 1
            else .
            end )
        # Each out-of-order character is half a transposition.
        | .transpos /= 2
        | (.matches/$le1 + .matches/$le2 + ((.matches - .transpos)/.matches)) / 3
        end
      end ;

def task:
  [["MARTHA","MARHTA"],
   ["DIXON", "DICKSONX"],
   ["JELLYFISH","SMELLYFISH"],
   ["ABC","DEF"]][]
  | (jaro(.[0]; .[1]) * 1000 | floor / 1000) as $d
  | "jaro(\(.[0]); \(.[1])) => \($d)";

task
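
The first two results can be checked by hand against the formula, as a sanity check on the program's output. This is a worked sketch, assuming the match and transposition counts that follow from the definition in the problem statement. For MARTHA and MARHTA all six characters match and the swapped T and H give one transposition, so $m = 6$ and $t = 1$:

$$d_j = \frac{1}{3}\left(\frac{6}{6} + \frac{6}{6} + \frac{6 - 1}{6}\right) = \frac{17}{18} \approx 0.9444$$

For DIXON and DICKSONX, $m = 4$ and $t = 0$:

$$d_j = \frac{1}{3}\left(\frac{4}{5} + \frac{4}{8} + \frac{4 - 0}{4}\right) = \frac{23}{30} \approx 0.7667$$

Because task truncates to three decimals with floor rather than rounding, these are reported as 0.944 and 0.766.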
