How to resolve the algorithm Jaro similarity step by step in the PARI/GP programming language

Published on 12 May 2024 09:40 PM

Problem Statement

The Jaro distance is a measure of edit distance between two strings; its complement, the Jaro similarity (1 minus the distance), measures how similar two strings are: the higher the value, the more similar the strings. The score is normalized so that 0 means no similarity and 1 is an exact match.

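The Jaro similarity $d_j$ of two given strings $s_1$ and $s_2$ is

$$d_j = \begin{cases} 0 & \text{if } m = 0 \\ \dfrac{1}{3}\left(\dfrac{m}{|s_1|} + \dfrac{m}{|s_2|} + \dfrac{m-t}{m}\right) & \text{otherwise} \end{cases}$$

Where:

$m$ is the number of matching characters;
$t$ is the number of transpositions (half the number of matched characters that appear in a different order in the two strings).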

Two characters from $s_1$ and $s_2$ respectively are considered matching only if they are the same and not farther apart than $\left\lfloor \frac{\max(|s_1|,|s_2|)}{2} \right\rfloor - 1$ characters. Each character of $s_1$ is compared with all its matching characters in $s_2$. Each difference in position is half a transposition; that is, the number of transpositions is half the number of characters which are common to the two strings but occupy different positions in each one.
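The matching bound described above can be evaluated directly. The following is a minimal sketch, assuming a helper named matchWindow that is introduced here only for illustration and is not part of the original listing:

```gp
\\ Hypothetical helper, not part of the original listing:
\\ largest allowed offset between two matching characters.
matchWindow(s1,s2)=max(#s1,#s2)\2-1;

print(matchWindow("DWAYNE","DUANE"));    \\ 6\2-1 = 2
print(matchWindow("DIXON","DICKSONX"));  \\ 8\2-1 = 3
```

With a bound of 2, a character at position i of the first string can only match positions i-2 through i+2 of the second. For two one-character strings the expression evaluates to -1, so under this definition no position qualifies.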

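Given the strings $s_1$ = DWAYNE and $s_2$ = DUANE we find:

$m = 4$
$|s_1| = 6$
$|s_2| = 5$
$t = 0$

We find a Jaro score of:

$$d_j = \frac{1}{3}\left(\frac{4}{6} + \frac{4}{5} + \frac{4-0}{4}\right) \approx 0.822$$

Implement the Jaro algorithm and show the similarity scores for each of the following pairs:

("MARTHA", "MARHTA")
("DIXON", "DICKSONX")
("JELLYFISH", "SMELLYFISH")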
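As a worked check on a pair that does contain a transposition, consider MARTHA and MARHTA, which also appear in the test block of the listing below. The sketch assumes the usual Jaro formula, one third of the sum of $m/|s_1|$, $m/|s_2|$ and $(m-t)/m$, which is the same expression the code evaluates. All six characters match within the allowed offset of 2, and T and H appear in swapped order, giving two out-of-order characters and therefore one transposition:

```latex
m = 6, \qquad |s_1| = |s_2| = 6, \qquad t = \tfrac{2}{2} = 1

d_j = \frac{1}{3}\left(\frac{6}{6} + \frac{6}{6} + \frac{6-1}{6}\right)
    = \frac{17}{18} \approx 0.94444
```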

Let's start with the solution:

Step-by-step solution: Jaro similarity in PARI/GP

Source code in the PARI/GP programming language

\\ Jaro similarity between 2 strings s1 and s2.
\\ 4/12/16 aev
jaroDist(s1,s2)={
my(vt1=Vecsmall(s1),vt2=Vecsmall(s2),n1=#s1,n2=#s2,d,
   md=max(n1,n2)\2-1,cs,ce,mc=0,tr=0,k=1,ds,
   s1m=vector(n1,z,0),s2m=vector(n2,z,0));
if(!n1||!n2, return(0));
\\ Pass 1: find matches. md is the maximum allowed offset.
for(i=1,n1,
  cs=max(1,i-md);
  ce=min(i+md,n2); \\ inclusive upper bound (indices are 1-based)
  for(j=cs,ce,
    if(s2m[j],next);
    if(vt1[i]!=vt2[j], next);
    mc++; s1m[i]=1; s2m[j]=1; break;
  );\\fend j
);\\fend i
if(!mc, return(0));
\\ Pass 2: walk the matched characters in order.
\\ tr counts matched characters that are out of order.
for(i=1,n1,
  if(!s1m[i], next);
  while(!s2m[k], k++);
  if(vt1[i]!=vt2[k], tr++);
  k++
);\\fend i
\\ tr/2 is the number of transpositions t.
d=(mc/n1+mc/n2+(mc-tr/2)/mc)/3.0;
ds=Strprintf("%.5f",d);
print(" *** Jaro similarity is: ",ds," for strings: ",s1,", ",s2);
return(d);
}

{ \\ Testing:
jaroDist("MARTHA","MARHTA"); 
jaroDist("DIXON","DICKSONX");
jaroDist("JELLYFISH","SMELLYFISH");
jaroDist("DWAYNE","DUANE");
}
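To check the intermediate quantities against a hand calculation, it can help to have a variant that returns them instead of printing. The following is a minimal sketch, not the original author's code: jaroParts is a hypothetical name, it assumes the matching window is |i-j| <= md as stated in the problem description, and it returns exact rationals rather than a floating-point value.

```gp
\\ Hypothetical helper (not in the original listing): returns [m, t, d]
\\ without printing, so m and t can be checked against hand calculations.
jaroParts(s1,s2)={
my(v1=Vecsmall(s1),v2=Vecsmall(s2),n1=#v1,n2=#v2,
   md=max(n1,n2)\2-1,m=0,h=0,k=1,
   f1=vector(n1),f2=vector(n2));
if(!n1||!n2, return([0,0,0]));
\\ Pass 1: flag each matched pair within the window i-md..i+md.
for(i=1,n1,
  for(j=max(1,i-md),min(i+md,n2),
    if(!f2[j] && v1[i]==v2[j],
      m++; f1[i]=1; f2[j]=1; break)));
if(!m, return([0,0,0]));
\\ Pass 2: h counts matched characters that are out of order.
for(i=1,n1,
  if(f1[i],
    while(!f2[k], k++);
    if(v1[i]!=v2[k], h++);
    k++));
return([m, h/2, (m/n1+m/n2+(m-h/2)/m)/3]);
}

print(jaroParts("DWAYNE","DUANE"));   \\ [4, 0, 37/45]
print(jaroParts("MARTHA","MARHTA"));  \\ [6, 1, 17/18]
```

Here 37/45 is approximately 0.82222 and 17/18 is approximately 0.94444. Keeping the result as a rational avoids rounding until the value is displayed; multiplying by 1.0 converts it to a real if needed.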

  
