How to resolve the algorithm Jaro similarity step by step in the Turbo-Basic XL programming language
How to resolve the algorithm Jaro similarity step by step in the Turbo-Basic XL programming language
Table of Contents
Problem Statement
The Jaro distance is a measure of edit distance between two strings; its inverse, called the Jaro similarity, is a measure of two strings' similarity: the higher the value, the more similar the strings are. The score is normalized such that 0 equates to no similarities and 1 is an exact match.
The Jaro similarity
d
j
{\displaystyle d_{j}}
of two given strings
s
1
{\displaystyle s_{1}}
and
s
2
{\displaystyle s_{2}}
is Where:
Two characters from
s
1
{\displaystyle s_{1}}
and
s
2
{\displaystyle s_{2}}
respectively, are considered matching only if they are the same and not farther apart than
⌊
max (
|
s
1
|
,
|
s
2
|
)
2
⌋
− 1
{\displaystyle \left\lfloor {\frac {\max(|s_{1}|,|s_{2}|)}{2}}\right\rfloor -1}
characters. Each character of
s
1
{\displaystyle s_{1}}
is compared with all its matching characters in
s
2
{\displaystyle s_{2}}
. Each difference in position is half a transposition; that is, the number of transpositions is half the number of characters which are common to the two strings but occupy different positions in each one.
Given the strings
s
1
{\displaystyle s_{1}}
DWAYNE and
s
2
{\displaystyle s_{2}}
DUANE we find:
We find a Jaro score of:
Implement the Jaro algorithm and show the similarity scores for each of the following pairs:
Let's start with the solution:
Step by Step solution about How to resolve the algorithm Jaro similarity step by step in the Turbo-Basic XL programming language
Source code in the turbo-basic programming language
10 DIM Word_1$(20), Word_2$(20), Z$(20)
11 CLS
20 Word_1$="MARTHA" : Word_2$="MARHTA" : ? Word_1$;" - ";Word_2$ : EXEC _JWD_ : ?
30 Word_1$="DIXON" : Word_2$="DICKSONX" : ? Word_1$;" - ";Word_2$ : EXEC _JWD_ : ?
40 Word_1$="JELLYFISH" : Word_2$="SMELLYFISH" : ? Word_1$;" - ";Word_2$ : EXEC _JWD_ : ?
11000 END
12000 REM JaroWinklerDistance INPUT(Word_1$, Word_2$) USE(Z$, I, J, K, L, M, N, S1, S2, Min, Max) RETURN(FLOAT Result)
12000 PROC _JWD_
12010 Result=0 : S1=LEN(Word_1$) : S2=LEN(Word_2$)
12020 IF S1>S2 THEN Z$=Word_1$ : Word_1$=Word_2$ : Word_2$=Z$ : M=S1 : S1=S2 : S2=M
12030 J=1: M=0 : N=0 : L=INT(S2/2) : Z$=Word_2$
12040 FOR I=1 TO S1
12050 IF Word_1$(I,I)=Word_2$(J,J) THEN M=M+1: Word_2$(J,J)=" ": GO# JMP_JWD
12060 Max=1 : IF Max<(I-L) THEN Max=I-L
12070 Min=S2 : IF Min>(I+L-1) THEN Min=I+L-1
12080 FOR K=Max TO Min
12090 IF Word_1$(I,I)=Word_2$(K,K) THEN N=N+1: M=M+1: Word_2$(K,K)=" ": IF K>J THEN J=K
12100 NEXT K
12110 #JMP_JWD : IF J
12120 NEXT I
12130 IF M=0
12140 Result=0 : REM jaro distance
12150 ELSE
12160 N=INT(N/2)
12170 Result=(M/S1+M/S2+((M-N)/M))/3. : REM jaro distance
12180 ENDIF
12190 ? "Jaro Distance=";Result
12200 Min=S1 : IF Min>S2 THEN Min=S2
12210 M=Min : IF M>3 THEN M=3
12220 M=M+1 : L=0 : Word_2$=Z$ : IF M>Min THEN M=Min
12230 FOR I=1 TO M
12240 IF Word_1$(I,I)=Word_2$(I,I)
12250 L=L+1
12260 ELSE
12270 EXIT
12280 ENDIF
12290 NEXT I
12300 Result=Result + (L*0.1*(1.0 - Result)) : REM Winkler
12310 ? "Jaro Winkler Distance=";Result
12320 ENDPROC
You may also check:How to resolve the algorithm Remove duplicate elements step by step in the MATLAB programming language
You may also check:How to resolve the algorithm Averages/Arithmetic mean step by step in the OCaml programming language
You may also check:How to resolve the algorithm Anagrams/Deranged anagrams step by step in the EchoLisp programming language
You may also check:How to resolve the algorithm Damm algorithm step by step in the Nim programming language
You may also check:How to resolve the algorithm Fivenum step by step in the Perl programming language