How to resolve the algorithm Jaro similarity step by step in the Clojure programming language
How to resolve the algorithm Jaro similarity step by step in the Clojure programming language
Table of Contents
Problem Statement
The Jaro distance is a measure of edit distance between two strings; its inverse, called the Jaro similarity, is a measure of two strings' similarity: the higher the value, the more similar the strings are. The score is normalized such that 0 equates to no similarities and 1 is an exact match.
The Jaro similarity
d
j
{\displaystyle d_{j}}
of two given strings
s
1
{\displaystyle s_{1}}
and
s
2
{\displaystyle s_{2}}
is Where:
Two characters from
s
1
{\displaystyle s_{1}}
and
s
2
{\displaystyle s_{2}}
respectively, are considered matching only if they are the same and not farther apart than
⌊
max (
|
s
1
|
,
|
s
2
|
)
2
⌋
− 1
{\displaystyle \left\lfloor {\frac {\max(|s_{1}|,|s_{2}|)}{2}}\right\rfloor -1}
characters. Each character of
s
1
{\displaystyle s_{1}}
is compared with all its matching characters in
s
2
{\displaystyle s_{2}}
. Each difference in position is half a transposition; that is, the number of transpositions is half the number of characters which are common to the two strings but occupy different positions in each one.
Given the strings
s
1
{\displaystyle s_{1}}
DWAYNE and
s
2
{\displaystyle s_{2}}
DUANE we find:
We find a Jaro score of:
Implement the Jaro algorithm and show the similarity scores for each of the following pairs:
Let's start with the solution:
Step by Step solution about How to resolve the algorithm Jaro similarity step by step in the Clojure programming language
Source code in the clojure programming language
(ns test-project-intellij.core
(:gen-class))
(defn find-matches [s t]
" find match locations in the two strings "
" s_matches is set to true wherever there is a match in t and t_matches is set conversely "
(let [s_len (count s)
t_len (count t)
match_distance (int (- (/ (max s_len t_len) 2) 1))
matches 0
transpositions 0
fn-start (fn [i] (max 0 (- i match_distance))) ; function to compute starting position
fn-end (fn [i] (min (+ i match_distance 1) (- t_len 1))) ] ; function to compute end position
(loop [i 0
start (fn-start i)
end (fn-end i)
k start
s_matches (vec (repeat (count s) false))
t_matches (vec (repeat (count t) false))
matches 0]
(if (< i s_len)
(if (<= k end)
(if (get t_matches k)
; continue with next k
(recur i start end (inc k) s_matches t_matches matches)
(if (= (get s i) (get t k))
; match a position, so update matches, s_matches, t_matches to reflect match
(recur (inc i) (fn-start (inc i)) (fn-end (inc i)) (fn-start (inc i)) (assoc s_matches i true) (assoc t_matches k true) (inc matches))
; no match so try next k
(recur i start end (inc k) s_matches t_matches matches)))
; End of k iterations, so increment i and set k to start based upon i
(recur (inc i) (fn-start (inc i)) (fn-end (inc i)) (fn-start (inc i)) s_matches t_matches matches))
; End of i iterations
[matches s_matches t_matches]))))
(defn count-transpositions [s t s_matches t_matches]
" Utility function to count the number of transpositions "
(let [s_len (count s)]
(loop [i 0
k 0
transpositions 0]
(if (< i s_len)
; still elements in s (since i < s_len)
(if (not (get s_matches i nil))
; skip to next i since there are no matches in s
(recur (inc i) k transpositions)
; checking for match in t
(if (not (get t_matches k nil))
; keeping looping around as long as there are no matches in t
(recur i (inc k) transpositions)
(if (not= (get s i) (get t k))
; increment transposition count (if strings don't equal at match location)
(recur (inc i) (inc k) (inc transpositions))
; was a match, so advance i and k without increasing transpositions count
(recur (inc i) (inc k) transpositions))))
; Return count
transpositions))))
(defn jaro [s t]
" Main Jaro Distance routine"
(if (= s t)
1
(let [[matches s_matches t_matches] (find-matches s t)]
(if (= 0 matches)
0
(let [s_len (count s)
t_len (count t)
transpositions (count-transpositions s t s_matches t_matches)]
(float (/ (+ (/ matches s_len) (/ matches t_len) (/ (- matches (/ transpositions 2)) matches)) 3)))))))
(println (jaro "MARTHA" "MARHTA"))
(println (jaro "DIXON" "DICKSONX"))
(println (jaro "JELLYFISH" "SMELLYFISH"))
You may also check:How to resolve the algorithm Compare length of two strings step by step in the Racket programming language
You may also check:How to resolve the algorithm Undefined values step by step in the Java programming language
You may also check:How to resolve the algorithm Totient function step by step in the RPL programming language
You may also check:How to resolve the algorithm Arithmetic/Complex step by step in the Futhark programming language
You may also check:How to resolve the algorithm Substring step by step in the Component Pascal programming language