How to resolve the algorithm Jaro similarity step by step in the Clojure programming language

Published on 12 May 2024 09:40 PM

How to resolve the algorithm Jaro similarity step by step in the Clojure programming language

Table of Contents

Problem Statement

The Jaro distance is a measure of edit distance between two strings; its inverse, called the Jaro similarity, is a measure of two strings' similarity: the higher the value, the more similar the strings are. The score is normalized such that   0   equates to no similarities and   1   is an exact match.

The Jaro similarity

d

j

{\displaystyle d_{j}}

of two given strings

s

1

{\displaystyle s_{1}}

and

s

2

{\displaystyle s_{2}}

is Where:

Two characters from

s

1

{\displaystyle s_{1}}

and

s

2

{\displaystyle s_{2}}

respectively, are considered matching only if they are the same and not farther apart than

max (

|

s

1

|

,

|

s

2

|

)

2

− 1

{\displaystyle \left\lfloor {\frac {\max(|s_{1}|,|s_{2}|)}{2}}\right\rfloor -1}

characters. Each character of

s

1

{\displaystyle s_{1}}

is compared with all its matching characters in

s

2

{\displaystyle s_{2}}

. Each difference in position is half a transposition; that is, the number of transpositions is half the number of characters which are common to the two strings but occupy different positions in each one.

Given the strings

s

1

{\displaystyle s_{1}}

DWAYNE   and

s

2

{\displaystyle s_{2}}

DUANE   we find:

We find a Jaro score of:

Implement the Jaro algorithm and show the similarity scores for each of the following pairs:

Let's start with the solution:

Step by Step solution about How to resolve the algorithm Jaro similarity step by step in the Clojure programming language

Source code in the clojure programming language

(ns test-project-intellij.core
  (:gen-class))

(defn find-matches [s t]
  " find match locations in the two strings "
  " s_matches is set to true wherever there is a match in t and t_matches is set conversely "
  (let [s_len (count s)
        t_len (count t)
        match_distance (int (- (/ (max s_len t_len) 2) 1))
        matches 0
        transpositions 0
        fn-start (fn [i] (max 0 (- i match_distance)))              ; function to compute starting position
        fn-end (fn [i] (min (+ i match_distance 1) (- t_len 1))) ]  ; function to compute end position
    (loop [i 0
           start (fn-start i)
           end (fn-end i)
           k start
           s_matches (vec (repeat (count s) false))
           t_matches (vec (repeat (count t) false))
           matches 0]

      (if (< i s_len)

        (if (<= k end)

          (if (get t_matches k)
            ; continue with next k
            (recur i start end (inc k) s_matches t_matches matches)

            (if (= (get s i) (get t k))
              ; match a position, so update matches, s_matches, t_matches to reflect match
              (recur (inc i) (fn-start (inc i)) (fn-end (inc i)) (fn-start (inc i)) (assoc s_matches i true) (assoc t_matches k true) (inc matches))
              ; no match so try next k
              (recur i start end (inc k) s_matches t_matches matches)))

          ; End of k iterations, so increment i and set k to start based upon i
          (recur (inc i) (fn-start (inc i)) (fn-end (inc i)) (fn-start (inc i)) s_matches t_matches matches))

        ; End of i iterations
        [matches s_matches t_matches]))))

(defn count-transpositions [s t s_matches t_matches]
  " Utility function to count the number of transpositions "
  (let [s_len (count s)]
    (loop [i 0
           k 0
           transpositions 0]

      (if (< i s_len)
        ; still elements in s (since i < s_len)
        (if (not (get s_matches i nil))
          ; skip to next i since there are no matches in s
          (recur (inc i) k transpositions)
          ; checking for match in t
          (if (not (get t_matches k nil))
            ; keeping looping around as long as there are no matches in t
            (recur i (inc k) transpositions)
            (if (not= (get s i) (get t k))
              ; increment transposition count (if strings don't equal at match location)
              (recur (inc i) (inc k) (inc transpositions))
              ; was a match, so advance i and k without increasing transpositions count
              (recur (inc i) (inc k) transpositions))))
        ; Return count
        transpositions))))

(defn jaro [s t]
  " Main Jaro Distance routine"
  (if (= s t)
    1
    (let [[matches s_matches t_matches]  (find-matches s t)]
      (if (= 0 matches)
        0
        (let [s_len (count s)
              t_len (count t)
              transpositions (count-transpositions s t s_matches t_matches)]
          (float (/ (+ (/ matches s_len) (/ matches t_len) (/ (- matches (/ transpositions 2)) matches)) 3)))))))


(println (jaro "MARTHA" "MARHTA"))
(println (jaro "DIXON" "DICKSONX"))
(println (jaro "JELLYFISH" "SMELLYFISH"))


  

You may also check:How to resolve the algorithm Compare length of two strings step by step in the Racket programming language
You may also check:How to resolve the algorithm Undefined values step by step in the Java programming language
You may also check:How to resolve the algorithm Totient function step by step in the RPL programming language
You may also check:How to resolve the algorithm Arithmetic/Complex step by step in the Futhark programming language
You may also check:How to resolve the algorithm Substring step by step in the Component Pascal programming language