How to resolve the algorithm Jaro similarity step by step in the Emacs Lisp programming language

Published on 12 May 2024 09:40 PM

How to resolve the algorithm Jaro similarity step by step in the Emacs Lisp programming language

Table of Contents

Problem Statement

The Jaro distance is a measure of edit distance between two strings; its inverse, called the Jaro similarity, is a measure of two strings' similarity: the higher the value, the more similar the strings are. The score is normalized such that   0   equates to no similarities and   1   is an exact match.

The Jaro similarity

d

j

{\displaystyle d_{j}}

of two given strings

s

1

{\displaystyle s_{1}}

and

s

2

{\displaystyle s_{2}}

is Where:

Two characters from

s

1

{\displaystyle s_{1}}

and

s

2

{\displaystyle s_{2}}

respectively, are considered matching only if they are the same and not farther apart than

max (

|

s

1

|

,

|

s

2

|

)

2

− 1

{\displaystyle \left\lfloor {\frac {\max(|s_{1}|,|s_{2}|)}{2}}\right\rfloor -1}

characters. Each character of

s

1

{\displaystyle s_{1}}

is compared with all its matching characters in

s

2

{\displaystyle s_{2}}

. Each difference in position is half a transposition; that is, the number of transpositions is half the number of characters which are common to the two strings but occupy different positions in each one.

Given the strings

s

1

{\displaystyle s_{1}}

DWAYNE   and

s

2

{\displaystyle s_{2}}

DUANE   we find:

We find a Jaro score of:

Implement the Jaro algorithm and show the similarity scores for each of the following pairs:

Let's start with the solution:

Step by Step solution about How to resolve the algorithm Jaro similarity step by step in the Emacs Lisp programming language

Source code in the emacs programming language

(let ()
  (defun jaro (s1 s2)
    (let (mw mflags1 mflags2 fn-reset-mflags fn-reset-all-mflags fn-cnt-trans)
      (setq mflags1 (make-vector (length s1) nil))
      (setq mflags2 (make-vector (length s2) nil))
      (setq mw (1- (/ (max (length s1) (length s2)) 2)))
      (setq fn-reset-mflags
	    (lambda (idx)
	      (let ((start (max 0 (- idx mw)))
		    (end (min (1- (length s2)) (+ idx mw))))
		(cl-loop for i from start to end do
			 (when (and (not (elt mflags1 idx))
				    (not (elt mflags2 i)))
			   (when (equal (elt s1 idx) (elt s2 i))
			     (aset mflags1 idx 't)
			     (aset mflags2 i 't) ) ) ) ) ) )
      (setq fn-reset-all-mflags
	    (lambda ()
	      (dotimes (idx (length s1))
		(funcall fn-reset-mflags idx) ) ) )
      
      (setq fn-cnt-trans
	    (lambda ()
	      (let ((cur2 0) (transposition 0))
		(dotimes (cur1 (length s1))
		  (when (aref mflags1 cur1)
		    (while (not (aref mflags2 cur2))
		      (setq cur2 (1+ cur2)) )
		    (when (not (equal (aref s1 cur1)
				      (aref s2 cur2)))
		      (setq transposition (1+ transposition)) )
		    (setq cur2 (1+ cur2))
		    )
		  )
		transposition ) ) )
      
      (funcall fn-reset-all-mflags)
      (let ((m (seq-count (lambda (f) f) mflags1))
	    (tr (funcall fn-cnt-trans)))
	;;(message "matches: %s, transposition: %s, |s1|: %d |s2|: %d" m tr (length s1) (length s2))
	(if (= m 0)
	    0
	  (progn (/ (+ (/ (float m) (length s1)) (/ (float m) (length s2)) (/ (float (- m (/ (float tr) 2))) m) ) 3))
	  ) ) ) )

  (let ((params '(("MARTHA" "MARHTA")
		  ("DIXON" "DICKSONX")
		  ("JELLYFISH" "SMELLYFISH"))))
    (dolist (p params)
      (message "jaro(%s, %s) = %f"
	       (nth 0 p) (nth 1 p)
	       (jaro (nth 0 p) (nth 1 p)))
      )
    )
  )


  

You may also check:How to resolve the algorithm Handle a signal step by step in the Clojure programming language
You may also check:How to resolve the algorithm Case-sensitivity of identifiers step by step in the AutoHotkey programming language
You may also check:How to resolve the algorithm Address of a variable step by step in the Component Pascal programming language
You may also check:How to resolve the algorithm Set step by step in the Sidef programming language
You may also check:How to resolve the algorithm Reverse a string step by step in the GAP programming language