How to resolve the algorithm Jaro similarity step by step in the Java programming language

Published on 12 May 2024 09:40 PM


Problem Statement

The Jaro distance is a measure of edit distance between two strings; its complement (one minus the distance), called the Jaro similarity, is a measure of two strings' similarity: the higher the value, the more similar the strings are. The score is normalized such that 0 equates to no similarity and 1 is an exact match.

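The Jaro similarity $d_j$ of two given strings $s_1$ and $s_2$ is

$$d_j = \begin{cases} 0 & \text{if } m = 0 \\ \dfrac{1}{3}\left(\dfrac{m}{|s_1|} + \dfrac{m}{|s_2|} + \dfrac{m - t}{m}\right) & \text{otherwise} \end{cases}$$

where:

  • $m$ is the number of matching characters;
  • $t$ is half the number of transpositions.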

Two characters from $s_1$ and $s_2$, respectively, are considered matching only if they are the same and not farther apart than $\left\lfloor \frac{\max(|s_1|,|s_2|)}{2} \right\rfloor - 1$ characters. Each character of $s_1$ is compared with all its matching characters in $s_2$. The matched characters of both strings are then compared in order, and each position where they differ counts as half a transposition; that is, the number of transpositions is half the number of characters which are common to the two strings but occupy different positions in each one.
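The matching and transposition rules above can be sketched on their own. The class `JaroMatchSketch` and its methods `matchWindow` and `matchesAndTranspositions` are hypothetical names used only for illustration. As an assumption, the window is clamped at 0, because for one-character strings the formula gives -1, which would rule out every match.

```java
import java.util.Arrays;

public class JaroMatchSketch {
    // Maximum distance between matching characters: floor(max(|s1|, |s2|) / 2) - 1,
    // clamped at 0 (assumption) so that one-character strings can still match.
    static int matchWindow(int len1, int len2) {
        return Math.max(0, Math.max(len1, len2) / 2 - 1);
    }

    // Returns {m, t}: m = number of matching characters,
    // t = half the number of matched characters that appear in a different order.
    static double[] matchesAndTranspositions(String s1, String s2) {
        int window = matchWindow(s1.length(), s2.length());
        boolean[] used1 = new boolean[s1.length()];
        boolean[] used2 = new boolean[s2.length()];
        int m = 0;
        for (int i = 0; i < s1.length(); i++) {
            int lo = Math.max(0, i - window);
            int hi = Math.min(s2.length() - 1, i + window);
            for (int j = lo; j <= hi; j++) {
                // Take the first unused, equal character of s2 inside the window.
                if (!used2[j] && s1.charAt(i) == s2.charAt(j)) {
                    used1[i] = true;
                    used2[j] = true;
                    m++;
                    break;
                }
            }
        }
        // Walk the matched characters of both strings in order and count mismatches.
        int outOfOrder = 0;
        int k = 0;
        for (int i = 0; i < s1.length(); i++) {
            if (!used1[i]) continue;
            while (!used2[k]) k++;
            if (s1.charAt(i) != s2.charAt(k)) outOfOrder++;
            k++;
        }
        return new double[] { m, outOfOrder / 2.0 };
    }

    public static void main(String[] args) {
        System.out.println(matchWindow(6, 5));                                           // 2
        System.out.println(Arrays.toString(matchesAndTranspositions("DWAYNE", "DUANE"))); // [4.0, 0.0]
        System.out.println(Arrays.toString(matchesAndTranspositions("MARTHA", "MARHTA"))); // [6.0, 1.0]
    }
}
```

For MARTHA and MARHTA all six characters match, but T and H swap places, so two matched positions differ and $t = 1$.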

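Given the strings $s_1$ = DWAYNE and $s_2$ = DUANE we find:

  • $m = 4$ (the matching characters are D, A, N and E)
  • $|s_1| = 6$
  • $|s_2| = 5$
  • $t = 0$ (the matched characters appear in the same order in both strings)

We find a Jaro score of:

$$d_j = \frac{1}{3}\left(\frac{4}{6} + \frac{4}{5} + \frac{4 - 0}{4}\right) \approx 0.822$$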

Implement the Jaro algorithm and show the similarity scores for each of the following pairs:

  • MARTHA, MARHTA
  • DIXON, DICKSONX
  • JELLYFISH, SMELLYFISH

Let's start with the solution:

Step-by-step solution for the Jaro similarity algorithm in Java

The code below implements the Jaro similarity algorithm in Java. The class is named JaroDistance, but its jaro method returns the similarity (1 for identical strings), not the distance.

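The Jaro similarity is a measure of how alike two strings are. Like the Levenshtein distance it compares two strings, but instead of counting insertions, deletions and substitutions it is based on the number of matching characters within a limited window and the number of transpositions among them.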

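The Jaro similarity is calculated as follows:

  1. Find the number of matching characters between the two strings: equal characters no farther apart than max(s_len, t_len) / 2 - 1 positions (integer division), with each character matched at most once.
  2. Count the transpositions: walk the matched characters of both strings in order and count the positions where they differ.
  3. If there are no matches, the similarity is 0. Otherwise, calculate it as follows:
jaro = ((matches / s_len) + (matches / t_len) + ((matches - transpositions/2) / matches)) / 3

where:

  • s_len is the length of the first string
  • t_len is the length of the second string
  • matches is the number of matching characters
  • transpositions is the number of matched characters that are out of order (half of this count is the number of transpositions t)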
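Step 3 can be checked in isolation. This is a minimal sketch that takes the counts from steps 1 and 2 as inputs; the class name `JaroFromCounts` and the method `jaroFromCounts` are hypothetical. It also shows why the outer parentheses matter: all three ratios are averaged, not only the last one.

```java
import java.util.Locale;

public class JaroFromCounts {
    // m = matching characters, outOfOrder = matched characters in a different order
    // (the "transpositions" count of step 2), len1/len2 = lengths of the two strings.
    static double jaroFromCounts(int m, int outOfOrder, int len1, int len2) {
        if (m == 0) return 0.0; // no matches: similarity 0, and no division by m
        double t = outOfOrder / 2.0;
        // Average all three ratios.
        return ((double) m / len1 + (double) m / len2 + (m - t) / m) / 3.0;
    }

    public static void main(String[] args) {
        // Locale.ROOT keeps '.' as the decimal separator.
        System.out.printf(Locale.ROOT, "%.4f%n", jaroFromCounts(6, 2, 6, 6)); // MARTHA / MARHTA: 0.9444
        System.out.printf(Locale.ROOT, "%.4f%n", jaroFromCounts(4, 0, 6, 5)); // DWAYNE / DUANE: 0.8222
    }
}
```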

The Jaro similarity is a value between 0 and 1. A value of 0 indicates that the two strings have no matching characters, while a value of 1 indicates that the two strings are identical.

The following table shows the Jaro similarity for the three pairs of strings:

| Strings | Jaro similarity |
|---|---|
| MARTHA, MARHTA | 0.9444 |
| DIXON, DICKSONX | 0.7667 |
| JELLYFISH, SMELLYFISH | 0.8963 |

The Jaro similarity is commonly used in record linkage, for example to find near-duplicate names in a database. It is designed for short strings such as names rather than for comparing long text documents.
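As a sketch of the lookup use case, the program below ranks candidate names against a query and keeps the best one only if it reaches a threshold. The helper `bestMatch` and the threshold 0.8 are hypothetical choices for illustration, not part of the original solution; `jaro` follows the same algorithm as the listing in this article, with the match window clamped at 0 as an assumption.

```java
import java.util.Arrays;
import java.util.List;

public class JaroLookupSketch {
    // Jaro similarity (window clamped at 0 so one-character strings can match).
    static double jaro(String s, String t) {
        if (s.isEmpty() && t.isEmpty()) return 1.0;
        int window = Math.max(0, Math.max(s.length(), t.length()) / 2 - 1);
        boolean[] sm = new boolean[s.length()];
        boolean[] tm = new boolean[t.length()];
        int matches = 0;
        for (int i = 0; i < s.length(); i++) {
            int end = Math.min(i + window + 1, t.length());
            for (int j = Math.max(0, i - window); j < end; j++) {
                if (tm[j] || s.charAt(i) != t.charAt(j)) continue;
                sm[i] = tm[j] = true;
                matches++;
                break;
            }
        }
        if (matches == 0) return 0.0;
        // Count matched characters that are out of order.
        int outOfOrder = 0;
        for (int i = 0, k = 0; i < s.length(); i++) {
            if (!sm[i]) continue;
            while (!tm[k]) k++;
            if (s.charAt(i) != t.charAt(k)) outOfOrder++;
            k++;
        }
        double m = matches;
        return (m / s.length() + m / t.length() + (m - outOfOrder / 2.0) / m) / 3.0;
    }

    // Hypothetical helper: the most similar candidate, or null if its score is below threshold.
    static String bestMatch(String query, List<String> candidates, double threshold) {
        String best = null;
        double bestScore = -1.0;
        for (String c : candidates) {
            double score = jaro(query, c);
            if (score > bestScore) {
                bestScore = score;
                best = c;
            }
        }
        return bestScore >= threshold ? best : null;
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList("DUANE", "DIXON", "MARTHA");
        System.out.println(bestMatch("DWAYNE", names, 0.8)); // DUANE
        System.out.println(bestMatch("MARHTA", names, 0.8)); // MARTHA
        System.out.println(bestMatch("XYZ", names, 0.8));    // null
    }
}
```

A linear scan like this compares the query with every candidate; for large tables a real system would usually narrow the candidates first.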

Source code in the Java programming language

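public class JaroDistance {
    // Returns the Jaro similarity of s and t: 0 = no similarity, 1 = exact match.
    public static double jaro(String s, String t) {
        int s_len = s.length();
        int t_len = t.length();

        // Two empty strings are considered identical.
        if (s_len == 0 && t_len == 0) return 1;

        // Characters match only if they are equal and at most this far apart.
        // Clamped at 0 so that one-character strings such as "A" and "A" still match
        // (without the clamp the window would be -1 and the result 0).
        int match_distance = Math.max(0, Math.max(s_len, t_len) / 2 - 1);

        // Flags for characters that have already been matched.
        boolean[] s_matches = new boolean[s_len];
        boolean[] t_matches = new boolean[t_len];

        int matches = 0;
        int transpositions = 0;

        // Step 1: for each character of s, take the first unmatched equal
        // character of t inside the window.
        for (int i = 0; i < s_len; i++) {
            int start = Math.max(0, i - match_distance);
            int end = Math.min(i + match_distance + 1, t_len);

            for (int j = start; j < end; j++) {
                if (t_matches[j]) continue;
                if (s.charAt(i) != t.charAt(j)) continue;
                s_matches[i] = true;
                t_matches[j] = true;
                matches++;
                break;
            }
        }

        if (matches == 0) return 0;

        // Step 2: walk the matched characters of both strings in order and
        // count the positions where they differ.
        int k = 0;
        for (int i = 0; i < s_len; i++) {
            if (!s_matches[i]) continue;
            while (!t_matches[k]) k++;
            if (s.charAt(i) != t.charAt(k)) transpositions++;
            k++;
        }

        // Step 3: average the three ratios; half the out-of-order count is t.
        return (((double) matches / s_len) +
                ((double) matches / t_len) +
                (((double) matches - transpositions / 2.0) / matches)) / 3.0;
    }

    public static void main(String[] args) {
        System.out.println(jaro("MARTHA", "MARHTA"));
        System.out.println(jaro("DIXON", "DICKSONX"));
        System.out.println(jaro("JELLYFISH", "SMELLYFISH"));
    }
}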


  
