How to resolve the algorithm Soundex step by step in the MUMPS programming language
Published on 12 May 2024 09:40 PM
How to resolve the algorithm Soundex step by step in the MUMPS programming language
Table of Contents
Problem Statement
Soundex is an algorithm for creating indices for words based on their pronunciation.
The goal is for homophones to be encoded to the same representation so that they can be matched despite minor differences in spelling (from the soundex Wikipedia article). There is a major issue in many of the implementations concerning the separation of two consonants that have the same soundex code! According to the official Rules [[1]]. So check for instance if Ashcraft is coded to A-261.
Let's start with the solution:
Step by Step solution about How to resolve the algorithm Soundex step by step in the MUMPS programming language
Source code in the mumps programming language
SOUNDEX(X,NARA=0)
;Converts a string to its Soundex value.
;Empty strings return "0000". Non-alphabetic ASCII characters are ignored.
;X is the name to be converted to Soundex
;NARA is a flag, defaulting to zero, for which implementation to perform.
;If NARA is 0, do what seems to be the Knuth implementation
;If NARA is a positive integer, do the NARA implementation.
; This varies the soundex rule for "W" and "H", and adds variants for prefixed names separated by carets.
; http://www.archives.gov/publications/general-info-leaflets/55-census.html
;Y is the string to be returned
;UP is the list of upper case letters
;LO is the list of lower case letters
;PREFIX is a list of prefixes to be stripped off
;X1 is the upper case version of X
;X2 is the name without a prefix
;Y2 is the soundex of a name without a prefix
;C is a loop variable
;DX is a list of Soundex values, in alphabetical order. Underscores are used for the NARA variation letters
;XD is a partially processed translation of X into soundex values
NEW Y,UP,LO,PREFIX,X1,X2,Y2,C,DX,XD
SET UP="ABCDEFGHIJKLMNOPQRSTUVWXYZ" ;Upper case characters
SET LO="abcdefghijklmnopqrstuvwxyz" ;Lower case characters
SET DX=" 123 12_ 22455 12623 1_2 2" ;Soundex values
SET PREFIX="VAN^CO^DE^LA^LE" ;Prefixes that could create an alternate soundex value
SET Y="" ;Y is the value to be returned
SET X1=$TRANSLATE(X,LO,UP) ;Make local copy, and force all letters to be upper case
SET XD=$TRANSLATE(X1,UP,DX) ;Soundex values for string
;
SET Y=$EXTRACT(X1,1,1) ;Get first character
FOR C=2:1:$LENGTH(X1) QUIT:$L(Y)>=4 DO
. ;ignore doubled letters OR and side-by-side soundex values OR same soundex on either side of "H" or "W"
. QUIT:($EXTRACT(X1,C,C)=$EXTRACT(X1,C-1,C-1))
. QUIT:($EXTRACT(XD,C,C)=$EXTRACT(XD,C-1,C-1))
. ;ignore non-alphabetic characters
. QUIT:UP'[($EXTRACT(X1,C,C))
. QUIT:NARA&(($EXTRACT(XD,C-1,C-1)="_")&(C>2))&($EXTRACT(XD,C,C)=$EXTRACT(XD,C-2,C-2))
. QUIT:" _"[$EXTRACT(XD,C,C)
. SET Y=Y_$EXTRACT(XD,C,C)
; Pad with "0" so string length is 4
IF $LENGTH(Y)<4 FOR C=$L(Y):1:3 SET Y=Y_"0"
IF NARA DO
. FOR C=1:1:$LENGTH(PREFIX,"^") DO
. . IF $EXTRACT(X1,1,$LENGTH($PIECE(PREFIX,"^",C)))=$PIECE(PREFIX,"^",C) DO
. . . ;Take off the prefix, and any leading spaces
. . . SET X2=$EXTRACT(X1,$LENGTH($PIECE(PREFIX,"^",C))+1,$LENGTH(X1)-$PIECE(PREFIX,"^",C)) FOR QUIT:UP[$E(X2,1,1) SET X2=$E(X2,2,$L(X2))
. . . SET Y2=$$SOUNDEX(X2,NARA) SET Y=Y_"^"_Y2
KILL UP,LO,PREFIX,X1,X2,Y2,C,DX,XD
QUIT Y
You may also check:How to resolve the algorithm Hofstadter Q sequence step by step in the Haskell programming language
You may also check:How to resolve the algorithm Globally replace text in several files step by step in the jq programming language
You may also check:How to resolve the algorithm Mertens function step by step in the BCPL programming language
You may also check:How to resolve the algorithm Display a linear combination step by step in the Wren programming language
You may also check:How to resolve the algorithm Gray code step by step in the Amazing Hopper programming language