Searching for Names with a 'sounds like' routine

Tony Marston - 24th March 2001

Have you ever had to search for a name in a database without knowing the exact spelling? Tricky, isn't it? This problem was solved many years ago with the creation of a 'sounds like' routine, which takes a character string and converts it into something known as a SOUNDEX KEY. In essence it assigns a number to certain letters according to their sound, with similar-sounding letters sharing the same number. Thus a search on 'MARSTON' will also find 'MARSDON' and 'MARSDEN', because all three names produce the same key.

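The format of the Soundex Key is 'Xnnn' where:

- 'X' is the first character of the name;
- 'nnn' is three digits representing the sounds of the characters that follow it.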

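The rules for converting characters into numbers are as follows:

- B, F, P and V become 1; C, G, J, K, Q, S, X and Z become 2; D and T become 3; L becomes 4; M and N become 5; R becomes 6.
- Characters not listed above (such as A, E, I, O, U, H, W and Y) have no number and are ignored.
- A number that is the same as the previous number is ignored.
- Conversion stops once the key is four characters long; a shorter key is padded with zeros.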

I had a version of this routine added to my COBOL development environment way back in 1989, but here it is converted for Uniface:

params
   string  pi_Name         : IN
   string  po_SoundexKey   : OUT
endparams

variables
   string  lv_LookUp, lv_Char
   numeric lv_Num, lv_PrevNum
endvariables

; establish list of letters and corresponding numbers
; (those letters not in the list do not have numbers)
; (each ';' below must be the Uniface list separator, GOLD ;, not a plain semicolon)
lv_LookUp = "B=1;F=1;P=1;V=1;C=2;G=2;J=2;K=2;Q=2;S=2;X=2;Z=2;D=3;T=3;L=4;M=5;N=5;R=6"

uppercase pi_Name,pi_Name              ; must be uppercase

po_SoundexKey = pi_Name[1:1]           ; move first character
pi_Name = pi_Name[2]                   ; drop first character

while (pi_Name != "")                  ; until all chars have been examined
   lv_Char = pi_Name[1:1]              ; extract next character
   pi_Name = pi_Name[2]                ; drop it from input string
   ; convert this character (if it is in the list) into a number
   getitem/id lv_Num, lv_LookUp, lv_Char
   if ($status > 0)                    ; character found
       if (lv_Num != lv_PrevNum)       ; ignore if same as previous number
           po_SoundexKey = "%%po_SoundexKey%%lv_Num"   ; append to output
           lv_PrevNum    = lv_Num                      ; save number
       endif
       length po_SoundexKey
       if ($result = 4) break          ; stop here
   endif
endwhile

length po_SoundexKey                   ; make sure $result holds the current length
while ($result < 4)
   po_SoundexKey = "%%po_SoundexKey%%%0"   ; pad with zeros until length = 4
   length po_SoundexKey
endwhile
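
For readers without a Uniface installation, the routine above can be sketched in Python. This is an illustrative port rather than part of the original routine: the names `LOOKUP` and `soundex_key` are my own, and returning an empty string for empty input is an assumption, since the Uniface listing does not say what should happen in that case.

```python
# Letter-to-number table copied from the Uniface listing.
# Characters missing from the table have no number and are skipped.
LOOKUP = {
    **dict.fromkeys("BFPV", "1"),
    **dict.fromkeys("CGJKQSXZ", "2"),
    **dict.fromkeys("DT", "3"),
    "L": "4",
    **dict.fromkeys("MN", "5"),
    "R": "6",
}


def soundex_key(name: str) -> str:
    """Return the 'Xnnn' key, following the steps of the Uniface routine."""
    name = name.upper()              # must be uppercase
    if not name:
        return ""                    # assumption: empty input gives an empty key
    key = name[0]                    # move first character
    prev = None                      # previous number appended to the key
    for ch in name[1:]:              # examine the remaining characters
        num = LOOKUP.get(ch)
        if num is None:
            continue                 # not in the list: skip, prev is unchanged
        if num != prev:              # ignore if same as previous number
            key += num
            prev = num
        if len(key) == 4:
            break                    # stop here
    return key.ljust(4, "0")         # pad with zeros until length = 4


for name in ("MARSTON", "MARSDON", "MARSDEN"):
    print(name, soundex_key(name))   # each line shows the key M623
```

Note that in this routine a skipped letter does not reset the previous number, and the first letter's own number is never compared with the one that follows it, so some names may get a different key here than they would from other Soundex implementations.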



The best way to use this routine is to store the soundex key on the database alongside the name and index it. The key is shorter than the name string, and indexing the key rather than the name allows the name to be stored in a mixture of upper and lower case, which is not usual for an indexed field. The search form must then be modified to convert the user's input into a soundex key, so that the search is performed on this key and not on the original name.
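
The storage and search arrangement described above might look something like the following Python sketch, which uses the standard `sqlite3` module with an in-memory database. The table name `person`, its columns, the index name and the `search` helper are all hypothetical, and `soundex_key` is a compact Python port of the Uniface routine, not the routine itself.

```python
import sqlite3

# Letter-to-number table taken from the Uniface listing.
CODES = {"1": "BFPV", "2": "CGJKQSXZ", "3": "DT", "4": "L", "5": "MN", "6": "R"}
LOOKUP = {ch: num for num, letters in CODES.items() for ch in letters}


def soundex_key(name):
    """Compact port of the routine: first letter plus up to three digits."""
    name = name.upper()
    key, prev = name[:1], None
    for ch in name[1:]:
        num = LOOKUP.get(ch)
        if num is not None and num != prev:
            key += num
            prev = num
            if len(key) == 4:
                break
    return key.ljust(4, "0") if key else ""


# Hypothetical table: the name is stored as typed, the key is stored beside it.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (name TEXT, soundex_key TEXT)")
conn.execute("CREATE INDEX person_soundex ON person (soundex_key)")
for name in ("Marston", "Marsdon", "Marsden", "Martin"):
    conn.execute("INSERT INTO person VALUES (?, ?)", (name, soundex_key(name)))


def search(conn, user_input):
    """Convert the user's input into a key and search on the indexed key."""
    rows = conn.execute(
        "SELECT name FROM person WHERE soundex_key = ? ORDER BY name",
        (soundex_key(user_input),),
    )
    return [row[0] for row in rows]


print(search(conn, "marston"))   # ['Marsden', 'Marsdon', 'Marston']
```

Because the comparison is made on the key, the case in which the user types the name makes no difference, and 'Martin' (key M635) is correctly left out of the result.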
