The Soundex Algorithm

Luke Otwell
Oct 4, 2021

Searching like it’s 1925

The Soundex algorithm is a ubiquitous piece of code: even if you haven’t dealt with it directly, you’ve almost certainly used some descendant of it. Its main purpose is to break words (in English) down to their phonetic core so that they can be compared and, more importantly, matched using looser criteria than perfect spelling. In other words, it lets words be compared phonetically, by how they sound.

Have you ever wondered how the auto-correct on your phone or desktop computer is able to know not only when things are spelled wrong, but also what word you were trying to spell in its place? Well, we have the Soundex algorithm and its many offshoots to credit, at least in part, for that (and for making us the laziest spellers in history so far).

The Soundex algorithm works by breaking a word down to its core in a standardized way. The code always starts with the letter the word begins with, followed by exactly three digits. The digits are assigned to groups of similar-sounding consonants based on a few rules. For instance, every vowel is assigned a 0 and then dropped (the first letter of the word is always kept, vowel or not), and if fewer than three digits remain, zeros are added at the end to pad the code. Also, adjacent letters that share the same digit are collapsed into one, since they are virtually redundant and roughly the same phonetic sound is produced either way. In a way, it is almost a form of lossy compression: it is impossible to reliably turn a Soundex code back into its original word.
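To make the digit groups concrete, here is a small sketch of the letter-to-digit step on its own, assuming the common American Soundex grouping. The names `GROUPS`, `digitFor`, and `rawDigits` are made up for this illustration and do not come from any library.

```javascript
// Letter groups assumed from the common American Soundex variant.
// Letters not listed here (vowels, h, w, y) carry no digit.
const GROUPS = {
  1: "bfpv",
  2: "cgjkqsxz",
  3: "dt",
  4: "l",
  5: "mn",
  6: "r",
};

// Hypothetical helper: the digit for one letter, or 0 if it belongs to no group.
function digitFor(letter) {
  const ch = letter.toLowerCase();
  for (const [digit, members] of Object.entries(GROUPS)) {
    if (members.includes(ch)) return Number(digit);
  }
  return 0;
}

// Raw per-letter digits, before duplicates are collapsed and zeros are dropped.
function rawDigits(word) {
  return [...word].map(digitFor).join("");
}

console.log(rawDigits("Olsson")); // 042205
console.log(rawDigits("Robert")); // 601063
```

This is only the first stage: the final code keeps the first letter as a letter, collapses the repeated 2 in "Olsson", and throws the zeros away.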

The most remarkable thing about this algorithm, and the reason I wanted to look into it more myself, is that it was invented before computers as we now know them existed! Robert Russell and Margaret Odell first patented their Soundex code in 1918. A variation was used to help index past US censuses from 1890 to 1920. It became more popular in the 60s, once popular (relatively speaking) computing publications started talking about it. Since then it has been re-adapted and added on to, and the core Soundex algorithm has had a big influence on the phonetic algorithms that followed.

Taken from a free Stanford lesson on Soundex

Here is the Soundex rule in JavaScript.

Soundex in JavaScript
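As a rough illustration, the whole thing can be sketched in a few lines of JavaScript. This is a minimal version, assuming the common American Soundex rules (h and w do not separate two letters with the same digit, but vowels do); the names `CODES` and `soundex` are my own choices for this example, not part of any library.

```javascript
// Digit for each coded consonant; vowels, h, w, and y are absent and count as 0.
const CODES = {
  b: 1, f: 1, p: 1, v: 1,
  c: 2, g: 2, j: 2, k: 2, q: 2, s: 2, x: 2, z: 2,
  d: 3, t: 3,
  l: 4,
  m: 5, n: 5,
  r: 6,
};

function soundex(name) {
  // Keep only the letters a-z, lowercased.
  const letters = name.toLowerCase().replace(/[^a-z]/g, "");
  if (letters.length === 0) return "";

  let result = letters[0].toUpperCase(); // the first letter is always kept
  let prev = CODES[letters[0]] || 0;     // digit of the previous letter

  for (let i = 1; i < letters.length && result.length < 4; i++) {
    const ch = letters[i];
    if (ch === "h" || ch === "w") continue; // h and w do not break a run
    const code = CODES[ch] || 0;
    // Add the digit unless it is a vowel (0) or repeats the previous digit.
    if (code !== 0 && code !== prev) result += code;
    prev = code;
  }

  return result.padEnd(4, "0"); // pad short codes with zeros
}

console.log(soundex("Robert"), soundex("Rupert")); // R163 R163
console.log(soundex("Olsson"), soundex("Olson"));  // O425 O425
console.log(soundex("Lee"));                       // L000
```

Names that sound alike end up with the same code, which is what makes a lookup by Soundex forgiving of spelling.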

The biggest use of the Soundex algorithm, and possibly part of the reason it was needed in the first place, is the documentation and comparison of family names. We have all heard the story before: “Well, back in Sweden it was Olsson (as I’m typing this, my auto-correct is even telling me to spell it differently), but once my great-great-grandpa got here to Ellis Island, the immigration officer wrote Olson.” Or think back to a time not so long ago, when many poorer people had not had enough schooling to spell their own names.

The Soundex algorithm allowed the government to at least try to keep a better genealogical roadmap of its population, and made it possible for curious people to track down people from the past without incorrect spelling getting in the way.

Comparison of Soundex values on similar names.

Of course, the Soundex algorithm has limitations and isn’t meant to be used for everything. A phonetic algorithm has a specific place in dealing with data. But even today, many SQL databases (MySQL, SQL Server, and Oracle among them) come with a built-in SOUNDEX function that you can call to find records.

The big takeaway for me was getting to see that algorithms have been around longer than the machines we implement them on. There is a line of logic that exists outside the context of computing and is the basis of all the algorithms we use. Computers and programming are just tools for performing this logic repetitively and much, much quicker. When you are starting off learning to program, it’s really easy to lump algorithms and computing into the same category. Sometimes we may even know what to do to solve a problem logically, but just don’t have the skills or fluency in programming to get it done yet. When you can break things down outside the context of programming languages, the answer might be easier than you would assume. After all, this Soundex algorithm has stood the test of time and is not even 30 lines of code!
