# String Similarity Algorithms Compared

How we customised mail messages to users by choosing and implementing the most appropriate algorithm.

## Our Case

The Context

3. Appaloosa Store’s Blog
4. Kindle
5. Amazon Kindle
1. Appaloosa Store’s Blog
2. Kindle
`=> Apps ranked by number of downloads with a filter on different names.`
`=> Apps only ranked by number of downloads.`

## Research And Tests

We tested several algorithms to compare strings, and selected the one that would better fit our need.

• the Levenshtein distance between “beauties” and “beautiful” is also 3
• For us, humans, the “beauties”/”beautiful” pair is much more similar than the “foo”/”bar” pair. But the Levenshtein distance is the same.
`The result is 1/7 = 14%`
`The result is now 2/9 = 22%`
`The result is 7/11 = 63%`
`m = 6t = 2/2 =1 (2 couples of non matching characters, the 4-th and 5-th) { t/h ; h/t }|s1| = 6|s2| = 6`
`dj = (⅓) ( 6/6 + 6/6 + (6–1)/6) = ⅓ 17/6 = 0,944Jaro distance = 94,4%`
`dw = 0,944 + ( (0,1*3)(1–0,944)) = 0,944 + 0,3*0,056 = 0,961Jaro-Winkler distance = 96,1%`