String Similarity Algorithms Compared

Appaloosa Store
Apr 5, 2018 · 8 min read

Our Case

Applications in a store

Top 3 most downloaded apps
Top 3 most downloaded apps with a filter on string similarity

=> Apps ranked by number of downloads with a filter on different names.
=> Apps only ranked by number of downloads.
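
The two rankings above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the app names and download counts are invented, `top_distinct_apps` and its `threshold` parameter are hypothetical, and Python's standard-library `difflib.SequenceMatcher` stands in for whichever similarity metric is finally chosen.

```python
from difflib import SequenceMatcher


def top_distinct_apps(apps, limit=3, threshold=0.8):
    """Return the most downloaded apps, skipping near-duplicate names.

    `apps` is a list of (name, downloads) pairs. `threshold` is a
    hypothetical cutoff: a name is skipped when its similarity to an
    already selected name is at or above it.
    """
    selected = []
    # Walk the apps from most to least downloaded.
    for name, downloads in sorted(apps, key=lambda app: app[1], reverse=True):
        too_similar = any(
            SequenceMatcher(None, name.lower(), kept.lower()).ratio() >= threshold
            for kept, _ in selected
        )
        if not too_similar:
            selected.append((name, downloads))
        if len(selected) == limit:
            break
    return selected


# Invented sample data, for illustration only.
apps = [
    ("Expenses", 900),
    ("Expenses v2", 850),
    ("Calendar", 500),
    ("Directory", 300),
]

by_downloads = sorted(apps, key=lambda app: app[1], reverse=True)[:3]
with_filter = top_distinct_apps(apps)
print([name for name, _ in by_downloads])
print([name for name, _ in with_filter])
```

Ranking by downloads alone keeps both "Expenses" and "Expenses v2"; with the name filter, the second one is skipped and the next distinct app takes its place.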

Research And Tests

Algorithms testing table
The result is 1/7 = 14%
The result is now 2/9 = 22%
The result is 7/11 ≈ 64%
Jaro distance formula
m = 6
t = 2/2 = 1 (two matching characters are out of order, the 4th and 5th: t/h and h/t; t is half that count)
|s1| = 6
|s2| = 6
dj = (1/3) × (6/6 + 6/6 + (6 - 1)/6) = (1/3) × (17/6) = 0.944
Jaro distance = 94.4%
Jaro-Winkler distance formula
dw = 0.944 + (0.1 × 3) × (1 - 0.944) = 0.944 + 0.3 × 0.056 = 0.961
Jaro-Winkler distance = 96.1%
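
The worked example above can be checked with a short sketch. It assumes the two strings are "martha" and "marhta", a pair that is consistent with the figures (six characters each, t and h swapped at positions 4 and 5, a three-character common prefix) but is not named in the surviving text. The function names are illustrative, and the scaling factor p = 0.1 with a prefix capped at 4 characters are the conventional Winkler defaults, assumed here.

```python
def jaro(s1: str, s2: str) -> float:
    """Jaro similarity: (m/|s1| + m/|s2| + (m - t)/m) / 3."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0

    # Characters match only if equal and at most `window` positions apart.
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1 = [False] * len1
    matched2 = [False] * len2
    m = 0
    for i, ch in enumerate(s1):
        start = max(0, i - window)
        end = min(len2, i + window + 1)
        for j in range(start, end):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                m += 1
                break
    if m == 0:
        return 0.0

    # Compare the matched characters in order; t is half the mismatches.
    out_of_order = 0
    j = 0
    for i in range(len1):
        if matched1[i]:
            while not matched2[j]:
                j += 1
            if s1[i] != s2[j]:
                out_of_order += 1
            j += 1
    t = out_of_order / 2
    return (m / len1 + m / len2 + (m - t) / m) / 3


def jaro_winkler(s1: str, s2: str, p: float = 0.1, max_prefix: int = 4) -> float:
    """Jaro-Winkler similarity: dj + l * p * (1 - dj)."""
    dj = jaro(s1, s2)
    # l is the length of the common prefix, capped at max_prefix.
    prefix_len = 0
    for a, b in zip(s1, s2):
        if a != b or prefix_len == max_prefix:
            break
        prefix_len += 1
    return dj + prefix_len * p * (1 - dj)


# Assumed example strings (see the note above).
dj = jaro("martha", "marhta")
dw = jaro_winkler("martha", "marhta")
print(f"Jaro: {dj:.3f}, Jaro-Winkler: {dw:.3f}")
```

With these inputs the sketch finds m = 6, t = 1 and a common prefix of 3, so it reproduces the 0.944 and 0.961 values computed by hand.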

The metric


