Comparing two strings in python with textdistance

Umberto Grando
Feb 2, 2022


Hello World!

Lately I’ve been playing a lot with string comparisons for a few customer projects. For this task I’ve been using textdistance, a powerful Python library for computing the distance between two or more sequences with many different algorithms.

Comparing the distance between two strings can be very useful for tasks like fuzzy search or estimating the precision of an algorithm. But implementing these comparisons by hand can be time-consuming, and that’s where textdistance comes into play.

As usual you’ll need to install the library:

pip install textdistance

Let’s have a look at how it works:
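Below is a minimal sketch of such a script. It assumes the library is imported under the alias `algs`, which matches the `algs.jaccard` call used further down, and that the two strings are stored in `string_a` and `string_b`:

```python
# Sketch: compare two strings with the Levenshtein algorithm from textdistance.
# Assumption: the library is imported as `algs` (matching `algs.jaccard` later on).
import textdistance as algs

string_a = "Hello World!"
string_b = "Hello Word!"

# Number of single-character edits needed to turn one string into the other.
distance = algs.levenshtein.distance(string_a, string_b)

# Maximum possible value (length of the longer string) minus the distance.
similarity = algs.levenshtein.similarity(string_a, string_b)

# Similarity scaled to the range 0..1.
normalized_similarity = algs.levenshtein.normalized_similarity(string_a, string_b)

print("Distance:", distance)
print("Similarity:", similarity)
print("Normalized similarity:", normalized_similarity)
```

Deleting the extra "l" turns one string into the other, so the distance is 1. The longer string has 12 characters, so the similarity is 12 - 1 = 11 and the normalized similarity is 11/12, roughly 0.917.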

In the example we are checking the distance/similarity between two strings, “Hello World!” and “Hello Word!”. As you can see, we are using the Levenshtein algorithm here, but we could use any of the 30+ algorithms included in the library just by changing its name in the code.

For example, if you want to use a different type of algorithm (such as a token-based or sequence-based one), you can find its name in the table in the textdistance GitHub repository.

To switch to the Jaccard algorithm, for instance, we just need to write:

algs.jaccard.distance(string_a, string_b)
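If you want to try several algorithms in one go, one option is to look each one up by name with Python's built-in `getattr`, since textdistance exposes ready-made lowercase instances such as `levenshtein` and `jaccard`. This is a minimal sketch that assumes the library is imported as `algs`; the list of names is an illustrative pick from the edit-based, token-based and sequence-based families.

```python
# Sketch: run the same comparison with several textdistance algorithms by name.
# The chosen names are an illustrative selection, not a required set.
import textdistance as algs

string_a = "Hello World!"
string_b = "Hello Word!"

algorithm_names = ["levenshtein", "damerau_levenshtein", "jaro_winkler", "jaccard", "lcsseq"]

results = {}
for name in algorithm_names:
    algorithm = getattr(algs, name)  # e.g. algs.jaccard
    results[name] = algorithm.distance(string_a, string_b)

for name, value in results.items():
    print(f"{name:>20}: {value}")
```

Keep in mind that raw distances live on different scales (edit counts for Levenshtein, values between 0 and 1 for Jaccard or Jaro-Winkler), so they are not directly comparable across algorithms.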

Now let’s have a look at the methods we’ve used to check the distance. We have used 3 different methods:

  • Distance
  • Similarity
  • Normalized Similarity
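To make the relationship between these three values concrete, here is a from-scratch sketch in plain Python. It is not textdistance's own code. It assumes the definitions textdistance uses for edit-distance algorithms: the similarity is the maximum possible value minus the distance, and the normalized similarity is one minus the distance divided by that maximum, where the maximum for Levenshtein is the length of the longer string. The helper names `levenshtein_distance` and `scores` are hypothetical.

```python
# From-scratch sketch (not textdistance's implementation) of how the three
# values relate for an edit-distance algorithm such as Levenshtein:
#   similarity            = maximum - distance
#   normalized_similarity = 1 - distance / maximum
# where maximum is the length of the longer string.


def levenshtein_distance(a: str, b: str) -> int:
    """Dynamic-programming Levenshtein distance (insert, delete, substitute)."""
    previous = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution (or match)
            ))
        previous = current
    return previous[-1]


def scores(a: str, b: str) -> dict:
    distance = levenshtein_distance(a, b)
    maximum = max(len(a), len(b))
    similarity = maximum - distance
    # Two empty strings are identical, so their normalized similarity is 1.
    normalized_similarity = 1 - distance / maximum if maximum else 1.0
    return {
        "distance": distance,
        "similarity": similarity,
        "normalized_similarity": normalized_similarity,
    }


print(scores("Hello World!", "Hello Word!"))
```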

The normalized similarity is really useful because it gives us a similarity score between 0 and 1 (effectively a percentage) for every algorithm in the library, so we can compare results without knowing how each algorithm works internally.
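Because every algorithm's normalized similarity sits on the same 0 to 1 scale, it fits the fuzzy-search use case mentioned at the start. Here is a minimal sketch, assuming textdistance is imported as `algs`. `best_match` is a hypothetical helper written for this example rather than part of the library, and the candidate list is made up.

```python
# Sketch: a tiny fuzzy search that returns the candidate with the highest
# normalized similarity. Since normalized_similarity is always in 0..1, the
# same helper works with any textdistance algorithm object passed in.
# `best_match` is a hypothetical helper, not part of textdistance.
import textdistance as algs


def best_match(query, candidates, algorithm=algs.levenshtein):
    """Return (candidate, score) for the candidate most similar to `query`."""
    scored = [(c, algorithm.normalized_similarity(query, c)) for c in candidates]
    return max(scored, key=lambda pair: pair[1])


candidates = ["Goodbye", "Hello World!", "Hi"]
for algorithm in (algs.levenshtein, algs.jaccard, algs.jaro_winkler):
    match, score = best_match("Hello Word!", candidates, algorithm)
    print(type(algorithm).__name__, match, round(score, 3))
```

Swapping the algorithm object changes how candidates are scored without touching the rest of the helper.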

Textdistance is an awesome library for testing purposes, but I do not recommend using it in a production environment without installing the extras (which let it call faster external implementations when they are available), because, as you can see in its benchmarks, it’s not the fastest library for this job.

That’s all for today! As usual, you’ll find the code on my GitHub, and if you’d like, check out my personal website.


I’m Umberto Grando, an IoT Specialist with a passion for programming, gaming and technology in general.