Natural Language Processing (Part 42)-Minimum edit distance

Coursesteach
5 min readMay 12, 2024

--

📚Chapter5: Autocorrect and Minimum Edit Distance

Introduction

Minimum Edit Distance has a wide variety of applications. It allows you to implement spelling correction, document similarity, machine translation, DNA sequencing and more. I will also show you the different types of edits.

Section 2- Usage of Minimum Edit Distance

You have already seen how to build autocorrect using edit distance. Let’s consider a slightly different problem.

  • What if you are given two words strings or even whole documents and you wanted to evaluate how similar they are.
  • Minimum edit distance can be used to do this. Given two strings, the minimum edit distance is the lowest number of operations needed to transform one string into the other. It has many applications. In an NLP for example, it could be used in spelling correction, document similarity and machine translation.
  • It can also be found in computational biology and DNA sequencing.

Section 3- Calculate Edit Distance

For calculating minimum Edit distance you will use three types of edits operations. All three operations that you already know inserts delete and replace for example to turn play and to stay what is the minimum number of credits required to make this happen?

Section 4- Replace P to S

Natural Language Processing with Probabilistic Models
Natural Language Processing with Probabilistic Models
Natural Language Processing with Probabilistic Models
Natural Language Processing with Probabilistic Models
Natural Language Processing with Probabilistic Models
Natural Language Processing with Probabilistic Models
Natural Language Processing with Probabilistic Models

So to turn p into s, you replace p with an s and to turn l into t you replace l with a t. Both a and a are the same. So do nothing and it’s the same with y and y.

So the total number of edits is 2. Until this point you’ve considered all edits operations to cost the same. That is a cost of one but now you will consider a different cost for each type of operation.

Natural Language Processing with Probabilistic Models

You will use these costs to calculate the edit distance which now represents the total edit costs. This is what you are trying to minimize. It is simply the sum of costs for the edits that were made. Inserts and deletes will have a cost of 1 and replace has a cost of 2. This makes intuitive sense if you think of replacement as it deletes followed by an insert. Calculate for this example. You have to replace edit at a cost of 2 each for a total added distance of 4. This is a relatively simple example and it was possible to find the minimum edit distance just by looking at it.

Natural Language Processing with Probabilistic Models

But imagine the number of operations between much longer strings and large couple of texts or even DNA strings. You can try and solve these problems by brute force adding one edit distance at a time and enumerating for all possibilities until one string changes to the other. But this could take a very, very long time. In fact by solving this way, the computational complexity increases exponentially as each string grows in length. A much faster way is by using a tabular approach, you will implement this next. The tabular approach I spoke about allows you to speed up the enumeration of all possible strings and edits. In doing so, you will also be learning about a new concept known as dynamic programming. We’ll see about this in the next blog

Please Follow and 👏 Clap for the story courses teach to see latest updates on this story

🚀 Elevate Your Data Skills with Coursesteach! 🚀

Ready to dive into Python, Machine Learning, Data Science, Statistics, Linear Algebra, Computer Vision, and Research? Coursesteach has you covered!

🔍 Python, 🤖 ML, 📊 Stats, ➕ Linear Algebra, 👁️‍🗨️ Computer Vision, 🔬 Research — all in one place!

Don’t Miss Out on This Exclusive Opportunity to Enhance Your Skill Set! Enroll Today 🌟 at

Natural Language Processing with Probabilistic models Course

Natural Language Processing course

🔍 Explore cutting-edge tools and Python libraries, access insightful slides and source code, and tap into a wealth of free online courses from top universities and organizations. Connect with like-minded individuals on Reddit, Facebook, and beyond, and stay updated with our YouTube channel and GitHub repository. Don’t wait — enroll now and unleash your NLP potential!”

Stay tuned for our upcoming articles where we will explore specific topics related to NLP in more detail!

Remember, learning is a continuous process. So keep learning and keep creating and sharing with others!💻✌️

Note:if you are a NLP export and have some good suggestions to improve this blog to share, you write comments and contribute.

👉📚GitHub Repository

👉 📝Notebook

Ready to dive into data science and AI but unsure how to start? I’m here to help! Offering personalized research supervision and long-term mentoring. Let’s chat on Skype: themushtaq48 or email me at mushtaqmsit@gmail.com. Let’s kickstart your journey together!

Contribution: We would love your help in making coursesteach community even better! If you want to contribute in some courses , or if you have any suggestions for improvement in any coursesteach content, feel free to contact and follow.

Together, let’s make this the best AI learning Community! 🚀

To Do List

1- Collects Keys points from the blogs

👉WhatsApp

👉 Facebook

👉Github

👉LinkedIn

👉Youtube

👉Twitter

Source

In this blog post, we’ve gathered information from different places like the Natura Language Processing Course, research papers, technical blogs, official documents, YouTube videos, and others. We’ve given credit to each source under the related pictures, and we’ve included links to those sources

1- Natural Language Processing with Probabilistic Models (Coursera)

--

--