Natural Language Processing (Part 44)-Minimum edit distance algorithm II

Coursesteach
7 min readJun 16, 2024

--

📚Chapter5: Autocorrect and Minimum Edit Distance

Sections

Minimum edit distance algorithm II
Minimum edit distance algorithm III

Section 1- Minimum edit distance algorithm II

In this blog, we will demonstrate how to populate the previously mentioned table, which is used to convert one word into another, using code. By the conclusion of this blog, you will be presented with the final product. Previously, an intuitive method was employed to complete the upper left corner of the minimum edit distance table. Now, let me demonstrate how to apply a formulaic method to complete the remainder.

Figure 1

So you filled out some of the table already and it looks like this. Now, to fill out the rest of the table,

Before you do anything else, make sure to fill in all the cells in the left column and top row. And if you want to turn a play into empty string, just delete all the letters. You can fill out these cells top to bottom by following this formula.

For each cell, look at the cell above and at the cost of an extra delete edit, which will be 1. So, basically, all you have to do to turn the string p into an empty string is to one delete operation. As demonstrated in the previous example, to convert the string containing ‘p, l’ into an empty string, one must delete ‘p’ and then delete ‘l’. These are two deletion operations, and so forth.

At D[4,0], the minimum edit distance for converting ‘play’ to an empty string is simply the cost of four deletions, which is 4.

Figure 2

The concept from the first row can be applied by transforming an empty string into ‘stay’ by inserting one letter sequentially. This can be achieved by using a slightly different formula. Working from left to right, observe the preceding cell and add an additional insert cost of one.

Figure 3

In the previous example, the method to calculate this cell without using formulas was demonstrated. However, the solution can also be determined by using big,scary-looking formula. It builds upon the computations you’ve already made in just the same way as using the no formula method.

The distance to the orange cell will be the shortest distance to reach it from any of the .At first glance, it may appear somewhat abstract, but it can be deconstructed into more manageable segments.

For instance, if you’re moving from the cell above, you would incur a deletion cost. just like you did in the first column.

If you move from the cell to the left, you will incur an insertion cost, similar to what was applied in the top row.

When moving from the cell to the upper left, you will take one of two actions. If the two letters, source i and target j, are different, you will add a replacement cost. If they match, you add nothing, as no edit is required for letters that are the same.

Figure 4

So here for this cell you have the minimum of 1 + 1 which is 2. Another 1 + 1 which is 2. Since these two letters don’t match, you have 0 plus 2 which is also 2. Then, take the minimum value from these three, which is 2 in this instance, and enter it into the cell. This is the minimum edit distance from ‘p’ to ‘s’ using the defined formula and costs. You can complete the rest of the table in the same manner.

Figure 5

The m, n entry in the bottom right corner represents the minimum edit distance from ‘play’ to ‘stay’, which is 4.

Figure 6

Color coding or a heat map can indeed reveal intriguing patterns. Observe the transition from the middle square. Correct, once the change from ‘p, l’ to ‘s, d’ is made, the suffixes of both words become identical, ‘a, y,’ eliminating the need for further edits. This is why the number 4 continues along the diagonal.

Figure 7

Section 2- Minimum edit distance algorithm III

This portion provides a comprehensive guide on the minimum edit distance and demonstrates how to reconstruct the sequence of edits made. A few quick notes on the minimum edit distance implementation you learned. Measuring the edit distance by using the three edits; insert, delete, and replace with costs 1, 1 and 2 respectively is known as levenshtein distance.

There are also well-known alternatives that differ in their rules. Determining the minimum edit distance is not always sufficient to address the entire problem. It is often necessary to understand the sequence of edits that led to this distance, which can be achieved by maintaining a backtrace. which is simply a pointer in each cell letting you know where you came from to get there so you know the path taken across the table from the top left corner to the bottom right corner. This tells you the edits taken and is particularly useful in problems dealing with string alignment.

Summary -Learning Objective

What is autocorrect
Building the model
Minimum edit distance
Minimum edit distance algorithm

Finally, the use of a tabular method for computation, as opposed to brute force, is a technique referred to as dynamic programming.. IIntuitively, this means solving the smallest subproblem first, then reusing that result to tackle the next larger subproblem, saving and reapplying that result repeatedly.

Over the past few lessons, you have learned about various real-world applications of NLP that are likely part of your daily usage. You have explored autocorrect, understanding its functionality and the magic behind it. You have also delved into the step-by-step process of constructing a functional autocorrect model utilizing edit distance and word probabilities. You have explored the issue of string similarity and the metric known as minimum edit distance. Ultimately, you discovered an effective method to resolve the minimum edit distance problem through a tabular algorithmic approach called dynamic programming.

Please Follow and 👏 Clap for the story courses teach to see latest updates on this story

🚀 Elevate Your Data Skills with Coursesteach! 🚀

Ready to dive into Python, Machine Learning, Data Science, Statistics, Linear Algebra, Computer Vision, and Research? Coursesteach has you covered!

🔍 Python, 🤖 ML, 📊 Stats, ➕ Linear Algebra, 👁️‍🗨️ Computer Vision, 🔬 Research — all in one place!

Don’t Miss Out on This Exclusive Opportunity to Enhance Your Skill Set! Enroll Today 🌟 at

Natural Language Processing with Probabilistic models Course

Natural Language Processing course

🔍 Explore cutting-edge tools and Python libraries, access insightful slides and source code, and tap into a wealth of free online courses from top universities and organizations. Connect with like-minded individuals on Reddit, Facebook, and beyond, and stay updated with our YouTube channel and GitHub repository. Don’t wait — enroll now and unleash your NLP potential!”

Stay tuned for our upcoming articles where we will explore specific topics related to NLP in more detail!

Remember, learning is a continuous process. So keep learning and keep creating and sharing with others!💻✌️

Note:if you are a NLP export and have some good suggestions to improve this blog to share, you write comments and contribute.

👉📚GitHub Repository

👉 📝Notebook

Ready to dive into data science and AI but unsure how to start? I’m here to help! Offering personalized research supervision and long-term mentoring. Let’s chat on Skype: themushtaq48 or email me at mushtaqmsit@gmail.com. Let’s kickstart your journey together!

Contribution: We would love your help in making coursesteach community even better! If you want to contribute in some courses , or if you have any suggestions for improvement in any coursesteach content, feel free to contact and follow.

Together, let’s make this the best AI learning Community! 🚀

To Do List

1- Collects Keys points from the blogs

👉WhatsApp

👉 Facebook

👉Github

👉LinkedIn

👉Youtube

👉Twitter

Source

1- Natural Language Processing with Probabilistic Models (Coursera)

--

--