Sort By Minimum Edit Distance / Levenshtein distance in PostGRE SQL and ASP.Net Core for Better Suggestions on Search

Abrar Jahin
6 min readOct 16, 2021

--

I am not familiar with writing kinds of stuff, so I have never tried any kind of blogging before. Then I thought, it would be better if I can share my experience with others so that others can know about them and can share their ideas about those problems and can share their thoughts if there is any better solution in there. These kinds of thoughts inspired me to write down some cherry-picks from my experiences. In this blogging, I like to share some of my different problems with the solution I have faced during different problem solving, development and deep learning model creation and training.

ASP.Net Core with PostGRE deployed in Ubuntu

As a first post, I like to share a well known dynamic problem technique used in the database which can improve the user experience for sorting texts in a database query.

Background

As a part of my current research project, I have to build a data entry system that can take acyclic Graph Data with multiple children and parents and store it in a Database (application can be found in this URL, currently in alpha version). The web application is build using this boilerplate -AbrarJahin/Asp.NetCore_3.1-PostGRE_Role-Claim_Management (github.com). This web application is using ASP.Net Core 3.1 and PostGRE 12 as a Database server and is deployed in a Ubuntu Ubuntu 20.04.3 LTS server.

Problem

In the application, there is an advanced search module (search can be tried from here) which is used for finding a `term`/text from a column of a table in PostGRE Database. For which Like query is used roughly in this way (made simple query for better understanding)-

SELECT <column_list> FROM <table_name> WHERE column_name LIKE ‘%<search_string>%’;

This query is providing data in random order, but we need data with specific ordering which is helpful for users to find his/her required data. So, I have added Order By clause in with the EF Core for different columns which generate SQL query roughly like this (made simple query for better understanding)-

SELECT <column_list> FROM <table_name> WHERE column_name LIKE ‘%<search_string>%’ ORDER BY <column_name> <asc/desc>;

For my use case, I am getting column name and order from user’s interaction from this table-

ASP.Net Core Datatable exaple with PostGRE
Datatable With ASP.Net Core PostGRE

where red rounds are button for changing ordering and column of sorting order.

Now the main problem appears, when the user search for a term/text and data is sorted depending on text ascending/descending and there are many results (like 150) and my desired text comes at any middle position, then it is annoying for the users to scroll/paginate a couple of times to find the actual text he is searching for. More problematic is sometimes he is scrolling a couple of times and he finds that the data is not there. For example, if he is searching for ‘statistics’ which term/text is not available there. When he/she need to scroll and search for the term from the results where the term is not in there.

Failure Search Example

And this is annoying for the user, right? So we need a better sorting algorithm rather than only text sorting. One of the best ways if we can use Minimum Edit Distance Dynamic Programming for sorting this data (a good explanation of programming minimum edit distance in DP can be found here). So, we will compare each text with user input and find the minimum edit distance between those 2 texts. Then we will sort the data depending on the edit distance rather than text.

There is another concern is the sorting should be done on the Database Server (for our case it is PostGRE) otherwise what we need to do is pull all data to the ASP.Net Core server, then sort all data which is very bad in terms of performance. So, we should check if there is any way of using a custom compare function for sorting in SQL.

Solution

Fortunately, there is some solution from PostGRE 9.6 and upward versions. There is support for crypto operations in the PostGRE server (more can be found here). For providing cryptographic support in PostGRE, there are some additional supports where there is support for the things we need, pgtrgm (more can be found here). So, I have to find support for pgtrgm in ASP.Net Core to use that in my project. But for that, there is also some complexity. I am describing the complexities step by step in here-

These are changes needed in the ASP.net Core application-

  1. Add pgtrgm support in ASP.Net Core. For that there is a nuget package which need to be installed. The package can be found in here. I have to install a compatible version of the plugin. As I was using ASP.Net Core 3.1, so, I have to install Npgsql.EntityFrameworkCore.PostgreSQL.Trigrams version 3.1.18 from Nuget Package Manager.
  2. Then we need to configure the plugin with DB Conext. For that, we need to edit the connection configuration in ConfigureServices of Startup.cs file from this-
public void ConfigureServices(IServiceCollection services){    ..............    ..............    services.AddDbContext<ApplicationDbContext>(options => {options.UseNpgsql(Configuration.GetConnectionString("DefaultConnection"));    });    ..............    ..............}

to this-

public void ConfigureServices(IServiceCollection services){............................services.AddDbContext<ApplicationDbContext>(options => {options.UseNpgsql(Configuration.GetConnectionString("DefaultConnection"), option => option.UseTrigrams());});............................}

3. Then calling the Trigram from code where sorting was previously like this-

var datatableShowableData = _context.Terms.OrderBy(c => c.Name);

And I need to change using like Trigram this-

string searchText = "Anything";
var datatableShowableData = _context.Terms.OrderBy(c => EF.Functions.TrigramsSimilarityDistance(c.Name, searchText));

More available functions for Trigram can be found in here.

Then we need to configure the server and database, this are the changes needed in Server and Database-

  1. Install the plugin in ubuntu using this command sudo apt install postgresql-contrib .
  2. Then enable the plugin for the PostGRE Database. For that, we need to run these commands in my ubuntu server-
#1. Install the PostGRE plugin in the Ubuntu Server
sudo apt install postgresql-contrib
#2. Enable the plugin for the Database- #a. Login as DB super admin
sudo su - postgres
#b. Select the Database
psql -d <db_name>
#c. Enable the Plugin
CREATE EXTENSION pg_trgm;
Running the commands in server

A demo image is added here. As for my server, the configurations are already done, I am seeing like this. But for your cases, the output will show little different things. So after configuring that, search results are sorted like this-

So the data are sorted with Levenshtein Distance and the performance is also great (you can check the demo performance from here which is a 1GHz pc with 1GB ram and data is over 66k).

This is a rough comparison for around 66k data row-

Performance in terms of request time

The red marked ones are for Levenshtein sort and others are for just text sort. This is not a real comparison of performance, but a rough estimation. In this example, there are other factors that exist like server configurations, network speed etc.
Hope this experience would help some developers. Please let me know what do you think about this article and if you know a better way, please share in the comment section. And please let me know what parts I should have improve. Thanks a lot for your time reading this article.

--

--