# Google Summer of Code: OpenMF Week 4

Hey! I am back with another update on Google Summer of Code. In this blog, I am sharing my work which I did in the fourth week of the coding period. In case if you haven’t seen my proposal till now 🤔🤨 here is a link for that — My-GSoC-Proposal.

This week I worked on keyword search to get all the cases that have the searched keyword. Basically main purpose for this was to find a word from the database if the admin wants to search while analyzing the mobile forensic case. For this, I made two APIs that have some similar kind of function but have different use in their own place. One API search the keyword from the whole database so that analyst can relate one case with other cases and another API works within the case to get the files that have that keyword.

Is this a string matching problem? Yes, more or less I have to implement the same thing. Given are some of the algorithms which I read before implementing.

1. Rabin Karp algorithm → Time complexity: average Θ(n + m), worst Θ((n−m)m) and Space O(1)
2. KMP algorithm → Time complexity: Θ(n) and Space Θ(m)
3. Boyer Moore string search algorithm → Time complexity: best Ω(n/m), worst O(mn) and Space Θ(k)
4. Two-way string matching algorithm → Time complexity: O(n+m) and Space O(1)

where n → length of the searchable text and m → length of the pattern

But later I found from Python 3.10, the inbuilt algorithm was updated to use an enhanced version of Crochemore and Perrin’s Two-Way string matching algorithm which is the optimal and best way to do it. So I implemented my functions accordingly.

# Crochemore and Perrin’s Two-Way string matching algorithm 🤔

In Crochemore and Perrin’s Two-Way string matching algorithm, there is a smaller string (the “pattern” or “needle”) that is searched in a longer string (the “text” or “haystack”), determining whether the needle is a substring of the haystack, and if so, at what index(es). This algorithm runs in O(len(needle) + len(haystack)) time and with O(1) space. Here is the sample code in C.

`/* Computing of the maximal suffix for <= */int maxSuf(char *x, int m, int *p) {   int ms, j, k;   char a, b;   ms = -1;   j = 0;   k = *p = 1;   while (j + k < m) {      a = x[j + k];      b = x[ms + k];      if (a < b) {         j += k;         k = 1;         *p = j - ms;      }      else         if (a == b)            if (k != *p)               ++k;            else {               j += *p;               k = 1;            }         else { /* a > b */            ms = j;            j = ms + 1;            k = *p = 1;         }   }   return(ms);} /* Computing of the maximal suffix for >= */int maxSufTilde(char *x, int m, int *p) {   int ms, j, k;   char a, b;   ms = -1;   j = 0;   k = *p = 1;   while (j + k < m) {      a = x[j + k];      b = x[ms + k];      if (a > b) {         j += k;         k = 1;         *p = j - ms;      }      else         if (a == b)            if (k != *p)               ++k;            else {               j += *p;               k = 1;            }         else { /* a < b */            ms = j;            j = ms + 1;            k = *p = 1;         }   }   return(ms);}/* Two Way string matching algorithm. */void TW(char *x, int m, char *y, int n) {   int i, j, ell, memory, p, per, q;   /* Preprocessing */   i = maxSuf(x, m, &p);   j = maxSufTilde(x, m, &q);   if (i > j) {      ell = i;      per = p;   }   else {      ell = j;      per = q;   }   /* Searching */   if (memcmp(x, x + per, ell + 1) == 0) {      j = 0;      memory = -1;      while (j <= n - m) {         i = MAX(ell, memory) + 1;         while (i < m && x[i] == y[i + j])            ++i;         if (i >= m) {            i = ell;            while (i > memory && x[i] == y[i + j])               --i;            if (i <= memory)               OUTPUT(j);            j += per;            memory = m - per - 1;         }         else {            j += (i - ell);            memory = -1;         }      }   }   else {      per = MAX(ell + 1, m - ell - 1) + 1;      j = 0;      while (j <= n - m) {         i = ell + 1;         while (i < m && x[i] == y[i + j])            ++i;         if (i >= m) {            i = ell;            while (i >= 0 && x[i] == y[i + j])               --i;            if (i < 0)               OUTPUT(j);            j += per;         }         else            j += (i - ell);      }   }}/*If you want to read more about this algorithm check out this link here.*/`

In the coming week, I will be working more on the analytics part of this project and try to complete everything as per my timeline. I will keep updating you about my work through these blogs. So keep reading, sharing, and supporting. 😊

Till then if you haven’t read my previous blog here is a link for that: Google Summer of Code: OpenMF Week 3. 😊😀

See you soon!!😊

Github: swapnalshahil 👈

Instagram: eulersgamma 😃

--

--

--

SCoRe Lab

## Why These Programming Languages Are Being Hated By Programmers? ## Encode x Internet Computer: Intro to Building on the IC in Rust [Video + Slides] ## How to Create Drupal Image Gallery ## Deploy Flask server to an EC2 instance ## How do we evaluate research software to meet different requirements? ## Part-of-Speech Tag a String, Filter to Adverbs in Go ## How to build a CI CD pipeline in AWS ECR and AWS EKS using Github actions and ArgoCD for Node.js  ## Swapnal Shahil

GSoC’22 Mentor || GSoC’21 @SCoReLab || B.Tech at IIT Guwahati'23

## CS373 Spring 2022 Week 7: Avi Ghayalod ## Melody Walker’s Obituary ## CS371p Spring 2022: Winnie Chang 