SCoRe Lab
Published in

SCoRe Lab

Google Summer of Code: OpenMF Week 4

Google Summer of Code with SCoRe Lab: Week 4

Hey! I am back with another update on Google Summer of Code. In this blog, I am sharing my work which I did in the fourth week of the coding period. In case if you haven’t seen my proposal till now 🤔🤨 here is a link for that — My-GSoC-Proposal.

This week I worked on keyword search to get all the cases that have the searched keyword. Basically main purpose for this was to find a word from the database if the admin wants to search while analyzing the mobile forensic case. For this, I made two APIs that have some similar kind of function but have different use in their own place. One API search the keyword from the whole database so that analyst can relate one case with other cases and another API works within the case to get the files that have that keyword.

Is this a string matching problem? Yes, more or less I have to implement the same thing. Given are some of the algorithms which I read before implementing.

  1. Rabin Karp algorithm → Time complexity: average Θ(n + m), worst Θ((n−m)m) and Space O(1)
  2. KMP algorithm → Time complexity: Θ(n) and Space Θ(m)
  3. Boyer Moore string search algorithm → Time complexity: best Ω(n/m), worst O(mn) and Space Θ(k)
  4. Two-way string matching algorithm → Time complexity: O(n+m) and Space O(1)

where n → length of the searchable text and m → length of the pattern

But later I found from Python 3.10, the inbuilt algorithm was updated to use an enhanced version of Crochemore and Perrin’s Two-Way string matching algorithm which is the optimal and best way to do it. So I implemented my functions accordingly.

Keyword search from the whole database
Keyword search from a particular case

Crochemore and Perrin’s Two-Way string matching algorithm 🤔

In Crochemore and Perrin’s Two-Way string matching algorithm, there is a smaller string (the “pattern” or “needle”) that is searched in a longer string (the “text” or “haystack”), determining whether the needle is a substring of the haystack, and if so, at what index(es). This algorithm runs in O(len(needle) + len(haystack)) time and with O(1) space. Here is the sample code in C.

/* Computing of the maximal suffix for <= */
int maxSuf(char *x, int m, int *p) {
int ms, j, k;
char a, b;

ms = -1;
j = 0;
k = *p = 1;
while (j + k < m) {
a = x[j + k];
b = x[ms + k];
if (a < b) {
j += k;
k = 1;
*p = j - ms;
}
else
if (a == b)
if (k != *p)
++k;
else {
j += *p;
k = 1;
}
else { /* a > b */
ms = j;
j = ms + 1;
k = *p = 1;
}
}
return(ms);
}

/* Computing of the maximal suffix for >= */
int maxSufTilde(char *x, int m, int *p) {
int ms, j, k;
char a, b;

ms = -1;
j = 0;
k = *p = 1;
while (j + k < m) {
a = x[j + k];
b = x[ms + k];
if (a > b) {
j += k;
k = 1;
*p = j - ms;
}
else
if (a == b)
if (k != *p)
++k;
else {
j += *p;
k = 1;
}
else { /* a < b */
ms = j;
j = ms + 1;
k = *p = 1;
}
}
return(ms);
}


/* Two Way string matching algorithm. */
void TW(char *x, int m, char *y, int n) {
int i, j, ell, memory, p, per, q;

/* Preprocessing */
i = maxSuf(x, m, &p);
j = maxSufTilde(x, m, &q);
if (i > j) {
ell = i;
per = p;
}
else {
ell = j;
per = q;
}

/* Searching */
if (memcmp(x, x + per, ell + 1) == 0) {
j = 0;
memory = -1;
while (j <= n - m) {
i = MAX(ell, memory) + 1;
while (i < m && x[i] == y[i + j])
++i;
if (i >= m) {
i = ell;
while (i > memory && x[i] == y[i + j])
--i;
if (i <= memory)
OUTPUT(j);
j += per;
memory = m - per - 1;
}
else {
j += (i - ell);
memory = -1;
}
}
}
else {
per = MAX(ell + 1, m - ell - 1) + 1;
j = 0;
while (j <= n - m) {
i = ell + 1;
while (i < m && x[i] == y[i + j])
++i;
if (i >= m) {
i = ell;
while (i >= 0 && x[i] == y[i + j])
--i;
if (i < 0)
OUTPUT(j);
j += per;
}
else
j += (i - ell);
}
}
}
/*If you want to read more about this algorithm check out this link here.*/

In the coming week, I will be working more on the analytics part of this project and try to complete everything as per my timeline. I will keep updating you about my work through these blogs. So keep reading, sharing, and supporting. 😊

Till then if you haven’t read my previous blog here is a link for that: Google Summer of Code: OpenMF Week 3. 😊😀

See you soon!!😊

Github: swapnalshahil 👈

LinkedIn: swapnalshahil 🤘

Instagram: eulersgamma 😃

--

--

--

Recommended from Medium

Why These Programming Languages Are Being Hated By Programmers?

Encode x Internet Computer: Intro to Building on the IC in Rust [Video + Slides]

How to Create Drupal Image Gallery

Deploy Flask server to an EC2 instance

How do we evaluate research software to meet different requirements?

GSoC’20 — Coding phase | Week 13

Part-of-Speech Tag a String, Filter to Adverbs in Go

How to build a CI CD pipeline in AWS ECR and AWS EKS using Github actions and ArgoCD for Node.js

GitOps CICD using AWS EKS and Github

Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store
Swapnal Shahil

Swapnal Shahil

GSoC’22 Mentor || GSoC’21 @SCoReLab || B.Tech at IIT Guwahati'23

More from Medium

CS373 Spring 2022 Week 7: Avi Ghayalod

Introduction to Algorithms — Selection Sort, Bubble Sort

Melody Walker’s Obituary

CS371p Spring 2022: Winnie Chang