Build an OCR/Image Search application with MERN Stack

By Sundeep Charan Ramkumar

Published in UpSkillie · 8 min read · Jan 16, 2020

We are going to explore what OCR is, how it works, and how we use it in our application. We will also look at the process behind instant search using an indexing engine called Algolia.

1. Introduction

OCR (Optical Character Recognition) is a way of extracting text from images. Law firms, for example, use OCR when they need specific text in a digital format, which also protects the contents of old parchments from being lost to smudging. An OCR framework (we are going to use Tesseract) requires a trained language model to match the words present in an image. Since there are more than 200 languages and it is not feasible to cover all of them, we are going to build this application for images containing English text. Let’s move on.

Project Description:

Here is a bird’s-eye view of what we are going to build:

  • A user uploads an image with text on it.
  • The image is parsed by the OCR engine and uploaded to AWS S3.
  • Algolia’s indexing powers the instant search functionality.
  • MongoDB handles persistence.
  • We deploy the application to Heroku.

Tech Stack Used:

  • Front end: React.js, Redux.js / Redux Saga
  • Back end: Node.js, Express.js
  • Persistence: MongoDB, Algolia indexing
  • DevOps: Git, GitHub, Heroku, CI/CD pipelines

Backend Implementation:

We are going to have just one model, User.

Every user has a name, an email, an accessToken (for authentication), a public search key (for searching the index) and, finally, the location of each uploaded image along with its text content.
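
The fields above could be laid out roughly as follows. This is a minimal sketch rather than the original schema: the field names (publicSearchKey, images, location, text) and the password field are assumptions made for illustration. The plain object follows the shape that Mongoose accepts as a schema definition, so in a real project it could be handed to new mongoose.Schema(...).

```javascript
// Hypothetical field layout for the User model, written as a plain
// Mongoose-style schema definition so the sketch has no dependencies.
const userSchemaDefinition = {
  name: { type: String, required: true, trim: true },
  email: { type: String, required: true, unique: true, lowercase: true },
  password: { type: String, required: true }, // stores the bcrypt hash, never the plain text
  accessToken: { type: String, default: null }, // JWT issued at sign-in
  publicSearchKey: { type: String, default: null }, // per-user Algolia search key
  images: [
    {
      location: { type: String, required: true }, // S3 URL of the uploaded image
      text: { type: String, default: "" }, // text extracted by the OCR engine
    },
  ],
};

// In a real project (requires the mongoose package):
// const User = mongoose.model("User", new mongoose.Schema(userSchemaDefinition));
```

Keeping the images as an embedded array means a single query returns a user together with everything he/she has uploaded, which suits a one-model application like this one.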

For authentication and authorization, we use bcrypt for password hashing and JWTs for stateless authentication.

Next, let’s see which routes we are going to need.

  1. Registering users (POST api/user/register)
  2. Signing in users (POST api/user/sign in)
  3. Logout users (DELETE api/user/logout)
  4. Fetching user details (GET api/user, authentication required)
  5. Uploading an image for the user (POST api/image/, authentication required)

Since this tutorial is not about JWT and hashing, we move on to the uploading part.

Step 1:

Utilizing Multer to fetch the files:

There are other ways to capture file uploads, such as Busboy and Formidable. I found Multer to be the most beginner-friendly, so I went with it. The image is not going to be stored locally, so the configuration is pretty easy to wire up.
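
A Multer configuration along these lines might look like the sketch below. It is not the original setup: the size cap, the images field name in the commented wiring, and the authenticate and handler names are assumptions. The fileFilter and limits options are part of Multer's documented options, and memoryStorage keeps each file in memory as file.buffer instead of writing it to disk.

```javascript
// The file count mirrors the three-image cap used on the front end;
// the size cap is a hypothetical value.
const MAX_FILES = 3;
const MAX_FILE_SIZE = 5 * 1024 * 1024; // 5 MB

// Multer calls fileFilter(req, file, cb) once per incoming file.
// cb(null, true) accepts the file, cb(null, false) skips it.
function imageFileFilter(req, file, cb) {
  const isImage =
    typeof file.mimetype === "string" && file.mimetype.startsWith("image/");
  cb(null, isImage);
}

const uploadOptions = {
  limits: { files: MAX_FILES, fileSize: MAX_FILE_SIZE },
  fileFilter: imageFileFilter,
};

// Wiring it up in a real project (requires the multer package):
// const multer = require("multer");
// const upload = multer({ storage: multer.memoryStorage(), ...uploadOptions });
// router.post("/api/image", authenticate, upload.array("images", MAX_FILES), handler);
```

With this in place, the route handler would find the uploaded files in req.files, each carrying its contents in a buffer property.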

Step 2:

Uploading Images to AWS S3:

Next, we move on to uploading images to S3. I assume that you already have an AWS account, either on the Free Tier or a paid plan. Uploading an image is as easy as 1, 2, 3:

  1. Create a bucket by initializing an S3 instance with the API keys you were given.
  2. Prepare the upload object by setting the required properties.

The ACL property is where we, as creators, control the image’s permissions. Right now it is set to public read access.

  3. Wire up the upload function.
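
Steps 2 and 3 can be sketched as follows. This is an illustration, not the original code: buildUploadParams, the bucket name, and the key layout are assumptions. The commented lines use s3.upload from the AWS SDK for JavaScript v2, which resolves with an object containing the Location (URL) of the stored file.

```javascript
// Builds the parameter object for an S3 upload from a Multer in-memory file.
// The function name, bucketName and the key layout are hypothetical.
function buildUploadParams(file, bucketName, userId) {
  return {
    Bucket: bucketName,
    // Prefixing with the user id and a timestamp keeps keys distinct per upload.
    Key: `${userId}/${Date.now()}-${file.originalname}`,
    Body: file.buffer,
    ContentType: file.mimetype,
    ACL: "public-read", // anyone with the URL can read the image
  };
}

// With the AWS SDK for JavaScript v2 (requires the aws-sdk package):
// const AWS = require("aws-sdk");
// const s3 = new AWS.S3({
//   accessKeyId: process.env.AWS_ACCESS_KEY_ID,
//   secretAccessKey: process.env.AWS_SECRET_ACCESS_KEY,
// });
// const { Location } = await s3.upload(buildUploadParams(file, "my-bucket", user.id)).promise();
```

The returned Location is what would be saved on the user as the image location.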

Step 3:

Parsing the text in Images with Tesseract.js:

The process might sound tough, but it is not. All we have to do is download the trained language data for the language we need and save it in the project directory. Then we pass the image to Tesseract.js for recognition.
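
The recognition step might look like the sketch below. It assumes a Tesseract.js worker has already been created and initialized for English; the worker is passed in as a parameter because the setup calls differ between major versions of the library. extractText is a hypothetical helper name, and the whitespace clean-up is an addition for illustration.

```javascript
// Recognizes the text in one image using a Tesseract.js worker.
// worker.recognize(image) resolves with an object whose data.text
// holds the recognized text.
async function extractText(worker, imageBuffer) {
  const { data } = await worker.recognize(imageBuffer);
  // Collapse runs of whitespace so the text is tidy before indexing.
  return data.text.replace(/\s+/g, " ").trim();
}

// Worker setup is version-specific, so it is left out here. The langPath
// worker option points Tesseract.js at locally stored language data.
```

A route handler would call extractText once per uploaded file, passing file.buffer from Multer.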

Step 4:

Indexing the parsed data onto Algolia:

Now that we have finished the parsing, it is time to index this data to support instant search. We initialize the Algolia client by passing in our API keys, and we choose an index name. If an index with that name already exists, Algolia simply connects to it; otherwise it creates the index.

The following step is optional; however, it is required if we want to restrict image search to the user who uploaded the images.

Create a filter that decides who may view the images. In this case, our database’s user ID works well. We name it viewableBy, although any name would do. Then we generate a secured API key for each user who uploads, which acts as an additional safeguard so that a user fetches only the images he/she uploaded.

The last puzzle piece is to create an object and send it to Algolia.
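
The filter and the record could be put together as in the sketch below. The helper names and the imageUrl and text attribute names are assumptions; viewableBy comes from the text above. generateSecuredApiKey is a method of the algoliasearch client that derives a restricted key from a search-only key.

```javascript
// Name of the attribute that ties a record to its uploader.
const VIEWABLE_BY = "viewableBy";

// Restrictions baked into a user's secured API key: searches made with that
// key only match records whose viewableBy equals the user's id.
function buildKeyRestrictions(userId) {
  return { filters: `${VIEWABLE_BY}:${userId}` };
}

// The record sent to Algolia for one uploaded image.
// imageUrl and text are hypothetical attribute names.
function buildImageRecord(userId, imageUrl, text) {
  return { [VIEWABLE_BY]: String(userId), imageUrl, text };
}

// With the algoliasearch client (requires the package and your own keys):
// const client = algoliasearch(APP_ID, ADMIN_API_KEY);
// const publicSearchKey = client.generateSecuredApiKey(
//   SEARCH_ONLY_KEY,
//   buildKeyRestrictions(user.id)
// );
// The viewableBy attribute must also be listed in the index's
// attributesForFaceting setting before it can be used as a filter.
```

Because the restriction is embedded in the key itself, the front end cannot widen the search by changing its query parameters.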

That is pretty much it on the backend.

Front end implementation:

As far as the front end is concerned, all we have to take care of is sending the image to the backend and wiring up the image search. Again, I am not going to discuss the authentication part here; I can tell you, however, that we capture and maintain the state with Redux, and use Redux-Saga for asynchronous actions.

Step 1:

Preparing our file upload

To make things a little bit fancy, we are going to add drag-and-drop image upload using react-dropzone. Uploads are limited to three images at a time to keep the processing time short.

The onDrop callback calls two state setters: one for the uploaded files and one for the previews. Each preview needs an object URL so that the browser can display it.
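
The body of such an onDrop callback can be sketched as a small helper. The names prepareDrop, setFiles and setPreviews are assumptions, and createUrl is made injectable only so that the helper can run outside a browser; by default it uses URL.createObjectURL.

```javascript
const MAX_IMAGES = 3;

// Turns the files handed to onDrop into the two pieces of state described
// above: the files themselves and a preview entry for each of them.
function prepareDrop(acceptedFiles, createUrl = (file) => URL.createObjectURL(file)) {
  const files = acceptedFiles.slice(0, MAX_IMAGES);
  const previews = files.map((file) => ({
    name: file.name,
    url: createUrl(file),
  }));
  return { files, previews };
}

// Inside the component (setFiles and setPreviews are hypothetical useState setters):
// const onDrop = (acceptedFiles) => {
//   const { files, previews } = prepareDrop(acceptedFiles);
//   setFiles(files);
//   setPreviews(previews);
// };
// const { getRootProps, getInputProps } = useDropzone({ onDrop });
```

Object URLs stay alive until they are released, so it is worth calling URL.revokeObjectURL on each preview URL once the previews are no longer shown.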

Step 2:

Sending the files to our backend:

Now that the state has our files, we need to send them to our backend. But we can’t just send them right away, as Multer only handles multipart form data. Hence we need to wrap all our files in a new FormData instance.

Please note that images is the field name under which the files are appended to the FormData instance. It must match the field name that Multer expects.
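
Wrapping the files might look like this minimal sketch. buildFormData is a hypothetical helper name, and the endpoint path and header in the commented request are assumptions.

```javascript
// Field name shared with the backend; Multer must be configured for the same name.
const FIELD_NAME = "images";

// Wraps the selected files in a FormData instance, one entry per file.
function buildFormData(files) {
  const formData = new FormData();
  files.forEach((file) => formData.append(FIELD_NAME, file));
  return formData;
}

// Sending it (the endpoint path and the token header are assumptions):
// fetch("/api/image", {
//   method: "POST",
//   headers: { Authorization: token },
//   body: buildFormData(files),
// });
// Do not set Content-Type by hand; the browser adds the multipart boundary itself.
```

Appending every file under the same field name is what lets the backend read them back as a single array.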

Step 3:

Instantiating Algolia in React

To speed up the design process, I have used Material UI for React. We now come to the Dashboard page, where we create the SearchBar component with the images grid below it, showing the search results. First, we need to set up our Algolia instance on the client side.

Here, user.publicSearchKey is the public key received from the backend, which generates a separate key for each user.

Step 4:

Preparing to receive Hits from search results:

Let’s move on to how we integrate Instant Search. We first need the react-instantsearch-dom dependency.

Use the --prefix option only if you are installing from the project’s root directory.

Next, we need to wire two parts.

  1. Image search container
  2. Hits (Results) List

The image container wraps the Hits component and the SearchBox inside the InstantSearch container.

The Hits component does the magic of receiving the images from Algolia, because we passed the search client as well as the indexName as attributes to InstantSearch. The styles are up to you; the link to my repository is below this article, where you can view the basic styles behind it.

Step 5:

Integrating the modal

Integrating a modal inside React always leads to some head-scratching, due to the component architecture and the parent-child relationship tree. Luckily, React gives us a technique called Portals. React Portals allow us to render outside the root element of the application, into another element of the original DOM. All we have to do is wire that new root into our component, which is relatively straightforward.

Let me break down the Portal component.

We first tell React the location of the new root element in the DOM.

The custom-named elements are not full components, just styled-components (CSS-in-JS). With the ReactDOM dependency, we tell React what to render as the first parameter (children is received as a prop so that whatever is nested inside the Portal component gets placed there) and the target destination as the second parameter. I designed the modal to close when we click the Back button or anywhere other than the image.
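
The closing rule can be isolated in a tiny helper, shown below together with a commented sketch of the portal wiring. shouldCloseModal, Overlay, onClose and the modal-root element id are assumptions for illustration; ReactDOM.createPortal(child, container) is the React DOM call that renders into the new root.

```javascript
// Decides whether a click inside the modal overlay should close it:
// any click closes the modal unless it landed on the image itself.
function shouldCloseModal(event, imageElement) {
  return event.target !== imageElement;
}

// Wiring (requires react and react-dom; "modal-root" is a hypothetical id of
// an element placed next to the application's root in index.html):
// const Portal = ({ children, onClose }) =>
//   ReactDOM.createPortal(
//     <Overlay onClick={onClose}>{children}</Overlay>,
//     document.getElementById("modal-root")
//   );
```

Comparing against the image element means the Back button and the backdrop need no special handling, since a click on either of them is simply a click that is not on the image.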

Now that we know the technique behind the modal, let’s connect it with the image list.

We tell the portal to close via the onClick prop, and the child inside it is the image that was clicked in the list. That’s it.

Deployment:

Hurray! We have reached the end of the application. Sure, it can be extended as far as our wildest dreams, but for educational purposes it has been limited to an MVP. Let’s proceed with the deployment section.

Step 1:

Creating a Heroku app:

Download the Heroku CLI. Next, open a terminal in the project directory, log in, and then create a Heroku app.

These commands finish by printing a randomly generated URL for the application. Let’s leave that for now and focus on preparing our app for deployment.

Step 2:

Tweaking our package.json file:

Whatever we build on our front end, what runs on Heroku at the end of the day is just a normal Express app. Hence we need to point Express at the folder that holds the built static files. To create that folder, Heroku must run a script once its own build process has finished. We declare that script inside the scripts property of package.json.

We set the environment variable NPM_CONFIG_PRODUCTION to false so that npm also installs the devDependencies that React’s build script needs.
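
A scripts section along these lines is a common way to do this. It is a sketch under assumptions: the React app is assumed to live in a client folder and the server entry point is assumed to be server.js. The heroku-postbuild name is the hook that Heroku’s Node.js build runs after installing the root dependencies.

```json
{
  "scripts": {
    "start": "node server.js",
    "heroku-postbuild": "NPM_CONFIG_PRODUCTION=false npm install --prefix client && npm run build --prefix client"
  }
}
```

The first half of the command installs the front end’s dependencies inside the client folder, and the second half runs its build script, which produces the static files.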

Step 3:

Preparing our server.js for deployment:

Heroku provides an environment variable called NODE_ENV, which indicates the mode of deployment. On Heroku, it is set to production by default. We can use that to our advantage.

Here we say that if the app is running in production, Express should serve the React build folder, resolved from the project root, as its static directory. We derive the folder path via the path module in Node.js. Next, we answer any other incoming route with the index.html present in the build directory, which allows React to proceed with client-side routing.
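
The production block described above might look like the following sketch. mountClientBuild is a hypothetical helper; app and express are passed in as parameters so that the sketch runs without Express installed, and the client/build location is an assumption about the project layout.

```javascript
const path = require("path");

// Mounts the React production build on an Express app.
// clientBuildDir defaults to a hypothetical client/build folder next to this file.
function mountClientBuild(
  app,
  express,
  clientBuildDir = path.join(__dirname, "client", "build")
) {
  if (process.env.NODE_ENV !== "production") return false;

  // Serve the compiled JavaScript, CSS and other assets.
  app.use(express.static(clientBuildDir));

  // Any other GET request receives index.html so React can route client-side.
  // The "*" pattern is Express 4 syntax.
  app.get("*", (req, res) => {
    res.sendFile(path.join(clientBuildDir, "index.html"));
  });
  return true;
}

// In server.js, after the API routes have been registered:
// mountClientBuild(app, express);
```

The call belongs after the API routes, because the catch-all route would otherwise answer GET requests meant for the API.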

Step 4:

Pushing our modified code to Heroku:

Now we are ready to push the changes to Heroku. We point the Git remote at Heroku and then push our code to it.

The build process is triggered automatically thanks to the post-build script. And that’s pretty much it.

Further Improvements:

There are plenty of improvements that could be made in terms of UX or functionality. The one I would like to put forward is that the OCR processing, as well as the image upload to AWS S3, can also be done on the client side. Do note that OCR conversion is processor-heavy, so it is your decision whether to keep it on the server side or move it to the client side. Another improvement I would consider is organizing the image upload code on the backend with OOP techniques.

So yeah, we have come to the end of this article, and I hope that you have learned quite a few things from it. The repository, along with the additional resources, is mentioned below.

Resources:

  1. bcrypt: https://www.npmjs.com/package/bcrypt
  2. JSON Web Token: https://www.npmjs.com/package/jsonwebtoken
  3. Multer: https://www.npmjs.com/package/multer
  4. AWS SDK: https://www.npmjs.com/package/aws-sdk
  5. Tesseract.js: https://www.npmjs.com/package/tesseract.js
  6. Algolia Search: https://www.npmjs.com/package/algoliasearch
  7. React Dropzone: https://www.npmjs.com/package/react-dropzone
  8. React InstantSearch DOM: https://www.npmjs.com/package/react-instantsearch-dom
  9. Express: https://www.npmjs.com/package/express

Sundeep Charan Ramkumar
UpSkillie

A MERN Stack developer who builds single page/Ecommerce applications for a living