Extract words from your text data using Python’s built-in Regular Expression module

Regular Expressions in Python

A regular expression (RegEx) is an extremely powerful tool for matching and extracting character patterns from text. Regular expressions are fast, and they help you avoid unnecessary loops in your program when matching and extracting the information you need.

In this post, we will show you how to use regular expressions in Python to solve certain types of problems.

No prior knowledge of regular expressions is required to follow this post.

Let’s see how you can use RegEx to solve various text-processing problems. In this post, we focus on extracting words from strings.

Using Regular Expressions in Python

To start using Regular Expressions in Python, you need to import Python’s re module. …
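The rest of this post is truncated here, so the following is only a minimal sketch of the idea, not the post's own examples: after importing re, a single re.findall call pulls every word out of a string without an explicit loop. The sample sentence and the patterns are illustrative choices.

```python
import re

text = "Regular expressions help you extract words, numbers like 2021, and more!"

# \w+ matches runs of word characters (letters, digits, underscore),
# so re.findall returns every "word" in one call, without an explicit loop.
words = re.findall(r"\w+", text)
print(words)
# ['Regular', 'expressions', 'help', 'you', 'extract', 'words', 'numbers', 'like', '2021', 'and', 'more']

# [A-Za-z]+ keeps only alphabetic runs, so "2021" is dropped.
alpha_words = re.findall(r"[A-Za-z]+", text)

# \b anchors the match at a word boundary: words starting with "e" or "E".
e_words = re.findall(r"\b[eE]\w*", text)
print(e_words)  # ['expressions', 'extract']
```

Note that \w also matches digits and underscores. If only letters count as "words" in your data, a character class such as [A-Za-z]+ is the stricter choice.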


Photo by Benjamin Smith on Unsplash

Today, extracting information from scanned documents such as letters, write-ups, and invoices has become an integral part of many business processes. To accomplish this, you need to set up OCR software that can extract the information from these scanned documents or PDFs.

Here we will take you through building and installing Tesseract 4.x on an Ubuntu 18.04 machine. There are two ways to install Tesseract 4.x:

The first is to install the Tesseract 4.0.0 beta version, which is easy and takes only a couple of commands.

Alternatively, you can install Tesseract 4.1.1, the latest stable release of Tesseract at the time of writing. In this post, we will guide you through installing each of them on your Ubuntu 18.04 …
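Because the two routes give different versions (4.0.0 beta vs 4.1.1), it helps to confirm which one actually ended up on your PATH. Here is a minimal sketch that wraps the `tesseract --version` command. The function names are hypothetical, and it assumes the first line of the output looks like "tesseract 4.1.1". Both stdout and stderr are checked because some builds print the banner on stderr.

```python
import re
import shutil
import subprocess
from typing import Optional


def parse_tesseract_version(output: str) -> Optional[str]:
    """Pull the version string from text like 'tesseract 4.1.1' or 'tesseract 4.0.0-beta.1'."""
    match = re.search(r"tesseract\s+v?(\d+\.\d+\.\d+\S*)", output, re.IGNORECASE)
    return match.group(1) if match else None


def installed_tesseract_version() -> Optional[str]:
    """Run `tesseract --version`; return the version, or None if tesseract is not on PATH."""
    if shutil.which("tesseract") is None:
        return None
    proc = subprocess.run(["tesseract", "--version"], capture_output=True, text=True)
    # Some builds write the version banner to stderr, so search both streams.
    return parse_tesseract_version(proc.stdout + proc.stderr)


print(parse_tesseract_version("tesseract 4.1.1\n leptonica-1.79.0"))  # 4.1.1
print(installed_tesseract_version())  # None when Tesseract is not installed
```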

Quantrium Guides


In this guide, I will take you through the steps I followed to train Tesseract using the Qt Box Editor and improve its predictions on certain types of images where it was performing poorly. Before we begin, here are the tools I had at my disposal. If you want to follow this tutorial without errors, I suggest using the same setup:

  • A Windows 10 PC with Tesseract installed
  • A Google Cloud account with a Google Compute Engine instance with an Ubuntu 18.04.4 …
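The Qt Box Editor is used to correct Tesseract's box files, which pair each character with its bounding box before training. The guide's own steps are truncated here, so this is only a sketch. It assumes the common box-file layout of one character per line, `<symbol> <left> <bottom> <right> <top> <page>`, with coordinates measured from the bottom-left of the image. The sample lines are made up for illustration. Reading the file this way can help you spot odd boxes before or after editing them.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class BoxEntry:
    symbol: str
    left: int
    bottom: int
    right: int
    top: int
    page: int

    @property
    def width(self) -> int:
        return self.right - self.left

    @property
    def height(self) -> int:
        return self.top - self.bottom


def parse_box_file(text: str) -> List[BoxEntry]:
    """Parse box-file text (assumed layout: symbol left bottom right top page)."""
    entries = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        # Split from the right so the symbol (first field) stays intact.
        symbol, left, bottom, right, top, page = line.rsplit(maxsplit=5)
        entries.append(BoxEntry(symbol, int(left), int(bottom), int(right), int(top), int(page)))
    return entries


# Made-up sample in the assumed format.
sample = """T 36 92 59 116 0
e 61 92 78 110 0
s 81 92 96 110 0"""

boxes = parse_box_file(sample)
print("".join(b.symbol for b in boxes))  # Tes
print(boxes[0].width, boxes[0].height)   # 23 24
```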

Quantrium Guides


Tesseract is an optical character recognition (OCR) engine that runs on various operating systems. It is free software, released under the Apache License. Tesseract was originally developed by Hewlett-Packard as proprietary software in the 1980s and was released as open-source software in 2005; since 2006, its development has been sponsored by Google. In this guide, I will take you through the steps I followed to install Tesseract on my Windows 10 machine. I will also show you how to use Tesseract from the command line once it is installed.

Installing Tesseract 4 on a Windows Machine Using the .exe File:

To install Tesseract 4 on our Windows system, go to the following…
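Once Tesseract is installed, the command-line usage mentioned in the introduction follows the form `tesseract <image> <outputbase> [-l <lang>]`. Passing `stdout` as the output base prints the recognized text instead of writing a .txt file. The sketch below scripts that call from Python. It assumes the Tesseract install folder has been added to PATH. The helper names and the file name invoice.png are hypothetical.

```python
import shutil
import subprocess
from typing import List


def build_tesseract_command(image_path: str, lang: str = "eng") -> List[str]:
    # "stdout" as the output base makes tesseract print text instead of writing a .txt file.
    return ["tesseract", image_path, "stdout", "-l", lang]


def ocr_image(image_path: str, lang: str = "eng") -> str:
    """Run Tesseract on one image and return the recognized text."""
    if shutil.which("tesseract") is None:
        raise FileNotFoundError("tesseract is not on PATH; add its install folder to PATH first")
    proc = subprocess.run(
        build_tesseract_command(image_path, lang),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if tesseract reports a failure
    )
    return proc.stdout


print(build_tesseract_command("invoice.png"))
# ['tesseract', 'invoice.png', 'stdout', '-l', 'eng']
```

Keeping the command construction in its own function makes it easy to inspect or log exactly what is being run before calling ocr_image on real files.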

In this blog post, we are going to talk about how to set up YOLOv5 and get started with it. If you haven’t come across YOLOv5 already, here is a brief write-up by its creators, Ultralytics, explaining the idea behind its creation and its performance:

One major advantage of YOLOv5 over other models in the YOLO series is that it is written in PyTorch from the ground up. This makes it convenient for ML engineers, because there is a large and active PyTorch community to turn to for support.

YOLOv5 is also reported to be much faster than all previous versions of YOLO, and nearly 90 percent smaller than YOLOv4. This means YOLOv5 can be deployed to embedded devices much more easily. To learn more about the advantages of YOLOv5, please refer to the blog post above. …
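The post's own setup steps are truncated here. One common way to try YOLOv5 is through PyTorch Hub, which fetches the Ultralytics code and pretrained weights on first use. This may differ from the route the post takes. In this sketch, torch is imported lazily so the small post-processing helper runs without PyTorch installed. The helper name, the 0.5 threshold, and the demo detections are hypothetical. The row format is assumed to match `results.pandas().xyxy[0].to_dict("records")`.

```python
from typing import Iterable, List, Mapping


def load_yolov5(variant: str = "yolov5s"):
    """Load a pretrained YOLOv5 model through PyTorch Hub (needs torch and network access)."""
    import torch  # imported lazily so the helper below works without PyTorch installed
    return torch.hub.load("ultralytics/yolov5", variant)


def confident_labels(rows: Iterable[Mapping], threshold: float = 0.5) -> List[str]:
    """Return sorted, de-duplicated class names whose confidence meets the threshold.

    Each row is assumed to be a dict with at least "name" and "confidence" keys,
    e.g. from results.pandas().xyxy[0].to_dict("records").
    """
    return sorted({row["name"] for row in rows if row["confidence"] >= threshold})


# Hypothetical usage (downloads code and weights on first run):
# model = load_yolov5()
# results = model("path/to/image.jpg")
# print(confident_labels(results.pandas().xyxy[0].to_dict("records")))

# Made-up detections standing in for real model output.
demo_rows = [
    {"name": "person", "confidence": 0.91},
    {"name": "tie", "confidence": 0.42},
    {"name": "person", "confidence": 0.67},
]
print(confident_labels(demo_rows))  # ['person']
```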


Bharath Sivakumar

A Machine Learning enthusiast who wants to make Machine Learning tools accessible to everybody
