Data Mining, Programming, Python


Today, one measure of people’s success is how well they communicate and share information with others. That is where language comes into the picture. There are many languages in the world, each with its own conventions and alphabet, and words combined in a meaningful order form a sentence. Each language has its own rules for building sentences, and this set of rules is known as grammar.


According to industry estimates, only 20 percent of the data generated as we speak, tweet, and send messages on WhatsApp, email, Facebook, Instagram, or SMS is in a structured format. The majority of this data exists in textual form, which is highly unstructured. …

Image by xresch from Pixabay

Natural Language Processing

Guide to various NLP tasks using the spaCy library

spaCy is an open-source, advanced Natural Language Processing (NLP) library for Python. The library was developed by Matthew Honnibal and Ines Montani, the founders of the company behind it. In my previous article, I explained Natural Language Processing using the NLTK library. spaCy was designed specifically for production use, and it helps to process and understand large volumes of text. It provides a concise and user-friendly API.

To learn more about NLP, I invite you to check out my previous article. In this article, we will see how to use the spaCy library for various NLP-related tasks.




#Installing spaCy library
!pip install -U…

Image by Mario Hagen from Pixabay

Deep Learning, Programming

Showcasing an easy way to build a custom image dataset using Google Images and Bing image downloader

I have worked predominantly in NLP for the last three months. It has been a long time since I worked with image data, so I decided to build a unique image classifier model as a personal learning project.

One thing I really miss during the current pandemic is traveling. These days I watch a lot of travel vlogs and look at travel pictures on Instagram, wondering when we will get back to normal.

This gave me the idea of creating an image classifier model with five classes: Mountain, Beach, Desert, Lake, and Museum. However, I did not have an image dataset to build the model and could not find one through Google. One option is to scrape the images manually, but that takes time. …

Image by Michal Jarmoluk from Pixabay

Machine Learning

Average Word2Vec and TF-IDF Word2Vec

In my previous article, I wrote about a content-based recommendation engine using TF-IDF on Goodreads data. In this article, I use the same Goodreads data to build a recommendation engine using word2vec.

As in the previous article, I use the book descriptions to recommend books. The algorithms we use cannot work with raw text directly; they only understand data in numeric form. To make the text usable, we need to convert it into numeric vectors. …
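To make the conversion from text to numbers concrete, here is a minimal sketch of the two averaging schemes named in the title. It assumes a tiny hand-written embedding table (`toy_vectors`) in place of a trained word2vec model and a made-up three-document corpus (`docs`). All names, vectors, and documents are illustrative assumptions, not the Goodreads data or the article's own code.

```python
import math
from collections import Counter

# Hypothetical 2-dimensional "word vectors"; a real model would be trained with word2vec.
toy_vectors = {
    "magic": [0.9, 0.1],
    "wizard": [0.8, 0.2],
    "school": [0.5, 0.5],
    "murder": [0.1, 0.9],
    "detective": [0.2, 0.8],
}

# Hypothetical tokenized book descriptions.
docs = [
    ["wizard", "magic", "school"],
    ["detective", "murder"],
    ["magic", "school", "murder"],
]

def average_vector(tokens, vectors):
    """Plain average of the vectors of all known tokens."""
    dim = len(next(iter(vectors.values())))
    known = [vectors[t] for t in tokens if t in vectors]
    if not known:
        return [0.0] * dim
    return [sum(v[i] for v in known) / len(known) for i in range(dim)]

def idf_table(corpus):
    """Smoothed inverse document frequency: log((1 + N) / (1 + df)) + 1."""
    n_docs = len(corpus)
    df = Counter(t for doc in corpus for t in set(doc))
    return {t: math.log((1 + n_docs) / (1 + c)) + 1 for t, c in df.items()}

def tfidf_weighted_vector(tokens, vectors, idf):
    """Average of word vectors, each weighted by term frequency * idf."""
    dim = len(next(iter(vectors.values())))
    tf = Counter(t for t in tokens if t in vectors and t in idf)
    total_weight = sum(count * idf[t] for t, count in tf.items())
    if total_weight == 0:
        return [0.0] * dim
    return [
        sum(count * idf[t] * vectors[t][i] for t, count in tf.items()) / total_weight
        for i in range(dim)
    ]

idf = idf_table(docs)
avg_doc0 = average_vector(docs[0], toy_vectors)
weighted_doc0 = tfidf_weighted_vector(docs[0], toy_vectors, idf)
print(avg_doc0)
print(weighted_doc0)
```

In the weighted version, a word that appears in fewer documents ("wizard" here) pulls the document vector further toward its own embedding than in the plain average.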

Image by Markus Distelrath from Pixabay

Data Science

A simple explanation of containerization with Docker

Data scientists come from different backgrounds. In today’s agile environment, it is essential to respond quickly to customer needs and deliver value. Delivering value faster means more wins for the customer and hence more wins for the organization.

Information Technology is always under immense pressure to increase agility and speed up the delivery of new functionality to the business. A particular point of pressure is deploying new or enhanced application code at the frequency and immediacy demanded by a typical digital transformation. Under the covers, this problem is not simple, and it is compounded by infrastructure challenges, such as how long it takes to provide a platform for the development team or how difficult it is to build a test system that adequately emulates the production environment (ref: IBM).

Speech to Text

Source: Screenshot from Information-Age

Speech is the most common means of communication, and most of the world’s population relies on it to communicate with one another. A speech recognition system translates spoken language into text. There are many real-life examples of speech recognition systems. For example, Apple’s Siri recognizes speech and transcribes it into text.

How does Speech recognition work?

Speech Recognition process

Models such as the Hidden Markov Model (HMM) and deep neural networks are used to convert audio into text. A fully detailed description of the process is beyond the scope of this blog. Here, I demonstrate how to convert speech to text using Python. …

Source: Image by TuendeBede from Pixabay

Recommending similar books based on book description and name

When we plan to buy a new product, we normally ask our friends, research the product’s features, compare it with similar products, and read reviews on the internet before making a decision. How convenient would it be if this whole process were handled automatically and the right product were recommended efficiently? A recommendation engine, or recommender system, is the answer to this question.

Content-based filtering and collaborative filtering are the two popular approaches to recommendation systems. In this blog, we will see how to build a simple content-based recommender system using data.

Content-based recommendation system

A content-based recommendation system recommends items to a user based on the similarity between items. It recommends products or items based on their description or features, identifying how similar two products are from their descriptions. It also considers the user’s previous history in order to recommend similar products. …
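As an illustration of matching items by their descriptions, the sketch below ranks a few book descriptions by the cosine similarity of their word counts. The catalogue (`books`) and the helpers `bag_of_words`, `cosine_similarity`, and `recommend` are assumptions made up for this example; they are not the article's dataset or implementation.

```python
import math
from collections import Counter

# Hypothetical catalogue: title -> short description.
books = {
    "Book A": "a young wizard learns magic at a school",
    "Book B": "a detective investigates a murder in london",
    "Book C": "a school of magic trains a young witch",
}

def bag_of_words(text):
    """Lowercase, split on whitespace, and count each word."""
    return Counter(text.lower().split())

def cosine_similarity(a, b):
    """Cosine of the angle between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def recommend(title, catalogue, top_n=2):
    """Return the other titles ranked by description similarity, highest first."""
    target = bag_of_words(catalogue[title])
    scores = [
        (other, cosine_similarity(target, bag_of_words(desc)))
        for other, desc in catalogue.items()
        if other != title
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)[:top_n]

ranking = recommend("Book A", books)
print(ranking)
```

Raw word counts let common words such as "a" inflate the scores. A fuller pipeline would typically remove stop words and weight terms with TF-IDF before comparing descriptions.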

Image by Ryan McGuire from Pixabay

Data Science, Python

Handy Python libraries for data science

Python is a best friend to most data scientists, and libraries make their lives simpler. I came across five cool Python libraries while working on my NLP project. They helped me a lot, and I would like to share them in this article.

1. Numerizer

An amazing library for converting numbers written out as text into int and float values, and a useful one for NLP projects. For more details, please check PyPI and this GitHub repo.


!pip install numerizer


# Importing the numerize function from the numerizer library
from numerizer import numerize

# Examples
print(numerize('Eight fifty million'))
print(numerize('one two three'))
print(numerize('Fifteen hundred'))
print(numerize('Three hundred and Forty five'))
print(numerize('Six and one quarter'))
print(numerize('Jack is having fifty million'))
print(numerize(‘Three hundred…

Things I’ve learned in the real world about Data science

Picture courtesy: Atika Maan

Data science, machine learning, and artificial intelligence have been hot domains for a few years now. Many people want to work as data scientists and are putting immense effort into upgrading their skills through university, online courses, or self-study. However, working on and solving a business problem in the real world brings many challenges, and non-technical skills are just as important for a data scientist. In this blog, I share the personal experiences I have had in my work as a data scientist.

Understanding the business problem

Real-world problems pose many challenges that students do not necessarily face at university. In school, they are given a structured problem and a popular dataset, and they eventually arrive at the exact solution. In industry, however, the problem is often unstructured and complex, and any assumptions made about it will backfire. It is better to understand the business problem completely before diving into the analysis. Understanding a business problem involves researching the problem and its domain, planning, asking the clients the right questions, and discussing it with team members. …

Converting Emoticon and Emoji into word form using Python

Source: wallpaperplay

In today’s online communication, emojis and emoticons are becoming the primary language that lets us communicate with anyone globally when we need to be quick and precise. Both play an essential part in text analysis.

Emojis and emoticons are most often used in social media, emails, and text messages, though they may be found in any type of electronic communication. On the one hand, we might need to remove them for some of our textual analysis. …
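A minimal sketch of both options, removing emoticons and emojis or converting them into words, could look like the following. It uses only the standard library and two deliberately tiny, hand-written lookup tables (`EMOTICONS` and `EMOJIS`). The tables, the word forms, and the helper names are assumptions for illustration, and a real project would need far larger dictionaries.

```python
import re

# Hypothetical, deliberately tiny lookup tables.
EMOTICONS = {
    ":-)": "happy_face",
    ":)": "happy_face",
    ":-(": "sad_face",
    ":(": "sad_face",
}
EMOJIS = {
    "\U0001F600": "grinning_face",
    "\U0001F622": "crying_face",
    "\u2764\ufe0f": "red_heart",
}

def _build_pattern(table):
    # Try longer keys first so a longer symbol wins over a shorter overlapping one.
    keys = sorted(table, key=len, reverse=True)
    return re.compile("|".join(re.escape(k) for k in keys))

def convert(text, table):
    """Replace every key of `table` found in `text` with its word form."""
    return _build_pattern(table).sub(lambda m: table[m.group(0)], text)

def remove(text, table):
    """Delete every key of `table` found in `text`."""
    return _build_pattern(table).sub("", text)

message = "Great game today :-) \U0001F600"
converted = convert(convert(message, EMOTICONS), EMOJIS)
stripped = remove(remove(message, EMOTICONS), EMOJIS).strip()
print(converted)
print(stripped)
```

Converting keeps the sentiment carried by the symbol available to later analysis, while removing is the simpler choice when the symbols are only noise for the task at hand.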
