Proposed Model: Deep Dual Recurrent Encoder Model

The proposed model uses transcript text and audio features simultaneously to recognize the emotions associated with speech. The model can analyze speech data from the signal level up to the language level, which makes it a distinctive way of applying deep learning to statistical speech processing.

Challenges in Speech Analytics

1. Training data — insufficient data points to train deep learning architectures is one of the biggest challenges in speech analytics.

2. The characteristics of speech must be learned from low-level speech signals, which is a difficult task to achieve with audio features such as MFCCs and low-level descriptors alone.
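To make the idea of low-level signal features concrete, here is a minimal sketch (illustrative only, not the article's pipeline) of the first step shared by MFCCs and most low-level descriptors: slicing a raw waveform into short overlapping frames before any spectral analysis.

```python
# Sketch: split a raw waveform into overlapping frames, the first step
# in computing MFCC-style low-level descriptors (names are illustrative).
def frame_signal(signal, frame_len, hop):
    return [signal[i:i + frame_len]
            for i in range(0, len(signal) - frame_len + 1, hop)]

waveform = list(range(100))  # stand-in for 100 audio samples
frames = frame_signal(waveform, frame_len=25, hop=10)
print(len(frames))  # 8 overlapping frames of 25 samples each
```

Each frame would then be windowed and passed through a filter bank; libraries such as librosa wrap this whole pipeline in a single call.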



What is Class Imbalance problem?

Class imbalance is a problem which occurs when there is a large disparity between the number of data points in the minority and majority classes.

Let’s consider an example: if your dataset has 100 rows, with 95 data points labelled “Apple” and 5 labelled “Banana”, a classification model trained on such a dataset can achieve very high accuracy while simply classifying everything as “Apple”. This is problematic because the headline metric hides sub-optimal performance on the minority class.
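The apples-and-bananas example above can be sketched in a few lines: a baseline that always predicts the majority class scores 95% accuracy yet never finds a single banana.

```python
# The 95/5 example from the text: a majority-class baseline
labels = ["Apple"] * 95 + ["Banana"] * 5
predictions = ["Apple"] * 100  # always predict the majority class

accuracy = sum(p == y for p, y in zip(predictions, labels)) / len(labels)
print(accuracy)  # 0.95 — looks great on paper

# Recall on the minority class tells the real story
banana_recall = sum(
    p == "Banana" for p, y in zip(predictions, labels) if y == "Banana"
) / 5
print(banana_recall)  # 0.0 — not one banana is ever detected
```

This is why per-class metrics such as precision, recall and the F-score matter far more than raw accuracy on imbalanced data.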

How does Class Imbalance Issue arise in data?

There are two major causes of class imbalance:

1. The data is naturally imbalanced.

2. The minority class data is too expensive to obtain, which…


Data Dictionary

What is a Data Dictionary?

A data dictionary provides metadata about the data, explaining the meaning of each field, its data type, sample values and so on. Understanding the business context of the fields helps one choose appropriate features for the model. Data dictionaries play an important role in understanding the data before we dive deep into it, and understanding the data is essential before any modelling begins.

The following are the crucial fields to include in a data dictionary:

1. Data Types of fields

2. Sample values of fields

3. Business Unit responsible for capture of the field

4. Point of Contact from the responsible business…
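The fields listed above can be captured in something as simple as a nested dictionary. A minimal sketch (the field names and values are hypothetical, not from any real system):

```python
# A minimal data-dictionary entry, sketched as a Python dict.
# Field names and sample values are illustrative only.
data_dictionary = {
    "customer_age": {
        "data_type": "integer",
        "sample_values": [25, 34, 58],
        "business_unit": "Customer Onboarding",
        "point_of_contact": "onboarding-data-team",
    },
}

entry = data_dictionary["customer_age"]
print(entry["data_type"])       # integer
print(entry["business_unit"])   # Customer Onboarding
```

In practice this lives in a spreadsheet or a metadata tool, but the structure — one record per field, with type, samples and ownership — stays the same.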

Stop asking how, start doing!


10 Steps to become a Data Scientist in 2020

Before we get started with the article, take note of some pointers about life in general:

  1. It is great to ask questions and gain knowledge. But how much knowledge do you really need to get started? Learn to dive without a parachute and build it on the way down. The inspiration of desperation is the greatest motivation.
  2. Don’t live your life like leaves on the ground, going wherever the wind takes you. Live like a bloody bull and only do what you really want to do in life.
  3. Be Obsessed in life, so…



To improve the customer experience and build customer loyalty, we can use the following approaches:

  1. Identify the leads most likely to create accounts.
  2. Offer the right product to the right customer segment.
  3. Identify the customers who have created accounts and are most likely to refer friends and family.

Advanced Analytics Framework

Problem Statement

  1. Identify the leads with a high propensity to convert into account creations.
  2. Identify the right product mix for different customer segments.
  3. Identify the drivers of higher conversions and deploy effective, impactful marketing campaigns for customers with a very low lead-to-account conversion probability.
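A lead-propensity score like the one in point 1 is often just a logistic model over behavioural features. A toy sketch, assuming hypothetical features and hand-set weights (a real model would learn these from historical conversions):

```python
import math

# Toy lead-scoring sketch: logistic model over illustrative features.
# Feature names and weights are hypothetical, not from the article.
weights = {"pages_visited": 0.4, "email_opened": 1.2, "bias": -2.0}

def conversion_propensity(lead):
    """Return a probability in (0, 1) that a lead converts to an account."""
    z = weights["bias"]
    z += weights["pages_visited"] * lead.get("pages_visited", 0)
    z += weights["email_opened"] * lead.get("email_opened", 0)
    return 1 / (1 + math.exp(-z))  # sigmoid maps the score to a probability

hot_lead = {"pages_visited": 5, "email_opened": 1}
cold_lead = {"pages_visited": 0, "email_opened": 0}
print(round(conversion_propensity(hot_lead), 3))
print(round(conversion_propensity(cold_lead), 3))
```

Ranking leads by this probability lets the marketing team focus campaigns on the segment below a chosen conversion threshold, as point 3 suggests.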

High-Level Architecture

Datasets for Categories: Computer Vision, NLP, Reinforcement Learning, Deep Learning etc.


1. Quandl

It is a massive repository of economic and financial data. Most of the datasets are free, but some are available for purchase as well.


2. Academic Torrents

It hosts the data used to publish scientific research papers. The variety of datasets is massive, and they are free to download.



3. Data.gov

It consists of a variety of datasets from US Government agencies. Domains include education, climate, food, chronic disease and what not.


4. UCI Machine Learning Repository

This site consists of datasets hosted by the University of California, Irvine. …



BERT stands for Bidirectional Encoder Representations from Transformers.

Let’s dig deeper and try to understand the meaning of each letter.

B (Bidirectional) — The framework learns from both the left and right context of a given word. This makes it better than LSTMs, which are unidirectional in nature (left to right or vice versa).

A word that takes on different meanings in different contexts is termed a “homonym”. BERT deals with homonyms by understanding the surrounding context.

E (Encoder) — An encoder learns the representations from a given dataset.

R (Representations) — These are the encodings learned from the given dataset.

T (Transformers) — The…
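The “B” above is the key difference, and a toy example makes it tangible (this is not BERT itself, just an illustration of what context each kind of model can see when predicting a homonym like “bank”):

```python
# Toy illustration of unidirectional vs bidirectional context.
# Not BERT itself — just what each model type is allowed to look at.
sentence = ["I", "sat", "on", "the", "bank", "of", "the", "river"]
target = sentence.index("bank")

# A left-to-right language model only sees tokens before the target:
left_to_right_context = sentence[:target]

# A bidirectional model like BERT sees both sides of the target:
bidirectional_context = sentence[:target] + sentence[target + 1:]

print(left_to_right_context)   # ['I', 'sat', 'on', 'the']
print(bidirectional_context)   # includes 'river', which disambiguates 'bank'
```

Only the bidirectional context contains “river”, the one word that settles whether “bank” means a riverbank or a financial institution.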


What is Google Colaboratory?

Colaboratory is a free tool that allows users to execute Python code through a browser, which makes it attractive for machine learning and data science enthusiasts. The best part of Google Colab is that it provides free access to heavy computing resources such as GPUs and TPUs.

What are the usage Limits of Google Colab?

Google Colab allocates usage limits dynamically, fluctuating in response to demand from users across the globe. GPU and TPU resources are preferentially allocated to users who use Colab interactively rather than to those running long notebooks.

Notebooks can be run on Colab as…


BERT is an acronym for Bidirectional Encoder Representations from Transformers. By the end of 2018, this new Natural Language Processing technique based on Transformers had revolutionized the deep learning community.

Background Research — BERT Development

The lack of training data is usually the biggest challenge in Natural Language Processing tasks. Even though a lot of raw text is available, task-specific datasets must be carved out of it, which leaves a shortage of labelled training data. Natural Language Processing model performance increases as the amount of training data grows.

Research was performed on several techniques to create language representation models through training on unannotated text…
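The core trick behind pre-training on unannotated text is that the text labels itself: hide a token and ask the model to recover it. A toy sketch of this masked-language-model setup (illustrative only, with a trivial sentence, no actual model):

```python
import random

# Toy sketch of the masked-language-model idea used to pre-train BERT:
# raw, unannotated text supplies its own label by hiding one token.
random.seed(0)
tokens = ["the", "cat", "sat", "on", "the", "mat"]

masked_index = random.randrange(len(tokens))
label = tokens[masked_index]                     # the token to recover
inputs = tokens[:masked_index] + ["[MASK]"] + tokens[masked_index + 1:]

print(inputs, "->", label)
```

Because any sentence can be turned into a (masked input, hidden token) pair this way, the shortage of labelled data described above no longer blocks pre-training; only the final fine-tuning step needs a small labelled dataset.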


Artificial Intelligence has numerous use cases in drug discovery and development. It has greatly advanced age-old pharmaceutical research methodologies, upgrading them from traditional approaches.

Drug development is based on molecule development. There are two approaches to developing molecules:

1. Structure-based

This approach requires knowledge of the structure of both the target and the ligand.

It covers methods for free-energy binding calculations, protein–ligand docking, molecular dynamics and so on.

2. Ligand-based

The ligand-based approach uses ligand information to predict the biological response.

What is a ligand?

It is an ion or molecule which binds with the central atom…

Sowhardh Honnappa

Senior AI Researcher || Writer —
