An easy way to grow the training data set

The MNIST dataset is known as the “Hello world” of image classification. Every Machine Learning Engineer tackles this dataset sooner or later.

Dataset

MNIST is a set of small images of handwritten digits. The image below shows a few example instances.

MNIST data set

There are 70,000 images and each image has 784 features. This is because each image is 28 x 28 pixels, and each feature represents a pixel’s intensity, from 0 to 255.
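To make the 784 number concrete, here is a minimal sketch of how a 28 x 28 image turns into a flat feature vector. The image below is a made-up blank image with one bright pixel, not real MNIST data.

```python
HEIGHT, WIDTH = 28, 28

# A hypothetical blank image: 28 rows of 28 pixel intensities (0 to 255).
image = [[0 for _ in range(WIDTH)] for _ in range(HEIGHT)]
image[14][14] = 255  # light up one pixel near the center

# Flatten row by row: every pixel becomes one feature.
features = [pixel for row in image for pixel in row]

print(len(features))  # 28 * 28 features
```

The pixel at row r and column c ends up at index r * 28 + c in the flat vector.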

Many classification algorithms (SGD, SVM, RandomForest, etc.) can be trained on this dataset, including deep learning algorithms such as CNNs.

Training and Evaluating

Let’s take a RandomForest classifier as an example, train it on the above dataset, and evaluate it. …
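A rough sketch of what training and evaluating could look like with scikit-learn is shown here. To keep it quick and offline, it assumes scikit-learn's small bundled digits dataset (8 x 8 images, 64 features) as a stand-in for MNIST. The split size, number of trees, and random seed are my own assumptions, not values from the original article.

```python
from sklearn.datasets import load_digits
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Stand-in for MNIST: 8 x 8 digit images bundled with scikit-learn.
digits = load_digits()

# Hold out 20% of the images for evaluation (assumed split).
X_train, X_test, y_train, y_test = train_test_split(
    digits.data, digits.target, test_size=0.2, random_state=42
)

# Train the forest on the training images only.
forest = RandomForestClassifier(n_estimators=100, random_state=42)
forest.fit(X_train, y_train)

# Evaluate on images the model has never seen.
accuracy = accuracy_score(y_test, forest.predict(X_test))
print(f"Test accuracy: {accuracy:.3f}")
```

The same steps should carry over to the full 784-feature MNIST data; only the loading step changes.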


Introduction to SciKit Imputer

Photo by Helloquence on Unsplash

Most machine learning algorithms cannot work with missing values in the features. See the example below from the Melbourne Housing Data.

melbourne_data.describe()

Here, BuildingArea has 7130 rows with values while most of the features have 13580 rows with values.

Now, here are a few things you can do to deal with missing values:

1. Get rid of the corresponding data

melbourne_data.dropna(subset=["BuildingArea"])

This drops every row where BuildingArea is missing. Note that dropna returns a new DataFrame, so assign the result if you want to keep it. You can see that the number of rows has decreased now.

melbourne_data.describe()

2. Get rid of the entire attribute.

melbourne_data.drop("BuildingArea", axis=1)

This drops the entire feature/attribute. As shown below, the BuildingArea column is now gone.

melbourne_data.describe()
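Options 1 and 2 can be compared side by side on a tiny example. The sketch below uses a made-up four-row DataFrame (the values are hypothetical, not taken from the Melbourne data) so the effect of each call is easy to see.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: two of the four rows are missing BuildingArea.
toy = pd.DataFrame(
    {
        "Rooms": [2, 3, 4, 3],
        "BuildingArea": [79.0, np.nan, 142.0, np.nan],
        "Price": [1035000, 1465000, 1600000, 850000],
    }
)

# Option 1: drop the rows where BuildingArea is missing.
rows_dropped = toy.dropna(subset=["BuildingArea"])

# Option 2: drop the whole BuildingArea column.
column_dropped = toy.drop("BuildingArea", axis=1)

print(rows_dropped.shape)     # fewer rows, same columns
print(column_dropped.shape)   # same rows, one column fewer
print(toy.shape)              # the original frame is unchanged
```

Both calls return a new DataFrame and leave the original untouched, so option 1 costs you rows while option 2 costs you a feature.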

3. Set the missing values to some value

Approach A

If you think that the attribute is important enough and you must include it for the training. …


Automate your tedious job using GitHub Actions.

Source: https://lubus.in/blog/github-actions-to-release-wordpress-plugin-3656

What is GitHub Actions?

GitHub Actions enables you to create custom software development life cycle (SDLC) workflows directly in your GitHub repository.

GitHub Actions are custom code/instructions that interact with your GitHub repo and perform operations in it any way you like. (really!!!)

CI/CD is the legacy example of it. If you know CI/CD, then know that GitHub Actions is ++ to it.


More about it?

  • GitHub Actions is fully integrated into GitHub, so it doesn’t require any external site and can be managed entirely within your repo.
  • GitHub Actions provides multiple templates for all kinds of CI configurations, and you can create your own custom action and publish it on the Marketplace. …


Do the scars remain?

Photo by Dan Meyers on Unsplash

Before I start, I want to thank Aman Panchal for allowing me to dig deep into his life-changing experience of the past few years. In his words, “I learned to live”.

Wait…Who is he?

Nobody, though somebody. Pick one random person from the crowd, and he is the one. He is not a celebrity or a media feeder, but he is a human being. He is one of those many people who give up or disappear in situations of inescapable stress and depression. He is here to help us. …


if val and if val is not None are not the same!

Photo by Hitesh Choudhary on Unsplash

Woah, what??? Python is (not) crazy.

When you write if val is None, you use the is operator, which checks the identity of val. In an expression like val is value, the is operator checks whether both operands refer to the same object or not.

None is a singleton in Python, so all None values are the exact same instance.

But…

When you say if val, Python behaves differently. if needs a truth value, so when val is not already a boolean, Python automatically calls val’s __bool__ method (falling back to __len__ if __bool__ is not defined).

if val is effectively executed as if bool(val), which calls val.__bool__()

The confusing thing is that bool(None) returns False, so when val is None, if val works as expected. But there are other values that also evaluate as False, such as 0, an empty string, and empty containers.
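A quick way to see the difference is to run both checks over a handful of values. This is just an illustrative sketch; the describe helper and the Basket class are names I made up for the demo.

```python
def describe(val):
    """Report how val fares under each of the two checks."""
    truthy = "passes" if val else "fails"
    not_none = "passes" if val is not None else "fails"
    return truthy, not_none


class Basket:
    """Hypothetical container with no __bool__, so Python uses __len__."""

    def __init__(self, items):
        self.items = items

    def __len__(self):
        return len(self.items)


candidates = [None, 0, 0.0, "", [], {}, False, 1, "text", [0]]
results = {repr(v): describe(v) for v in candidates}

for name, (truthy, not_none) in results.items():
    print(f"{name:>8}: if val {truthy}, if val is not None {not_none}")

empty_basket = describe(Basket([]))
print("empty Basket:", empty_basket)
```

Only None fails both checks. Values like 0, "", [] and an empty Basket fail if val but pass if val is not None, so use the second form when those are legitimate values.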


Easy implementation using Python and NLTK

Photo by NORTHFOLK on Unsplash

Introduction

TextRank is an algorithm based on PageRank, which is often used in keyword extraction and text summarization.

We will implement the TextRank algorithm for sentence extraction in Python. The crux of this algorithm is to fetch the most relevant sentences from a piece of text, which is one of the most important tasks of extractive text summarization.

But let’s not reinvent the wheel

The prerequisite for this article is an understanding of the PageRank algorithm, which you can read about in the following article on Medium:

PageRank (PR) is an algorithm used to calculate the weight for web pages, which is used by Google Search to rank web pages in their search engine results. …
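For a feel of how PageRank assigns weights, here is a minimal sketch of the iterative version on a tiny made-up link graph. The damping factor of 0.85, the fixed 50 iterations, and the three-page graph are assumptions for illustration; it also assumes every linked page appears as a key in the graph.

```python
def pagerank(links, damping=0.85, iterations=50):
    """links maps each page to the list of pages it links to."""
    pages = list(links)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}  # start with equal weights

    for _ in range(iterations):
        # Every page gets a small baseline share.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page, outgoing in links.items():
            # A page with no outgoing links spreads its weight over all pages.
            targets = outgoing if outgoing else pages
            share = damping * rank[page] / len(targets)
            for target in targets:
                new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical graph: A links to B and C, B links to C, C links back to A.
links = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
ranks = pagerank(links)

for page, score in sorted(ranks.items(), key=lambda item: -item[1]):
    print(page, round(score, 3))
```

TextRank reuses the same idea, with sentences as the nodes and similarity between sentences in place of links.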


Law of Karma explained, the easiest way.

Photo by JOHN TOWNER on Unsplash

My friend and I met a lady at a restaurant on a pleasant Sunday morning. She shared our table for breakfast, as she could not find an empty one.

We don’t talk to strangers often. Yes, we, all of us, correct? And in a metro city, everyone feels like a stranger.

She looked to be in a rush. We smiled at her, and she started telling us that she was coming from dance class and was very hungry. I casually asked, “Are you an IT person like us?” She said, “I used to be. I’ve earned good enough money and that’s it.” To keep it light, I asked her to give us some tips as well. …


It’s very short. Promise.

Top Articles

1. Text summarization in 5 steps using NLTK

This article explains the implementation of Text Summarizer using Python. Very easy to understand and implement.

2. Text summarization using NLTK: TF-IDF Algorithm

I’ve explained the implementation of the TF-IDF algorithm for a single document with multiple paragraphs.

…more at Towards Data Science

3. Java: Simple Factory Pattern

Introduction to Java Design Pattern with example

4. Python for JAVA Developers: Basics

A handy cheat-sheet for Java developers who are learning Python

Side Projects

1. Pythonizr

Machine Learning code generator for python to help you start coding right away!

2. Python Tricks

Python List Explorer — a tool to quickly find list methods

3. nlp-akash

A GitHub repo of NLP notes and implementation of Algorithms

4. ml-akash

A GitHub repo of Machine Learning notes and implementation of Algorithms

5. Summarize Webpage

A SaaS project using Flask and NLTK to summarize any webpage using a given URL.

It leverages a Natural Language Processing algorithm called WordFrequency.

Gary Vee — Word Cloud


WORK. That’s how you get it — Gary Vaynerchuk


Easy implementation using Python ft. Streamlit App

In the article Text summarization in 5 steps using NLTK, we saw how to summarize text using the Word Frequency algorithm.

Bonus: See in Action with Streamlit App

Now, we’ll summarize the text using the TF-IDF algorithm.

Photo by Romain Vignes on Unsplash

Note that we’re implementing the actual algorithm here rather than using a library to do most of the work; we’re relying on the math alone.

Term Frequency * Inverse Document Frequency

In simple language, TF-IDF can be defined as follows:

A high weight in TF-IDF is reached by a high term frequency (in the given document) and a low document frequency of the term in the whole collection of documents.
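That definition can be sketched in a few lines of plain Python. The exact formulas vary between sources; this sketch assumes TF = (count of the term in the document) / (number of words in the document) and IDF = log10(number of documents / number of documents containing the term), with naive whitespace tokenization and non-empty documents. The three sample sentences are made up.

```python
import math


def tf_idf(documents):
    """Return one {word: score} dict per document."""
    n_docs = len(documents)
    tokenized = [doc.lower().split() for doc in documents]

    # Document frequency: in how many documents does each word appear?
    doc_freq = {}
    for words in tokenized:
        for word in set(words):
            doc_freq[word] = doc_freq.get(word, 0) + 1

    scores = []
    for words in tokenized:
        doc_scores = {}
        for word in set(words):
            tf = words.count(word) / len(words)
            idf = math.log10(n_docs / doc_freq[word])
            doc_scores[word] = tf * idf
        scores.append(doc_scores)
    return scores


docs = [
    "the cat sat on the mat",
    "the dog sat on the log",
    "the cat chased the dog",
]
scores = tf_idf(docs)

for word, score in sorted(scores[0].items(), key=lambda item: -item[1]):
    print(f"{word}: {score:.3f}")
```

In the first sentence, "the" is frequent but appears in every document, so its weight drops to zero, while "mat" appears nowhere else and gets the highest weight.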


Here’s what I’ve got

We all know him: the boy who portrayed Wesley Crusher on the television series Star Trek.

Not enough? He also appeared frequently on The Big Bang Theory.

Now coming to the point, Wil is an active blogger and writer and he publishes his writing on Medium: Wil Wheaton

Last year he published a story (link here) about his depression and anxiety. Yes, he confessed, and I’m grateful to him, as many people do not open up about mental health. In his own words, “I’m not Ashamed”.

I’ve implemented the TF-IDF algorithm for text summarization and I was experimenting with it. I wanted to read his story again and thought of feeding it to my algorithm. Below is what the machine has come up with…

About

Akash Panchal

Full-time thinker, part-time Software Engineer. Side Project: https://pythonizr.com LinkedIn: https://www.linkedin.com/in/akash-panchal-2222749a/
