Learn to look at words as a sequence of numbers

Photo by Juan Gomez on Unsplash

In this article, we will look at how to tokenize never-before-seen words. TensorFlow’s tokenizer can easily convert known words into tokens, but what happens when you throw words at it that it hasn’t seen before?

The TensorFlow tokenizer is a very powerful tool. As shown in the article below, it is very easy to get started with.

The tokenizer can be used to convert a set of training data (sentences) into a dictionary where each unique word gets a different ID, so to say. Let’s look at how to create a dictionary out of words.
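
As a rough illustration of the idea, here is a pure-Python sketch of building such a dictionary and looking words up in it. It is not TensorFlow’s implementation: the function names `build_word_index` and `encode`, the sample sentences, and the `<OOV>` placeholder for never-before-seen words are all assumptions made for this example.

```python
from collections import Counter

OOV_TOKEN = "<OOV>"  # hypothetical placeholder for unseen words


def build_word_index(sentences):
    """Map each unique word to an integer ID, most frequent words first."""
    counts = Counter(word for s in sentences for word in s.lower().split())
    word_index = {OOV_TOKEN: 1}  # reserve ID 1 for unseen words
    for word, _ in counts.most_common():
        word_index[word] = len(word_index) + 1
    return word_index


def encode(sentence, word_index):
    """Convert a sentence to a list of IDs, using the OOV ID for unknown words."""
    oov_id = word_index[OOV_TOKEN]
    return [word_index.get(word, oov_id) for word in sentence.lower().split()]


training = ["I love my dog", "I love my cat"]
word_index = build_word_index(training)
encoded = encode("I love my parrot", word_index)
print(word_index)
print(encoded)
```

In this sketch the unseen word "parrot" falls back to the reserved ID instead of being dropped, so the encoded sentence keeps its original length.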

In Tensorflow, this dictionary is called a word…

Sentence chopping tutorial in 5 minutes with Python’s Tensorflow

Photo by Leonardo Toshiro Okubo on Unsplash

The TensorFlow tokenizer lets you convert the words in a sentence into numbers. Each word gets an ID, which lets you perform a wide variety of NLP tasks, from sentiment analysis to sentence similarity.

Let’s say we have the following sentence

One plus one is two

The tokenizer will give an ID to every word.
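
For the sentence above, the mapping could look like the following pure-Python sketch. Lowercasing the words and numbering them in order of first appearance are assumptions made for illustration; the actual tokenizer may assign IDs differently.

```python
sentence = "One plus one is two"
words = sentence.lower().split()

# Assign IDs starting at 1, in order of first appearance.
word_to_id = {}
for word in words:
    if word not in word_to_id:
        word_to_id[word] = len(word_to_id) + 1

sequence = [word_to_id[word] for word in words]
print(word_to_id)  # {'one': 1, 'plus': 2, 'is': 3, 'two': 4}
print(sequence)    # [1, 2, 1, 3, 4]
```

Note that "One" and "one" share an ID here, so the five-word sentence uses only four distinct IDs.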

Tips for securing an offer @ Google for a PM job!

Photo by Arthur Osipyan on Unsplash


Google’s vision comes from the foot soldiers because it is inherently engineer driven. The role of a PM, therefore, is to make sure the engineering machine is well oiled. A product manager at Google provides its engineers with any non-engineering support that they might need. That ranges all the way from design to customer feedback. One of the PM’s crucial roles is to base their function on analysis and strategy.

Culture @ Google

To understand the kind of people they hire, one needs to know the culture they have. Google has always been a startup at heart, and its culture derives from that. They welcome new and innovative ideas and are not afraid to venture into uncharted territory. They are open to sharing code across teams and to cross-team collaboration. This is why a lot of product managers move across teams during their careers at Google. …

This is all you need to do to give voice to the artist in you.

Photo by Jason Rosewell on Unsplash

Your inner voice is expressionless. It occurs purely as electrical signals in the brain, otherwise known as thoughts. These thoughts are akin to clay — shapeless and without integrity.

Clay becomes a recognizable object when appropriate force is applied to it. A thought becomes reality in a similar fashion through the medium of writing. In fact, whether it is writing, painting, or dancing, the origin of a particular piece of art occurs as a thought.

All of us possess some amount of evolutionary creativity. Then why isn’t everyone creating art? The reason is that art is extremely hard before it becomes easy. Art is equal parts creativity and discipline. Not everyone can be a creative genius, but almost everyone can be creative. …

Competitive frontend-developer jobs expect engineers to be well versed in these features

Photo by Greg Rakozy on Unsplash


ECMAScript is the standard that web browsers follow when interpreting JavaScript. Experts from across the world gather and decide on the changes that should make the language better. The 6th edition of the standard was published in June 2015. It was initially known as ECMAScript 6 (ES6) and was later renamed ECMAScript 2015. Since then, each edition has been named ECMAScript followed by its year of release. This year’s standard is known as ECMAScript 2020.

Start counting words in 5 minutes using just the terminal

Photo by Max Muselmann on Unsplash

For the purposes of counting words in Shakespeare, we will use the following Unix commands/tools:

  • tr: replace characters
  • sort: sort words
  • uniq: find unique occurrences
  • grep: find specific words
  • less: show a limited amount of text on the console
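
The same counting steps can be sketched in Python, mirroring what the Unix tools do one stage at a time. This is an analogue of a shell pipeline rather than a replacement for it; the sample text and the helper name `count_words` are assumptions made for this example.

```python
import re
from collections import Counter


def count_words(text):
    """Split text into lowercase words and count each one."""
    # Like tr: keep runs of letters (and apostrophes) as words,
    # treating everything else as a separator.
    words = re.findall(r"[a-z']+", text.lower())
    # Like sort followed by uniq: group identical words and count them.
    return Counter(words)


sample = "To be, or not to be, that is the question."
counts = count_words(sample)

# Like less: show only a limited part of the result.
for word, n in counts.most_common(2):
    print(n, word)

# Like grep: look up one specific word.
print(counts["question"])
```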

tr is a utility in Unix-like operating systems used to replace or remove specific characters in its input data set. It is an abbreviation of translate or transliterate.

Basic Syntax for all tools


The utility reads a byte stream from input and writes the result to the console. As arguments, it takes two sets of characters, and replaces occurrences of the characters in the first set with the corresponding elements from the second set. …
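
Python’s `str.translate` behaves much like this two-set replacement, so it makes a convenient sketch of the idea. The sample strings below are assumptions chosen for illustration.

```python
import string

# Map each character in the first set to the character at the
# same position in the second set, as tr does with its two sets.
table = str.maketrans("abc", "xyz")
replaced = "aabbcc cab".translate(table)
print(replaced)  # xxyyzz zxy

# A common use: convert lowercase letters to uppercase.
to_upper = str.maketrans(string.ascii_lowercase, string.ascii_uppercase)
shouted = "to be or not to be".translate(to_upper)
print(shouted)  # TO BE OR NOT TO BE
```

Characters that appear in neither set, such as the spaces here, pass through unchanged.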

Learn how web-apps validate email addresses in a form using regular expressions

Photo by Brett Jordan on Unsplash

Every now and then a data scientist comes across a text processing problem. Whether it is searching for titles in names or dates of birth in a dataset, regular expressions rear their ugly head very frequently. A regular expression is enough to scare any programmer out of their wits.
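
To make this less intimidating, here is a minimal sketch of an email check in Python. The pattern is deliberately simplified and is an assumption for illustration, not the pattern any particular web app uses; real-world address rules are considerably more permissive. The helper name `looks_like_email` is hypothetical.

```python
import re

# Simplified pattern: a local part, an @, a domain containing a dot,
# and a top-level domain of at least two letters.
EMAIL_PATTERN = re.compile(r"^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}$")


def looks_like_email(text):
    """Return True if text matches the simplified email pattern."""
    return EMAIL_PATTERN.match(text) is not None


for candidate in ["jane.doe@example.com", "no-at-sign.example.com", "user@host"]:
    print(candidate, looks_like_email(candidate))
```

Only the first candidate passes: the second has no `@`, and the third has no dot after the domain name.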

A join merges two tables into a single one.

Photo by Bryson Hammer on Unsplash

Data in the real world exists as a collection of distinct information. More often than not, this information is organised as tables, with each table collecting data on a different aspect of the ecosystem. For example, a school might organise its data into the following tables:

  1. Teacher information
  2. Student attendance
  3. Student marks
  4. Salary information, etc…

We can clearly see that tables 2 and 3 will have something in common (student IDs for one), just like tables 1 and 4 (teacher IDs for instance).

Such overlap of information across tables is quite common and becomes the basis for a join. A join lets you combine information from different tables. In R we have 6 types of joins. Let’s explore each of them with an example. …
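
To show the idea of combining tables on a shared student ID, here is a small sketch written in Python with plain dictionaries rather than in R. It covers two common kinds of join, an inner join and a left join; the table contents are hypothetical.

```python
# Two small tables keyed by student ID (hypothetical data).
attendance = {1: 0.95, 2: 0.80, 3: 0.60}
marks = {2: 71, 3: 88, 4: 54}

# Inner join: keep only the IDs present in both tables.
inner = {
    sid: (attendance[sid], marks[sid])
    for sid in attendance
    if sid in marks
}

# Left join: keep every ID from the left table, filling gaps with None.
left = {sid: (attendance[sid], marks.get(sid)) for sid in attendance}

print(inner)  # {2: (0.8, 71), 3: (0.6, 88)}
print(left)   # {1: (0.95, None), 2: (0.8, 71), 3: (0.6, 88)}
```

Student 4 has marks but no attendance record, so it appears in neither result; a join that starts from the marks table would keep it.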

Flights within India resumed on 25th May. This is how things went down.

Photo by Gabriel Küenzi on Unsplash

India resumed flights after a two-month flying ban on airlines. The flight journey is not without its risks. Consequently, airports and airlines are taking extreme precautions such as PPE suits and thermal screening. At the same time, there are lapses in the arrangements that could turn out to be extremely dangerous if not corrected. This article is a summary of all the major things to note when flying within India. I have also documented my flight experience in a video attached at the end of this article.

I will cover the following checkpoints

  1. Entry into the airport
  2. Check In
  3. Security and…

A 16 minute introduction to performing parallel operations in Python

Photo by Michael Dziedzic on Unsplash


Python has a very powerful data analysis stack in the form of Pandas, NumPy, SciPy etc., but none of these libraries offer parallelism. Dask adds the much-needed parallel processing whose absence has always held these libraries back from server-level deployment. Dask provides parallelism for analytics, enabling performance at scale for existing Python structures like NumPy arrays, Pandas dataframes and machine learning tools from scikit-learn. Apart from parallelism in arrays and dataframes, Dask packs in a variety of other advantages.
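
The underlying pattern of splitting data into chunks, working on the chunks concurrently, and combining the partial results can be sketched with the standard library alone. This is not Dask’s API; the helper names `chunked` and `parallel_sum` and the chunk size are assumptions for illustration. Threads are used only to keep the example short, and for pure-Python work like this they show the structure rather than a real speed-up.

```python
from concurrent.futures import ThreadPoolExecutor


def chunked(values, size):
    """Split a list into consecutive chunks of at most `size` items."""
    return [values[i:i + size] for i in range(0, len(values), size)]


def parallel_sum(values, chunk_size=250, workers=4):
    """Sum each chunk in a worker thread, then combine the partial sums."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partial_sums = list(pool.map(sum, chunked(values, chunk_size)))
    return sum(partial_sums)


data = list(range(1000))
total = parallel_sum(data)
print(total)  # 499500
```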


Rishi Sidhu

Imagine . Act . Realize | Artificial Intelligence, Books, Philosophy
