Pandas cheatsheet for SQL people (part 1)

Adil Aliyev
HackerNoon.com
Jun 6, 2018

The Pandas library is the de facto standard tool for data scientists nowadays, and it is used widely around the globe. Since becoming familiar with it, I use it to process table-structured data in whatever project I am working on. It is fast and reliable, and it supports CSV, Excel, JSON and other formats.

However, as a person experienced in SQL, I had some difficulty and confusion manipulating tables (a.k.a. DataFrames) in the beginning. Eventually, I learned more of the API and the proper ways of doing things. I believe many people taking their first steps with Pandas may have the same experience. Some time ago I prepared a cheatsheet that pairs SQL queries with their Pandas analogues, and I am happy to share it with all of you.

In this post, I am sharing the queries that use “SELECT”. I will post other queries and their Pandas analogues later. I would be happy to receive your feedback and wish list for the cheatsheet.

Here we go.

We have three simple tables:

  • users
  • courses
  • attendance

And the rest of the queries will be about data manipulations on them.
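As a minimal sketch, the three tables could be built as DataFrames like this. The column names and sample rows are assumptions made up for illustration, not a fixed schema.

```python
import pandas as pd

# Hypothetical sample data: column names and values are illustrative only.
users = pd.DataFrame({
    "id": [1, 2, 3],
    "name": ["Alice", "Bob", "Carol"],
    "age": [25, 30, 41],
})

courses = pd.DataFrame({
    "id": [10, 20],
    "title": ["SQL Basics", "Intro to Pandas"],
})

# Each row links one user to one course they attended.
attendance = pd.DataFrame({
    "user_id": [1, 1, 2, 3],
    "course_id": [10, 20, 10, 20],
})

print(users)
print(courses)
print(attendance)
```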

Let's start with the basic SELECT command.
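Here is a small sketch of how basic SELECT statements might map to Pandas. The `users` table and its columns (`id`, `name`, `age`) are hypothetical sample data, and the SQL in the comments is only the rough equivalent.

```python
import pandas as pd

# Hypothetical sample table (column names are assumptions).
users = pd.DataFrame({
    "id": [1, 2, 3],
    "name": ["Alice", "Bob", "Carol"],
    "age": [25, 30, 41],
})

# SELECT * FROM users;
all_rows = users

# SELECT name, age FROM users;
name_age = users[["name", "age"]]

# SELECT * FROM users LIMIT 2;
first_two = users.head(2)

# SELECT name AS user_name FROM users;
renamed = users[["name"]].rename(columns={"name": "user_name"})

print(name_age)
print(first_two)
print(renamed)
```

Note that selecting with a list of column names (double brackets) returns a DataFrame, while a single name in single brackets returns a Series.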

Basic math and arithmetic operations.
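A possible sketch of computed columns and aggregates follows. The `users` table and the derived column name `age_in_months` are hypothetical, chosen only to illustrate the pattern.

```python
import pandas as pd

# Hypothetical sample table (column names are assumptions).
users = pd.DataFrame({
    "id": [1, 2, 3],
    "name": ["Alice", "Bob", "Carol"],
    "age": [25, 30, 41],
})

# SELECT name, age * 12 AS age_in_months FROM users;
with_months = users.assign(age_in_months=users["age"] * 12)[["name", "age_in_months"]]

# SELECT AVG(age) FROM users;
avg_age = users["age"].mean()

# SELECT MIN(age), MAX(age) FROM users;
min_age = users["age"].min()
max_age = users["age"].max()

# SELECT COUNT(*) FROM users;
row_count = len(users)

print(with_months)
print(avg_age, min_age, max_age, row_count)
```

Arithmetic on a column is applied element-wise, so no explicit loop is needed.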

Conditional operations and LIKE.
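The following sketch shows one way WHERE conditions and LIKE patterns might be expressed with boolean masks and string methods. The table and its values are hypothetical, and the string matching here is case-sensitive, which may differ from how a given database treats LIKE.

```python
import pandas as pd

# Hypothetical sample table (column names are assumptions).
users = pd.DataFrame({
    "id": [1, 2, 3],
    "name": ["Alice", "Bob", "Carol"],
    "age": [25, 30, 41],
})

# SELECT * FROM users WHERE age > 28;
older = users[users["age"] > 28]

# SELECT * FROM users WHERE age > 28 AND name <> 'Bob';
# Each condition needs its own parentheses; use & for AND, | for OR.
older_not_bob = users[(users["age"] > 28) & (users["name"] != "Bob")]

# SELECT * FROM users WHERE id IN (1, 3);
in_ids = users[users["id"].isin([1, 3])]

# SELECT * FROM users WHERE name LIKE 'A%';
starts_with_a = users[users["name"].str.startswith("A")]

# SELECT * FROM users WHERE name LIKE '%ro%';
# regex=False treats the pattern as a plain substring.
contains_ro = users[users["name"].str.contains("ro", regex=False)]

print(older_not_bob)
print(starts_with_a)
print(contains_ro)
```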

I will finish this post with SQL queries related to JOIN and ORDER BY. To make the code simpler and easier to read, I will assign the results to new variables and reuse them in the following examples.
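As a first sketch, an inner join between attendance and users, sorted by name, might look like this. The tables, the key columns (`user_id`, `id`) and the variable name `user_attendance` are assumptions for illustration.

```python
import pandas as pd

# Hypothetical sample tables (column names are assumptions).
users = pd.DataFrame({
    "id": [1, 2, 3],
    "name": ["Alice", "Bob", "Carol"],
    "age": [25, 30, 41],
})
attendance = pd.DataFrame({
    "user_id": [1, 1, 2, 3],
    "course_id": [10, 20, 10, 20],
})

# SELECT * FROM attendance a
# JOIN users u ON a.user_id = u.id
# ORDER BY u.name;
user_attendance = (
    attendance
    .merge(users, left_on="user_id", right_on="id")  # inner join by default
    .sort_values("name")
)

print(user_attendance)
```

`merge` performs an inner join unless `how` is set to `"left"`, `"right"` or `"outer"`, and `sort_values("name", ascending=False)` would correspond to ORDER BY name DESC.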

Now let's join the result above with the course titles. The result will be the same as that of the equivalent SQL JOIN on the courses table.
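A sketch of this second join, under the same assumptions about table and column names, could be the following. Because both `users` and `courses` have an `id` column in this hypothetical schema, the `suffixes` argument is used to keep the two apart.

```python
import pandas as pd

# Hypothetical sample tables (column names are assumptions).
users = pd.DataFrame({
    "id": [1, 2, 3],
    "name": ["Alice", "Bob", "Carol"],
    "age": [25, 30, 41],
})
courses = pd.DataFrame({
    "id": [10, 20],
    "title": ["SQL Basics", "Intro to Pandas"],
})
attendance = pd.DataFrame({
    "user_id": [1, 1, 2, 3],
    "course_id": [10, 20, 10, 20],
})

# Result of the first join, kept in a variable for reuse.
user_attendance = attendance.merge(users, left_on="user_id", right_on="id")

# SELECT * FROM attendance a
# JOIN users u ON a.user_id = u.id
# JOIN courses c ON a.course_id = c.id;
full = user_attendance.merge(
    courses,
    left_on="course_id",
    right_on="id",
    suffixes=("_user", "_course"),  # disambiguate the two "id" columns
)

print(full)
```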

The query above returns the full table/DataFrame with all columns included. To select only the necessary columns, we can use the same method as before.
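Putting it together, a sketch that keeps only the user name and course title, sorted by both, might read as follows. As before, the tables and column names are hypothetical.

```python
import pandas as pd

# Hypothetical sample tables (column names are assumptions).
users = pd.DataFrame({
    "id": [1, 2, 3],
    "name": ["Alice", "Bob", "Carol"],
    "age": [25, 30, 41],
})
courses = pd.DataFrame({
    "id": [10, 20],
    "title": ["SQL Basics", "Intro to Pandas"],
})
attendance = pd.DataFrame({
    "user_id": [1, 1, 2, 3],
    "course_id": [10, 20, 10, 20],
})

# SELECT u.name, c.title FROM attendance a
# JOIN users u ON a.user_id = u.id
# JOIN courses c ON a.course_id = c.id
# ORDER BY u.name, c.title;
joined = (
    attendance
    .merge(users, left_on="user_id", right_on="id")
    .merge(courses, left_on="course_id", right_on="id",
           suffixes=("_user", "_course"))
)
report = (
    joined[["name", "title"]]          # keep only the needed columns
    .sort_values(["name", "title"])    # ORDER BY name, title
    .reset_index(drop=True)            # renumber rows after sorting
)

print(report)
```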
