Pandas cheatsheet for SQL people (part 1)

Published in

HackerNoon.com

2 min readJun 6, 2018

Pandas library is the de-facto standard tool for data scientists, nowadays. It is used widely by many data scientists around the globe. After being familiar with it I always use it for processing table-structured data whatever project I am working on. It works fast and reliable, supports CSV, Excel, JSON and so on.

However, as a person experienced in SQL, I had some difficulties and confusion with manipulating the tables (a.k.a. DataFrames) in the beginning. Eventually, I learned more APIs and ways of doing the things properly. I believe many people who do his/her first steps on Pandas may have the same experience. Some time ago I prepared the cheatsheet using SQL queries and their analogy in Pandas. And I am happy to share it with all of you.

In this post, I am sharing the queries using “SELECT”. I will post other queries and their analogies to Pandas soon later. I would be happy to receive your feedback and wish-list regarding the cheatsheet.

Here we go.

We have three simple tables:

users
courses
attendance

And the rest of the queries will be about data manipulations on them.

Lets start from basic SELECT command.

SELECT * FROM users

SELECT * FROM users LIMIT 0,10

SELECT * FROM users WHERE email IS NULL

SELECT first_name, last_name FROM users

SELECT DISTINCT birth_year FROM users

Basic math and arithmetic operations.

SELECT AVG(points) FROM users;
SELECT SUM(points) FROM users;

Conditional operations and LIKE.

SELECT * FROM users WHERE birth_year BETWEEN 1998 AND 2018

SELECT * FROM users WHERE first_name LIKE 'Ch%'

SELECT * FROM users WHERE first_name LIKE '%es'

SELECT * FROM users WHERE first_name LIKE '%on%'

SELECT first_name, last_name FROM users WHERE first_name LIKE '%on%'

I finalize this post with JOIN and ORDER BY related SQL queries. To make the code more simpler and easy to read I will assign the results into new variables and re-use them in next ones.

SELECT * FROM attendance atn
LEFT JOIN users usr ON atn.user_id = usr.id

Now lets join the result above with course titles. The result it will be same as the result of following SQL command.

SELECT * FROM attendance atn
LEFT JOIN users usr ON atn.user_id = usr.id
LEFT JOIN courses co ON co.id = atn.course_id

The query above will return the full table/dataframe with all columns included. To select necessary columns we can use the same method as we did before.

SELECT * FROM users ORDER BY first_name, last_name

SELECT * FROM users ORDER BY first_name, last_name DESC

SELECT first_name, last_name, birth_year,
points, course_name, instructor FROM attendance atn
LEFT JOIN users usr ON atn.user_id = usr.id
LEFT JOIN courses co ON co.id = atn.course_id
ORDER BY first_name, last_name

Pandas cheatsheet for SQL people (part 1)

Written by Adil Aliyev