Can Pandas take on SQL?
Who will win?
SQL has been around for many years now and it is a popular option because it is easy to learn and understand. Later came Pandas. Today we’ll start comparing these two.
So, without any further due let’s get started!!
SQL and Python are two languages that are a must-have skills for data analysts and scientists in the real world now. When anyone starts to explore data analytics, they usually start with SQL as their first step and then slowly move towards Python.
We will make an attempt to compare some very essential commands in SQL with those in Pandas. SQL is a language used for storing, manipulating, and retrieving data in relational databases. Pandas is a library in python used for data analysis and manipulation.
Importing data
SQL
Importing data in SQL is a bit harder than in Pandas. We have to first create a table using the CREATE TABLE statement with fields similar to those mentioned in the file and then copy data from the file. Check the code below:
CREATE TABLE dataset_name (
column1 character data_type(30),
column2 character data_type(50),
column3 data_type,
);copy fert_data from ‘Path\to\file’ with delimiter ‘,’ csv header encoding ‘windows-1251’;
Pandas
Now compare the above work with this code in Pandas for importing files.
df = pd.read_csv(’Path\to\your\dataset’)
That’s it you have successfully imported all the files 🎊
Select
SQL
In SQL, we use a SELECT statement to fetch records from a table. This statement would be :
SELECT column_name1, column_name2,…
FROM table_name;
what if you have a million records in you dataset…., now you would hesitat to look at them so, to avoid that we use LIMIT() and pass an integer value into it
In pandas, we use head() and tail() to fetch the first 5 and last 5 rows respectively. This is literally equals to SELECT * FROM table LIMIT 5
in SQL. If a specific column has to be selected from the data frame, then the syntax would be:
df[[‘column_name1’, ‘column_name2’]]
Where
SQL
In SQL, a WHERE clause is used to filter records based on a condition. The clause is usually written in conjugation with conditional and logical operators for forming filtering conditions. The syntax would be:
SELECT *
FROM table_name
WHERE condition_expression;
Pandas
There is no single command in Pandas for filtering records like SQL. We can use Boolean indexing and positional indexing. In Pandas filtering are computed through logical conditions. The syntax of this would be:
df[df[‘column_name’] == value] # for Boolean indexingdf[df.loc[‘column_name’] == value] # for positional indexing
In both SQL and Pandas, we can make complex conditional expressions using comparison (>, <, >=, <=, !=, ==) and logical operators(‘and’, ’or’, ’not’ or symbols such as ‘|’, ‘&’, ‘~’)
Insert
SQL
Insert statement is used to insert rows in a database table. The syntax would be:
INSERT INTO table_name(column_name_1, column_name_2,…) VALUES (value1, value2,…);
Pandas
In Pandas, there is no specific function to insert new rows in the data frame. We can use tools such as concat() or append() to perform this task. But the most popularly used one is append().
df = df.append(new_row_to_be_inserted, ignore_index=True)
Update
SQL
In SQL, the UPDATE statement is used to update or modify the values of an existing record. The syntax for the update statement is:
UPDATE table_name
SET column_name = ‘modified_value’
WHERE condition;
Pandas
In Pandas, there is no function to update an existing row. A common procedure is to locate the place that should be updated or modified is by using the positional indexing as earlier we discussed in the comparison of FILTER and then assigning it a new value. It’s almost like assigning a variable.
Conclusion
As you can see, both SQL and Pandas are better at some points and there’s no clear winner. For some tasks, like inserting and updating, SQL is a dominator due to its more readable and easier-to-understand code whereas, Pandas doesn’t require you to create a data structure for the data beforehand, which can be a huge time saver. Just imagine creating a table with more than 50 or more than 50 columns of data into multiple tables and using joins is really a pain.
Don’t forget to leave your responses.✌
Everyone stay tuned!! To get my stories in your mailbox kindly subscribe to my newsletter. Thank you for reading! Do not forget to give your claps and to share your responses and share them with a friend.