We Have Excel, Who Needs SQL?

Peter Kane
The Startup
Published in
5 min readJul 14, 2020

--

To answer that provocative question: we all need SQL. We need both Excel and SQL (short for Structured Query Language), sometimes called “ess-cue-el,” and some other times called “sequel,” this article will explain why.

Photo © Autor

As a scientist, I have done calculations with data in tabular format, and I am familiar with Microsoft Excel and Google Sheets. For years, I have heard about SQL but never really got into the depths of it until I started my journey into data science but a few months ago. As a data science enthusiast and a newcomer to the field, I asked this question once I learned that SQL is a computer language used with relational databases that store data in the form of related tables. Why do we need SQL when we are so familiar with Excel (or spreadsheets in general), and when Excel really ‘excels’ at tasks that we always perform with data in tables, including some complex analyses?

In a nutshell, what are SQL and Excel?

The blunt, simple answer is that SQL and spreadsheet applications such as Microsoft Excel are different things. They all indeed work with data in tables or structured data. Still, SQL is a computer language we use to communicate with relational databases through another set of software called relational database management systems (RDBMS) such as MySQL, one of the most popular open-source RDBMSs out there. On the other hand, Excel is an application that works like a very smart calculator that can do many different calculating and analysis tasks.

Another question arises: Is SQL a programming language? To answer this question, we need to understand some definitions first. Online Cambridge Dictionary defines programming language the same as the term “computer language.” According to the source, a programming language is “a set of rules, words, etc. that are used for writing computer programs.” It even goes as far as to put SQL in an example where it says, “SQL is a computer language used to create an interface that allows restricted access to a database.” Yes, one of SQL’s capabilities is restricting access to provide data security.

SQL is, in a sense, a programming language, as the definition above says. However, it’s not a general-purpose programming language like Java, C++, Python, etc. It is a domain-specific language, and its domain is specific to managing relational databases. We don’t use SQL to write applications, but we can use SQL to communicate with relational databases to retrieve, create, manipulate, and update tables of data in databases.

Up to this point, a more appropriate question should be: “What are the differences between Excel and RDBMSs such as MySQL?” Albeit uses SQL command lines, MySQL can also visualize data in the form of tables. Doesn’t Excel also create, manipulate, and update tables of data? Yes, Excel and the other spreadsheets do that very well. However, some things define the differences between RDBMSs and spreadsheets.

RDBMS vs. Excel (spreadsheets)

Storage Capacity

A relational database run by an RDBMS usually stores data much more than regular spreadsheets such as Excel can process at ease. At its full capacity, Excel can store about 1 million rows and 16,000 columns of data. Those numbers are relatively small compared to what a database system can store. The limit at which MySQL can operate depends on the file size allowed by that particular operating system. When you have more records than a regular spreadsheet software can handle, consider using a database system.

Data Processing Speed

We can indeed store up to about 1 million rows of data in an Excel sheet. We can still scale up by using multiple sheets. However, doing so will cause Excel’s data processing speed to drop, and looking up for any single rerecord of data will become a time-consuming task. On the other hand, an RDBMS can process millions of records of data without losing speed.

Data Integrity

Multiple users can access and work on a single database simultaneously over a network or on the Internet. Although spreadsheets are originally designed for a single user, an updated version of a spreadsheet such as Google Sheets can also provide multi-user functionality. For a database, multiple users can update or manipulate the data at the same time, but the RDBMS will carry out only a single process at any point in time. For a spreadsheet like Google Sheets, however, a multi-user session can be confusing. It can also cause data integrity loss because data can be deleted, overwritten, or changed simultaneously in unintended ways due to human errors. This error type will not happen in an RDBMS because the data input type is predefined during the database design process. Human error is eliminated because, in a database, each record in a table will not accept data of a type it is not assigned to keep. Therefore, a database can preserve data integrity better than any usual spreadsheet software.

Data Analysis and Visualization

SQL and an RDBMS such as MySQL can help us with many data-related tasks; however, they cannot perform complex data analysis and visualization by themselves. While MySQL can get us up to speed with managing large amounts of data, it offers only basic data analysis calculations. SQL can help us perform simple arithmetic tasks, but they still need additional tools such as Python or Tableau for complex analysis and visualization. On the other hand, spreadsheets can help us perform analyses and visualizations quickly on smaller datasets with their built-in tools. Basically, because of MySQL and Excel, or RDBMS and spreadsheets in general, they are built for different purposes.

What lies ahead?

In my opinion, SQL and RDBMSs are relevant in the age of data expansion, even though new languages for database management are starting to appear today. The big data is getting bigger by the day, and data processing tools like MySQL are becoming more and more essential for tackling and using such data. For spreadsheets like Excel, new tools are added continually by the software proprietors, making data analysis and visualization quicker and easier by the day. More insights can be derived from novel visualization methods, too.

To conclude this article, it is really up to the task and the user's goal of using an RDBMS or a spreadsheet. A user should aim for an RDBMS if he needs to store a large amount of data, preserve its integrity, and process it quickly. He should aim for a spreadsheet to help him with smaller datasets that require quick and complex analysis and visualizations.

--

--