Why Is SQL a Must for Data Scientists?

Rafael Oliveira
Data Analytics @ Hult
2 min read · Apr 28, 2018

Data scientists work hard on projects that predict problems and find solutions, and the topic is hot right now. The whole analysis process runs through a mountain of data, public or private. Much of that information is stored in databases, and that makes SQL a must for data scientists.

So, what is SQL?

SQL (Structured Query Language) is the standard language for accessing relational databases such as Oracle, SQL Server, IBM DB2, MySQL, and Sybase. SQL works with structured data, which means every record follows a fixed set of fields. Newer NoSQL databases like Cassandra and MongoDB, aimed at data that does not fit a fixed schema (free-form content like Wikipedia articles, tweets, etc.), are gaining ground, but relational databases still represent a big and important portion of the total.

Advantages of using SQL

Empower your data usage: SQL lets you create and manipulate data, query more than one table or database, and cross-reference the information between them.
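To make this concrete, here is a minimal sketch using Python's built-in sqlite3 module. The customers and orders tables and their rows are invented for illustration, and it joins two tables inside one database rather than two separate databases, but the idea of crossing information is the same.

```python
import sqlite3

# In-memory database; the tables and rows below are made up for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Create the structure.
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);

    -- Add data.
    INSERT INTO customers (id, name) VALUES (1, 'Ana'), (2, 'Ben');
    INSERT INTO orders (id, customer_id, amount)
        VALUES (1, 1, 20.0), (2, 1, 35.0), (3, 2, 10.0);

    -- Manipulate existing data: correct the amount of one order.
    UPDATE orders SET amount = 12.5 WHERE id = 3;
""")

# Cross information between the two tables with a JOIN.
rows = conn.execute("""
    SELECT c.name, o.amount
    FROM customers AS c
    JOIN orders AS o ON o.customer_id = c.id
    ORDER BY o.id
""").fetchall()

print(rows)
conn.close()
```

Each row pairs a customer name with one of that customer's order amounts, something neither table could tell you on its own.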

Combine with other programming languages: You can run SQL commands from inside your code. Almost every programming language has libraries for this, and they are usually straightforward. Many databases also let you work with data formats such as XML and JSON through SQL.
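As a rough example of SQL living inside another language, the sketch below uses Python's standard sqlite3 library. The users table and the users_older_than helper are hypothetical names chosen for this post, not part of any real project.

```python
import sqlite3


def users_older_than(conn, min_age):
    """Return the names of users older than min_age (hypothetical helper)."""
    # The ? placeholder lets the library pass the value safely,
    # instead of gluing it into the SQL string by hand.
    cursor = conn.execute(
        "SELECT name FROM users WHERE age > ? ORDER BY name", (min_age,)
    )
    return [name for (name,) in cursor]


# In-memory database with a made-up users table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, age INTEGER)")
conn.executemany(
    "INSERT INTO users (name, age) VALUES (?, ?)",
    [("Ana", 31), ("Ben", 24), ("Carla", 45)],
)

result = users_older_than(conn, 30)
print(result)
```

The SQL stays a plain string and Python handles everything around it. Libraries for other languages and other databases tend to follow a similar connect, execute, fetch pattern.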

Easy to learn: SQL is probably one of the simplest languages you will learn. The commands are intuitive and easy to remember, and the core of the language changes very little from one platform to another.
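To give a feel for how readable the commands are, here is a small sketch run through Python's sqlite3 module. The sales table and its values are invented for the example. The query reads almost like a sentence: select some columns from a table, where a condition holds, grouped by city.

```python
import sqlite3

# In-memory database with a made-up sales table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (city TEXT, product TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [
        ("Boston", "book", 12.0),
        ("Boston", "pen", 3.0),
        ("Dubai", "book", 15.0),
        ("Dubai", "book", 5.0),
    ],
)

# Number of book sales and total revenue per city, highest revenue first.
query = """
    SELECT city, COUNT(*) AS n_sales, SUM(amount) AS revenue
    FROM sales
    WHERE product = 'book'
    GROUP BY city
    ORDER BY revenue DESC
"""
summary = conn.execute(query).fetchall()
print(summary)
conn.close()
```

These basic keywords (SELECT, FROM, WHERE, GROUP BY, ORDER BY) work much the same way across the major relational databases.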

Prepares you for NoSQL: Learning SQL first is not mandatory, but it makes things easier. Once you have a good understanding of SQL, the differences between it and NoSQL become clear, and you can weigh the pros and cons of each.

I want to learn, what should I do?

As mentioned before, SQL is not complex, and you can probably learn it by yourself with a free online course. I suggest you check the SQL Tutorial created by W3Schools.

Common mistakes about SQL

If you are a student or a data science enthusiast, you have probably worked with datasets in Excel, and relying on that is the first big mistake. People focus on learning coding skills like Python or R and forget that in real life most of the data will probably not be accessible through Excel.

The second mistake to avoid is thinking that knowing SQL makes you a database administrator. The database administrator is a different role, with other complex activities that go far beyond SQL knowledge.

That being said, don’t forget to add SQL to your skills! It is a must for data scientists.
