Hows Whats & Whys of SQL in Data Science

Data science is a mixture of multidisciplinary skills ranging from an intersection of mathematics, statistics, computer science, Communication & business. SQL is one of the key ingredient of this.

Darshana Singh
3 min readJun 3, 2019

Data Science is a vast sector; it is not just a standalone topic but a combination of many. SQL is an integral part of data science and learning SQL is the first step towards the path of becoming Data Scientist.

In the 21st century, data has become one of the most vital resources and handling of the humongous pile of data is the most tedious job. Data available can be incredibly noisy at times. The very first responsibility of a Data Scientist is to make sense of noisy data and here SQL comes into the picture.

SQL is database query language which is used for retrieving, manipulating & managing the data by the Data Scientists. The type of database used is called Relational Database. The word relational defines that the data is structured & organised in rows & columns of the table.

So here we will go with a stepwise approach to conquer the territory of SQL.

There are various “flavours” of SQL that work with different database engines. PostgreSQL, MicrosoftSQL are some of the variants of SQL, just like dialects in a spoken language. Here we will be discussing MicrosoftSQL server’s Microsoft SQL Server Management Studio for Windows.

Now, why are we talking about the database engines instead of discussing how to write queries? This is because writing simple queries in an editor and executing same query in a database engine with all exception handlings are two entirely different things. As we know that knowledge applied in right direction manifest the best results.

Following are the steps to be taken for operating MSSQL Server Management Studio:

Step 1: Download Microsoft SQL Server Management Studio as per your machine’s specifications. This db engine is quite heavy and requires a lot of space, so check your configuration and make sure you have enough space.

Step 2: Once you have the tool, you need to have a proper database schema with you. As we discussed above, there are a lot of public repositories we can easily download.

Step 3: Now we have all essential things, after which comes the most important task of attaching the db. Database engines have their own constraints. Hence, it is very important to paste your db to the right destination folder. Make sure when you download the database it must have 2 files with extension .mdf & .ldf.

Note: .mdf basically contains the data saved and .ldf has all the logs.

In case you miss anyone of them, try to re-download the db. Once you have db with these two files, follow the given steps.

Go to Program files Open MSSQL studio Open DATA folder Paste the db in DATA folder.

Step 4: All the pre-requisites are done. Now open the MSSQL Studio and attach the database as mentioned here:

Click on Database folder given in the object explorer window Click on attach Click ‘Add’ Select db location i.e. DATA folder of MSSQL Select your db and click OK.

MSSQL allows access to limited folders. Hence, it is important to paste the db to the right folder.

Step 5: If you can see your db in the explorer window, your db has been successfully attached.

Last but not the least before, moving on to the queries, try to understand the schema of the given db. I personally have faced this issue where I was unable to figure out which table to look first. So, for every schema there must be a ‘Root table’, mostly it is the user table. Once you successfully figure out the schema, you are half way done.

Try to write as many queries as you can. Explore & experiment as much as you can & never forget slow & steady wins the race!

--

--