SQL for Data Analysis

Adetola Adeya
May 28 · 4 min read

SQL (Structured Query Language) pronounced ‘sequel’, is used to modify and access data or information from a storage area called database. Here are a few things the SQL can do;

  • SQL can execute queries against a database
  • SQL can retrieve data from a database
  • SQL can insert records in a database
  • SQL can update records in a database
  • SQL can delete records from a database.

SQL is a database computer language designed for the retrieval and management of data in a relational database. It is the standard language for relational database management systems. A relational database management system (RDBMS) is the basis for SQL, and for all modern database systems such as MS SQL Server, IBM DB2, Oracle, MySQL, and Microsoft Access. The relational database system contains one or more objects called tables and data is stored in these tables. Tables are comprised of rows and columns.

SQL is effective for performing the types of aggregations that are commonly executed on Excel, but over much larger datasets and on multiple tables at the same time.

I am going to be explaining how I solved some challenges in the SQL Curriculum on the Gitgirl platform. We were asked to complete 54 challenges on SQL on the Hackerrank platform. Here is a description of an exercise that I found quite interesting.

The PADS

The challenge was for us to generate the following two result sets:

  1. Query an alphabetically ordered list of all names in OCCUPATIONS, immediately followed by the first letter of each profession as a parenthetical (i.e.: enclosed in parentheses). For example: AnActorName(A), ADoctorName(D), AProfessorName(P), and ASingerName(S).
  2. Query the number of occurrences of each occupation in OCCUPATIONS. Sort the occurrences in ascending order, and output them in the following format: There are a total of [occupation_count] [occupation]s.

where [occupation_count] is the number of occurrences of occupation in OCCUPATIONS and [occupation] is the lowercase occupation name. If more than one Occupation has the same [occupation_count], they should be ordered alphabetically.

SOLUTION

  1. First, we put in the SELECT statement is used to select data from a database and the data extracted here is the concatenation of the name column and a substring of the occupation column. The SUBSTR function is for extracting a substring from a string. An alias ‘Names’ is then given to the whole statement:

SELECT CONCAT(name,’(‘,SUBSTR(occupation,1,1),’)’) as Names

Next, we use the FROM clause that produces the table needed, which in this case is the OCCUPATIONS table:

FROM OCCUPATIONS

Lastly, we will sort the result-set in ascending order using the ORDER BY clause and we order by Names and close the statement with a semicolon showing the end of the query:

ORDER BY Names;

note: The ORDER BY clause sorts the records in ascending order by default.

2. For the second query, the same CONCAT function is used to add a line of string and the Count of records in the population column.

SELECT CONCAT(‘There are a total of ‘, COUNT(occupation),’ ‘, LOWER(occupation),’s.’)

Next, we use the FROM clause that produces the table needed, which in this case is the OCCUPATIONS table:

FROM OCCUPATIONS

After that, the GROUP BY clause is used in collaboration with the SELECT statement to arrange identical data: occupation, into groups.

GROUP BY occupation

We then sort the result-set in ascending order using the ORDER BY clause and we order by COUNT of population and the population column and close the statement with a semicolon indicating the end of the query:

ORDER BY COUNT(occupation), occupation;

A picture showing Queries 1 and 2

Lastly, we run both queries together.

The output of both queries

In this short tutorial, we were able to identify a few clauses and utilize them to retrieve, modify, sort and update data.

I assumed the understanding of the basic knowledge of SQL in this tutorial but if you want to learn more or refresh your memory check this out.

PS: I hope this tutorial helps you to get started with basic data analysis using any dataset you come across and practice reading, exploring, analysing, using SQL. My Hackerrank profile can be found here and my Gitgirl projects can be found on my Github.

Please comment below if you have thoughts or questions concerning anything I have explained. Till next time!

The Startup

Medium's largest active publication, followed by +524K people. Follow to join our community.

Adetola Adeya

Written by

The Startup

Medium's largest active publication, followed by +524K people. Follow to join our community.

Welcome to a place where words matter. On Medium, smart voices and original ideas take center stage - with no ads in sight. Watch
Follow all the topics you care about, and we’ll deliver the best stories for you to your homepage and inbox. Explore
Get unlimited access to the best stories on Medium — and support writers while you’re at it. Just $5/month. Upgrade