SQL Database Inception

Joe Cha
bother7-blog
Published in
5 min readJul 31, 2017

SQL inception occurs when you the access the same database within itself in order to achieve the correct output.

For my first blog post, I have chosen to write about SQL. In the long run, I’m guessing that direct SQL programming won’t be necessary because we can use ORMs instead. I might as well get acclimated with SQL since I will be dealing with databases for the rest of my life. On a positive note, SQL is very inflexible, unforgiving, and cruel, and it reminds me of the way programming languages used to be. It is the opposite of writing code in Ruby and good practice to help prepare for working with any other language. When programming is easy, it freaks me out because I’m worried that someone else is doing all the legwork for me and I may not grasp the underlying concepts.

One of our labs was centered on interpreting data from the Daily Show. The information was located in a csv file; our objective was to parse the data into an SQL database and use Ruby+SQL methods to find certain statistics. Some of the bits of information the lab asked for were beyond our basic grasp of SQL. I banged my head against the objectives until I finally found a way to squeeze and twist SQL to achieve the objective. This is what that SQL looked like.

… if you don’t understand what the SQL is doing, it’s ok. I created this code and I have no freaking clue how it works.

So I’m going to restart. After coming back to the code, I found a much more elegant solution that I can actually explain. I think that if you can’t explain the process, than you haven’t reached a good solution. This new solution is a result of time spent letting the concepts simmer and also an epiphany (epiphany is shorthand for I asked for help from Gabi, someone with more knowledge on the subject).

This is still SQL inception, but the code is much more efficient than the trainwreck I wrote a week earlier. This is a diagram of the basic concepts I am trying to explain.

After being parsed into an SQL database file, the Daily Show information was organized into the categories shown in Figure 1.

Figure 1: Daily Show Database

The data is pretty self explanatory, it holds information on the date and year of the show along with information on the guest. Some of the hardest tasks were based upon organizing the data in SQL. In this example, the task was to return the year with the most guest appearances by profession. This information isn’t readily available because the data needs to be grouped by both year and by appearances. Since SQL can really only handle one function at a time, we need to manipulate the same data table three times before getting the information that we want.

This is where the concept of database inception comes into play; we need to reorganize the data through several iterations before it becomes presentable. This is more due to the limitations of SQL than anything else. If we were to use Ruby or Active Record for help, this would be a much easier task. The SQL functions we need are a count, sum, and then a max in that order. Because we need three different math functions, we must iterate through the same table three times.

Table C is our end result. First we will start with Table A.

Our first table takes count of how many guest appearances occurred each year by profession.

Our next interpretation takes these individual counts by year and just grabs the highest guest appearances in a year.

This is the data we want. If we wanted to go one step further, we could find the year with the most appearances by profession. Our final interpretation takes just the max row from the previous table.

The picture below shows a different ways to visualize the organization and execution of the SQL code to find the most guest appearances by year. Three different SELECTs were needed because we needed three separate math functions, and thus three separate GROUP BY commands, to obtain the data we need.

We have to do database inception because SQL is a simple language. 90% of the commands we use just read data. When we ask SQL to complete functions, it can only complete functions one at a time as the function (SUM, COUNT, etc.) must by connected to a GROUP BY. Thus, if we need to complete more than one function on a table, we need to call the table multiple times within SQL.

--

--