SQL Insider- Case I- How to solve the SQL Murder Mystery using only two queries!
If you’re already familiar with SQL or just dipping your toes into the world of database querying, you’ve probably heard of the notorious SQL Murder Mystery set in the adventurous SQL City. But what if there’s a clever way to crack the case using just two queries? Follow along till the end to discover the secrets behind the SQL City’s most baffling crime!
Warming Up
Like glass is to Sherlock Holmes, SQL is to a Data Scientist! 🕵🏻
SQL (Structured Query Language) is a powerful query language that allows a standardized way to interact with databases, allowing creation, retrieval, updating and deletion (CRUD) of data efficiently. Data analysts and data scientists heavily rely on SQL for their day-to-day tasks due to its ability to extract valuable insights at a glance from large datasets. It is vastly used to perform queries of filter, aggregation and joins to uncover complex business questions and gain deeper understanding from the available information.
Before the development of SQL (Structured Query Language) in the 1970s, there were various other query languages used to interact with databases, such as QUEL (Query Language), DML (Data Manipulation Language), QBE (Query by Example), ISBL (Information Systems and Business Language) etc. While these early query languages were used for specific database systems, SQL emerged as a standard language that could be used across different RDBMS, leading to its widespread adoption and becoming the reliable go-to for interacting with structured databases.
Getting Started
Well, now that we have a hold on what exactly is SQL and how it emerged to be one of the must-have skills for Data Analysis, let’s get our detective hats on to uncover the mystery behind the SQL City’s most baffling crime!
My advice, try it for yourself before going ahead here, to relish the thrill that comes with learning by doing! Don’t worry if you stumble, you can always come here and find your way back!
Sleuthing Begins
If you’re just stepping into the realm of SQL, going through the Beginner’s Walkthrough on their website will be immensely helpful in building foundation of core concepts like select *, where clause, wild card etc.
Remember: Before delving too deeply into finding a solution, it’s always wiser to start by asking questions to identify the actual problem we seek to solve. This not only brings clarity to our objectives but also conserves valuable resources like time and energy by minimizing trial and error.
1. What’s the case? (Problem Statement)
A crime has taken place and the detective needs your help. The detective gave you the crime scene report, but you somehow lost it. You vaguely remember that the crime was a murder that occurred sometime on Jan.15, 2018 and that it took place in SQL City.2. What are the places we should be looking at? (Data Points and sources available)
3. How do we get started? (Determine the first step)
Start by retrieving the corresponding crime scene report from the police department’s database. Also, study the table’s contents at a glance using this query, to have an idea of what we’re starting with: SELECT * FROM sqlite_master where type = ‘table4. How do we know we solved the case? (Determine what success looks like) Identify the murderer’s name and run the following query to find out the results:
Querying
Now that we know enough about the case, let’s start our sleuthing and dig up further information.
Let’s begin with the longer route to get to our destination, because this allows room for exploration and expands the scope of our learning.
At the end of this article, you’ll find out how to reach the killer in just two queries, so let’s get started.
We will structure our search in the following order, so it’s easy to follow the leads.
Process: Think about what we have and what we need
Query: SQLite query to get the results
Output: Result of the query.
Lead 1
Process: As we know from our preliminary questionnaire, we need to start by looking for the crime scene record in the police department’s database for a murder that took place in he SQL city at 15th Jan 2018.
Query: select * from crime_scene_report where type = ‘murder’ and date = ‘20180115’ and Lower(city) like ‘sql city’;
Output:
Lead 2
Process: From the previous query, we now know about two witnesses to this case:
a. Someone from Northwestern Dr (last apartment)
b. Annabel from Franklin Ave
Let’s find out more about them
Query: select * from person where address_street_name = ‘Franklin Ave’ and name like ‘Annabel%’ or (address_number = (select max(address_number) from person where address_street_name = ‘Northwestern Dr’))
Output:
Lead 3
Process: Now that we know their ‘person_id’ (which is the foreign key to most of the tables except drivers_license), let’s see what they had to say in their interviews.
Query: select a.person_id, b.name ,a.transcript from interview a left join person b on a.person_id = b.id where a.person_id = ‘16371’ or a.person_id = ‘14887’;
Output:
Lead 4
Process: We know enough information to identify our suspects for this case. Let’s dive into the gym’s membership data to list out our suspects.
Query: select * from get_fit_now_member where id like ‘48Z%’ and membership_status = ‘gold’;
Output:
Lead 5
Process: Since we’ve now identified all those who fall under suspicion, let’s see who from these have checked in to use the gym on Jan 9 2018, just like Annabel (witness 2) mentioned in her interview. We will identify our suspects using their ‘membership_id’.
Query: select * from get_fit_now_check_in where check_in_date = ‘20180109’ and (membership_id = ‘48Z79’ or membership_id = ‘48Z55’);
Output:
Lead 6
Process: Now that our suspicions have been confirmed, let’s verify our results with Morty’s (witness 1) statement too. Here, we will use the drivers_license information, to validate the results we acheived.
Query: select a.id, a.name, a.license_id, b.plate_number from person a left join drivers_license b on a.license_id = b.id where a.id = ‘67318’
Output:
Lead 7
Process: Looks like we’ve reached our goal! But have we yet? Let’s see what our ‘check_solution’ query has to say about this
Challenge Accepted! Let’s hear out what Jeremy has to say.
Query: select a.person_id, b.name ,a.transcript from interview a left join person b on a.person_id = b.id where a.person_id = ‘67318’;
Output:
Final Lead
Process: Thanks Jeremy, that was quite the description! Let’s see how can we identify the mastermind in just one query, followed by a verfication of our results. Finally, we’ll ask ‘check_solution’ if we’ve made it to the mastermind of the plan successfully.
Query:
(a) Mastermind Demographics
select distinct(a.id), a.name, d.height, d.hair_color, b.annual_income, d.plate_number, d.car_make, d.car_model from person a left join income b on a.ssn = b.ssn left join facebook_event_checkin c on a.id = c.person_id left join drivers_license d on a.license_id = d.id where c.date like (‘201712%’) and c.event_name like (‘SQL Symphony%’) and d.car_make = ‘Tesla’
(b) Verify the result with Jeremy’s statement
select a.name, d.height, d.hair_color, b.annual_income, c.date, c.event_name, d.car_make, d.car_model from person a left join income b on a.ssn = b.ssn left join facebook_event_checkin c on a.id = c.person_id left join drivers_license d on a.license_id = d.id where c.date like (‘201712%’) and c.event_name like (‘SQL Symphony%’) and d.car_make = ‘Tesla’
The Solution, you ask?
Finally, the shortcut
Now that we’ve travelled the long and scenic route, let’s see how we reach there in 2 simple steps:
For the witnesses to have given their testimony, all of them they should have been together in the same place when the incident took place, i.e on the evening of January 15, 2018, perhaps attending the same event.
Our witnesses were enjoying their evening at ‘The Funky Grooves Tour’. Now we just need to find out who else was accompanying them.
Et Viola ! You have the name of the killer who carried the incident out. And well, now you know how the story ends…
Wrapping Up!
Since this was a hypothetical situation, we got lucky and found a shorter way to our desired outcome, but in the real-world, seldom that is the case.
It’s a recommended practice to start by asking the right questions, note the key findings, report the findings to the stakeholders and collect their feedback as early on as possible in analytical projects.