SQL Murder Mystery — Solved

Gerald John Alviar
8 min readAug 13, 2023

--

The SQL Murder Mystery is designed to be both a self-directed lesson to learn SQL concepts and commands and a fun game for experienced SQL users to solve an intriguing crime. Developed by Northwestern University Knight Lab — Chicago • San Francisco.

The SQL Murder Mystery

A crime has taken place and the detective needs your help. The detective gave you the crime scene report, but you somehow lost it. You vaguely remember that the crime was a ​murder​ that occurred sometime on ​Jan.15, 2018​ and that it took place in ​SQL City​. Start by retrieving the corresponding crime scene report from the police department’s database. The first step to solving the mystery is to retrieve the corresponding crime scene report from the police department’s database.

Schema diagram of the police department’s database

SQL Commands used in this exercise:

/* My take aways from this exercise is located at the very bottom. */

SELECT -- Extracts data from a database
FROM -- Specify the location of the table you will extract the data
INNER JOIN, ON -- keyword selects records that have matching values in both tables
WHERE -- Clause used to filter the output
ORDER BY -- keyword is used to sort the result in ascending or descending order
LIKE -- Performs a partial string match
INSERT INTO -- Adds new rows to a table
VALUES -- Specifies the values to be inserted
DISTINCT -- Filters out duplicate values from the result set
GROUP BY -- Groups rows based on specified columns
HAVING -- Filters groups based on aggregate functions
COUNT -- Calculates the number of rows or non-null values
AS -- Assigns an alias to a column or table
AND -- Combines multiple conditions in a WHERE clause
OR -- Combines multiple conditions, at least one of which must be true

As I initially check the schema, I have to look at the statements located at the ‘interview’ table.

My first query to check for transcripts

The transcripts of every person are at 4991 based on a separate query.

Count of transcripts from the interview table

Since this is tedious to read, I have to read the initial clues provided:
• The crime is murder
The crime happened on Jan 15, 2018​
Took place at SQL City

By then, I checked the “crime_scene_report” table where the city should be at SQL City at the time of Jan 15, 2018

SELECT type, description, city FROM crime_scene_report
WHERE date = 20180115 AND city = 'SQL City'

/* English translation of the query is:
Select the type, description and city columns from crime_scene_report table
where the date is 20180115 and the city is SQL City.


The date is in integer format base on the schema */
There are 3 crimes committed last January 15 at SQL City, one of them is a murder.

Since this is SQL Murder mystery, I should focus on the type of crime which is murder. A clue provided here in the “crime_scene_report” that there are two witnesses, the first one lives at the last house on Northwestern Dr and another one whos name is Annabel and lives somewhere on Franklin Ave.

Let’s find the first witness that lives at the last house on Northwestern Dr.

SELECT name, address_number, address_street_name FROM person
WHERE address_street_name LIKE '%Northwestern%'
ORDER BY address_number DESC

/* English translation of the query is:

Select the name, address_number and address_street_name columns from person table
where the address_street_name should have Northwestern on it
and order it by address_number descending */
These are the list of addresses living in Northwestern Dr with a count of 50 sorted descending which provides the value of a certain “Morty Schapiro”

Let’s check Morty Schapiro’s statement if we can get more leads to this crime.

SELECT person.name, interview.transcript FROM interview
INNER JOIN person
ON interview.person_id = person.id
WHERE person.name = 'Morty Schapiro';

/* English translation of the query is:

Get the names and the transcript from the interview table. Combine information
from the "person" table with data from the "interview"
table by matching the person's ID from the "interview"
table to the ID in the "person" table

Filter the results to only show records with person's name is 'Morty Schapiro'. */

The clues provided by Morty Schapiro are the gender, membership number and car plate, but first let’s check the statement of second witness.

SELECT name, address_number, address_street_name FROM person
WHERE address_street_name LIKE '%Franklin Ave%' AND name LIKE 'Annabel%';

/* English translation of the query is:

Select the name, address_number and street name columns from person table
Filter out the address_street_name to display addresses with "Franklin Ave"
and a name like Annabel. */
A certain “Annabel Miller” retuned from our query

We will now check Annabel Miller’s statement if we can get more leads to this crime.

SELECT person.name, interview.transcript FROM interview
INNER JOIN person
ON interview.person_id = person.id
WHERE person.name = 'Annabel Miller';

/* Select the person's name, and transcript column from the interview table
Combine the person table by matching the "person_id" from the interview
table and with the "id" in the person table. */
Transcript of Annabel Miller provided more clues to the murder scene

By having the statements of both witnesses, we can now query that the killer is a man, the “membership_status” which is Gold members only, the “Get Fit Now Gym” bag, car plate included LIKE “H42W”, and second witness recognized the killer when working last week January 9th. Let's try to query this as a whole and use alias as this query get long.

SELECT membership.name FROM get_fit_now_member AS membership
INNER JOIN get_fit_now_check_in AS checkin
ON checkin.membership_id = membership.id
INNER JOIN person
ON membership.person_id = person.id
INNER JOIN drivers_license as drivlic
ON person.license_id = drivlic.id
WHERE checkin.check_in_date = 20180109
AND membership.membership_status = 'gold'
AND drivlic.plate_number LIKE '%H42W%';

/* Select the member's name column from the 'get_fit_now_member' table. Name
the table as 'membership'. Combine information of the 'membership' table
table and 'get_fit_now_check_in' aliased as 'checkin' by matching the
'membership_id' of 'membership' table with the '

By providing some of the conditions, we came up with a certain name of Jeremy Bowers. Lets try to connect the gender (well, we have to confirm this nowadays) and the membership number and lets see if the query returned the same name or no data.

Given all of the conditions stated on the SQL query, we now found the suspect on the murder case. Let’s try to check if our answer is correct.

I will now check Jeremy Bowers’ statement regarding to find the real villain behind this crime.

SELECT person.name, interview.transcript FROM interview
INNER JOIN person
ON interview.person_id = person.id
WHERE person.name = 'Jeremy Bowers';

/* Select the 'name' table and it's transcript from 'interview' table.
Combine the 'interview' table from the 'person' table through 'person_id'
of 'interview' table and 'id' table of the 'person' table.
Filter the 'name' table by the name of 'Jeremy Bowers'. */
Jeremy Bowers’ statement with some whereabouts

Given the statement of Mr. Jeremy Bowers, these are the clues that will lead to the perpetrator.
• Woman
• A lot of money (Annual Income)
• 5'5" — 5'7"
• Red hair
• Drives a Tesla Model S
• Attended SQL Symphony Concert 3 times, December 2017.

I will try to create an SQL query that will relate to the statement using a single query.

SELECT person.name, inc.annual_income,drivlic.height, 
drivlic.car_make || ' ' || drivlic.car_model AS 'Car Make and Model',
fbcheckin.event_name AS 'Event Name' FROM person
INNER JOIN drivers_license AS drivlic
ON person.license_id = drivlic.id
INNER JOIN income AS inc
ON person.ssn = inc.ssn
INNER JOIN facebook_event_checkin AS fbcheckin
ON fbcheckin.person_id = person.id
WHERE drivlic.hair_color = 'red'
AND drivlic.gender = 'female'
AND drivlic.car_make LIKE 'Tesla%'
AND drivlic.car_model LIKE '%Model S%'
AND drivlic.height BETWEEN 65 AND 67
AND fbcheckin.date LIKE '201712%'
AND fbcheckin.event_name = 'SQL Symphony Concert'
GROUP BY person.name
HAVING COUNT(fbcheckin.person_id) <= 3

/* Select the name, annual income, height, car make and model, and event name
from the 'person' table. Connect the 'person' table to the driver's license table,
income table, and FB Event checkin table and be aliased as 'drivlic', 'inc', and 'fbcheckin'.
From there, filter the result by:
having a hair color of red,
female gender,
the car make and model of Tesla Model S,
height between 65 and 67
and who checks in to the FB Event dated December 2017,
named SQL Symphony Concert. */

Let’s check if our perpetrator is correct.

We now found the perpetrator.

My take away from this exercise as follows:

SQL Basics: I do have fun doing this exercise for the reason of I can use the SQL basics that I’ve learned from various SQL exercise websites.
Data Analysis: Upon viewing the schema, I can gather what data I need by formulating precise SQL queries.
Problem-solving: As the query intensifies, I develop a critical thinking and problem-solving skills to determine the killer and also the mastermind of the murder.
Data Relationships: By the use of SQL joins, different primary and foreign key of every tables, I can establish relationships between them.
Query Optimization: Providing an alias to every table as the query progresses will help the server to extract data with optimal performance and accuracy.
Real-world Applications: The game demonstrates a basic real-world simulation of extracting data and make decisions based on it.

Do you have any questions about my post or recommend me some exercises? Contact me at:
LinkedIn

— Gerald John Alviar, Junior Data Analyst

--

--