SQL Murder Mystery
There has been a murder. The detective in charge needs me, the resident data analyst, to help solve the crime. I was given the crime scene report but unfortunately lost it. I however remember two(2) key details -
- The murder occurred in SQL city.
- It occurred sometime on 15th January, 2018.
This is what the Police precinct database schema looks like :
Fortunately for me, this precinct has gone digital already so all the clues needed to solve the case are buried in a large database. I will be using SQL to navigate through this vast network of information. My first step is to retrieve the crime scene report that corresponds to the case.
Great! We have seen a snapshot of what kind of data our crime scene report contains. I will now filter this report to show me only entries that match the information I recall : day (2018–01–15), location (sql city) and type of crime (murder). An entry must meet all three criteria before it is selected.
I opted to use ‘LIKE’- keyword so that I do not miss any data because of spelling and so on. We notice that there is only one entry that meets all 3 requirements. I want to read the entire description to get more clues so I will dive further into the description only.
Awesome ! I finally have more information than what I started out with. The crime occurred in a spot with CCTV. I have the first witness’ home address (Northwestern Dr) and the second witness’ name and home radius (Annabel, Franklin Ave). Armed with this update, I head over to the person table. Let’s see a snapshot of that database to be sure it can help us in our investigation.
The table contains information that will help me move forward in my investigation. I want to get all entries in that table containing information of any one who lives at Northwestern Dr and any Annabel’s living at Franklin Ave.
That is too many people to sift through to arrive at the relevant two. What can I do to cut down on that ? Viola ! The first nameless witness lives in the last house on Northwestern Dr so it is reasonable to assume that their home will have the largest address number.
Yes ! We have our two key witnesses now: Morty Schapiro (with id 14887) and Annabel Miller (with id 16371). All interviews are stored in the interview database using person id as the identifier to record what was divulged during the interview. Let’s find out what our witnesses said.
This is what the detective got from Annabel
“I heard a gunshot and then saw a man run out. He had a “Get Fit Now Gym” bag. The membership number on the bag started with “48Z”. Only gold members have those bags. The man got into a car with a plate that included “H42W”.”
This is what the detective got from Morty
“I saw the murder happen, and I recognized the killer from my gym when I was working out last week on January the 9th.”
From the corroborated witness statements, we gather the following leads :
- Suspect likely works out at a gym, specifically ‘Get Fit Now Gym’
- Suspect is possibly a Gold member of that gym (48Z).
- Suspect’s car plate number contains ‘H42W’
- Suspect worked out on January 9th
We look through ‘Get Fit Now Guy’s check-in table for who worked out on January 9th and with a Gold membership.
Only two members with id 48Z7A and 48Z55 met the required criteria. To get more information about these members, we will look into their member database.
The id’s obtained match to the members Jeremy Bowers and Joe Germuska.
Let’s see which cars in the Driver and Vehicle Licensing Authority (DVLA) database have license plates matching the witness description.
Ohk, we have obtained 3 people registered on the system that meet the required criteria. They have the license ids : 183779, 423327, and 664760. Let’s look through the person table to find out who they are.
One name comes up again, Jeremy Bowers. Could he be the guy ? Let’s confirm with all the information we have collected so far.
Confirmed ! Jeremy committed the murder. Jeremy was taken in for questioning by the detective, let’s see what he had to say. We will query the interview table using his person id (67318) to achieve this.
This was his confession :
“I was hired by a woman with a lot of money. I don’t know her name but I know she’s around 5'5" (65") or 5'7" (67"). She has red hair and she drives a Tesla Model S. I know that she attended the SQL Symphony Concert 3 times in December 2017.”
It seems there was a Mastermind behind the murder Jeremy committed. Searching through the DVLA database with Jeremy’s description, we discover that there are three red haired women of height range 65” to 67” who drive a Tesla.
There women have ids : 202298, 291182 and 918773. Let’s look through facebook-event-checkin to see who was at the SQL Symphony Concert three times.
Well ….. Well …. well , who do we have here ? Miranda Priestly seems to be the mastermind behind the murder.
Just to cover our bases, let us check to see if Miranda is indeed rich. What is the average income in our database ?
The average income for women in our database is 53559.09. Miranda Priestly earns well above that with a whooping 310,000.
She is definitely rich and checks all other boxes. My work is done, and I can go home now. See you next time a mystery needs solving with SQL.
To access the dataset, visit my GIT (https://github.com/rolanda4/murder_mystery ) . See other projects by me here (https://rolanda4.github.io/ ). Let’s connect on LinkedIn ( https://www.linkedin.com/in/rolanda-azeem/ ) . Visit the original Murder Mystery website here (https://mystery.knightlab.com/)