SENSE: Solving the SQL Murder Mystery with a Large Language Model Business Intelligence Tool

Yeo Yong Kiat
Government Digital Services, Singapore
5 min readJun 22, 2024

For those of you who have been following developments at GovTech, we’ve recently launched SENSE, a whole-of-government business intelligence tool that is able to understand policy language. This tool allows government officers to perform complicated queries in natural language in order to extract insights from their government databases (our previous post here).

Check the video out if you haven’t yet seen it:

Of course, the folks here at GovTech can’t let our product go without stress-testing it, right? What better stress-test than the SQL Murder Mystery game?

What’s the SQL Murder Mystery?

For most programmers who picked up a first course in learning how to write SQL code, they are typically introduced to a fun game known as the “SQL Murder Mystery”. This was a challenge created by Joon Park and Cathy He while they were Knight Lab fellows, where you (the detective) are supposed to solve a murder mystery through a creative construction of SQL queries.

For those who are interested in understanding the underlying database, it’s made up of 8 data tables, which you can either download from the Kaggle or the GitHub repository.

But the long and short is that you’re given a set of clues, and you’d have to navigate through these 8 data tables using SQL queries to finally solve the puzzle of who the murderer is.

Did I say a set of clues? Haha, I meant only one clue:

A crime has taken place and the detective needs your help. The detective gave you the crime scene report, but you somehow lost it. You vaguely remember that the crime was a ​murder​ that occurred sometime on ​Jan.15, 2018​ and that it took place in ​SQL City​. Start by retrieving the corresponding crime scene report from the police department’s database.

How can SENSE help?

Here’s where SENSE comes in really handy — a detective now no longer needs to write SQL to solve this murder mystery. Conceptually, we should be able to solve the murder mystery through a series of well-placed queries in natural language.

Let’s take a look at the video below to see what I mean — we video-ed ourselves using SENSE to solve the SQL Murder Mystery, in real time, without all the fast-forwards or jazzy narration:

And we took… a grand total of 20 minutes (including all the narration and pausing for the tutorial). Was that any faster than writing out SQL code, if one could?

How much did SENSE help?

Well typically, most people report spending anywhere between 30 minutes to about an hour trying to translate the detective logic into a series of equivalent SQL statements.

Quite a few people have solved this challenge online, so here’s an example of the SQL statement required to solve the problem (taken from maartenterpstra’s Github):

SELECT description
FROM crime_scene_report
where date = 20180115 and city = 'SQL City' and type = 'murder';
-- Security footage shows that there were 2 witnesses.
-- The first witness lives at the last house on "Northwestern Dr".
-- The second witness, named Annabel, lives somewhere on "Franklin Ave".

select id, name from person
where address_street_name = 'Northwestern Dr'
order by address_number DESC limit 1;
-- 14887 Morty Schapiro

select id, name from person
where address_street_name = 'Franklin Ave'
and name like 'Annabel%'
-- 16371 Annabel Miller

select transcript from interview where person_id in (14887, 16371)
-- I heard a gunshot and then saw a man run out. He had a "Get Fit Now Gym" bag. The membership number on the bag started with "48Z". Only gold members have those bags. The man got into a car with a plate that included "H42W".
-- I saw the murder happen, and I recognized the killer from my gym when I was working out last week on January the 9th.

-- Let's follow the first clue
select person.name
from get_fit_now_member
join person on get_fit_now_member.person_id = person.id
join drivers_license on person.license_id = drivers_license.id
where membership_status = 'gold'
and get_fit_now_member.id like '48Z%'
and plate_number like '%H42W%'
-- Jeremy Bowers
-- There is the murderer and the site confirms it:
-- Congrats, you found the murderer!
-- But wait, there's more...
-- If you think you're up for a challenge, try querying the
-- interview transcript of the murderer to find the real villian
-- behind this crime. If you feel especially confident in your
-- skills, try to complete this final step with no more than 2 queries.
-- Use this same INSERT statement to check your answer.

select transcript from interview
join person on person.id = interview.person_id
where name = 'Jeremy Bowers'
-- I was hired by a woman with a lot of money.
-- I don't know her name but I know she's around 5'5" (65") or 5'7" (67").
-- She has red hair and she drives a Tesla Model S. I know that she
-- attended the SQL Symphony Concert 3 times in December 2017.

select name from person
join facebook_event_checkin on person.id = facebook_event_checkin.person_id
join drivers_license on person.license_id = drivers_license.id
where event_name = 'SQL Symphony Concert'
and date like '201712%'
and car_make = 'Tesla'
and car_model = 'Model S'
and height between 65 and 67
and hair_color = 'red'
and gender='female'
group by person.id having count(*) == 3
--Miranda Priestly

That’s a pretty amazing feat! We achieved an efficiency increase of between 33% to 66%, depending on how long it would have taken you to write the equivalent SQL queries above. One can imagine that the efficiency increase would be even greater for more complex data investigations.

In Summary…

We learnt quite a bit today — in order to get the fullest potential out of SENSE, some good practices include:

  • Knowing your data: ask SENSE for the datasets you have and know their peculiarities
  • Questioning back: when in doubt, clarify with SENSE on its approach
  • Examining the SQL: generate the SQL (if it wasn’t produced) and double-check SENSE’s methodology
  • When there’s no metadata, be specific: reference the data tables and datafields that you require

Embrace AI folks, and learn to use it. You’ll find work a lot more interesting, and less routine. For one thing, you’ll be chasing down murderers more than chasing down numbers!

--

--

Yeo Yong Kiat
Government Digital Services, Singapore

Teacher l Data Analyst | Policy Maker: currently exploring the tech sector