I Also Wrote a Bot for HQ Trivia
HQ Trivia — if you haven’t heard by now — is a live trivia competition that happens on your iPhone. For 15 minutes, hundreds of thousands of users tune in to hear Scott Rogowsky (or one of a rotating cast of special guest hosts) read out trivia questions, each with three possible answers. Get all twelve questions correct, and you win some cash. Miss a question? Try again the next game.
You only get ten seconds from when the host starts reading the question, so only the fastest typists have any hope of parsing the question, coming up with a good Google search query, glancing through the results, and then responding in the app in that time. Clearly, this is something ripe for botting.
A First Shot
My first idea was to sniff the API requests between the app and HQ’s server. Using Charles Proxy, I looked at the traffic, but unfortunately the questions weren’t there. I saw API requests for getting live game data (such as start time and prize amount), but I couldn’t find where the questions were being sent from.
I did a little bit of research, and it turns out the streaming tech HQ uses allows other data to be shoved alongside the video data. Regardless, this video data was bypassing my proxy. Because I was (and still am) too lazy to set up an invisible proxy, this avenue was a dead end.
But my research did lead me to an article about someone else doing a similar thing, which got my mind working.
Standing on the Shoulders of Giants
I looked at the video in the article linked above and thought “I could do that.” Within a couple of hours, I had a basic bot going: I used Toby Mellor’s method of using an automator script to take a screenshot and then call some Python I had written. That Python script did a Google search of the exact question text and quickly scanned the snippets returned for the answer terms. Whichever answer appeared the most was reported to be the correct one. It wasn’t perfect of course, but it was a start.
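That naive snippet-counting approach boils down to a few lines. Here is a simplified sketch: the Google search itself is omitted, and snippets are passed in directly (function and variable names are illustrative, not from my actual script):

```python
from collections import Counter

def naive_solver(question, answers, snippets):
    """Count how often each possible answer appears in the search
    snippets; report the most frequent one and its raw count."""
    text = " ".join(snippets).lower()
    counts = Counter({a: text.count(a.lower()) for a in answers})
    best, score = counts.most_common(1)[0]
    return best, score
```

It is crude — it has no notion of negation or question phrasing — but for many straight factual questions the right answer simply shows up more often in the snippets.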
I started to improve things. Two weeks later, it looks like this:
My goal for the project was to go from screenshot to a confident answer in no more than 5 seconds. I chose 5 seconds because you need to wait for the question to fully render (about a second and a half), leave time to actually tap the answer on your phone (call it another second), and keep some wiggle room.
Like in Toby’s script, it starts with an actual video of the game (lucky for me, there are plenty of recordings of previous games on YouTube). I decided to ditch the Automator script to make things easier, and because starting a new console window for every question ate into my precious, limited time.
I use the built-in macOS screenshot tool to take a screenshot of the question and possible answers. The tool saves the image to a directory that my script monitors for new images. Once the script detects a new screenshot, the image is OCR’d to extract the question and the three possible answers. The question is then “cleaned up” (more on that later). From there, the question data is passed into different solvers, each running in its own thread (side note: this is the first time I’ve used Python 3’s futures capabilities, and I really like them). Each solver returns which answer it thinks is best, as well as a loose notion of its confidence in that answer.
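The directory-watching step doesn’t need anything fancy. A minimal sketch using plain polling (my actual implementation details aside, this is the general idea):

```python
import os

class ScreenshotWatcher:
    """Detect new image files appearing in a directory between polls.
    A minimal stand-in for the bot's file-watching step."""
    IMAGE_EXTS = (".png", ".jpg", ".jpeg")

    def __init__(self, directory):
        self.directory = directory
        # Remember what was already there so only new files are reported.
        self.seen = set(os.listdir(directory))

    def poll(self):
        """Return paths of image files added since the last poll."""
        current = set(os.listdir(self.directory))
        new = sorted(current - self.seen)
        self.seen = current
        return [os.path.join(self.directory, name) for name in new
                if name.lower().endswith(self.IMAGE_EXTS)]
```

In practice you would call `poll()` in a tight loop (or use a file-system event library) and hand each new path to the OCR step.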
Then the cycle repeats for the next question.
My original OCR was similar to Toby’s: I used Google’s Vision API to OCR the text and return it to me. This was very quick to implement, and the results were quite good. A few days later, I decided to do my OCR locally using Google’s Tesseract engine. This worked incredibly well and was much faster, because I didn’t have to pay the round-trip cost to Google’s servers (not to mention I didn’t have to pay for it, either).
Tesseract isn’t perfect. For example, I was testing the following question:
Tesseract decoded the second possible answer as “Paciﬁc”, which wasn’t found in any text snippet for the search “In which ocean would you find Micronesia?” This was quite odd, because of course the word “Pacific” is there! Well, if you look above, you’ll notice the word “Paciﬁc” doesn’t contain an ‘f’ character followed by an ‘i’ character. It turns out that the ‘fi’ in Pacific was actually the Latin small ligature ﬁ (U+FB01), which was causing issues with string comparisons.
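One way to handle this class of problem is Unicode normalization: NFKC compatibility normalization decomposes ligature characters like U+FB01 back into their plain letter pairs, so string comparisons against web snippets behave:

```python
import unicodedata

def normalize_ocr_text(text):
    """Collapse typographic ligatures (e.g. U+FB01, the 'fi' ligature)
    into plain letter pairs via NFKC compatibility normalization."""
    return unicodedata.normalize("NFKC", text)
```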
The speed gains from doing OCR locally were too good to pass up (~400 ms locally vs. ~1,200 ms on Google’s servers). It turns out that you can train Tesseract on specific fonts. I grabbed the IPA for HQ Trivia (which was an adventure with Apple’s Configurator utility) and extracted the font files. I used Anyline’s Train Your Tesseract to generate training files. Additionally, I added some code so that if there are ever any “odd” characters in the output, we fall back to Google’s Vision API.
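The “odd character” check can be as simple as a printable-ASCII test. A minimal version of that heuristic (my actual check may differ):

```python
import string

def looks_clean(text):
    """Heuristic: True if the OCR output contains only printable ASCII.
    Anything else (stray ligatures, mis-recognized glyphs) suggests we
    should fall back to the slower-but-safer Vision API."""
    return all(ch in string.printable for ch in text)
```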
Cleaning the Question
Now that we have the raw question data, it’s time to clean things up to make all subsequent searches easier. First, lowercase everything; this makes string comparisons against snippets, web pages, etc. much easier. Second, make everything singular; this improves matching again, and the Python library inflect makes it easy. We also do other small things, like dropping articles where appropriate and otherwise ‘standardizing’ on one way of presenting things.
We build all of this into a Question object in Python so each solver has a single, consistent thing to look at.
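A stripped-down sketch of that cleanup and the Question object (stdlib only; the real script uses inflect for singularization, which is omitted here, and the field names are illustrative):

```python
import re
from dataclasses import dataclass

ARTICLES = {"a", "an", "the"}

def clean(text):
    """Lowercase, drop articles, and collapse whitespace.
    (Singularization via inflect is omitted in this sketch.)"""
    words = [w for w in re.split(r"\s+", text.lower().strip()) if w]
    return " ".join(w for w in words if w not in ARTICLES)

@dataclass
class Question:
    """One consistent object that every solver receives."""
    question: str
    answers: tuple

    @classmethod
    def from_ocr(cls, raw_question, raw_answers):
        return cls(clean(raw_question), tuple(clean(a) for a in raw_answers))
```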
Strategy and Solvers
We took our screenshot. We got it into text. We cleaned things up a bit. Now it’s time to actually get an answer. The script takes that Question object and passes it to several different methods of answering the question, each running in its own thread.
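The fan-out over solvers is where Python 3’s futures come in. A simplified sketch (solver signatures here are illustrative — each solver takes the question and returns an answer plus a confidence):

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

def run_solvers(question, solvers):
    """Run every solver in its own thread, collect (answer, confidence)
    pairs, and return the answer with the highest confidence."""
    results = []
    with ThreadPoolExecutor(max_workers=len(solvers)) as pool:
        futures = [pool.submit(solver, question) for solver in solvers]
        for fut in as_completed(futures):
            results.append(fut.result())
    return max(results, key=lambda pair: pair[1])
```

Because the solvers are all network-bound, threads (rather than processes) are plenty to get them running concurrently.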
I still run the naive search I had first implemented, but I also use things like the Google Natural Language API and Amazon’s Comprehend to parse and analyze the entities present in questions and answers. Fun side note: this led to a really odd bug I found in Google’s entity recognizer, where a film was tagged incorrectly.
The question was: “Which of these action movie stars did voice-over work in ‘Finding Nemo’?” a) The Rock, b) Gerard Butler, c) Eric Bana
I sent the question off to Google’s Natural Language API and this was part of the response I got back from Google:
```json
"name": "Finding Nemo",
"content": "Finding Nemo",
```
Notice anything odd? Yeah, that Wikipedia URL is not quite correct. As of this writing, the bug still exists.
(update on January 24, 2018 — this bug has been fixed)
Testing and Improving
How do I know this bot works? More importantly, if I make a change to some logic in a solver, how do I know that change made things better? It’s a pain to go through each video on YouTube every time.
In writing the bot, I had accumulated TONS of screenshots of previous questions. I whipped up a quick script to take those screenshots, OCR them, and dump the text into a CSV file. I didn’t have the answers, of course, so I had to go through the videos and manually fill in any answers I couldn’t Google.
This made a nice battery of questions and answers I could quickly test against. (You’ll note that this doesn’t test the OCR, but I consider that to be basically solved at this point).
I have another script that runs through that CSV file and generates scores for each solver and an overall accuracy. I can enable and disable strategies at will in my test bench and I can narrow down the questions to test against if I’m trying to optimize for something.
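A minimal version of that scoring harness looks something like this (the CSV column names are assumptions for the sketch, and a solver is any callable returning an answer and a confidence):

```python
import csv
import io

def score_solver(solver, csv_text):
    """Run `solver` over a CSV of question,a,b,c,correct rows and
    return its accuracy as a fraction between 0 and 1."""
    reader = csv.DictReader(io.StringIO(csv_text))
    correct = total = 0
    for row in reader:
        answers = [row["a"], row["b"], row["c"]]
        guess, _confidence = solver(row["question"], answers)
        correct += (guess == row["correct"])
        total += 1
    return correct / total if total else 0.0
```

Running this per solver (and per question category) is what makes it cheap to tell whether a tweak actually helped.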
Where to go from here…
My overall accuracy is hovering around 85% based on my ever-growing set of test questions. Obviously, this can be improved, but there are other things I would like to implement.
Classification of Questions
First, I would like to add classification of questions. Right now, certain solvers are better than others at certain types of questions. For example, the Wikipedia solver works best when the question’s possible answers are all proper nouns. I am currently working on a “Quorum Resolver” to take the results from each solver and produce a single answer. This will be especially potent if I can get a machine-learning classification model working, so the system knows which solvers to trust more for certain types of questions (each solver has confidence in its answers, and the quorum system has confidence in the solvers).
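A weighted-vote quorum might be sketched like this. The weights here are hand-set placeholders standing in for whatever a trained classification model would eventually produce, and the solver names are illustrative:

```python
from collections import defaultdict

def quorum(votes, solver_weights):
    """Combine per-solver (answer, confidence) votes into one answer
    by summing confidence x per-solver trust weight."""
    scores = defaultdict(float)
    for solver_name, (answer, confidence) in votes.items():
        # Unknown solvers get a neutral weight of 1.0.
        scores[answer] += confidence * solver_weights.get(solver_name, 1.0)
    return max(scores, key=scores.get)
```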
Questions where my bot doesn’t do well? Anything involving a strict ordering (“What is the biggest…”, “Who was the first…”) doesn’t work so well with any solver. It also has issues with questions that cover multiple topics (“Which of the following is a metal and a Kevin Bacon film? a) Gold, b) Quicksilver, c) Lithium”).
A Better Battery
My assortment of test questions is randomly cobbled together. I would really like to go through and categorize each one to make sure I’m covering my bases properly. I’d also like to build up some historical data, so I can track how my bot does against those questions over time (probably broken down by solver). Finally, I’d like to abstract away Google’s search engine for testing, since running the test bank burns a lot of API calls; I overran my daily quota the first time I did it.
Overall, this was a very fun project to work on in my spare time. I learned quite a bit about OCR, Machine Learning, and Natural Language Processing.
Other articles have gone over what HQ can do to prevent bots from ruining the game for everyone. Most of these center on making the OCR impossible, thereby making all the subsequent processing impossible.
Since running these bots against a live game violates the HQ Terms of Service (and is a moral gray area, since there is real money at stake), I am all for HQ keeping bots out of the game. That said, I think it would be a fun challenge for HQ to provide an API for question data and hold an HQ bot competition. I know I’m not the only one working on one of these, and I’d love the opportunity to pit my bot against others.