Quize is a game in which you have to answer 15 questions correctly to win. It is a Brazilian app based on its American counterpart HQ Trivia, which came to be worth more than 100 million dollars.
For 15 minutes, tens of thousands of players join a live broadcast to compete for prizes that have already reached about 13,500 dollars. There are two rounds per day, one in the morning and one at night, plus extra rounds on Mondays, Wednesdays and Fridays.
Only 10 seconds are granted to answer each question, which makes it impossible to use a search engine manually. The prize is shared among all the players who answer all 15 questions correctly.
I discovered Quize at work; before that I already knew and had played HQ Trivia. The same day, as soon as I got home, I started writing the first lines of the project that would come to be called Tita.
Tita was never used live: the broadcasts were recorded using Vysor, and Tita was run on the recordings afterwards.
My first version of Tita was created in a short time and did the obvious:
A screenshot was taken, the text was extracted using PyTesseract, a request was made with the question, and then I counted the occurrences of each answer. The one with the highest count was the option to pick.
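The decision step of that first version can be sketched like this (the capture and HTTP parts are omitted, and the function names are hypothetical, not the actual ones from the project):

```python
def count_answer_occurrences(page_text, answers):
    # Count how many times each candidate answer appears in the text of
    # the page returned by the search request.
    text = page_text.lower()
    return {answer: text.count(answer.lower()) for answer in answers}

def pick_answer(page_text, answers):
    # The answer with the highest occurrence count is the option to check.
    counts = count_answer_occurrences(page_text, answers)
    return max(counts, key=counts.get)
```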
But… I didn’t want to stop there; I wanted to speed things up and improve performance.
My first challenge was to improve both the speed and the accuracy of PyTesseract. After some time reading the documentation, I decided to apply thresholding to the images:
With thresholding I was able to read the text faster and with almost no errors (each reading takes around 400 ms).
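Thresholding boils down to binarizing the image before handing it to PyTesseract. A pure-Python sketch on a grayscale pixel matrix (the real pipeline would do this with an image library such as OpenCV or Pillow and then call `pytesseract.image_to_string`):

```python
def threshold(pixels, cutoff=128):
    # Binarize a grayscale image: pixels at or above the cutoff become
    # white (255), everything else becomes black (0). This strips the
    # app's colored background and leaves clean text for the OCR.
    return [[255 if p >= cutoff else 0 for p in row] for row in pixels]
```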
After solving this problem, I set out to build the search system. As I said, initially I did a simple search, but it was not good enough.
After some time working on a good search engine, I built 10 types of searches that basically boil down to searching for the answers on Google, on Wikipedia and on the first page of Google’s results. Along with this, I also perform certain transformations on the strings to improve the searches, applying NLP (Natural Language Processing) techniques such as removing stopwords, finding the question’s keyword, stripping symbols and accents, and stemming.
With this I was able to perform several searches and extract snippets and direct answers from Google.
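A sketch of the kind of string transformations involved, using only the standard library. The stopword set here is a tiny illustrative subset (a real implementation would use a full list such as NLTK’s, plus a proper stemmer):

```python
import re
import unicodedata

STOPWORDS = {"the", "a", "an", "of", "in", "is", "was", "to"}  # illustrative subset

def strip_accents(text):
    # Decompose characters and drop the combining marks, e.g. "ações" -> "acoes".
    nfkd = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in nfkd if not unicodedata.combining(ch))

def normalize(question):
    # Lowercase, strip symbols and accents, then remove stopwords.
    text = strip_accents(question.lower())
    words = re.findall(r"[a-z0-9]+", text)
    return [w for w in words if w not in STOPWORDS]
```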
Using the BeautifulSoup library, I was able to search the soups for the specific ids of each object:
· Direct Answer — Z0LcW
With the direct answer I get the best possible answer, so its score is very high.
· Snippet — LGOjhe
With the snippet I get a summary related to the question; in many cases my scoring system gets a satisfactory result from it.
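The extraction can be sketched with BeautifulSoup. Note that `Z0LcW` and `LGOjhe` were the CSS classes Google used at the time; Google changes these regularly, so treat them as a snapshot, and the function name here is illustrative:

```python
from bs4 import BeautifulSoup

def extract_google_answers(html):
    # Pull the "direct answer" box and the "snippet" out of a Google
    # results page, if they are present.
    soup = BeautifulSoup(html, "html.parser")
    direct = soup.find(class_="Z0LcW")
    snippet = soup.find(class_="LGOjhe")
    return (direct.get_text(strip=True) if direct else None,
            snippet.get_text(strip=True) if snippet else None)
```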
Due to the increased complexity of the project and the lack of parallelism, finding the answer within the time limit (<10 s) became impossible. Because of this, it was necessary to rewrite the search functions in parallel using Python’s multiprocessing library.
With this I was able to reduce the time spent on the search techniques to only 1 to 1.5 seconds. With that result in hand, I decided to extend multiprocessing to other parts of Tita’s flow: the screen capture (previously a single screenshot cropped to the interesting parts) was split into 3 captures of the areas of interest, a gain of about 300 ms for the capture step. Parallelism was also applied to the image-processing step.
The final end-to-end time was around 4 seconds, covering capture, text extraction, text transformations, searches, scoring and return.
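The fan-out over the independent searches can be sketched with `multiprocessing.Pool`; `run_search` here is a stand-in for one of the real search techniques, which would issue an HTTP request and score the returned page:

```python
from multiprocessing import Pool

def run_search(query):
    # Placeholder for one of the 10 search techniques.
    return f"results for {query}"

def run_all_searches(queries, workers=4):
    # Run the independent searches in parallel so the total time is
    # bounded by the slowest single search, not by the sum of them all.
    with Pool(workers) as pool:
        return pool.map(run_search, queries)
```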
Some transformations deserve to be highlighted, such as:
· The removal of words in common between the two answers;
· Removing special characters and accents;
· Removal of Stopwords.
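The first of these transformations deserves a sketch: words shared by two candidate answers carry no signal for choosing between them, so they can be dropped before searching (the function name is illustrative):

```python
def remove_common_words(answer_a, answer_b):
    # Drop the words that appear in both answers, keeping word order.
    words_a, words_b = answer_a.lower().split(), answer_b.lower().split()
    common = set(words_a) & set(words_b)
    return ([w for w in words_a if w not in common],
            [w for w in words_b if w not in common])
```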
To better organize the various results from the 10 searches, I built a scoring system: depending on the importance of each source, a constant multiplies the number of occurrences of the answers in the search results.
I decided that the score for the Direct Answer would be the highest possible, more precisely 10,000,000; every normal occurrence on the page is multiplied by 300, and so on. An important detail is the score used by K_Search, an algorithm I built to improve search performance, which is calculated as follows:
(Occurrence * (number of words)) ^ (number of words + 1)
For example, suppose the sentence “Normal Again” appears 13 times in a given text, and the word “Normal” appears 18 times in the same text.
The score for the first sentence is: (13 * 2)³ = 17576.
The score for the second sentence is: (18 * 1)² = 324.
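The formula above translates directly into code (the function name is mine, not the project’s):

```python
def k_search_score(occurrences, n_words):
    # Score = (occurrences * n_words) ** (n_words + 1): exact matches of
    # longer word combinations are rewarded super-linearly over single words.
    return (occurrences * n_words) ** (n_words + 1)
```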
In this way I give much greater weight to the right combinations of words and avoid counting words that are meaningless to the desired answer.
K_Search is the heart of Tita. Initially I create all possible combinations of each answer; for example, the sentence “Felype Castro Bastos” would generate combinations such as:
For each generated combination I search the BeautifulSoup soups and score as described in the previous section, giving more importance to the full combination. If the text contains the key combination “Felype Castro Bastos”, the score is maximized!
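One plausible way to generate those combinations, assuming every subset of the answer’s words is wanted, longest first (the source does not specify the exact generation order):

```python
from itertools import combinations

def word_combinations(answer):
    # All word combinations of the answer, longest first, e.g.
    # "Felype Castro Bastos" -> "Felype Castro Bastos", "Felype Castro",
    # "Felype Bastos", "Castro Bastos", "Felype", "Castro", "Bastos".
    words = answer.split()
    combos = []
    for size in range(len(words), 0, -1):
        for combo in combinations(words, size):
            combos.append(" ".join(combo))
    return combos
```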
This algorithm runs over a variety of search sources, whether it’s the Google results page, Wikipedia, or even the first page found by Google.
Along with all this, I also created a bot on Telegram responsible for sending me the chosen answers. Since I decided not to use Tita in any live game, this was purely for educational purposes.
The final flow of the project looks like this:
1. The question is captured automatically by detecting a change in the average pixel color of the image (the screen is captured continuously).
2. After capture, the images are thresholded and then converted to text.
3. The original form of the questions and answers is stored.
4. Basic transformations are applied: removal of special characters, lowercasing, and removal of the words the answers have in common.
5. All the soups needed for the searches are fetched: around 6 soups, provided by Google, Wikipedia, the first page, etc.
6. After this, the specific transformations for each search technique are applied and the searches are performed.
7. I return a list of lists containing the score of each technique.
8. I sum the scores assigned to each answer and then normalize them, leaving the final return as: Response 1 = 98%, Response 2 = 2%.
9. The Snippet (if any), the correct answer (if any), the scoring vector, and the most appropriate answer to a given question are sent to the Telegram.
10. The cycle restarts and waits for the next question.
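The normalization in step 8 is straightforward; a sketch (the function name is illustrative):

```python
def normalize_scores(scores):
    # Convert raw per-answer scores into percentages that sum to ~100,
    # e.g. {"Response 1": 9800, "Response 2": 200} -> 98% / 2%.
    total = sum(scores.values()) or 1
    return {answer: round(100 * score / total) for answer, score in scores.items()}
```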
I generated a dataframe with exactly 100 questions and ran Tita on all of them.
There were only 5 errors, mostly on questions that require more interpretation than plain searching (stay tuned for the next chapters).
The questions it missed were:
1. The Bastille was a Parisian fortress that was used as a state prison in France?
2. Which of these characters was known for the catchphrase “Am I right or am I wrong?”
3. Is “gradually” used to describe things that are done slowly, little by little?
4. What is the main musical prize in the world?
5. How many F1 GPs did the driver Ayrton Senna win?
Obviously it is not possible to say with certainty that the accuracy is 95%; however, in all the games where it was tested there were on average 2–3 errors per game, around 84% in practice, and in some of them it was possible to verify the answer with the naked eye through the snippets.
Another mechanic that helps is the skip, which can be used in moments of doubt or low confidence.
Being from the Machine Learning area and passionate about AI (currently researching genetic algorithms at university and practicing in the field), I intend to make Tita more “human”, especially in her ability to interpret the questions and find the most accurate answer. A very strong inspiration for me is IBM’s Watson supercomputer, which was used to beat humans at Jeopardy in 2011.