There is a betting company in the UK that offers bets on whether certain words will be said in big political speeches. These are usually speeches here in the UK, but occasionally speeches from around the world too, most notably the US President's State Of The Union address. Fancy a bet, anyone?
The betting company publishes a list of words with odds, and as long as your expected profit doesn’t exceed £70 you can bet on any of those words. Here is the list from the last SOTU address in January 2018:
Thanks to the internet we have a record of all of Trump’s speeches. If we could get all of the speeches together and analyse how often he says the above words, then we could identify words that are more likely to be said than the odds would suggest. Essentially, we would be spotting errors in the bookmaker’s odds.
With Google Sheets and its awesome =importxml() formula we can easily get a list of links to all the speeches from a speech archive… then we can get the content of each of those speech links using the same method.
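For anyone who would rather do the link-gathering step outside a spreadsheet, here is a minimal Python sketch of the same idea, using only the standard library. The HTML snippet and the /speeches/... paths are made up for illustration; the real archive's markup and URLs are not shown here, so treat this as an assumption about what such a page might look like rather than the method used in the sheet.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collects the href of every <a> tag, roughly what an XPath like //a/@href returns."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                # Skip anchors that have no href (e.g. named anchors).
                if name == "href" and value:
                    self.links.append(value)


def extract_links(html):
    """Return all link targets found in an HTML string, in document order."""
    collector = LinkCollector()
    collector.feed(html)
    return collector.links


# Hypothetical archive page: the structure and paths are placeholders.
sample_html = """
<ul>
  <li><a href="/speeches/example-one">Example speech one</a></li>
  <li><a href="/speeches/example-two">Example speech two</a></li>
  <li><a name="no-link">Not a link</a></li>
</ul>
"""

speech_links = extract_links(sample_html)
print(speech_links)
```

The same collector could then be pointed at each speech page in turn to pull out its text, mirroring the second use of the formula in the sheet.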
Now the content of each speech:
Pretty quickly we have an array of cells filled with the content of Trump’s bombastic speeches. All that is left to do now is count the words that are on offer. Here is quite a neat and elegant way of counting specific words using Google Sheets. It would work in Excel too if entered as an array formula:
=ArrayFormula(SUM(LEN(<Column with speech content>)-LEN(SUBSTITUTE(<Column with speech content>,<Search Word>,"")))/LEN(<Search Word>))
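This works by counting how many characters are in the selected cells before and after removing the word we want to search for… The difference is the number of characters that were removed, which is the total length of every occurrence of the search word. So if we divide that difference by the length of the search word we are left with the number of times the word was used in those cells… magic! One caveat: SUBSTITUTE is case-sensitive and matches inside longer words, so “Russia” will also be counted inside “Russian” but “russia” will not be counted at all.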
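The same length-difference trick can be sketched in Python to make the arithmetic concrete. The function name count_word and the sample sentences are my own placeholders, not real speech text, and str.replace stands in for SUBSTITUTE (it is also case-sensitive and also matches inside longer words).

```python
def count_word(texts, word):
    """Count occurrences of `word` across `texts` via the length-difference trick.

    Mirrors the spreadsheet formula:
    (total length before - total length after removing the word) / length of the word.
    """
    if not word:
        raise ValueError("word must be non-empty")
    # Characters removed across all texts when every occurrence is deleted.
    removed_chars = sum(len(text) - len(text.replace(word, "")) for text in texts)
    # Each occurrence removes exactly len(word) characters.
    return removed_chars // len(word)


# Placeholder "speeches", invented purely to exercise the function.
speeches = [
    "Russia and Russia again",
    "nothing to see here",
    "russia in lowercase is not matched",
]

russia_count = count_word(speeches, "Russia")
print(russia_count)
```

In everyday Python you would more likely reach for text.count(word), which gives the same answer for non-overlapping matches; the version above is only there to show why the formula works.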
Now we finally have a list of words with their odds, and a count of how often each has been used in the speeches we’ve included. This means we can compare the word counts with the associated odds and find good-value bets… relative to the other words at least!
To do this I created a metric which takes both odds and the count into account: I called it the Nick Rating and it is simply the Odds x Word Count.
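As a rough sketch of how that ranking could be done in code: the words word_a and word_b, all of the counts, and every odds value except the 3-to-1 on “Russia” are invented placeholders, and treating 3-to-1 as the number 3 is my assumption about how the odds were entered.

```python
def nick_rating(odds, word_count):
    """Nick Rating as described above: simply odds multiplied by word count."""
    return odds * word_count


# (word, odds, count) rows. Only the 3-to-1 on "Russia" comes from the post;
# every other number here is a made-up placeholder.
offers = [
    ("Russia", 3.0, 20),
    ("word_a", 10.0, 1),
    ("word_b", 1.5, 8),
]

# Highest rating first: frequently said words with generous odds float to the top.
ranked = sorted(offers, key=lambda row: nick_rating(row[1], row[2]), reverse=True)

for word, odds, count in ranked:
    print(word, nick_rating(odds, count))
```

Note that the rating only ranks words against each other. It is not a probability, so it cannot say whether a bet is good value in absolute terms.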
This shows quite clearly that Mr Trump seems to say “Russia” a whole bunch yet the bookmaker is offering 3-to-1 on him saying the word during the 2018 State Of The Union address…
After placing a £15 bet on “Russia” being said, I’m happy to say that President Trump did drop the “Russia” bomb midway through his speech. Result!