Mayweather vs Pacquiao: The fight from twitter and facebook

Realtime data analysis using a stock platform.

sports and tech
May 18, 2015

“The man who has no imagination has no wings.” — Muhammad Ali

I love sport, the stock market and technology, especially realtime systems. I have worked with realtime systems for most of my life. I started at British Telecom many years ago as a young lad, when I was introduced to the realtime monitoring team that ensures BT’s network is fully functional at all times; think NASA control room, but on a smaller scale. I then moved into insurance and banking and learnt plenty about pensions, investment funds and so on. Later I was offered a job at Canary Wharf with a company called FTSE (now part of the London Stock Exchange). FTSE has a number of realtime platforms that calculate indices, e.g. the FTSE 100.

Nexusfuse — networks combined

My last day at FTSE was on Valentine’s Day. There were a number of reasons for leaving, which I won’t go into as they could fill a blog/book of their own. Anyway, on my way home on 14th Feb, someone jumped in front of the train ahead of mine and I ended up sitting on the train for a number of hours, along with a load of people holding flowers and phoning their wives. I ended up with a very disappointed wife and kids, as we were due to have pizza & beer to celebrate a new start. The only positive was that I was reading an article in which a hedge fund manager described how people used to swap rumours, recommendations and other market news about stocks in steam rooms or over dinner. The hedge fund manager went on to say that this no longer goes on, as most of this activity now happens on twitter and other mediums. I assume the steam rooms are now like commuter train journeys, i.e. you sit next to the same people for years with a polite hello but never really know who they are.

Nexusfuse tracks the stock price as well as social media so you can see if there is an increase in positive or negative tweets and if this has an influence on the stock price going up or down. $BABA was tracked for a while when it launched a few months back.

I had two weeks before I started my new role and a list of things that I wanted to research/learn: sentiment analysis, machine learning, Google Web Toolkit (GWT) and big data / NoSQL databases.

I am not going to cover what has happened with nexusfuse but what you need to know is I took this technology and applied it to the biggest fight in the world: Mayweather vs Pacquiao.

Mayweather vs Pacquiao

I had a couple of days to take what I had created for nexusfuse and come up with a slimmed-down version. I came up with the following:

Technology

  1. Feeds: developed in Java.
  2. Webpage: developed using GWT (it rocks!).
  3. Data storage: I used MongoDB. I am on the fence on this one, as it is the first NoSQL db I have used, but early indications are that I like it. There are some pros and cons, and it does depend on your use case.
  4. Sentiment engine: Java and Node.js.
  5. Deployment: I am a big fan of Digital Ocean. I had a couple of droplets but had to add another one to increase the number of sentiment engines and help cope with the sheer number of tweets.

Challenges

Stocks have plenty of chatter on twitter and facebook, and my simple machines have had no problems dealing with the volume of this data. We tracked the following users and hashtag on twitter for fight night:

  1. @FloydMayweather
  2. @MannyPacquiao
  3. #MayweatherPacquiao

Note: #MayPac was not tracked.
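As a rough illustration of the tracking step, the sketch below checks which of the tracked terms a tweet mentions. It is written in Python for brevity (the real feeds were developed in Java), and `TRACKED_TERMS` and `matched_terms` are hypothetical names rather than anything from the platform. Twitter's own keyword matching may well behave differently; this only shows the idea.

```python
import re

# The three terms tracked on fight night (taken from the list above).
TRACKED_TERMS = ["@FloydMayweather", "@MannyPacquiao", "#MayweatherPacquiao"]


def matched_terms(tweet_text, terms=TRACKED_TERMS):
    """Return the tracked terms that appear in the tweet, ignoring case."""
    found = []
    for term in terms:
        # (?!\w) stops "#MayweatherPacquiao" matching "#MayweatherPacquiaoFight".
        pattern = re.escape(term) + r"(?!\w)"
        if re.search(pattern, tweet_text, flags=re.IGNORECASE):
            found.append(term)
    return found


print(matched_terms("Big night! @floydmayweather vs @MannyPacquiao #MayPac"))
```

A tweet that only uses #MayPac matches nothing here, which is why leaving that hashtag out means the tweet counts in this post undersell the real volume.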

The feed connector connected to twitter using the above keywords, and within a few hours I had received the equivalent of six months’ worth of tweets for certain stock symbols. My single sentiment engine was not able to cope with the volume and an additional engine was needed; to be honest, I could have used another 10 engines. I have two sentiment engines: one is lightning quick but its tweet sentiment results are questionable, whereas the slower engine is far more accurate, and that is the one I used. I should say it is not perfect, but it is good.
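The idea of adding engines to keep up with the feed can be sketched as several workers pulling tweets from one shared queue. This is only a Python sketch of that shape, not the platform's code: `score_tweet` is a made-up placeholder scorer, `run_engines` is a hypothetical name, and the real engines ran as separate processes on Digital Ocean droplets rather than as threads in one program.

```python
import queue
import threading


def score_tweet(text):
    """Placeholder scorer (an assumption): counts a few hand-picked words."""
    positive = {"great", "win", "love"}
    negative = {"boring", "bad", "robbed"}
    words = text.lower().split()
    return sum(w in positive for w in words) - sum(w in negative for w in words)


def run_engines(tweets, engine_count=2):
    """Score tweets with engine_count workers pulling from one shared queue."""
    work = queue.Queue()
    results = {}
    lock = threading.Lock()

    def engine():
        while True:
            item = work.get()
            if item is None:  # sentinel: no more work for this engine
                break
            index, text = item
            score = score_tweet(text)
            with lock:
                results[index] = score

    workers = [threading.Thread(target=engine) for _ in range(engine_count)]
    for worker in workers:
        worker.start()
    for item in enumerate(tweets):
        work.put(item)
    for _ in workers:
        work.put(None)  # one sentinel per engine
    for worker in workers:
        worker.join()
    return [results[i] for i in range(len(tweets))]


print(run_engines(["great win", "boring fight", "hello"], engine_count=2))
```

Because every engine reads from the same queue, coping with a bigger feed is mostly a matter of raising `engine_count`, which is roughly what adding another droplet did on the night.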

The facebook data

Floyd Mayweather facebook page started on 9,624,592 facebook likes.

Manny Pacquiao facebook page started on 6,751,979 facebook likes.

After the fight Mayweather had gained 758,114 additional facebook likes and Pacquiao 679,194. After a couple of days, though, Mayweather had gained 1,664,306 additional likes and Pacquiao 1,883,333, which is 219,027 more.

Floyd Mayweather finished on 11,288,898 facebook likes.

Manny Pacquiao finished on 8,635,312 facebook likes.

Mayweather still has 2,653,586 more likes, but it will be interesting to see if this gap reduces over time. I think it will.
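The facebook figures can be cross-checked with a few lines of Python. The sketch reads the two "couple of days" gains as 1,664,306 and 1,883,333, which is the only reading that agrees with the start and finish totals quoted above. The variable names are just for illustration.

```python
# Likes at the start, and likes gained a couple of days after the fight.
start = {"Mayweather": 9_624_592, "Pacquiao": 6_751_979}
gained = {"Mayweather": 1_664_306, "Pacquiao": 1_883_333}

final = {name: start[name] + gained[name] for name in start}
gap = final["Mayweather"] - final["Pacquiao"]
extra_for_pacquiao = gained["Pacquiao"] - gained["Mayweather"]
growth = {name: round(100 * gained[name] / start[name], 1) for name in start}

print(final)               # finishing totals
print(gap)                 # Mayweather's remaining lead
print(extra_for_pacquiao)  # how many more likes Pacquiao gained
print(growth)              # percentage growth relative to the starting totals
```

The totals come out at 11,288,898 and 8,635,312, the gap at 2,653,586 and the difference in gains at 219,027, all matching the figures in this section. The growth percentages are derived here and were not part of the original numbers.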

Note: MongoDB received 3 messages but failed to commit them to the database.

The twitter data

The twitter streaming feed was connected to the platform for around 24 hours, and within that period it received 1,680,057 tweets.
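To put that total in perspective, here is a quick back-of-the-envelope average, reading the total as 1,680,057 tweets. It assumes the feed was up for exactly 24 hours, which the post only gives as "around 24 hours", so treat the results as approximate. An average also hides the peaks around the fight itself.

```python
TOTAL_TWEETS = 1_680_057
HOURS_CONNECTED = 24  # assumption: "around 24 hours" taken as exactly 24

per_hour = TOTAL_TWEETS / HOURS_CONNECTED
per_minute = per_hour / 60
per_second = per_minute / 60

print(f"{per_hour:,.0f}/hour, {per_minute:,.0f}/minute, {per_second:.1f}/second")
```

That works out at roughly 70,000 tweets an hour, or a little under 20 a second, sustained for a full day.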

You can see from the above images that, pre-fight, tweets were negative for @Pacquiao (ignore the $; the platform treats these as stock symbols). It was interesting to see that during and after the fight the @Pacquiao bubble slowly moved to the left and up (positive).

The #MayweatherPacquiao hashtag overall ended with a negative sentiment, which I think rings true with what the press reported over the last few days. Maybe I should have tracked @Beyonce, as there was plenty of positive sentiment about her outfit… I won’t include a photo!

Over a 24 hour period the platform received 1,680,057 tweets.

@Mayweather ended slightly negative but remained consistent throughout the fight and the remaining hours.
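One simple way to boil per-tweet sentiment down to a single number per tracked term is a net score: positives minus negatives, divided by the total. The Python sketch below shows that calculation. It is an assumption about how such a score could be built, not a description of how the platform positions its bubbles, and the sample rows are invented purely to exercise the function.

```python
from collections import Counter, defaultdict


def net_sentiment(labelled_tweets):
    """Map each term to (positive - negative) / total, a value in [-1, 1]."""
    counts = defaultdict(Counter)
    for term, label in labelled_tweets:
        counts[term][label] += 1
    result = {}
    for term, labels in counts.items():
        total = sum(labels.values())
        result[term] = (labels["positive"] - labels["negative"]) / total
    return result


# Invented rows for illustration only, not real fight-night data.
sample = [
    ("@MannyPacquiao", "positive"),
    ("@MannyPacquiao", "negative"),
    ("@MannyPacquiao", "positive"),
    ("@FloydMayweather", "negative"),
    ("@FloydMayweather", "neutral"),
]
print(net_sentiment(sample))
```

Neutral tweets count towards the total but not the numerator, so a term with lots of neutral chatter drifts towards zero rather than towards either extreme.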

Now let’s take a look at the tweet message rate over a time period. I thought it would be useful to show you $AAPL, which is the stock symbol for Apple. As you can see, stock volumes are easily manageable, but I’m sure this will change in time as the old school moves out and the kids of today start getting into more senior roles within the investment banks.

The second chart shows England vs Norway on 3 Sep 2014 at 8:00 PM. You can easily identify kick-off, half time and when England scores. But take note of the peak, which was around 18 tweets.

Now let’s take a look at the big fight message rate.

Again, I’ll point out that the England game peaked at around 20; this just blew it away. Considering my twitter API feed receives less than 1% of the data twitter has available, the sheer volume is awesome. I assume at 6am people either went to bed or had breakfast. I am still investigating what caused the spike at around 9pm.
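A message-rate chart like the ones discussed here comes down to counting tweets per time bucket and looking for the busiest one. The Python sketch below does that with one-minute buckets. The bucket size is an assumption (the post does not say what interval the charts use), the function names are hypothetical, and the timestamps are invented for illustration.

```python
from collections import Counter
from datetime import datetime


def tweets_per_minute(timestamps):
    """Count tweets in each one-minute bucket."""
    buckets = Counter()
    for ts in timestamps:
        buckets[ts.replace(second=0, microsecond=0)] += 1
    return buckets


def peak_minute(buckets):
    """Return (minute, count) for the busiest bucket."""
    return max(buckets.items(), key=lambda item: item[1])


# Invented timestamps for illustration only, not real fight-night data.
sample = [
    datetime(2015, 1, 1, 12, 0, 10),
    datetime(2015, 1, 1, 12, 0, 40),
    datetime(2015, 1, 1, 12, 1, 5),
    datetime(2015, 1, 1, 12, 1, 20),
    datetime(2015, 1, 1, 12, 1, 55),
]
minute, count = peak_minute(tweets_per_minute(sample))
print(minute.strftime("%H:%M"), count)
```

Running the same bucketing over a window around an unexplained spike, and then reading the tweets in the busiest bucket, is one way to start working out what caused it.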

Conclusion

I learnt plenty. For example, the systems I architect can be scaled quite easily: considering the platform was designed to deal with stock data, I had to do very little tuning to deal with the increase in volume. The platform is not perfect, but it works quite well.

I will be looking at this data for a while and analysing it in more detail. Finally, there is so much you can do with this type of social data (facebook, twitter etc.), and it can bring real benefits to companies and people. I do believe that, in time, stock data on social platforms will look like the #MayPac or #EndvNor volumes.

Well, it’s the Monaco Grand Prix next. I hope you enjoyed this post. Catch you later.
