Scraping Data From Bovada

Aaron Smith
4 min read · Aug 9, 2019


I have previously mentioned that I’m a Product Manager with a data curiosity, so here is my first post about some of the data that I play around with. If you are looking for my first Product-related post, you can check it out here.

While I wouldn’t call myself a gambler, I do like to use Bovada as a data source to see how odds trend over time for various competitions. I have tracked the NBA, the NCAA, the Presidential Election, the Oscars and even the Game of Thrones winner. This all goes back to my data curiosity: it is interesting to see data change over time.

So, how do I do it? Well, I am an avid user of Jupyter, a notebook interface that lets you work in a number of different programming languages…I use it mostly for Python.

While Python is great for data analysis, Jupyter also lets you pull in libraries that go beyond analysis. In this article, I use Selenium to scrape Bovada and build a dataset that can then be analyzed.

We’ll be scraping from the web to a Jupyter notebook

I’ll provide a link to the full solution at the end, but I want to go cell by cell so that you understand what we’re doing.

First, import some packages to help us accomplish our goal. You’ll see we are importing Selenium, datetime, csv, NumPy and pandas. Together, those let us grab, read, and shape the data that builds our dataset.

I also import Plotly as a data visualization tool that I’ll cover in another article…data is only effective if you can use it to tell a story, which is where the visualization comes in.

Ok, after everything is imported, we will set up a couple of functions. The first ones are pretty simple and will be used for data analysis in a future post.

The next function is where all the scraping happens. I’ve commented key areas so that you can follow along, but I’ll give a high-level overview here as well. It is long, but not that complicated.

We build this function to take two arguments: the Bovada URL to scrape and a filename used both to read in historical data and to save the results.

First, we initialize Selenium. You’ll need to download ChromeDriver and point this code to your local copy. The end of that first section launches Selenium and navigates to the URL that you passed in.

Next we scrape values from the website based on their element name, as defined by Bovada’s web developers. You can use Chrome’s dev tools to find the elements or use other Selenium functions to scrape the data you are looking for.

You can use a variety of web elements…for this example, I used class_name

Our next part is to check whether we have an input file. Since each run is a point in time, I keep a history file that we append to so we have a view of this data over time. If there is no input file, we simply continue on.
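That check can be as simple as: if the file exists, read it; otherwise start fresh. A minimal sketch, with a hypothetical function name:

```python
import os

import pandas as pd


def load_history(filename):
    """Return the saved history as a DataFrame, or None on the first run."""
    if os.path.exists(filename):
        return pd.read_csv(filename)
    return None  # no input file yet -- this run starts the history
```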

Next, we initialize three list objects for the three pieces of data that we care about: the title of the bet (i.e. World Series 2019), the outcome (i.e. Houston Astros) and the bet price (i.e. +225). We then loop through the Selenium objects we previously loaded and append each value to the matching list. The title doesn’t require a loop, since it’s the same for every outcome.
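In code, that loop looks something like the sketch below; the element lists stand in for the Selenium objects loaded earlier, and the function name is my own:

```python
def collect_bet_data(title_text, outcome_elements, price_elements):
    """Build parallel lists of titles, outcomes, and prices from scraped elements."""
    titles, outcomes, prices = [], [], []
    for outcome, price in zip(outcome_elements, price_elements):
        titles.append(title_text)      # same title repeated for each row
        outcomes.append(outcome.text)  # e.g. "Houston Astros"
        prices.append(price.text)      # e.g. "+225"
    return titles, outcomes, prices
```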

Our final section takes the data from those list objects and puts it in a pandas DataFrame. We add a timestamp so that we can look at the data over time, and we append it to our input dataframe, if one exists. Finally, we write the file so that our historical data is stored locally, close the browser, and return the DataFrame for further analysis.
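Assembled into a DataFrame, that final section could look like this. The column names are my own guesses, and `pd.concat` is used in place of the older `DataFrame.append`:

```python
import datetime

import pandas as pd


def build_snapshot(titles, outcomes, prices, history=None, filename=None):
    """Combine the scraped lists with a timestamp and any prior history."""
    df = pd.DataFrame({
        "title": titles,
        "outcome": outcomes,
        "price": prices,
        "timestamp": datetime.datetime.now(),  # point-in-time marker for this run
    })
    if history is not None:
        df = pd.concat([history, df], ignore_index=True)  # stack onto prior runs
    if filename is not None:
        df.to_csv(filename, index=False)  # persist the running history locally
    return df
```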

With all of that wrapped in a function, all that’s left is to set a URL and a filename and run it.
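Putting it together is then one call. The URL, filename, and function name below are hypothetical placeholders, and the call itself is commented out because it launches a real browser:

```python
url = "https://www.bovada.lv/sports"  # placeholder -- use the page you want to track
filename = "bovada_history.csv"       # running history file, created on the first run

# df = scrape_bovada(url, filename)   # hypothetical name for the scraping function
```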

Current odds in Bovada — Go Reds…maybe next year
Current odds scraped into my pandas dataframe

Now you have your completed notebook.

As you run this over time, you’ll collect a growing dataset that you can use for data analysis and visualization. I’ll talk about that in my next data-related post.

