Some last minute fine-tuning out on the patio

Why TicketMaster should implement a geography-based pricing model

My experience from DataFest 2016

This past weekend I had a blast at UCLA’s DataFest 2016. My team and I tackled a challenging problem: if TicketMaster were to consider a pricing recommendation system, what variables should they use to set the price? My team consisted of João Teixeira (Math/Econ, 5th year), his younger brother Joaquim Teixeira (Math/Econ, 3rd year), Samad Patel (Stats, 2nd year), Alibek Danyalov (Math of Computation, 3rd year), and myself. We named ourselves “Bayesian Bros Everywhere, UCLA” in reference to a Childish Gambino lyric we’d each heard about a million times while attending UCLA.

We decided to explore the possibility of using artist genre vs. preferred regional genre as an element to determine the demand for an artist in a particular region. We examined the distribution of when tickets were bought relative to the start of an event as a proxy for regional demand for that artist’s concert; concerts that sold out early likely had regionally popular main acts, whereas concerts that had a more steady distribution were likely less popular. We found huge variances in popularity, and concluded that the early selling out of tickets was actually poor pricing on the part of the artist, likely due to an inability on their part to properly predict their own popularity. TicketMaster could benefit by improving their pricing system for two huge reasons: they and the artist would make more money by picking the optimal price, and they would prevent scalping (which TicketMaster emphasized as a concern).

Pictured: Working Hard — Not Pictured: Hardly Working

We faced and overcame many challenges, from the physical (by the last day we were all exhausted) to the technical (João’s laptop crashed at a critical juncture) to the statistical (what do you do when only 0.6% of the data has the variable you’re looking at?). With only forty hours (including time we spent eating, sleeping, and playing with Legos) we tackled a problem that professional data scientists spend years trying to solve. I could not have picked a more amazing, complicated, hilarious group of folks to work with and I am truly blessed to have had this experience working with them. To my fellow Bayesian Bros: thank you for putting up with me and thank you for each contributing so much.


Exploratory Data Analysis:


When we were first presented with the data, we did not have a solid idea of which way we wanted to take it. We each had some ideas, but needed to get a feel for the data and flesh them out. We had a couple of false starts (most notably my attempt to find a correlation between how early somebody buys a ticket and their age and income; it turned out we could find no such correlation in the data), but by noon on Saturday we had some idea that we wanted to focus on geography and genre. Samad and I worked on doing some simple mapping to get a feel for the data, while Ali, Joaquim, and João brainstormed what avenues we should pursue.

The first visualization was simple enough to make. We summed the total number of click events in each state, and divided it by the US Census Bureau’s estimate of the population of that state. That gave us an approximation of how often TicketMaster was used in each state. We used Plotly to make a choropleth map of the result, where the darker blue states have the higher proportion of TicketMaster users. This told us that there was a pretty big disparity in how far TicketMaster had penetrated state markets. For instance, Montana had almost no usage, while Illinois had a particularly high proportion. This disparity hinted that the demand for TicketMaster varied by state, but it was far from conclusive.
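
As a rough illustration (not the original DataFest code), the per-capita calculation could be sketched in Python like this. The function name, the state codes, and every number below are made up for the example; the real inputs would be the click-event data and the Census population estimates.

```python
from collections import Counter


def clicks_per_capita(click_states, population):
    """Return {state: click events per resident}.

    click_states: one state code per click event.
    population:   {state: estimated population}.
    """
    counts = Counter(click_states)
    return {
        state: counts[state] / pop
        for state, pop in population.items()
        if pop > 0  # skip states with no usable population estimate
    }


# Toy inputs, invented for illustration only.
click_states = ["IL"] * 6 + ["CA"] * 8 + ["MT"] * 1
population = {"IL": 12, "CA": 40, "MT": 10}

usage = clicks_per_capita(click_states, population)

# Rank states from heaviest to lightest usage.
for state, rate in sorted(usage.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{state}: {rate:.2f} clicks per resident")
```

The resulting dictionary is the kind of state-to-value mapping a choropleth map is drawn from. Note that California has the most raw clicks in this toy data, but Illinois comes out on top once population is accounted for.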

Next, we took a look at the average ticket prices of each state. This was just a matter of creating another choropleth map, this time using darker greens to mean higher prices (pure white means that there was no information about ticket price in that state). We found two interesting things: 1. none of us could afford the average ticket price in a lot of states and 2. the states were largely uniform in average price, with the exceptions of Delaware and Nevada. We still aren’t quite sure why Delaware has such high prices, but we chalked up the price disparity in Nevada to Las Vegas-based shows. The relatively low variance in average price by state told us something important: artists weren’t using price discrimination by geography nearly as much as they should be.
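
One way to put a number on "largely uniform" is to compute each state's average price and then look at how much those averages spread out relative to their mean. The sketch below is an assumption about how that could be done, not a record of what the team ran; the helper names and the prices are invented.

```python
from collections import defaultdict
from statistics import mean, pstdev


def average_price_by_state(sales):
    """sales: iterable of (state, price). Returns {state: mean price}."""
    prices = defaultdict(list)
    for state, price in sales:
        prices[state].append(price)
    return {state: mean(values) for state, values in prices.items()}


def relative_spread(state_means):
    """Coefficient of variation (std dev / mean) of the state averages."""
    values = list(state_means.values())
    return pstdev(values) / mean(values)


# Toy sales, invented for illustration only.
sales = [
    ("CA", 60.0), ("CA", 80.0),
    ("TX", 65.0), ("TX", 75.0),
    ("NV", 150.0), ("NV", 170.0),
]

averages = average_price_by_state(sales)
without_nv = {s: p for s, p in averages.items() if s != "NV"}

print(averages)
print("spread with NV:", round(relative_spread(averages), 3))
print("spread without NV:", round(relative_spread(without_nv), 3))
```

In this toy data a single outlier state accounts for all of the spread, which mirrors the pattern described above: a mostly flat map with one or two exceptions.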

We knew that we wanted to see how much different genres were liked in different states, but first we had to check to make sure that people weren’t traveling too far to go to concerts. If people traveled from too far away, that would make the data about which genre states preferred noisy. We used the “Distance Consumers Travel to Venues” variable to do this, and found the average for each city. Unfortunately, only 0.6% of the observations had this variable, which meant that we would be looking at only a relatively small subset of the total dataset. We debated whether or not to scrap this analysis, but in the end we decided to go ahead with it for two reasons: 1. the TicketMaster representative told us that the subset which had “distance traveled” was representative of the whole (so we weren’t going to get skewed results) and 2. since we were only doing exploratory analysis at this stage, precision wasn’t paramount. We ended up using Plotly again, this time to make a bubble map, where the darker and larger the bubbles, the further people traveled to an event in that state. On the whole people didn’t travel very far for a concert, with one big exception: Atlantic City. Atlantic City is like the Las Vegas of the East Coast, where people from neighbouring states will drive or fly over for a show. On the whole though, we felt pretty comfortable examining the genre preference of each state.
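
The two steps here, checking how much of the data actually has the variable and then averaging it per city, can be sketched as follows. This is a minimal illustration under assumptions: the record layout, the `distance_miles` field name, and the values are hypothetical, not the real dataset's schema.

```python
from collections import defaultdict
from statistics import mean


def coverage(records, field):
    """Fraction of records where `field` is present and not None."""
    if not records:
        return 0.0
    return sum(r.get(field) is not None for r in records) / len(records)


def mean_by_city(records, field):
    """Average `field` per city, ignoring records where it is missing."""
    grouped = defaultdict(list)
    for r in records:
        value = r.get(field)
        if value is not None:
            grouped[r["city"]].append(value)
    return {city: mean(values) for city, values in grouped.items()}


# Toy records, invented for illustration only.
records = [
    {"city": "Atlantic City", "distance_miles": 120.0},
    {"city": "Atlantic City", "distance_miles": 80.0},
    {"city": "Phoenix", "distance_miles": 15.0},
    {"city": "Phoenix", "distance_miles": None},
    {"city": "Seattle"},  # variable missing entirely
]

share = coverage(records, "distance_miles")
distances = mean_by_city(records, "distance_miles")

print(f"{share:.0%} of records have a distance")
print(distances)
```

Cities with no usable observations simply drop out of the result, so it is worth reporting the coverage figure alongside any map built from the averages.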

So in our next visualization we did just that. We tallied the number of concerts of each category for each state, but the result was not very useful: almost every state had “Rock/Pop” as their favorite genre. It turns out that TicketMaster uses Rock/Pop as a catchall for many of the acts that it is unwilling to classify into particular sub-genres. To gain some meaningful insight, we had to normalize our data. Instead of looking at which genre had the most raw popularity in each state, we examined which genre was disproportionately represented. We did this by dividing the genre score of each state by the total score for that genre (e.g. we divided the number of Louisiana’s Rap/Urban concerts by the total number of Rap/Urban concerts in the US). We could now see each state’s disproportionately preferred genre.
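
The normalization described above can be sketched in a few lines of Python. This is an illustrative reconstruction from the description, not the original code; the function names and the concert counts are invented.

```python
from collections import Counter, defaultdict


def raw_favorite(concerts):
    """{state: genre with the most concerts in that state}."""
    by_state = defaultdict(Counter)
    for state, genre in concerts:
        by_state[state][genre] += 1
    return {s: counts.most_common(1)[0][0] for s, counts in by_state.items()}


def disproportionate_favorite(concerts):
    """{state: genre for which the state holds the largest share of the national total}."""
    national = Counter(genre for _, genre in concerts)
    by_state = defaultdict(Counter)
    for state, genre in concerts:
        by_state[state][genre] += 1
    result = {}
    for state, counts in by_state.items():
        # Share of all US concerts of this genre that took place in this state.
        shares = {genre: n / national[genre] for genre, n in counts.items()}
        result[state] = max(shares, key=shares.get)
    return result


# Toy concert list of (state, genre) pairs, invented for illustration only.
concerts = (
    [("LA", "Rock/Pop")] * 3 + [("LA", "Rap/Urban")] * 2
    + [("TN", "Rock/Pop")] * 5 + [("TN", "Country")] * 4 + [("TN", "Rap/Urban")] * 1
)

print(raw_favorite(concerts))               # Rock/Pop wins everywhere
print(disproportionate_favorite(concerts))  # normalized view differs by state
```

One thing to watch with this kind of normalization: a genre with very few concerts nationwide can dominate a state's result on the strength of a handful of shows, so small national totals deserve a second look.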

Rock/Pop has the highest early peak here and is the most popular genre according to the map

With these exploratory visualizations out of the way, Samad and I regrouped with the others to share our results and get a better idea of what they had been working on. It turned out that they had come up with a way to look at the distribution of ticket sales times by genre per state. For instance, they could plot this distribution for California (pictured on the left), and the resulting visualization would have 6 lines (one for each genre). The group noticed that in states where heavy metal was the disproportionately favored genre on the genre map, heavy metal also had the highest peak for early ticket sales. On the other hand, states where heavy metal music was unpopular would have much more uniform distributions of sale times for heavy metal concerts. This indicated to us that the more a state likes a particular genre, the faster tickets to a concert of that genre in that state would sell out. This meant that artists could charge more in states where they are more popular and still sell the same number of tickets, thereby making more money. We now knew this trend held up in the aggregate, but we wanted to prove it on a case-by-case basis, so we picked two artists of different genres and did a case study.
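
To make the idea of a "sale time distribution" concrete, here is a minimal sketch that bins purchases by how far into the sale window they happened. The post does not spell out how timing was measured, so measuring in days since tickets went on sale, the choice of four bins, and all the numbers are assumptions made purely for illustration.

```python
def sale_time_histogram(purchase_days, window_days, bins=4):
    """Fraction of purchases falling in each equal-width slice of the sale window.

    purchase_days: days elapsed since on-sale for each purchase.
    window_days:   days between on-sale and the event.
    """
    if window_days <= 0 or not purchase_days:
        raise ValueError("need a positive window and at least one purchase")
    counts = [0] * bins
    for day in purchase_days:
        position = day / window_days  # 0.0 = on-sale, 1.0 = event start
        index = int(position * bins)
        index = max(0, min(index, bins - 1))  # clamp edge cases into range
        counts[index] += 1
    return [c / len(purchase_days) for c in counts]


# Toy purchase timings over a 100-day window, invented for illustration only.
popular = [0, 0, 1, 1, 2, 3, 50, 99]     # rush as soon as tickets go on sale
niche = [5, 30, 55, 60, 80, 85, 95, 99]  # slow build toward the event

popular_hist = sale_time_histogram(popular, window_days=100)
niche_hist = sale_time_histogram(niche, window_days=100)

print("popular:", popular_hist)
print("niche:  ", niche_hist)
```

The first entry of each list is the share of tickets sold in the earliest slice of the window, which is one simple way to compare the height of the "early peak" across genres or states.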


Demonstrative Case Studies:


Throughout the day João and Joaquim had been making jokes about incorporating the data on Tim McGraw into our analysis, and now we had the perfect opportunity. McGraw had a high variance of sale time distributions (his distribution for Nashville was the most extreme of any distribution we saw for any artist in any city) and he could consistently be classified as “Country”, so his was a perfect case to use for demonstration. Next we had to look for a big-name, multi-city artist who was the antithesis of McGraw. They should be an artist from a different genre (we decided that Rap/Urban was the best opposite to Country), who had toured on the West Coast (most of McGraw’s concerts were on the East Coast and in the South), and who had a more uniform distribution. After some trial and error we decided on Rihanna; she fit all of our criteria, plus one more we hadn’t thought of before. It conveniently turned out that Rihanna and McGraw both had concerts in Phoenix, which meant that we could compare their distributions side-by-side.

Now that we knew our artists, we just had to pick some cities. Our criterion for picking cities was simple: they had to be representative of the trends we saw in the artist’s distributions. For McGraw this meant picking a distribution with a high peak when tickets went on sale (Nashville), another distribution where the sales started low and peaked around when the event started (Mansfield), and a third distribution which had both an early and a smaller late peak (Clarkston). For Rihanna, this largely meant picking cities in which the distribution was uniform, although we made sure to pick a city in a state in which Rap/Urban was the favored genre (Seattle, Washington) to compare against cities from states with other favored genres (Heavy Metal for Las Vegas, Nevada and Rock/Pop for Miami, Florida).

The McGraw in Nashville distribution is an extreme example of what we saw for big artists performing in states where their genre is popular: a huge number of tickets sold as soon as the tickets go on sale, as true fans and scalpers alike are buying up tickets. The McGraw in Mansfield distribution was pretty typical for artists playing in states where their genre isn’t as popular: a steady ramp up in the number of tickets sold as the event time approaches and many more last-minute purchases. The McGraw in Clarkston distribution was an extreme case of the bimodal distribution we saw quite frequently; we aren’t sure why these distributions have two distinct peaks with such deep valleys separating them. We conjecture it may be due to confounding variables such as marketing campaigns.

The early peak for Rihanna was highest in Seattle, then in Las Vegas, and was lowest in Miami. Correspondingly, Rap/Urban was relatively more favored in Washington than in Nevada and Florida, and more favored in Nevada than in Florida. This further supported our hypothesis that we could use the popularity of a genre in a state to predict the distribution of ticket sales for a concert in that state. Interestingly, all three distributions we chose to include for Rihanna were bimodal. Again, we couldn’t be sure what factor explained this bimodality. We also weren’t sure why Rihanna’s distributions were more uniform than McGraw’s. Maybe McGraw concerts are more polarizing, while Rihanna is more broadly but less intensely popular.

Next it was just a matter of plotting out the distributions and the paths the artists took. We again used Plotly to make the map, but this time used ggplot2 for the distributions. I used GIMP to combine the 6 distributions with the map and slapped a title on it. Joaquim had the brilliant idea of putting images of the artists’ heads on top of the cities where they stopped, so I spent the better part of an hour (which I should have spent sleeping) cropping the artists’ heads, coloring them their corresponding colors, and adding them to the image. The result was the second slide of our presentation.

Due to time and space limitations, we didn’t have a chance to include our side-by-side comparison of Rihanna’s and McGraw’s Phoenix concerts’ distributions in our presentation. Notice that the Rihanna distribution has a strong early peak, and then another peak later on, while McGraw doesn’t peak until the event begins (meaning a lot of last minute sales). Arizona had a higher preference for Rap/Urban than for Country. This provided more evidence for the group’s theory that there’s a correlation between how popular an artist’s genre is in a state and the distribution of ticket sale times for that artist’s concert in that state.


Limitations and Conclusions:

While we were working on this project we were constantly checking and double-checking each other’s conclusions to see if they held up. Ali was particularly meticulous about making sure that we did not draw conclusions beyond what the data could support. Unfortunately, the very limited scope of our data meant that we also had a very limited scope in terms of what we could conclude. We decided that we would rather be honest about the limitations of our conclusions, even if that meant making less bold claims.

Our conclusion was simple: TicketMaster should take into account the location in which the artist is performing. Currently it appears that artists are not taking this into account when pricing their concerts, as evidenced by the low variance in average ticket price by state. TicketMaster is uniquely positioned to give pricing recommendations based on location and regional genre preference, because TicketMaster, unlike the artists, has access to this aggregate data. A major caveat of this conclusion is that two artists of the same genre playing in the same state might have different demand because of the difference in their overall popularity. TicketMaster can use any number of variables as a proxy for artist popularity, from Facebook likes to how quickly that artist’s shows have sold out in the past.

If we’d had more time and more data, the Bayesian Bros would have loved to build a more comprehensive price recommendation model. Nonetheless, I’m proud of what we did with our constraints and look forward to applying the skills we developed in future endeavors. Overall, DataFest 2016 was a great exercise in our critical and creative thinking skills as well as our abilities to work fueled entirely by adrenaline and junk food. Can’t wait to do it again next year!


If you liked this post, make sure to click the ❤ below.

If you want to check out some more visualizations I’ve worked on, check out “Where Does Hillary Clinton Write About” and “The Best Time to Post on Reddit”.

If you have any questions, issues, concerns, comments, or want to check out the code I used, feel free to comment here or shoot me an email at dannyleybzon@gmail.com. Thanks for reading and have a great day!