Analyzing Music Festival Fan Data

As a graduate of Berklee College of Music, I can personally vouch for the many opportunities the college offers students seeking to advance their careers in the music industry. In August 2017, I was accepted into the Berklee Popular Music Institute (BPMI), a program run by music legend Jeff Dorenfeld, whose experience includes managing Boston, Sammy Hagar, and Ozzy Osbourne. BPMI selects 20 students to find and develop seven Berklee artists to play summer festivals including Welcome to Rockville, Essence, Lollapalooza, Osheaga, Outside Lands, Country LakeShake, and Music Midtown.

My role in BPMI was to organize, together with three other students, a case study aimed at analyzing fan purchase habits. The festival we studied was Welcome to Rockville in Jacksonville, Florida, hosted by promoter Danny Wimmer Presents.

The objective: create a survey app to better understand Rockville fans, analyze and draw conclusions from the data, and present our findings to Danny Wimmer Presents. I did not play a role in developing the app, but I took part in everything else: writing the list of questions, planning how to conduct the surveys with fans, communicating logistics with the promoter, and creating incentives for fans to complete the survey.

Case Study Details

With eight students, we collected 558 surveys within three days. Each survey took fans between 30 seconds and three minutes to complete. Students asked the questions aloud and fans answered in person. The survey questions are listed here along with the GitHub project hosting the code.

By attending the event and speaking with fans, I came to understand the festival, its customers, and what motivated them to purchase their tickets. This festival is considerably different from other nationwide festivals. Welcome to Rockville is a genre-based festival, as opposed to typical lifestyle festivals such as Lollapalooza or Coachella. Rockville centers its brand around the world’s top rock and metal acts, including headliners Ozzy Osbourne, Avenged Sevenfold, and the Foo Fighters.

First Cleaning Steps

After the festival, I put my data analysis skills to work. The data initially came in the form of a JSON file in which all of the questions were stored together as a nested dictionary. This was an obstacle: when the file was converted to a Pandas DataFrame, every answer landed in a single column instead of one column per question, so some data wrangling was required. Before and after pictures of the data cleaning process are shown below. The “after” picture displays a transformed DataFrame (for viewing purposes).

I rotated the DataFrame here in order to screenshot all of the columns in one image.
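The flattening step described above can be sketched with `pandas.json_normalize`. This is only an illustration: the record layout and the field names `answers`, `q1`, `q2`, and `q3` are assumptions, since the actual export format is not shown in this post.

```python
import pandas as pd

# Hypothetical records: the real field names in the survey JSON may differ.
records = [
    {"createdAt": "2018-04-27T15:02:11Z",
     "answers": {"q1": "Yes", "q2": "Friends", "q3": "Radio"}},
    {"createdAt": "2018-04-28T12:40:05Z",
     "answers": {"q1": "No", "q2": "Partner", "q3": None}},
]

# Loading the records directly leaves every answer packed inside one column.
before = pd.DataFrame(records)

# json_normalize expands the nested dictionary into one column per question.
after = pd.json_normalize(records)
after.columns = [c.replace("answers.", "") for c in after.columns]

# Transposing puts the questions on rows, which is easier to fit in one image.
print(after.T)
```

Here `before` has a single `answers` column holding dictionaries, while `after` has one column per question.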

Three questions contained nested follow-up questions with “if, then” logic. My initial EDA process involved splitting these into individual questions and answers. For example, Q1 states “Is this your first Welcome to Rockville? How many years have you attended in the past?” I split it into two individual questions. The “if” rule meant that if a fan answered “yes,” the second question received a value of “0”. Otherwise, the fan responded with the number of times they had attended in the past.
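As a rough sketch of the Q1 split, assume each raw answer is a string that is either “Yes” or “No” followed by a count. That raw format, the `;` separator, and the variable names are assumptions made for illustration, not the layout of the real data.

```python
import pandas as pd

# Hypothetical raw Q1 answers: "Yes" for first-timers, otherwise "No" plus a count.
raw_q1 = pd.Series(["Yes", "No; 3", "No; 1", "Yes"])

# Split each answer into its two parts; first-timers have no second part.
parts = raw_q1.str.split(";", n=1, expand=True)
first_rville = parts[0].str.strip()

# Keep the stated count for returning fans and fill "0" for first-timers.
years_attend = parts[1].str.strip().where(first_rville != "Yes", "0")

print(pd.DataFrame({"first_rville": first_rville, "years_attend": years_attend}))
```

The counts stay as strings at this stage, matching the “0” value described above; converting them to integers is a later cleaning step.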

For questions three, four, and five, additional cleaning was necessary before loading the data into Pandas. Q3 had one missing value, so I assigned it a value of “Other”.

Question four stated “Did you purchase VIP tickets? If so, what most influenced your decision to buy?” If fans answered “No” to part one, part two would receive a value of “n/a”.

The same pattern applied to question five, which asked whether fans anticipated attending next year. The second half of the question asked why fans did not plan to attend if they answered “No — Definitely not coming” or “Maybe — Other factors.” If a fan answered “Yes — Coming no matter what” or “Maybe — I want to see the lineup first”, the second part of the question was filled with “n/a”.
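The fill rules for Q3, Q4, and Q5 could look something like the sketch below. The column names and sample rows are invented for illustration, and the answer labels are written with a plain hyphen here, so the exact strings in the real export may differ.

```python
import pandas as pd

# Made-up rows; column names are illustrative, not the project's real ones.
df = pd.DataFrame({
    "q3": ["Radio", None, "Friends"],
    "vip": ["Yes", "No", "No"],
    "vip_reason": ["Shorter lines", None, None],
    "coming_next": ["Yes - Coming no matter what",
                    "Maybe - Other factors",
                    "No - Definitely not coming"],
    "coming_next_reason": [None, "Travel", "Cost"],
})

# Q3: the missing answer becomes "Other".
df["q3"] = df["q3"].fillna("Other")

# Q4: fans who did not buy VIP have no purchase reason.
df.loc[df["vip"] == "No", "vip_reason"] = "n/a"

# Q5: fans who expect to return have no reason for skipping next year.
plans_to_return = df["coming_next"].str.contains(
    "no matter what|see the lineup", case=False
)
df.loc[plans_to_return, "coming_next_reason"] = "n/a"

print(df)
```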

After splitting the nested questions for Q1, Q4, and Q5, I collected the responses to each question in its own list and added each list as a separate column of a Pandas DataFrame.

More Cleaning

Now, I could continue cleaning the data. The first step was dropping the test surveys at rows 9 and 22 in the DataFrame. From here, I reset the index, converted the ‘createdAt’ column to datetime, and converted the survey answers into usable data.
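These three steps map onto standard pandas calls. The sketch below uses a tiny made-up frame, so the dropped row label is 1 here instead of 9 and 22, and the timestamp format is an assumption.

```python
import pandas as pd

# Made-up rows; only the 'createdAt' column name comes from the article.
df = pd.DataFrame({
    "createdAt": ["2018-04-27T15:02:11Z",
                  "2018-04-27T15:05:40Z",
                  "2018-04-28T12:40:05Z"],
    "q2": ["Radio", "TEST", "Friends"],
})

# Index labels of the test surveys (the article drops labels 9 and 22).
test_rows = [1]

# Drop the test surveys, then renumber the remaining rows from zero.
df = df.drop(index=test_rows).reset_index(drop=True)

# Parse the timestamp strings into timezone-aware datetimes.
df["createdAt"] = pd.to_datetime(df["createdAt"], utc=True)

print(df.dtypes)
```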


For ‘years_attend’ and ‘first_rville’, the strings were replaced with integers. The next notable step was remapping ‘coming-next-reason’ so that its answers were categorized into groups. The ‘coming-next-reason’ column is the follow-up to the survey’s previous question, “Do you plan on attending next year?” Some fans had multiple reasons for not attending next year’s festival, all of which were used in the data analysis.

All of the charts were created with Tableau. Per the legend, new fans are defined as fans who have attended zero Rockville festivals in the past. Return fans are defined as fans who have attended one or more.
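A minimal sketch of these conversions follows. The three column names come from the article, but the sample answers, the 1/0 encoding, the `reason_groups` mapping, and the `reason_group` and `fan_type` columns are hypothetical.

```python
import pandas as pd

# Made-up rows using the column names mentioned in the article.
df = pd.DataFrame({
    "first_rville": ["Yes", "No", "No"],
    "years_attend": ["0", "3", "1"],
    "coming-next-reason": ["n/a", "Too far to travel", "Tickets cost too much"],
})

# Replace the strings with integers (the 1/0 encoding is an assumption).
df["first_rville"] = df["first_rville"].map({"Yes": 1, "No": 0})
df["years_attend"] = df["years_attend"].astype(int)

# Hypothetical grouping of free-form reasons into broader categories.
reason_groups = {
    "n/a": "n/a",
    "Too far to travel": "Travel",
    "Tickets cost too much": "Cost",
}
df["reason_group"] = df["coming-next-reason"].map(reason_groups).fillna("Other")

# New fans have attended zero past festivals; everyone else is a return fan.
df["fan_type"] = df["years_attend"].eq(0).map({True: "New", False: "Return"})

print(df)
```

Any reason missing from the mapping falls into “Other”, which keeps unexpected answers visible instead of silently dropping them.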

Using Python’s Zip Code Library

After the EDA process, I explored fan zip code information to understand which areas of the United States most fans were coming from. The charts below show what can be gleaned from zip code data. They confirm that Rockville is a regional festival, which is to be expected given its location in Jacksonville, Florida. Danny Wimmer Presents hosts festivals all across the United States, including in California, Wisconsin, Ohio, New Jersey, Kentucky, and North Carolina, so it makes sense to cater to each region of the US.
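The post does not name the zip code library that was used, so the sketch below avoids one entirely and relies only on the fact that the first digit of a US ZIP code identifies a group of states. It is a coarser stand-in for a real lookup, and the sample zip codes are made up.

```python
import pandas as pd

# The first digit of a US ZIP code maps to a group of states.
ZIP_FIRST_DIGIT_STATES = {
    "0": "CT, MA, ME, NH, NJ, RI, VT",
    "1": "DE, NY, PA",
    "2": "DC, MD, NC, SC, VA, WV",
    "3": "AL, FL, GA, MS, TN",
    "4": "IN, KY, MI, OH",
    "5": "IA, MN, MT, ND, SD, WI",
    "6": "IL, KS, MO, NE",
    "7": "AR, LA, OK, TX",
    "8": "AZ, CO, ID, NM, NV, UT, WY",
    "9": "AK, CA, HI, OR, WA",
}

# Made-up answers stored as integers, so leading zeros have been lost.
zips = pd.Series([32202, 30301, 2115, 32256])

# Restore the five-digit form before looking anything up.
zip5 = zips.astype(str).str.zfill(5)

# Map each zip code to its state group and count fans per group.
area = zip5.str[0].map(ZIP_FIRST_DIGIT_STATES)
counts = area.value_counts()

print(counts)
```

Padding with `zfill(5)` matters because zip codes in the Northeast start with zero and are easily mangled when read as numbers.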

Additional Charts

Other visualizations created from the data are shown below, each with a note on what it reveals.

The interesting part of this visualization is how many new fans attended the festival: 50.7% of fans were new.
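A share like this can be computed directly from the attendance counts. The numbers below are made up and do not reproduce the 50.7% figure; `years_attend` is assumed to already hold integers.

```python
import pandas as pd

# Made-up attendance counts; 0 means this is the fan's first Rockville.
years_attend = pd.Series([0, 2, 0, 5])

# The mean of a boolean Series is the fraction of True values.
is_new = years_attend == 0
pct_new = round(is_new.mean() * 100, 1)

print(f"{pct_new}% of surveyed fans are new")
```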

Welcome to Rockville 2018 included headliners Ozzy Osbourne, Foo Fighters, and Avenged Sevenfold.

The “Rate Experience” visualization becomes more interesting when combined with the question, “Do you plan on attending next year’s festival?” Fans who reported a negative experience yet still answered “Yes — Coming no matter what” regarding next year’s attendance are interesting data points. The audio files provided some insight into their responses. Additionally, it was important to analyze fans who had a positive festival experience yet do not plan to attend next year (the most common reason was travel).
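Although the charts themselves were built in Tableau, the same cross-question comparison can be sketched in pandas with `pd.crosstab`. The column names and the simplified “Positive”/“Negative” and “Yes”/“No” labels are assumptions for illustration.

```python
import pandas as pd

# Made-up responses with simplified labels.
df = pd.DataFrame({
    "rate_experience": ["Positive", "Negative", "Positive", "Negative"],
    "coming_next": ["Yes", "Yes", "No", "No"],
})

# Count fans for every combination of experience rating and next-year plan.
table = pd.crosstab(df["rate_experience"], df["coming_next"])

# Isolate the surprising group: a negative experience but still returning.
unhappy_but_returning = df[
    (df["rate_experience"] == "Negative") & (df["coming_next"] == "Yes")
]

print(table)
print(unhappy_but_returning)
```

Swapping the filter conditions gives the other group of interest: fans with a positive experience who do not plan to return.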

Below we can see that most fans attend the festival with a significant other, which is valuable insight from an advertising standpoint.

Finally, we can see the favorite features for fans who purchased VIP tickets and had exclusive access to a VIP area within the festival.


The whole data analysis process was completed in order to understand the purchasing habits of Rockville’s fans. This case study demonstrates how effective data can be in understanding consumer buying habits. If you have any questions or comments, please feel free to contact me on LinkedIn or Twitter. Thank you for reading!