Who Watches What? Identifying Features that Impact Television Viewing Behaviors
This summer, I had the opportunity to intern on Xandr’s Data Science Team, a broad and essential organization that provides information and insights for business decisions, product support, and management.
The data we worked on ranged from auction prices to television viewership information and customer demographics. I am especially grateful for this internship opportunity because we are going through a global pandemic, and the University Recruiting Team put a lot of preparation into making the remote internship experience fun and rewarding.
My project’s general goal was to select traits that can be used to identify customer segments with similar behavior, in order to enable relevant ad targeting for specific households. For example, if we know that a household tends to spend a large portion of its time watching kids channels, then our algorithm would suggest certain characteristics or interests, like a tendency to visit amusement parks and zoos, drive an SUV, or buy children’s toys. We can then use these characteristics to place matching TV advertisements, which should make ad delivery more efficient and the consumer experience more personalized.
This project is essential to creating a better advertising system in which ad content reaches interested consumers. As we improve our understanding of which consumer traits matter for efficient ad placement, we can use that knowledge to enhance the accuracy of our TV platform applications and the advertising products built on them.
For my project, I analyzed two main datasets: Viewership data and Experian data.
Viewership data comes from a subset of households with a subscription to DIRECTV (DTV). The DTV set-top box captures TV viewership events, from which we derive metrics such as the total time spent watching television and the proportion of that time spent by genre, channel, and time of day. DTV has about 16 million subscribers.
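To make the derived metrics concrete, here is a minimal sketch of how a per-household genre share could be computed from raw viewing events. The event layout (household ID, genre, duration in seconds), the function name `genre_shares`, and the sample values are all assumptions made for illustration; they are not the actual DTV schema or data.

```python
from collections import defaultdict

def genre_shares(events):
    """Return {household_id: {genre: share of that household's viewing time}}.

    `events` is an iterable of (household_id, genre, duration_seconds)
    tuples. This layout is hypothetical, not the real set-top box format.
    """
    seconds = defaultdict(lambda: defaultdict(float))
    for household_id, genre, duration in events:
        seconds[household_id][genre] += duration

    shares = {}
    for household_id, by_genre in seconds.items():
        total = sum(by_genre.values())
        # Guard against households with zero recorded viewing time.
        shares[household_id] = (
            {genre: s / total for genre, s in by_genre.items()} if total else {}
        )
    return shares

# Made-up events for two households.
events = [
    ("h1", "Kids & Family", 1800),
    ("h1", "Kids & Family", 1200),
    ("h1", "Sports", 1000),
    ("h2", "Sports", 3600),
]
shares = genre_shares(events)
```

A share like this, computed for one genre, is the kind of quantity that can serve as the target (y) in the feature selection step.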
Experian data comes from a third-party demographic data provider and contains around 945 features describing various user traits, including automobile model type, shopping trends, personal interests and hobbies, and many more. The large number of traits in the Experian dataset makes model training challenging. Thus, it is essential to develop an algorithm that selects the traits that are important in determining a consumer’s TV viewership behavior.
Since we were selecting features from Experian data (X) based on Viewership data (y), every feature within Experian needed to go through the selection process. There are around 1,000 features in total, which cannot be processed at once, so I decided to split them into 10 groups of roughly 100 features each. Within each group, I added a ‘random’ column and selected every feature with a higher importance score than the random column; a feature that cannot outscore pure noise is unlikely to carry real signal. Another approach is to select features by category, using categories either provided by Experian or developed internally at Xandr.
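The grouped selection with a random column can be sketched as follows. This is only an illustration under stated assumptions: the post does not say which model produced the importance scores, so the absolute Pearson correlation with y is used here as a simple stand-in score. The function names, the trait names, and the synthetic data are also hypothetical.

```python
import random

def abs_pearson(xs, ys):
    """Absolute Pearson correlation, used here as a stand-in importance score."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return abs(cov) / (var_x * var_y) ** 0.5

def select_with_random_probe(features, y, group_size=100, seed=0):
    """Keep features that score higher than a random column in their group.

    `features` maps a feature name to its list of values (one per household).
    """
    rng = random.Random(seed)
    names = list(features)
    selected = []
    for start in range(0, len(names), group_size):
        group = names[start:start + group_size]
        # A fresh column of pure noise sets the bar for this group.
        probe = [rng.random() for _ in y]
        threshold = abs_pearson(probe, y)
        selected.extend(
            name for name in group if abs_pearson(features[name], y) > threshold
        )
    return selected

# Synthetic data: 250 traits, of which only two actually drive y.
data_rng = random.Random(42)
n_households = 500
features = {
    f"trait_{i}": [data_rng.gauss(0, 1) for _ in range(n_households)]
    for i in range(250)
}
y = [
    3 * a - 2 * b + data_rng.gauss(0, 1)
    for a, b in zip(features["trait_0"], features["trait_120"])
]
selected = select_with_random_probe(features, y, group_size=100)
```

With a single random column per group, some pure-noise traits will still beat the threshold by chance, so the output is better read as a shortlist for further analysis than as a final answer. With a model-based importance score, the random column would be scored jointly with the other features in its group rather than one at a time as in this stand-in.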
The algorithm outputs a list of important features for the viewing time of a given television genre, but we still needed to analyze which features are distinctive to each genre and what effect they have.
Suppose we want to find the identifying traits of customers who like to watch sports vs. kids & family content. I would first run the same algorithm for the two genres (y = sports vs. y = kids & family) and then separate the selected features into groups to analyze the differences. There are various ways to analyze and compare the results. Below is an example of the discoveries I made by comparing the effect of the feature group “Automobile” with regard to the genres “Sports” and “Kids & Family.”
Almost every household would have purchased automobile-related products. How can we distinguish the viewership genre from automobile information? We approached this with a mixture of analyses.
By taking a closer look at the types of cars included in the selected important features, we can see that multiple pick-up trucks were associated with predominantly Sports-viewing households, while various SUV models were important in predominantly Kids & Family-viewing households. Thus, by analyzing both the number of features selected and the content of those features for the two genres, we can identify unique traits associated with customers who view these distinct television genres.
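The comparison described above, looking at both how many features were selected per group and which ones differ between genres, can be sketched with plain set operations. The feature names, the category mapping, and the function `compare_selected` below are made-up placeholders for illustration; they are not actual Experian fields or project results.

```python
from collections import Counter

def compare_selected(selected_a, selected_b, category_of):
    """Compare two genres' selected feature lists.

    Returns the features unique to each genre, the shared features,
    and the number of selected features per category for each genre.
    """
    a, b = set(selected_a), set(selected_b)
    return {
        "only_a": sorted(a - b),
        "only_b": sorted(b - a),
        "shared": sorted(a & b),
        "counts_a": Counter(category_of[name] for name in a),
        "counts_b": Counter(category_of[name] for name in b),
    }

# Hypothetical feature-to-category mapping.
category_of = {
    "auto_pickup_truck_1": "Automobile",
    "auto_pickup_truck_2": "Automobile",
    "auto_suv_1": "Automobile",
    "auto_suv_2": "Automobile",
    "auto_sedan_1": "Automobile",
    "interest_outdoor": "Interests",
}

# Hypothetical outputs of the selection step for each genre.
sports_selected = [
    "auto_pickup_truck_1", "auto_pickup_truck_2",
    "auto_sedan_1", "interest_outdoor",
]
kids_selected = ["auto_suv_1", "auto_suv_2", "auto_sedan_1"]

result = compare_selected(sports_selected, kids_selected, category_of)
```

In this toy input both genres select three Automobile features, so the count alone does not separate them; the genre-specific lists (`only_a`, `only_b`) are what expose the pick-up truck vs. SUV difference.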
I am very grateful for my experience at Xandr, where I worked on an exciting project that gave me insight into some of the company’s core businesses. I want to thank all the Data Science team members for their advice and feedback on my project. I am very excited to present my project, and I look forward to learning more about Data Science in the future.
About the Author
Lucy is a master’s student at Johns Hopkins University studying Computer Science. Her interests include watching crash course videos on Machine Learning, travelling and cooking.