Published in Analytics Vidhya

A Scrappy Top 6 Netflix Titles Analysis

Disclaimer:
The following is not a peer reviewed academic publication.
The purpose of this article is to make the reader consider why the visual treatment of these materials came to be, not to provide a concrete answer or suggestion.

It was a week of reckless decisions. But I stopped myself from some more damaging ones, and instead I sourced this screenshot from lots of people after I pointed out this type of effect to someone earlier this week:

A screenshot of 1–6 on the top 10 in the US Today on Netflix as of October 23rd 2020

Companies like Netflix, Hulu, Disney, Amazon, etc. all have promotional materials that they use on their streaming platforms to get you to click. Last week I wrote an article about one that went wrong. But I got a lot of follow-up questions about promotional materials for more general situations.

So I have made this article. Which basically goes through a (semi) brief analysis of some of the trends of 38 people’s “Top 10 in the US Today” on October 23rd along with how I did it.

Also — if you contributed THANK YOU!!

Anyways…Steps to this…insanity.

  1. Decide to not text your ex and instead to write an article and analysis
  2. Pick a category/small subset of a thing (in this case, US Top 10)
  3. Solicit/collect the data
  4. Clean/organize/prep the data
  5. Pick some analysis metrics (KPIs if you are fancy lol)
  6. Do some OpenCV
  7. Analyze what you found
  8. Think about the context and put it in the context and takeaways

Steps 1–2 were already done. So…

Step 3: Solicit/collect the data

I collected via Twitter, Messenger, and Slack. And discovered a conspiracy in Air and Earth Signs from Massachusetts (n = 7) but that’s a different article.

Utmost sophisticated data collection methods called Group Chat-yltics and Twitter-doop

Anyways, I ended up with the top 6 of the “Top 10 in the US Today” from 40 people. And if you participated — THANK YOU SO MUCH! This entire article could not have been possible without you.

Step 4: Clean/organize/prep the data

A glance at my project folder for this article — I have a ‘Raw’ folder of all the pics people sent and then each of the 6 titles has a folder of their thumbnails.

I took all the images, renamed them to make them easier to parse, and saved them in a folder called “Raw”. Then I went through them all, cropped out the thumbnails for the 1–6 titles, and put them in individual folders.

There are easier ways to do this. This took about 2 hours, but I was also sipping wine and talking to friends during this time, so I am sure it could be done in less time. This is just what I chose to do.

I will say, the con of crowdsourcing data is that it is nonuniform. So for example, I had 38 of the first title and only 30 of the third due to cropping issues. I don’t personally mind for these purposes, but it’s important to think about for other projects.
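If you want to check how uneven your own crowdsourced set is, here is a minimal sketch that counts the thumbnails in each title folder. It assumes a project layout like mine (a “Raw” folder plus one folder per title); the function name and the list of image extensions are made up for illustration.

```python
from pathlib import Path

# Assumed set of image extensions; adjust to whatever people actually sent you.
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg"}

def count_thumbnails(project_dir):
    """Return {folder_name: image_count} for each title folder, skipping 'Raw'."""
    counts = {}
    for folder in sorted(Path(project_dir).iterdir()):
        if folder.is_dir() and folder.name != "Raw":
            counts[folder.name] = sum(
                1
                for f in folder.iterdir()
                if f.is_file() and f.suffix.lower() in IMAGE_SUFFIXES
            )
    return counts
```

A quick look at the resulting counts tells you which titles lost samples to cropping before you start comparing them.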

I also personally like seeing other people’s data choices. I think data curation and collection are interesting and I liked seeing what people submitted.

Of course, the data itself was super interesting. As you can see below, there are a lot of different varieties amongst the collections of folks.

After I had the collections, these were the thumbnails for each of the 6 titles.

Top 6 thumbnails cropped and named appropriately. As you can see, some thumbnails have much more diverse selections than others.

Below are the unique promotional artworks for each of the six titles. Obviously, as we saw above, some titles have more variety/options than others in this sample set.

Step 5: Pick some analysis metrics (KPIs if you are fancy)

These are the things I chose to analyze/look at:

  • Color palettes across title collections
  • Color palettes across viewer collections
  • Edges across title collections

Here viewer collection means the series of titles from one account. Like the picture below would be a viewer collection for our purposes.

A viewer collection example

Meanwhile, a title collection would be all the images of the promotional artwork for the same title, like the picture below.

A title collection example

In numerical simulation we might call this partitioning or identifying sample populations. A statistic would be an individual title within a title collection. But terminology can be silly these days, so we won’t super burden ourselves with it for our purposes.

You will also notice that I use the metric of Edges across title collections. If you are new to computer vision, “edge” refers to some of the elements we get when we do various types of “edge detection” on images.

It’s defined as: “Edge detection includes a variety of mathematical methods that aim at identifying points in a digital image at which the image brightness changes sharply or, more formally, has discontinuities.” (Wikipedia)

Basically, edge detection is a super helpful way of understanding what is going on in a photo, because it gives us the outlines. There are several different algorithms we can use to get different types of edges from images.
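To make “brightness changes sharply” concrete, here is a rough sketch of a Sobel-style edge detector in plain Python. This is not the code I used (I used OpenCV for Processing); the function name, the threshold of 100, and the tiny test image are all assumptions for illustration.

```python
import math

# Standard 3x3 Sobel kernels for horizontal and vertical brightness change.
SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
SOBEL_Y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]

def sobel_edges(gray, threshold=100):
    """Return a binary edge map (1 = edge) for a 2D list of 0-255 brightness values."""
    h, w = len(gray), len(gray[0])
    edges = [[0] * w for _ in range(h)]
    # Border pixels are skipped because the 3x3 window would fall off the image.
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = gy = 0
            for dy in range(3):
                for dx in range(3):
                    v = gray[y + dy - 1][x + dx - 1]
                    gx += SOBEL_X[dy][dx] * v
                    gy += SOBEL_Y[dy][dx] * v
            # Mark the pixel as an edge when the gradient magnitude is large.
            if math.hypot(gx, gy) >= threshold:
                edges[y][x] = 1
    return edges

# Tiny made-up image: dark left half, bright right half.
image = [[0, 0, 0, 255, 255, 255] for _ in range(5)]
edge_map = sobel_edges(image)
```

The only pixels flagged are the ones along the dark-to-bright boundary, which is the whole idea. In practice a library such as OpenCV does this much faster, and Canny adds smoothing and edge thinning on top of a gradient step like this one.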

Example of Canny, Scharr, and Sobel edges from the OpenCV for Processing library. Code here: https://github.com/atduskgreg/opencv-processing/blob/master/examples/FindEdges/FindEdges.pde and credit goes to Greg Borenstein, who is an awesome contributor to lots of cool open source code.

Computers can’t intuitively understand the visual world, that is why so much of computer vision is about making geometric representations for visual things that can be done at scale. This does admittedly mean computer vision is mostly Linear Algebra and Geometry.

As I tell my undergrads — matrices are good for the soul.

What I really like about edge detection is that it can give you an idea of the visual composition of an image — how busy it might be, where the subjects are, etc. They also can be represented in very convenient data structures for doing a variety of classification algorithms on.

I find it a really simple yet powerful way to also compare different images. I have done lots of image composites online, but you can also do composites with edges and get some neat results.
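A composite of edges can be as simple as averaging the edge maps pixel by pixel. Here is a minimal sketch of that idea, assuming every edge map has already been resized to the same dimensions and stored as a 2D list of 0s and 1s. The function name is hypothetical and this is not the code behind my earlier composites.

```python
def edge_composite(edge_maps):
    """Average same-sized binary edge maps into one map of per-pixel edge frequency."""
    n = len(edge_maps)
    h, w = len(edge_maps[0]), len(edge_maps[0][0])
    # Each output value is the fraction of images that have an edge at that pixel.
    return [
        [sum(m[y][x] for m in edge_maps) / n for x in range(w)]
        for y in range(h)
    ]
```

Values near 1 mean almost every image has an edge there, which is how shared layouts (like a title always sitting in the same spot) start to show up.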

Example from a project done for Ethan Zuckerman’s class where I took over 150 Vogue Covers and did composites of their edges. Viewable here.

For this article, I was using edge detection to understand the general image composition of various images to see if they had subjects centered, rule of thirds, and other image composition properties.
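One rough way to turn “is the subject centered?” into a number is to split the edge map into a 3x3 grid and see where the edges pile up. This is a sketch under my own assumptions (a binary edge map as a 2D list, a hypothetical function name), not a standard composition metric.

```python
def edge_density_by_thirds(edges):
    """Split a binary edge map into a 3x3 grid; return each cell's share of all edge pixels."""
    h, w = len(edges), len(edges[0])
    totals = [[0] * 3 for _ in range(3)]
    for y in range(h):
        for x in range(w):
            # Integer math maps each pixel to grid row/column 0, 1, or 2.
            totals[3 * y // h][3 * x // w] += edges[y][x]
    all_edges = sum(sum(row) for row in totals)
    if all_edges == 0:
        return [[0.0] * 3 for _ in range(3)]
    return [[cell / all_edges for cell in row] for row in totals]
```

A large share in the middle cell suggests a centered subject, while weight along the grid lines between cells is closer to a rule-of-thirds layout.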

With my organizing collections and questions set out, I went on to writing some OpenCV code.

Step 6: Do some OpenCV

(note: all the graphs and results are in Step 7, this is some context on how I did it and my process, but skip to Step 7 if you want the results)

Color palettes across title and viewer collections

Color quantization is the method by which we can extract palettes from images. It basically clusters all the pixels of an image by plotting each pixel in a 3D space based on its color value in a given color space.
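Here is a minimal sketch of that clustering idea using plain k-means on RGB pixels. It is not my actual quantization code; the function names, the farthest-point starting centers, and the default of 10 iterations are assumptions made so the example is small and repeatable.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two colors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans_palette(pixels, k=2, iterations=10):
    """Cluster (r, g, b) pixels with k-means and return the k cluster centers."""
    # Deterministic start: the first pixel, then whichever pixel is farthest
    # from the centers chosen so far.
    centers = [pixels[0]]
    while len(centers) < k:
        centers.append(max(pixels, key=lambda p: min(sq_dist(p, c) for c in centers)))
    for _ in range(iterations):
        # Assign every pixel to its nearest center.
        clusters = [[] for _ in centers]
        for p in pixels:
            nearest = min(range(len(centers)), key=lambda i: sq_dist(p, centers[i]))
            clusters[nearest].append(p)
        # Move each center to the mean of its pixels (keep it if the cluster is empty).
        centers = [
            tuple(sum(ch) / len(members) for ch in zip(*members)) if members else center
            for members, center in zip(clusters, centers)
        ]
    return centers
```

The returned centers are the palette: each one is the average color of a group of similar pixels. For real images you would feed in every pixel and likely use a library implementation for speed.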

I talked about this methodology a bit in my last piece, but basically it comes from Computer Graphics and how we represent colors.

RGB HSL HSV color spaces as cubes in geometric space. Credit: WikiMedia

Every color is usually a vector of red, green, blue (RGB) or hue, saturation, value (HSV), etc. Therefore, we can define a 3D geometric space where all these colors can be plotted, which is often called a color space.
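As a small illustration of the same color living in two spaces, this sketch converts an RGB vector to HSV with Python’s built-in colorsys module. The wrapper name and the choice to report hue in degrees are mine.

```python
import colorsys

def rgb255_to_hsv(r, g, b):
    """Convert 0-255 RGB to (hue in degrees, saturation 0-1, value 0-1)."""
    # colorsys works on 0-1 floats and returns hue as a 0-1 fraction of the color wheel.
    h, s, v = colorsys.rgb_to_hsv(r / 255, g / 255, b / 255)
    return (h * 360, s, v)
```

Pure red, for example, is the point (255, 0, 0) in RGB and sits at hue 0 with full saturation and value in HSV.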

I already have color quantization code, so I found the palettes for all the unique images of my titles, which I was able to do with a little bit of further data cleaning. I post the results of these color palettes in the next section.

Skin colors

Detecting the skin color of a subject is actually a nontrivial thing to do. I have been working on lots of different methods for this, so I won’t give too many details here because I eventually want to do a publication about it (after graduation, since it doesn’t fit within my current thesis work, and that is ok).

I also won’t actually be doing skin colors for this study, due to the sample size being so small and since so many of the photos don’t have people. I do want to do a larger skin color sample and study at some point though for Netflix or Hulu or maybe multiple platforms data — but it feels like a winter break project to scope and write and perform due to grad school.

Face detection

I was curious to see the faces across different viewer collections. I used OpenCV in Processing and some hand labeling. Basically if the algorithm said there wasn’t a face present, it was moved to another folder where I quickly checked it. This is because, as we know and as I have written about previously, a lot of facial recognition algorithms are biased and not always accurate.
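The “detector first, human second” workflow can be sketched like this. The `has_face` argument is a hypothetical stand-in for whatever detector you use (I used OpenCV in Processing, not this code), and the function and folder names are assumptions.

```python
import shutil
from pathlib import Path

def triage_no_face_images(image_dir, review_dir, has_face):
    """Move every image the detector rejects into review_dir for a manual check."""
    review = Path(review_dir)
    review.mkdir(parents=True, exist_ok=True)
    moved = []
    # sorted() builds the full list first, so moving files mid-loop is safe.
    for path in sorted(Path(image_dir).iterdir()):
        if path.is_file() and not has_face(path):
            shutil.move(str(path), str(review / path.name))
            moved.append(path.name)
    return moved
```

Only the rejected images get a second look, which keeps the hand labeling quick while still catching the detector’s misses.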

Unfortunately, there weren’t a lot of interesting things to report, these programs rotate, and some don’t really include faces, so I decided not to show them in this article. I do think next time I do one of these I will include faces with a larger dataset that makes more sense.

btw: I did an interview with Moiya McTier for her Exolore Podcast all about makeup and face science if you want more insight on face stuff in the meantime.

Edge detection and Image Composition

I used Canny edges for these images in Processing, like the picture in Step 5. I chose to do this to understand the image composition (i.e., where there are more edges and how busy the backgrounds are). Results are in the section below.

Step 7: Analyze what you found

Let’s go metric by metric. Please note, I didn’t want to show all of the viewer collections because of space and my own sanity and because many viewer collections had cropping errors so were incomplete — which is totally ok!

1. Color palettes across title collections

I think it is also important to note that most photos have LOTS of hues present, especially when they are in natural light and have lots of human subjects, because skin tones are often mixes of several hues. Meanwhile, photos that are more heavily edited or professionally lit often pick up reflections, shadows, and highlights across all the subjects, depending on the lights used, edits done, etc.

These stylistic decisions amongst different titles are super interesting. Color being used to highlight subjects or set a stylistic mood for the entire title.

2. Color palettes across viewer collections

I picked the following viewer collections for both the faces and color palettes so we could see the space. I found the results interesting and underwhelming as well.

All of the posters were quite dark, with muted and autumnal palettes, and there were one or two very colorful posters in each collection.
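If you wanted to put a number on “dark and muted”, one option is to average the saturation and value of each palette in HSV. This is a sketch of that idea with a hypothetical function name; I judged these palettes by eye rather than with this code.

```python
import colorsys

def palette_summary(palette):
    """Return (mean saturation, mean value), each 0-1, for a list of 0-255 RGB colors."""
    hsv = [colorsys.rgb_to_hsv(r / 255, g / 255, b / 255) for r, g, b in palette]
    n = len(hsv)
    mean_saturation = sum(s for _, s, _ in hsv) / n
    mean_value = sum(v for _, _, v in hsv) / n
    return (mean_saturation, mean_value)
```

Low mean saturation reads as muted and low mean value reads as dark, so the one or two colorful posters in a viewer collection should stand out on both numbers.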

Color palettes across 6 viewer collections, showing that there are a few saturated palettes per viewer collection, but often that the collections don’t include all the maximized saturation and colorful posters

3. Edges across title collections

Below are the Canny edges for each of the unique artworks for “The Haunting of Bly Manor”. As you can see, this effect can be sort of spooky, but we can see some trends when we aren’t as distracted by backgrounds and colors.

For example, we can see that this collection tends to center the subjects and tends to keep the titling fairly uniform. Overall the title isn’t the standout of the structure of the image — it is very structured on the subject — characters or the manor.

This is very different than the collection of artwork from “Yes, God, Yes”, which features prominent text, even on its more subject oriented pictures.

I think also that edges could be used more at scale to understand structural patterns in lots of title promotions and in context of various collections. For this sample size, edges can be super neat but they aren’t going to expose a lot of insights that we can’t tell from intuition or more manual methods.

Step 8: Think about the context and draw some takeaways

I don’t have as many deep insights this week as my last few articles. It’s also just a super busy week but I wanted to get this out before I went into inevitable November/December semester cram time. Overall I think this was a super fun exploration. I think it’s important to understand that this sample size is way too small to draw any statistically significant conclusions, but I hope people learned some stuff and that it made them think.

I think also next time I would focus on fewer titles and titles that do not rotate. I would also love to see what the titling is like in say the winter holiday season versus late October and to see what visual communication exists.

But yea. If anything I hope this made you think about the purposeful and customizable nature of the promotional content on streaming platforms.

I think I’ll be doing a more focused study around faces/skin color, etc later. But I am definitely open to ideas if there is something that folks want to see! Feel free to drop me a Tweet @ninalikespi or email at nlutz@mit.edu!

Misc FAQ

Nina, do you really think you will get a job with these?

  • I mean…maybe? Probably not? I am incredibly grateful to have been able to talk to some folks at Netflix because I really really like this space — it combines my love of visual communications and data science super well! And it’s not a bad time for tech/data science/arts people like me to get into the streaming universe.
  • But I am applying broadly. I have learned the hard way to never put all your eggs in one basket, so I am casting a wide net. Honestly, I am open to working at lots of different types of places so like…anyone can hire me please hire me lol.

So, why do you write these?

  • Honestly, these articles are super fun for me, and I’ve been struggling mentally during the pandemic a lot so I guess I am just doing what feels good? Especially on the weekends, when I do these, and it’s when I feel (and often am) the most alone.

Nina, how do you think of these?

  • I wish I knew lol. I just try my best. I think the crowdsourced component helps, and like tweeting updates while I’m writing. It makes me feel like people are expecting things of me which makes my brain go zoom.

Nina, why are the graphics uneven/bad/unaligned?

  • Because I chose to use pages because I keep thinking they’ll just be charts and tables and then they grow and I like making my life hard. It’s my toxic personality trait.

Nina M. Lutz is a student at the MIT Media Lab and thinks about human quantification, aesthetics, and data-driven visual communication. Previously, she studied Computer Science and Design at MIT for her undergraduate. She is currently working on participatory digital artworks regarding Sign Language for her MS thesis.

Please hire her after June 2021. She is so hireable and will do code and math and writing and art for you for compensation and health insurance.

Her biggest crime in 2020 is that she likes gatorade. Like actually likes it for the taste. She is currently being held together by TV and Wegman’s S’mores ice cream and her friend Kelsey who lets her scream to music at stop lights.
