How a small Chrome extension got the attention of a large media conglomerate (and not in a good way)

Isiah Lloyd
Nov 27, 2019

Raise your hand if you didn’t know that the popular website RateMyProfessors.com was owned by one of the largest media conglomerates in the world.

🙋‍♂️

And that’s where our story begins.

Katal

It was Spring 2018 and I was using Catalyst to register for classes. Catalyst is the University of Cincinnati’s student information system; students use it to register for classes, accept student loans, and a whole lot more. It is also widely regarded by students and faculty alike as a bit of a pain to use. While registering for classes, I found myself repeating the same tasks for every search, and that is why I started Katal. Katal was a Chrome and Firefox extension that was meant to be a “toolbox” for Catalyst.

A katal is a unit of catalytic activity, hence the name for the extension. (And no, I’m not that smart; when trying to come up with the name I just scrutinized the Wikipedia page for catalyst until I saw a word that sounded cool.)

The plan was to continually add different “tools” to it that made Catalyst easier to use. The first of these tools was to add a professor’s RateMyProfessors.com score right next to their name on class searches.

Katal added a color-coded score next to professors’ names

When students decide which section of a class they want to take, they usually ask themselves two questions: “How early is the class?” and “Is the professor any good?” The second question is usually answered by going to RateMyProfessors.com, a website that allows students to rate and leave comments about their experiences with professors. Each professor is assigned an aggregate rating between 1 and 5 based on these reviews. Katal made this process simpler by displaying the professor’s rating (color-coded too!) right next to their name in Catalyst. Users could also click on the rating to go to the professor’s page and read the reviews.
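To give a rough idea of what color-coding a 1 to 5 rating can look like, here is a minimal Python sketch. It is not Katal's actual code (the extension itself ran in the browser), and the function name, thresholds, and color names are all assumptions picked for illustration.

```python
def rating_color(rating: float) -> str:
    """Map an aggregate rating (1 to 5) to a display color.

    The cutoffs below are hypothetical; any real extension could
    choose different ones.
    """
    if not 1.0 <= rating <= 5.0:
        raise ValueError(f"rating must be between 1 and 5, got {rating}")
    if rating >= 4.0:
        return "green"   # assumed cutoff for a well-rated professor
    if rating >= 3.0:
        return "orange"  # assumed cutoff for a middling rating
    return "red"         # everything below 3


for score in (4.5, 3.2, 1.0):
    print(score, rating_color(score))
```

The point is only that the rating becomes a quick visual cue, so a student can scan a list of sections without reading every number.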

You can find the source code for Katal on my GitHub!

Viacom? Via-gone!

On January 15th I published the extension on both the Chrome Web Store and Firefox Add-ons. I shared it with a school club’s Slack and it got favorable reviews, but it couldn’t have gotten more than ten downloads. That’s why I was surprised when, on the afternoon of January 17th, I received an email:

FROM: RMP Support <*****@mtv.nanorep.in>

SUBJECT: Unauthorized Scraping of Viacom Media Network’s Data

BODY:
Hello,

Attached please find correspondence from RateMyProfessors.com.

Thank you,

Rate My Professors Support

When I first read the notification I was confused. While I knew who Viacom was, I wasn’t quite sure what I could have done to warrant their attention.

For those who don’t know, Viacom is a multinational mass media conglomerate whose properties may be very familiar to you, including Nickelodeon, MTV, Comedy Central, BET, CMT, and Paramount Pictures. It is currently undergoing a merger with CBS Corporation. As it turns out, for some reason, Viacom also owned RateMyProfessors.com under their MTVU brand from 2007 to October 2018.

Attached to the email was this Word Document:

Way to stay nameless, Viacom. It’s very spooky and asserts your dominance!

In the document, Viacom alleges that I scraped data from the RateMyProfessors.com website and even displayed user ratings (how dare I!), and that those actions violated RateMyProfessors.com’s Terms of Use. They then go on to demand that I delete the Chrome extension and confirm in writing that I will never again scrape RateMyProfessors’ data, publicly post its information, or otherwise violate its Terms of Use Agreement. If I failed to do so, they would forward the matter to their legal department.

Well, I was successfully spooked. I ended up following their orders even though I don’t believe they would have had a case (see next section), because I am a college student and it wasn’t worth fighting Viacom (which is known to be very litigious) over a small side project. I complied with their demands later that day, removing the extension from both the Chrome and Firefox stores.

To this day, there are a couple of questions that remain unanswered:

How did Viacom know of this extension only two days after I published it? I’m guessing that Viacom uses some kind of program that scrapes the internet for uses of their trademarks and my extension popped up (isn’t that ironic?).

How did Viacom find my school email? The complaint wasn’t sent to the email tied to the Chrome Web Store listing but rather to my school email. I’m not really sure how they got this address. I thought maybe I had created a RateMyProfessors account with that email and they looked me up in their system, but when I tried resetting the password with that email, no account existed. Creepy.

What is web scraping and is it legal?

There are two ways a website can gather data from another website for its own use. For the purpose of explaining these concepts I will call the website that has the data we want Website A, while I will call the website that is gathering this data Website B.

Application Programming Interface

A lot of websites develop an Application Programming Interface (API) that allows a third party to access their data. Website A provides certain methods to access its data in a way that a machine can interpret easily rather than a human.

An API is usually beneficial to both Website A and B. Website A can control which developers can use the API and cut off or rate-limit access if they become malicious. Having an API also reduces the bandwidth Website B will use to get the same data compared to web scraping.

Website B will prefer to use the API rather than web scraping because getting the data will be much easier and less likely to break without a warning.

An example of a website that offers an API is reddit. A normal user would get the subreddit /r/all by going to www.reddit.com/r/all. But reddit makes it easy to access the API endpoint for that subreddit by adding a .json to the end of the URL: www.reddit.com/r/all.json. Notice that this link takes you to a JSON document, a standard way to share data across applications.
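To show why Website B prefers an API, here is a minimal Python sketch of reading post titles out of a reddit-style JSON listing. The `SAMPLE` document is a hand-written stand-in that follows the listing shape (`data` → `children` → `data` → `title`), not a real response, and the function names and User-Agent string are assumptions for illustration. `fetch_listing` is included only to show how the document would be downloaded; the demo does not call it, since that needs network access.

```python
import json
from urllib.request import Request, urlopen


def fetch_listing(url: str) -> str:
    """Download a JSON listing as text (not called in the demo below)."""
    # The User-Agent value is a made-up placeholder.
    request = Request(url, headers={"User-Agent": "example-script/0.1"})
    with urlopen(request, timeout=10) as response:
        return response.read().decode("utf-8")


def post_titles(listing_json: str) -> list:
    """Return the title of every post in a reddit-style listing."""
    listing = json.loads(listing_json)
    return [child["data"]["title"] for child in listing["data"]["children"]]


# Hand-written sample in the same shape as a listing; not real reddit data.
SAMPLE = """
{"kind": "Listing", "data": {"children": [
  {"kind": "t3", "data": {"title": "First post", "score": 10}},
  {"kind": "t3", "data": {"title": "Second post", "score": 7}}
]}}
"""

titles = post_titles(SAMPLE)
print(titles)
```

Because the data already arrives as structured JSON, getting at the titles is a couple of dictionary lookups, with no HTML to dig through.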

Web Scraping

If a website doesn’t offer an API (like RateMyProfessors.com), another method to gather data from Website A is to “scrape” the website. This means that Website B has to download Website A just like a normal user would, including things that aren’t interesting to Website B like styling, JavaScript, and data that isn’t pertinent to Website B.

It’s harder for Website A to control Website B’s access because its requests look like a regular user’s. It’s also harder for Website B to web scrape because it has to parse the HTML (the language that makes up a webpage), and the HTML can change at any point without warning, causing the scraper to break.
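Here is a minimal Python sketch of that fragility, using only the standard library's `html.parser`. The two pages are invented markup, not RateMyProfessors.com's real HTML, and the `rating` class name is an assumption. The scraper works on the first version of the page and silently finds nothing once the markup changes.

```python
from html.parser import HTMLParser


class RatingScraper(HTMLParser):
    """Pull the number out of <div class="rating">...</div> (hypothetical markup)."""

    def __init__(self):
        super().__init__()
        self._in_rating = False
        self.rating = None

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the tag.
        if tag == "div" and ("class", "rating") in attrs:
            self._in_rating = True

    def handle_data(self, data):
        if self._in_rating and data.strip():
            self.rating = float(data.strip())
            self._in_rating = False

    def handle_endtag(self, tag):
        if tag == "div":
            self._in_rating = False


def scrape_rating(html: str):
    """Return the rating found in the page, or None if the markup does not match."""
    parser = RatingScraper()
    parser.feed(html)
    parser.close()
    return parser.rating


# Version 1 of the page: the markup the scraper was written against.
PAGE_V1 = (
    "<html><head><style>body{}</style></head><body>"
    '<h1>Prof. Example</h1><div class="rating">4.2</div></body></html>'
)
# Version 2: same information, but the site changed its markup.
PAGE_V2 = (
    "<html><head><style>body{}</style></head><body>"
    '<h1>Prof. Example</h1><span class="score">4.2</span></body></html>'
)

print(scrape_rating(PAGE_V1))  # 4.2
print(scrape_rating(PAGE_V2))  # None
```

Note that the scraper also had to download and walk past the styling and heading it did not care about, which is the wasted bandwidth mentioned earlier.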

Is Web Scraping Legal?

APIs are of course legal because they are a service that a website explicitly provides. The legality of web scraping, however, has been questioned in court for many years.

Notice: The legality of web scraping varies all over the world; in this article I will only discuss its legality in the United States. I am also not a lawyer, and the following is not legal advice, merely a discussion.

Companies usually argue that web scraping is illegal under the 1986 Computer Fraud and Abuse Act (CFAA). This law makes it illegal to access a computer without authorization. The problem is that “authorization” is vague: does putting a web page on the public internet give implicit authorization to anyone (or anything) also on the internet? Another question is whether the CFAA considers access unauthorized when it breaches a site’s Terms of Service, which is the basis on which Viacom claimed Katal’s access was unauthorized.

In 2012 the Ninth Circuit Court of Appeals issued a ruling in United States v. Nosal in which it refused to turn the CFAA “into a sweeping Internet-policing mandate.” Unfortunately, in two later cases the Ninth Circuit muddied its own ruling by deciding that password sharing and using other people’s accounts (with consent) to scrape data both violated the CFAA.

On September 9th, 2019 the Ninth Circuit made an important decision in hiQ Labs, Inc. v. LinkedIn Corp.

“HiQ Labs’ business model involves scraping publicly available LinkedIn data to create corporate analytics tools that could determine when employees might leave for another company, or what trainings companies should invest in for their employees. Perhaps because it intended to develop its own products that would compete with hiQ, LinkedIn served a cease and desist letter, stating it would implement technical measures to stop hiQ from accessing the website at all and relying on the Power Ventures case to argue that any further access to this public information would violate the CFAA. Rather than waiting to be sued, hiQ itself filed suit, obtaining a preliminary injunction in the district court, which found that hiQ was “likely to succeed” on its claims and holding that automated access to public information is likely not a violation of the CFAA.” — Electronic Frontier Foundation

The Ninth Circuit decided that using automated scripts to access public information is not the same sort of “breaking and entering” that the CFAA was intended to police and upheld the injunction.

Conclusion

From my research it seems like I didn’t have to take down Katal, but I’m not too angry about it. The experience has given me a fun story to share and the chance to research the intersection of the law and the internet.

Although I used web scraping just to make my life simpler, web scraping is very important in research and journalism. If web scraping were illegal, the investigative newsroom ProPublica wouldn’t have been able to distribute their Facebook Political Ad Collector, which allowed them to investigate what types of audiences political campaigns were targeting on Facebook, or to find out that companies were excluding older workers from job postings on Facebook. The Internet Archive wouldn’t be able to operate its Wayback Machine, which lets you go back to a website at different points in time. Even search engines like Google and DuckDuckGo would have trouble indexing the internet if web scraping were illegal. The Ninth Circuit made a very important decision regarding the internet that I hope is upheld.

I would like to thank the Electronic Frontier Foundation for its articles about the legality of web scraping. If you find this issue important (or many of the other issues they work on), please consider donating.

If you found this article interesting, you may find my last article on Reverse Engineering Tinder a good read!

Isiah Lloyd is a fourth-year Computer Science student at the University of Cincinnati. He is interested in full-stack web development, devops, and making tools that make people’s lives easier. You can find him on LinkedIn, GitHub, Twitter, and his personal website. If you have any questions or comments, feel free to respond to this post or email him at hi at isiah.me.

He is also looking for a Spring and Summer 2020 co-op. If you think he would be a good fit for your company please feel free to contact him!
