Overscripted Web: a Mozilla Data Analysis Challenge
Two cohorts of Canadian undergraduate interns worked on data collection and subsequent analysis. The Mozilla Systems Research Group is now open sourcing a dataset of publicly available information that was collected by a Web crawl in November 2017. This dataset is currently being used to help inform product teams at Mozilla. The students' primary analyses focused on:
- Session replay analysis: when do websites record your behavior on the site so that it can be replayed?
- Eval and dynamically created function calls
- Cryptojacking: the websites that use visitors’ computers to mine cryptocurrencies are mainly video streaming sites
For a longer description of the analysis, take a look at Mozilla’s Hacks blog.
The Data Analysis Challenge
We see great potential in this dataset and believe that our analysis has only scratched the surface of the insights it can offer. We want to empower the community to use this data to better understand what is happening on the Web today, which is why Mozilla’s Systems Research Group and Open Innovation team partnered to launch this challenge.
To guide thinking, we’re dividing the Challenge into three categories:
- Tracking and Privacy
- Web Technologies and the Shape of the Modern Web
- Equality, Neutrality, and Law
All the information you need to join is on the Challenge website. Submissions close on August 31st, and the winners will be announced on September 14th. We will bring the winners of the best three analyses (one per category) to MozFest, the world’s leading festival for the open internet movement, taking place in London from October 26th to 28th, 2018. We will cover their airfare, hotel, admission/registration, and, if necessary, visa fees in accordance with the official rules. We may also invite the winners to give 15-minute presentations of their findings.
We look forward to diverse and innovative approaches from the data science community, and we especially encourage young data scientists and students to take a stab at this dataset. It could be the basis for your final university project, and analyzing it can grow your data science skills and build your resumé (and GitHub profile!). The Web gets more complex by the minute; keeping it safe and open can only happen if we work together. Join us!