Final Report for Outreachy Internship

Zareen Farooqui
Becoming a Data Analyst
Mar 4, 2017

It’s crazy how fast the past 3 months have gone by: I’m already in the final week of my Outreachy internship with the Wikimedia Foundation!

What I’ve been up to

The week after the Wikimedia Developer Summit, I spent 3 days working from WMF headquarters in San Francisco. It was great to meet and work with some of the team in person, especially my mentor, Tilman Bayer, a senior analyst at WMF. The office is spread across a couple of floors, with lots of meeting rooms and open spaces for people to collaborate.

Presenting at WMF about my journey into data analytics

I gave a talk about my journey, “Becoming a Data Analyst”, at a PyLadies San Francisco meetup hosted by WMF. About 70 people attended, the largest crowd I’ve ever spoken in front of! I didn’t know until just before the presentation that it would be recorded, but now I’m glad I can review my presentation skills. Given that I had almost no prior public speaking experience, I’m happy with my talk. However, there are a couple of things I want to work on, such as not swaying around as much and saying “uhm” less often. The recording also lets me help others by sharing my experience learning data analytics, which was one of my main goals when I started this blog. I wanted this talk to be useful for people who aren’t even sure where to begin learning, and it was really gratifying to have people come up to me afterwards (and later, people I didn’t know message me on LinkedIn) to say it did exactly that.

For the past few weeks, I’ve been developing and exploring a new privacy-friendly retention metric to help WMF understand how often readers visit the site. This metric uses existing instrumentation (introduced fairly recently for the unique devices metric, which launched in January 2016) to calculate statistics on the time span until WMF sees a device (desktop or mobile phone) return to the site. It’s been interesting work and I’ve learned a lot about web analytics. I’ll write a separate blog post about this project soon.
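To make the idea of “time until a device returns” more concrete, here is a minimal Python sketch of the kind of calculation involved. It is not WMF’s actual pipeline: it assumes each record pairs the date of a device’s visit with the date of that device’s previous visit (or nothing, for a device not seen before), and the function names and sample data are hypothetical.

```python
from datetime import date
from statistics import median

# Hypothetical records: (visit_date, previous_visit_date).
# previous_visit_date is None when the device has not been seen before.
visits = [
    (date(2017, 2, 1), None),
    (date(2017, 2, 2), date(2017, 2, 1)),
    (date(2017, 2, 9), date(2017, 2, 2)),
    (date(2017, 2, 10), date(2017, 1, 11)),
]


def days_until_return(records):
    """Return the gap in days for every visit made by a returning device."""
    return [
        (visit - previous).days
        for visit, previous in records
        if previous is not None
    ]


def share_returning_within(gaps, days):
    """Fraction of observed return gaps that are at most `days` long."""
    if not gaps:
        return 0.0
    return sum(1 for gap in gaps if gap <= days) / len(gaps)


gaps = days_until_return(visits)
print(gaps)  # [1, 7, 30]
print(median(gaps))  # 7
print(share_returning_within(gaps, 7))
```

Note that this summarizes return gaps per visit, so devices that visit often contribute more gaps than occasional visitors; summarizing per device instead would be a different design choice.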

I also started work on a new privacy-friendly engagement metric, which aims to understand how long readers engage with the website by measuring the time a reader has the site open in a browser window. So far, I’ve vetted the data quality of the new table for this metric, which has already been useful to the product team’s developer and product manager. However, this project won’t be complete before the end of my internship, so I’ll likely have to hand it off to the team.

Reflections on the internship

Overall, I think the Outreachy program is fantastic and would highly recommend working with WMF (by the way, applications for the next round are now open). I felt that my projects were useful to the community and at the right technical level for me. I was never bored with my work, but I also wasn’t totally overwhelmed with the technical complexity. In Python, I gained a deeper understanding of the Pandas and matplotlib libraries. In SQL, I learned a lot more about aggregate functions and nested subqueries. I also became familiar with Hive and Hadoop.

One part of the internship I especially enjoyed was working with global data. There are currently 295 Wikipedia language editions. Although I didn’t interact much with members of WMF’s global communities, the work required me to think on a global scale. For example, while working on the article section headings project, I learned how other language editions phrase similar headings differently based on their language norms.

This was a fully remote internship. Thankfully, WMF makes it easy to work from anywhere in the world, keeping people connected with tools such as IRC, Google Hangouts, Blue Jeans and Etherpad, by live-streaming meetings on YouTube, and by documenting meeting notes.

Four times a week I had a check-in with Tilman via IRC, and once a week we had a Google Hangout. This worked really well for reporting progress and blockers, asking questions, and discussing projects. Outreachy only requires mentors to check in with interns twice a week, but I found daily check-ins extremely helpful. About every other week, I also had a Google Hangout with Jon Katz, the Reading Team Product Manager, to discuss broader issues outside my day-to-day activities (general questions about WMF, how we felt the internship was progressing, etc.). Although these calls weren’t related to my project work, it was nice to have dedicated time for other discussions.

There are many obvious perks to working from home (mainly that I don’t have to get out of sweatpants), but sometimes it’s weird to be home alone all day. Occasionally I worked from coffee shops, but I often found the Wi-Fi issues weren’t worth it. I’ve been working from home for over a year now, and to balance this out I schedule lots of evening activities, such as fitness classes, events with friends and family, and meetups. It can also be strange to work with people without getting to know them on a personal level; that happens naturally when working together in person, but it’s much harder on remote teams.

The timeline for my project work was flexible, so I was able to prioritize the most important work, but unfortunately I won’t finish all the projects originally planned. Part of this was due to unexpected delays, such as system issues and maintenance, and to tasks taking longer than planned. Some of it was also a deliberate decision to spend more time on projects that were going well, to make sure I did a thorough job. I’ve documented everything for a smooth transition and will also spend a few hours over the next couple of weeks finishing additional work.

Personal Takeaways

During my internship, I identified two things I want to get better at. The first is sharing my work in progress. I secretly worry that someone will judge my work before it’s complete, because it isn’t perfect yet. Of course, this is silly: sharing work in progress is important for catching early mistakes that could affect the results, brainstorming additional ways to solve a problem, and getting feedback. Since I had check-ins every day, I had to share work in progress, and I’ve gotten more comfortable with it. Most work at WMF is done in the open, with lots of communication from others (who might not be working directly on that project), and I’ve seen how much this can help a project succeed.

I also realized that I sometimes obsess over small details that seem strange but won’t really affect a project’s overall results. Data is messy and there are almost always anomalies, but it’s important to know when to dig into them and when my time is better spent on another task.

Overall, this experience was totally amazing. I worked on one of the ten most popular websites in the world, improved my technical skills, got better at communicating and documenting my work (not something I had to do much when studying on my own last year), had fun, and got paid to do it all! I want to thank the GNOME Outreachy program for making this possible and for helping women all over the world break into open source projects, sharpen their tech skills, and gain confidence in their work. I also want to thank WMF, and specifically the Reading Team, for accepting me as an intern and supporting this program. Last, I want to thank Tilman for the guidance and time he dedicated to my internship over the past few months. I’m incredibly grateful to have had this opportunity!
