What it was like interning for RStudio

Yim Register (they/them)
Published in Bits and Behavior
8 min read · Oct 9, 2019
classic RStudio hexagon stickers including markdown, feather, dplyr, shiny, opensciR, tidyr, lubridate, %>%, rvest, and RStudio itself
Image of Spartan-like warriors standing over a field of possibly dead bodies, saying “How do we know if they’re actually dead or just pretending?” The second warrior shouts to the field “dplyr is pronounced ‘deeply R’”. All of the bodies on the ground start shouting back “No! dee-ply-R! deeply-R???” revealing that they are not actually dead.

From the interview to my view on company culture, here is my account of my summer internship at RStudio. I created a series of lessons to teach statistics and data science to software engineers, using software engineering data sources and academic research on software engineering practices.

If you don’t care about all the feelings and details, here’s the first draft of the project! Enjoy! ds4se.tech

The Interview

I get extremely nervous for interviews. It might have something to do with The Needs of Autistic Adults in Video Calling, which Microsoft Research so aptly captured. It might also have to do with my all-over-the-place background and skills, to the point where I don’t yet feel that I have a specialty in any one thing. But over the past few years, I’d been developing a love for R. I even pleaded with my Quantitative Methods class to give R a chance, reminding them that any new language can be annoying (turns out R is kind of annoying, but now I have a more nuanced perspective on why). So when I applied to RStudio, it was in the hope that I could combine my love of R with my PhD research in general machine learning literacy.

The interview was actually the most joyful interview I’ve ever had (I think it was a really good fit). Greg Wilson was my interviewer and eventual mentor. He was weird, in a maybe-we-are-from-the-same-planet kind of way, and he seemed to have a strong vision of what he wanted to create. The interview completely detoured into talking about some of my activist work at the University of Rochester, and I could tell that Greg had a lot of knowledge about community organization and institutional change. This was a person who would believe me, support me, and help me thrive. I will totally admit, I had no idea how to do the project he was asking me to do. And I told him as much. But I liked RStudio and I liked Greg and I wanted to become a better programmer and maybe, just maybe, start to figure out how to be a software engineer.

Greg emailed me a few days after our interview to try to call me (for goodness’ sake, Greg) to tell me the good news. I avoid phone calls like a good autistic millennial, so he has learned that now. I knew I’d be taking the internship several months in advance, which gave me enough time to rapidly try to learn everything I could about software engineering.

The Project

The project is called Data Science for Software Engineers, or “DS4SE”. It’s a series of hour-long RMarkdown lessons (and packaged data) that students can use to learn statistics on software engineering problems. These problems include “how many repositories are on GitHub and how fast is it growing?” and “how badly will your code suffer if you’re sleep deprived?”. The idea was to give Computer Science students and early-career software engineers a chance to learn data science on data they care about. And they would learn some software engineering research findings along the way. Over the course of the internship, I began to love this idea and project. But when it was first pitched to me, I had no idea why anyone cared about git repos or Agile development or test suites or whatever. I wrote research code, not industry code! But I learned the importance of this work before it even sank in for myself; when I told software engineers about the project, their eyes lit up and they got excited and said they’d love to see those lessons. I knew Greg was on to something, even if I didn’t totally get it. What I did get was how to teach statistics in a meaningful and gentle way. I personally find myself pretty funny, so I tried to incorporate that into the lessons. The goal was to make the learners laugh, reflect, feel confident, and feel excited. And of course, to learn some data science. By the end of the internship, I think we accomplished a really great first draft of that idea. Enough of a draft to show others what we envisioned, and enough to already start helping people learn some statistics and software engineering jargon.

The Preparation (for the Project)

I literally signed up for a Software Engineering Research seminar the second I accepted the internship. Knowing Greg now, I know I could have said “hey listen, I’m pretty terrified that I have no idea what software engineering even is and I don’t know what I’m doing and you made a huge mistake”. Instead, I vowed to read and study as much as I could before my first day on the job so that no one would regret hiring me. The first day in that seminar was incredibly frustrating. I hadn’t realized how little I knew about software engineering. I simply did not know the terminology or the practices at all. I’d written a few unit tests before, but that was about it. I had no idea what a “test suite” was (it’s just a bunch of unit tests except called a suite like a fancy hotel). We were reading a paper that mentioned the use of “fuzzers” and I just felt totally out of my element. (fuzzer: throwing random data at a program to try to break it and make sure you didn’t miss any weird edge cases or something). But I kept at it; Greg sent me lots of resources and my PhD advisor Amy Ko has a strong background in software engineering and research and computing education research. So I got resources, and I tried to prepare to do a good job. It’s also important to note that part of my preparation for the internship involved entering into trauma therapy. I do something called EMDR to heal from sexual violence. I wasn’t sure if I’d include this in this blog post, but it’s truthful and it was a huge part of my summer experience. So, you’re not alone if you’re trying to do it all while also trying to heal.

The First Day

The first day royally sucked. And in hindsight, it’s because I didn’t properly advocate for myself. Greg tasked me with the first lesson we wanted to make: How many repositories are there on GitHub? Well, my API knowledge was utterly crap on that first day. I’d basically used one API before and then given up because I didn’t understand permissions, what I had access to, how to query anything, etc. Currently, I’m in a course that I call “Bigger Data” (because I already took “Big Data” and this one was next) where I’m learning all the SQL, cloud computing, and API magic for data science. But literally the first SQL I ever wrote was on the first day of my RStudio internship, and I didn’t want to admit how little I knew. Turns out, I’m a fast learner. And even if I didn’t learn that stuff, I have skills in other areas.

Greg also sent me this photo on the first day, which made me realize that maybe everything was going to be okay:

Thor in pink sparkly outfit with Hello Kitty bows. I don’t remember what the conversation was about but I do remember feeling better.

The People

RStudio is a collection of intelligent misfits. Some of them left academia, and some of them left tech giants. Some of them came from startups, from teaching, or got scooped up because of their rambling Twitter accounts about #rstats. They are all incredibly odd in the best ways. Each person is very unique, with passion and commitment and excitement. I worked most closely with the Education team, which was filled with extremely clever and hardworking people who want to make the world a better place. They were always accommodating of my special needs, pretty good with my they/them pronouns, and willing to connect over “virtual coffee” to help me in my path. While RStudio does have a Boston office, we were mostly all remote. In one of our Education Team meetings, Carl Howe said “how was everyone’s commute to work today? no tripping over laundry on the stairs I hope!” (He’s a really kind and funny guy, who also went out of his way to learn about my sensory needs at conferences). The other interns were miraculous people. I’m an absolute fanboy over Desirée De Leon’s Teacups, Giraffes, and Statistics project. Dan Chen sent me his own book (that he wrote!) on Pandas for Everyone, which was funny because we were both working for RStudio. And Maya Gans and I became friends for life (she was Greg’s other intern and she created tidyblocks). Maya and I also had the amazing opportunity to go to Toronto to meet Greg and his family. And that was honestly one of the best experiences of my life. Also, RStudio funded our attendance at the International Computing Education Research conference, where I got to meet my computing education community and wear both my RStudio and PhD hats all at once.

The RStudio community was kind, open, weird, smart, and visionary. And I was very, very lucky to meet them. Not only because they helped me grow, but because they also showed me what a non-tenure-track job could look like. I still don’t know where I will end up after the PhD, but it was nice to get some insight into something different.

The Skills I Picked Up

This summer was almost like a skills bootcamp for me, and also an opportunity for me to create educational materials rather than go deep on the theory of how we should teach computing (also good). I needed a creative muscle flex after a year of theorizing and only reaching a couple dozen students with my machine learning literacy research (I’m sure I’ll get a chance to reach more in the future). I’m going to list in bullet-point form the skills I managed to pick up during my RStudio internship:

  • intro to SQL
  • waaaaaay better at using git, collaboratively and productively
  • got way better at group Zoom meetings (video conferencing)
  • teaching skills and exercises and how to make sacrifices and hard choices when designing at scale
  • some API/cloud work
  • ggplot2_skill ++
  • dplyr_skill ++
  • software engineering knowledge, conceptual and practical
  • playing nice with others
  • explaining statistics and envisioning what lessons could look like at large scale
  • company internals/business things I didn’t know about
  • how to work remotely
  • how to rid my work of excess exclamation points
  • a hopefully long friendship with Greg and Maya, who give me hope for humanity and inspiration to be myself

Next Steps

I’m back at PhD school now, working on interventions for developing machine learning literacy using personal data. I’ll also be doing a bit more with Stack Overflow data, looking into the machine learning and “big data” communities there and their social dynamics and discourse. DS4SE still has a long way to go, but now the vision is partially out there. I hope to return to it soon, evaluate some of its effectiveness, and finish enough of it that we can begin to circulate it. I’ll be at rstudio::conf 2020 both as a volunteer (equipped with some gender, accessibility, and sensory guidelines for all!) and with a quick Lightning Talk about the project. Hope to see you there.
