What I learned from R User Day at Data Day Texas
My key takeaways, and view of the R community as a whole
This past weekend I had the opportunity to be a speaker at R User Day (part of Data Day Texas). Besides giving my own talk, I spent the day listening to other R experts present. It was a great experience, and it felt amazing to finally meet so many of the people I had been following on data science Twitter. My biggest issue was that there were so many R talks that I couldn't attend them all. Before I forget everything, I wanted to share the highlights of what I learned (and thanks to Lucy D'Agostino for wrangling all the R talk slides into one place).
Deep Learning in the Real World, Lukas Biewald (Crowdflower)
Lukas' talk wasn't about R; it was part of the larger Data Day program. He cut through the marketing and got to what AI and deep learning are actually good at, and where they are being over-hyped. I'm excited to get a link to the video of this talk so I can show my marketing consultant coworkers a straightforward guide to the topic.
Using R for Advanced Analytics in MongoDB, Jane Fine (MongoDB)
Jane showed how you can connect a MongoDB database to R and use it for your analytics. As I mentioned in my talk, at Lenati, when we get large datasets to analyze in R, we store them in Azure SQL. I can see practical situations where it would be easier for us to use MongoDB. My next step is to actually try running MongoDB in Azure and see whether connecting it to R is as easy as she made it look.
Statistics for Data Science: what you should know and why, Gabriela de Queiroz (R-Ladies)
Gabriela went through the five key statistical concepts data scientists should understand. Having this list is useful to me personally for thinking about what to share with people when I help onboard them to our Lenati data science team. Often people coming into data science in industry don’t have a statistical background and the list of possible topics is daunting.
Pilgrim’s Progress: a journey from confusion to contribution, Mara Averick (RStudio)
Mara presented on how going from someone who uses other people's R resources (packages, answers, etc.) to someone who contributes and creates them isn't nearly as scary as it looks. Adding documentation, features, or answers to the R open source community is valuable, and everyone can do it. I have personally been terrified of digging into open source R code, so this talk inspired me to start trying to contribute to ggplot2.
Speeding up R with Parallel Programming in the Cloud, David Smith (Microsoft)
David presented on how you can easily spin up Azure Batch jobs to run R foreach loops in parallel in the cloud. This was great for me because I've struggled to figure out a practical way to run R in parallel. In fact, that problem was one of the main reasons we built the Lenati Loyalty ROI simulator in F# instead of R. I don't have an immediate need for batch parallel cloud computing in R, but at least now I know it's an option.
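For anyone who hasn't used foreach, here is a minimal local sketch of the pattern. It assumes the foreach and doParallel packages are installed and runs on two local cores; it is not the Azure setup from David's talk. The appeal of foreach is that the loop body stays the same and only the registered backend changes, so moving to Azure Batch should mean registering a different backend (the doAzureParallel package, as I understand it) rather than rewriting the loop.

```r
# Minimal local sketch: run a foreach loop in parallel on two cores.
# Assumes the foreach and doParallel packages are installed.
library(foreach)
library(doParallel)

cl <- parallel::makeCluster(2)   # start two local worker processes
registerDoParallel(cl)           # tell %dopar% to send work to these workers

# Each iteration can run on a different worker;
# .combine = c collects the per-iteration results into one vector.
results <- foreach(i = 1:8, .combine = c) %dopar% {
  sqrt(i)
}

parallel::stopCluster(cl)        # shut the workers down when finished
results
```

Swapping `%dopar%` for `%do%` runs the same loop sequentially, which is a handy way to debug the body before parallelizing it.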
Opinionated Analysis Development, Hilary Parker (Stitch Fix)
Hilary presented on how analysis development (creating an analysis to inform a decision) could be done more rigorously and effectively in many ways, and how we as a field need to create processes to make sure it is. I viewed her talk as the justification for my own: first we need to convince people of the importance of rigorous processes, and then we need to come up with actual processes that work in the real world.
Using R on small teams in industry, me (Lenati)
In my talk I went through how our team of five data scientists figured out how to work together as analysis developers, and how we interact with the business people around us. I covered storing data (Azure SQL and Dropbox), keeping code easy to trace (R scripts and R Markdown), and presenting to executives. Many audience members told me that simply seeing how a different company deals with the same issues they face was really informative.
We R What We Ask: The Landscape of R Users on Stack Overflow, David Robinson (Stack Overflow)
Dave broke down some R trends in the Stack Overflow data and showed how to access that data programmatically. It was shocking to see just how quickly TensorFlow has grown compared to other data science tools, which makes me appreciate that RStudio has made the keras package. Compared to Python, R isn't growing as quickly, but it is still being adopted in many new places.
The Lesser Known Stars of the Tidyverse, Emily Robinson (Etsy)
Emily ended the conference by presenting on some lesser known but very practical functions in the tidyverse. As an experienced R user I was aware of most of these, but I still was able to learn something extremely practical: a way of setting the theme for all ggplots in a markdown document. Given that the conference was for R users of all levels, it seemed like there was something for everyone here.
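I'm assuming the technique here is ggplot2's `theme_set()`, which changes the default theme for every plot created afterwards in the same R session; calling it once in the setup chunk of an R Markdown document applies it to all the plots that follow. A minimal sketch, with `theme_minimal()` as an arbitrary choice of theme:

```r
library(ggplot2)

# Set the default theme once (e.g. in the document's setup chunk);
# every ggplot drawn afterwards in this session uses it unless overridden.
theme_set(theme_minimal(base_size = 12))

# No theme_*() call needed here; the default set above is applied.
p <- ggplot(mtcars, aes(wt, mpg)) +
  geom_point()
p
```

An individual plot can still override the default by adding its own `theme_*()` or `theme()` call.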
Besides the great talks, what amazed me most was just how welcoming and warm the R community was. All the attendees, especially the speakers, were excited to help each other learn and grow. At other technology-oriented events I've attended, there is often a culture of posturing among attendees, such as people trying to make themselves look good by asking tricky questions. At R User Day, everyone was introducing each other, having welcoming conversations, and just being nice. It makes me extremely excited for the future of R as a language, because I think the community supporting a technology is really what motivates people to keep using it. I wish I could go to rstudio::conf next weekend; I just know I'm going to get FOMO after attending R User Day.