Data Science and R: how do I start?

Jesse Maegan
Jan 4, 2018


It always starts with a DM on Twitter: someone shares their personal data science ambitions and where they currently are in their plans, then follows up by asking me to help them figure out where to go next.

I love these messages — they’re an affirmation that the R community continues to grow and attract new members in part by creating a welcoming and supportive space for beginners, and that our community members are deemed approachable (enough) for someone brand new to R to reach out!

These messages used to be infrequent enough that I would spend time writing a tailored response to each individual, but over time they have become so frequent that I can't respond with as much thought and attention to detail as I would like. Rather than offer up a generic response, I've created this post for you, the data science beginner who wants to learn R!

we’re so glad you’re here!

But what about [resource]?!

I get it, we all have our favorite resources! This list is by no means meant to be comprehensive. It's intentionally biased toward the resources that I've personally used, but that doesn't mean they're the only resources out there!

You’ll also notice that this list leaves out things like deep dives into Machine Learning and Artificial Intelligence, and that’s intentional. This list is aimed at someone literally starting out with data science and R — complex topics in ML/AI will be there for them later on in their learning path.

there is no shame when it comes to awful puns

If your math skills need some love

Linear Algebra and Calculus — videos

Statistics — videos

Statistics — books

me, trying to figure out how to split the check 6.2 ways

If you’re new to R

R — books

R — other awesome stuff

I aspire to this kitten’s level of greatness

If you’d like to explore Computer Science offerings

what I imagine an MIT capstone project involves

On learning to learn

From open house to home ownership

I had a couple of amazing Chemistry professors in undergrad, and I will never forget how one of them phrased learning:

The first time you encounter a piece of information is like going to an open house. You don't know if you're going to rent that house or buy that house, or even if you're going to like that house.

The next few times you encounter that same information is like renting a house — you’re committed for a relatively short amount of time, but you’re not necessarily in it for the long haul. You don’t own it, and you can’t really make any changes.

Once you’ve really learned something, you’ve bought the house. It’s yours. You can knock down walls and landscape the yard and you don’t have to ask permission to do so, because you own the house.

When you first encounter something in data science and/or R, it’s 100% OK to forget it 20 minutes later. At first blush, you don’t know if you’ll ever need this information again! So why go through an online course or a textbook once, expecting to extract all of the information?

Instead, try getting comfortable with the fact that you’re going to forget a lot of things when you first start out — but the more that you read and re-read and learn and practice and apply what you’ve been learning, the more you’re going to remember.

Let go of perfection

It’s so easy to use perfection as procrastination — you don’t have the correct plan for documenting your learning, you don’t have the right color pen, the music is too loud, whatever. These are all excuses.

So what if you don’t document every moment of your learning process on a personal blog that you created using blogdown? So what if your notes are scattered in a couple of different notebooks and half-used GitHub repos? Who cares if you even take notes?!

Commit to yourself every day and show up and do the work — if a course or book or video isn’t working for you, try something else! This is your journey — you get to decide how you get there.

My daily learning habits:

  • Read through and engage with the #rstats hashtag on Twitter a couple times a day
  • Read through the tidyverse section of the RStudio community site once a day (I’m personally working on getting better at the tidyverse, but feel free to substitute in any tag that is of interest to you!)
  • Spend two hours a day working on content knowledge such as statistics, linear algebra, calculus, or computer science
  • Code for at least 30 minutes a day in addition to what I do for work
  • Engage with the R for Data Science Online Learning Community once a day

you've got to practice! work on cultivating discipline instead of waiting for motivation.

Questions that I can’t answer for you

These are all potentially life-altering decisions that I am in no way qualified to help you answer. These are, however, great conversations to have with people who know you well.

  • What degree should I get?
  • Should I do a data science bootcamp?
  • Should I drop out of school?
  • Is an advanced degree worth it?
  • Should I change my major?

this will be my new response GIF when you ask me any of the above questions

What now?

Share your success story!

One of my all-time favorite things is when someone shares with me their recent R success story — whether something has finally clicked, you learned a small trick that’s made your workflow more efficient, or you’ve gotten your first data science job — tell me about it on Twitter! I’m starting to collect all of the amazing success stories out there, and would love to add yours. And remember, no accomplishment is too small!

Blog about your experience

You have something to say — you’re alive on this planet commanding a tiny metal box to do your data science bidding, and that’s awesome. Now tell the world!

you’ve got this!


Jesse Maegan

molecular biologist turned public school teacher before falling in ❤️ with non-profit data science. perpetual #rstats noob.