I love data — and I broadcast that fact pretty widely.
If you’ve attended a party with me recently, I apologize for talking your ear off about data visualization tools for the web or the cool R package I was playing with recently.
If you play fantasy sports with me, you’re welcome for the charts. So many charts.
Perhaps unsurprisingly, that has led to me hearing this question more and more often:
“Nick, I want to get into data analysis and visualization, where should I start?”
Unfortunately there is no perfect one-size-fits-all solution — everyone’s needs are a little different, and what folks already know can vary widely. One of the things I love about the current technical/educational/business climate is that smart people from very different career paths and backgrounds are saying to themselves “I need to get more into data.”
If that describes you, this is where I think you should start, organized from “no coding” up through “I ♥ code”.
First, if you haven’t pushed Excel’s boundaries, it’s worth doing. Seriously. Learn pivot tables at least. It may sound lame, but Excel can do a lot more than people expect. It can even make pretty charts if you try hard enough.
If you have some data already and just want a good tool to explore it visually or to export more compelling charts, Tableau is incredibly popular and powerful. There is a free public version and a very expensive paid version, which you can get for free as a student. It can publish to the web, or export static graphics to include in research papers, post to Instagram, or print out as giant wall-sized charts. The Tableau Public website has a lot of quality examples posted for you to get inspiration from.
Sadly, the next “No coding” tool I like to recommend, Infoactive, is shutting down…but on the bright side it is because they were acquired by Tableau. This hopefully means good things for Tableau Public in the future. I will plug a free book spearheaded by the Infoactive team that is useful background on data visualization design using any of the tools I cover here:
If I were picking one single programming language to use solely with data I would pick R. It’s free, supported by tons of ongoing development adding useful packages on top of the base language, and there are great free resources to learn it. First among those resources — I cannot recommend these Coursera classes highly enough:
Data Science Certificate | Coursera
Become an expert with the Data Science Specialization offered by Johns Hopkins University. To be a Data Scientist, take free…
Taking all of them might be overkill for a true beginner, but the track of classes walks a nice line from the introduction of key data science terms and ideas, through exploratory data analysis (which covers useful packages for R like ggplot2, a very popular visualization tool), all the way to adding interactivity, publishing to the web via Shiny, and storytelling with data.
R is what I use most frequently for small, quick analyses and ad hoc visualization — if you’ve got a dataset that Excel is struggling with (too big, not flexible enough, poor visualizations), R is perfect for exploring quickly.
This is also the time for a quick “yes, you should probably learn some SQL.” SQL is very targeted in scope compared to R (really, it’s far from an apples to apples comparison)—but if there are databases that you need to dive into to gather data for use with any of these other tools or languages, there is a good chance you’ll want to know SQL, and it will pay dividends in the long run.
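To give a flavor of what that looks like, here is a minimal sketch that runs a SQL query from Python using the built-in sqlite3 module. The table, column names, and numbers are all made up for illustration; a real database will have its own schema and will probably not live in memory.

```python
import sqlite3

# In-memory database with a made-up table, purely for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE scores (player TEXT, week INTEGER, points REAL)")
conn.executemany(
    "INSERT INTO scores VALUES (?, ?, ?)",
    [
        ("Ann", 1, 12.5),
        ("Ann", 2, 20.0),
        ("Bo", 1, 8.0),
        ("Bo", 2, 15.5),
    ],
)

# The SQL itself: total points per player, highest total first.
query = """
    SELECT player, SUM(points) AS total
    FROM scores
    GROUP BY player
    ORDER BY total DESC
"""
totals = conn.execute(query).fetchall()
conn.close()

for player, total in totals:
    print(player, total)
```

The part worth learning is the SELECT / GROUP BY / ORDER BY pattern in the middle; the same query would work largely unchanged against most SQL databases, and the result can be handed off to whichever charting tool you prefer.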
I ♥ code
More often than not, the question of “where should I start?” comes in response to a fantastic interactive visualization presented on the web. I’m a huge fan of all the recent innovation in this area (see my in-depth survey of innovative work here).
Unfortunately, if you really love this piece:
A Visual Introduction to Machine Learning
Let's revisit the 73-ft elevation boundary proposed previously to see how we can improve upon our intuition. Clearly…
…it can be disheartening to find out how much you have to learn to be able to build your own. It’s worth reiterating up front that “being as good as the New York Times” is a tough goal. A worthy one, but tough.
Fortunately, there are many great resources to help.
Mike Bostock’s website is a gold mine of D3 examples and tutorials (you can’t beat learning from the creator of the library…). I’d also recommend Interactive Data Visualization for the Web by Scott Murray, which you can either buy from O’Reilly or work through for free online:
Interactive Data Visualization for the Web
This is a book about programming data visualizations for nonprogrammers. If you’re an artist or graphic designer with visual skills but no prior experience working with data or code, this book is for you. If you’re a journalist or researcher with lots of data but no prior experience working with visuals or code, this book is for you, too.
One area to call out as a particular strength of D3 is geospatial visualizations. D3 is great at creating maps of many flavors, and there are nice dedicated tutorials available if that’s your area of focus:
Let's Make a Bubble Map
My previous Let's Make a Map tutorial describes how to make a basic map with D3 and TopoJSON; now it's time to cover…
D3 can be difficult to use directly, but there are many tools you can use on top of it to make your life a little easier. I’d recommend learning at least the basics of D3 rather than only using a more abstract plotting library, but if that proves intractable, a tool like Plot.ly can help make things feel more approachable.
Finally, if you really want to learn a do-it-all programming language that just happens to be great at data visualization, go with Python. Python is the most general purpose and powerful tool of anything I’ve listed, and it’s quite popular in the data science community.
I find Python very approachable as a multi-purpose programming language, but in truth it is probably overkill if all you want to do is explore and visualize data. YouTube is built with Python, for example…1 million lines of it. If you do go the Python route, the Codecademy course is a short (10–20 hours) and fun introduction to the language.
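For a sense of how approachable it can be, here is a small sketch that reads some CSV data, averages it by group, and prints a crude text bar chart, using only Python's standard library. The teams and scores are invented for illustration, and in practice you would likely read a real file and reach for a proper plotting library instead of printing '#' characters.

```python
import csv
import io
from collections import defaultdict
from statistics import mean

# Made-up CSV data standing in for a real file on disk.
raw = io.StringIO(
    "team,week,points\n"
    "Red,1,90\n"
    "Red,2,110\n"
    "Blue,1,100\n"
    "Blue,2,80\n"
)

# Group the points by team.
points_by_team = defaultdict(list)
for row in csv.DictReader(raw):
    points_by_team[row["team"]].append(float(row["points"]))

# Average score per team.
averages = {team: mean(points) for team, points in points_by_team.items()}

# A crude text bar chart: one '#' per 10 points of average score.
for team, avg in sorted(averages.items()):
    print(f"{team:<5} {'#' * int(avg // 10)} {avg:.1f}")
```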
Phew. That’s a lot. Have fun — and if you build something cool, send it my way! Find me anytime @uptownnickbrown on Twitter.
Find more of my writing here on Medium:
The past and future of data visualization
A dive into the invention of key chart types and what innovations are coming next
The Story:System Spectrum
Last week, Bloomberg published What Is Code? a magnum opus by Paul Ford. Meticulously crafted in both prose and…
Or at http://uptownnickbrown.com/: