Mike Bostock to humans: 'Try to look for small problems first'

The Reddit AMA with the ex-NYT graphics editor and creator of D3.js is simply the best motivational piece on the Internet for those who want to work with data-driven documents

by Sérgio Spagnuolo - editor of Volt Data Lab, a data-driven news agency based in Brazil — follow him on Twitter (@ProjetoStock)

If you are a data journalism nerd, or even a small-time enthusiast, you surely know who Mike Bostock is. But if you don't, I will tell you: he is the guy who made that cool data visualization you like so much possible.

As the creator of the JavaScript library D3.js, this guy is admired throughout the data journalism community and beyond. In May, Bostock announced he was quitting his coveted job as a New York Times graphics editor to "work full-time on visualization tools!!" (second exclamation mark is mine), notably the next major release of D3.

It takes some pretty freaking serious full-time work for someone to quit a dream job at the NYT, right? Well, my guess is that a person so well known in such a specific, yet vast, field must have things covered for the long run. It is not like he would be out of a job if things went south (although I doubt they will in this case).

Well, on September 8, 2015, Bostock answered several questions asked by data junkies in a Reddit AMA on the nice Data is Beautiful page (yes, I know I am late with this post).

Although sometimes he got quite technical, this AMA is maybe the best motivational piece on the web for people who want to work with data.

As I myself am one of those people, I highlighted some parts of the AMA for you to immerse yourself in this world. Yes, you! You nerd.


Mike Bostock, Reddit AMA

On good ways to use data visualizations

"The data dictates whether there’s a worthwhile graphic to go with the story, so focus on data-gathering and analysis before diving into fancy (interactive) graphics. Charts shouldn’t be about making the story more eye-catching, but about communicating more efficiently — meaning, showing a pattern in the data that would be too laborious to describe in prose."

On Microsoft and big corps using open-source tech

The adoption of open-source technology by large companies can often have a positive effect on the technology, either through external validation (“if Big Co. uses it, I should too”) or through contributions. Though as we learned through OpenSSL and similar cases, it’s easy for free things to be taken for granted, and finding a way to make open-source development sustainable is a challenge. It’s easy to have open-source side projects, but it’s a lot harder to find multiple people able to work full time, for free.

On discipline and hard work

I find it to be the easiest thing in the world to work on something if you are passionate about it, and you can break it up into small pieces (like examples) that you can publish and share with others for external validation. So probably, choosing to work on things you are excited about, and then finding space to avoid distractions or interruptions is the key.

On how to handle data and use data viz software

If you’re just trying to find insight for yourself, then an exploratory tool (like Tableau, or R, if you don’t mind writing code) is probably best. If you’re trying to communicate insights to others, then a slower tool that allows greater expressiveness (like D3) could be more appropriate. The NYT graphics department is probably the most extreme case of the latter, where you invest heavily in a single graphic because it reaches a very large audience.

On newcomers learning D3

I recommend patience. To become proficient, you will need to master multiple skills: data collection and cleaning, quantitative analysis, visualization design, programming, web development, etc. It’s tempting to want to learn all of these things and do something amazing in a very short time frame — like, say, during a hackathon — but really the best approach is to be diligent and methodical. Keep practicing, keep tinkering with smaller problems, and you will gradually improve. I studied Computer Science as an undergrad, so certainly I had a lot of help. And before that, I played with Hypercard, Macromedia Director (Lingo), TI-82 graphing calculators, HTML, etc. I second the recommendation of Scott Murray’s tutorials (and his book), but probably the best thing you can do is to think of small coding problems that you are comfortable solving, and then increasingly ramp up to larger problems as you go. The satisfaction you derive from solving the smaller problems will motivate you to keep going.

On corporate use of D3

I think for bigger projects (like, enterprise applications built and maintained by larger teams, and not one-off graphics that are developed by one or two people over a couple weeks) you likely want an abstraction on top of D3 to help you keep things organized. I’ve seen a number of successful projects integrate Ember with D3, so that’s an option as well as React. I don’t have a lot of experience in this area, so I haven’t formulated a strong opinion as to which application framework I like best; but, I do think it’s important to keep D3 small and decoupled, so that it integrates well with whatever you want to use. I’ve not spent enough time playing with Vega to say anything insightful. The strictly declarative approach is certainly attractive, particularly for other tools that can generate Vega as output. Though as someone who does a lot of bespoke work, you can pry the DOM from my cold, dead fingers.

On becoming a strong enough coder to use D3

Try to look for small problems first, the sort of thing you can solve once per day, whenever you have time. The rewards from early victories are strong motivation to keep going.

On why D3 is so hard to learn

First, many people that are learning D3 are also learning web development at the same time. If you’re learning programming, JavaScript, HTML, CSS, and SVG simultaneously — and you’re also learning how to transform data and design visualizations — that’s a lot to take in. This is true more so of D3 than other visualization libraries because D3 is “representation-transparent”, which means that it exposes the web standards (such as SVG) rather than providing a smaller, specialized abstraction. The trade-off is that time spent learning standards will be more valuable in the long run, since it will apply to whatever tool you use. And of course D3 is more expressive. Second, D3 is, well, different. The data-join in particular. The difficulty you experience is you forcing your brain to change its perspective on the problem. It’s not just about making you work more efficiently; it’s about looking at the problem differently. The advantage is that once you have that A-HA! moment, you work more efficiently from then on. So it’s hard now, but it’s worth it in the end.
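For readers who have not met the data-join yet, here is a rough sketch of the idea in plain TypeScript. It is only an illustration, not D3's implementation or API: the dataJoin helper, the JoinResult type, and the sample keys are all made up for this post. The point is the change of perspective Bostock mentions: instead of looping and creating elements one by one, you compare the data against what is already on the page and get three groups back (new items to add, matching items to update, leftover elements to remove).

```typescript
// Hypothetical helper illustrating the idea behind a data join.
// This is NOT D3's code or API, just a sketch of the concept.
interface JoinResult<T> {
  enter: T[];     // data items with no matching element yet (to be added)
  update: T[];    // data items whose key matches an existing element
  exit: string[]; // keys of existing elements with no matching data (to be removed)
}

function dataJoin<T>(
  existingKeys: string[],
  data: T[],
  key: (d: T) => string
): JoinResult<T> {
  // Keys present in the incoming data.
  const incomingKeys = data.map(key);
  return {
    enter: data.filter(d => existingKeys.indexOf(key(d)) === -1),
    update: data.filter(d => existingKeys.indexOf(key(d)) !== -1),
    exit: existingKeys.filter(k => incomingKeys.indexOf(k) === -1),
  };
}

// Elements "a", "b", "c" are already on the page; the new data has b, c, d.
const result = dataJoin(
  ["a", "b", "c"],
  [
    { id: "b", value: 5 },
    { id: "c", value: 7 },
    { id: "d", value: 9 },
  ],
  d => d.id
);

console.log(result.enter.map(d => d.id));  // items to add
console.log(result.update.map(d => d.id)); // items to update
console.log(result.exit);                  // elements to remove
```

In D3 itself this bookkeeping is handled by the selection's data method and its enter and exit selections, so you describe what the page should look like for each group rather than writing the comparison yourself.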

Hope this was useful for some. If you like this piece, please recommend it.

You can check my site www.voltdata.info
and follow me on Twitter: @ProjetoStock