Stories by Zan on Medium

Be as Cautious as Possible for the next 14 days

Zan — Sun, 15 Mar 2020 00:35:06 GMT

This is the letter I wrote to family and friends this morning. Within 20 minutes I’d gotten texts from my friend who is an ER doctor and my friend who is a public health nurse. They both said Thank You.

With them in mind, I’m sharing this more widely here. (with updates, as I learn more)

Hi friends and family,

Nothing like what is happening right now has happened in my lifetime. It’s hard to wrap my head around how different this is from what is normal.

You know I’m not a public health expert or a medical professional.

That said, these recommendations don’t rely on any specialized knowledge. They are based on a bit of math (exponential growth), data from trusted sources, and framing how we think about upside/downside risk during a very limited time frame (14 days).

I’ve put together this document with the best information I’ve seen so far (as of March 14th).

I’m sharing this with you because I want you to be ok because you are important to me. And because I want to support our community: especially our hospitals, doctors, and nurses who are going to be putting everything on the line to take care of us. I’m willing to risk looking alarmist, because the risk of not doing enough is too high. Plus… I (unfortunately) don’t think what I’m writing here is wrong.

Be as cautious as possible for the next 14 days. Then re-evaluate based on what we know then.

Why 14 days?

we will have a better idea of how bad this really is
we will continue to better understand the disease and how it spreads
if we do extreme social distancing for two weeks, we can reset our understanding of who is contagious… and who is not. Which means we can start to interact more.

Risks

Maybe I’m being too alarmist and maybe this is all wrong. But…

If you take the most cautious path possible for the next 14 days, and it turns out this was no big deal… what did you lose?

If you aren’t cautious enough right now and you get sick, or our healthcare system collapses… what did you lose?

Your life?
Dying a really painful death (drowning in the fluid in your lungs)?
Not getting to say goodbye to your loved ones?
You + those around you getting pretty sick?
Unknowingly infecting somebody you love….leading to them getting really sick or dying?
Getting somebody in your community sick, who can’t choose to be more cautious or who is more vulnerable?
Adding a burden to your city/hospital/the doctors & nurses who are risking their lives to take care of your community?
Being sick so you can’t help others?

What does cautious mean?

Again, I’m NOT a doctor or health professional.

[Aside: I haven’t done enough research yet into how transmission takes place to speak to this, and I don’t know if it’s yet known. If you have a good source for this, please find me on Twitter (zanstrong) and I can post it]

There are a lot of unknowns. So, to me, this means avoiding unnecessary contact with other people because right now we do not know who might be contagious (at least outside of a few countries like South Korea, Singapore, etc that have done extensive testing & tracking).

Don’t believe me? Here is what UK scientists are saying: http://maths.qmul.ac.uk/~vnicosia/UK_scientists_statement_on_coronavirus_measures.pdf

Goals

My first goal is selfish… I want the people I love to be ok.

Fortunately, that is in line with the second goal: to avoid crashing our healthcare system

From this article on why acting RIGHT NOW matters so much:

As a politician, community leader or business leader, you have the power and the responsibility to prevent this.

You might have fears today: What if I overreact? Will people laugh at me? Will they be angry at me? Will I look stupid? Won’t it be better to wait for others to take steps first? Will I hurt the economy too much?

But in 2–4 weeks, when the entire world is in lockdown, when the few precious days of social distancing you will have enabled will have saved lives, people won’t criticize you anymore: They will thank you for making the right decision.

https://medium.com/@tomaspueyo/coronavirus-act-today-or-people-will-die-f4d3d9cd99ca

Yes — a lot of us will eventually get the virus. It’s still better to act early, and be cautious now. Give the researchers and hospitals and society a chance to get ready.

By Alexander Radtke - Alx — learn more on https://flowingdata.com/2020/03/09/flatten-the-coronavirus-curve/

So, please #flattenthecurve and #slowthespread

https://thespinoff.co.nz/society/14-03-2020/after-flatten-the-curve-we-must-now-stop-the-spread-heres-what-that-means/

Reasons (and references)

1. The number of people infected will grow exponentially, unless we take extensive and extreme measures to slow the growth rate.

https://medium.com/media/870c3ace37ea7d29e76f8b3824fb0e30/href

Extreme measures mean either testing + strict quarantine for anyone exposed, or essentially shutting down society if we’re past the point of knowing who might be contagious.

Source: Financial Times. Link to John Burn-Murdoch’s Twitter account with updating graphic

2. Don’t trust case counts in the US: we’re not currently testing most people who need tests

People who have symptoms and want tests can’t get them. CNN

Fortunately our universities and public health labs are creating their own tests and stepping up — link.

But we are very far behind. Do NOT trust any US case count numbers currently.

3. The US is ~11 days behind Italy. It’s really bad in Italy.

The hospitals in Italy are already failing, doctors are choosing who they can take care of … and who to let die.

Both the chart above and a comparison of the number of deaths indicate that we’re ~12 days behind Italy right now.

Death counts are more accurate than case counts, because they are a stronger signal and rely less on testing. Italy currently has 1441 deaths and the US 51 — Johns Hopkins data. Assuming the 33% daily growth rate from the original chart and similar death rates, that also puts us ~11 days behind Italy.
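
The "days behind" estimate can be checked with a short calculation. This is only a rough sketch, assuming (as the paragraph above does) a constant 33% daily growth in deaths and similar death rates in both countries. The function name `days_behind` is made up for illustration.

```python
import math

def days_behind(leader_count, follower_count, daily_growth):
    """Days for follower_count to reach leader_count at a constant daily growth rate."""
    return math.log(leader_count / follower_count) / math.log(1 + daily_growth)

# Death counts quoted above (Johns Hopkins data, as of March 14th)
italy_deaths = 1441
us_deaths = 51

# Assumption: deaths keep growing 33% per day, as in the chart
lag = days_behind(italy_deaths, us_deaths, daily_growth=0.33)
print(f"US is about {lag:.1f} days behind Italy")
```

With these inputs the result lands between 11 and 12 days, which is consistent with the estimates in the text. A lower growth rate would make the gap in days larger, so the answer is quite sensitive to that assumption.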

What is happening there now will be the US in 11 days. And, it’s only going to get worse unless we do something to mitigate it.

But it is possible to mitigate: see how Italy’s curve is starting to peel off the 33% growth line.

4. Our individual, collective, and civic actions do matter

See the top chart for Singapore, South Korea, China, etc.

Or, these simulations from the Washington Post. Although, they are missing an important color… they are not simulating the deaths.

https://www.washingtonpost.com/graphics/2020/world/corona-simulator/?fbclid=IwAR17sNwkeAtSJ1NXwtExJ_RYob0cu4qRYxlUiy02nsb-YuO4lYVEJiBlGfE

5. Be as courageous as the civic leaders in Saint Louis in 1918.

“Within two days of detecting its first cases among civilians, the city closed schools, playgrounds, libraries, courtrooms, and even churches. Work shifts were staggered and streetcar ridership was strictly limited. Public gatherings of more than 20 people were banned…

The extreme measures — now known as social distancing, which is being called for by global health agencies to mitigate the spread of the novel coronavirus — kept per capita flu-related deaths in St. Louis to less than half of those in Philadelphia, according to a 2007 paper in the Proceedings of the National Academy of Sciences.”

https://qz.com/1816060/a-chart-of-the-1918-spanish-flu-shows-why-social-distancing-works/

Are you a leader, trying to decide if you should cancel an event?

This article was written for you: https://medium.com/@tomaspueyo/coronavirus-act-today-or-people-will-die-f4d3d9cd99ca

6. Be as cautious as you can possibly be right now, for the next 14 days.

Typically we ramp up our caution gradually, as a situation seems to get worse.

Instead, I urge you to jump right now to full caution.

Be as cautious as possible for the next 14 days. This is especially true if you have any risk factors that mean you will likely need to be treated in a hospital (and have a higher risk of death).

In 2 weeks:

we will have a better idea of how bad this really is
we will continue to better understand the disease and how it spreads
if we do extreme social distancing for two weeks, we can reset our understanding of who is contagious… and who is not. Which means we can start to interact more.

If things are ok or we have small groups that we know are not contagious, we can then re-evaluate and be less cautious.

Note — this warning is *especially true* if you are over 60 or have additional risk factors which mean you will likely need to be treated in a hospital. But, if you’re under 60 and have no risk factors, it’s time to STOP thinking you are somehow safe. In Italy, “Twelve per cent of those who have been treated in intensive care are aged between 19 and 50, according to official figures released last week.” — link

And, from a person currently recovering in the ICU: “Important point: we really don’t know much about his virus. I’m young and not high risk, yet I am in the ICU with a very severe case” — link.

7. Yes — social distancing is a privilege: doing the best you can leaves more resources and helps protect those who don’t have that privilege

Different people’s needs and circumstances are different. I urge you to choose to look at your options… and be as conservative as is reasonable/possible for you.

I am lucky that I can (and am actually required to) work from home right now.

If you have the privilege to be able to do extreme social distancing, then please do it. This will leave the hospital beds and resources for those who don’t have that privilege. And will slow the spread.

8. Yes — the economic impacts of this are huge and worrisome, but they will be worse if we swamp our hospitals.

The first thing we need to do is #slowthespread and #flattenthecurve. If the number of people who need medical support overcomes the ability of our hospitals to meet that demand, the economic impact will be worse.

No citation… this just seems clear.

9. Concerned about the impact to your local restaurants/businesses/organizations? Support them by buying gift certificates.

This means the small businesses get the money now to help them, and the people who rely on them for income, make it through this challenging time.

(Thanks Mom for the great idea!)

10. This is a crisis unlike anything else the world has experienced in my lifetime.

I’m thinking a lot about the stories of my grandparents and great-grandparents. The ways that they rose to meet challenges that I can hardly imagine. Facing things I’ve never imagined dealing with.

My Grandma Armstrong lived in isolation for two years when she caught TB, only allowed to wave to her 2yr old and 4yr old sons and her husband through a window. She recovered and lived until over 90 yrs old.

It is in her honor that I write this email and that I choose caution. In this crisis, it’s not going to be two years in seclusion… it’s going to be life or death for many people… and for the hospitals, doctors, and nurses who take care of us.

11. You can still go outside. Just… stay away from other people

Go for a walk or a bike ride. Go to the forest. Go to a park. (But maybe not a playground: evidently the virus stays on surfaces.)

12. Unfortunately you need to be careful of scammers, as well as of the virus

For example, Medicare sent a warning “Scammers may use COVID-19 as an opportunity to steal your identity and commit Medicare fraud. In some cases, they might tell you they’ll send you a Coronavirus test, masks, or other items in exchange for your Medicare number or personal information. Be wary of unsolicited requests for your Medicare number or other personal information.

It’s important to always guard your Medicare card like a credit card and check your Medicare claims summary forms for errors. Only give your Medicare number to participating Medicare pharmacists, primary and specialty care doctors or people you trust to work with Medicare on your behalf. Remember, Medicare will never call you to ask for or check your Medicare number.”

Ugh. Be careful out there.

13. Fortunately, in general PEOPLE are so truly WONDERFUL, AMAZING, GENEROUS, and COURAGEOUS. Let’s do this for each other.

I love how creatively people are coping, including the songs from balconies in Italy. And, South Korea, Singapore, and other countries are showing us that we can collectively step up to the challenge.

People are so, so, so wonderful. We’re all in this together. Let’s take care of ourselves, so we can take care of each other.

A Sampling of Data Visualization

Zan — Wed, 19 Sep 2018 21:58:50 GMT

I wrote this originally for students studying CS at Burton High School in San Francisco, but figured I’d post on Medium to have the chance to share with a wider audience as well.

Data visualization means very different things to different people in different contexts.

When you think of “data viz”, do you think of a pie chart or map used in a presentation on the floor of Congress, the charts in your math textbook, a piece of art in SFMOMA, the futuristic displays we see in movies, the chart in your phone showing the amount of data you’ve used in the last month, or the graphs in the Economist or the New York Times?

We also often utilize visualizations without thinking of them as visualizations, like a heartbeat on a monitor in a hospital, charts of stock prices, a weather forecast, or the user interface you use when comparing prices/times/duration of plane tickets online.

Below are 10 different data visualizations that can be found online which exemplify some of the variety in data visualization. They vary in visual form, intended use, and the type of data they display. As you look at them, enjoy them for themselves. And, consider how each visualization’s visual form, presumed purpose for being created, and the type of data they display relate to each other.

What’s Really Warming the World

Creator: Eric Roston and Blacki Migliozzi, published in Bloomberg Businessweek

Link: https://www.bloomberg.com/graphics/2015-whats-warming-the-world/

This animated news graphic compares factors that are proposed to be contributing to global warming to help readers to engage meaningfully with the data, and understand the story of global warming through the data. It’s part of a larger story.

Figures in the Sky

Creator: Nadieh Bremer

Link: http://www.datasketch.es/may/code/nadieh/

Visualizations can be both beautiful and informative.

You Draw It: How Family Income Predicts Children’s Chances of Going to College

Creator: Gregor Aisch, Amanda Cox, and Kevin Quealy of the New York Times

Link: https://www.nytimes.com/interactive/2015/05/28/upshot/you-draw-it-how-family-income-affects-childrens-college-chances.html

In this story, readers have to draw their best guess at the graph showing the relationship between parents’ income and children attending college. After they draw the graph, they can see the true data.

Earth Wind Map

Creator: Cameron Beccario, inspired by Fernanda Viégas and Martin Wattenberg’s US wind map.

Link: https://earth.nullschool.net

This map of the world shows the data about wind from the last few hours. Note: click on “earth” to switch to different types of metrics. It shows a large amount of data at one time.

Visualizing Algorithms

Creator: Mike Bostock

Link: https://bost.ocks.org/mike/algorithms/

This set of visualizations compares, contrasts, and explains different CS algorithms and randomness (or lack thereof).

A Timeline of Earth’s Average Temperature (as a cartoon)

Creator: Randall Munroe (XKCD)

Link: https://xkcd.com/1732/

Visualizations don’t need to be interactive to be powerful or impactful. I love the use of scale and aspect ratio in this cartoon about global warming. Like the Bloomberg piece above, it shows the parts of the graph when “nothing is happening” to make the importance of when something is changing much more clear.

The Fallen of World War II

Creator: Neil Halloran

Link: http://www.fallen.io/ww2/

The Fallen uses visualization, narration, and animation to tell the story of the people who died in World War II.

Why are so many babies born at 8am

Creator: Me (Zan Armstrong)

Sometimes the same data can (and should) be visualized in different ways, for different purposes. Many charts are created by someone as part of their analysis, and are never shared. Or, are used successfully to identify an interesting aspect of the data that is then shared in a different form (a story, presentation, or in a graphic designed for communication rather than analysis). Charts for analysis will look scrappier and less polished than charts designed for communication, just as the notes we take for ourselves while interviewing a source or working through a science problem would look like a work-in-progress compared to a final publication.

I created the graph shown below while analyzing CDC data about the time of day babies are born. I ended up using a version of this chart in my talk on Everything is Seasonal.

Graphic created for analysis

Creators: Nadieh Bremer and I (Zan Armstrong)

Later, I revisited this data in collaboration with Nadieh Bremer to create a graphic science article for Scientific American.

Link: https://blogs.scientificamerican.com/sa-visual/why-are-so-many-babies-born-around-8-00-a-m/

Mortality of the British army : at home and abroad, and during the Russian war, as compared with the mortality of the civil population in England

Creator: Florence Nightingale

Link: https://archive.org/details/mortalityofbriti00lond

Impactful data visualization doesn’t have to be made on a computer, or made recently. This set of visualizations created by Florence Nightingale convinced the Queen of England and the military to take sanitary conditions more seriously, as far more soldiers were dying of illness than on the battlefield.

Parable of the Polygons

Creators: Nicky Case and Vi Hart

Link: https://ncase.me/polygons/

Visualizations can help us model our world through simulations. In this case, the simulations help create a way to talk about and think about complex social issues.

Open Data Sources

Zan — Mon, 16 Apr 2018 00:02:49 GMT

I recently taught a lesson on data visualization to a class of CS students at Lowell High School in San Francisco. One question was about finding good sources for open data online, which students could use in their final projects. Instead of just writing a doc and sending it to them, I figured I’d write a blog post to share more widely. Many other people keep much more extensive lists of open data than I have, so several of my recommendations are to other lists.

Data SF

Since Lowell HS is based in San Francisco, let’s start with something local. Data from the city government can be found on https://datasf.org/opendata/

Data is Plural

I love the Data is Plural list by Jeremy Singer-Vine. It has a great variety of fascinating data sources. Some of the datasets are on quite serious topics, others are playful, and some are just wonderfully quirky. Where else can you find data on burritos, Incan knotted strings, air pollution, and negotiations with North Korea all in the same place? Some are very large datasets and others much smaller.

Click on the link to check out the archives and/or subscribe.

Scott Murray’s List of Resources

Scott Murray, author of Interactive Data Visualization for the Web, provides a list of resources for data visualization on his website. This includes a long list of data and visualization related newsletters including Data is Plural.

Sometimes data is “locked” in pdfs. Scott points to a number of resources for extracting structured data from pdfs here.

FiveThirtyEight: Sports and Politics

FiveThirtyEight publishes the data and code behind a lot of their graphics, which they post here. They include a link to the article, information about the dataset, and the data itself for easy download.

This is especially nice because you can get both the data and see how the FiveThirtyEight team used that same data in their reporting.

The New York Times Upshot has a similar repository on Github with the data and methodology behind some of their stories.

CDC: Life, Death, and Everything Health Related In-Between

The CDC provides a lot of information about health, wellness, disease, births, and death in the United States. The datasets are here. You can usually download the data or go to an online data explorer.

ProPublica’s Data Store: Shedding a Light on Exploitation

ProPublica describes it this way: “The ProPublica Data Store gives you access to the data behind our reporting and helps to sustain the challenging, expensive work of investigative reporting. We provide free access to the raw data behind our work, as well as premium data products and custom data services. These and other initiatives support ProPublica’s mission of investigative journalism in the public interest.”

ProPublica’s Data Store is here.

Bureau of Labor Statistics

The United States Bureau of Labor Statistics has a lot of data about people working in the United States. I often find it a little difficult to find what I’m looking for here, but it has a lot of potentially interesting information about the US.

Story Behind the Viz: The Baby Spike

Zan — Wed, 01 Nov 2017 02:42:55 GMT

When you first look at the visualization below, what do you notice first? The big spike? The colors? The shape? Something else?

Does your first impression make you want to look at it more? If yes, why? If no, why not? What do you notice as you look at it longer?

Does anything surprise you?

What features help you interpret what’s going on? Are these effective or not?

Does it make you ask more questions, or want to learn more? Why or why not?

Lastly, does it change any assumptions you had about babies’ births?

Focusing on the 1440 minutes per day

Nadieh Bremer and I created these data visualizations for the July 2017 issue of Scientific American’s Graphic Science page. I also wrote a more detailed accompanying blog post Why are So Many Babies Born around 8am?

The heart of the visualization is a radial area chart showing how the number of babies born in each of the 1440 minutes of the day compares to average. In print, this was accompanied by two other charts showing different time scales. Online, we shared three different minute-of-day charts, one for each method of delivery. This revealed distinct underlying seasonal patterns by delivery method, which combine to create the observed overall daily pattern.
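
To make "compared to average" concrete, here is a minimal sketch of that kind of metric in Python. The counts are synthetic, and both the function name and the ratio form (each minute's count divided by the mean across all minutes) are assumptions for illustration. The article does not spell out the exact formula we used.

```python
MINUTES_PER_DAY = 24 * 60  # 1440

def ratio_to_average(births_per_minute):
    """Return each minute's count divided by the mean count across all minutes."""
    if len(births_per_minute) != MINUTES_PER_DAY:
        raise ValueError("expected one count per minute of the day")
    mean = sum(births_per_minute) / len(births_per_minute)
    return [count / mean for count in births_per_minute]

# Synthetic example: a flat baseline of 2 births per minute with a spike at 8:00am
counts = [2.0] * MINUTES_PER_DAY
counts[8 * 60] = 6.0

ratios = ratio_to_average(counts)
```

A value above 1 means more births than average in that minute, and a value below 1 means fewer. Expressing the data this way puts peaks and dips on the same footing, whatever the raw counts are.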

It’s just a one-page article, with 3 charts, and another 3 charts online. Yet, we thought a lot about how we wanted to present this data in a way that was engaging and best fit the story we’d discovered in the data and wanted to share. Perhaps some of these details relate to your answers to the questions I posed at the beginning?

Nadieh and I thought it would be fun to share our process and the story behind the viz. In my post here, I’ve highlighted some of my favorite design insights and decisions made in the course of the project and why we chose them. You can also check out Nadieh’s blog for her insights into the design process, including images showing iteration. There are also some great bonus bloopers!

Credit: Nadieh Bremer and I, with editing by Jen Christiansen — original published on Scientific American’s blog

This project had 5 main stages, and different design elements emerged from each stage. Let’s start back in March of 2016…

1. Finding the data and core story

In March 2016, I downloaded CDC birth data because I wanted to include an example of minute-of-day data from a public dataset to illustrate seasonality in very granular data for my OpenVis Conf talk Everything is Seasonal.

Most public datasets don’t have minute-level granularity, so I breathed a sigh of relief when I found this CDC (Centers for Disease Control and Prevention) dataset showing the number of babies born per minute of day and day of week by year.

I’ve found that most data that has to do with people or nature will have minute-of-day seasonal patterns. So I expected to see something interesting.

But, wow! It was much more striking than I expected, especially when each point of the graph represents a minute and I “faceted” by day of week.

One of my first charts with this data, made in R showing the stark difference between weekdays and weekends

After sharing this chart with my friend Brendan who is a nurse, he responded: “Can you break out natural vs induced vs c-section?”

So, of course I did. And, this is what I saw. Note that all these charts have the same y-axis scale.

As Brendan replied: “Damn that’s a lot of c-sections.”

This visual form worked well enough for the point that I wanted to make in the presentation. Instead of being overwhelming or “too much data”, looking at granular minutely data was actually easier to interpret than aggregated data. It revealed strong, rich patterns. Seeing all the dots reinforces the feeling that this isn’t just some rogue datapoint; something is going on here.

I tweaked the charts just a little bit, and then they were ready for the talk.

Explaining the metric

I also found daily data, which I aggregated to weekly to remove the day-of-week effects, to illustrate the cycles that we also see annually.
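
Here is a minimal sketch of that daily-to-weekly aggregation, using synthetic counts. The original charts were made in R, so this Python version is only an illustration. The helper name `weekly_totals` and the choice to group days into consecutive 7-day blocks starting on January 1st are assumptions.

```python
from collections import defaultdict
from datetime import date, timedelta

def weekly_totals(daily_counts):
    """Sum a {date: count} mapping into consecutive 7-day blocks from January 1st."""
    totals = defaultdict(int)
    for day, count in daily_counts.items():
        day_of_year = day.timetuple().tm_yday  # 1..366
        week_index = (day_of_year - 1) // 7    # 0-based block of 7 days
        totals[week_index] += count
    return dict(totals)

# Synthetic data: fewer births on weekends than on weekdays
start = date(2014, 1, 1)
daily = {}
for offset in range(28):
    day = start + timedelta(days=offset)
    daily[day] = 8000 if day.weekday() >= 5 else 12000  # 5, 6 = Saturday, Sunday

weekly = weekly_totals(daily)
```

Every full 7-day block contains each day of the week exactly once, so the weekday/weekend swings cancel out and only the slower annual pattern remains. The last block of a year is shorter than 7 days and would need separate handling.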

“Week of Year” birth patterns from the same talk

At this point, I’d already established a few key insights that would influence the final visualization published in Scientific American.

1. I showed different seasonal patterns at different levels of granularity (minute of day, day of week, and week of year).

2. It was clear that medical intervention was part of the seasonality story.

3. I loved that showing fully granular data, a dot for every minute, made the pattern feel more real and striking.

4. And, lastly, showing multiple small versions of the chart, “small multiples”, provided context and enabled comparison.

The original form was effective for the talk, and it fit the rest of the presentation in which all the time series were presented with time on the x-axis and count on the y-axis.

However, in revisiting this data to create a stand-alone, printed piece, there were a number of specific challenges that I wanted to address.

For example, the peaks were much more obvious than dips.

It was hard to visually distinguish the more subtle shape of the spontaneous (no c-section/no induction) births vs induced. These are actually quite different, but they look pretty similar in this original form. The importance of this was actually much more clear once we had developed the new visual form, because the differences are so much more obvious.

The week of year and minute of day visualizations didn’t really fit together, as part of the same story or visual form.

The metric itself “the total number of babies born per minute on a particular day of week” was how the data was defined in the data source. But, it’s such an awkward construction to explain! Sometimes it takes another’s perspective to see the obvious. When Nadieh later asked why not just use “average number” instead of “total over the course of the year”, I immediately normalized the data and made the switch. To do this, I had to also adjust for the fact that some days of week occur 52 times and some 53 times in a year.
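
The 52-versus-53 adjustment can be sketched like this. It is an illustration only: the function names are hypothetical, 2014 is used purely as an example year, and the total of 530 births is a made-up number.

```python
from collections import Counter
from datetime import date, timedelta

def weekday_occurrences(year):
    """Count how many times each weekday (0 = Monday ... 6 = Sunday) occurs in a year."""
    counts = Counter()
    day = date(year, 1, 1)
    while day.year == year:
        counts[day.weekday()] += 1
        day += timedelta(days=1)
    return counts

def average_per_occurrence(yearly_total, weekday, year):
    """Turn a yearly total for one weekday into an average per occurrence of that weekday."""
    return yearly_total / weekday_occurrences(year)[weekday]

occurrences_2014 = weekday_occurrences(2014)

# Made-up total: 530 births in one particular minute, summed over all Wednesdays of 2014
wednesday_average = average_per_occurrence(530, weekday=2, year=2014)
```

A non-leap year has 365 days, which is 52 full weeks plus one extra day, so exactly one weekday occurs 53 times. Dividing every weekday's total by 52 would slightly inflate that one day.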

Oh, and aesthetics! Beauty isn’t just a “nice-to-have” when engaging a reader’s attention. It makes a big difference to create something that is enjoyable to look at. If you want to look at it, you’ll look at it more! Moreover, when done well, subtle aesthetic details help the reader notice more about the data itself and see a richer, more complex story.

The strengths and weaknesses of the original set of charts, along with the change in context to a polished printed piece, set the stage for later design decisions. But, I’m getting ahead of myself…

A quick aside: why are there so many C-sections?

In the US in 2014, 32% of births were c-sections, 18% were induced, and 50% were spontaneous. Having seen the dramatic peaks at 8:30am and noon on the c-section chart above, it’s probably not surprising to learn that many of these were scheduled. In fact, 75% of the c-sections were planned/scheduled and the other 25% were unscheduled.

At this point you might be wondering why there are so many scheduled c-sections and inductions. That’s a great question. Understanding what drives these rates, why they vary from country to country and hospital to hospital, and what they “should” be requires investigating many intersecting factors. There is a suite of questions to ask about what’s driving decision-making and recommendations for hospitals, doctors, insurance, and patients. It’s also not an easy question, either at the population level (what percent of births *should* include intervention) or at the individual level (what should this woman do?). And, there are lives and health at stake in the answers, both for the moment of birth and the recovery afterwards. Historically childbirth is one of the most dangerous things a woman might do in her lifetime and being born is one of the most dangerous things we all do in our lifetimes.

The goal of this visualization is to show that many births are scheduled. These “why” questions, however, are out of scope. I hope we’ve piqued your curiosity to learn more, so please check out the articles linked in the appendix of this article!

That said, it’s worth noting that “scheduled/planned” and “elective/voluntary” are not synonymous. For example, a heart surgery might be unscheduled, if somebody shows up in the emergency room or something goes wrong while they are in the hospital for something else. Or a heart surgery, like an angioplasty to clear partially blocked arteries, might be scheduled due to a high risk of something bad happening if a medical problem is not addressed relatively soon. In both cases, the procedure is recommended by doctors and the hospital. It’s just that in one case it’s in reaction to an immediate emergency and in the other there is time to plan ahead for how to best address a known risk. This is similar for c-sections and inductions. There are unscheduled inductions and c-sections due to something that came up unexpectedly during the labor. There are also scheduled inductions and c-sections, which in the US are primarily in response to a medical recommendation. These might be due to factors that are known before labor starts. For example, for a c-section the woman might have diabetes or a heart condition, might have had a previous birth by c-section, or the baby might be in a “breech” position or not growing well enough (source).

2. A story about seasonality of birth, told graphically at three levels of granularity

A week after I gave the OpenVis Conf talk, I was absolutely thrilled when Amanda Montanez from Scientific American reached out asking if I might be interested in creating a visualization based on the talk. When I was a teenager, my family subscribed to Scientific American and I often discussed articles and ideas from the magazine with my Dad. Therefore, it was a dream come true to be invited to contribute to a well-respected publication which had also been so personally important to me.

I loved how Amanda framed the story in her pitch to the Scientific American editors, as shown below, and I was so excited to run with this idea.

The Seasonality of Birth

This idea was sparked by a data viz talk on seasonality I saw a while back. I had heard before that hospital births tend to spike around the times when doctors’ shifts are ending. However, I’d never seen it visualized…it is quite dramatic! In addition, there are interesting patterns if you look at days of the week (fewer births on weekends and holidays) and weeks of the year (more births in late Sept/early Oct than any other time, consistent dips at the beginning of January, etc)....Could be interesting to do a set of three time series, each one zooming in from the previous one to show a more granular level of seasonality. And annotate to explain each.

Because of some schedule constraints in my personal life, including taking some extended time off to travel to Alaska, Oregon, Namibia, Mozambique, Egypt, Chile, and Argentina, it wasn’t until March 2017 that I was ready to follow up.

Despite this being a relatively small project, I also thought it would be a great chance to collaborate. I love Nadieh Bremer’s work. Her creations are always beautiful, her technical skill in R, JavaScript, and Illustrator is excellent, she contributes to the community through her presentations and tutorial-style blog posts, and she respects the story that is revealed in the data. We’d both spoken at the same OpenVis Conf in 2016 and enjoyed meeting there. Moreover, our skill sets are partially overlapping and partially complementary, which made working together especially appealing.

I was thrilled when she agreed to collaborate, and made the time for this project despite having a packed schedule of travel, work, and presentations. It worked out even better than I might have imagined, both in terms of leading to a better final product and being a fantastic, fun, creative, thought-provoking experience! In writing this blog post, I reread many of our emails from the time, and enjoyed reliving the energy, shared curiosity, mutual respect, and sense of exploration as we evolved our understanding of the data and form through words and images.

3. Defining the core visual form in a few intense creative in-person working sessions

In just two in-person working sessions in San Francisco, Nadieh and I worked together and established the core visual form we would use.

In particular, we would have 3 radial charts: one for minute-of-day, one for hour-of-week, and one for week-of-year. We also planned to dive into the second data set, by delivery method, at a later date. In each chart, we would focus on the differences compared to average rather than the raw counts, although we had a big design challenge in front of us for how to actually depict that difference to average (area? bars? something else?).

On day 1, I served up munged data from Python while Nadieh was working magic in R. We sat next to each other, in almost constant conversation as we played with various forms. By day 2, Nadieh had written the core of the viz into D3 and we again focussed on quick iterations coupled with lots of discussion. For these quick iterations on a relatively small project it worked well to have a bit of division of labor and focus on whatever enabled us to experiment/explore most quickly.

Technical skills open up a wider design space

While it’s easy to conceive of design and technical skill as independent, it was obvious during these two sessions how Nadieh’s facility in code directly impacted our design decisions. And, as Nadieh reminded me, “it was also often you providing me with a different view on the data within mere minutes that was crucial. During the first parts of our process being able to quickly create different ‘lenses’ on the data and create crude plots of those…is the complementary part to being able to quickly change visual elements later on in the design phase.” Throughout the project there were numerous moments where we were debating if we should go with one idea or another, when we would just stop debating and try it.

If this had been a 30 minute, 1 hour, or 5 hour task, we would have had to make a decision about if it was even worth trying. Or, we’d have just made a guess and gone with that. When it takes just a couple minutes, why not just see what it looks like? In this way, technical skill enables one to explore a much wider and more nuanced design space. This also means we could be more responsive to the data itself, because we could see what the form actually looked like with the real data rather than what we imagined it might look like.

Radial design, with comparison to average rather than to 0

Legend illustrating the key insight: comparing to the average line

Decisions were rarely all or nothing, but rather nuanced. When we first took a look at radial designs, there were some flaws.

Most importantly, it’s really easy to lose track of the center of the circle with radial line charts as your brain sort of just assumes that the center is in the center of the shape, even if it’s off-set. I had struggled with this in a previous project, weather circles, and recognized it as a challenge here.

Additionally, in both the original rectangular grid charts and our first radial line charts, peaks were more obvious than dips. And, more subtle differences in seasonality got lost.

In rectangular form, it’s hard to tell the difference between the seasonality and volume of spontaneous births (on left) compared to induced births (on right).

As radials, the pattern for spontaneous births shown on left looks much more obviously different than the induced births on right. The “spontaneous” births roughly match daylight, while the induced peak is obviously shifted around the circle to the late afternoon.

In discussing how to deal with these issues, we realized that this wasn’t just a visual issue. It was also about the story in the data itself. The story we most wanted to reveal in this data wasn’t a story about the raw number of babies being born at any given minute.

Rather, the story was about how the number of babies born compared to typical. Was it a dip or a spike? More or less than usual?

The insight about visual form came from trying to best match the form to the aspects of the data we most wanted to share. Sketching on paper, I suggested trying comparing to average. Nadieh gave it a shot, and we both liked it.

Instead of a line representing the distance from the center (number of babies per minute) we switched to an area chart representing the percent difference in number of babies born per minute compared to average.
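
A minimal sketch of that calculation in JavaScript, assuming the per-minute counts sit in a plain array. The function name and the sample counts are hypothetical and are not taken from the project code.

```javascript
// Percent difference of each count from the mean of all counts.
function percentDiffFromAverage(counts) {
  const mean = counts.reduce((sum, c) => sum + c, 0) / counts.length;
  return counts.map((c) => ((c - mean) / mean) * 100);
}

// Hypothetical births-per-minute counts, for illustration only.
const sampleCounts = [8, 10, 12, 10];
const sampleDiffs = percentDiffFromAverage(sampleCounts);
console.log(sampleDiffs); // negative = dip below average, positive = peak above it
```

Positive values would be drawn outside the average line and negative values inside it.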

Aligning our visual representation with the story we wanted to tell solved for both the visual issues of “losing track of the center” and “how do we see dips.”

Granted, a rectangular area chart comparing the peaks/dips to average would have also solved for both these problems too.

However, the circle had three major benefits over a rectangular form. First, the radial form emphasizes small shifts in what time a peak or dip occurs, since a change in time corresponds to a change in angle. Second, it’s a very compact form, and reads as a cohesive shape that I think enables comparison. Third, in a rectangular chart showing a cyclical pattern, the impression of the chart is heavily influenced by where you (arbitrarily) break the cycle.

Cracking the circle open

Another issue with circles is that there is no start or end. It’s not obvious where the reader should start “reading” the visualization. It’s also unclear where to put the annotation for the grid lines, since there is no “left side” of the chart.

Nadieh solves for this beautifully by “cracking” the circle at the top, a technique she often uses when creating radial designs. I loved it, writing in an email that “the gap at midnight is great — I think it’s small enough that it doesn’t break the circle too much, but large enough to provide room for the annotation. And, it gives the eye a nice place to start.”

Hour of Day or Day of Week? Neither!

We knew early on that we would focus on minute of day and week of year. But, the third chart was unclear. Hour of day would just repeat the minute of day story, but not tell it as well. And, day of week only had 7 sparse data points.

Nadieh tested a mixed chart that had a curve for each day’s average value per hour as a baseline, along with scattered points for each hour’s value. But, it just didn’t click.

Finally, we decided to go with an unusual metric of “hour-of-week.” With 168 points per week there was enough data density to show clean patterns.
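
As a rough illustration of the metric, an hour-of-week index can be derived from a timestamp by combining the day of the week with the hour of the day. The sketch below assumes timestamps in UTC and a week that starts on Sunday (JavaScript’s convention); the function names are hypothetical, and the original data munging was done in Python and R.

```javascript
// Map a timestamp to an hour-of-week index from 0 to 167.
// Assumption: the week starts on Sunday at midnight (getUTCDay returns 0 for Sunday).
function hourOfWeek(date) {
  return date.getUTCDay() * 24 + date.getUTCHours();
}

// Tally how many timestamps fall into each of the 168 hour-of-week buckets.
function countByHourOfWeek(dates) {
  const counts = new Array(168).fill(0);
  for (const d of dates) {
    counts[hourOfWeek(d)] += 1;
  }
  return counts;
}

// Hypothetical timestamp: Monday, 6 March 2017, 08:30 UTC.
const exampleBirth = new Date(Date.UTC(2017, 2, 6, 8, 30));
console.log(hourOfWeek(exampleBirth)); // 32 (Monday is day 1, so 1 * 24 + 8)
```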

As you can see in the final chart, rather than feeling too repetitive, the hours-of-week supplemented the minute of day charts. They showed that those peaks were a weekday effect since the missing Saturday and Sunday peaks jumped out in the hour-of-week chart.

Notice that the whole of Saturday and Sunday are blue/greens, without the bright peaks of the weekdays

Highest peaks breaking the frame

This is a small point, but something that I really enjoyed having the chance to incorporate into this viz.

One of my favorite visualizations of all time is this chart of the prices of cotton in New York in the 1800’s from the US Statistical Atlas published in 1883. There was an obvious challenge: how do you show both the moderate variation in the pre/post-Civil War price of cotton in New York on the same graph as you show the absolutely massive spike in prices during the Civil War? I love the audacity of their answer: create a reasonable frame for the pre/post period and then break it. Oh, and don’t worry about the fact that your spike is now piercing through Chicago in a totally different map.

From the Rumsey Map Collection: Scribner’s Statistical Atlas of the United States, based on data from the 10th US Census

I love how breaking the frame helped contextualize local variation while also emphasizing the unusualness of the chart.

The minutely baby data had similar characteristics: a big spike along with smaller variation throughout the rest of the day. Therefore, I was psyched when Nadieh agreed with my proposal that the AM spike would break the frame in the overall minute-per-day and C-section charts. She executed this beautifully!

Breaking the frame!

Being boring IS the story: 0% is the center of the circle

One surprising thing is how much more “boring” the final week-of-year chart is in comparison to the others. This isn’t an accident, but is part of the story. Yes, September is more common than January. But, these variations pale in comparison to the difference between a Saturday evening and Monday morning, or between 6am and 8am on a weekday. In all three charts, we pinned the center to 0 and the average to the same radius. Therefore a 5% change in one chart is equivalent to a 5% change in another, making them comparable.
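
One way to read this is as a linear radial scale shared by all three charts. The sketch below assumes that the center of the circle stands for zero births (100% below average) and that the average line sits at one fixed radius in every chart; the function names and the radius of 100 are hypothetical.

```javascript
// Radius for a raw count: zero maps to the center, the average maps to averageRadius.
function radiusForCount(count, average, averageRadius) {
  return averageRadius * (count / average);
}

// The same scale, expressed in terms of percent difference from average.
function radiusForPercent(pctDiff, averageRadius) {
  return averageRadius * (1 + pctDiff / 100);
}

// Assumed radius of the average line, identical for every chart.
const AVERAGE_RADIUS = 100;

// A 5% bump lands at the same radius no matter which chart it belongs to.
console.log(radiusForPercent(5, AVERAGE_RADIUS));
console.log(radiusForCount(10.5, 10, AVERAGE_RADIUS));
```

Because every chart would use the same mapping, a small wobble in the week-of-year chart and a large spike in the minute-of-day chart can be compared directly by eye.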

Jen, our wonderful graphics editor, helped us realize that we weren’t communicating this idea clearly in our drafts and suggested that a better legend might do the trick. More on that story in Nadieh’s blog post.

We put the magnitude of the seasonality in context by pinning the center to 0, so that percent change is comparable across charts

4. Iteration, refinement, and editing with attention to impactful details

In this part of the creation process, Nadieh was the MVP and I played a supporting role.

There are so many exquisite, meaningful details! You can read about them in detail on her blog, but I’ll highlight a few of my favorites here as well.

Color gradient

As soon as she moved to D3, Nadieh added a color gradient to the area chart. Soon this became a diverging orange-to-blue spectrum with a sharp cut-off at the yellow baseline splitting the two. The spectrum is perfect since it shows off both dips and peaks, is color-blind safe, and even subtly matches the diurnal cycle, with the dips tending toward blue in the night and the peaks falling in the morning/day.

Gradient switching from the blue/greens of the nighttime dip to the bright reds/oranges of the morning peak

Trusting your gut

One of the most critical aspects of design was how to represent the distance from the average line. Should it be bars, or really bar-like slices? An area chart with a smooth gradient? Bars with a gradient? Or, finally, concentric circles creating discrete slices?

This wasn’t just about the design, but also about trusting your gut.

While experimenting, Nadieh wrote:

The bars … are getting really thin for the inner section and you might end up with a Moiré effect or something (I tried to counter that a little, by making the inner bars actually mini pie chart-like slices that become thinner the more they coming inward)

I really do like the areas, but the perfection of it is somehow bothering me a bit. Like it feels too polished, too slick, without character.

On left, the discrete concentric circles used in the major charts. On right, the smooth, continuous gradient used in the smaller more “sprite-like” charts by delivery method

A few iterations later we’d thrown out the bars due to the pervasive Moiré effects that she’d anticipated. Instead, in a middle-of-the-night, jet-lagged epiphany, she’d come up with an approach that essentially discretized the area chart into concentric circles.

We debated back and forth what made the most sense, and for which charts. I reflected back that:

Discrete: good for comparing/exploring/seeing different parts *within* a chart. You can see how the curve of the data cuts into the concentric colored circles and see the shape of each concentric ring. And, it’s easier to get a sense of distance from the average line by the number of rings, which is especially helpful for comparing the magnitude of dips to bumps.

Smoothed: good for comparing *across* charts, because you see each chart as a singular shape. Also, within a single chart, gives more of an impression of the pattern as a whole (in contrast to comparing a particular peak to another peak or dip).

For the larger charts, we want the viewer to do some comparison across charts. But, we also want them to appreciate & explore a lot of what is happening within each chart. For the smallest charts, we want the viewer to primarily be comparing across charts — to see how different the shape is for induced, c-section, and natural. And, to get a sense of the overall shape for each chart itself. We wouldn’t expect them to be asking more detailed questions about how big is the 8:30am peak vs 9pm dip for c-sections.

… I propose smoothed for the small delivery method charts. I am torn between discrete & smooth for the large ones…What do you think? Do you have a preference between smooth color gradient and discrete color gradient? For all charts? For big? For small?

In reply, from Nadieh to me:

About the gradient/concentric circles. I like the circles better for two reasons, one is exactly what you describe, it helps to “fix” the major problem in circle visuals that it is hard to compare height differences along the angles. My second reason is completely emotional, like I said earlier this week, I actually didn’t like the smooth/perfectness of the gradient. It just felt like “too standard” of a design, not unique enough in a way. And I had some difficulty figuring out what else to do, and then I tried the concentric circles and that completely solved that feeling of “wrongness” for me on the design level.

And — that was the answer. We went with discretized concentric circles for the 3 main plots and a smoother gradient for the smaller ones.

It was exactly the right choice, and came from listening to a feeling of “wrongness.” This led Nadieh both to persevere to come up with the winning visual form, and to understand why it was the right answer.

A few more details: dots, dashes, smoothing, and the legend

Nadieh put the labels inside the ring, with subtle tiny breaks in the line to separate each hour, while soft, lightly dashed arcs provide a background grid.

Jen, our visual editor from Scientific American, added the dot to start each month. This created a nice, subtle anchor.

Note the dots to start each month arc, tiny gap between months, subtle dotted line on the axis, and annotations in the gap

Nadieh introduced loess smoothing to the viz, which was critical for keeping near-minutely granularity without getting distracted by jaggedness. This, combined with the per-minute dots showing the exact values, created a lovely balance of exact detail and overview while maintaining as much detail as possible. It built off of the strengths of my very first charts presented at OpenVis, showing the detail of each of the 1440 minutely data points while putting those dots into a smoother, still-granular context. The only chart we showed the detailed dots on was the minute overview, where they looked almost like a dusting of snow on the chart. For the rest, we stuck just with the smoothed gradient. It just looked right that way.
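
For readers unfamiliar with it, LOESS fits a small weighted regression around each point and uses the fitted value as the smoothed value. The sketch below is a simplified version (local linear fit, tricube weights, no robustness iterations, no wrap-around at midnight) and is not the implementation used for the published charts; the function name, the default bandwidth, and the sample data are assumptions.

```javascript
// Simplified LOESS: for each x, fit a weighted straight line to its nearest neighbors.
// bandwidth is the fraction of all points used in each local fit.
function loess(xs, ys, bandwidth = 0.3) {
  const n = xs.length;
  const k = Math.min(n, Math.max(2, Math.ceil(bandwidth * n)));
  return xs.map((x0) => {
    // The distance to the k-th nearest point sets the width of the local window.
    const dists = xs.map((x) => Math.abs(x - x0));
    const maxDist = [...dists].sort((a, b) => a - b)[k - 1] || 1;
    let sw = 0, swx = 0, swy = 0, swxx = 0, swxy = 0;
    for (let i = 0; i < n; i++) {
      const u = dists[i] / maxDist;
      const w = u < 1 ? Math.pow(1 - u * u * u, 3) : 0; // tricube weight
      sw += w;
      swx += w * xs[i];
      swy += w * ys[i];
      swxx += w * xs[i] * xs[i];
      swxy += w * xs[i] * ys[i];
    }
    const denom = sw * swxx - swx * swx;
    if (Math.abs(denom) < 1e-12) return swy / sw; // too few distinct points: weighted mean
    const slope = (sw * swxy - swx * swy) / denom;
    const intercept = (swy - slope * swx) / sw;
    return intercept + slope * x0;
  });
}

// Hypothetical jagged series: a rising trend with alternating +1 / -1 jitter.
const minuteIndex = Array.from({ length: 20 }, (_, i) => i);
const jagged = minuteIndex.map((i) => i + (i % 2 === 0 ? 1 : -1));
console.log(loess(minuteIndex, jagged, 0.5));
```

Plotting the raw values as dots and the smoothed values as a curve gives the same pairing of exact detail and overview described above.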

Dots showing exact datapoints coupled with a smoothed Loess curve

5. Writing the blog post

There wasn’t enough room on the printed page for all 6 charts, so on our visual editor Jen’s recommendation we focussed on the 3 levels of seasonality there. And, I was honored that Jen invited me to write an article on Scientific American’s blog going into more detail both visually and textually!

It was especially fun getting to show off how the same visual form could be quite expressive for these more sprite-like delivery method charts as well.

In conclusion…

While the results of these design decisions are all visible in the final product, they were made possible by a few important invisible traits.

These included:

*A low bar to trying out ideas

*Trusting gut feelings, even before we could explain them or had an alternative design

*Being open to changing our minds

*Attention to visual detail

*Focusing on the data and driving the design from that

*Good communication, even as (especially as) ideas weren’t yet fully formed

*A great visual editor!

Thanks to Nadieh and Jen for a great collaboration, and to Amanda for getting this project off the ground in the first place!

If you want to read more, check out Nadieh’s blog for more of the story behind the viz!

Appendix

Why C-Sections?

A lot has been written on this topic in the past few years, including articles in St. Louis Post Dispatch, NYT, Consumer Reports, Kaiser showing the effects of a different payment model, San Diego Tribune, Sacramento Bee reporting C-section rates by hospital in California ranged from 15% to 64%, the LA Times reporting 2014 rates in California hospitals ranging from 12% to 70%, and a report from the Pacific Business Group on Health. Looking beyond the US, C-sections are even more common in Brazil as reported by the Atlantic.

Part II: One set of data, many stories

Zan — Thu, 15 Jun 2017 19:37:48 GMT

Or, why a dual y-axis chart is not a normalized delta chart

In my original post, One Set of Data, Many Stories, I wrote about how I found a particular dual y-axis chart misleading. The core problem was that it had two y-axes for the same metric, with a different scale for each axis.

original chart from the appendix of the Brookings article

Isn’t it just a delta chart?

Elijah proposed that the problem isn’t about dual y-axis since I could have made it into “a single axis chart by plotting ‘delta in mortality since 2000’.”

I like framing this in terms of deltas explicitly, because the point they seem to be trying to make with the chart is about change compared to an earlier point in time.

I agree that delta charts and dual y-axis charts are perceptually similar when the scales for the y-axis are the same but the baselines differ in order to pin both lines to a shared reference point.

However, if the y-axis scales are different, then a dual y-axis chart can be perceptually quite different from a delta chart.

Let’s take a look

In the original article, I found very similar data from the CDC and remade the chart as closely as I could. The exact data is slightly different than the original, as it’s for a 10 year age range rather than 5 year. But, it tells the same story.

Because ggplot2 in R doesn’t support dual y-axis charts, I’ve made them side-by-side.

I’d argue that side-by-side charts are slightly better than dual-axis because they imply that there is some difference between the charts. However, these charts still share the fundamental problem of the original: implying a direct comparison between two charts which have different y-axis scales for the same type of data. In both side-by-side and dual y-axis views, it appears that mortality rates for whites and blacks saw similar declines through the early 2000’s and then diverged since ~2009 or ~2011.

The fundamental problem isn’t actually that it’s two y-axes on the same chart. Rather, the fundamental problem is visually equating two y-axes that have different scales for the same metric.

Here are the same two charts, but using the same y-axis scale for both.

Now, here is the suggested delta chart, showing difference in mortality compared to each line’s 1999 value.

So, yes — the delta chart and the dual y-axis with shared scale do look essentially the same.

But, they look very different from the original chart.

Percent change

It might be better to compare percent differences rather than absolute differences in this case. Dropping from 50 to 45 deaths per 100,000 people might be more significant than dropping from 150 to 145, since it’s a larger drop as a percent of the mortality rate.
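
To make the arithmetic concrete, here is a small sketch using the two drops mentioned above (50 to 45 and 150 to 145). The function names are hypothetical; the charts in this post were made in R.

```javascript
// Absolute difference of each value from the first value in the series.
function deltaFromStart(values) {
  const base = values[0];
  return values.map((v) => v - base);
}

// Percent change of each value relative to the first value in the series.
function percentChangeFromStart(values) {
  const base = values[0];
  return values.map((v) => ((v - base) / base) * 100);
}

const lowSeries = [50, 45];
const highSeries = [150, 145];

// Both series drop by the same absolute amount...
console.log(deltaFromStart(lowSeries), deltaFromStart(highSeries));
// ...but the percent drop is three times larger for the lower series.
console.log(percentChangeFromStart(lowSeries), percentChangeFromStart(highSeries));
```

The delta view treats the two drops as identical (5 deaths per 100,000 each), while the percent view shows roughly a 10% drop next to a drop of about 3.3%.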

The red lines look the same, as the scale is determined by the min/max which both come from the red line. The blue line, however, shows greater variation in the percent change chart than the difference one.

This is no longer equivalent to a dual y-axis chart with shared y-axis scale.

But, I think that’s a good thing because the goal is to compare later values to the earlier values — and the percent change better captures the meaningful comparison.

Interestingly, you could get a similar visual effect by using dual axis. This does mean that the y-axis have different scales and different baselines (to align the starting points). In this case, the scale is determined by setting a 5% change from the 1999 value, or a difference of 8.72 for red vs 2.8 for the blue, to be equal. The baseline is then determined by aligning the starting values to be the same distance from the bottom of the chart.
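
A sketch of how such percent-aligned axis ranges could be derived. The 1999 values below are back-calculated from the numbers in the paragraph above (8.72 / 0.05 and 2.8 / 0.05), and the percent range of 30% below to 10% above the baseline is an assumption made purely for illustration; the function name is hypothetical.

```javascript
// Axis range [min, max] that spans the same percent range around a baseline value.
function percentAlignedDomain(baseline, pctBelow, pctAbove) {
  return [baseline * (1 - pctBelow / 100), baseline * (1 + pctAbove / 100)];
}

// 1999 values implied by the text: a 5% change equals 8.72 (red) and 2.8 (blue).
const redBaseline = 8.72 / 0.05;
const blueBaseline = 2.8 / 0.05;

// Assumed percent range; the range used in the actual charts is not stated.
const redDomain = percentAlignedDomain(redBaseline, 30, 10);
const blueDomain = percentAlignedDomain(blueBaseline, 30, 10);

console.log(redDomain, blueDomain);
```

With both ranges built this way, the starting values sit at the same height on each axis and equal percent changes cover equal vertical distance, even though the tick labels differ.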

Y-axis scale for each chart chosen so that percent change is comparable between charts. Baseline floating as well.

While possible, I think this approach is quite problematic. It’s not clear from the charts *why* the y-scales and baseline were chosen. There is no clue that percent change is even part of the story, much less driving the scale and baseline. These decisions appear arbitrary and changeable, even though they were determined by the data.

In contrast, in the chart where both lines use the same percent change y-axis, it is explicit that percent change is driving both the content and form of the visualization.

Focus on the story: Percent change since 2009

Arguably the point of the original chart was to highlight the divergence since 2009, not since 1999. Adam Pearce pointed out that this could have been achieved by using a delta chart that both pinned to the 2009 data and focused only on the data from 2009 onward. In this case, I used percent change since 2009. This makes the implicit comparison to 2009 explicit.
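
A sketch of that re-basing step: pick the reference year, drop everything before it, and express each remaining value as a percent change from the reference. The function name, field names, and sample rates are hypothetical and are not the CDC figures.

```javascript
// Percent change relative to baseYear, keeping only baseYear and later.
function percentChangeSince(series, baseYear) {
  const base = series.find((d) => d.year === baseYear);
  if (!base) throw new Error(`No data point for ${baseYear}`);
  return series
    .filter((d) => d.year >= baseYear)
    .map((d) => ({ year: d.year, pct: ((d.rate - base.rate) / base.rate) * 100 }));
}

// Hypothetical mortality rates per 100K, for illustration only.
const sampleSeries = [
  { year: 2008, rate: 100 },
  { year: 2009, rate: 80 },
  { year: 2010, rate: 84 },
  { year: 2011, rate: 76 },
];

console.log(percentChangeSince(sampleSeries, 2009));
```

Every line starts at 0% in the reference year, so the chart shows only divergence after that point.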

This emphasizes that mortality rates have declined for blacks/African Americans while they have risen for whites relative to the 2009 data.

This chart doesn’t tell the whole story, of course. No chart does. And, as with any chart, it is worth questioning the extent this story is meaningful without the larger context. That said, I do think this chart tells a succinct story that is supported by the data.

Moreover, not only does it avoid the same problems of the original chart, but the choice of form reflects the story itself. This is a story about change relative to a point in time. The chart’s form matches the story, and reflects the data.

My hope is that this chart would create a better shared understanding of exactly what aspects of the data we’re focussed on, and thereby create a better conversation about what we should learn or do based on this data.

In Conclusion

A dual y-axis chart is only similar to a normalized delta chart if the y-axis scales are the same
Splitting a dual y-axis chart into two side-by-side charts doesn’t fundamentally solve the problem of having two y-axes with different scales
Choosing a chart form that is more appropriate to the story in the data is not just about avoiding “breaking a data visualization rule”, but can also be more effective in focussing on and telling the intended story.

Thank you to Elijah & Adam for the conversation that led to this post!

One set of data, many stories

Zan — Mon, 03 Apr 2017 06:37:56 GMT

In March 2017, the Brookings Institution put out a paper by Case & Deaton on Mortality and Morbidity in the 21st Century.

This report got picked up in the news.

Google News results for “Case and Deaton”

It also inspired a number of blog posts questioning or supporting the findings including from Gelman (more questioning) and Noah Smith (more supporting). Gelman had also questioned parts of their related 2015 article. That said, he “was in agreement with Case and Deaton’s main point, even if I thought they were wrong about the direction of the trend and I was skeptical about their comparisons of different education level.” Instead, he argued that “The news media — left, right, and center — had a pre-existing narrative of middle-aged white malaise, and they slotted the Case and Deaton reports into that narrative.”

Some aspects of the article are certainly concerning. For example, they point out that deaths due to drugs, alcohol, and suicide are increasing in the US for men & women aged 50–54. This is not the case in comparable countries.

However, they present other charts that I found quite questionable. Specifically, in the Appendix they point to heart disease mortality rates for women aged 50–54.

Upon first glance, my immediate take away is that white mortality rates (in blue) have surpassed black (in red), rising starkly in recent years.

This is not true. Digging in, you might notice that there are two different y-axes being used for the same mortality metric. On the left, for whites, the axis goes from 50 to 58 deaths per 100K. On the right, for blacks, the y-axis ranges from 115 to 165 deaths per 100K. So, in 2015, the red line for blacks ends in the bottom right corner at 115 deaths per 100K while the blue line for whites ends in the upper right corner representing a much smaller 56 deaths per 100K population.

You might argue that the trend comparison is what is important, and these axes allow this comparison. I question that.

One of the best descriptions I’ve heard for data viz is that: when the data is different, the viz should look different and when the data is similar, the viz should look similar.

If you allow yourself to have two y-axes for the same metric, with both a different scale on each axis and a different base value, then you can make a lot of charts with the exact same data that look very different.
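
To see how much the axis range alone changes the picture, the sketch below computes where a value lands vertically (0 = bottom of the chart, 1 = top) for a given axis range. It uses the 2015 endpoints and axis ranges quoted above; the function name and the shared 0 to 165 axis are assumptions for illustration.

```javascript
// Fraction of the chart height at which a value is drawn for a given axis range.
function relativeHeight(value, [axisMin, axisMax]) {
  return (value - axisMin) / (axisMax - axisMin);
}

// Separate axes, as in the original chart.
console.log(relativeHeight(56, [50, 58]));    // whites: 0.75, near the top
console.log(relativeHeight(115, [115, 165])); // blacks: 0, at the very bottom

// One shared axis from 0 to 165 (an assumed range).
console.log(relativeHeight(56, [0, 165]));
console.log(relativeHeight(115, [0, 165]));
```

On separate axes the lower rate is drawn far above the higher one; on a shared axis the order flips back to match the data.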

To replicate this, I found very similar data from the CDC. It is slightly different, as it’s for a 10 year age range rather than 5 year, but I think you’ll agree that it’s close enough. Data is here, if you want to play along at home.

I actually can’t use ggplot2 in R to plot two y-axes on the same chart, because Wickham believes “plots with separate y scales…are fundamentally flawed” (via Stack Overflow). So, I’ve plotted them side-by-side. This is for women aged 45–54 in the US, showing deaths due to heart disease per 100K people.

Let’s look at this same data in some other forms. How about with less extreme axes?

Or, with the same y-axis?

What if it included zero also?

Maybe even a higher max y value?

Do you get the same take-away from each of these charts? Or does your experience and impression change based on the different y-axis? Which one is “right”? Keep in mind that these charts all show EXACTLY the same data.

Here is another view, with the two categories on the same chart, sharing a y-axis.

While there might be times where you want to have a different y-axis, perhaps to normalize the data in some way, it should only be done with caution and with thoughtfulness (and explanation) for why it is being done.

In all other cases, be wary of how your choices for the axes may impact the interpretation of the chart.

[Edit: keep reading in Part II — all about how delta charts relate to dual axis]

The Shapes of Emotions

Zan — Wed, 27 Jul 2016 19:38:47 GMT

Getting design feedback on the Atlas of Emotions from the Dalai Lama

One of the core visualizations in the Atlas of Emotions shows the range of states of each of the 5 emotions (anger, fear, disgust, sadness, and enjoyment). For example, annoyance, argumentativeness, and fury are all states of anger. But, the states vary quite a bit in terms of their intensity: annoyance is relatively mild, fury is always highly intense, and argumentativeness can be mild, intense, or anything in between. The states of each emotion are shown in a graph. For each emotion, there is a shape, color, and animation that is specific for that emotion’s states.

Designed shapes for each set of emotional states

While many types of visualizations strive to be easy to interpret, the “charts” representing the states of each of the 5 emotions go further. The reader should get an impression of emotion and intensity of emotional states without ever really consciously interpreting it.

Different states of anger, like fury and annoyance, are represented by red triangles. In contrast, states of fear are purple faceted waves which “feel” very different than sadness’ soft, heavy blue bubbles or disgust’s green glops.

Additionally, the fact that fury is distinct and pointy and set far to the right should give a sense of high intensity compared to annoyance’s smallness.

“Annoyance” is low intensity compared to “fury’s” high intensity

The Atlas of Emotions was a wonderfully collaborative project. In Embracing the Abstract in Data Viz, Nicolette Hayes describes her iterative process designing these shapes. Beyond the original design, defining and creating these shapes in the browser offered some interesting technical challenges, especially since they needed to animate and fit a full spectrum of aspect ratios. Throughout the rest of this article I’ll share some of the visual effects that helped give this intuitive sense of an emotion, the ways I addressed various challenges, and what we learned in that process.

Encoding intensity

For all 5 emotions, each shape is simply a variation of a theoretical triangle in which the corners mark 3 meaningful points. The left and right corners of the base show the minimum and maximum intensity of that emotion, while the height shows the average intensity. For example, an intense state of disgust, like loathing, is very tall and far to the right. In contrast, dislike is of mild to middle intensity so is shown on the left side of the graph, with a wide base, and isn’t very high. These aren’t meant to be exact quantified values, but indicative of the differing intensity of different states of an emotion.
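
A sketch of how the three defining corners could be computed from a state’s minimum, maximum, and average intensity. It assumes intensities normalized to the range 0 to 1, SVG-style coordinates where y grows downward, and an apex placed midway between the base corners; the horizontal position of the apex is not specified above, so that part is a guess. The function name and all intensity values are hypothetical.

```javascript
// Corner points [left, apex, right] for one emotional state.
function stateTriangle({ min, max, avg }, width, height) {
  const left = [min * width, height];  // minimum intensity, on the baseline
  const right = [max * width, height]; // maximum intensity, on the baseline
  // Assumption: apex centered over the base; its height encodes average intensity.
  const apex = [((min + max) / 2) * width, height * (1 - avg)];
  return [left, apex, right];
}

// Made-up intensities for two states of disgust, on a 500 x 200 canvas.
const loathing = stateTriangle({ min: 0.7, max: 1.0, avg: 0.9 }, 500, 200);
const dislike = stateTriangle({ min: 0.1, max: 0.6, avg: 0.3 }, 500, 200);

console.log(loathing, dislike);
```

With these made-up numbers, loathing comes out tall and far to the right, while dislike sits lower on the left with a wider base, matching the description above.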

The 7 states of Disgust, each state represented by a green shape indicating the intensity

In addition to position, the increasing intensity from left to right is reinforced in the color gradient from left to right within each state shape.

In the shape of the “amusement” state of enjoyment, the color intensifies from yellow at the left to a rich orange/red

Creating a shape that is a feeling

While intensity is encoded the same way in each emotion, the shape, animation, and color are designed to be different — to match the emotion that they embody.

How do you make a shape that gives a qualitative sense of an emotion? How do you create a shape for fear?

Moreover, because of the entering animation and varied aspect ratios, the shapes had to be automatically defined based on the 3 “corners” of the underlying defining triangle. They couldn’t be brought in as a single image or hardcoded in some simple way.

A core technical challenge

Challenge: Start with 3 coordinate points and use those to define a path (with math) to create a shape that has a feeling.

When I started on the project, Nicolette had already created a set of amazing designs in Illustrator and Eric had implemented a working version for each in D3. However, for several of the shapes, we wanted to go further: to get closer to Nicolette’s design and create a shape more specifically designed to match that emotion.

Fear

Nicolette’s original design for fear’s states had hard edges with varied angles which created a sort-of faceted wave shape.

Original design for fear

In D3, Eric defined the control points for bezier curves to capture the wave form of the shape. He directly constructed the path string based on calculating the values for each control point.

original implementation

But, they were still too… beautiful. They literally didn’t have the ‘edginess’ we were hoping for. He left a note in the code:

// TODO: degenerate per “Roughen” Illustrator effect (or similar)

I looked at roughen, but it wasn’t clear what math was behind the effect. Instead, to get closer to the design ideal and get more of a sense of the angularity of fear, I decided to facet the edges. My goal was to keep the overall shape, but break the smooth curves up into a set of line segments.

In short, I wanted to define points ~30% and ~60% of the way along the curve, and then draw straight lines between those points. To do this, I needed the x-y coordinates of those points.

Sketch of relationship between smooth curve and faceted edges

But, how do I calculate the x and y values for those points?

One option was to construct a curve, and then use PointAlongPath to get these points. There were a few reasons why this wasn’t ideal, especially with the animation. Instead, I wanted to calculate the points directly if possible.

To do this, I started with the original control points that Eric used. Then I used the actual bezier function, as defined on Wikipedia.

Jason Davies has a nice animated illustration of how this works. A quadratic bezier is defined by 3 points: P0, P1, and P2. Additionally, these functions are “parametric”, so they’re defined by a variable “t”. Each value of “t” between 0 and 1 represents a point along the resulting curve. To construct the whole curve, you calculate the value of B(t) for all values of t between 0 and 1.

Based on how these curves are defined, when t=0.3, it doesn’t actually mean that the resulting point is 30% of the way along the curve. It turns out that finding the point that is exactly 30% of the distance along the path takes a lot of calculation which can be difficult and slow.
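
To make the distinction between “t” and distance concrete, here is a minimal sketch of the standard quadratic bezier formula, plus a simple numeric estimate of how far along the curve a given t actually lands. The control points are made up, and `lengthFractionAt` is a hypothetical helper that approximates length by summing many short straight segments.

```javascript
// Quadratic Bezier: B(t) = (1-t)^2 * P0 + 2(1-t)t * P1 + t^2 * P2, for t in [0, 1].
function quadBezier(p0, p1, p2, t) {
  const a = (1 - t) * (1 - t);
  const b = 2 * (1 - t) * t;
  const c = t * t;
  return {
    x: a * p0.x + b * p1.x + c * p2.x,
    y: a * p0.y + b * p1.y + c * p2.y,
  };
}

// Approximate the fraction of the curve's length covered at parameter t
// by summing many short straight segments (a simple numeric estimate).
function lengthFractionAt(p0, p1, p2, t, samples = 1000) {
  let total = 0;
  let upToT = 0;
  let prev = p0;
  for (let i = 1; i <= samples; i++) {
    const u = i / samples;
    const point = quadBezier(p0, p1, p2, u);
    const step = Math.hypot(point.x - prev.x, point.y - prev.y);
    total += step;
    if (u <= t) upToT += step;
    prev = point;
  }
  return upToT / total;
}

// Made-up control points with P1 close to P0, so the curve "starts slowly".
const p0 = { x: 0, y: 0 };
const p1 = { x: 10, y: 0 };
const p2 = { x: 100, y: 100 };
console.log(quadBezier(p0, p1, p2, 0.3));
console.log(lengthFractionAt(p0, p1, p2, 0.3)); // noticeably less than 0.3
```

Evaluating B(t) is a handful of multiplications, while the length estimate needs a loop over the whole curve, which is the cost the paragraph above alludes to.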

However, I didn’t need mathematical exactness in this situation. Close was close enough. So, I simply input specific values for t to find points that were part of the way along the left and right curves, which made for a very fast calculation. Then I adjusted to find values of t that looked “right.” In the end .4 and .76 worked well on the left, and .1 and .6 for the right side.

Then I stitched the resulting points together to define the path, with straight lines between the points.
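
The steps above can be sketched roughly as follows. Only the t values (.4 and .76 on the left, .1 and .6 on the right) come from the text; the corner and control point coordinates and the helper names are hypothetical, and the real implementation may have been structured differently.

```javascript
// Point on a quadratic Bezier at parameter t (standard formula).
function quadBezier(p0, p1, p2, t) {
  const a = (1 - t) * (1 - t);
  const b = 2 * (1 - t) * t;
  const c = t * t;
  return {
    x: a * p0.x + b * p1.x + c * p2.x,
    y: a * p0.y + b * p1.y + c * p2.y,
  };
}

// Replace a smooth curve with straight facets: keep both endpoints and
// sample the curve only at the given t values.
function facet(p0, p1, p2, ts) {
  return [p0, ...ts.map((t) => quadBezier(p0, p1, p2, t)), p2];
}

// Join points with straight SVG line commands and close the shape.
function toPath(points) {
  const commands = points.map(
    (p, i) => `${i === 0 ? "M" : "L"}${p.x.toFixed(1)},${p.y.toFixed(1)}`
  );
  return commands.join(" ") + " Z";
}

// Hypothetical corners and control points; only the t values come from the text.
const left = { x: 0, y: 100 };
const apex = { x: 60, y: 0 };
const right = { x: 120, y: 100 };
const leftControl = { x: 50, y: 90 };
const rightControl = { x: 70, y: 90 };

const leftEdge = facet(left, leftControl, apex, [0.4, 0.76]);
const rightEdge = facet(apex, rightControl, right, [0.1, 0.6]);
// Drop the duplicate apex where the two edges meet.
const fearPath = toPath([...leftEdge, ...rightEdge.slice(1)]);
console.log(fearPath);
```

Because the points are recomputed from the three corners each time, the same function can be called on every animation frame as the corners move.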

The crux of this solution was:

(1) realizing that I could just use the original mathematical function, rather than the browser’s implementation of bezier curves

(2) realizing that mathematical exactness wasn’t required in constructing these faceted edges.

Final result: wave form with concave faceted edges

The original design had some convex facets, as well as concave. My initial experiments with this were unsatisfactory, and we were satisfied with this expression of fear. So, it was time to move on to disgust.

Disgust

Disgust was designed to have uncomfortably uneven edges.

Designed images for disgust

Eric had implemented a placeholder shape for disgust, but knew we wanted something less clean and more … disgusting.

Placeholder for disgust shapes

To get this sense of uncomfortable unevenness, I started with a triangle, found a bunch of points randomly dispersed along the edge, offset these points from the edge perpendicularly, and then drew a smooth curve between the new points.

In this method, there were a number of parameters, or “levers”, I could pull to tweak the feeling of the shape. Things like the number of points along the edge, how far offset they were, how much I varied how I placed the points and offset, how I interpolated between the points, etc.
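
A minimal sketch of this approach for a single edge might look like the following. The parameter names (`count`, `maxOffset`), the seeded random generator, and the midpoint-based smoothing are all assumptions for illustration; the project may well have used a different interpolation between points.

```javascript
// Small seeded pseudo-random generator so the sketch is repeatable
// (an assumption; Math.random would also work).
function seededRandom(seed) {
  let state = seed >>> 0;
  return function () {
    state = (state * 1664525 + 1013904223) >>> 0; // linear congruential step
    return state / 4294967296; // value in [0, 1)
  };
}

// Scatter `count` points along the edge from a to b, then push each one
// perpendicular to the edge by a random amount of at most `maxOffset`.
function roughEdge(a, b, options) {
  const { count, maxOffset, random } = options;
  const dx = b.x - a.x;
  const dy = b.y - a.y;
  const edgeLength = Math.hypot(dx, dy);
  const normal = { x: -dy / edgeLength, y: dx / edgeLength };
  const positions = [];
  for (let i = 0; i < count; i++) positions.push(random());
  positions.sort((m, n) => m - n);
  const inner = positions.map((t) => {
    const offset = (random() * 2 - 1) * maxOffset;
    return {
      x: a.x + dx * t + normal.x * offset,
      y: a.y + dy * t + normal.y * offset,
    };
  });
  return [a, ...inner, b];
}

// One possible smoothing: quadratic curves that use each point as a control
// point and the midpoints between neighbours as anchors.
function smoothPath(points) {
  let d = `M${points[0].x},${points[0].y}`;
  for (let i = 1; i < points.length - 1; i++) {
    const midX = (points[i].x + points[i + 1].x) / 2;
    const midY = (points[i].y + points[i + 1].y) / 2;
    d += ` Q${points[i].x},${points[i].y} ${midX},${midY}`;
  }
  const last = points[points.length - 1];
  return d + ` L${last.x},${last.y}`;
}

const random = seededRandom(42);
const edge = roughEdge({ x: 0, y: 0 }, { x: 100, y: 0 }, { count: 6, maxOffset: 8, random });
console.log(smoothPath(edge));
```

Each option is one of the “levers”: raising `count` makes the edge busier, raising `maxOffset` makes it lumpier, and swapping `smoothPath` for another interpolation changes the character of the curve.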

This gave a lot of ways to adjust the shape: from my first attempt, through iteration, to something we liked, to going too far… and finally deciding on a great set of input parameters. The best way to see this iteration is to follow the conversation Nicolette and I had.

First attempt

One key to disgust was creating a system where I could easily tweak parameters to change the feeling. The other was knowing when to stop.

Enjoyment

Eric had originally implemented a beautiful set of bulbous enjoyment shapes.

The only problem was that they were so big and there were so many states of happiness, that they overlapped each other on the chart to become sort of a blur. It was also hard to match the shape to its label.

Therefore, Nicolette designed an alternative shape in Illustrator, which I implemented in SVG.

Both implementations were based on using the 3 original points to define some control points, thereby creating the path string for the shape.

The main differences were: (1) the position of the control points and (2) that I repeated the middle point in order to get the point.

path construction for bulbous enjoyment states

path construction points for pointy enjoyment states

To figure out where I should place the control points, I looked over Nicolette’s shoulder. She literally showed me where they were located in her version in Illustrator, and I generalized this.

The key, in this case, was respecting the fact that just beneath the surface of beautiful illustrations in Illustrator is a whole lot of math. Seeing how the shapes were constructed in one tool showed me how I should define them in SVG. It’s also interesting that two fairly different shapes could be created from very similar code, just by adjusting the control points directly and repeating the middle anchor point.
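
To illustrate how similar code can yield two different shapes, here is a sketch that builds a rounded and a pointed SVG path from the same three corner points. The control point positions are hypothetical (they are not taken from Nicolette’s Illustrator file), and using the apex as the shared anchor of two curves is one plausible reading of “repeating the middle point.”

```javascript
// Both builders take the same 3 corner points; only the control points differ.

// Rounded shape: one cubic curve from left to right whose control points
// sit above the base corners, so the top is a smooth dome.
function bulbousPath({ left, apex, right }) {
  // Overshoot the control points so the middle of the curve reaches apex height.
  const top = apex.y - (left.y - apex.y) / 3;
  return `M${left.x},${left.y} C${left.x},${top} ${right.x},${top} ${right.x},${right.y} Z`;
}

// Pointed shape: two quadratic curves that both use the apex as an anchor.
// The apex ends the first curve and starts the second, and because the
// control points sit on the base line, the curves meet in a sharp point.
function pointyPath({ left, apex, right }) {
  const c1 = { x: apex.x, y: left.y };
  const c2 = { x: apex.x, y: right.y };
  return (
    `M${left.x},${left.y} Q${c1.x},${c1.y} ${apex.x},${apex.y} ` +
    `Q${c2.x},${c2.y} ${right.x},${right.y} Z`
  );
}

const corners = {
  left: { x: 0, y: 100 },
  apex: { x: 50, y: 40 },
  right: { x: 100, y: 100 },
};
console.log(bulbousPath(corners)); // M0,100 C0,20 100,20 100,100 Z
console.log(pointyPath(corners)); // M0,100 Q50,100 50,40 Q50,100 100,100 Z
```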

Final Thoughts

One of the most wonderful parts of the JavaScript package D3, and the web in general, is that the math is close to the surface. So you can literally draw with math. This may sound quite abstract or un-feeling, but in this case it was the key to creating shapes that conveyed an intuitive sense of a feeling.

For me, it was also a creative exercise, to think about what the crux of the challenge was and what abstraction best defined each shape that Nicolette had designed. And, how could I generalize that abstraction?

Additionally, in some cases, like fear, being imprecise was no problem, while for disgust small changes in parameters meaningfully changed the “feel” of the shape.

Lastly, while Adobe Illustrator is generally thought of as a tool for designers, it actually exposes a lot of the math behind the scenes, for example by showing the actual control points for bezier curves. So, the design itself can be a great starting point for writing generalizable code.

The shapes of emotions weren’t the only part of this project in which the design challenge and technical challenges were intertwined. To find out more, read Eric Socolofsky’s Finding Calm in the Atlas of Emotions.

The Shapes of Emotions was originally published in Hi.Stamen on Medium, where people are continuing the conversation by highlighting and responding to this story.

Why choose? Scrollytelling & Steppers

Zan — Tue, 07 Jun 2016 17:48:37 GMT

“Scrollytelling” is an online storytelling technique in which more and more content is revealed as the user scrolls down the page.

“Steppers” are also a storytelling technique, especially for stories based on a data visualization, in which the user clicks from step to step to see the story develop.

Scrollytelling Example

Scrollytelling exemplified by Tony Chu’s Let’s Free Congress

Stepper Example

Numeric Stepper (1,2,3…) from New York Times

I’ve been hearing a lot of people talking about “scrollytelling vs steppers” recently, sparked in part by Robert Kosara’s blog post “The Scrollytelling Scourge.”

Like the “which visualization is best” and the “are pie charts really evil” debates, these types of questions don’t really make sense to me.

It’s like asking “which is the best tool: a hammer or a wrench?” There is no way to answer that question unless you know what the person is trying to do. Or, it’s like asking “which is best: blue or orange?” You also can’t answer that question without context, and even then it might still be related to personal preference.

The choices we make in data visualization and storytelling depend on so many things. It depends on your data. It depends on the crux of the meaning in the data and the story that you want to share. It depends on your audience. It depends on how you expect your audience to consume what you’ve created. It depends on your goal. It definitely depends on context. It depends on if it’s a one-off, or a tool that people will use again and again. And, it even depends on aesthetics and what the experience feels like. It just depends.

In short, like everything else in data viz, it’s not that one tool is better or worse. Rather, it depends on finding the best technique for the situation and being cognizant of potential strengths and weaknesses. This awareness can help us decide when to use one technique or another. Moreover, if you choose one, you could add other affordances to make up for its weaknesses.

Strengths and Weaknesses

context vs lightweight progression

Steppers provide very clear context and very explicit navigation.

Scrollytelling makes it easy to keep going in a very lightweight seamless way, without having to decide to click… and click again… and click again. Alberto Cairo refers to this downside of stepper-style interaction in his post “That Time When I Made Readers Click 50 Times in an Infographic.”

In a recent conversation, Shirley Wu and Miles McCrocklin summarized it this way: “scrolling is valuable because it’s a light, relatively frictionless way for a user to progress through content, while stepping is valuable because it gives context to the user of where they are and out of how many.”

More than just ease, scrolling provides continuity. There is a fantastic talk about the discovery that infinite scroll was a bad idea for Etsy. Having infinite scroll instead of pagination led to fewer clicks, fewer items “favorited”, and fewer purchases. No fun for Etsy, and possibly not as satisfying for users either. There are a lot of theories for why this is, but my favorite is that it’s hard to stop and take action when there is always something more. The end of a page or a stepper creates a point of decision: do I click to the next set of pages? Or, do I take some other action like clicking, purchasing or favoriting?

Unlike Etsy, the goal in data visualization is often to keep the user engaged throughout. There is rarely a call to action, or it is a single call to action at the end, as in Let’s Free Congress. In these cases, there is value in the continuity of scrolling versus coming to a point of decision to either click ‘next’ on the stepper or do something else.

You could imagine other visualization scenarios in which the reader has a strong incentive to go through all the content, perhaps for a course. For these, perhaps having a purposeful moment to pause between steps would be an asset rather than a detriment.

who controls the pacing between steps

Secondly, steppers and scrollytelling differ based on who controls the timing/experience of the animation between steps.

On the one hand, steppers can offer very crisp/clean animation between steps because the movement between steps is discrete and is triggered. Similarly, the annotations can be very customized, because the steps are discrete. You know exactly what the user will see at each step, and there are no half steps. Or .87ths of steps.
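
The discreteness of a stepper can be sketched as a small piece of state: a whole-number index that is always clamped to the available steps. The `createStepper` name and its methods are hypothetical, written only to illustrate the idea.

```javascript
// Hypothetical stepper state: a whole-number index clamped to the available steps.
function createStepper(stepCount) {
  let index = 0;
  const clamp = (i) => Math.max(0, Math.min(stepCount - 1, i));
  return {
    next() {
      index = clamp(index + 1);
      return index;
    },
    previous() {
      index = clamp(index - 1);
      return index;
    },
    // Rounding guarantees there are never half steps.
    goTo(i) {
      index = clamp(Math.round(i));
      return index;
    },
    // Context for the reader: where they are, and out of how many.
    label() {
      return `${index + 1} of ${stepCount}`;
    },
  };
}

const stepper = createStepper(5);
stepper.next();
stepper.next();
console.log(stepper.label()); // "3 of 5"
```

Since the index can only take a handful of values, each step’s annotations and transitions can be designed and tested one by one.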

While there are advantages to the creator having tight control over the animation between steps with a stepper, there are also advantages to giving the user tight control of the animation pacing via scrollytelling. Tony Chu’s A Visual Introduction to Machine Learning is a fantastic example of how valuable this can be.

Machine learning is complicated. And, as I walked through the explanation, I found myself often backing up slightly to slowly move through an animation. The animations are meaningful, not decorative, and show how data is transitioning from chart type to chart type. By being able to effortlessly slow transitions down, speed them up, or pause, I can see (and feel) those context switches.

If scrolling controls the speed of the animation, a user can slow it down, speed it up, back up, and pause.
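
As a minimal sketch of scroll-controlled animation, the scroll position can be mapped to a progress value between 0 and 1, which then drives the transition directly. The function names and pixel offsets below are hypothetical, and this is not a description of how any of the pieces mentioned here were built.

```javascript
// Map a scroll position to animation progress between 0 and 1.
// `start` and `end` are the scroll offsets (in pixels) where the transition
// begins and finishes; both are hypothetical values chosen by the author.
function scrollProgress(scrollTop, start, end) {
  const raw = (scrollTop - start) / (end - start);
  return Math.max(0, Math.min(1, raw));
}

// Linear interpolation between two values, driven by progress.
function interpolate(from, to, progress) {
  return from + (to - from) * progress;
}

// Example: a mark moves from x=0 to x=400 while the reader scrolls
// from 1000px to 1500px down the page.
for (const scrollTop of [900, 1000, 1250, 1500, 1600]) {
  const x = interpolate(0, 400, scrollProgress(scrollTop, 1000, 1500));
  console.log(scrollTop, x);
}
```

In a browser, the same functions could be called from a scroll event listener using `window.scrollY`. Scrolling back up lowers the progress value, which is what lets a reader back up, pause, or slow down a transition.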

discrete vs continuous

Kosara argued that it’s bad practice to use “continuous scrolling through a story with discrete steps,” and pointed to this piece as a problematic example.

Whether this principle is right or wrong, I’d actually argue that A Visual Introduction to Machine Learning is not actually a set of discrete steps. Rather it is a continuous experience. The transitions are not the spaces between the steps, but are a meaningful part of the content.

There are other scrollytelling examples, like the New York Times’ price of Oil, in which the user controls when the animation is triggered via scrolling but not the timing of the animation. I love the immersive experience, but I actually wish that it behaved a bit more like the Machine Learning piece in which I could control the speed of the transition with the speed of my scroll.

what to do?

In terms of deciding which approach to use to drive an experience, you should think about what it’s going to feel like. Can you make it feel seamless and lightweight if it’s scrolling? Or make it easy to step if stepping? Do you want to give very clear and explicit context? Who should control the pacing? Are the transitions part of the content? Are you going to lose your users if you give them the choice to click ‘next’ or not?

Idea vs Execution

When evaluating the merits of scrollytelling, steppers, or other forms, we should separate the validity of the concept from the execution.

For example, scrollytelling can be hard to execute well. There is nuance in how to make the scroll feel fluid, be triggered, stop at reasonable spots, and feel natural. It’s also appealing to combine scrollytelling with computationally heavy animations or lots of large media images/videos. As discussed above, one of the greatest strengths of scrollytelling is how effortless it is. So, if it’s sluggish or unresponsive or confusing (“scroll-jacking”), then it no longer has the advantage of effortlessness.

For example, I love the idea of the Upshot’s How the Recession Reshaped the Economy, in 255 Charts. But, on my computer it feels sluggish and “heavy” which breaks the “easy” or “light” feeling that scrolling provides. But, if it felt as weightless as Tony’s Let’s Free Congress, then I think it would be delightful.

Scroll-jacking is also a challenge, as the browser might seem to take control of the experience away from the user which can feel jarring. Ironically, when I opened this article on Scrolljacking and Accessibility the text moved around several times before it settled. First a big ad opened on top, and then another big visual thing appeared from the bottom. It was the kind of thing in which I literally took my hands away from the keyboard while waiting for it all to settle down.

Maybe the key question is who should have the control over pacing/position, and if the form matches that.

For steppers, a downside can be the “cost” of clicking. But, based on execution, this can be very central and natural to the experience. In others, you might have to move your mouse or be tempted to scroll to written content below instead of stepping through.

Do I have to choose?

In a recent conversation, Kennedy Elliot said “throughout scrollytelling vs stepper discussions, I often wonder if there are ways to further hybridize the two.”

Looking around the internet, I found a number of examples in which people mash-up these two basic concepts, or address similar needs/challenges in other ways. In particular, there were a number of examples of adding context to scrollytelling or bringing the stepper more into the center of the experience. But, there is likely so much more we could do.

Here are just a few interesting adaptations.

scrollytelling with chapters

To break up a very long article into manageable scroll-able meaningful chunks, Sarah Slobin used scrollytelling, but with chapters, in this Wall Street Journal long form article.

Trials from Wall Street Journal

The footer is also a photo-based “stepper” or table of contents, with next/previous arrows showing previous and upcoming chapters.

Trials visual “Table of Contents”

Off-topic, but I also just love Sarah’s handwritten style font. The story is so much about the mix of real life children and hard core science/medicine, and it’s interesting to see this reflected in the design.

scrollytelling with a progress bar

Another Wall Street Journal article by Sarah heavily uses scrolling to tell the story, but provides a subtle black progress bar at the top to show you how far you’ve gone.

WSJ: In Shattered Syria War Divides Neighbors

It also has steppers within parts of the multi-media article. Not sure it works totally seamlessly (at least it got a little sluggish on my computer), but it’s an interesting idea.

small multiples for static “stepping”

In print, there is no option for scrolling or stepping. But, Hannah Fairfield used annotated small multiples to step the reader through different parts of the story, in a sort-of small multiple time line.

For print: small multiples step you through the story in Hannah Fairfield’s Driving Shifts Into Reverse

forward/backwards stepper embedded

While many steppers are outside the visualization, this one is embedded. It only offers forwards and backwards, with a progress bar, so is a bit of a hybrid between a stepper and scrollytelling since you can’t skip steps.

Embedded Stepper from Vox

scrolling with stepper dots for context

This New York Times article on helmets and motorcycle fatalities is navigated by scrolling, with a vertical indicator to show how far you’ve gone.

New York Times: Fewer Helmets, More Deaths

stepping down or sideways

In Stamen’s Atlas of Emotions (which I worked on), you can step “down” to the next concept … or side to side to get to a different emotion.

a video moves you through the story, but gives a chance to interact

In The Fallen of World War II, the video moves you through the story instead of having to scroll or click. But, it also pauses to give you a chance to interact on several occasions.

Fallen animation

So… How Do You Do It?

Fortunately, whether you want to make a stepper or do scrollytelling, Jim Vallandingham has a tutorial to teach you how to do it: Steps for Building a Stepper Visualization & So You Want To Build a Scroller (OpenVisConf 2015 video here). Tony has also built a bl.ock example to share/explain his technique.

Want more?

Bostock’s How To Scroll

Adam Pearce’s graph-scroll.js intro

Amanda Cox’s 2012 Eyeo Talk including discussing steppers

Scrollytelling Examples, collected by Jim

PS. Thank you to Kerry Rodden for inspiring me to write this!

Exploring the Amazon with Code and Data

Zan — Wed, 06 Apr 2016 01:02:33 GMT

A project with Stamen and National Geographic

As a child, I dreamed of being a National Geographic photographer. What could be better than going exploring to find just the right perspective to help everyone appreciate and better understand this amazing world we call home?

I never expected that I would partially realize this dream in a completely different way. Instead of a camera’s lens, my tools included code, design, maps, and data. My first project with Stamen was creating an interactive page where users would compare and contrast maps showing various types of human impact across the Amazon Basin. In this way, I collaborated with National Geographic to find the right perspective to help people to explore data to appreciate and understand a very special part of our world.

Interactive “Explore and Compare” section, part of the Amazon Under Threat project

From both a technical and design perspective, it was an interesting challenge. The goal was to add functionality on a project that had already launched, which brings in a different set of constraints and challenges than starting from scratch.

In this beautiful project, viewers are introduced to the Amazon’s water cycle, forest types, forest strata, and varied human impacts. This last section walks them through a set of detailed maps, each highlighting a different type of human impact on the Amazon Basin. For example, the page shown here highlights mineral resources throughout Amazonia.

Mineral Resources in the Amazon Basin

National Geographic is well known for their map-making, and with good reason. These are wonderfully crafted maps, with careful attention to details about labeling, color, and background context including terrain and rivers. The project was published both online and in print, with shared maps and design between the two formats and between the two collaborating teams (Stamen and National Geographic).

My challenge was to create an interactive page in which users could compare and contrast any collection of these maps overlaid on each other. On the one hand, this isn’t a hard problem. There are plenty of maps in which you can toggle layers on and off. On the other hand, there was so much rich information to show and so many colors that had never been designed to be seen together. How could I keep it from being a visual cacophony?

I had the additional constraint of needing this new page to feel like a seamless part of the already published project, which was both beautifully and thoughtfully designed.

Constraints create good challenges.

My first goal was to identify what “levers” I could pull within the existing code and design framework. I felt like I was on Iron Chef: I couldn’t choose my ingredients but if I could find the right way to use them together it would turn out great.

One major goal was to focus the user on the foreground: the types of human impact data that they would compare and contrast. To do this, I wanted to pull the wonderful rich, contextual parts of the map like terrain and labels into the background without losing that context entirely.

I explored applying SVG filters to the HTML images that make up the map, in order to shift the green in the terrain layers to a more backgrounded grey or sepia. This proved problematic for several reasons. On a technical level, SVG filters are only supported for SVG elements, and not for HTML images, in Internet Explorer. Moreover, this diverged too much from the original design and feel of the project.

Instead, I achieved this objective by creating a slider to control a set of parameters that together brought these layers into the background or to the foreground. Most obviously, the slider controlled the opacity of the terrain base layer and was preset to be mostly opaque when the user first visited. As the slider is increased to full opacity, the terrain becomes an increasingly stronger green, slowly rivers are introduced, and then labels and country borders fade in.
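
The idea of one slider driving several parameters can be sketched as a single function from the slider value to a set of layer settings. The layer names and the thresholds at which rivers, labels, and borders begin to appear (0.4 and 0.7) are invented for illustration and are not the values used in the project.

```javascript
// Hypothetical mapping from one slider value (0 to 1) to several layer settings.

// Returns 0 before `start`, 1 after `end`, and rises linearly in between.
function fadeIn(value, start, end) {
  return Math.max(0, Math.min(1, (value - start) / (end - start)));
}

function contextLayerStyles(slider) {
  return {
    terrainOpacity: slider, // terrain strengthens across the whole range
    riversOpacity: fadeIn(slider, 0.4, 1), // rivers are introduced partway up
    labelsOpacity: fadeIn(slider, 0.7, 1), // labels fade in last
    bordersOpacity: fadeIn(slider, 0.7, 1), // country borders fade in with the labels
  };
}

console.log(contextLayerStyles(0.2));
console.log(contextLayerStyles(1));
```

Keeping the mapping in one place means the staging of the layers can be retuned by changing a few numbers, without touching the slider itself.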

The second challenge was around the selection interface, legend, and how to encourage thematically meaningful and visually compelling comparisons while also providing the viewers freedom to also compare across any types of human impact data.

We started with presenting a mostly blank slate for exploration, because viewers had already been introduced to each type of data they might select to compare. Therefore, viewers first see only the mostly-opaque context layers selected and a legend grouped by theme. The legend is both informational and where viewers select data to view. Upon selection, the data appears on the map and the detailed legend is revealed.

We grouped legend items by theme, reflecting the established themes. Layers within a theme had been designed to be viewed together, so we also wanted to encourage those comparisons as they were likely to be meaningful and also visually harmonious. For example, viewing Fire, Deforestation, and Transport layers together shows that many areas are affected by all three.

Comparing transport, deforestation, and fire throughout the Amazon

We might also notice that in the center of Brazil, there is an area with a lot of fire where there are almost no roads. Interesting…

Notice the mostly east/west band of fire, with few contiguous roads

Bringing the context layers back in, we can see that those fires track the Amazon and the river’s major tributaries. Perhaps this is because rivers also enable transportation, just in a different form than roads.

Adding context shows the river’s path through this section

By focusing on the data comparisons first without the context layers, we can observe and explore. Inevitably this inspires questions. With the slider we bring back detailed context layers to help us begin to address our new questions, questions we might not have thought to ask if we were looking at the context and data layers all at the same time.

In addition to comparisons between themes, cross-theme comparisons can also be quite interesting. For example, in some areas it appears that transport (in red) and minerals (in purple) might be related but not in other areas.

Comparing human impact, or potential impact, across themes

Bringing back in the base layers, we can see that the area on the left where minerals and roads overlap is Ji-Paraná while south of Sinop they don’t. This doesn’t answer the question by itself, but could be the first step.

Adding context to the comparison

There were numerous other details that helped make everything feel more harmonious. For example, I set opacity levels and ordering separately for each data layer to reflect its color and visual dominance so they’d look best when combined. Also, adding hash fragment identifiers to the URL to keep track of selections meant that if somebody found an interesting comparison they could share those same selections with a friend or colleague.
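
As an illustration of keeping selections in the URL, here is a minimal sketch that encodes a list of selected layers into a hash fragment and reads it back. The `#layers=` format and the layer ids are hypothetical; the project’s actual URL scheme is not described here.

```javascript
// Hypothetical format: "#layers=fire,deforestation,transport".
function encodeSelection(layerIds) {
  return "#layers=" + layerIds.map((id) => encodeURIComponent(id)).join(",");
}

function decodeSelection(hash) {
  const match = /^#layers=(.*)$/.exec(hash);
  if (!match || match[1] === "") return [];
  return match[1].split(",").map((id) => decodeURIComponent(id));
}

const shared = encodeSelection(["fire", "deforestation", "transport"]);
console.log(shared); // #layers=fire,deforestation,transport
console.log(decodeSelection(shared));
```

In a browser, the encoded string could be assigned to `window.location.hash` whenever the selection changes and decoded on page load, so that a shared link reopens the same comparison.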

Most of all, while I’ve spoken mostly of the challenges, it was great to be adding to something that already had so much good thought and work put into it that I could build upon. I’m happy that a few larger decisions, and many tiny tweaks, were able to make this addition feel like a part of the whole while opening up new opportunities for exploration and comparison. It also helped make a dream come true for me to collaborate with National Geographic, an organization that has inspired me and so many others for over a hundred years.

A huge thanks to Nicolette for her designs and her advice throughout this project, to Seth, Alan, and Eric S for both technical expertise and valuable perspective, to the whole Stamen team for being so welcoming, and to Ryan at National Geographic for being such a great collaborator!

Check out the full project here.

Intro to the Amazonia Under Threat project, which the map comparisons are a part of

Exploring the Amazon with Code and Data was originally published in Hi.Stamen on Medium, where people are continuing the conversation by highlighting and responding to this story.

Trouble, cunning, and a fighting spirit — the key to visualizing data

Zan — Thu, 08 May 2014 22:42:50 GMT

treasures from the past: Anscombe

Today’s quote from the past: “The user is not showered with graphical displays. He can get them only with trouble, cunning, and a fighting spirit. It’s time that was changed.”

This is the last sentence of Anscombe’s 1973 paper which also presented this now classic image (often now shown in color, as in the title and on Wikipedia).

From “Graphs in Statistical Analysis”: visualizing four datasets with identical summary stats reveals differences

All four datasets have the same major summary stats, including the same linear regression equation and same estimated error.

From “Graphs in Statistical Analysis”: the high level summary stats for the 4 graphs shown above are identical

Visually, however, the datasets are obviously quite different from each other. While these data sets are obviously contrived, this example illustrates the importance of looking beyond our summary stats to look at the data itself. This type of issue where summary stats don’t tell the full story can, and does, happen with “real” data.
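
The point can be checked directly with a short sketch that computes the means and the least-squares regression line for each of the four datasets. The values are the quartet as commonly reproduced from the paper; the helper names are my own.

```javascript
// Anscombe's quartet: the first three datasets share the same x values.
const sharedX = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5];
const quartet = [
  { x: sharedX, y: [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68] },
  { x: sharedX, y: [9.14, 8.14, 8.74, 8.77, 9.26, 8.1, 6.13, 3.1, 9.13, 7.26, 4.74] },
  { x: sharedX, y: [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73] },
  { x: [8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8], y: [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.5, 5.56, 7.91, 6.89] },
];

function mean(values) {
  return values.reduce((sum, v) => sum + v, 0) / values.length;
}

// Means plus the ordinary least-squares line y = intercept + slope * x.
function summarize({ x, y }) {
  const meanX = mean(x);
  const meanY = mean(y);
  let sxy = 0;
  let sxx = 0;
  for (let i = 0; i < x.length; i++) {
    sxy += (x[i] - meanX) * (y[i] - meanY);
    sxx += (x[i] - meanX) * (x[i] - meanX);
  }
  const slope = sxy / sxx;
  return { meanX, meanY, slope, intercept: meanY - slope * meanX };
}

for (const dataset of quartet) {
  const s = summarize(dataset);
  console.log(s.meanX.toFixed(2), s.meanY.toFixed(2), s.slope.toFixed(2), s.intercept.toFixed(2));
}
// Each of the four lines prints: 9.00 7.50 0.50 3.00
```

Rounded to two decimals the four summaries are indistinguishable, which is exactly why the numbers alone hide what the plots make obvious.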

40 years later we are lucky to have many more tools for graphical displays. Color is something we take for granted (and often overuse!), but must have been something Anscombe longed for. Graphs in spreadsheet tools like Google Spreadsheets or Excel are easily accessible to anyone working with data. Analysts can pivot/facet data into “small multiples” nearly instantly with ggplot in R. The New York Times and developers building with D3 are putting more and more complex data at the average person’s fingertips through their dynamic visualizations.

Yet, I’d argue that we still need that cunning and fighting spirit. While it’s easier to create graphs now, it’s also easier to create graphs thoughtlessly. Graphs are often a nice-to-have, something we’re asked to add to prove a point (“just give me a graph that shows X”), or are used as pseudo-look-up tables in dashboards. Often we use the easiest graph to create, which is not necessarily the one that we need. They aren’t a natural enough part of our analytic workflows. Nor are we as a society as used to questioning data/analysis the way we question written work. And, as the amount of data we create increases, the challenge of looking at the flood of data increases.

The fight now is less about being able to create a graph in the first place. Rather, now it is more about how we use graphs in analysis, how we question, and how we dig deeper. We need to keep building good visualization into our analysis process, coupled of course with robust statistics, and into how we communicate.

So, to those who work with data today, let’s keep up Anscombe’s fighting spirit and visualize our data in a way that will reveal what’s really happening under the surface.

Original 1973 paper here: Graphs.pdf